
Oxford Library of Psychology


AREA EDITORS:

Clinical Psychology: David H. Barlow
Cognitive Neuroscience: Kevin N. Ochsner and Stephen M. Kosslyn
Cognitive Psychology: Daniel Reisberg
Counseling Psychology: Elizabeth M. Altmaier and Jo-Ida C. Hansen
Developmental Psychology: Philip David Zelazo
Health Psychology: Howard S. Friedman
History of Psychology: David B. Baker
Methods and Measurement: Todd D. Little
Neuropsychology: Kenneth M. Adams
Organizational Psychology: Steve W. J. Kozlowski
Personality and Social Psychology: Kay Deaux and Mark Snyder


The Oxford Handbook of Causal Reasoning


Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trade mark of Oxford University Press in the UK and certain other countries.

Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America.

© Oxford University Press 2017

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by license, or under terms agreed with the appropriate reproduction rights organization. Inquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above. You must not circulate this work in any other form and you must impose this same condition on any acquirer.

Library of Congress Cataloging-in-Publication Data
Names: Waldmann, Michael R., editor.
Title: The Oxford handbook of causal reasoning / edited by Michael R. Waldmann.
Description: New York, NY : Oxford University Press, 2017. | Series: Oxford library of psychology | Includes bibliographical references and index.
Identifiers: LCCN 2016034292 | ISBN 9780199399550
Subjects: LCSH: Reasoning (Psychology) | Causation.
Classification: LCC BF442 .O94 2017 | DDC 153.4/3—dc23
LC record available at https://lccn.loc.gov/2016034292

9 8 7 6 5 4 3 2 1

Printed by Sheridan Books, Inc., United States of America


About the Editor


Michael R. Waldmann is Professor of Psychology at the University of Göttingen, Germany. He has received the early career research award from the German Society for Psychology, and is a Fellow of APS. Currently he is serving as an associate editor of the Journal of Experimental Psychology: Learning, Memory, and Cognition, and as chair of the Scientific Advisory Board of the Max Planck Institute for Human Development, Berlin. The focus of his research is on higher-level cognitive processes across different species and cultures.


Contributors


Woo-kyoung Ahn

Department of Psychology

Yale University

New Haven, Connecticut, USA

Aron K. Barbey

Decision Neuroscience Laboratory, Beckman Institute

University of Illinois Urbana-Champaign

Urbana, Illinois, USA

Tom Beckers


Department of Psychology

KU Leuven

Leuven, Belgium

Tom Beesley

School of Psychology

UNSW Australia

Sydney, Australia

Sieghard Beller

Department of Psychosocial Science

University of Bergen

Bergen, Norway

Andrea Bender

Department of Psychosocial Science

University of Bergen


Bergen, Norway

Roland Bluhm

Institute of Philosophy and Political Science

TU Dortmund University

Dortmund, Germany

Yannick Boddez

Centre for the Psychology of Learning and Experimental Psychopathology

KU Leuven

Leuven, Belgium

Elizabeth Bonawitz

Department of Psychology

Rutgers University - Newark

Newark, New Jersey, USA

Oliver Bott

University of Tübingen

Tübingen, Germany

Marc J. Buehner

School of Psychology

Cardiff University

Cardiff, Wales, UK

Antonio Cándido

Mind, Brain, and Behavior Research Center

Experimental Psychology Department

University of Granada

Granada, Spain

Andrés Catena

Mind, Brain, and Behavior Research Center


Experimental Psychology Department

University of Granada

Granada, Spain

Nick Chater

Behavioural Sciences Group, Warwick Business School

Warwick University

Coventry, England, UK

Patricia W. Cheng

Department of Psychology

University of California, Los Angeles

Los Angeles, California, USA

David Danks

Departments of Philosophy & Psychology

Carnegie Mellon University


Pittsburgh, Pennsylvania, USA

Jan De Houwer

Department of Experimental, Clinical, and Health Psychology

Ghent University

Ghent, Belgium

Philip M. Fernbach

Leeds School of Business

University of Colorado, Boulder

Boulder, Colorado, USA


Klaus Fiedler

Department of Psychology

University of Heidelberg

Heidelberg, Germany

Julia Fischer

Cognitive Ethology Laboratory, Leibniz Institute for Primate Cognition

German Primate Center

Göttingen, Germany

Samuel J. Gershman

Department of Psychology and Center for Brain Science

Harvard University

Cambridge, Massachusetts, USA

Tobias Gerstenberg

Department of Brain and Cognitive Sciences

Massachusetts Institute of Technology

Cambridge, Massachusetts, USA

Oren Griffiths

School of Psychology


UNSW Australia

Sydney, Australia

Thomas L. Griffiths

Department of Psychology

University of California, Berkeley

Berkeley, California, USA

York Hagmayer

Department of Psychology

University of Göttingen

Göttingen, Germany

Ulrike Hahn

Department of Psychological Sciences

Birkbeck, University of London

London, England, UK


Denis Hilton

Department of Psychology

University of Toulouse

Toulouse, France

Keith J. Holyoak

Department of Psychology

University of California, Los Angeles

Los Angeles, California, USA

Bernhard Hommel

Cognitive Psychology Unit, Leiden Institute for Brain and Cognition

Leiden University

Leiden, The Netherlands

Samuel G. B. Johnson

Department of Psychology

Yale University

New Haven, Connecticut, USA

P. N. Johnson-Laird

Department of Psychology

Princeton University

Princeton, New Jersey, USA;

Department of Psychology

New York University

New York, New York, USA

Sangeet S. Khemlani

Navy Center for Applied Research in Artificial Intelligence

Naval Research Laboratory

Washington, DC, USA

Nancy S. Kim

Department of Psychology

Northeastern University

Boston, Massachusetts, USA

Florian Kutzner

Department of Psychology

University of Heidelberg

Heidelberg, Germany

David A. Lagnado

Department of Experimental Psychology

University College London

London, England, UK

Matthew S. Lebowitz

Department of Psychiatry

Columbia University

New York, New York, USA

Mike E. Le Pelley

School of Psychology

UNSW Australia

Sydney, Australia

Hee Seung Lee

Department of Education

Yonsei University

Seoul, South Korea

Tania Lombrozo

Department of Psychology

University of California, Berkeley

Berkeley, California, USA

Hongjing Lu

Department of Psychology


University of California, Los Angeles

Los Angeles, California, USA


Antonio Maldonado

Mind, Brain, and Behavior Research Center

Experimental Psychology Department

University of Granada

Granada, Spain

Ralf Mayrhofer

Department of Psychology

University of Göttingen

Göttingen, Germany

Björn Meder

Center for Adaptive Behavior and Cognition

Max Planck Institute for Human Development


Berlin, Germany

Douglas L. Medin

Department of Psychology

Northwestern University

Evanston, Illinois, USA

Paul Muentener

Department of Psychology

Tufts University

Medford, Massachusetts, USA

Mike Oaksford

Birkbeck College

University of London

London, England, UK

Joachim T. Operskalski

Decision Neuroscience Laboratory, Beckman Institute

University of Illinois Urbana-Champaign

Urbana, Illinois, USA

Magda Osman

School of Biological and Chemical Sciences

Queen Mary University of London

London, England, UK

David E. Over

Psychology Department

Durham University

Durham, England, UK

José C. Perales

Mind, Brain, and Behavior Research Center


Experimental Psychology Department

University of Granada

Granada, Spain

Bob Rehder

Department of Psychology

New York University

New York, New York, USA

Benjamin Margolin Rottman

Learning Research and Development Center

University of Pittsburgh

Pittsburgh, Pennsylvania, USA

Christian Schloegl

Cognitive Ethology Laboratory, Leibniz Institute for Primate Cognition

German Primate Center


Göttingen, Germany

Torgrim Solstad

Centre for General Linguistics (ZAS)

Berlin, Germany

Joshua B. Tenenbaum

Department of Brain and Cognitive Sciences

Massachusetts Institute of Technology

Cambridge, Massachusetts, USA

Robert Thorstad

Department of Psychology

Emory University

Atlanta, Georgia, USA

Nadya Vasilyeva

Department of Psychology

University of California, Berkeley

Berkeley, California, USA

Michael R. Waldmann

Department of Psychology

University of Göttingen

Göttingen, Germany

Peter A. White

School of Psychology

Cardiff University

Cardiff, Wales, UK

Phillip Wolff

Department of Psychology

Emory University

Atlanta, Georgia, USA

Frank Zenker

Department of Philosophy & Cognitive Science

Lund University

Lund, Sweden


Causal Reasoning: An Introduction

Michael R. Waldmann

DOI: 10.1093/oxfordhb/9780199399550.013.1

Abstract and Keywords

Although causal reasoning is a component of most human cognitive functions, it has been neglected in cognitive psychology for many decades. To date, textbooks on cognitive psychology do not contain chapters on causal reasoning. The goal of this Handbook is to fill this gap, and to offer state-of-the-art reviews of the field. This introduction to the Handbook provides a general review of different competing theoretical frameworks modeling causal reasoning and learning. It outlines the relationship between psychological theories and their precursors in normative disciplines, such as philosophy and machine learning. It reviews the wide scope of tasks and domains in which the important role of causal knowledge has been documented. The final section previews the chapters of the Handbook.

Keywords: causal reasoning, learning, psychology, cognitive psychology, philosophy, machine learning

Causal reasoning is one of our most central cognitive competencies, enabling us to adapt to the world. Causal knowledge allows us to predict future events, or to diagnose the causes of observed facts. We plan actions and solve problems using knowledge about cause–effect relations. Without our ability to discover and empirically test causal theories, we would not have made progress in various empirical sciences, such as physics, medicine, biology, or psychology, and would not have been able to invent the many technologies that have changed human society. The ubiquity of causal reasoning has attracted researchers from various disciplines to this topic. Philosophers have studied causality for centuries, but more recently the topic has also motivated research in the fields of economics, biology, physics, anthropology, statistics, and artificial intelligence, to name just a few. Thus, causality is a genuinely interdisciplinary topic.

Yet causal reasoning has been curiously absent from mainstream cognitive psychology until recently. Although there has been some limited work in more specific areas, such as developmental or social psychology, it has not played a significant role in fundamental theories of human cognition. To date, textbooks on cognitive psychology do not contain chapters on causal reasoning. One reason for this neglect may be that psychology was, for many decades, dominated by the view that cognitive mechanisms, such as associative learning or deductive reasoning, are general, and can be specified independently from the domains to which they are being applied. For example, an associative learning mechanism in a particular task can learn about diseases and symptoms; this same mechanism, however, also can learn about arbitrary associations between shapes and colors. The learning domain is, according to this long-held view, arbitrary. What interested researchers was the nature of the general associative learning mechanism.

The situation has slowly changed in the past decades, with more and more research devoted to the particular characteristics of reasoning and learning about causal systems. Given that causal reasoning has become an increasingly important research area within cognitive science, the lack of coverage in standard introductory textbooks is unsatisfactory. There are some specialized monographs and book chapters that provide selective introductions to subfields of research, but there is no comprehensive introductory handbook that covers the entire field. The goal of The Oxford Handbook of Causal Reasoning is to fill this gap, and to offer state-of-the-art reviews of the field.

Descriptive and Normative Theories

Whereas cognitive psychology has for a long time neglected causal reasoning, causality has been one of the central topics of philosophy throughout its history. In fact, psychological theories of causal reasoning have been greatly influenced by philosophical accounts (see Waldmann & Hagmayer, 2013). For example, associative and probabilistic theories of causal learning have been inspired by Hume’s (1748/1977) analyses of causation. Dispositional theories and force dynamics can be traced back to Aristotelian theories (see Kistler & Gnassounou, 2007). Others have incorporated ideas of Kant’s philosophy of causation (Cheng, 1997). The discussions in philosophy have influenced other disciplines as well, particularly those focusing on the development of normative accounts, such as methodology, statistics, and machine learning. A broad introduction to philosophical accounts can be found in the Oxford Handbook of Causation (Beebee, Hitchcock, & Menzies, 2009).

For some, it may be surprising that psychological theories aiming to provide descriptive accounts might be inspired by normative theories of causation. However, the overlap is not accidental. Both scientists and laypeople aim to be correct when they make causal claims. Thus, causal claims are generally associated with a normative force (see Spohn, 2002; Waldmann, 2011). This commonality may be the reason that normative theories provide insights into the cognitive processes associated with causal reasoning. A recent example of the mutual influence between normative and descriptive theories is causal Bayes nets, which were first developed in philosophy and engineering (see Pearl, 1988, 2000; Spirtes, Glymour, & Scheines, 2001) but have also been adopted by psychologists as models of everyday causal reasoning (see Rottman & Hastie, 2014; Waldmann & Hagmayer, 2013, for reviews).

Despite similar goals of scientists and laypeople, it is implausible to expect a complete overlap between normative and descriptive accounts. Unlike scientists, laypeople typically have little knowledge of mechanisms (see Rozenblit & Keil, 2002). An important difference between normative and descriptive approaches is that philosophers and scientists generally try to develop a uniform coherent account that is based on a small set of basic principles. By contrast, laypeople often care little about overall coherence and consistency (see Arkes, Gigerenzer, & Hertwig, 2016), and therefore frequently violate normative principles (see, e.g., Rottman & Hastie, 2014).

The Ubiquity of Causal Reasoning

The renewed interest in causal reasoning has led to research in numerous areas of cognitive science. In all these areas, new theories have been developed that incorporate causal representations. These theories are typically contrasted with previous competing approaches that focus on domain-general non-causal principles. Learning was one of the first fields in which new causal theories were tested against the dominant associative view (e.g., Cheng, 1997; Griffiths & Tenenbaum, 2005; Shanks & Dickinson, 1987; Waldmann, 1996; Waldmann & Holyoak, 1992). The initial focus on learning is not surprising in light of the huge influence of Hume (1748/1977) on psychological learning theories. Hume paved the way to associative learning theories by forcefully arguing that the impression of causality is merely based on repeated observations of regular event sequences.

Similar debates between domain-general and causal theories were conducted in other areas. For example, the important role of causal knowledge was also discovered in research on deductive reasoning (e.g., Ali, Chater, & Oaksford, 2011; Cummins, 1995; Fernbach & Erb, 2013), in which previous theories based on psychological variants of logics were the dominant approach. In categorization research, similarity-based theories (e.g., prototype and exemplar theories) competed with theory-based accounts that highlighted the role of causal knowledge (e.g., Ahn, Kim, Lassaline, & Dennis, 2000; Lien & Cheng, 2000; Murphy & Medin, 1985; Rehder & Hastie, 2001; Waldmann & Hagmayer, 2006; Waldmann, Holyoak, & Fratianne, 1995). Similar developments can be seen in other areas, such as inductive reasoning (e.g., Kemp & Tenenbaum, 2009; Rehder, 2009), analogical reasoning (e.g., Lee & Holyoak, 2008), visual perception (e.g., Buehner & Humphreys, 2009; White, 2009), moral reasoning (e.g., Samland & Waldmann, 2016; Waldmann & Dieterich, 2007), decision-making (e.g., Hagmayer & Sloman, 2009), and language understanding (e.g., Talmy, 1988; Wolff & Song, 2003). The increasing awareness of the important role of causal knowledge in virtually all cognitive functions has also led to a heightened interest in elucidating the role of this knowledge in applied domains, such as legal reasoning (e.g., Fenton, Neil, & Lagnado, 2013; Spellman & Tenney, 2010) or psychopathology (e.g., Kim & Ahn, 2002).

Frameworks of Causal Reasoning

This Handbook presents various specific theories of causal reasoning. To provide a broad general overview, I summarize these here in terms of general prototypic frameworks. The main distinguishing features of these frameworks are the proposed causal relata (i.e., the type of entities that enter causal relations) and the type of causal relations that are used to represent causal scenarios.

The Dependency Framework

The dependency view of causation is shared by several psychological theories that otherwise compete with each other, including associative theories (see López & Shanks, 2008), conditional reasoning accounts (e.g., Goldvarg & Johnson-Laird, 2001; Over, Hadjichristidis, Evans, Handley, & Sloman, 2007), covariation theories (e.g., Cheng & Novick, 1992; Perales & Shanks, 2007), power PC theory (Cheng, 1997), causal model theories (e.g., Gopnik, Glymour, Sobel, Schulz, Kushnir, & Danks, 2004; Rehder & Hastie, 2001; Sloman, 2005; Waldmann, 1996; Waldmann & Holyoak, 1992), and Bayesian inference theories (Griffiths & Tenenbaum, 2005, 2009; Lu, Yuille, Liljeholm, Cheng, & Holyoak, 2008; Meder, Mayrhofer, & Waldmann, 2014). According to dependency theories, a variable C is a cause of its effect E if variable E depends upon C. The causal relata C and E are generally understood as variables that, in the binary case, denote the presence or absence of events, facts, properties, or states of affairs. The variables may also be continuous.

The key issue in competing theories within the dependency framework concerns the modeling of causal relations. Many theories assume that the observed covariation between causes and effects in a sample is a direct measure of causation. Causes alter, in the binary case, the probability of their effects. Some more advanced theories are sensitive to covariations between alternative causes and interactions, and control for possible confounding (e.g., Novick & Cheng, 2004; Rescorla & Wagner, 1972).

Covariations are symmetric and therefore do not allow a distinction between cause and effect. Hence, many causal theories have added cues that allow us to distinguish between these two types of events. One cue that had already been proposed by Hume (1748/1977) is temporal order: causes precede their effects (e.g., Goldvarg & Johnson-Laird, 2001). In standard cases, the ordering of learning events (cues, outcomes) typically honors temporal order. When we observe events in real time, we perceive cause events before we perceive the corresponding effect events. Thus, temporal order corresponds here to causal order. However, a physician, for example, may be confronted with a symptom (an effect) first, which is then used in diagnostic procedures designed to discover the probable cause. Honoring temporal order here is more complicated and requires a separation between the ordering of learning events and the ordering of events in the causal model representation (see Waldmann, 1996, 2000; Waldmann, & Holyoak, 1992).
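To make the covariation-based measures from the beginning of this section concrete, the two most widely used quantities can be written compactly. The display below gives a standard rendering (rather than the notation of any one chapter) of the ΔP contrast of covariation theories (Cheng & Novick, 1992) and the generative causal power of power PC theory (Cheng, 1997), with c the candidate cause and e the effect:

```latex
\Delta P = P(e \mid c) - P(e \mid \neg c),
\qquad
p_c = \frac{\Delta P}{1 - P(e \mid \neg c)}
```

Under ΔP, covariation itself measures causal strength; causal power p_c additionally corrects for how often the effect occurs in the absence of the cause, so the same contrast implies a stronger underlying power when alternative causes already produce e frequently. Chapters 3 and 5 discuss these rules in detail.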


Apart from temporal order, other cues to causality have been proposed, including counterfactuals (Halpern, 2016; Halpern & Hitchcock, 2015; Lagnado, Gerstenberg, & Zultan, 2013; Lewis, 1973), hypothetical interventions (Lagnado & Sloman, 2004; Pearl, 2000; Spirtes et al., 2001; Woodward, 2003), mechanism information (Ahn, Kalish, Medin, & Gelman, 1995; Pearl, 2000), or a recourse to prior knowledge (Griffiths & Tenenbaum, 2009; Lagnado, Waldmann, Hagmayer, & Sloman, 2007; Waldmann, 1996).

A more recent development separates the observed sample from the underlying causal structure that generated the observed patterns (see Dwyer & Waldmann, 2016). According to this view, observed data are used to make statistical inferences about the generating causal structure, for example about unobservable causal powers or about the structure of the causal model (e.g., Cheng, 1997; Griffiths & Tenenbaum, 2005; Lu et al., 2008; Meder et al., 2014). Causal models also allow for a representation of mechanisms that can be modeled as chains or networks of interconnected variables. For example, the causal relation between smoking and lung cancer can be elaborated by specifying intervening variables, such as genetic alterations caused by the inhalation of carcinogenic substances.

Causal model theories are particularly good at explaining how people make statistical inferences from observed causes to effects (predictive reasoning) or from observed effects to probable causes (diagnostic reasoning; see Fernbach, Darlow, & Sloman, 2011; Meder et al., 2014; Waldmann, 2000; Waldmann & Holyoak, 1992). One especially important feature is their capability to predict the outcomes of hypothetical interventions when only observational knowledge is available (Blaisdell, Sawa, Leising, & Waldmann, 2006; Meder, Hagmayer, & Waldmann, 2008, 2009; Pearl, 2000; Sloman & Lagnado, 2005; Spirtes et al., 2001; Waldmann & Hagmayer, 2005). A further strength of dependency theories is their focus on learning. Various models of causal learning processes have been developed within several (otherwise competing) theories.

An important commonality of accounts within each framework is the kind of task that is used in experimental research. The fact that dependency theories focus on causal dependencies between variables is manifest in the typical empirical research paradigms. In experiments, causal information is typically presented in terms of described (e.g., Ali, Chater, & Oaksford, 2011; Fernbach, Darlow, & Sloman, 2011; Rehder, 2014) or experienced (e.g., Gopnik et al., 2004; Waldmann, 2000; see Rehder & Waldmann, 2016, for a contrast between the two formats) covariations between causal variables that represent events. Examples of cover stories are scenarios that describe medicines causing headache (e.g., Buehner, Cheng, & Clifford, 2003), foods causing allergies (e.g., Shanks & Darby, 1998), or fertilizers causing plants to bloom (e.g., Lien & Cheng, 2000).
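The contrast between observing and intervening discussed above can be made concrete with a small numerical sketch. The network and all of its probabilities below are invented for illustration (they come from no study cited here): a background cause U influences both C and E, and C also causes E, so conditioning on C = 1 ("seeing") and setting C = 1 ("doing") yield different predictions for E:

```python
# A minimal numerical sketch (all probabilities invented for illustration) of
# the seeing/doing distinction: in the network U -> C, U -> E, and C -> E,
# observing C = 1 is diagnostic of the background cause U, whereas an
# intervention that sets C = 1 severs the U -> C link (cf. Pearl, 2000).

P_U = {0: 0.5, 1: 0.5}                    # prior on background cause U
P_C1_given_U = {0: 0.1, 1: 0.9}           # P(C = 1 | U)

def p_e(c, u):
    """P(E = 1 | C = c, U = u), a noisy-OR combination of the two causes."""
    return 1 - (1 - 0.4 * c) * (1 - 0.5 * u)

# Seeing: P(E = 1 | C = 1) -- conditioning on C = 1 updates U by Bayes' rule.
joint = {u: P_U[u] * P_C1_given_U[u] for u in (0, 1)}   # P(U = u, C = 1)
p_c1 = sum(joint.values())
p_e_seeing = sum((joint[u] / p_c1) * p_e(1, u) for u in (0, 1))

# Doing: P(E = 1 | do(C = 1)) -- U keeps its prior; C is simply set to 1.
p_e_doing = sum(P_U[u] * p_e(1, u) for u in (0, 1))

print(f"P(E=1 | C=1)     = {p_e_seeing:.2f}")   # 0.67
print(f"P(E=1 | do(C=1)) = {p_e_doing:.2f}")    # 0.55
```

The two values differ because observing C = 1 is also evidence for U, whereas an intervention breaks the U → C link and leaves U at its prior; this is the pattern that causal model theories predict people can exploit when making interventional predictions from observational knowledge.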

The Process Framework

The core idea of process theories is that causation involves some kind of transfer of quantity from cause to effect. For example, atoms decaying or billiard balls moving across a table are examples of causal processes. The main focus in these theories is on continuous causal processes, which specify causal relations. Causes and effects, the causal relata, are event representations that are abstracted over the causal processes. Most philosophical accounts are restricted to physical causation and turn to physics to identify the right kind of quantity that is being propagated (see Paul & Hall, 2013). Fair (1979) suggests energy, while Salmon (1984) and Dowe (2000) propose that any kind of conserved quantity (e.g., linear momentum, charge) is transmitted. For psychology these theories are of limited value because causal reasoning is not restricted to physical domains, and laypeople often lack detailed knowledge about physics. However, more generally applicable theories have been developed in philosophy that analyze mechanisms in terms of the involved entities and activities (Machamer, Darden, & Craver, 2000). These theories may prove more useful for psychology.

In psychology, there has been a debate about whether causal representations primarily embody knowledge of covariations or of mechanisms linking causes and effects. This debate has inspired a large number of studies, not only in developmental psychology (e.g., Bullock, Gelman, & Baillargeon, 1982; Koslowski, 1996; Shultz & Kestenbaum, 1985), but also more recently in cognitive psychology (Ahn et al., 1995; Fugelsang & Thompson, 2003). The debate between proponents of the covariation and the mechanism view has led to the development of models within the dependency framework that express mechanism knowledge, so that the claim that covariations and mechanisms are mutually exclusive constructs became obsolete. However, it is still important to find out how people represent mechanisms, how much they know about mechanisms in their environment, and how knowledge of mechanisms influences causal reasoning (Buehner, 2005; Johnson & Ahn, 2015; Park & Sloman, 2013; Rozenblit & Keil, 2002).

The Disposition Framework

A third framework can be traced back to Aristotle’s treatment of causation (see Kistler & Gnassounou, 2007). Whereas dependency theories focus on interrelations between events, the primary causal relata of the disposition framework are the objects involved in causal interactions; for example, the two colliding balls in Michotte’s (1963) task, or aspirin and a person with a headache in a medical scenario. A dispositional account of causation would analyze the causal relation between aspirin and the removal of a headache, for example, as a product of the interaction between aspirin, a substance endowed with the disposition (or capacity, potentiality, power) to relieve headaches, and human bodies, which have the disposition to be influenced by aspirin under specific circumstances. According to this view, causal dependency relations are secondary; they arise as a product of the interplay of objects that are endowed with causal dispositions.

Theories within the dispositional framework vary with respect to the abstractness of the object types and the characterization of the dispositional properties. The dominant theories in psychology and linguistics may be traced back to Aristotle’s distinction between two abstract kinds of objects: causal agents and causal patients (other terms have also been proposed). A causal patient is an animate or inanimate, concrete or abstract object that is acted on by a causal agent. Thus, agents (who need not be human) are the more active part of a causal interaction, whereas patients are more passive recipients of the agents’ influence, exerting some degree of resistance. For example, in “Peter pushes Mary,” “push” has two arguments, with the subject describing an agent (Peter), and the object referring to the patient (Mary).

A popular theory, first developed in linguistic semantics, is force dynamics. According to this account, agents emit forces that are received by patients. Forces are abstract notions and can be used to model various kinds of influences in the physical, social, or psychological domains. This theory was initially developed and empirically tested in the context of verb semantics and was later applied to more complex linguistic expressions and the visual perception of causal scenes (e.g., Gärdenfors, 2014; Mayrhofer & Waldmann, 2014, 2016; Talmy, 1988; White, 2006; Wolff, 2007).

Whereas theories in psychology and linguistics have postulated abstract characterizations of the causal participants (agents, patients), philosophers have developed dispositional theories that are intended to model scientific theory building. These theories use more elaborate characterizations of dispositional properties and do not restrict their theories to just two types of entities (e.g., Cartwright & Pemberton, 2013; Mumford & Anjum, 2011; Waldmann & Mayrhofer, 2016).

Unitary, Pluralistic, and Hybrid Causal Theories

Frameworks differ in terms of the causal relata they invoke and the way causal relations are construed. These differences make them more or less suitable for modeling specific tasks. For example, dependency theories are particularly good at modeling learning or predictive and diagnostic inferences within complex causal models, whereas dispositional theories are typically applied to linguistic phrases or visual scenarios showing interacting objects. Psycholinguistics rarely uses causal Bayes nets, while dispositional theories are rarely applied to causal learning tasks. Most of the debates in the field concern competing theories within a framework (e.g., associative vs. causal learning theories), not across frameworks.

This division of labor raises the question of how different theories are interrelated. There are attempts to promote a unitary account by showing that it is possible to model, within a single framework, phenomena that previously appeared to be outside the scope of the theory (e.g., Cheng, 1993; Wolff, 2014). However, these attempts are rare and have not led the community to converge on a single framework.

Therefore, other proposals have been put forward that attempt to deal with the fact that there is no overarching unitary concept of causality. One position initially proposed in philosophy is causal pluralism, which accepts that different tasks may be modeled best by different types of theories (e.g., Lombrozo, 2010).

Another strategy is to develop hybrid theories that also postulate different kinds of causal representations, but focus on the way they interact in specific tasks (Waldmann & Mayrhofer, 2016). Various hybrid accounts have been developed both within and across frameworks. For example, causal model theory (Waldmann, 1996, 2007; Waldmann & Holyoak, 1992) has postulated that abstract knowledge about general properties of causation interacts with the learning of statistical dependencies (see also Griffiths & Tenenbaum, 2009). Another example of a hybrid theory is a model that connects intuitive knowledge about physics with probabilistic inferences (“noisy Newton”; Battaglia, Hamrick, & Tenenbaum, 2013; Sanborn, Mansinghka, & Griffiths, 2013). An example of a hybrid theory that combines a dependency with a dispositional representation has been presented by Mayrhofer and Waldmann (2015; see also Waldmann & Mayrhofer, 2016).

Overview of the Handbook

This Handbook brings together the leading researchers in the field of causal reasoning and offers state-of-the-art presentations of theories and research. Each chapter provides a bit of historical background and presents the most relevant theoretical approaches along with empirical research. The book is divided into four parts.

Part I: Theories of Causal Cognition

The 12 chapters in Part I (Chapters 2–13) address foundational issues. Chapter 2, by Le Pelley, Griffiths, and Beesley, reviews research on associative theories of causal learning and discusses the relation between this class of theories and causal model theories. In Chapter 3, Perales, Catena, Cándido, and Maldonado focus on evidence for competing statistical rules that have been proposed as measures of causal strength. Chapter 4, by Boddez, De Houwer, and Beckers, presents the authors’ inferential theory of causal learning, which views causal learning as a process that combines high-level reasoning with learning of covariation information. Chapter 5, by Cheng and Lu, discusses recent developments of power PC (probabilistic contrast) theory with a special focus on the role of causal invariance. In Chapter 6, Rottman provides an introduction to causal Bayes net theories with a particular focus on how these models handle learning. Griffiths presents in Chapter 7 theories and experiments that, using a Bayesian framework, study the role of prior knowledge in causal induction. Chapter 8, by Johnson and Ahn, reviews the literature on the role of mechanism knowledge in causal reasoning. Wolff and Thorstad present in Chapter 9 variants of theories of force dynamics, discussing various tasks, including language understanding and scene perception. In Chapter 10, Johnson-Laird and Khemlani provide an up-to-date review of the mental model theory of causation. Fiedler and Kutzner present their work on pseudo-contingencies in Chapter 11. In Chapter 12, Danks reviews theories modeling singular causation in individual cases and discusses the relation between singular and general causation. Finally, in Chapter 13, Operskalski and Barbey review the literature on the neuroscience of causal reasoning in the context of competing cognitive theories of causal reasoning.

Part II: Basic Cognitive Functions

Whereas the chapters in Part I present general theories of causal reasoning, Part II focuses on specific cognitive functions. Most of the tasks discussed in this section have in the past been studied in the context of non-causal theories. However, more recently, research has elucidated the important role of causality in these cognitive functions. Part II starts off with Chapter 14 by White, who presents research on visual impressions of causation (e.g., the Michotte task). Apart from vision, action planning is a key competency of organisms that interact with a causally textured world. In Chapter 15, Hommel reviews theories and studies on goal-directed actions, followed by a chapter on planning and control by Osman (Chapter 16). A related chapter, Chapter 17, authored by Gershman, reviews formal theories of reinforcement learning, along with relevant evidence. The next two chapters focus on conditional reasoning. Chapter 18, by Over, discusses general probabilistic theories, whereas Chapter 19, by Oaksford and Chater, presents a causal model theory of conditional reasoning. The next two chapters are authored by Rehder. They present work studying the role of causal model representation in categorization (Chapter 20) and induction (Chapter 21). The following two chapters discuss causal explanation (Chapter 22, by Lombrozo and Vasilyeva) and models of diagnostic reasoning (Chapter 23, by Meder and Mayrhofer). Chapter 24, by Holyoak and Lee, reviews work on the causal underpinnings of analogical reasoning. Chapter 25, by Hahn, Bluhm, and Zenker, presents a new research program on causal argumentation, a research area that has been neglected so far. Finally, Hagmayer and Fernbach (Chapter 26) discuss work on the role of causality in decision-making.

Part III: Domains of Causal Reasoning

Whereas the focus in Part II is on specific cognitive functions, Part III presents work on the role of causal reasoning in the most widely studied content domains. Chapter 27, by Gerstenberg and Tenenbaum, starts off with a review of work on the relation between intuitive theories of physics and psychology and causal inference. Next, Buehner presents research in Chapter 28 on the role of spatiotemporal relations in causal learning. Lagnado and Gerstenberg review in Chapter 29 theories and studies on the role of causality in legal and moral reasoning. In Chapter 30, Ahn, Kim, and Lebowitz discuss recent work on causal reasoning about mental disorders. Solstad and Bott provide a review of research on causal reasoning in natural language understanding in Chapter 31. Finally, Hilton presents in Chapter 32 research on causal attribution and explanation in the social domain.

Part IV: Development, Phylogeny, and Culture

Whereas the Handbook has a strong focus on research in cognitive psychology, causal reasoning is an interdisciplinary research topic that cuts across traditional boundaries. The final part of the Handbook presents work in neighboring research areas. Some of these areas would deserve their own handbooks, but their work is important here as well because they not only are influenced by theories developed in cognitive psychology, but also have had a great influence on them. Part IV starts with two chapters on ontogenetic and phylogenetic topics. Chapter 33, by Muentener and Bonawitz, reviews the literature on the development of causal reasoning, with a particular focus on early childhood. Chapter 34, by Schloegl and Fischer, addresses the highly debated question of whether nonhuman animals are capable of causal reasoning, or whether they employ more basic strategies to orient in their world. Finally, Chapter 35 addresses a question that has lately attracted more and more interest in psychology. Researchers in various areas have questioned whether the standard theories of psychology are culturally biased toward Western industrialized societies (see Henrich, Heine, & Norenzayan, 2010). Given that theories of causality have mainly been developed in Western philosophy, a number of researchers have wondered whether this critique might also apply to research on causal reasoning. Bender, Beller, and Medin review the relevant research in the final chapter, Chapter 35.

References

Ahn, W.-K., Kalish, C. W., Medin, D. L., & Gelman, S. A. (1995). The role of covariation versus mechanism information in causal attribution. Cognition, 54, 299–352.
Ahn, W.-K., Kim, N. S., Lassaline, M. E., & Dennis, M. J. (2000). Causal status as a determinant of feature centrality. Cognitive Psychology, 41, 361–416.
Ali, N., Chater, N., & Oaksford, M. (2011). The mental representation of causal conditional reasoning: Mental models or causal models. Cognition, 119, 403–418.
Arkes, H. R., Gigerenzer, G., & Hertwig, R. (2016). How bad is incoherence? Decision, 3, 20–39.
Battaglia, P. W., Hamrick, J. B., & Tenenbaum, J. B. (2013). Simulation as an engine of physical scene understanding. Proceedings of the National Academy of Sciences, 110, 18327–18332.
Beebee, H., Hitchcock, C., & Menzies, P. (Eds.) (2009). The Oxford handbook of causation. Oxford: Oxford University Press.
Blaisdell, A. P., Sawa, K., Leising, K. J., & Waldmann, M. R. (2006). Causal reasoning in rats. Science, 311, 1020–1022.
Buehner, M. J. (2005). Contiguity and covariation in human causal inference. Learning and Behavior, 33, 230–238.
Buehner, M. J., Cheng, P. W., & Clifford, D. (2003). From covariation to causation: A test of the assumption of causal power. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 1119–1140.
Buehner, M. J., & Humphreys, G. R. (2009). Causal binding of actions to their effects. Psychological Science, 20, 1221–1228.
Bullock, M., Gelman, R., & Baillargeon, R. (1982). The development of causal reasoning. In W. J. Friedman (Ed.), The developmental psychology of time (pp. 209–254). New York: Academic Press.
Cartwright, N., & Pemberton, J. M. (2013). Aristotelian powers: Without them, what would modern science do? In J. Greco & R. Groff (Eds.), Powers and capacities in philosophy: The new Aristotelianism (pp. 93–112). New York: Routledge.
Cheng, P. W. (1993). Separating causal laws from casual facts: Pressing the limits of statistical relevance. In D. L. Medin (Ed.), The psychology of learning and motivation (Vol. 30, pp. 215–264). New York: Academic Press.
Cheng, P. W. (1997). From covariation to causation: A causal power theory. Psychological Review, 104, 367–405.
Cheng, P. W., & Novick, L. R. (1992). Covariation in natural causal induction. Psychological Review, 99, 365–382.
Cummins, D. D. (1995). Naïve theories and causal deduction. Memory & Cognition, 23, 646–658.
Dowe, P. (2000). Physical causation. Cambridge, UK: Cambridge University Press.
Dwyer, D. M., & Waldmann, M. R. (2016). Beyond the information (not) given: Representation of stimulus absence in rats (Rattus norvegicus). Journal of Comparative Psychology, 130, 192–204.
Fair, D. (1979). Causation and the flow of energy. Erkenntnis, 14, 219–250.
Fenton, N., Neil, M., & Lagnado, D. A. (2013). A general structure for legal arguments about evidence using Bayesian networks. Cognitive Science, 37, 61–102.
Fernbach, P. M., Darlow, A., & Sloman, S. A. (2011). Asymmetries in predictive and diagnostic reasoning. Journal of Experimental Psychology: General, 140, 168–185.
Fernbach, P. M., & Erb, C. D. (2013). A quantitative causal model theory of conditional reasoning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 39, 1327–1343.
Fugelsang, J. A., & Thompson, V. (2003). A dual-process model of belief and evidence interactions in causal reasoning. Memory & Cognition, 31, 800–815.
Gärdenfors, P. (2014). The geometry of meaning: Semantics based on conceptual spaces. Cambridge, MA: MIT Press.
Goldvarg, E., & Johnson-Laird, P. N. (2001). Naïve causality: A mental model theory of causal meaning and reasoning. Cognitive Science, 25, 565–610.
Gopnik, A., Glymour, C., Sobel, D. M., Schulz, L. E., Kushnir, T., & Danks, D. (2004). A theory of causal learning in children: Causal maps and Bayes nets. Psychological Review, 111, 3–32.
Griffiths, T. L., & Tenenbaum, J. B. (2005). Structure and strength in causal induction. Cognitive Psychology, 51, 285–386.
Griffiths, T. L., & Tenenbaum, J. B. (2009). Theory-based causal induction. Psychological Review, 116, 661–716.
Hagmayer, Y., & Sloman, S. A. (2009). Decision makers conceive of their choices as interventions. Journal of Experimental Psychology: General, 138, 22–38.
Halpern, J. Y. (2016). Actual causality. Cambridge, MA: MIT Press.
Halpern, J. Y., & Hitchcock, C. (2015). Graded causation and defaults. British Journal for the Philosophy of Science, 66, 413–457.
Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33, 61–83.
Hume, D. (1748/1977). An enquiry concerning human understanding. Indianapolis: Hackett.
Johnson, S. G. B., & Ahn, W. (2015). Causal networks or causal islands? The representation of mechanisms and the transitivity of causal judgment. Cognitive Science, 39, 1468–1503.
Kemp, C., & Tenenbaum, J. B. (2009). Structured statistical models of inductive reasoning. Psychological Review, 116, 20–58.
Kim, N. S., & Ahn, W. (2002). Clinical psychologists’ theory-based representations of mental disorders predict their diagnostic reasoning and memory. Journal of Experimental Psychology: General, 131, 451–476.
Kistler, M., & Gnassounou, B. (Eds.) (2007). Dispositions and causal powers. Aldershot, UK: Ashgate.
Koslowski, B. (1996). Theory and evidence: The development of scientific reasoning. Cambridge, MA: MIT Press.
Lagnado, D. A., Gerstenberg, T., & Zultan, R. (2013). Causal responsibility and counterfactuals. Cognitive Science, 37, 1036–1073.
Lagnado, D. A., & Sloman, S. A. (2004). The advantage of timely intervention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 856–876.
Lagnado, D. A., Waldmann, M. R., Hagmayer, Y., & Sloman, S. A. (2007). Beyond covariation: Cues to causal structure. In A. Gopnik & L. E. Schultz (Eds.), Causal learning: Psychology, philosophy, and computation (pp. 154–172). Oxford: Oxford University Press.
Lee, H. S., & Holyoak, K. J. (2008). The role of causal models in analogical inference. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34, 1111–1122.
Lewis, D. (1973). Counterfactuals. Cambridge, MA: Harvard University Press.
Lien, Y., & Cheng, P. W. (2000). Distinguishing genuine from spurious causes: A coherence hypothesis. Cognitive Psychology, 40, 87–137.
Lombrozo, T. (2010). Causal–explanatory pluralism: How intentions, functions, and mechanisms influence causal ascriptions. Cognitive Psychology, 61, 303–332.
López, F. J., & Shanks, D. R. (2008). Models of animal learning and their relations to human learning. In R. Sun (Ed.), Cambridge handbook of computational psychology (pp. 589–611). Cambridge, UK: Cambridge University Press.
Lu, H., Yuille, A. L., Liljeholm, M., Cheng, P. W., & Holyoak, K. J. (2008). Bayesian generic priors for causal learning. Psychological Review, 115, 955–982.
Machamer, P., Darden, L., & Craver, C. (2000). Thinking about mechanisms. Philosophy of Science, 67, 1–25.
Mayrhofer, R., & Waldmann, M. R. (2014). Indicators of causal agency in physical interactions: The role of the prior context. Cognition, 132, 485–490.
Mayrhofer, R., & Waldmann, M. R. (2015). Agents and causes: Dispositional intuitions as a guide to causal structure. Cognitive Science, 39, 65–95.
Mayrhofer, R., & Waldmann, M. R. (2016). Causal agency and the perception of force. Psychonomic Bulletin & Review, 23, 789–796.
Meder, B., Hagmayer, Y., & Waldmann, M. R. (2008). Inferring interventional predictions from observational learning data. Psychonomic Bulletin & Review, 15, 75–80.
Meder, B., Hagmayer, Y., & Waldmann, M. R. (2009). The role of learning data in causal reasoning about observations and interventions. Memory & Cognition, 37, 249–264.
Meder, B., Mayrhofer, R., & Waldmann, M. R. (2014). Structure induction in diagnostic causal reasoning. Psychological Review, 121, 277–301.
Michotte, A. E. (1963). The perception of causality. New York: Basic Books.
Mumford, S., & Anjum, R. L. (2011). Getting causes from powers. New York: Oxford University Press.
Murphy, G. L., & Medin, D. L. (1985). The role of theories in conceptual coherence. Psychological Review, 92, 289–316.
Novick, L. R., & Cheng, P. W. (2004). Assessing interactive causal power. Psychological Review, 111, 455–485.
Over, D. E., Hadjichristidis, C., Evans, J. St. B. T., Handley, S. J., & Sloman, S. A. (2007). The probability of causal conditionals. Cognitive Psychology, 54, 62–97.
Park, J., & Sloman, S. A. (2013). Mechanistic beliefs determine adherence to the Markov property in causal reasoning. Cognitive Psychology, 67, 186–216.
Paul, L. A., & Hall, N. (2013). Causation: A user’s guide. New York: Oxford University Press.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. San Mateo, CA: Morgan Kaufmann.
Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge, UK: Cambridge University Press.
Perales, J. C., & Shanks, D. R. (2007). Models of covariation-based causal judgment: A review and synthesis. Psychonomic Bulletin & Review, 14, 577–596.
Rehder, B. (2009). Causal-based property generalization. Cognitive Science, 33, 301–343.
Rehder, B. (2014). Independence and dependence in human causal reasoning. Cognitive Psychology, 72, 54–107.
Rehder, B., & Hastie, R. (2001). Causal knowledge and categories: The effects of causal beliefs on categorization, induction, and similarity. Journal of Experimental Psychology: General, 130, 323–360.
Rehder, B., & Waldmann, M. R. (2016). Failures of explaining away and screening off in described versus experienced causal learning scenarios. Memory & Cognition.
Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and non-reinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64–99). New York: Appleton-Century-Crofts.
Rottman, B. M., & Hastie, R. (2014). Reasoning about causal relationships: Inferences on causal networks. Psychological Bulletin, 140, 109–139.
Rozenblit, L., & Keil, F. C. (2002). The misunderstood limits of folk science: An illusion of explanatory depth. Cognitive Science, 26, 521–562.
Salmon, W. C. (1984). Scientific explanation and the causal structure of the world. Princeton, NJ: Princeton University Press.
Samland, J., & Waldmann, M. R. (2016). How prescriptive norms influence causal inferences. Cognition, 156, 164–176.
Sanborn, A. N., Mansinghka, V. K., & Griffiths, T. L. (2013). Reconciling intuitive physics and Newtonian mechanics for colliding objects. Psychological Review, 120, 411–437.
Shanks, D. R., & Darby, R. J. (1998). Feature- and rule-based generalization in human associative learning. Journal of Experimental Psychology: Animal Behavior Processes, 24, 405–415.
Shanks, D. R., & Dickinson, A. (1987). Associative accounts of causality judgment. In G. H. Bower (Ed.), The psychology of learning and motivation: Advances in research and theory (Vol. 21, pp. 229–261). New York: Academic Press.
Shultz, T. R., & Kestenbaum, N. R. (1985). Causal reasoning in children. In G. Whitehurst (Ed.), Annals of Child Development (Vol. 2, pp. 195–249). Greenwich, CT: JAI Press.
Sloman, S. A. (2005). Causal models: How we think about the world and its alternatives. Oxford: Oxford University Press.
Sloman, S. A., & Lagnado, D. A. (2005). Do we “do”? Cognitive Science, 29, 5–39.
Spellman, B. A., & Tenney, E. R. (2010). Credibility in and out of court. Psychonomic Bulletin & Review, 17, 168–173.
Spirtes, P., Glymour, C., & Scheines, R. (2001). Causation, prediction and search. New York: Springer.
Spohn, W. (2002). The many facets of the theory of rationality. Croatian Journal of Philosophy, 2, 247–262.
Talmy, L. (1988). Force dynamics in language and cognition. Cognitive Science, 12, 49–100.
Waldmann, M. R. (1996). Knowledge-based causal induction. In D. R. Shanks, K. J. Holyoak, & D. L. Medin (Eds.), The psychology of learning and motivation, Vol. 34: Causal learning (pp. 47–88). San Diego: Academic Press.
Waldmann, M. R. (2000). Competition among causes but not effects in predictive and diagnostic learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 53–76.
Waldmann, M. R. (2007). Combining versus analyzing multiple causes: How domain assumptions and task context affect integration rules. Cognitive Science, 31, 233–256.
Waldmann, M. R. (2011). Neurath’s ship: The constitutive relation between normative and descriptive theories of rationality. Behavioral and Brain Sciences, 34, 273–274.
Waldmann, M. R., & Dieterich, J. (2007). Throwing a bomb on a person versus throwing a person on a bomb: Intervention myopia in moral intuitions. Psychological Science, 18, 247–253.
Waldmann, M. R., & Hagmayer, Y. (2005). Seeing vs. doing: Two modes of accessing causal knowledge. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 216–227.
Waldmann, M. R., & Hagmayer, Y. (2006). Categories and causality: The neglected direction. Cognitive Psychology, 53, 27–58.
Waldmann, M. R., & Hagmayer, Y. (2013). Causal reasoning. In D. Reisberg (Ed.), Oxford handbook of cognitive psychology (pp. 733–752). New York: Oxford University Press.
Waldmann, M. R., & Holyoak, K. J. (1992). Predictive and diagnostic learning within causal models: Asymmetries in cue competition. Journal of Experimental Psychology: General, 121, 222–236.
Waldmann, M. R., Holyoak, K. J., & Fratianne, A. (1995). Causal models and the acquisition of category structure. Journal of Experimental Psychology: General, 124, 181–206.
Waldmann, M. R., & Mayrhofer, R. (2016). Hybrid causal representations. In The psychology of learning and motivation (pp. 85–127). New York: Academic Press.
White, P. A. (2006). The causal asymmetry. Psychological Review, 113, 132–147.
White, P. A. (2009). Perception of forces exerted by objects in collision events. Psychological Review, 116, 580–601.
Wolff, P. (2007). Representing causation. Journal of Experimental Psychology: General, 136, 82–111.
Wolff, P. (2014). Causal pluralism and force dynamics. In B. Copley & F. Martin (Eds.), Causation in grammatical structures (pp. 100–119). Oxford, UK: Oxford University Press.
Wolff, P., & Song, G. (2003). Models of causation and the semantics of causal verbs. Cognitive Psychology, 47, 276–332.
Woodward, J. (2003). Making things happen: A theory of causal explanation. Oxford: Oxford University Press.

Michael R. Waldmann

Michael R. Waldmann is Professor of Psychology at the University of Göttingen, Germany. He has received the early career research award from the German Society for Psychology, and is a Fellow of APS. Currently he is serving as an associate editor of the Journal of Experimental Psychology: Learning, Memory, and Cognition, and as chair of the Scientific Advisory Board of the Max Planck Institute for Human Development, Berlin. The focus of his research is on higher-level cognitive processes across different species and cultures.

Associative Accounts of Causal Cognition

Mike E. Le Pelley, Oren Griffiths, and Tom Beesley

DOI: 10.1093/oxfordhb/9780199399550.013.2

Abstract and Keywords

Humans are clearly sensitive to causal structures—we can describe and understand causal mechanisms and make predictions based on them. But this chapter asks: Is causal learning always causal? Or might seemingly causal behavior sometimes be based on associations that merely encode the information that two events “go together,” not that one causes the other? This associative view supposes that people often (mis)interpret associations as supporting the existence of a causal relationship between events; they make the everyday mistake of confusing correlation with causation. To assess the validity of this view, one must move away from considering specific implementations of associative models and instead focus on the general principle embodied by the associative approach—that the rules governing learning are general-purpose, and so do not differentiate between situations involving cause–effect relationships and those involving signaling relationships that are non-causal.

Keywords: learning, causal, association, associative, covariation, correlation

The roots of the associative approach to causal cognition can be traced back to the British Empiricist philosophers of the seventeenth and eighteenth centuries. Most influential among these was David Hume, who viewed the causal relationship as central to all understanding, describing causation as “the cement of the universe.” Hume (1740/1978) noted in his “A Treatise of Human Nature” that neither the senses nor reason can establish that one object (a cause) is connected together with another object (an effect) in such a way that the presence of one necessarily entails the existence of the other. In other words, we cannot observe the nature of the causal connection between two events. In the case of a falling stone, we are unable to directly perceive some law of gravity that causes the stone to fall. But despite our inability to see or prove that there are necessary causal connections, we continue to think and act as if we had knowledge of such connections. Hume proposed that our beliefs in the connectivity between causes and effects arise simply as a result of the constant conjunction (or association) of those events—that is, by the perception of regularities in our experience. Hume asks us to consider the case of a putative Adam, brought to life in the midst of the world and in “the full vigour of understanding.” Adam would behave as if he had no knowledge of causal relationships in the world. He would be unable to predict that putting his hand in a flame would cause pain, or that clouds anticipate the likely onset of rain, whereas we, endowed with the same senses and faculties of reasoning, are unable to resist making these and countless other such predictions.

The critical difference between Adam and ourselves is the experience of associations between events in the world—in particular, between causes and effects. Thus we have observed many examples of one object striking another and causing this second object to move. Despite, as Hume notes, the causal connection between these events being unobservable, this constant conjunction gives rise to an expectation that one event will be followed by a second. Hume proposed that it is this expectation that is manifest in the mind as a necessary connection, or belief. Hence the idea of necessary connection or causality, which arises as a result of regularities of experience, has its root in the mind, and is then projected onto the world in the form of predictive knowledge.

This idea that causal knowledge derives from the experience of statistical regularities between events in the world clearly has strong ties to the notion of associative learning: learning that events are associated with one another, on the basis of experience of their co-occurrence. As a result of over a century’s scientific study of associative learning in humans and non-human animals (going back to Thorndike, 1898), we now have a reasonable grasp of the fundamental principles underlying associative learning. Encouraged by this corpus of knowledge, Dickinson, Shanks, and Evenden (1984) suggested that the established principles of associative learning could usefully be brought to bear on the issue of causal learning; in essence, they argued that causal cognition could usefully be brought under the umbrella of associative learning. This suggestion prompted a wave of empirical studies that have attempted to test the validity of this framework. We shall argue in this chapter that while the results of this research were, at best, mixed, the most important outcome was a greater understanding of the complexities and nuances of causal learning and behaviour.

Causal Learning as Associative Learning

“Causal learning” and “associative learning” are not synonyms. Crucially, associative learning simply requires a relationship to exist between the occurrence of two events, but it does not require this relationship to be a causal one—that is, associative learning is based on covariation, not causality. For example, Pavlov’s (1927) dogs learned that a bell preceded delivery of food, but the sound of the bell did not cause the food to be delivered. Similarly, I might learn that the seminars in our department on Fridays tend to be more interesting than those on Wednesdays, but the day of the week does not cause one seminar to be better than another (that is, correlation does not equal causation).

In fact, we can divide examples of associative learning into three categories. The first involves instrumental (or operant) learning; that is, learning about the relationship between an action and the outcome that it produces. For example, flipping a light switch is associated with a light turning on. Clearly, instrumental learning must involve a causal relationship.1 The second category we can term causal Pavlovian learning. This involves the situation in which one external event signals the occurrence of a second external event by virtue of a causal relationship between them. For example, we might learn that when clouds appear in the sky, rain is likely to follow. The final category is non-causal Pavlovian learning, where one external event signals another but does not cause that second event to occur. The examples in the preceding paragraph of Pavlov’s dogs and interesting seminar days fall into this latter category.

Recent work on the conditioning of attention provides a particularly clear demonstration of Pavlovian learning that is not based on causal knowledge. Le Pelley, Pearson, Griffiths, and Beesley (2015) trained participants on a visual search task in which they were required to look at a target (a diamond) as quickly as possible. The appearance of a particular distractor stimulus (say, a red circle) in the search display on a given trial signaled that participants would receive a large monetary reward for looking at the target, while the presence of a different distractor (a blue circle) signaled that looking at the target would earn a low reward. However, if participants looked at the distractor at any point prior to looking at the target, the reward that was scheduled for that trial was omitted. Under these circumstances, participants were nevertheless more likely to look at the “high-value” distractor (the red circle in this case) than the “low-value” distractor, even though doing so meant that they were more likely to miss out on large rewards than small rewards. That is, the high-value distractor was more likely to capture attention than was the low-value distractor. The implication is that attentional capture in this task was modulated by the Pavlovian relationship between distractor color and reward: the high-value distractor had a Pavlovian relationship with large reward (i.e., it signaled the availability of large reward), and as a consequence of this relationship it became more likely to capture attention in future. Notably, this conditioned capture of attention by the high-value distractor occurred even if participants were explicitly informed that looking at the distractor caused omission of the reward, both in initial instructions and following every trial on which reward was omitted (Pearson, Donkin, Tran, Most, & Le Pelley, 2015). In fact, the magnitude of the conditioned effect was unaltered compared to the case in which participants were not explicitly informed of this causal relationship regarding omissions. So even when participants were fully aware of the true causal relationship present in the task, their conditioned attentional response was not influenced by this knowledge; instead, their conditioned responding continued to follow the (non-causal) Pavlovian relationship between colors and rewards. The suggestion, then, is that this example of a conditioned attentional response—wherein stimuli that predict high reward are more likely to capture attention than those that signal low reward—reflects the operation of a relatively automatic Pavlovian process that is not sensitive to causal knowledge.

To summarize, associative learning can be based on learning about causal relationships, but it can also be insensitive to causality. One view, then, is that causal learning is simply a subset of associative learning. According to this framework, there is a general-purpose set of principles that underlie associative learning, and these principles apply equally to situations involving causality and to those that merely involve signaling (that is, covariation). The alternative is that there is something “special” about causal relationships that sets them apart from instances of mere covariation (Cheng & Lu, Chapter 5 in this volume; Rottman, Chapter 6 in this volume; Hagmayer & Fernbach, Chapter 26 in this volume, 2005). To the extent that this alternative hypothesis is true, the associative account of causal learning will be found wanting. In the following we consider some of the evidence for parallels between causal and non-causal associative learning.

Contiguity, Contingency, and Cue Competition

In a series of papers, Dickinson, Shanks, and colleagues (for reviews, see Dickinson, 2001; Shanks, 1995) investigated whether the formation of causal beliefs in humans was subject to certain “standard” associative principles that had been established in earlier studies of animal conditioning. The first of these is the temporal contiguity of events—that is, the delay between their occurrence. Animal studies of non-causal Pavlovian conditioning show that the greater the delay between two events, the weaker the association that is formed between them. For example, if a tone (CS) reliably signals a puff of air to the eye (US), then a rabbit will learn to move its nictitating membrane (third eyelid) in response to the tone so as to attenuate the influence of the air puff. This conditioning occurs rapidly if the US follows 250 milliseconds after the CS during training, but the rate of conditioning steadily decreases as the CS–US interval used during training becomes longer (Schneiderman & Gormezano, 1964; for related examples, see Gibbon, Baldock, Locurto, Gold, & Terrace, 1977; Hawkins, Carew, & Kandel, 1986; Ost & Lauer, 1965).

Shanks and Dickinson (1991) demonstrated that causal learning in humans shows a similar sensitivity to temporal contiguity. In their task, participants could press a button whenever they liked, at a cost of 1 point. The outcome was the illumination of a figure on the screen, and every time this outcome occurred, participants gained 3 points. Participants were asked to maximize the number of total points gained. In fact, 90% of button presses generated an outcome, so the optimal strategy was to press the button as frequently as possible. If the outcome immediately followed the action, then participants did indeed learn to press the button at a high rate, and at the end of the experiment reported a strong belief in a causal relationship between action and outcome. However, training with a delay of 2 or 4 seconds between action and outcome disrupted this pattern, producing a rate of button-pressing that did not differ from that of other participants trained under a regime in which pressing the button had no influence at all on whether the outcome occurred. This effect of delay was also reflected in participants’ judgments of the strength of the causal relationship between action and outcome; with a 4-second delay, this relationship was judged to be around half as strong as with no delay.

Causal and non-causal learning also show similar sensitivity to the degree of contingency between events, where contingency is defined as the difference between the probability of event 2 (E2) occurring given that event 1 (E1) has occurred [denoted P(E2|E1)], and the probability of E2 occurring in the absence of E1 [denoted P(E2|¬E1)]. Using rats, Rescorla (1968; see also Rescorla, 1967) demonstrated that the degree of contingency between a tone CS and an electric shock US modulated the strength of Pavlovian conditioning. One group of rats experienced training in which shocks only ever occurred in the presence of the tone CS. These rats showed evidence of rapidly developing a fear of the tone (in that hearing the tone would cause them to stop pressing a lever that delivered food). A second group of rats experienced the same number of occasions on which the CS and US were paired (with the same degree of temporal contiguity). However, for this group, additional USs could occur when the CS was not present, such that overall there was no contingency between CS and US, that is, P(shock | tone) = P(shock | ¬tone). These rats showed no evidence of developing fear of the CS, suggesting that the lack of contingency between CS and US prevented the learning of an association between them. Shanks and Dickinson (1991; see also Chatlosh, Neunaber, & Wasserman, 1985) demonstrated that causal learning of action–outcome relationships by humans is also sensitive to contingency. Using the procedure outlined in the previous paragraph, they found that participants’ rate of button pressing and judgments of the strength of the causal relationship between action and outcome declined systematically as the probability of unpaired outcomes (i.e., illuminations of the figure when the button had not been pressed) increased.

In fact, the relationship between contingency and learning is more subtle, and more interesting, than this. Studies of non-causal Pavlovian conditioning have shown that, when exposed to non-contingent presentations of a tone and shock, rats initially develop fear of the tone, but as experience continues, this fear diminishes and eventually disappears (Rescorla, 1972). Moreover, the strength of pre-asymptotic fear depends on the overall frequency of shock: Rescorla showed that this fear was greater in a condition in which P(shock | tone) = P(shock | ¬tone) = 0.4 than in a condition in which P(shock | tone) = P(shock | ¬tone) = 0.1. This is known as the outcome density bias. Both of these findings are mirrored in studies of causal learning in humans (for a review, see Shanks, 1995). Specifically, people’s judgments of the strength of a relationship between a cause and an effect are influenced by the overall frequency of the outcome (the effect), even if there is no contingency between the “cause” and the effect (Allan & Jenkins, 1983). However, as training continues, this influence of outcome frequency on causal judgments decreases, with judgments eventually coming to reflect normative contingencies (Shanks, López, Darby, & Dickinson, 1996; Wasserman, Chatlosh, & Neunaber, 1983).
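To make the contingency definition concrete, the following sketch (ours, in Python; the trial counts are invented for illustration and are not taken from Rescorla, 1968) estimates ΔP = P(E2|E1) – P(E2|¬E1) from discrete trial counts, mirroring the contingent and non-contingent conditions just described.

    # A sketch (with invented trial counts) of the contingency metric just defined:
    # delta-P = P(E2|E1) - P(E2|not E1), estimated from four trial counts.

    def delta_p(e2_with_e1, n_e1, e2_without_e1, n_not_e1):
        """Contingency between E1 and E2, estimated from trial counts."""
        return e2_with_e1 / n_e1 - e2_without_e1 / n_not_e1

    # Contingent condition: shock on 16 of 40 tone trials, never without the tone.
    print(delta_p(16, 40, 0, 40))    # 0.4: the tone predicts shock
    # Non-contingent condition: shock equally likely with and without the tone.
    print(delta_p(16, 40, 16, 40))   # 0.0: same number of pairings, no contingency

Note that both conditions contain the same number of tone–shock pairings; only the unpaired shocks differ, which is exactly the manipulation in Rescorla’s second group.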
Further research shows that learning about the relationship between E1 and E2 is influenced not only by the degree of contingency between E1 and E2, but also by the degree of contingency between other events (E3, E4, etc.) and E2, if these other events sometimes occur at the same time as E1. This is most neatly illustrated by the case of blocking (Kamin, 1968). A particularly clear, within-subjects demonstration of blocking of non-causal Pavlovian conditioning in animals is provided by McNally and Cole (2006; see Table 2.1). Rats were initially exposed to pairings of a visual CS, denoted stimulus A, with shock. In a subsequent phase of training, rats experienced trials on which A was presented in compound with a novel auditory stimulus, B, and again followed by shock. Also occurring in this latter phase were trials in which a compound of a novel visual stimulus (C) and a novel auditory stimulus (D) was paired with shock. Following this treatment, rats showed evidence of greater fear of D than of B, suggesting that they had formed a stronger D–shock association than their B–shock association. This is interesting, because the experienced contingency between B and shock is exactly the same as that between D and shock: both stimuli were paired with shock on the same number of trials, and shock was experienced in the absence of each stimulus the same number of times. Moreover, the temporal contiguity between CS and shock is the same for all CSs. This demonstrates that associative learning is not entirely determined by contingency and contiguity. In particular, it shows that associative learning about cue–outcome relationships does not proceed independently for each cue considered in isolation: B and D, considered in isolation, both have the same relationship with shock, and yet more is learned about D. Instead, the implication is that learning in Stage 1 that A predicts shock acts to block subsequent learning that B predicts shock on AB → shock trials in Stage 2. In contrast, since neither C nor D has been previously established as a predictor of shock when CD → shock trials are first experienced in Stage 2, neither will block learning about the other.

Table 2.1 Design of the Within-Subjects Blocking Experiment by McNally and Cole (2006)

    Stage 1      Stage 2       Test    Result
    A → shock    AB → shock    B       Greater fear of B than D
                 CD → shock    D

Note: A and C are visual stimuli (constant light and flashing light, counterbalanced across subjects); B and D are auditory stimuli (white noise and clicker, counterbalanced across participants).

Blocking shows that cues interact in the learning process, and seem to compete for a limited amount of a learning resource that the outcome can support—hence blocking is often described as an example of cue competition in associative learning. And notably for the purposes of this chapter, blocking also occurs in human causal learning (e.g., Aitken, Larkin, & Dickinson, 2000; Griffiths & Le Pelley, 2009; Griffiths & Mitchell, 2008; Le Pelley, Beesley, & Griffiths, 2014; Le Pelley, Beesley, & Suret, 2007; Shanks, 1985). For example, Aitken et al. (2000) used the common “allergy prediction” paradigm, in which participants play an allergist whose aim is to discover the cause of reactions in a fictitious patient, Mr. X. On each trial, the participant is told the contents of a meal eaten by Mr. X, and must predict whether or not he suffered an allergic reaction as a result. Immediate corrective feedback is provided, and each different meal is encountered many times, which allows participants to learn the correct response for each meal. For example, they might learn that eating beef and sprouts reliably results in an allergic reaction, but eating bananas is not followed by a reaction. Aitken et al.’s Experiment 2 contained a blocking contingency and its control. In the blocking contingency, Stage 1 trials in which food A was always paired with allergic reaction (denoted A+ trials) were followed by Stage 2 trials in which a compound of foods A and B was always paired with allergic reaction (AB+).

In the control contingency, participants experienced Stage 2 trials in which a compound of foods C and D caused reaction (CD+), but neither had previously been experienced in Stage 1. In a subsequent test phase, participants were presented with each cue individually and were asked to rate the efficacy of that food as a cause of allergic reaction. Blocking was demonstrated, in that food B was perceived as a weaker cause of allergy than was food D.

Another demonstration of an interaction between cues in learning comes from studies of the effect of signaling non-contingent outcomes. Rescorla (1972) showed that delivering shocks in the inter-trial intervals (ITIs) between pairings of a tone and shock resulted in reduced fear of the tone. This is not surprising, since these ITI-shocks result in a reduced contingency between tone and shock (see earlier). However, for a further group of rats, all ITI-shocks were signaled by a different cue (a clicker). Even though this signal does not alter the contingency between tone and shock, rats in this latter group showed significantly greater fear of the tone than did those for whom the ITI-shocks were unsignaled (see also Durlach, 1983; Rescorla, 1984). Similarly, signaling non-contingent occurrences of an outcome results in higher judgments of the strength of a causal relationship between an action (pressing a button) and an outcome (illumination of a figure on a screen) in humans (Shanks, 1986, 1989).

So, causal and non-causal learning show similar sensitivities to a range of factors. Both are influenced by the degree of contiguity and contingency between events, and both are subject to cue competition. These findings seem to support the idea that there is nothing “special” about causal learning, that instead there is a set of general principles for associative learning, and that these principles apply equally to situations involving causality and to those that do not. At first glance, then, the case for an associative account of causal learning seems to be on solid ground.

However, this initial promise has subsequently been met with criticisms that can be broadly divided into two categories: those that question specific implementations of associative principles, and those that question associative principles in general. We shall argue that the former present no challenge to the associative account, whereas the latter force us to concede (unsurprisingly) that such an account must fall short of providing a full account of human causal reasoning.

The “If It’s Not Rescorla–Wagner, It’s Not Associative” Fallacy: General Principles Versus Specific Theories

Many of the phenomena of associative learning described in the preceding section follow naturally from one of the most influential and famous theories of associative learning, the Rescorla–Wagner model (Rescorla & Wagner, 1972). At the heart of this model is the idea that the amount that is learned about a given cue in a given trial is related to the “surprisingness” of the outcome on that trial. Formally, the model states that the strength of the association between a cue X and an outcome (denoted VX) is updated in each trial according to the expression:

    ΔVX = αX βUS (λ – ΣV)    (1)

where ΔVX represents the change in VX on the current trial, and αX and βUS are fixed parameters representing the salience of cue X and the outcome, respectively. The error term (λ – ΣV) represents the discrepancy between the observed magnitude of the outcome (λ) and the magnitude of the outcome expected on the basis of all currently presented cues (ΣV)—that is, how surprising the occurrence of the outcome is, given the presence of the presented cues.

Let us consider how this applies to a blocking contingency in which A+ trials are followed by AB+ trials. On initial A+ trials, the outcome is surprising, since it is not predicted by A; formally, we start with VA = 0, and the magnitude of the outcome on this trial is λ (where λ > 0), so the error term on the first trial will be λ – 0 = λ. Hence an association will develop between A and the outcome; that is, VA will increase, and will continue to increase until (asymptotically) VA = λ. Consider now the AB+ trials of Stage 2. The outcome occurring in these trials is not surprising, since it is well predicted by the presence of A; formally, on the first Stage 2 trial we have VA ≈ λ (if we assume that Stage 1 training approaches asymptote) and VB = 0 (since it is novel), and so the error term is given by λ – ΣV = λ – (VA + VB) ≈ 0. Hence little is learned in these trials, and consequently B does not form a strong association with the outcome. So the model correctly predicts that prior training with A will block later learning about B.

The Rescorla–Wagner model also provides a natural explanation of the finding that signaling non-contingent outcomes results in perception of a stronger cue–outcome relationship (Rescorla, 1972, 1984; Shanks, 1986, 1989). If we denote the critical cue as A, and the experimental context as C, then the condition in which non-contingent outcomes are not signaled can be thought of as involving AC+, C+, and C– trials (where the latter represents periods during which the subject is exposed to the context but outcomes do not occur). The context will develop a (weak) association with the outcome as a consequence of C+ trials. This will render the outcome occurring on AC+ trials less surprising than would otherwise be the case, since the error term on these trials is λ – (VA + VC), with VC > 0. Hence (to an extent) learning about the context will block learning about the cue, A. Now, if the non-contingent outcomes are signaled by a different stimulus, S, then training involves AC+, SC+, and C– trials. The presence of the salient stimulus S on SC+ trials will reduce conditioning of the context C on these trials (through a process of overshadowing: see Rescorla & Wagner, 1972). Consequently, in this signaled group, the context will not compete with and block learning about the cue A to the same extent as in the unsignaled group. Hence the model anticipates stronger learning about A in the signaled group than in the unsignaled group, and this is the effect observed empirically both in causal and non-causal learning.
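To see how Equation (1) yields blocking, the following minimal simulation (our sketch, in Python; the learning-rate parameters, λ, and trial numbers are illustrative assumptions, not values fitted to any experiment) applies the update rule to the A+ then AB+/CD+ design.

    # A minimal simulation of Equation (1) applied to the forward-blocking design:
    # A+ in Stage 1, then AB+ and CD+ in Stage 2. All parameter values are
    # illustrative assumptions.

    def rw_update(V, cues, lam=1.0, alpha=0.3, beta=1.0):
        """One Rescorla-Wagner trial: dVx = alpha * beta * (lam - sum of V)."""
        error = lam - sum(V[c] for c in cues)   # (lambda - sigma V): surprise
        for c in cues:                          # only presented cues change
            V[c] += alpha * beta * error
        return V

    V = {"A": 0.0, "B": 0.0, "C": 0.0, "D": 0.0}
    for _ in range(50):                         # Stage 1: A+ trials
        rw_update(V, ["A"])
    for _ in range(50):                         # Stage 2: AB+ and CD+ trials
        rw_update(V, ["A", "B"])
        rw_update(V, ["C", "D"])

    print(V)  # V["B"] stays near 0 (blocked); V["C"] and V["D"] end near 0.5

Because VA is already close to λ when the AB+ trials begin, the error term on those trials is near zero and B acquires almost no associative strength; C and D instead share the available learning, each ending near λ/2, which is the overshadowing effect the model also predicts.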

Successes of the Rescorla–Wagner model in explaining phenomena of causal learning such as blocking, the effect of signaling, and the outcome density bias (Shanks, 1995, provides a clear description of this latter success) fueled the view that causal learning might be understood in terms of the associative principles that are implemented by this model, or one like it. This view had an unfortunate consequence, in that it has sometimes been taken to imply that any effect observed in causal learning that does not follow from the Rescorla–Wagner model therefore undermines the associative account of causal learning more generally.

An example of this is provided by the phenomenon of backward blocking. Recall that blocking involves A+ trials in Stage 1, followed by AB+ and CD+ trials in Stage 2; judgments of the causal strength of B following such training are lower than for D. A number of experiments have shown that a similar effect can be obtained even if the order of the training stages is reversed; that is, if initial training is with AB+ and CD+, and this is followed by training with A+, then ratings of the causal strength of B are lower than those for D (e.g., Le Pelley & McLaren, 2001; Shanks, 1985; Wasserman & Berglan, 1998). The implication of this finding is that A+ trials in Stage 2 lead to a decline in the causal strength of B, even though B is not experienced during this phase—effects such as backward blocking provide examples of the retrospective revaluation of cues.

Notably, the Rescorla–Wagner model cannot account for backward blocking. Implicit in the model is the idea that only presented cues can undergo changes in associative strength: the salience of cue X (αX) is assumed to be zero if the cue is not presented, and hence, according to Equation (1), VX cannot change if X is absent. So the model cannot explain the change in beliefs about B that must occur during A+ training in Stage 2 of a backward blocking procedure.

The fact that backward blocking lies beyond the Rescorla–Wagner model (and the alternative “acquisition-based” models of associative learning that were dominant at the time, e.g., Mackintosh, 1975; Pearce & Hall, 1980; Wagner, 1981) led Shanks and Dickinson (1987) to wonder whether such findings undermined the associative approach to causal inference. But this concern is premature. While backward blocking clearly undermines the Rescorla–Wagner model (among others) as a complete account of causal inference, it does not necessarily undermine all possible implementations of an associative account. Indeed, it has been noted many times that the Rescorla–Wagner model does not provide a complete account of associative learning phenomena, regardless of the issue of causality (see Le Pelley, 2004; Miller, Barnet, & Grahame, 1995). The model remains influential because, despite its simplicity, it is able to account for a variety of phenomena of conditioning—it has excellent heuristic value—but it does not explain everything. To reiterate, the occurrence of backward blocking undermines certain implementations of an associative account of causal learning (including the Rescorla–Wagner model), but that does not mean that it necessarily falls outside the scope of associative models more generally. Indeed, several researchers have proposed associative models that are able to account for this effect (e.g., Dickinson & Burke, 1996; Le Pelley & McLaren, 2001; Tassoni, 1995; Van Hamme & Wasserman, 1994). We should be clear here that we do not mean to endorse a particular associative account of backward blocking. We would not necessarily even argue that backward blocking is best interpreted as the outcome of an associative process. Our point is merely that this finding is, at least in theory, amenable to an analysis in terms of associative learning, and hence does not provide strong evidence on which to evaluate the utility of associative accounts considered as a class.

At the risk of becoming repetitive, we will restate our general point here, since it is critical to our aim of evaluating associative accounts of causal learning: “associative learning” and “the Rescorla–Wagner model” are not synonymous. “Associative learning” is a generic term referring to learning about the covariation between events based on experience with those events. The Rescorla–Wagner model is just one possible mechanism for implementing associative learning. In this chapter, when we contrast associative learning and causal learning, this does not commit us to a particular mechanistic view of how associative learning occurs. For example, one view (typified by the Rescorla–Wagner model) sees associative learning as mediated by the formation of physical links between mental representations of stimulus elements. In some models, formation of these links is mediated by the attention that is paid to events (e.g., Le Pelley, 2004, 2010; Mackintosh, 1975; Pearce & Hall, 1980). An alternative, exemplar-based approach instead assumes that it is configurations of elements that form links with other events (e.g., Kruschke, 1992, 2003; Medin & Schaffer, 1978). Yet another view eschews the idea of links altogether, and instead argues that associative learning is mediated by the calculation of probabilities (e.g., Cheng & Novick, 1990; Peterson & Beach, 1967), or by propositions that describe the associative relationship between events (Mitchell, De Houwer, & Lovibond, 2009). A large body of fascinating research has been conducted in an attempt to decide between these alternative mechanisms for associative learning (see, for example, Boddez, De Houwer, & Beckers, Chapter 4 in this volume), but this issue is orthogonal to the scope of this chapter. Instead, for the purposes of this chapter, “associative learning” simply means “learning about covariation, however this is implemented,” and is contrasted with “causal learning,” which implies sensitivity to causal structure beyond merely covariation.2
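Returning briefly to backward blocking: as one concrete illustration of how an associative mechanism can produce retrospective revaluation, the sketch below implements the spirit of Van Hamme and Wasserman’s (1994) proposal, on which a cue that is expected (because it previously accompanied a presented cue) but absent receives a negative learning-rate parameter. All parameter values and trial counts are our own illustrative assumptions, and, as stressed above, this is one possible implementation rather than an endorsed account.

    # A sketch in the spirit of Van Hamme and Wasserman (1994): cues that are
    # expected (via prior compound training) but absent get a negative learning
    # rate, so their strength can decline on trials where they do not appear.
    # All parameter values are illustrative assumptions.

    def vhw_update(V, present, absent_expected, lam=1.0,
                   alpha_present=0.3, alpha_absent=-0.1, beta=1.0):
        """One trial: presented cues move toward the outcome prediction;
        expected-but-absent cues move away from it."""
        error = lam - sum(V[c] for c in present)
        for c in present:
            V[c] += alpha_present * beta * error
        for c in absent_expected:
            V[c] += alpha_absent * beta * error
        return V

    V = {"A": 0.0, "B": 0.0, "C": 0.0, "D": 0.0}
    for _ in range(50):                      # Stage 1: AB+ and CD+ trials
        vhw_update(V, ["A", "B"], [])
        vhw_update(V, ["C", "D"], [])
    for _ in range(50):                      # Stage 2: A+ alone; B is expected
        vhw_update(V, ["A"], ["B"])          # but absent, so V["B"] declines

    print(V)  # backward blocking: V["B"] ends below V["D"]

Whether this particular mechanism is the right story is exactly the question the main text leaves open; the point is only that retrospective revaluation is not beyond associative models considered as a class.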

Is There Anything Special About Causality?

Causal Order

At this point, the reader may be growing concerned about the issue of falsifiability. Might it be possible to come up with an associative model that could explain any pattern of data? If so, then we might never be able to find evidence that would allow us to judge the value of the associative approach considered in its most general terms, which would render it uninteresting from a research perspective. However, we shall argue that this is not the case. Specifically, we believe there are certain properties that should be common to any associative account, and which—when attempting to apply that account to studies of causal learning—generate testable and falsifiable predictions. The crucial point is that, as noted earlier, the associative account provides a general-purpose set of principles that apply equally to all learning situations, whether those situations involve causality or not. In other words, associative accounts are essentially blind to the nature of the events that are involved in learning. On this view, there is nothing “special” about causal relationships that sets them apart from instances of mere covariation.

An example should make this clear. Associative accounts involve learning about signaling relationships based on temporal order—that the occurrence of cue events A and B signals that an outcome event X will occur subsequently. Hence according to any associative account, the “direction” of learning—from cues to outcomes—is determined by the temporal order of events. In contrast, the direction of causality flows from causes to effects, regardless of the temporal order in which those events are encountered.

Imagine that participants experience a number of trials in which cues A and B appear together, and signal that outcome X will occur. Under these circumstances, we would expect cue competition between A and B for the limited amount of learning that the outcome X can support (similar to the case of blocking described earlier). That is, learning about the relationship between A and X will be weaker if training is with AB→X (in which competition between A and B will occur; the presence of B is said to overshadow learning about A, and vice versa) than if it is with A→X (in which case there will be no competition).

We can now contrast two cases. In the first, the scenario is such that cues A and B are naturally interpreted as causes (e.g., foods eaten by a patient) and X as an effect (an allergic reaction caused by eating allergenic foods). In this case the temporal order of events is aligned with their causal order; this is often referred to as a predictive learning scenario. In the second case, the scenario is such that cues A and B are naturally interpreted as effects (e.g., symptoms suffered by a patient) and X as a cause (a disease that produces those symptoms). In this latter case, the temporal order of effects (A and B precede X) is opposite to their causal order (X causes A and B); this is a diagnostic learning scenario, in that participants learn to diagnose the cause on the basis of the presence of symptoms.

The important point is that, to an associative model, these two cases are equivalent. In both, the model learns to predict X on the basis of the presence of A and B, regardless of the causal status of these events. Hence any phenomenon of learning, such as cue competition, that occurs in a predictive learning scenario should also be observed in an otherwise comparable diagnostic learning scenario. However, if people are sensitive to the causal status of events, then we might expect to see differences between predictive and diagnostic scenarios, reflecting the different causal order inherent in each. In particular, Waldmann and Holyoak (1992, 1997) noted that one should expect competition between multiple independent causes of a common effect, but not between multiple independent effects of a common cause. If we have the experience that eating apple and banana causes an allergic reaction, and we encounter independent evidence that apples cause allergy, then this should lead us to reduce the strength of our belief that bananas cause allergy. However, if we have experience that suffering from Jominy fever causes nausea and pustules, and we encounter independent evidence that Jominy fever causes nausea, this should not lead us to change our belief that it also causes pustules. In other words, if participants are sensitive to causal order, then cue competition should be observed in a predictive scenario, but not in a diagnostic scenario.

The empirical evidence on this issue is mixed. Some studies have found evidence of an asymmetry, with cue competition observed in predictive but not diagnostic scenarios (e.g., Booth & Buehner, 2007; Tangen & Allan, 2004, Experiments 1 and 2; Van Hamme, Kao, & Wasserman, 1993; Waldmann, 2000, 2001; Waldmann & Holyoak, 1992). These findings imply a sensitivity to causality; that there is something special about a causal relationship that distinguishes it from mere signaling of outcomes by cues. Such demonstrations of asymmetry therefore undermine associative accounts of causal learning. And, unlike backward blocking, this undermining applies not just to a specific implementation of an associative account, but to the approach in general. Having said that, however, a number of other studies have reported similar cue competition effects in both predictive and diagnostic scenarios (e.g., Cobos, López, Caño, Almaraz, & Shanks, 2002; Le Pelley & McLaren, 2001; Matute, Arcediano, & Miller, 1996; Shanks & López, 1996; Tangen & Allan, 2004, Experiments 3 and 4). The symmetry between conditions found in these latter studies implies a lack of sensitivity to causal order, and thus supports the general-purpose, cue→outcome approach taken by associative models of learning.

How, then, are we to reconcile these discrepant findings? A likely possibility relates to the “availability” of the causal scenario to participants. In the studies cited earlier that found symmetry between predictive and diagnostic learning (suggesting insensitivity to causal order), participants simply read instructions regarding the particular causal scenario under which they were being tested. The importance of considering causal order was not made explicitly salient to them, and, perhaps as a result, the data suggest that they did not consider this information when making judgments. In contrast, in the studies demonstrating asymmetry, causal order was typically made more salient. For example, Waldmann (2000, 2001) had participants summarize the instructions prior to the experiment in order to verify that they had correctly understood the different causal structures involved in the different cover stories. It seems likely that this requirement emphasized to participants the importance of making use of causal order when forming their judgments, and as a consequence no cue competition was observed in the diagnostic condition.

More direct evidence that the salience of the causal structure is the critical determinant of (a)symmetry between predictive and diagnostic learning comes from a study by López, Cobos, and Caño (2005). In Experiment 1A, participants read “standard” instructions about the causal scenario, which involved an electrical box with lights on the front and back. In the predictive condition, participants were told that illumination of lights on the front caused illumination of the lights on the back; in the diagnostic condition, they were told that illumination of lights on the front was caused by the lights on the back. Participants then completed a cue competition task in which, on each trial, they were shown which lights were illuminated on the front of the box and were required to predict which bulbs would be lit on the back (with corrective feedback). Results showed near-perfect symmetry, with equal evidence of cue competition in predictive and diagnostic conditions.

Importantly, this symmetry did not reflect participants’ failure to understand the task’s causal structure, since symmetry was also observed if analysis was restricted to only those participants who could remember and understand the causal structure in a comprehension test conducted after the cue competition task. Experiment 2 was exactly the same as Experiment 1A, except that a sentence was added to the initial instructions to emphasize the importance of making use of the information regarding causal structure: “To solve the task correctly, it is VERY IMPORTANT to take into account what you have just read in the instructions. Most importantly, in order to solve the different examples of the task … bear in mind that the lights on one side of the box cause the illumination of the lights on the other side of the box.” This instruction was sufficient to induce asymmetry, with significantly stronger cue competition in the predictive scenario than the diagnostic scenario. However, it did not improve participants’ understanding of the causal structure of the task as assessed in the final comprehension test, which was similar regardless of the inclusion or exclusion of the critical sentence in the instructions.

To summarize, in Experiment 1A López et al. (2005) showed that participants will not necessarily show sensitivity to causal structure even when they are aware of, and understand, this structure. In Experiment 2 López et al. (2005) showed that, given explicit prompting to use information about causal structure when making judgments, participants were able to do so. Here we follow Shanks (2007) in arguing that these data support a “dual-model” approach to learning. The implication is that, given sufficient motivation, people can reason about causality in a non-associative way that distinguishes between predictive and diagnostic causal structures. This should come as no surprise whatsoever—the fact that we can write about, and you can understand, the distinction between these causal structures makes it clear that people are able to comprehend and make use of the difference between them. However, it also seems that, without prompting, people’s default approach is simply to learn about relationships between cues and outcomes in a manner that is insensitive to causal order. This default pattern operates in exactly the way anticipated by an associative account.

A caveat of sorts is required here, relating to differences in the ease of mapping between temporal orders and causal orders. Consider López et al.’s (2005) study, in which participants were required to use information regarding which lights were illuminated on the front of the box (cues) to predict which lights would be illuminated on the rear of the box (outcomes). In the predictive learning condition, participants were told that illumination of lights on the front (causes) caused illumination of lights on the rear (effects). Hence for these participants the temporal order of events (in which cues precede outcomes) was aligned with their causal order (in which causes precede effects), making it easy to map from temporal order to causal order by mapping cues to causes, and outcomes to effects. In contrast, in the diagnostic learning condition, participants were told that illumination of lights on the front (effects) was caused by illumination of lights on the rear (causes). Hence in this condition the temporal order of events opposed their causal order. So diagnostic causal learning under these circumstances would require separating the temporal order of learning events from their causal order: it requires mapping cues to effects, and outcomes to causes. This separation of learning events and mental representations may be cognitively demanding. Perhaps participants might simply be unwilling to engage in this effortful process, and fall back on the simpler, predictive mapping (cues = causes, outcomes = effects), even though it is at odds with the actual causal structure. On this account, symmetry between diagnostic and predictive learning conditions—as observed in López et al.’s Experiment 1A—does not necessarily reflect non-causal reasoning in the diagnostic condition. People may instead be reasoning causally, but incorrectly, in this condition.

At this point the problem of unfalsifiability looms once more. If we find that a participant’s learning is at odds with causal structure, we could presumably always save a single-process, causal-only account by arguing that the participant is reasoning causally, but is doing so based on an incorrect causal structure (even if, as in López et al.’s study, the participant is fully able to report the correct causal structure when asked). In the following, we shall consider a case that stretches the plausibility of this “incorrect causal reasoning” account even further, but more generally we leave it up to the reader to decide whether this type of essentially unfalsifiable argument is satisfactory.

Revisiting Temporal Contiguity

We noted earlier that a critical factor in the learning of cue→outcome associations (whether they are causal or non-causal) is the degree of temporal contiguity between cue and outcome. Typically, a shorter delay between cue and outcome produces stronger learning, and (in the case of a causal relationship) stronger judgments of causality (see Buehner, Chapter 28 in this volume). The word “typically” is important here, because in fact there are cases in which this law of temporal contiguity is broken. As in the case of causal order, these exceptions arise as a result of people taking account of the causal context of the judgment they are being asked to make. And once again, this influence of the specific nature of the events involved in learning on the pattern of what is learned runs counter to the general-purpose approach offered by associative accounts.

Consider, for example, a study by Buehner and May (2004; see also Buehner & May, 2002, 2003), in which participants were asked to judge whether pressing a light switch made a bulb illuminate. For all participants, there was a 75% chance of the bulb illuminating if the switch had been pressed. In the zero-delay condition, this illumination would occur immediately; in the long-delay condition, the bulb would illuminate 4 seconds after the button was pressed. The bulb never illuminated if the switch was not pressed.

One group of participants was told that the bulb was an ordinary light bulb that should light up right away. These participants showed a standard temporal contiguity effect: after a training period during which they could press (or not press) the switch and observe the consequences, judgments of the strength of the causal relationship between switch and light were lower in the long-delay condition than in the zero-delay condition. In contrast, a second group of participants was told that the bulb was an energy-saving model that took 4 seconds to light up. These participants did not show a detrimental influence of delay on causal beliefs—ratings of causality were equally strong in the zero-delay and long-delay conditions. These results clearly demonstrate that a change in the causal model that underlies the task can change the influence of temporal contiguity on the beliefs that are developed. If the task was framed with a causal structure in which delay was expected, then causal judgments no longer suffered as a result of the experience of a delay. The implication is that participants’ understanding of the nature of events that they are experiencing influences the beliefs that they develop as a result of that experience. This finding is very difficult to reconcile with a purely associative account in which learning is about the relationship between cues and outcomes and is essentially blind to beliefs about the causal nature of the events that constitute those cues and outcomes.

It is worth reflecting on the finding that participants who expected a delay in Buehner and May’s (2004) experiment were not sensitive to temporal contiguity (see also Buehner & May, 2002, 2003): that is, their judgments were equally strong in the zero-delay and long-delay conditions. Surely, if participants were making proper use of the causal model inherent in the task description, judgments should actually have been lower in the zero-delay condition. If they expected the bulb to illuminate 4 seconds after pressing the switch, then an immediate illumination should have been perceived as an “uncaused effect,” and should have weakened the perception of causality. The fact that this did not happen might be taken to suggest that close temporal contiguity is sufficient to support the development of a causal belief even in the face of contrary expectation, and this suggestion is clearly easier to reconcile with an associative approach. That said, if the causal structure of the task is made more available, in order to strengthen participants’ expectation of a delay, then an advantage for training under long-delay conditions relative to short delay is observed (Buehner & McGregor, 2006). This is somewhat similar to the manipulations used to strengthen participants’ attention to causal order described in the previous section. As in that case, the unsurprising conclusion is that, given sufficient motivation to consider the causal structure of a task when making judgments, participants are able to do so.

A study by Schlottmann (1999) is particularly interesting in this regard. Children aged between 5 and 10 years were presented with a “mystery box.” A ball dropped into one of two holes in one end of the box would cause a bell to ring at the other end, either immediately or after a short delay (3 seconds), depending on the hidden mechanism inserted into the box. Initially, children received extensive experience with the two mechanisms outside the box, and were guided by the experimenter through a series of exercises designed to help them understand how one mechanism (a seesaw) would cause the bell to ring quickly when a ball was dropped, and the other (a runway) caused it to ring after a pause. This culminated in a prediction test in which, on each trial, one of the mechanisms was inserted into the box out of sight of the children. A ball was then dropped in, and children observed whether the bell rang immediately or after a delay; they were then asked to predict which of the mechanisms (seesaw or runway) was inside the box. Performance on this prediction test was near-perfect, regardless of the children’s age, demonstrating that children understood the difference in timing produced by the two mechanisms, and could make inferences on the basis of this understanding.

In the final, critical test, children saw the experimenter place one of the two mechanisms in the box, but could not see which of the two holes the mechanism was placed under. A picture of the mechanism was placed in view on the outside of the box as a reminder. Next, the experimenter dropped one ball into one of the holes, paused, and then dropped a second ball into the second hole; the bell rang immediately after the second ball was dropped. Children were asked which of the two balls had made the bell ring. The correct answer to this question depends on the mechanism hidden inside the box. If the fast mechanism was present, then the correct answer would be that the second (contiguous) ball had made the bell ring. If the slow mechanism was present, then the correct answer would be the first (delayed) ball. Ten-year-olds were clearly able to make this distinction. However, younger children (5–7-year-olds) could not. In particular, when the slow mechanism was in the box, a majority of these younger children still claimed that the second (contiguous) ball had caused the bell to ring, even though these same children had demonstrated understanding that the slow mechanism produced a delay in the earlier prediction test.

These results suggest that while older children are able to incorporate their causal understanding into their judgments, younger children instead continued to be led by contiguity, even when this was at odds with the causal structure of the task. One interpretation of this difference is that young children’s judgments are more likely to reflect the product of relatively automatic, associative processes that are insensitive to causal knowledge, whereas older children have developed the cognitive flexibility and executive function necessary to override these associative influences by the appropriate deployment of causal knowledge (given sufficient motivation to do so, and a sufficiently clear causal mechanism). This suggestion is certainly not without precedent in the developmental literature (e.g., see Kendler & Kendler, 1959, 1962, 1970; Kuhn & Pease, 2006).

At the end of the previous section, we considered the possibility that cases of insensitivity to true causal structure, rather than supporting a dual-process account in which learning is sometimes the product of non-causal, associative processes, may instead reflect causal reasoning based on an incorrect causal structure. While such an argument could in theory be extended to explaining the insensitivity to the true causal mechanism demonstrated by Schlottmann’s (1999) younger children, this stretches the bounds of plausibility. These children were able to understand the true causal structure, as demonstrated by their performance in the prediction test. And application of a causal structure involving a three-second delay between cause and effect does not seem like it should be significantly more cognitively demanding than a structure with no delay. Hence we would argue that insensitivity to causal structure under these circumstances provides good evidence that learning need not always be mediated by causal reasoning.


Selective Conditioning in Non-Human Animals

The discussion in the previous sections is based around the premise that associative models involve learning about relationships between cues and outcomes, but are blind to the nature of the events that constitute those cues and outcomes. In fact, the argument needs to be more nuanced than this. This is because there are well-established demonstrations in the literature on animal conditioning showing that learning can be influenced by the nature of the stimuli involved.

Perhaps the clearest demonstration of this comes from classic studies of selective conditioning (also known as preparedness) in rats (e.g., Domjan & Wilson, 1972; Garcia & Koelling, 1966). For example, Domjan and Wilson trained two groups of rats. For one group, an infusion of saccharin-flavored water into the mouth was used as the CS; for the other group, the sounding of a buzzer was the CS. For both groups, the US that was delivered at the termination of the CS was injection with lithium chloride in order to induce nausea. After three conditioning trials, rats were given two preference tests. Each test featured two drinking tubes. In the saccharin preference test, one tube was filled with saccharin and the other with water. In the buzzer preference test, both tubes were filled with water, but a buzzer was activated whenever the rat licked at one of the two tubes. Results indicated that rats had learned to avoid the saccharin flavor, but not the buzzer: on saccharin preference tests, rats drank more from the water tube than the saccharin tube, but on buzzer preference tests, rats drank from the buzzer tube and non-buzzer tube equally. The implication is that rats had learned to associate the flavor, but not the sound, with illness. Crucially, this pattern was not simply a result of the flavor being a more intense or noticeable CS. Two further groups of rats were conditioned using the same CSs, but the US was now an electric shock. In the final preference test, these rats showed avoidance of the buzzer tube, but not of the saccharin tube—the exact opposite of rats trained with the nausea US—suggesting that they had learned to associate the sound, but not the flavor, with shock.

This double dissociation clearly shows that the nature of the stimuli can be an important determinant of conditioning in animals. Rats learned to associate an interoceptive CS (flavor) with an interoceptive US (nausea), and an exteroceptive stimulus (sound) with an exteroceptive US (shock), but did not learn to associate an interoceptive CS with an exteroceptive US, or vice versa. So what, if any, is the fundamental difference between this example and the demonstrations of sensitivity to causal structure in humans described in previous sections? One option would be to argue that there is no difference: that examples of selective conditioning demonstrate that even the simplest examples of conditioning are a consequence of animals reasoning based on an internal causal model of the world (in which internal cues tend to produce internal outcomes, and external cues tend to produce external outcomes). If this is the correct interpretation, then the case for associative accounts of learning seems bleak.


However, this interpretation is not one to which we subscribe. For one thing, the suggestion that even simple examples of Pavlovian conditioning reflect reasoning about causal structure has difficulty explaining the persistence of conditioned responding in the face of an omission contingency (see earlier). Instead, it seems more likely that selective conditioning reflects innate predispositions that have been shaped by evolutionary processes (Rozin & Kalat, 1971). For example, in the evolutionary environment of a rat, tastes are likely to provide more reliable information than sounds regarding whether a food is poisonous. As such, animals that are predisposed to learn about such relationships will be more likely to survive and pass on this characteristic.

Strong evidence in favor of this evolutionary account of selective conditioning comes from studies suggesting that the effect is present from birth, and certainly before rats have had useful prior experience with the sorts of stimuli that are involved. Gemberling and Domjan (1982) conditioned rats when they were only 24 hours old with either lithium-induced nausea or shock. These newborn rats showed clear evidence of selective conditioning. When the US was nausea, the saccharin flavor constituted an effective CS, but an exteroceptive texture cue (either a rough cloth or the smooth interior of a cardboard box) did not. In contrast, when the US was shock, rats showed stronger evidence of conditioning to the texture than to the flavor.

In summary, while selective conditioning in rats demonstrates a sensitivity to the nature of the events that are involved in conditioning, it seems unlikely that this is a consequence of the animals engaging in "online" inferences based on an internalized causal model of the world. Instead, this sensitivity seems to reflect innate predispositions toward associating certain classes of stimuli, but not others. Hence a general-purpose associative account is sufficient to explain these findings if it is augmented with the idea that the associative process operates through a filter of innate mechanisms that will tend to favor formation of some associations over others. This is quite different from the cases of sensitivity to causal structure in humans described earlier. Many of those experiments used scenarios for which it is implausible to suggest that we have developed innate evolutionary tendencies (e.g., balls rolling along runways and ringing bells, switches turning on light bulbs, grenade-launchers firing at tanks, etc.). Instead, it seems clear that sensitivity to causal structure in such studies results from a process of causal reasoning based on an internalized model of the world, with this reasoning taking place during the task itself. And it seems equally clear that such a process lies beyond any purely associative approach to learning.

Conclusions

In this chapter, we have argued that in attempting to assess the validity of an associative account of causal learning, we need to move away from considering specific implementations of associative models and instead focus on the general principle embodied by the associative approach. This principle is that learning is concerned with the relationship between cues and outcomes; that the rules governing learning are general purpose, and

hence do not differentiate between situations involving cause–effect relationships and those involving signaling relationships that are non-causal. An association merely encodes the information that two events "go together," not that one causes the other. Ultimately, the associationist framework argues that causal learning can be based on a process that is fundamentally non-causal. In other words, people may encode an association between event E1 and event E2 that has no notion of causality about it. However, they may then interpret this association as supporting the existence of a causal relationship between events. Suppose that I have formed a strong association between E1 and E2, but a weak association between E3 and E2. I am then asked which of E1 or E3 is more likely to be a cause of E2. Under these circumstances it would seem a sensible heuristic to choose the event that has the stronger association with E2 (E1 in this case), even though the associations do not actually encode causal information.

This idea that associations have heuristic value with regard to causality is an important one. Events that are causally related will typically tend to go together in the world, and hence will tend to become associated. As such, the existence of an association can reasonably be taken as an indication of a causal relationship. Of course, this inductive leap from observing correlation to inferring causation is logically invalid: as psychology undergraduates are repeatedly told, "Correlation does not imply causation." But the fact that we need to repeat this mantra so often to our students is also an indication of how seductive it is to equate correlation and causation. Indeed, humans seem to be very susceptible to such correlational learning: superstitions (e.g., walking under a ladder will cause bad luck; carrying a rabbit's foot will bring good luck) occur in societies around the world, and can be seen as a failure to distinguish between causes and mere associates.

The implication is that, often, people are happy to take associations between events as a proxy for causal relations. Why should this be the case? The most likely reason, as is generally the case for heuristics, is one of effort. It is much easier to encode the covariation between two events than to impute a causal relationship between them, because the latter requires making interventions and observing their consequences (cf. Rottman, Chapter 6 in this volume; Waldmann & Hagmayer, 2005), and this may not always be easy or even possible. For example, it is simple to observe that the phase of the moon covaries with the size of ocean tides, but proving a causal relationship by intervening to change the moon's phase and observing the effect on the tide is impossible. The effort argument becomes even more apparent as additional potential causes are introduced. It is simple to apply the Rescorla–Wagner model to any number of cues, presented individually or in combination, in order to track their covariation with any number of outcomes, again presented individually or in combination. However, deriving an accurate causal model of the relationships between multiple stimuli and outcomes on the basis of experience is a considerably more challenging computational problem (Chickering, Heckerman, & Meek, 2004; Ellis & Wong, 2008).
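To make the effort asymmetry concrete, the covariation-tracking side can be sketched in a few lines. The following is a minimal illustration of the Rescorla–Wagner learning rule, not an implementation used by any study discussed here; the learning-rate value, trial format, and cue labels are illustrative assumptions.

```python
# Minimal sketch of the Rescorla-Wagner update rule (Rescorla & Wagner, 1972).
# V maps each cue to its associative strength; alpha is a learning rate and
# lam is the maximum strength the outcome can support (illustrative values).

def rescorla_wagner(trials, cues, alpha=0.3, lam=1.0):
    V = {cue: 0.0 for cue in cues}
    for present, outcome in trials:
        # Prediction error: outcome actually received minus the summed
        # strength of every cue present on this trial.
        error = lam * outcome - sum(V[c] for c in present)
        for c in present:
            V[c] += alpha * error  # all present cues share the same error
    return V

# Cue A is consistently paired with the outcome; cue B is not.
trials = [({"A"}, 1), ({"A", "B"}, 1), ({"B"}, 0), ({"A"}, 1)]
print(rescorla_wagner(trials, cues=["A", "B"]))
```

Tracking covariation this way costs a single pass through the data for any number of cues, whereas recovering a causal network over the same cues is NP-hard in the general case (Chickering et al., 2004), which is exactly the contrast drawn in the text.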


Of course, this is not to suggest for a moment that people are unable to reason in a more careful fashion about causal relationships. We have provided examples in this chapter in which people have been shown to be sensitive to causal structures when making judgments in a way that does not follow from an associative approach. But really, these examples are unnecessary—it is immediately obvious that people can describe and understand causal mechanisms, and so it would be lunacy to suggest that an associative account could ever provide a full explanation of human causal behavior. However, as Shanks (2007) noted, "To focus (as many researchers do) on the fact that there is a pattern of judgements … that is inconsistent with associative theory is to miss the point: To repeat, it is not a matter of debate that people can reason normatively or logically under certain circumstances" (p. 300). Of more interest (at least for the purposes of the current chapter) is the finding that judgments often do not reflect sensitivity to causal structure, even when participants can evidently understand the causal structure involved in the task. Such a pattern has typically been reported in situations in which participants are not given strong prompting to make use of causal structure information when making their judgments. These findings point to a role for associative processes in explaining some aspects of causal behavior, and suggest more generally that, in some cases, causal judgments may be made based on the output of a process that is blind to the notion of causality.

References

Aitken, M. R. F., Larkin, M. J. W., & Dickinson, A. (2000). Super-learning of causal judgements. Quarterly Journal of Experimental Psychology, 53B, 59–81.

Allan, L. G., & Jenkins, H. M. (1983). The effect of representations of binary variables on judgment of influence. Learning and Motivation, 14, 381–405.

Booth, S. L., & Buehner, M. J. (2007). Asymmetries in cue competition in forward and backward blocking designs: Further evidence for causal model theory. Quarterly Journal of Experimental Psychology, 60, 387–399.

Buehner, M. J., & May, J. (2002). Knowledge mediates the timeframe of covariation assessment in human causal induction. Thinking & Reasoning, 8, 269–295.

Buehner, M. J., & May, J. (2003). Rethinking temporal contiguity and the judgement of causality: Effects of prior knowledge, experience, and reinforcement procedure. Quarterly Journal of Experimental Psychology, 56A, 865–890.

Buehner, M. J., & May, J. (2004). Abolishing the effect of reinforcement delay on human causal learning. Quarterly Journal of Experimental Psychology, 57B, 179–191.

Buehner, M. J., & McGregor, S. (2006). Temporal delays can facilitate causal attribution: Towards a general timeframe bias in causal induction. Thinking & Reasoning, 12, 353–378.


Chatlosh, D. L., Neunaber, D. J., & Wasserman, E. A. (1985). Response-outcome contingency: Behavioral and judgmental effects of appetitive and aversive outcomes with college students. Learning and Motivation, 16, 1–34.

Cheng, P. W., & Novick, L. R. (1990). A probabilistic contrast model of causal induction. Journal of Personality and Social Psychology, 58, 545–567.

Chickering, D. M., Heckerman, D., & Meek, C. (2004). Large-sample learning of Bayesian networks is NP-hard. Journal of Machine Learning Research, 5, 1287–1330.

Cobos, P. L., López, F. J., Caño, A., Almaraz, J., & Shanks, D. R. (2002). Mechanisms of predictive and diagnostic causal induction. Journal of Experimental Psychology: Animal Behavior Processes, 28, 331–346.

Dickinson, A. (2001). Causal learning: An associative analysis. Quarterly Journal of Experimental Psychology, 54B, 3–25.

Dickinson, A., & Burke, J. (1996). Within-compound associations mediate the retrospective revaluation of causality judgements. Quarterly Journal of Experimental Psychology, 49B, 60–80.

Dickinson, A., Shanks, D., & Evenden, J. (1984). Judgment of act-outcome contingency: The role of selective attribution. Quarterly Journal of Experimental Psychology, 36A, 29–50.

Domjan, M., & Wilson, N. E. (1972). Specificity of cue to consequence in aversion learning in the rat. Psychonomic Science, 26, 143–145.

Durlach, P. J. (1983). Effect of signaling intertrial unconditional stimuli in autoshaping. Journal of Experimental Psychology: Animal Behavior Processes, 9, 374–389.

Ellis, B., & Wong, W. H. (2008). Learning causal Bayesian network structures from experimental data. Journal of the American Statistical Association, 103, 778–789.

Garcia, J., & Koelling, R. A. (1966). Relation of cue to consequence in avoidance learning. Psychonomic Science, 5, 121–122.

Gemberling, G. A., & Domjan, M. (1982). Selective associations in one-day-old rats: Taste-toxicosis and texture-shock aversion learning. Journal of Comparative and Physiological Psychology, 96, 105–113.

Gibbon, J., Baldock, M. D., Locurto, C., Gold, L., & Terrace, H. S. (1977). Trial and intertrial durations in autoshaping. Journal of Experimental Psychology: Animal Behavior Processes, 3, 264–284.

Griffiths, O., & Le Pelley, M. E. (2009). Attentional changes in blocking are not a consequence of lateral inhibition. Learning & Behavior, 37, 27–41.


Griffiths, O., & Mitchell, C. J. (2008). Selective attention in human associative learning and recognition memory. Journal of Experimental Psychology: General, 137, 626–648.

Hawkins, R. D., Carew, T. J., & Kandel, E. R. (1986). Effects of interstimulus interval and contingency on classical conditioning of the Aplysia siphon withdrawal reflex. Journal of Neuroscience, 6, 1695–1701.

Hume, D. (1740/1978). A treatise of human nature. Oxford: Oxford University Press.

Kamin, L. J. (1968). Attention-like processes in classical conditioning. In M. R. Jones (Ed.), Miami symposium on the prediction of behavior: Aversive stimulation (pp. 9–32). Coral Gables, FL: University of Miami Press.

Kendler, H. H., & Kendler, T. S. (1962). Vertical and horizontal processes in problem solving. Psychological Review, 69, 1–16.

Kendler, T. S., & Kendler, H. H. (1959). Reversal and nonreversal shifts in kindergarten children. Journal of Experimental Psychology, 58, 56–60.

Kendler, T. S., & Kendler, H. H. (1970). An ontogeny of optional shift behavior. Child Development, 41, 1–27.

Kruschke, J. K. (1992). ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99, 22–44.

Kruschke, J. K. (2003). Attention in learning. Current Directions in Psychological Science, 12, 171–175.

Kuhn, D., & Pease, M. (2006). Do children and adults learn differently? Journal of Cognition and Development, 7, 279–293.

Le Pelley, M. E. (2004). The role of associative history in models of associative learning: A selective review and a hybrid model. Quarterly Journal of Experimental Psychology, 57B, 193–243.

Le Pelley, M. E. (2010). Attention and human associative learning. In C. J. Mitchell & M. E. Le Pelley (Eds.), Attention and associative learning: From brain to behaviour (pp. 187–215). Oxford: Oxford University Press.

Le Pelley, M. E., Beesley, T., & Griffiths, O. (2014). Relative salience versus relative validity: Cue salience influences blocking in human associative learning. Journal of Experimental Psychology: Animal Behavior Processes, 40, 116–132.

Le Pelley, M. E., Beesley, T., & Suret, M. B. (2007). Blocking of human causal learning involves learned changes in stimulus processing. Quarterly Journal of Experimental Psychology, 60, 1468–1476.

Le Pelley, M. E., & McLaren, I. P. L. (2001). Retrospective revaluation in humans: Learning or memory? Quarterly Journal of Experimental Psychology, 54B, 311–352.

Le Pelley, M. E., Pearson, D., Griffiths, O., & Beesley, T. (2015). When goals conflict with values: Counterproductive attentional and oculomotor capture by reward-related stimuli. Journal of Experimental Psychology: General, 144, 158–171.

López, F. J., Cobos, P. L., & Caño, A. (2005). Associative and causal reasoning accounts of causal induction: Symmetries and asymmetries in predictive and diagnostic inferences. Memory & Cognition, 33, 1388–1398.

Mackintosh, N. J. (1975). A theory of attention: Variations in the associability of stimuli with reinforcement. Psychological Review, 82, 276–298.

Matute, H., Arcediano, F., & Miller, R. R. (1996). Test question modulates cue competition between causes and between effects. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 182–196.

McNally, G. P., & Cole, S. (2006). Opioid receptors in the midbrain periaqueductal gray regulate prediction errors during Pavlovian fear conditioning. Behavioral Neuroscience, 120, 313–323.

Medin, D. L., & Schaffer, M. M. (1978). Context theory of classification learning. Psychological Review, 85, 207–238.

Miller, R. R., Barnet, R. C., & Grahame, N. J. (1995). Assessment of the Rescorla-Wagner model. Psychological Bulletin, 117, 363–386.

Mitchell, C. J., De Houwer, J., & Lovibond, P. F. (2009). The propositional nature of human associative learning. Behavioral and Brain Sciences, 32, 183–246.

Ost, J. W. P., & Lauer, D. W. (1965). Some investigations of salivary conditioning in the dog. In W. F. Prokasy (Ed.), Classical conditioning (pp. 192–207). New York: Appleton-Century-Crofts.

Pavlov, I. P. (1927). Conditioned reflexes. London: Oxford University Press.

Pearce, J. M., & Hall, G. (1980). A model for Pavlovian conditioning: Variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychological Review, 87, 532–552.

Pearson, D., Donkin, C., Tran, S. C., Most, S. B., & Le Pelley, M. E. (2015). Cognitive control and counterproductive oculomotor capture by reward-related stimuli. Visual Cognition, 23, 41–66.

Peterson, C. R., & Beach, L. R. (1967). Man as an intuitive statistician. Psychological Bulletin, 68, 29–46.

Rescorla, R. A. (1967). Pavlovian conditioning and its proper control procedures. Psychological Review, 74, 71–80.


Rescorla, R. A. (1968). Probability of shock in the presence and absence of CS in fear conditioning. Journal of Comparative and Physiological Psychology, 66, 1–5.

Rescorla, R. A. (1972). Informational variables in Pavlovian conditioning. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 6, pp. 1–46). New York: Academic Press.

Rescorla, R. A. (1984). Signaling intertrial shocks attenuates their negative effect on conditioned suppression. Bulletin of the Psychonomic Society, 22, 225–228.

Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and non-reinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64–99). New York: Appleton-Century-Crofts.

Rozin, P., & Kalat, J. W. (1971). Specific hungers and poison avoidance as adaptive specializations of learning. Psychological Review, 78, 459–486.

Schlottmann, A. (1999). Seeing it happen and knowing how it works: How children understand the relation between perceptual causality and underlying mechanism. Developmental Psychology, 35, 303–317.

Schneiderman, N., & Gormezano, I. (1964). Conditioning of the nictitating membrane of the rabbit as a function of CS-US interval. Journal of Comparative and Physiological Psychology, 57, 188–195.

Shanks, D. R. (1985). Forward and backward blocking in human contingency judgement. Quarterly Journal of Experimental Psychology, 37B, 1–21.

Shanks, D. R. (1986). Selective attribution and the judgment of causality. Learning and Motivation, 17, 311–334.

Shanks, D. R. (1989). Selectional processes in causality judgment. Memory & Cognition, 17, 27–34.

Shanks, D. R. (1995). The psychology of associative learning. Cambridge, UK: Cambridge University Press.

Shanks, D. R. (2007). Associationism and cognition: Human contingency learning at 25. Quarterly Journal of Experimental Psychology, 60, 291–309.

Shanks, D. R., & Dickinson, A. (1987). Associative accounts of causality judgment. Psychology of Learning and Motivation, 21, 229–261.

Shanks, D. R., & Dickinson, A. (1991). Instrumental judgment and performance under variations in action-outcome contingency and contiguity. Memory & Cognition, 19, 353–360.


Shanks, D. R., & López, F. J. (1996). Causal order does not affect cue selection in human associative learning. Memory & Cognition, 24, 511–522.

Shanks, D. R., López, F. J., Darby, R. J., & Dickinson, A. (1996). Distinguishing associative and probabilistic contrast theories of human contingency judgment. The Psychology of Learning and Motivation, 34, 265–311.

Tangen, J. M., & Allan, L. G. (2004). Cue interaction and judgments of causality: Contributions of causal and associative processes. Memory & Cognition, 32, 107–124.

Tassoni, C. J. (1995). The least mean squares network with information coding: A model of cue learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 193–204.

Thorndike, E. L. (1898). Animal intelligence: An experimental study of the associative processes in animals. Psychological Review, 8, Monograph supplement.

Van Hamme, L. J., Kao, S. F., & Wasserman, E. A. (1993). Judging interevent relations: From cause to effect and from effect to cause. Memory & Cognition, 21, 802–808.

Van Hamme, L. J., & Wasserman, E. A. (1994). Cue competition in causality judgments: The role of nonrepresentation of compound stimulus elements. Learning and Motivation, 25, 127–151.

Wagner, A. R. (1981). SOP: A model of automatic memory processing in animal behaviour. In N. E. Spear & R. R. Miller (Eds.), Information processing in animals: Memory mechanisms (pp. 5–47). Hillsdale, NJ: Lawrence Erlbaum Associates.

Waldmann, M. R. (2000). Competition among causes but not effects in predictive and diagnostic learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 53–76.

Waldmann, M. R. (2001). Predictive versus diagnostic causal learning: Evidence from an overshadowing paradigm. Psychonomic Bulletin & Review, 8, 600–608.

Waldmann, M. R., & Hagmayer, Y. (2005). Seeing versus doing: Two modes of accessing causal knowledge. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 216–227.

Waldmann, M. R., & Holyoak, K. J. (1992). Predictive and diagnostic learning within causal models: Asymmetries in cue competition. Journal of Experimental Psychology: General, 121, 222–236.

Waldmann, M. R., & Holyoak, K. J. (1997). Determining whether causal order affects cue selection in human contingency learning: Comment. Memory & Cognition, 25, 125–134.


Wasserman, E. A., & Berglan, L. R. (1998). Backward blocking and recovery from overshadowing in human causal judgement: The role of within-compound associations. Quarterly Journal of Experimental Psychology, 51B, 121–138.

Wasserman, E. A., Chatlosh, D. L., & Neunaber, D. J. (1983). Perception of causal relations in humans: Factors affecting judgments of response-outcome contingencies under free-operant procedures. Learning and Motivation, 14, 406–432.

Notes:

(1.) A possible exception is superstitious beliefs. For example, I might (by coincidence) experience several occasions on which I wear a particular pair of socks and my favorite football team wins their match, and as a result I might develop the instrumental belief that wearing those socks causes the team to win. But of course there is no real causal relationship between these events. Nevertheless, this form of superstitious instrumental learning is based on the belief in a causal relationship.

(2.) There is a subtlety here that is worth noting. As mentioned, one view that has been proposed is that associative learning is mediated by the formation of propositions that describe the relationship between events (Mitchell et al., 2009). Unlike the other approaches to associative learning outlined here, which are restricted to representing covariation, the propositional account has the potential to represent either covariation information (i.e., the proposition that "event A is associated with event B") or causal information (the proposition that "event A causes event B"). So learning based on propositions could be either causal or non-causal (associative, in the current terminology) in different situations.

Mike E. Le Pelley, School of Psychology, UNSW Australia, Sydney, Australia

Oren Griffiths, School of Psychology, UNSW Australia, Sydney, Australia

Tom Beesley, School of Psychology, UNSW Australia, Sydney, Australia



Rules of Causal Judgment: Mapping Statistical Information onto Causal Beliefs
José C. Perales, Andrés Catena, Antonio Cándido, and Antonio Maldonado
The Oxford Handbook of Causal Reasoning
Edited by Michael R. Waldmann
Print Publication Date: Jun 2017 Subject: Psychology, Cognitive Psychology Online Publication Date: May 2017
DOI: 10.1093/oxfordhb/9780199399550.013.6

Abstract and Keywords

Our environment is rich in statistical information. Frequencies and proportions—or their visual depictions—are pervasive in the media, and are frequently used to support or weaken causal statements, or to bias people's beliefs in a given direction. The topic of this chapter is how people integrate naturally available frequencies and probabilities into judgments of the strength of the link between a candidate cause and an effect. We review studies investigating various rules that have been claimed to underlie intuitive causal judgments. Given that none of these rules has been established as a clear winner, we conclude by presenting a tentative framework describing the general psychological processes operating when people select, weigh, and integrate pieces of causally relevant evidence with the goal of meeting real-life demands.

Keywords: contingency matrix, rules, causal judgment, causal learning, intuitive beliefs

Aims and Scope

In 1998, the Lancet published an early report (Wakefield et al., 1998, retracted) claiming that 12 children showing signs of regression in mental development had been observed to suffer a chronic intestinal infection "associated in time with possible environmental triggers" (p. 637). Among these potential triggers, the MMR (measles, mumps, and rubella) combined vaccine was the most widely publicized.

Even if Wakefield et al.'s results had been honestly obtained and reported (which was not the case), the assertion that the MMR vaccine causes developmental problems was based on only 12 non-randomly selected children for whom the vaccine administration and a developmental problem had occurred approximately together. Today, concerned parents worldwide continue to base their anti-vaccination claims on cases similar to these, sometimes after incidentally witnessing or being told about just one of them.


There are many other cases in which relevant portions of the general population hold strong causal beliefs in the face of partial or inaccurate statistical information. For example, most people think children of divorced parents display poor psychological adjustment in the long run (Arkowitz & Lilienfeld, 2013), or incorrectly connect mental disorders with violent behavior (Arkowitz & Lilienfeld, 2011). In both cases, judgmental biases seem to have to do with how events are presented in the media, and how information on individual cases accumulates over time in the view of the public.

These stories illustrate the far-reaching consequences of the way we interpret information on the correlation between potential causes and effects. Humans do use frequencies of combinations of events and estimated probabilities to arrive at causal beliefs, or to support or update the ones we already have. Information of this sort is pervasive. Still, the available literature shows that, when judging the causal relation between a candidate cause and an effect, laypeople can both provide surprisingly accurate responses (Kushnir & Gopnik, 2007) and fall victim to systematic and potentially serious biases (Kahneman & Klein, 2009).

Table 3.1 The Contingency Table

                   Effect Present    Effect Absent
Cause present      Type a cases      Type b cases
Cause absent       Type c cases      Type d cases

In the simplest case, when there are only a candidate cause and a potential effect to be assessed, and both the cause and the effect can be either present or absent, the basic statistical information regarding their association can be displayed in the contingency table (see Table 3.1). Applied to one of the preceding examples, children of divorced parents (cause present, C) who later display psychological problems (effect present, E) would qualify as type a cases. Children of divorced parents who do not present such problems (¬E) would be type b cases. Type c and d cases would be the children of undivorced parents (¬C) who do and do not present psychological problems, respectively. For a generative causal connection between divorce and children's psychological problems to exist, the probability of psychological problems in children of divorced parents, P(E|C), should be larger than the probability of problems among children of undivorced parents, P(E|¬C). The difference between these two probabilities, or contingency (ΔP), can be estimated from the available data as follows:

ΔP = P(E|C) − P(E|¬C) = a/(a + b) − c/(c + d)    (1)


where a, b, c, and d are the frequencies of the four case types in the contingency table. In many situations of daily life, case-type frequencies, or the conditional probabilities P(E|C) and P(E|¬C) that can be estimated from them, are naturally available to reasoners. In some of those situations, the four cells of the contingency table are available and representative; in others, they are missing or unrepresentative (which does not seem to deter people from forming opinions).
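As a concrete illustration, Equation 1 can be computed directly from the four cell frequencies. The sketch below is a minimal, hypothetical example; the function name and the sample frequencies are ours, not taken from any study discussed here.

```python
# Estimate delta-P (Equation 1) from the cell frequencies of Table 3.1.

def delta_p(a, b, c, d):
    p_e_given_c = a / (a + b)      # P(E|C): effect rate when the cause is present
    p_e_given_not_c = c / (c + d)  # P(E|not-C): effect rate when the cause is absent
    return p_e_given_c - p_e_given_not_c

# Example: 30 of 40 exposed cases show the effect, versus 10 of 40 unexposed cases.
print(delta_p(a=30, b=10, c=10, d=30))  # 0.75 - 0.25 = 0.50, a generative contingency
```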
Indeed, the scope of this chapter is restricted to how people integrate naturally available statistical information (i.e., frequencies and proportions/estimated probabilities) into judgments of the strength of the causal link between a candidate cause and an effect. This is mostly a mapping problem, so we will review the studies carried out to find out what is mapped onto what, and the rules people plausibly use to do so. These rules describe only a part of our causal reasoning abilities but, as illustrated earlier, certainly a relevant one.

The importance of this endeavor is evident in the amount of energy that has been devoted in the field of causal reasoning to supporting or weakening the position that judgments of causal strength depend on the operation of rules (see Hattori & Oaksford, 2007; Perales & Catena, 2006, for reviews). However, the contenders have not always invested the same energy in clarifying what exactly they were supporting or contesting. Thus, let us start with the fundamentals.
What Is and What Is Not a Rule?

Scandura (1972) defined rule-governed behavior as a mathematical function in which each class of overt stimuli is paired with a unique class of overt responses. In cognitive terms, a rule is the mental algorithm determining that function. Generally speaking, then, rules are suitable for decisions, predictions, probability estimates, utility estimates, forecasts, and, not least, causal judgments. In our specific realm, we will use the term causal judgment rule for any algorithm by means of which the cell frequencies in the contingency table, or the conditional probabilities derivable from them (regardless of whether the original frequencies are available or not), are selected and translated into a causal judgment. Importantly, and in order to clearly differentiate between rules and other judgment models, (1) we restrict the term rule to an algorithm that is applied to the available information, either present in the environment or retrievable from memory, when the judgment is required; and (2) we restrict the term rule to algorithmic models, that is, to models proposed to be descriptive of the actual mental operations that people perform on such information. Nevertheless, this definition is still overinclusive. So, before getting into the details of how well different rules fit the available evidence, it is important to clarify what a rule, at least in the context of the current discussion, is not.

First, a causal judgment rule is a response-production mechanism, not a learning or memory-accrual mechanism. A rule can be progressively tuned, and its output can be saved and used later as input for a subsequent application of the same rule (i.e., anchoring-and-adjustment; see Catena et al., 1998; Hogarth & Einhorn, 1992). Moreover,

when several qualitatively different rules are available to be applied to the same input information, the selection of the best one can obey the laws of reinforcement (Rieskamp & Otto, 2006). This distinction between rules and learning mechanisms is important for discriminating between rule-based and other types of models (e.g., associative models; Le Pelley, Griffiths, & Beesley, Chapter 2 in this volume), but also for integrating them in a single framework (e.g., Pineño & Miller, 2007; Sedlmeier, 2002).

Second, a rule is not necessarily a linear combination equation. The advantage of linear rules of judgment is how easily they can be tested by using simple model-fitting techniques; however, there are cases in which linear rules are far from being the best models, and many cases exist in which humans seem able to learn to make judgments and decisions according to non-linear rules (Ashby & Maddox, 1992).

Third, rules are not necessarily complex or based on a large number of inputs. An entire research tradition is based on the finding that adaptive and robust decision rules-of-thumb (i.e., heuristics) can be astonishingly simple (Gigerenzer, Todd, & the ABC Group, 1999). People can make causal judgments based on a tiny part of the relevant statistical information provided by the environment on the relationship between the candidate cause and its potential effect (Goedert, Ellefson, & Rehder, 2014; White, 2000, 2014).

Fourth, and in relation to the previous argument, rules are not necessarily effortful (Shaklee & Tucker, 1980). For example, heuristics are part of a system that jumps to conclusions from present evidence (Morewedge & Kahneman, 2010). At other times, humans even seem able to use rules that combine numerous cues with surprising cognitive ease (Fiedler, 1996). This implies that showing that mental workload interferes with certain instances of causal judgment proves those specific causal judgments to depend on analytical rules, but it does not prove that, in general, all instances of causal judgment depend on rules (Karazinov & Boakes, 2007; Marsh & Ahn, 2006).

And fifth, a rule is not necessarily rational. Some rules for causal inference are directly inspired by probability theory or Bayesian network approaches to causality. More specifically, causal power (Cheng, 1997) and causal support (Griffiths & Tenenbaum, 2005) are two prominent examples of rational models of causal judgment (see Holyoak & Cheng, 2011). Interpreted in a hard sense (not necessarily their original one), these models advocate that humans use rational algorithms by means of which causal relationships can be inferred from statistical information. Indeed, these algorithms, implemented in artificial intelligence devices, are apt at recovering the causal structure behind a set of statistical dependencies among variables in a varied range of circumstances (see, for example, Pearl, 2000; Rebane & Pearl, 2013). However, not all models are like these. Many theorists would agree that evolution has pushed the cognitive system toward rational solutions, but only to the degree that such solutions are adaptive in the real world and can realistically operate on the information a human reasoner can handle. In this sense, we will advocate here that the output of the rules people use should approach the output of a rational norm (e.g., Anderson & Sheu, 1995), although the rules themselves are subject to the limits of bounded cognition.
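Read this way, a causal judgment rule is simply a function from the available statistical information to a judgment. The sketch below makes that idea concrete, using the "hits minus false positives" heuristic mentioned later in Table 3.2 as the example rule; the type alias and names are ours.

```python
# A causal judgment rule as a function from cell frequencies to a judgment.
from typing import Callable

Rule = Callable[[int, int, int, int], float]  # (a, b, c, d) -> judgment

# One of the simplest heuristics listed in Table 3.2: hits minus false positives.
hits_minus_false_positives: Rule = lambda a, b, c, d: float(a - c)

print(hits_minus_false_positives(30, 10, 10, 30))  # 20.0
```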


Probability and Frequency Estimates as Inputs for Contingency-Based Causal Judgment Rules

All the rules to be considered here exploit a very basic truth: causes and effects tend to covary (e.g., patients who take antidepressants are more likely to see their mood improve than those who remain unmedicated; accidents are more likely on badly maintained roads than on good ones; and so on). That is, although correlation does not imply causation, the fact that A and B regularly co-occur has a limited set of explanations: chance; B causes A; A and B are caused by a third unknown factor X; and A (directly or indirectly) causes B. So contingency between discrete events is indeed quite an informative clue to causality (Sloman & Lagnado, 2004; White, 2014).

We do not hold a strong position about whether contingency has preponderance over other information sources in causal judgment. Indeed, it has been shown that other clues can easily overrule covariation in determining causal judgments (Ahn, Kalish, Medin, & Gelman, 1995; Buehner & Humphreys, 2010; White, 2012). Nevertheless, the fact that humans use contingency information seems undeniable. The seminal works in the field (Allan, 1980; Allan & Jenkins, 1983; Alloy & Tabachnik, 1984; Chatlosh, Neunaber, & Wasserman, 1985; Jenkins & Ward, 1965; Smedslund, 1963) presented people with contingency tables, or the frequencies corresponding to them (e.g., the number of people at a party who ate shrimp and felt sick afterward, the number who ate shrimp and did not feel sick, the number who felt sick without having eaten shrimp, and the number who neither ate shrimp nor felt sick; namely, the a-, b-, c-, and d-cell frequencies, respectively). These works convincingly demonstrated that judgments are strongly influenced by contingency manipulations, but also that they are not perfectly tuned to contingency, in such a way that conditions with equal contingency elicited different judgments, and some zero-contingency situations elicited non-zero judgments.

Information about contingency between events comes in a number of formats (see Figure 3.1 for examples), both in the laboratory and in natural environments. Experiments aimed at studying the parallelism between causal judgment and conditioning, or those interested in judgments of control, tend to present the information trial by trial, so that case-type frequencies are not evident (see Perales & Shanks, 2007, for a review). Other studies have used contingency tables, including cells for the a-, b-, c-, and d-type frequencies (for discussions of similarities and differences between the trial-by-trial and table presentation formats, see Catena, Maldonado, Megías, & Frese, 2002; Mandel & Lehman, 1998; Mandel & Vartanian, 2009; Wasserman & Shaklee, 1984). A third type has used icons or other items representing the individual cases, either intermixed or arranged in such a way that estimates of the conditional probabilities P(E|C) and P(E|¬C) are made transparent (see, for example, Liljeholm & Cheng, 2009). Frequencies, proportions, probabilities, and other types of graphical depictions are also pervasive in advertising, the press, the Internet, and television.


The fact that formats are sometimes intentionally used to bias the reader/viewer in a predetermined direction proves by itself that presentation format is not a trivial factor. As discussed in more detail later, in some cases humans seem prone to base their judgments on trial-type frequencies, whereas in others they seem more prone to use proportions or probabilities estimated from them (Perales & Shanks, 2008). Similarly, some formats evoke reasoning strategies in which all relevant information is taken into account, whereas others induce people to underweight or neglect part of the information necessary to make an accurate judgment (García-Retamero, Galesic, & Gigerenzer, 2010; Slovic, Monahan, & McGregor, 2000).

Figure 3.1 Different presentation formats to display contingency between a treatment and a side effect. (a) Contingency table with numeric frequencies of the four cells. (b) Contingency table in which numbers have been replaced by individual cases. (c) Intermixed individual cases: treated cases are highlighted with a square, non-treated cases are not highlighted. Smiling (clear) icons represent cases with the side effect, non-smiling (dark) ones cases without the side effect. (d) Clustered individual cases: non-treated cases are displayed together on the left side, treated cases on the right side.


To complicate things even further, the probe question used to evoke causal judgments also varies across tasks. A vast majority of studies have used some variant of the causal strength question, by means of which the reasoner is simply asked to estimate the degree to which A causes B (e.g., "To what extent did fish cause xianethis?"; Mitchell, Lovibond, & Gan, 2005). Given the large number of studies carried out with this question and the trial-by-trial (or frequency table) presentation format, the basic pattern of effects for this specific case is known in great detail (Perales & Shanks, 2007). However, the standard causal-strength probe question has been criticized for not being specific enough, at least with regard to its normative reference (Buehner & Cheng, 1997; Buehner, Cheng, & Clifford, 2003; Griffiths & Tenenbaum, 2005, 2009; Liljeholm & Cheng, 2009).


In any case, altogether, the available evidence strongly suggests that people use trial-type frequencies and conditional probabilities/proportions (or representational analogues) as the main proximal cues for contingency-based causal strength judgments. Or, at least, they do so rather intuitively and systematically when provided with such information.

How Many Rules Are There?

Even assuming that people base their causal judgments on a combination of cell frequency or conditional probability estimates, such a combination can take many different forms. The most extensive list of causal judgment rules reviewed so far is the one presented by Hattori and Oaksford (2007). That list still fairly represents the scope of available theoretical alternatives and will not be fully reproduced here. Indeed, some of the rules in that list have never really been advocated as models of causal judgment, but were inspired by statistics or by normative analyses of causality. In addition, many of these rules were convincingly refuted by Hattori and Oaksford's model-fitting simulations.

Consequently, our basic list (Table 3.2) contains only those rules that have reached some theoretical relevance in the field of causal judgment, fit the available evidence well, and still offer some promise of being representative of the rules people do use when making a causal judgment. The list presents the models in historical order, and categorizes some of the available rules as variants or subcases of a general form. Importantly, the general forms not only subsume the specific rules, but also confer on them their main psychological meaning. As will be discussed in the following, the different variants of each general rule differ in how the common cognitive mechanism is implemented.

The left-hand side of the equations in Table 3.2 represents the output of the rule, according to which a judgment is predicted. The right-hand side stands for the mathematical form of the rule operating upon cell frequencies, or the conditional probabilities derivable from them. In contrast to the original formulations, in all cases we take into account the possibility that the frequencies and probabilities people use to make judgments are not the objective ones presented during the task (a, b, c, d, and the conditional probabilities computed from them), but their subjective estimates. This implies that any psychophysical, coding, or retrieval factor affecting frequency or probability estimates will also indirectly influence judgments.

Some rules also incorporate the possibility that frequencies or probabilities are subjectively weighted. Weights (wa, wb, wc, wd, w1, w2, wa', wb', wc', wd') represent the evidential value attributed by reasoners to different types of evidence. For example, if a-type trials are weighted more heavily than the other trial types, that would imply that reasoners systematically consider a-type trials as more important in making a causal judgment, so that the impact of those trials on judgments will ultimately also be larger. Some versions of the confirmatory versus disconfirmatory rule also consider the possibility that the trial-type weights in the denominator are attributed the values 0 or 1, independently of the values they are attributed in the numerator (for example, Catena et al., 1998, propose a version of the Information Integration model in which the denominator is the sum of all cell frequencies, N; that is, wa', wb', wc', and wd' in the denominator are made equal to 1).

That said, rules can be classified into two broad categories. Confirmatory versus disconfirmatory cell contrasts share the assumption that people pit the instances that seem to confirm the existence of a causal link against those that seem to contradict it. That principle was originally proposed by Inhelder and Piaget (1958) as an unweighted version of Equation 2 in Table 3.2 (i.e., all weights are 1), and has been reformulated several times to accommodate the available evidence. Different versions of the model differ (1) in their assumptions about what people consider confirmatory or disconfirmatory (in such a way that frequencies pertaining to one side of the contrast, the other, or neither, can vary); (2) in the evidential value attributed to the different cell frequencies (wi); (3) in whether weights differ or not across individuals; and (4) in whether the main contrast is normalized or not to keep judgments within a certain range.

Table 3.2 A Summarized List of Covariation-Based Rules for Causal Judgment

Rule: Confirmatory vs. disconfirmatory cell contrast
General form (2): J = β + [wa·a + wd·d − wb·b − wc·c] / [wa'·a + wb'·b + wc'·c + wd'·d]
Specific variants proposed in the literature: Unweighted ΔD; proportion of confirmatory instances; proportion of hits; hits minus false positives; sum of diagonals; linear regression model; Information Integration model; weighted positive test strategy; accounting-for-occurrences rule
First version proposed by: Inhelder & Piaget (1958)

Rule: Generalized ΔP
General form (3): J = β + w1·P(E|C) − w2·P(E|¬C)
Specific variants proposed in the literature: ΔP; weighted ΔP
First version proposed by: Smedslund (1963)

Rule: Generalized power
General form (4, generative): p = ΔP / [1 − P(E|¬C)]; (5, preventive): p = −ΔP / P(E|¬C)
Specific variants proposed in the literature: Power; power with sparse and strong priors (SSP)
First version proposed by: Cheng (1997)

Rule: Dual-factor heuristic
General form (6): J = √[P(E|C) · P(C|E)]
Specific variants proposed in the literature: Dual-factor heuristic; generalized dual-factor heuristic
First version proposed by: Hattori & Oaksford (2007)

Note: The rules in bold italics will be fully described in the text and pitted against current evidence. The Power-SSP will not be described in full detail, but will be considered in the discussion. The β parameter in Equations 2 and 3 only determines the cut point of the function with the vertical axis (the judgment corresponding to the absence of statistical evidence), and will not be considered in further discussions.
Frequency-based contrasts incorporate a large number of subcase rules, often considered as differentiated models in the literature. For example, removing the denominator from the quotient in the rule's general form (i.e., attributing a zero value to all the weights in the denominator of Equation 2) renders it equivalent to the linear regression equation. An even simpler heuristic, the hits minus false positives test (i.e., a − c), results from attributing weights of 1 to the a and c frequencies and weights of 0 to the b and d cells in the numerator, and weights of 0 to all frequencies in the denominator. Importantly, simple rules like this have been proposed as heuristic proxies for more complex rules (see, for example, Meder, Gerstenberg, Hagmayer, & Waldmann, 2010).

On the other hand, conditional probability-based rules (ΔP, power, and the dual-factor heuristic) are inspired by rational analyses of causation, and share the assumption that judgments are made on the basis of conditional probabilities estimated from the information provided by the environment. This does not mean that these models are bound to any specific presentation format; in fact, most tests of all the models are carried out on tasks in which contingency information is provided as cell frequencies or case by case. Still, these rules operate upon the conditional probabilities that can be estimated from those frequencies.

ΔP is an estimate of the linear increment or decrement in the probability of the effect attributable to the candidate cause. A positive ΔP value would indicate that the candidate cause generates the effect, whereas a negative value would indicate that it prevents the effect. In line with Hattori and Oaksford (2007), we take into account the possibility that the conditional probabilities contributing to ΔP are attributed different evidential weights (i.e., weighted ΔP).

Causal power was originally formulated to overcome the limitations of ΔP. The model presupposes that people use the observed value of ΔP to decide whether to apply Equation 4 (to compute the power of the candidate cause to generate the effect) or Equation 5 (to compute the power of the candidate cause to prevent the effect) (see Table 3.2). The validity of p as a measure of causal power depends on the assumption that the influence of the candidate cause on the probability of the effect and the influence of other, unknown causes in the background are probabilistically additive and independent of each other (i.e., the noisy-OR and noisy-AND-NOT schemas, for generative and preventive causes, respectively). Additionally, combined hidden causes are assumed to be generative, and the power of a cause is assumed to be independent of the frequency of that cause. Finally, all events in the world are supposed to be caused; that is, effects cannot occur spontaneously.

Under these assumptions, Equation 4 (generative power) estimates how probable the effect would be in the presence of the candidate cause if all other possible causes of the same effect were removed, so p is an estimate of the capacity of the candidate cause to generate the effect by itself. Equation 5 follows a similar logic: it estimates how much the probability of the effect would decrease after introducing the candidate cause in an ideal, counterfactual scenario in which the effect would otherwise occur with a probability of 1. In both cases, invalidity of the assumptions would render p non-representative of true causal power (although it can be shown that p could still be useful in practical terms; Wu & Cheng, 1999).

Importantly, it can be debated whether p is actually a rule (an algorithmic model) or a computational, normative model, namely, a description of what the cognitive system is designed to do, whatever the cognitive mechanism performing the necessary computations may be. For the purposes of the present review, p will be considered a viable candidate rule (independently of whether its advocates consider it as such or not). In the discussion, we will return to the possibility that p can be considered a normative reference for causal judgment heuristics, instead of a rule itself.
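A minimal sketch of Equations 4 and 5 under the noisy-OR/noisy-AND-NOT assumptions just listed, with the sign of ΔP selecting between the generative and preventive forms; the sample frequencies are illustrative.

```python
# Causal power (Cheng, 1997): Equation 4 (generative) and Equation 5 (preventive).

def causal_power(a, b, c, d):
    p_e_c = a / (a + b)     # P(E|C)
    p_e_notc = c / (c + d)  # P(E|not-C); assumed < 1 for generative causes
    dp = p_e_c - p_e_notc
    if dp >= 0:
        return dp / (1 - p_e_notc)  # generative power (Equation 4)
    return -dp / p_e_notc           # preventive power (Equation 5)

print(causal_power(30, 10, 10, 30))  # generative: 0.5 / 0.75 ~= 0.67
print(causal_power(10, 30, 30, 10))  # preventive: 0.5 / 0.75 ~= 0.67
```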
Finally, the dual-factor rule is a heuristic that approximates the fourfold point correlation coefficient between the candidate cause and the effect, φ, as the number of d-type trials tends to infinity (that is, when the cause and the effect are relatively rare). In contrast to ΔP and causal power, the probabilities contributing to judgments in the dual-factor heuristic are not the probability of the effect in the presence and in the absence of the cause, but the probability of the effect in the presence of the cause, P(E|C), and the probability of the cause in the presence of the effect, P(C|E). In other words, the rule does not only consider the predictive value of the cause, but also the diagnostic value of the effect. The (quasi-normative) rationale behind this rule will be discussed later.
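A sketch of the heuristic as just described, combining the predictive and diagnostic conditionals; following Hattori and Oaksford (2007), their geometric mean is taken (the sample frequencies are illustrative).

```python
# Dual-factor heuristic: combines P(E|C) (predictive) with P(C|E) (diagnostic).
from math import sqrt

def dual_factor(a, b, c):
    p_e_given_c = a / (a + b)  # predictive value of the cause
    p_c_given_e = a / (a + c)  # diagnostic value of the effect
    return sqrt(p_e_given_c * p_c_given_e)

# d-type cases play no role here, which is why the heuristic approximates phi
# only when d-type cases dominate (i.e., rare causes and rare effects).
print(dual_factor(a=30, b=10, c=10))  # sqrt(0.75 * 0.75) = 0.75
```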
A trained eye will have noted that the causal support model (Tenenbaum & Griffiths, 2005; see also Griffiths, Chapter 7 in this volume) is not included in this list. That model adopts a normative approach to causal judgment, based on the assumption that reasoners represent causal structures as causal Bayes nets (Pearl, 2000; see also Rottman, Chapter 6 in this volume). Causal judgment is proposed to be based on an estimate of the degree to which the statistical dependencies observed during the task support the existence of a causal link between the candidate cause and the effect, relative to the degree to which those dependencies support the non-existence of that link. There are two reasons to exclude support from the present review. First, we have some doubts that support qualifies as a rule-based model; from our point of view, it is more accurately described as a normative computational model that could inspire more parsimonious and mathematically simpler rules. Second, support is normally presented as combinable and compatible with causal power. If causal judgment probes are adequately worded to elicit causal strength judgments (namely, if causal strength judgments are experimentally dissociated from judgments of causal structure), the support model would predict judgments to match causal power. In any case, the relevance of Bayesian approaches to causal judgment will be acknowledged and discussed several times throughout this chapter.

Next, we will review the basic facts that all models should be able to accommodate. In the last three decades or so, research on causal judgment has collected a solid corpus of experimental effects. For the sake of brevity, this review will be limited to one-cause, one-effect experiments, in which both the effect and the candidate cause can be either present or absent. In other words, we will collect the relevant information from experiments using the information in the basic contingency table (Table 3.1) as input for people to make judgments of causal strength, independently of the format in which that information is presented. This compilation is aimed at providing a checklist for evaluating how well the different models fare against the available evidence.

A Basic Collection of Facts

Most studies on causal judgment follow a similar procedure: people are first exposed to information on the covariation between a candidate cause and an effect, and then are asked to judge the degree to which they think the candidate cause is responsible for the effect. For example, in the pervasive allergist's task (see Dickinson, 2001, for a description), the participant plays the role of an allergist who has to determine whether or not, and to what degree, eating a certain food produces an allergic reaction. That judgment must be based on a series of fictitious cases, some of whom have eaten the food, whereas others have not. Similarly, some of them suffer the allergic reaction, whereas others do not, so that the cases can be classified in accordance with the contingency table.

This task can be subjected to a number of manipulations. People can be exposed to conditions with different degrees of contingency, with different conditional probabilities, or with different cell frequencies. One of these factors can be held constant while manipulating another, or two of them can be manipulated orthogonally. In addition, the relevant statistical information can be presented in the form of tables, graphically, as clusters of cases, or case by case, among other formats (see Figure 3.1). The main dependent variable is normally a judgment of the strength of the cause–effect link, although the wording of the question probe can vary greatly across studies. Some studies have manipulated the frequency with which the judgment is made across the task, or have compared final versus intermixed judgments. And some works have complemented causal judgments with other responses, such as predictions, probability judgments, or subjective estimates of how important the different pieces of information presented during the task were for making a causal judgment. In the remaining part of this section we will describe how some of these manipulations systematically influence judgment.

Rules of Causal Judgment: Mapping Statistical Information onto Causal Be­ liefs also in 75 out of 100 patients who did not use the drug, which renders the drug ineffec­ tive

. In cases like this, people often

overestimate the effectiveness of the candidate cause, and the severity of the overestima­ tion increases as P(E) grows larger. Importantly, the effect-density bias occurs even in ze­ ro-contingency conditions (Vallée-Tourangeau Murphy, Drew, & Baker, 1998; White, 2003a), in which, independently of the norm taken as (p. 37) reference, any non-zero causal judgments should be considered as biased. Relatedly, the cause-density bias refers to the fact that, holding all other factors constant, causal judgments tend to covary with P(C) (Perales, Catena, Shanks, & González, 2005). The cause-density bias and the effect-density bias cannot be experimentally dissociated unless ∆P = 0. When ∆P > 0, a larger density of the cause implies a larger density of the effect, whereas when ∆P < 0, a larger density of the cause implies a smaller density of the effect. Still, the cause-density bias seems to exist beyond the effect-density bias (and the former is probably weaker than the latter; Vadillo, Musca, Blanco, & Matute, 2011).1 However, these effects and the interaction between the two are derivable from the cus­ tomary cell weight pattern (wa > wb > wc > wd; White, 2004), but not the other way around; the pattern of weight inequalities cannot be mathematically derived from a sim­ ple combination of cause-and-effect density biases.

Hypothesis Dependence

Most judgment rules have been formulated to compute causal strength as if people treated the available information in a completely neutral way. In models in which prior knowledge is taken into account, the new evidence (i.e., the contingency information effectively presented to the participant in an experimental condition) is integrated with prior knowledge via belief-updating mechanisms (Anderson & Sheu, 1995; Catena et al., 1998; Hogarth & Einhorn, 1992; Müller, García-Retamero, Galesic, & Maldonado, 2013; Perales, Catena, Maldonado, & Cándido, 2007), but the rule itself is assumed to operate uncontaminated by prior hypotheses.

Yet the hypothesis-independence assumption is unwarranted. The hypotheses the reasoner holds about the world determine how the new information is treated, so that they work like lenses that alter the information that passes through them (Goedert et al., 2014). Such an effect is well known in other areas of knowledge acquisition and updating (Crocker, 1982; Evans, 1998; Klayman & Ha, 1987). Unfortunately, in the causal judgment domain, it is not easy to prove by merely examining the bulk of experiments carried out so far. In most studies, even when people are asked to judge whether the candidate cause generates or prevents the target outcome, and the judgment scale ranges from "the cause prevents the outcome to a maximum degree" (maximum negative score) to "the cause generates the outcome to a maximum degree" (maximum positive score), the materials subtly bias the participant to represent the link in a certain direction. This is the case, for example, with the allergist's task: even if people's answers include that possibility, it would not be easy for them to conceive a priori that a food prevents an allergic reaction. As a consequence, experiments in which the a priori hypothesis is probably generative largely outnumber those in which it is preventive.

Here lies a potential explanation for the observed asymmetry between judgments in mirror-image positive and negative contingency conditions. It has been repeatedly shown that negative and positive contingencies of equal magnitude do not elicit judgments of equal absolute value. For example, in Perales, Catena, and Maldonado (2004; see also Maldonado, Catena, Cándido, & García, 1999), the mean judgment for a condition with ∆P = .71 was 73 (on a scale ranging from −100 to +100), whereas the mean judgment for an opposite-sign, ∆P = −.71, condition was only −27. Importantly, such an asymmetry was tightly linked to the wa > wb > wc > wd inequality, as shown by both direct and indirect methods.

Hence, if cell weights are hypothesis-dependent, so should be the positive/negative contingency asymmetry. To our knowledge, the only study in which neutral materials (equally interpretable in preventive or generative terms) have been used, and in which the content of the hypothesis to be tested has been actively manipulated, is the one by Mandel and Vartanian (2009). Confirming the preceding argument, the generative hypothesis led to the usual wa > wb > wc > wd inequality, but the preventive hypothesis led to a neat wb > wa > wd > wc inequality. As a consequence, the negative/positive contingency judgment asymmetry almost completely reversed in the preventive hypothesis conditions, so that judgments for preventive causes were larger in negative contingency conditions than in their mirror-image positive contingency conditions.

Presentation Format Effects

Although systematic manipulations are sparse, they suggest that presentation format does influence causal judgment (Perales & Shanks, 2008; White, 2003a). Surprisingly, as pointed out by Vallée-Tourangeau, Payton, and Murphy (2008), virtually none of the available models has taken its importance into consideration.

The effect of different presentation formats could be due to memory or perceptual distortions (p. 38) of the inputs (subjective frequencies or probabilities) on which the judgment rule operates (Buehner, Cheng, & Clifford, 2003; Liljeholm & Cheng, 2009; Novick & Cheng, 2004). For example, if the rule is based on conditional probabilities, and the presentation format generates distorted estimates of such conditional probabilities, then final judgments will reflect an effect of presentation format (please note the difference we established between objective frequencies/proportions and subjective frequencies/probabilities for the models in Table 3.2). As a matter of fact, this can happen. Maldonado, Jiménez, Herrera, Perales, and Catena (2006) found that inattention can make people neglect some cell frequencies, so that frequency estimates are distorted and judgments tend to vary in accordance with such distortions.

However, memory or perceptual distortions do not seem to account for the full pattern of effects of presentation formats. Perales et al. (2005), Perales and Shanks (2008), and Vadillo and Matute (2007) have shown that people who make unbiased probability estimates still show the customary cell-weight effects and density biases. Maldonado et al. (2006), on the other hand, showed that people tended to overestimate the frequency of a-type trials when asked to remember them, but they did the same with the frequency of d-type trials, which is in sharp contrast with the fact that d trials are the least heavily weighted trials in causal judgments.

Vallée-Tourangeau et al. (2008) compared three presentation formats. In the first one, frequencies from the contingency table were presented in the form of verbal propositions. In the second one, individual cases (icons) were presented in a 2 × 2 table, so that cases were individuated but visually organized according to cell type (similarly to Panel b in Figure 3.1). And, in the third one, cell frequencies were presented in the form of a cumulative frequency tree. This last format, although frequently used to improve Bayesian reasoning (see Sedlmeier & Gigerenzer, 2001), had never been used in causal judgment tasks: the top node of the tree represented the total number of cases (patients) in the sample. This node branched into two subsamples of patients with and without a disease, and these subgroups were further divided into patients with and without a virus (please note that, in this task, the judgment was not strictly causal, but diagnostic; see Waldmann, 2012). The third group was the most accurate, and the first the least accurate, at distinguishing between a zero-contingency condition and a positive-contingency one, across three conditions with different base rates of the effect [P(E)]. Although these experiments did not allow a test of the whole pattern of cell inequalities, the cumulative frequency-tree group showed a clear decrease of the difference between a/b and c/d trial weights; that is, they made more use of base-rate information.

Nevertheless, the cell-weighting pattern varies depending on the presentation format. White (2003a) used a task in which the cases (fish exposed or not exposed to a mineral, later presenting green spots or not) were displayed in columns, simultaneously on the same sheet. However, unlike other case-by-case presentation formats, the cases were separated into two columns, with the fish not exposed to the mineral (baseline information) in the left column, and the fish exposed to the mineral (experimental information) in the right one (similarly to Panel d in Figure 3.1). Although the cell-weight inequality did not disappear, participants attributed more weight to c and d trials in this situation than in the standard one, and were also more consistent in attributing positive weight to d trials.

Finally, Perales and Shanks (2008) used an individuated format to present information on how many plants treated or not treated with a fertilizer bloomed or not. In one version of the task, all the plants (icons) were presented intertwined (similarly to Panel c in Figure 3.1); in the other version, treated and non-treated plants were presented in two different clusters, and, in each cluster, the plants that bloomed and the plants that did not bloom were grouped separately [in such a way that both the frequencies and the conditional probabilities P(E|C) and P(E|¬C) were made transparent to the reasoner; similarly to Panel d in Figure 3.1]. Although, once again, the combination of conditions in the set of experiments did not allow a complete assessment of the pattern of cell-weight inequalities, judgments in the intertwined presentation format showed a larger weighting for a and b trials than for c and d ones, and were perfectly predicted by a linear combination of weighted frequencies. However, in the clustered format, all participants made contingency judgments that matched either P(E|C) or contingency [the exact difference between P(E|C) and P(E|¬C)]. Additionally, in spite of the unequal impact of P(E|C) and P(E|¬C) in the intertwined presentation task, in all conditions participants' estimates of both probabilities were virtually exact. (p. 39)

Probe Question Effects

A vast majority of studies on causal judgment have used some variant of the standard probe question ("To what degree does A cause B?"). However, this question does not have a clear normative interpretation. For example, it can be interpreted as referring to the probability with which A would cause B in the absence of other causes (i.e., power, p), or as referring to the confidence the reasoner has that the A→B causal link exists (support; Griffiths & Tenenbaum, 2005). It could also be that different reasoners interpret the question in different ways (White, 2000, 2008, 2009).

A sensible possibility, known as the conflation hypothesis, is that when a reasoner makes a standard causal judgment, such a judgment conflates power (the capacity of the candidate cause to generate or prevent the effect) and confidence in such a judgment. Imagine, for example, that one sees that the administration of a drug is perfectly correlated with remission of an illness (∆P = 1), but that contingency arises from only six cases. It could well be that such a reasoner makes a submaximal judgment, just because she is still unsure such perfect contingency will be observed in a larger sample.

Assuming the hypothesis that people conflate causal strength and confidence in standard causal judgments has led to proposing alternative wordings of the question probe. Among these, the most unambiguous one is the counterfactual question: the reasoner is first asked to decide whether she thinks A (the candidate cause) has any causal influence on B (the effect). In case she answers "yes," she is asked to imagine a scenario in which none of the cases would show the effect if the candidate cause were absent (e.g., "Suppose that there are 100 people who do not have headaches"), and then to estimate how many of those cases would show the effect if exposed to the cause (e.g., "If this mineral was given to these 100 people, how many of them would have a headache?"). This question format has been proposed by advocates of the power PC theory (Buehner & Cheng, 1997; Buehner, Cheng, & Clifford, 2003; Cheng, 1997; Liljeholm & Cheng, 2009; Wu & Cheng, 1999).

There is evidence that standard and counterfactual judgments follow different patterns. Beyond the controversy about how well models fit both types of judgments, in all conditions judged via counterfactual judgments in Perales and Shanks (2008) and Collins and Shanks (2006), the difference between a and b cell weights (as well as the difference between c and d) was abolished. Moreover, for virtually every participant across all conditions, judgments were perfectly fitted by some combination of the conditional probabilities P(E|C) and P(E|¬C). In other words, as seems to happen with presentation formats that make conditional probabilities transparent, the counterfactual wording seems to make reasoners think in terms of probabilities, namely, to translate contingency information, regardless of the format in which it is presented, into estimated probabilities, and then operate upon them to make a causal judgment.

Yet this is no reason to think one question probe or the other is a better way to elicit causal judgments. Different questions seem to impose different demands on reasoners, who try to meet those demands by using the information at hand. However, it would be illuminating to put those demands in context, and see how they relate to real-life behavior. Actually, people are very rarely asked to make judgments in real life; more often, they are asked to make choices or to identify where to intervene in their environments in order to obtain a desired result (Meder, Gerstenberg, Hagmayer, & Waldmann, 2010; Osman, Chapter 16 in this volume).

Causal judgment models share an underlying assumption that a judgment is a monotonic translation of a subjective mental representation of strength, valid for whatever use the reasoner wants to make of it (e.g., judging the efficacy of two treatments vs. choosing one of them). Unfortunately, this assumption, although contested (Mandel, 2003), has barely been tested, and the available evidence reveals the need for further investigation. In one of the few exceptions, Osman and Shanks (2005) evaluated the tendency some people show to neglect the base rate of the effect [P(E|¬C), and thus the c and d cells] when evaluating causal evidence. Their results showed that people differ according to the weight they place on base-rate information, but the way individuals do this is consistent across causal judgment and decision-making tasks. Still, in this case, causal judgments and decisions were not based on the same information. To our knowledge, the studies by Müller et al. (García-Retamero, Müller, Catena, & Maldonado, 2009; Müller, García-Retamero, Cokely, & Maldonado, 2011; Müller, García-Retamero, Galesic, & Maldonado, 2013) are the only works so far that have directly pitted decisions against causal judgments based on the same covariational information. Although the conclusions of these works are not directly relevant for the aims of this section, it is important to note that (p. 40) decisions were only moderately predicted by judgments (canonical R = .45).

How Well Do the Rules Account for the Available Evidence?

∆P and Causal Power

The first model of causal judgment inspired by a normative analysis of causality was probably the ∆P model. However, as noted earlier, early studies soon showed that causal judgments depart from contingency in a systematic way. The power PC model was proposed to overcome some of the limitations of the ∆P model, maintaining the basic idea that causal judgments are ultimately based on a probabilistic contrast (Cheng, 1997).


As noted earlier, power (p) can be defined as a subjective estimation of the probability of the effect given the cause, in an ideal scenario in which all other alternative causes of the same effect have been removed. In other words, it represents the potency of the candidate cause to generate the effect by itself. If the model's basic prerequisites are met, then causal power can be derived from the contingency table. Let us imagine that a person suffers headaches on 2 out of 10 days on which she does not drink coffee, but the proportion rises to 6 out of 10 on days on which she drinks coffee. To what degree does coffee cause headache in that person? Assuming additivity and no interaction of the candidate cause and the hidden causes in the background, on 2 of the 6 days on which she drank coffee and had a headache, she would have suffered the headache anyway, so on 4 of 8 (instead of 6 of 10) days she suffered headache because of coffee; p is exactly that proportion (4/8, or .50). Similarly, in the case of preventive power, p can be interpreted as the proportion of cases in which the cause is not followed by an effect that would have otherwise appeared. This line of reasoning is implemented mathematically by Equations 4 and 5 (Table 3.2).

p, at least as originally formulated, seems to have some difficulties in accommodating part of the pattern of results described earlier (e.g., Collins & Shanks, 2006; Hattori & Oaksford, 2007; Perales & Shanks, 2007, 2008; Shanks, 2002; White, 2003b, 2004, 2005, 2008, 2009, 2014). Such difficulties boil down to the demonstration that p does not predict the exact wa > wb > wc > wd inequality pattern. More specifically, as shown by Mandel and Vartanian (2009), and Perales and Shanks (2007), Equations 1 and 2 predict that wa = wb and wc = wd, in terms of the impact of cell frequencies on judgments. The cases in which judgments depart from zero, or are affected by P(E) or P(C) in zero-contingency conditions, are particularly problematic for the theory, because in these cases judgments cannot be explained either by power, or by a conflation of power and confidence.

Causal power deals better with counterfactual judgments (Buehner, Cheng, & Clifford, 2003; Collins & Shanks, 2006). According to its advocates, diverging results with the two measures are due to the fact that the counterfactual probe is unambiguous with respect to the main task aim. The counterfactual question asks the reasoner to project the available contingency information onto the ideal scenario that causal power counterfactually refers to, and thus elicits an estimation of power less influenced by factors extraneous to causal inference itself (e.g., subjective estimations of the reliability of the information on which the judgment is based).

In spite of these improved predictions, counterfactual judgments do not perfectly match p. For example, counterfactual judgments show a bimodal distribution. Liljeholm and Cheng's (2009) and Perales and Shanks's (2008) works concur in reporting the existence of a proportion of participants whose judgments match ∆P instead of p. In the former study, however, the second modal judgment matched p, whereas in the latter, the second modal judgment matched P(E|C), and none of the participants judged power in close accordance with p. This discrepancy demands further investigation.
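To make the worked coffee example reproducible, the following sketch (our own; it assumes the standard power PC expressions, with generative power p = ∆P / (1 − P(E|¬C)) and its preventive counterpart) computes power directly from the two conditional probabilities:

```python
def generative_power(p_e_c, p_e_notc):
    """Generative causal power: p = (P(E|C) - P(E|not-C)) / (1 - P(E|not-C)).
    Assumes the candidate cause and the hidden background causes do not
    interact and combine additively, as the power PC theory requires."""
    return (p_e_c - p_e_notc) / (1.0 - p_e_notc)

def preventive_power(p_e_c, p_e_notc):
    """Preventive causal power: the proportion of otherwise-occurring effects
    that the cause stops: p = (P(E|not-C) - P(E|C)) / P(E|not-C)."""
    return (p_e_notc - p_e_c) / p_e_notc

# Coffee example from the text: headaches on 6/10 coffee days vs. 2/10 other days.
print(generative_power(0.6, 0.2))  # 0.5, i.e., the 4-out-of-8 proportion
```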


A related question is the one regarding the theoretical value of the judgments collected with both formats. Advocates of the power theory pursue the study of "pure" causal induction processes, assuming that circumstances other than the ideal are useless because, in them, causal induction processes are contaminated by extraneous factors. Researchers from the judgment and decision-making tradition (in which the study of causal judgment was born) take causal judgments as representative of real-life tasks, and are thus more concerned about the possibility that the biases shown by causal judgments also emerge in ordinary beliefs and decisions (e.g., pseudoscientific beliefs, Matute, Yarritu, & Vadillo, 2011; Barberia, Blanco, Cubillas, & Matute, 2013; medical decisions, Müller, García-Retamero, Cokely, & Maldonado, 2011; abnormal cognition, Msetfi, Wade, & Murphy, 2013). From this latter perspective, standard causal judgments cannot be simply discarded as non-representative of causal cognition processes.

Converging evidence for the possibility that people use different rules in different tasks, and (p. 41) that specific scenarios can elicit normative, power-matched judgments, comes from a recent work by Vadillo, Ortega-Castro, Barbería, and Baker (2014; see also Perales, Maldonado, Cándido, Contreras, & Catena, 2008). In this case, people were asked to combine the effect of two separately trained candidate causes, each of which had been observed to cause an effect with some probability, in the absence of alternative causes. In virtually all situations, reasoners failed to combine the two causes normatively (adding one upon the other according to the noisy-OR schema, as predicted by the power PC theory), and their judgments and choices based on combined power obeyed either a linear aggregation strategy or an averaging heuristic. Interestingly, a very careful design of a situation in which the information was presented in clustered sets of cases for each of the two candidate causes, and in which the potential incongruences resulting from non-normative strategies were made explicit (Experiment 5B), did elicit responses closely in accordance with the noisy-OR schema, and thus with the causal power theory.

In summary, non-normativity is likely to be consubstantial to the causal inference processes in operation in some scenarios. Hence, with regard to the causal power theory, the doubt remains whether the distilled scenarios in which p does better actually succeed in triggering rational causal induction processes. If that is the case, such evidence could be useful to find ways to de-bias judgments and decisions in scenarios in which such biases have serious practical consequences (e.g., Barberia et al., 2013; García-Retamero, Galesic, & Gigerenzer, 2010; McCormack, Simms, McGourty, & Beckers, 2013).
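The normative combination rule mentioned above is easy to state in code. This brief sketch (ours; the power values are made up) contrasts noisy-OR combination of two independently trained generative powers with the non-normative linear-aggregation and averaging strategies that participants' judgments tended to obey:

```python
def noisy_or(p1, p2):
    """Normative combination of two independent generative powers:
    the effect occurs unless both causes fail to produce it."""
    return 1.0 - (1.0 - p1) * (1.0 - p2)

p1, p2 = 0.6, 0.5  # hypothetical separately trained powers

print(noisy_or(p1, p2))     # 0.80 (power PC prediction)
print(min(p1 + p2, 1.0))    # 1.00 (linear aggregation, capped at the scale maximum)
print((p1 + p2) / 2.0)      # 0.55 (averaging heuristic)
```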

The Dual-Factor Heuristic

Computing power seems to impose significant demands on the cognitive system (in such a way that people are prone to replace power with judgment rules that have some algebraic appeal, but are less close to a rational norm; Vadillo et al., 2014). Such cognitive demands can grow even larger in real-life causal induction. As noted by Hattori and Oaksford (2007; see White, 2009, for a related argument), although in our daily lives it is easy to count how many times a candidate cause and an effect have occurred together (cell a), and how many times the cause or the effect has occurred by itself (cells b and c), "[i]t is not clear what information to record when neither cause nor effect occur (cell d). Most of the time, any particular cause and effect are not occurring as we move around the world" (p. 769). One possibility would be to divide the continuous stream of time between events into discrete intervals and count them as d-type trials, but that would soon impose an unreasonable burden on our processing capacity.

Assuming that the number of d-type trials is indefinitely large in any given scenario amounts to assuming that the effect and the candidate cause are rare. Under this rarity assumption, it can be shown that the degree of correlation between any two events (computed as the fourfold point correlation coefficient, ϕ) can be approximated according to Equation 6. In other words, as the number of d trials tends to infinity, ϕ approaches the geometric mean of the probability of the effect given the cause and the probability of the cause given the effect. This expression is denoted as the dual-factor heuristic, or H (Hattori & Oaksford, 2007), which can thus be considered a quasi-rational rule. Importantly, d-cell frequency does not contribute to H, which implies that the d cell carries zero weight in predictions from the dual-factor heuristic. In this way, H circumvents the d-type trial counting problem.

In order to apply this rationale to preventive causes, it can be assumed that the outcome to account for is the non-occurrence of E (preventing equals causing the non-occurrence of an event). That implies interchanging a and b, and replacing c with d, in Equation 3. Assuming rarity of non-E, the neglected cases will be those in which the effect occurs in the absence of the cause (cell c). However, assuming rarity of non-E implies being in a context in which, in the absence of the cause, E is expected to occur with almost complete certainty (i.e., laypeople tend to seek preventive explanations for things that either disappear, or do not appear when they were expected: namely, when some other cause of the event is present). Actually, this is not an unreasonable assumption. Both humans and animals learn inhibitory relationships between events much more easily when the target cue signals the absence of an outcome in the presence of a second cue that had been previously trained to signal the same outcome (i.e., the absence of the outcome is rendered surprising; Chapman & Robbins, 1990; Van Hamme & Wasserman, 1994).

The dual-factor heuristic is very simple in computational terms, yet it provides very accurate predictions for standard human causal judgments. By definition (in the generative scenario), a-cell frequency contributes to the two factors in H, b and c contribute to only one each, and d does not (p. 42) contribute to H at all. That renders the predicted cell-weight inequality as wa > wb = wc > wd (where wd = 0).

H thus fully predicts the outcome-density effect, and the fact that cell weights are hypothesis-dependent (according to the assumptions about the nature of inhibition outlined earlier). Still, it does not account for (a) the cue-density effect in zero-contingency scenarios; (b) why, more often than not, wb > wc; and (c) why d, also more often than not, carries non-zero weight.
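Assuming Equation 6 takes the usual form H = sqrt(P(E|C) × P(C|E)) (Table 3.2 is not reproduced here, so treat the exact expression as an assumption), a small numerical sketch (ours) makes the limit claim tangible: ϕ converges on H as d-cell frequency grows while the other cells stay fixed.

```python
from math import sqrt

def dual_factor_h(a, b, c):
    """Dual-factor heuristic: H = sqrt(P(E|C) * P(C|E)); cell d is ignored."""
    return sqrt((a / (a + b)) * (a / (a + c)))

def phi(a, b, c, d):
    """Fourfold point (phi) correlation for a 2 x 2 contingency table."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

a, b, c = 20, 5, 5
print(dual_factor_h(a, b, c))      # 0.8
for d in (10, 100, 10_000, 1_000_000):
    print(d, phi(a, b, c, d))      # climbs toward 0.8 as d grows
```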


In the preceding, (a) and (b) are logically linked. Holding other factors constant, the cause-density effect in zero-contingency scenarios arises from the fact that a- and b-type trials (cause-present trials) carry more weight than c- and d-type trials (cause-absent trials). In other words, the cause-density effect depends on the extremity of the wa > wd difference, and on the existence of the wb > wc difference. In H, the weights of b and c are equal, and lie just between the weights of a and d. But it is important to note that H lacks any free parameters; conversely, all models accommodating the wa > wb > wc > wd inequality are parameterized. A one-parameter reformulation of H would easily account for the full pattern [assuming, for example, that P(E|C) is weighted more heavily or estimated more accurately than P(C|E)].

So, the only fact that remains problematic for H is (c), the non-zero weighting of d trials. However, d-cell weight seems to be more sensitive to individual differences than the weight of the other cells (White, 1998, 2000, 2008, 2009). Collapsing two experiments (one with a trial-by-trial presentation, and the other with summary tables), Mandel and Vartanian (2009) found close-to-zero correlations between d-cell frequency and causal judgments. So, although the dual-factor heuristic does not fully explain group tendencies regarding the use of d-type trials, it is claimed to explain (ad hoc) the behavior of a majority of individuals.
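One natural way to cash out the one-parameter reformulation suggested above (our speculation, not a model from the literature) is to replace the symmetric geometric mean with a weighted one, so that a single parameter γ > .5 tilts the rule toward P(E|C) and thereby yields wb > wc:

```python
def weighted_h(a, b, c, gamma=0.7):
    """Hypothetical one-parameter dual-factor heuristic:
    H_w = P(E|C)**gamma * P(C|E)**(1 - gamma).
    gamma = 0.5 recovers the original parameter-free H;
    gamma > 0.5 makes b-cell variations matter more than c-cell ones."""
    p_e_c = a / (a + b)   # predictive factor, lowered by b trials
    p_c_e = a / (a + c)   # diagnostic factor, lowered by c trials
    return (p_e_c ** gamma) * (p_c_e ** (1 - gamma))

# With gamma = 0.7, adding b trials hurts the estimate more than adding c trials.
print(weighted_h(20, 10, 5))   # about 0.70
print(weighted_h(20, 5, 10))   # about 0.76
```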

Confirming-Versus-Disconfirming Evidence Contrasts

Beyond purely mathematical flaws, the most obvious limitation of the unweighted original rule proposed by Inhelder and Piaget (1958) was the prediction that judgments are equally influenced by the four cell frequencies. So, the most widely used version of this general model is the weighted ∆D rule (also known as the evidence integration or evidence evaluation model, Equation 2).

Weights can arise from individual differences, in such a way that reasoners use only 0 and 1 weights, and the use of 0 or 1 for each cell varies in consistency across participants (the values of wi would be averages of sets with different proportions of 0 and 1). The second possibility is that people do attribute degrees of evidential value to the four cells. In general terms, the available literature shows that, although there is some evidence of variability in cell-weight use, the impacts of cell frequencies on judgments, individual by individual, lie somewhere between 0 and 1, and, in a majority, show the wa > wb > wc > wd inequality (e.g., White, 2000).

Weighted ∆D was the winning model in Perales and Shanks's (2007) model-fitting approach to standard causal judgments in trial-by-trial presentation tasks, although the dual-factor heuristic had not been proposed yet. After correcting for its lack of parsimony (at least 3 degrees of freedom), w∆D still showed a slight superiority over its competitors.

Nevertheless, the main limitation of w∆D is immediately evident. In its raw form, it is purely descriptive, not explanatory. The rule itself does not respond to any psychological or normative principle according to which the best-fitting parameters should obey the wa > wb > wc > wd inequality: it implements the general principle that weights are subjective attributions of importance (which is consistent with information selection patterns and judgments of evidential weight), but does not explain why cells are weighted as they are. Nor does it provide any account of why estimations of cell impacts are hypothesis-dependent.

In summary, the w∆D model, in spite of its predictive power and intuitive appeal, and despite the fact that it can incorporate a number of other rules as subcases via individual differences, is quite an unsatisfactory model of causal judgment. Still, provided its best-fitting parameters across different contexts are known, it remains a very useful tool to predict how people will assess statistical information, even in cases in which other models fail to make any predictions at all (e.g., when relevant information of the contingency table is lacking or misrepresented). With empirically estimated parameters, and generalized across structurally equivalent tasks, it also predicts some extremely non-normative trends, for example some cases in which people judge positive contingency as preventive, or negative contingency as generative (White, 2009, 2011).
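For concreteness, the following sketch implements one common formulation of the weighted ∆D rule (Equation 2 itself is in Table 3.2, which is not reproduced here, so treat the exact expression as an assumption); the default weights merely exemplify the customary wa > wb > wc > wd ordering:

```python
def weighted_delta_d(a, b, c, d, wa=1.0, wb=0.7, wc=0.4, wd=0.2):
    """Weighted evidence-integration rule: confirming cells (a, d) count
    in favor, disconfirming cells (b, c) against, each scaled by its weight.
    With all weights equal, it reduces to the unweighted
    Delta-D = [(a + d) - (b + c)] / N, here normalized to the -1..1 range."""
    num = wa * a + wd * d - wb * b - wc * c
    den = wa * a + wb * b + wc * c + wd * d
    return num / den

# A zero-contingency table (Delta-P = 0) still yields a positive judgment,
# reproducing the density biases described earlier in this chapter.
print(weighted_delta_d(75, 25, 75, 25))  # about 0.25, despite Delta-P = 0
```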

Theory-Laden Confirming-Versus-Disconfirming Evidence Contrasts (p. 43)

In the face of the explanatory limitations of w∆D (and related models), there have been recent attempts to justify cell weights as serving some type of computational aim; or, what amounts to the same, psychological principles have been postulated to explain why judgments respond to cell frequencies as they do.

The PSB (Positive Event and Sufficiency Biases) account (Mandel & Lehman, 1998) hypothesized that cell weights have two origins. According to this approach, people assume that "true" causes are necessary and sufficient to generate the effect. So, what reasoners do when asked to make a causal judgment is to judge the degree to which the candidate cause is necessary for the effect to occur (the effect would not occur without the cause), and the degree to which the cause is sufficient for the effect to occur (the cause can generate the effect by itself) (for an analysis of causality in terms of necessity and sufficiency, see Mackie, 1965; Lu et al., 2006; see also Cheng & Lu, Chapter 5 in this volume). However, people are also assumed to be prone to consider sufficiency tests as more important than necessity tests, because, allegedly, in real life, violations of sufficiency can have more severe consequences than violations of necessity. As put by Klayman and Ha (1989), "We don't mind passing up some potentially acceptable cars if we can avoid buying a lemon" (p. 603).

In the generative case, the a and d cells are confirmatory for both necessity and sufficiency tests, cell b reveals insufficiency, and cell c reveals non-necessity. If sufficiency tests are weighted more heavily than necessity ones, the sufficiency bias explains why b-type trials are weighted more heavily than c-type trials.

On the other hand, the sufficiency bias is hypothesized to add to a positive-test bias. This second bias refers to the tendency of reasoners to work more easily with information stated in positive terms, as shown in a variety of cognitive tasks (Klayman & Ha, 1987; Levin, Wasserman, & Kao, 1993). In the contingency domain, it has been shown that registering occurrence is mostly an automatic process, whereas registering non-occurrence requires attentional control (Maldonado et al., 2006). This bias would account for a wa > wb = wc > wd inequality, as cue-present and outcome-present trials are expected to be weighted more heavily than cue-absent and outcome-absent trials.

In a more recent work, Mandel and Vartanian (2009) showed that judgments are better fitted by a Weighted Positive-test Strategy (WPS) account than by the PSB account. According to this approach, people's judgments are aimed at implementing two types of tests: one that checks for conformity to the hypothesized cause (a positive hypothesis test), and one that tests for conformity to the hypothesized effect (a positive target test). If the reasoner aims to test conformity for the sentence "X causes Y" (vs. "X does not cause Y"), then the two cells to be attributed more weight should be a and b. If that reasoner aims to test conformity with the sentence "Y is due to X" (vs. "Y is caused by something else"), the two cells to be most heavily weighted should be a and c. Importantly, if the hypothesis to test is preventive, then positive tests should be reformulated as "X causes the absence of Y" and "the absence of Y is due to X," in which case the weights of cells a and b (and those of c and d) should be swapped.

Hence, the blended positive hypothesis and positive target tests predict wa > wb = wc > wd and wb > wa = wd > wc inequalities for generative and preventive hypothesis tests, respectively. As happens with the dual-factor heuristic, in its raw form this approach does not account for the wb > wc inequality (the wa > wd inequality in the case of preventive hypotheses). This is why the positive-test strategy must be weighted, so that hypothesis tests are attributed more relevance than target tests. This assumption is based on the idea that people find it easier to think from cause to effect (predictive reasoning) than from effect to cause (diagnostic reasoning; Einhorn & Hogarth, 1982, 1986; see Fernbach, Darlow, & Sloman, 2011; Meder, Mayrhofer, & Waldmann, 2014; Meder & Mayrhofer, Chapter 23 in this volume, for more fine-grained discussions of this issue).

Finally, according to the accounting-for-occurrences (AFO) model (White, 2008, 2009), people's main computational aim when making causal judgments is twofold: judging the generative strength of the cause,2 and accounting (i.e., finding an explanation) for the occurrences of the effect. The first aim directs attention to a and b trials (the more X causes Y, the more likely Y will be in the presence of X), and the second one to a and c trials (the more accountable Y is by X, the more likely X will be in the presence of Y). Once again, this leads to the wa > wb = wc > wd (p. 44) (where wd = 0) inequality. In a first approximation, the AFO model for a one-cause, one-effect scenario was formulated as follows:

J = (wa·a + wd·d − wb·b − wc·c) / (wa·a + wb·b + wc·c + wd·d), with wd = 0 and wa > wb > wc (7)


Namely, a subcase of Equation 2, with arbitrary weights conforming to the two AFO computational aims (and, seemingly, with a non-justified assumption that the second aim is less important than the first one, so that wb > wc). So, up to this point, there is little difference between the AFO and the WPS accounts. The AFO model, however, makes the extra assumption that "causal judgment will never be high if there are few or no Cell A instances, because, in that case, the cause clearly does not account for many occurrences of the outcome" (White, 2009, p. 502). In a series of experiments, White (2009) showed that, holding all other factors constant (including the relative proportions between cell frequencies), manipulating a-cell frequency in the low-to-intermediate range significantly influenced judgments. So, according to the AFO model, the scale on which the judgment is expressed is determined in accordance with the availability of cause-present information. In other words, if P(C) is low, judgments will covary with a-cell frequency on a narrower scale than if it is high. Although this account seems ad hoc, at the present moment there is no alternative explanation for this counter-normative effect.

General Assessment of the Models

Summarizing the last two sections, the power PC theory seems to have problems with the exact differential cell weighting normally found with standard causal judgment tasks, although such problems can be overcome in two ways. The first is to present materials for assessment in such a way that the relevant pieces of information (conditional probabilities) are made transparent, and the question probe is unambiguous about what exactly the aim of the judgment is. The better fit of p to judgments collected in this way provides practical clues about how materials should be designed to avoid potentially harmful biases. The second way, to be discussed in the next section, is to incorporate p into a Bayesian approach, so that its combination with reasonable priors about the world yields values more in accordance with the results for standard causal judgment reported in the literature.

Heuristics for causal judgment (the dual-factor heuristic and the weighted version of ∆D) yield better fits for standard judgments in the majority of scenarios used for experimentation. The dual-factor heuristic (H) is parameter-free, and thus more parsimonious, but w∆D can incorporate individual differences in the use of cell frequencies (and also a large part of presentation format and question probe effects). The major weakness of the latter arises from the fact that weights are purely descriptive, and thus work as free parameters, unless they are related to certain computational aims. That is what the WPS and the AFO accounts try to do.

Consequently, none of the models seems to be able to provide a full account of the available data. The close fit of p to judgments in the specific cases in which information is provided in a certain way, and judgments are collected using a question probe prompting a specific computational goal, seems to indicate that naïve reasoners are capable of computing power. However, in the remaining scenarios, judgments seem to depart from p and to align with the predictions of relatively simple heuristics. This does not seem to be due to misperception or poor recall of the necessary frequency or conditional probability information (judgments depart from p even in cases in which frequencies and probabilities are accurately estimated). Nor can judgments be accounted for by the possibility that reasoners conflate power and confidence. Thus, the possibility exists that reasoners actually use different rules for different purposes in different scenarios. The degree to which different pieces of information are more or less salient or easily computable, and the task demands implicit both in the information provided and in the question probe, become crucial to understanding how people make causal judgments.

Discussion: A Learning-Driven Cues-to-Causality Approach

Throughout this chapter, we have compiled most of the available evidence on how the information in the contingency table is mapped onto causal judgments. Beyond the details, this evidence reflects two basic overarching facts. First, when making a causal judgment, naïve reasoners try to meet what they interpret as their main task goals. In most cases, these goals are half idiosyncratic, half induced. Here, induction mostly operates via the question probe, so that the task goals can vary greatly depending on the question asked. However (and here is the second general fact), task goals are also implicit in the sort of information provided. Put simply, people try to provide what they are asked for, using what they are given. Unless we are vigilant, we will not refrain from making a judgment even after receiving information from only one or two cells, without showing any signs of lack of confidence. Nonetheless, investigating causal judgments is not a futile endeavor. In causal judgment tasks, people behave in quite a consistent manner, as reflected by the catalog of facts reviewed in the section "A Basic Collection of Facts" earlier in this chapter, and that consistency relates to non-trivial adaptive goals. (p. 45)

In the best-known scenario (standard causal judgment tasks), the dual-factor, the WPS, and the AFO accounts predict judgments to merge the predictive value of the cause and the diagnostic value of the effect. Other analyses emphasize the fact that causes are judged to be strong if they are seen as sufficient and necessary to produce the effect. What frequently remains unnoticed is the fact that, when the sufficiency/necessity goals of causal judgment are combined with the rarity assumption, the sufficiency/necessity and the predictive/diagnostic approaches become virtually indistinguishable. Without extra assumptions, both analyses predict the wa > wb = wc > wd inequality (for generative causes). In the dual-factor heuristic, the different impacts of cell frequency manipulation emerge from the geometric mean of P(E|C) and P(C|E) (such impacts are derivable from the model), whereas in the AFO and WPS accounts trial-type frequencies are combined linearly and their weights depend on whether each single trial type contributes to two, one, or none of the two tests the judgment is assumed to serve (the impacts are not derived from the rule, but incorporated into it).

Therefore, the rarity assumption is crucial to making the approaches compatible. But how reasonable is it? Actually, evidence from different areas of human reasoning converges in revealing that people assume that both causes and effects are normally absent (Klayman & Ha, 1987; McKenzie, Ferreira, Mikkelsen, McDermott, & Skrable, 2001). Only rare facts seem to call for an explanation and, in order to look for causes of such facts, it does not seem to make much intuitive sense to search for those causes among things that are almost constantly present. Indeed, some results seem to show that covariation is subjectively important for inferring causality mostly when the event to be explained is relatively rare, whereas, in other cases, causal attributions seem to be more influenced by information on causal mechanisms (Johnson, Boyd, & Magnani, 1994). Still, that does not mean that the rarity assumption is necessarily rational or normative. Some events are generated by very prevalent causes (e.g., lung cancer and pollution), and assuming rarity in these scenarios would lead to biased judgments.

Normative models (e.g., causal power) can incorporate people's pre-assumptions about the world in the form of priors for Bayesian reasoning. Lu, Yuille, Liljeholm, Cheng, and Holyoak (2008; see also Lu, Yuille, Liljeholm, Cheng, & Holyoak, 2006, and Cheng & Lu, Chapter 5 in this volume), for example, proposed that naïve reasoners expect that, if the candidate cause is effective, the power of the candidate cause will be high, and the power of alternative causes low. And vice versa: if the candidate cause is ineffective, the power of the candidate cause is expected to be low, and the power of alternative causes high. Once these priors are combined with the new evidence contained in the contingency table, the resulting value of p for the candidate cause departs from its raw value (computed from Equation 4; a similar argument holds for preventive causes). McKenzie and Mikkelsen (2007) have adopted a similar approach to demonstrate that "joint presence is normatively more informative than joint absence if the presence of variables is rarer than their absence" (p. 33); that is, they show that, under the rarity assumption, cell weighting is actually a way to meet a rational goal. Interestingly, that idea had been previously advanced by Anderson (1990) and Anderson and Sheu (1995).

Normative models need priors of this sort to accommodate the evidence available on causal and confidence judgments. The power-based model with strong and sparse (SS) priors proposed by Lu et al. (2008) yielded model-fitting values quite similar to the ones obtained by Perales and Shanks (2007) for the same set of data with w∆D (and accounted for more than 97% of the variance in mean strength judgments). However, the SS model is parameter-free, whereas w∆D has at least three degrees of freedom. What the normative model needs (instead of free parameters) is a subset of priors from a larger set of equally reasonable ones.

In summary, rational or quasi-rational approaches equipped with certain prior assumptions, and heuristic combinatory rules equipped with certain weights, end up making very similar predictions.
This coincidence is tightly related to Anderson and Sheu's (1995) idea that "a rather sophisticated Bayesian inference can be achieved by (p. 46) a very simple response rule" (p. 513). In normative models, priors combine with the information in the environment. The result of such a combination is a pattern of judgments in which cell weights are much in accordance with the customarily observed ones. In heuristic models, cell weights result from a different, although related, mechanism: reasoners have a prototype of what a true cause is, and judge candidate causes based on the degree to which they behave in accordance with the prototype (the judgment's computational aim is assessing the presence of the features of the prototype in the candidate).

In the remaining part of this section, we propose the learning-driven cues-to-causality approach as a way to reconcile rational and non-rational rule-based accounts of causal judgment. This approach holds three principles: (1) contingency is only one of the many clues that inform people about causality; (2) when contingency information is available, people use this information to meet both adaptive and rational goals; and (3) the weights of the different pieces of available contingency-related information are learned, that is, they arise from feedback received from past experiences in similar situations. Next, we will try to elucidate how many clues to causality there are (and what happens when they are contradictory), and then we will focus on the second and third principles, to speculate on how learning mechanisms could help judgment rules match the rationality/adaptation demands imposed by the environment.

Partially compatible versions of the cues-to-causality approach have been proposed by Einhorn and Hogarth (1982), Sloman and Lagnado (2004), Perales and Catena (2006), Lagnado, Waldmann, Hagmayer, and Sloman (2007), and White (2014). Perales and Catena mentioned contingency from observation, contingency from intervention, control of extraneous factors, time, order, and pre-stored knowledge as sources of both causal strength and causal plausibility. Presence or lack of control determines plausibility. So do time and order, in such a way that X cannot cause Y unless X consistently occurs before Y, and X is sufficiently contiguous to Y as to cause it. Contingency and prior knowledge jointly determine both strength and plausibility.

With regard to our purposes, the most theoretically relevant case is when two clues collide, and thus the operation of the integration mechanism is unveiled. Let us imagine we receive confirmatory contingency evidence about a causal link previously regarded as implausible (e.g., plants bloom more frequently in red pots than in blue pots). Catena et al. (1998) reformulated the anchoring-and-adjustment heuristic previously proposed by Hogarth and Einhorn (1992) to model this situation as follows:

Jn = Jn-1 + β (NewEvidence − Jn-1) (8)

where Jn is the integrative judgment, Jn-1 is the prior judgment on the causal link under scrutiny, and NewEvidence is the causal strength information portrayed by the available contingency information (w∆D in the original model); that is, the new evidence and the old judgment are linearly integrated. However, the key parameter of the model is β, the one that quantifies the relative plausibility for a causal interpretation of the two conflicting clues (in the present case, pre-stored knowledge and new contingency information). Some cues can even switch β off, rendering the new contingency information irrelevant; that would occur, for example, if the candidate cause is known to occur after the effect, which violates the temporal arrow of causality, or if the source of new evidence is considered untrustworthy. Conversely, some factors can boost β. For example, holding all other factors constant, contingency arising from intervention is attributed more causal value than contingency arising from mere observation (Lagnado & Sloman, 2004).

Fugelsang and Thompson (2003) reported a series of results on mechanism-covariation integration showing that β tends to favor the prior belief over new evidence when such prior belief is based on pre-stored knowledge about a causal transmission mechanism, making the link plausible or implausible. Relatedly, Müller et al. (2011) showed that, in some contexts (i.e., a medical diagnosis task vs. an economic prediction task), causal beliefs are particularly resistant to change. And other reports show that β can depend on the adequacy of the methods used to obtain the new covariational information, or the reliability of the information source (Perales et al., 2007). Importantly, belief-updating effects seem to be well fitted by this simple combinatory heuristic, in which several sources of differential plausibility are channeled through a single β parameter.

An alternative account of belief updating is provided by Bayesian models. In these models, beliefs take the form of priors so that

P(H1 | D) / P(H0 | D) = [P(H1) / P(H0)] × [P(D | H1) / P(D | H0)] (9)

where the integrative judgment is a function of the relative likelihood of the existence of the causal link (expressed here as H1), P(H1 | D), relative to its inexistence, (p. 47) P(H0 | D), given the available contingency data D. This, in turn, depends on the belief in the relative likelihood of the existence of the link, P(H1), relative to its inexistence, P(H0), prior to any new evidence, and on the relative likelihood of the available data given the hypothetical existence or inexistence of a causal link, P(D | H1) and P(D | H0), respectively. In other words, the integrative judgment depends on the integration of two factors, one reflecting the prior belief and the other depending on the observable contingency information (both in the log space).

Bayesian models have some strengths, but also face some difficulties. An advantage of Bayesian models over the anchoring-and-adjustment heuristic is that priors not only refer to the relative likelihood of the existence of the causal link, but can also reflect preconceptions of the world (Lu et al., 2006, 2008), which could alter how new contingency information is translated into causal evidence. As noted earlier, this possibility is supported by data, and has not yet been incorporated into integrative heuristics.

On the side of limitations, the computation of P(D | H1) and P(D | H0) is computationally complex, and is likely to be constrained by cognitive boundaries (see Hattori & Oaksford, 2007, p. 800, for a discussion of the matter). Second, in principle, simple Bayesian models do not incorporate a straightforward way to represent the reliability of the source of the new statistical evidence. Third, convergent evidence supports a heuristic-based approach in related causal induction tasks (Mayrhofer & Waldmann, 2011; Meder, Gerstenberg, Hagmayer, & Waldmann, 2010). Please note, however, that Bayesians usually view their accounts as computational theories. According to that view (which is actually very similar to the one we are advocating here), heuristics may approximate the rational rule, and therefore be rational. And, fourth, although it is true that prior beliefs can influence how new evidence is regarded and selected, people do not seem to do so exactly in the way predicted by Bayesian models (Goedert et al., 2014).

So, the key issue for the cues-to-causality approach continues to be to identify where the parameter values come from. In our framework, the clues relevant for a causal interpretation of contingency information—at least in the family of tasks covered by the present chapter—are cell frequencies (or conditional probabilities, if that is the most salient format). The weights customarily attributed to each cell make some rational sense in a narrow context, in which people are assessing predictivity/diagnosticity, or sufficiency/necessity, of relatively rare causes to generate relatively rare effects.

Even though that context is not necessarily representative of the world as it is, people seem to hold the rarity assumption across contexts. Imagine yourself suffering from occasional headaches (let us say, one day a month). It could be the case that a frequent cause (e.g., pollution, if you live in a large city) causes the headaches with a very low power (e.g., p = 1/30 = .033). Still, you would probably check for many other (rare) causes before directing your attention to pollution. That implies that real-life feedback on causality judgments tends to be "wicked" to some degree (Hogarth, 2001): we obtain most feedback from situations in which we make judgments and predictions, and we make these almost solely when we check for infrequent candidate causes of unexpected events. That opens the possibility for cell weights to be tuned by experience as if rarity were a true and general feature of our world. As proposed by lens models (e.g., Hastie & Dawes, 2009), the weights people attribute to cues informing about a certain criterion (i.e., judgment accuracy or decision correctness) tend to progressively match their objective validities, that is, the degree to which they predict the criterion in the context in which learning occurs. The parsimony principle, shared with many other judgment and decision models, leads us to assume that cues are linearly integrated, and that the linear integration model built by repeated experience is also generalized to less common scenarios, where systematic biases are thus likely to occur.
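To see the two updating accounts side by side, here is a compact sketch (ours; the β value and the data are made up) of the anchoring-and-adjustment rule in Equation 8 and the odds form of Bayes' rule in Equation 9:

```python
from math import log

def anchor_and_adjust(prior_judgment, new_evidence, beta):
    """Equation 8: the old judgment moves toward the new evidence by a
    fraction beta; beta = 0 ignores the evidence, beta = 1 replaces the anchor."""
    return prior_judgment + beta * (new_evidence - prior_judgment)

def log_posterior_odds(prior_h1, likelihood_d_h1, likelihood_d_h0):
    """Equation 9 in the log space: log posterior odds of the causal link
    = log prior odds + log likelihood ratio of the data."""
    prior_h0 = 1.0 - prior_h1
    return log(prior_h1 / prior_h0) + log(likelihood_d_h1 / likelihood_d_h0)

# An implausible link (the red pots) meets confirmatory contingency evidence.
j = -50.0                    # prior judgment on a -100..100 scale
for _ in range(3):           # three successive blocks of evidence
    j = anchor_and_adjust(j, new_evidence=70.0, beta=0.3)
    print(round(j, 1))       # -14.0, 11.2, 28.8: drifts toward the evidence

print(log_posterior_odds(0.1, 0.8, 0.4))  # prior skepticism partly offset by data
```

In both sketches the same qualitative behavior emerges: a skeptical prior damps, but does not cancel, the impact of confirmatory covariational evidence.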



Concluding Remarks: The Pursuit of De-biasing Tools

Early learning experiences seem to have a lasting impact on how we incorporate contingency information into causal judgment (Piaget, 1954; White, 2014). Such learning experiences seem to crystallize as priors, or previous beliefs, about the rarity of causes and effects, and about the necessity and sufficiency of causes. Independently of the specific theoretical perspective one adopts, such preconceptions of the world seem to be responsible for the fact that some pieces of covariational evidence are attributed more value than others.

However, differential weighting has normally been considered a biased strategy (see McKenzie & Mikkelsen, 2007, for a discussion of the matter). Indeed, in the artificial scenarios designed for experimental research, the adaptive value of such a strategy is, more often than not, difficult to see. Importantly, in technological and information societies, we are increasingly exposed to scenarios in which the way information is portrayed is likely to elicit biased or erroneous judgments. (p. 48)

Still, learning could reasonably operate in the opposite direction. In other words, prior assumptions could be overwritten by sufficient evidence to the contrary, gathered in scenarios in which such presumptions are no longer valid. In addition, adaptive causal judgments are likely to be boosted by adequate design of materials: not only providing all the relevant information, but also displaying it in a way that incidentally elicits its perception and consideration by the untrained reasoner, and that makes potential contradictions between consecutive or simultaneous judgments and decisions easier to detect.

Only the development of a systematic research program, including tasks allowing for corrective feedback on judgments across diverse scenarios, could provide convincing support for this hypothesis and, simultaneously, help to design tools for de-biasing. Promisingly, and in accordance with the role of learning and generalization in setting the parameters of causal induction rules, sufficient direct experience with non-effective causes has been shown to increase the weight of disconfirmatory pieces of evidence in a subsequent task, with different candidate causes and effects, in scenarios in which naïve reasoners are very prone to fall victim to biases (i.e., medical diagnosis, Müller et al., 2011; development of beliefs in the effectiveness of pseudo-cures, Barbería, Blanco, Cubillas, & Matute, 2013).

References

Ahn, W. K., Kalish, C. W., Medin, D. L., & Gelman, S. A. (1995). The role of covariation versus mechanism information in causal attribution. Cognition, 54(3), 299–352.
Allan, L. G. (1980). A note on measurement of contingency between two binary variables in judgement tasks. Bulletin of the Psychonomic Society, 15, 147–149.


Allan, L. G., & Jenkins, H. M. (1983). The effect of representations of binary variables on judgment of influence. Learning and Motivation, 14(4), 381–405.
Alloy, L. B., & Abramson, L. Y. (1979). Judgment of contingency in depressed and nondepressed students: Sadder but wiser? Journal of Experimental Psychology: General, 108(4), 441–485.
Alloy, L. B., & Tabachnik, N. (1984). Assessment of covariation by humans and animals: The joint influence of prior expectations and current situational information. Psychological Review, 91(1), 112–149.
Anderson, J. R. (1990). The adaptive character of thought. Hillsdale, NJ: Lawrence Erlbaum Associates.
Anderson, J. R., & Sheu, C. F. (1995). Causal inferences as perceptual judgments. Memory & Cognition, 23(4), 510–524.
Arkowitz, H., & Lilienfeld, S. O. (2011). Deranged and dangerous? Scientific American Mind, 22(3), 64–65.
Arkowitz, H., & Lilienfeld, S. O. (2013). Is divorce bad for children? Scientific American Mind, 24(1), 68–69.
Ashby, F. G., & Maddox, W. T. (1992). Complex decision rules in categorization: Contrasting novice and experienced performance. Journal of Experimental Psychology: Human Perception and Performance, 18(1), 50–71.
Barberia, I., Blanco, F., Cubillas, C. P., & Matute, H. (2013). Implementation and assessment of an intervention to debias adolescents against causal illusions. PLoS ONE, 8(8), e71303.
Blanco, F., Matute, H., & Vadillo, M. A. (2009). Depressive realism: Wiser or quieter? Psychological Record, 59, 551–562.
Blanco, F., Matute, H., & Vadillo, M. A. (2013). Interactive effects of the probability of the cue and the probability of the outcome on the overestimation of null contingency. Learning & Behavior, 41(4), 333–340.
Buehner, M. J., & Cheng, P. W. (1997, August). Causal induction: The power PC theory versus the Rescorla-Wagner model. In Proceedings of the nineteenth annual conference of the Cognitive Science Society (pp. 55–60). Hillsdale, NJ: Lawrence Erlbaum Associates.
Buehner, M. J., Cheng, P. W., & Clifford, D. (2003). From covariation to causation: A test of the assumption of causal power. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29(6), 1119–1140.
Buehner, M. J., & Humphreys, G. R. (2010). Causal contraction: Spatial binding in the perception of collision events. Psychological Science, 21(1), 44–48.

Catena, A., Maldonado, A., & Cándido, A. (1998). The effect of frequency of judgement and the type of trials on covariation learning. Journal of Experimental Psychology: Human Perception and Performance, 24(2), 481–495.
Catena, A., Maldonado, A., Megías, J. L., & Frese, B. (2002). Judgement frequency, belief revision, and serial processing of causal information. The Quarterly Journal of Experimental Psychology: Section B, 55(3), 267–281.
Chapman, G. B., & Robbins, S. J. (1990). Cue interaction in human contingency judgment. Memory & Cognition, 18(5), 537–545.
Chatlosh, D. L., Neunaber, D. J., & Wasserman, E. A. (1985). Response-outcome contingency: Behavioral and judgmental effects of appetitive and aversive outcomes with college students. Learning and Motivation, 16(1), 1–34.
Cheng, P. W. (1997). From covariation to causation: A causal power theory. Psychological Review, 104(2), 367–405.
Collins, D. J., & Shanks, D. R. (2006). Conformity to the power PC theory of causal induction depends on the type of probe question. The Quarterly Journal of Experimental Psychology, 59(2), 225–232.
Crocker, J. (1982). Biased questions in judgment of covariation studies. Personality & Social Psychology Bulletin, 8, 214–220.
Dickinson, A. (2001). The 28th Bartlett Memorial Lecture. Causal learning: An associative analysis. The Quarterly Journal of Experimental Psychology: Section B, 54(1), 3–25.
Einhorn, H. J., & Hogarth, R. M. (1982). Prediction, diagnosis, and causal thinking in forecasting. Journal of Forecasting, 1(1), 23–36.
Einhorn, H. J., & Hogarth, R. M. (1986). Judging probable cause. Psychological Bulletin, 99(1), 3–19.
Evans, J. S. B. (1998). Matching bias in conditional reasoning: Do we understand it after 25 years? Thinking & Reasoning, 4(1), 45–110.
Fernbach, P. M., Darlow, A., & Sloman, S. A. (2011). Asymmetries in predictive and diagnostic reasoning. Journal of Experimental Psychology: General, 140(2), 168–185.
Fiedler, K. (1996). Explaining and simulating judgment biases as an aggregation phenomenon in probabilistic, multiple-cue environments. Psychological Review, 103(1), 193–214.
Fugelsang, J. A., & Thompson, V. A. (2003). A dual-process model of belief and evidence interactions in causal reasoning. Memory & Cognition, 31(5), 800–815.
Garcia-Retamero, R., Müller, S. M., Catena, A., & Maldonado, A. (2009). The power of causal beliefs and conflicting evidence on causal judgments and decision making. Learning and Motivation, 40(3), 284–297.

Gigerenzer, G., Todd, P. M., & the ABC Research Group. (1999). Simple heuristics that make us smart. Oxford: Oxford University Press.
Goedert, K. M., Ellefson, M. R., & Rehder, B. (2014). Differences in the weighting and choice of evidence for plausible versus implausible causes. Journal of Experimental Psychology: Learning, Memory, and Cognition, 40(3), 683–702.
Griffiths, T. L., & Tenenbaum, J. B. (2005). Structure and strength in causal induction. Cognitive Psychology, 51(4), 334–384.
Griffiths, T. L., & Tenenbaum, J. B. (2009). Theory-based causal induction. Psychological Review, 116(4), 661–716.
Harris, A. J., & Osman, M. (2012). The illusion of control: A Bayesian perspective. Synthese, 189(1), 29–38.
Hastie, R., & Dawes, R. M. (2009). Rational choice in an uncertain world: The psychology of judgment and decision making. London: Sage Publications.
Hattori, M., & Oaksford, M. (2007). Adaptive non-interventional heuristics for covariation detection in causal induction: Model comparison and rational analysis. Cognitive Science, 31(5), 765–814.
Hogarth, R. M. (2001). Educating intuition. Chicago: University of Chicago Press.
Hogarth, R. M., & Einhorn, H. J. (1992). Order effects in belief updating: The belief-adjustment model. Cognitive Psychology, 24(1), 1–55.
Holyoak, K. J., & Cheng, P. W. (2011). Causal learning and inference as a rational process: The new synthesis. Annual Review of Psychology, 62, 135–163.
Inhelder, B., & Piaget, J. (1958). The growth of logical thinking from childhood to adolescence. London: Routledge & Kegan Paul.
Jenkins, H. M., & Ward, W. C. (1965). Judgment of contingency between responses and outcomes. Psychological Monographs: General and Applied, 79(1, Whole No. 594), 1–17.
Johnson, J. T., Boyd, K. R., & Magnani, P. S. (1994). Causal reasoning in the attribution of rare and common events. Journal of Personality and Social Psychology, 66(2), 229.
Kahneman, D., & Klein, G. (2009). Conditions for intuitive expertise: A failure to disagree. American Psychologist, 64(6), 515–526.
Kao, S. F., & Wasserman, E. A. (1993). Assessment of an information integration account of contingency judgment with examination of subjective cell importance and method of information presentation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19(6), 1363–1386.


Karazinov, D. M., & Boakes, R. A. (2007). Second-order conditioning in human predictive judgements when there is little time to think. The Quarterly Journal of Experimental Psychology, 60(3), 448–460.
Klayman, J., & Ha, Y. W. (1987). Confirmation, disconfirmation, and information in hypothesis testing. Psychological Review, 94(2), 211–228.
Klayman, J., & Ha, Y. W. (1989). Hypothesis testing in rule discovery: Strategy, structure, and content. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15(4), 596.
Kushnir, T., & Gopnik, A. (2007). Conditional probability versus spatial contiguity in causal learning: Preschoolers use new contingency evidence to overcome prior spatial assumptions. Developmental Psychology, 43(1), 186–196.
Lagnado, D. A., & Sloman, S. (2004). The advantage of timely intervention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30(4), 856–876.
Lagnado, D. A., Waldmann, M. R., Hagmayer, Y., & Sloman, S. A. (2007). Beyond covariation: Cues to causal structure. In A. Gopnik & L. Schulz (Eds.), Causal learning: Psychology, philosophy, and computation (pp. 86–100). Oxford: Oxford University Press.
Levin, I. P., Wasserman, E. A., & Kao, S. F. (1993). Multiple methods for examining biased information use in contingency judgments. Organizational Behavior and Human Decision Processes, 55(2), 228–250.
Liljeholm, M., & Cheng, P. W. (2009). The influence of virtual sample size on confidence and causal-strength judgments. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35(1), 157–172.
Lu, H., Yuille, A., Liljeholm, M., Cheng, P. W., & Holyoak, K. J. (2006). Modeling causal learning using Bayesian generic priors on generative and preventive powers. In R. Sun & N. Miyake (Eds.), Proceedings of the 28th annual conference of the Cognitive Science Society. Mahwah, NJ: Lawrence Erlbaum Associates.
Lu, H., Yuille, A. L., Liljeholm, M., Cheng, P. W., & Holyoak, K. J. (2008). Bayesian generic priors for causal learning. Psychological Review, 115(4), 955.
McKenzie, C. R., Ferreira, V. S., Mikkelsen, L. A., McDermott, K. J., & Skrable, R. P. (2001). Do conditional hypotheses target rare events? Organizational Behavior and Human Decision Processes, 85(2), 291–309.
Mackie, J. L. (1965). Causes and conditions. American Philosophical Quarterly, 2(4), 245–264.
Maldonado, A., Catena, A., Cándido, A., & García, I. (1999). The belief revision model: Asymmetrical effects of noncontingency on human covariation learning. Animal Learning & Behavior, 27(2), 168–180.

Maldonado, A., Jiménez, G., Herrera, A., Perales, J. C., & Catena, A. (2006). Inattentional blindness for negative relationships in human causal learning. The Quarterly Journal of Experimental Psychology, 59(3), 457–470.
Mandel, D. R., & Lehman, D. R. (1998). Integration of contingency information in judgments of cause, covariation, and probability. Journal of Experimental Psychology: General, 127(3), 269–285.
Mandel, D. R., & Vartanian, O. (2009). Weighting of contingency information in causal judgement: Evidence of hypothesis dependence and use of a positive-test strategy. The Quarterly Journal of Experimental Psychology, 62(12), 2388–2408.
Marsh, J. K., & Ahn, W. K. (2006). Order effects in contingency learning: The role of task complexity. Memory & Cognition, 34(3), 568–576.
Matute, H., Yarritu, I., & Vadillo, M. A. (2011). Illusions of causality at the heart of pseudoscience. British Journal of Psychology, 102(3), 392–405.
Mayrhofer, R., & Waldmann, M. R. (2011). Heuristics in covariation-based induction of causal models: Sufficiency and necessity priors. In L. Carlson, C. Hölscher, & T. Shipley (Eds.), Proceedings of the 33rd annual conference of the Cognitive Science Society (pp. 3110–3115). Austin, TX: Cognitive Science Society.
McCormack, T., Simms, V., McGourty, J., & Beckers, T. (2013). Encouraging children to think counterfactually enhances blocking in a causal learning task. The Quarterly Journal of Experimental Psychology, 66(10), 1910–1926.
McKenzie, C. R., & Mikkelsen, L. A. (2007). A Bayesian view of covariation assessment. Cognitive Psychology, 54(1), 33–61.
Meder, B., Gerstenberg, T., Hagmayer, Y., & Waldmann, M. R. (2010). Observing and intervening: Rational and heuristic models of causal decision making. The Open Psychology Journal, 3, 119–135.
Meder, B., Mayrhofer, R., & Waldmann, M. R. (2014). Structure induction in diagnostic causal reasoning. Psychological Review, 121(3), 277–311.
Mitchell, C. J., Lovibond, P. F., & Gan, C. Y. (2005). A dissociation between causal judgment and outcome recall. Psychonomic Bulletin & Review, 12(5), 950–954.
Morewedge, C. K., & Kahneman, D. (2010). Associative processes in intuitive judgment. Trends in Cognitive Sciences, 14(10), 435–440.
Msetfi, R. M., Wade, C., & Murphy, R. A. (2013). Context and time in causal learning: Contingency and mood dependent effects. PLoS ONE, 8(5), e64063.


Müller, S. M., Garcia-Retamero, R., Cokely, E., & Maldonado, A. (2011). Causal beliefs and empirical evidence: Decision-making processes in two-alternative forced-choice tasks. Experimental Psychology, 58(4), 324.
Müller, S. M., Garcia-Retamero, R., Galesic, M., & Maldonado, A. (2013). The impact of domain-specific beliefs on decisions and causal judgments. Acta Psychologica, 144(3), 472–480.
Novick, L. R., & Cheng, P. W. (2004). Assessing interactive causal influence. Psychological Review, 111(2), 455–485.
Orgaz, C., Estévez, A., & Matute, H. (2013). Pathological gamblers are more vulnerable to the illusion of control in a standard associative learning task. Frontiers in Psychology, 4, Article 306. http://journal.frontiersin.org/article/10.3389/fpsyg.2013.00306/full
Osman, M., & Shanks, D. R. (2005). Individual differences in causal learning and decision making. Acta Psychologica, 120(1), 93–112.
Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge, MA: MIT Press.
Perales, J., & Catena, A. (2006). Human causal induction: A glimpse at the whole picture. European Journal of Cognitive Psychology, 18(2), 277–320.
Perales, J. C., Catena, A., & Maldonado, A. (2004). Inferring non-observed correlations from causal scenarios: The role of causal knowledge. Learning and Motivation, 35(2), 115–135.
Perales, J. C., Catena, A., Maldonado, A., & Cándido, A. (2007). The role of mechanism and covariation information in causal belief updating. Cognition, 105(3), 704–714.
Perales, J. C., Catena, A., Shanks, D. R., & González, J. A. (2005). Dissociation between judgments and outcome-expectancy measures in covariation learning: A signal detection theory approach. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31(5), 1105.
Perales, J. C., Maldonado, A., Candido, A., Contreras, D., & Catena, A. (2008, June). How are causal powers combined? International Journal of Psychology, 43(3–4), 510–510.
Perales, J. C., & Shanks, D. R. (2007). Models of covariation-based causal judgment: A review and synthesis. Psychonomic Bulletin & Review, 14(4), 577–596.
Perales, J. C., & Shanks, D. R. (2008). Driven by power? Probe question and presentation format effects on causal judgment. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34(6), 1482–1494.
Piaget, J. (1954). The construction of reality in the child. New York: Basic Books.


Pineño, O., & Miller, R. R. (2007). Comparing associative, statistical, and inferential reasoning accounts of human contingency learning. The Quarterly Journal of Experimental Psychology, 60(3), 310–329.
Rebane, G., & Pearl, J. (2013). The recovery of causal poly-trees from statistical data. arXiv:1304.2736. https://arxiv.org/ftp/arxiv/papers/1304/1304.2736.pdf
Rieskamp, J., & Otto, P. E. (2006). SSL: A theory of how people learn to select strategies. Journal of Experimental Psychology: General, 135(2), 207–236.
Scandura, J. M. (1972). What is a rule? Journal of Educational Psychology, 63(3), 179–185.
Sedlmeier, P. (2002). Associative learning and frequency judgments: The PASS model. In P. E. Sedlmeier & T. E. Betsch (Eds.), ETC: Frequency processing and cognition (pp. 137–152). Oxford: Oxford University Press.
Sedlmeier, P., & Gigerenzer, G. (2001). Teaching Bayesian reasoning in less than two hours. Journal of Experimental Psychology: General, 130, 380–400.
Shaklee, H., & Tucker, D. (1980). A rule analysis of judgments of covariation between events. Memory & Cognition, 8(5), 459–467.
Shanks, D. R. (2002). Tests of the power PC theory of causal induction with negative contingencies. Experimental Psychology, 49(2), 81–88.
Sloman, S., & Lagnado, D. A. (2004). Causal invariance in reasoning and learning. Psychology of Learning and Motivation, 44, 287–326.

Slovic, P., Monahan, J., & MacGregor, D. G. (2000). Violence risk assessment and risk communication: The effects of using actual cases, providing instruction, and employing probability versus frequency formats. Law and Human Behavior, 24(3), 271–296.
Smedslund, J. (1963). The concept of correlation in adults. Scandinavian Journal of Psychology, 4(3), 165–173.
Vadillo, M. A., & Matute, H. (2007). Predictions and causal estimations are not supported by the same associative structure. The Quarterly Journal of Experimental Psychology, 60(3), 433–447.
Vadillo, M. A., Musca, S. C., Blanco, F., & Matute, H. (2011). Contrasting cue-density effects in causal and prediction judgments. Psychonomic Bulletin & Review, 18(1), 110–115.
Vadillo, M. A., Ortega-Castro, N., Barberia, I., & Baker, A. G. (2014). Two heads are better than one, but how much? Evidence that people’s use of causal integration rules does not always conform to normative standards. Experimental Psychology, 61, 356–367.
Vallée-Tourangeau, F., Murphy, R. A., Drew, S., & Baker, A. G. (1998). Judging the importance of constant and variable candidate causes: A test of the power PC theory. The Quarterly Journal of Experimental Psychology: Section A, 51(1), 65–84.

Vallée-Tourangeau, F., Payton, T., & Murphy, R. A. (2008). The impact of presentation format on causal inferences. European Journal of Cognitive Psychology, 20(1), 177–194.
Van Hamme, L. J., & Wasserman, E. A. (1994). Cue competition in causality judgments: The role of nonpresentation of compound stimulus elements. Learning and Motivation, 25(2), 127–151.
Wakefield, A. J., Murch, S. H., Anthony, A., Linnell, J., Casson, D. M., Malik, M., et al. (1998). RETRACTED: Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children. The Lancet, 351(9103), 637–641.
Waldmann, M. R. (2012). Predictive versus diagnostic causal learning. In N. Seel (Ed.), Encyclopedia of the sciences of learning (pp. 2665–2667). New York: Springer.
Ward, W. C., & Jenkins, H. M. (1965). The display of information and the judgment of contingency. Canadian Journal of Psychology/Revue Canadienne de Psychologie, 19(3), 231–241.
Wasserman, E. A., Dorner, W. W., & Kao, S. F. (1990). Contributions of specific cell information to judgments of interevent contingency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16(3), 509–521.
Wasserman, E. A., Kao, S. F., Van Hamme, L. J., Katagiri, M., & Young, M. E. (1996). Causation and association. Psychology of Learning and Motivation, 34, 207–264.
Wasserman, E. A., & Shaklee, H. (1984). Judging response-outcome relations: The role of response-outcome contingency, outcome probability, and method of information presentation. Memory & Cognition, 12(3), 270–286.
White, P. A. (1998). Causal judgement: Use of different types of contingency information as confirmatory and disconfirmatory. European Journal of Cognitive Psychology, 10(2), 131–170.
White, P. A. (2000). Causal judgment from contingency information: Relation between subjective reports and individual tendencies in judgment. Memory & Cognition, 28(3), 415–426.
White, P. A. (2003a). Effects of wording and stimulus format on the use of contingency information in causal judgment. Memory & Cognition, 31(2), 231–242.
White, P. A. (2003b). Making causal judgments from the proportion of confirming instances: The pCI rule. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29(4), 710–727.
White, P. A. (2004). Causal judgment from contingency information: A systematic test of the pCI rule. Memory & Cognition, 32(3), 353–368.


White, P. A. (2005). The power PC theory and causal powers: Comment on Cheng (1997) and Novick and Cheng (2004). Psychological Review, 112(3), 675–684.
White, P. A. (2008). Accounting for occurrences: A new view of the use of contingency information in causal judgment. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34(1), 204–218.
White, P. A. (2009). Accounting for occurrences: An explanation for some novel tendencies in causal judgment from contingency information. Memory & Cognition, 37(4), 500–513.
White, P. A. (2011). Causal judgements about two causal candidates: Accounting for occurrences, estimating strength, and the importance of interaction judgements. Journal of Cognitive Psychology, 23(4), 485–506.
White, P. A. (2012). The experience of force: The role of haptic experience of forces in visual perception of object motion and interactions, mental simulation, and motion-related judgments. Psychological Bulletin, 138(4), 589–615.
White, P. A. (2014). Singular clues to causality and their use in human causal judgment. Cognitive Science, 38(1), 38–75.
Wu, M., & Cheng, P. W. (1999). Why causation need not follow from statistical association: Boundary conditions for the evaluation of generative and preventive causal powers. Psychological Science, 10(2), 92–97.

Notes:

(1.) Independently of their origin, cell weights and density biases have important practical consequences. For example, in causal judgment tasks in which participants are asked to judge the causal efficacy of their own behavior (e.g., the degree to which pressing a button makes an electronic device light up), people often claim to have some control over an outcome that is completely independent of their response (Blanco, Matute, & Vadillo, 2009, 2013; Harris & Osman, 2012). This illusion of control has played a key role in accounts of depression (e.g., Alloy & Abramson, 1979) and problem gambling (e.g., Orgaz, Estévez, & Matute, 2013).

(2.) White refers to the generative strength of the candidate cause as “power.” However, this term here has little to do with its meaning in the power PC theory. In the AFO model, power is supported if the effect follows the cause, whereas it remains unsupported if the cause appears alone. In mathematical terms, it mostly coincides with the a-b heuristic rule included in Table 3.2.

José C. Perales
Mind, Brain, and Behavior Research Center, Experimental Psychology Department, University of Granada, Granada, Spain

Andrés Catena
Mind, Brain, and Behavior Research Center, Experimental Psychology Department, University of Granada, Granada, Spain

Antonio Cándido
Mind, Brain, and Behavior Research Center, Experimental Psychology Department, University of Granada, Granada, Spain

Antonio Maldonado
Mind, Brain, and Behavior Research Center, Experimental Psychology Department, University of Granada, Granada, Spain


The Inferential Reasoning Theory of Causal Learning: Toward a Multi-Process Propositional Account

The Inferential Reasoning Theory of Causal Learning: Toward a Multi-Process Propositional Account
Yannick Boddez, Jan De Houwer, and Tom Beckers

The Oxford Handbook of Causal Reasoning
Edited by Michael R. Waldmann

Print Publication Date: Jun 2017
Subject: Psychology, Cognitive Psychology
Online Publication Date: May 2017
DOI: 10.1093/oxfordhb/9780199399550.013.7

Abstract and Keywords

Chapter 4 describes the inferential reasoning theory of causal learning and discusses how thinking about this theory has evolved in at least two important ways. First, the authors argue that it is useful to decouple the debate about different possible types of mental representations involved in causal learning (e.g., propositional or associative) from the debate about processes involved therein (e.g., inferential reasoning or attention). Second, at the process level, inferential reasoning is embedded within a broad array of mental processes that are all required to provide a full mechanistic account of causal learning. Based on those insights, the authors evaluate five arguments that are often raised against inferential reasoning theory. They conclude that causal learning is best understood as involving the formation and retrieval of propositional representations, both of which depend on multiple cognitive processes (i.e., the multi-process propositional account).

Keywords: causal learning, inferential reasoning, association, proposition

The Associative Theory of Causal Learning

The question of how we learn that one event causes the occurrence of another event has intrigued philosophers and psychologists since time immemorial. About three decades ago, Dickinson, Shanks, and Evenden (1984) proposed a challenging answer: they suggested that human causal learning can be explained by association formation between the representations of cues and outcomes (i.e., causes and effects). In short, this theoretical proposal holds that repeated pairing of a cue and an outcome results in the formation of an association between cue and outcome. Such an association is typically conceived as an unqualified link that transmits activation from one representation to another, very much analogous to the way a strip of copper wire conducts electricity. Once it has been formed, presentation of the cue will result in the activation of its mental representation, and this will in turn produce an increase in the activation of the representation of the outcome. Interestingly, the hypothesis of Dickinson et al. (1984) implies that the established principles of associative learning theory, initially developed to account for animal conditioning, can be brought to bear on the issue of human causal learning (Le Pelley, Griffiths, & Beesley, Chapter 2 in this volume). This means that phenomena that are predicted by associative learning theory should also be observed in human causal learning. Dickinson and colleagues (1984) chose the blocking effect to put this suggestion to the test. This choice had some symbolic value: arguably, blocking was (and still is) the most important phenomenon in the history of associative learning theory, because it inspired a whole generation of influential learning models (e.g., Mackintosh, 1975; Pearce & Hall, 1980; Rescorla & Wagner, 1972; Wagner, 1981).

Kamin (1967) pioneered the blocking procedure by presenting rats with pairings of a white noise stimulus with shock, followed by pairings of a compound of the white noise stimulus and a new light stimulus with shock. This training procedure is commonly denoted as “A+ then AX+” training (where “A” means that a neutral stimulus, for example a burst of white noise, is presented; “AX” means that two stimuli, for example a burst of white noise and the flashing of a light, are presented together; and “+” means that those stimuli are followed by an outcome, for example a shock). Kamin (1967) made the famous observation that “prior conditioning to an element might block conditioning to a new, superimposed element” (p. 5): the light stimulus X elicited low fear responding when it was presented by itself during the test, despite its being paired with the shock outcome. Importantly, Dickinson et al. (1984) observed blocking in a human causal learning study, mirroring the observations in animal conditioning. Procedurally speaking, a causal learning study is similar to an animal conditioning study in that it involves exposure to trials in which potential causes and effects may or may not co-occur (for a detailed description, see Perales, Catena, Cándido, & Maldonado, Chapter 3 in this volume). An often-used procedure requires participants to imagine that they are allergists who are trying to discover the cause(s) of an allergic reaction. In a trial-by-trial fashion, participants are then presented with records of a fictitious patient that show one or more food items that the patient has supposedly eaten, together with an outcome message indicating whether an allergic reaction occurred following ingestion of those food items. For example, in a blocking preparation, in a first series of trials, eating paprika would be followed by an allergic reaction and, in a next series of trials, eating paprika and coconut together would be followed by that same allergic reaction (i.e., A+, AX+ training). In this example, successful blocking would be reflected in the participant judging that coconut does not produce an allergic reaction (i.e., cue X is not causally related to the outcome).

How do associative theories (e.g., Rescorla & Wagner, 1972) account for this phenomenon? These theories typically rely on an error-prediction mechanism that can be described in terms of expectancy and surprise: learning is conceived as a function of how surprising the occurrence of the outcome is, which is determined by the extent to which the outcome is expected. Accordingly, the associative explanation of blocking holds that the preceding A+ training renders the outcome expected on AX+ trials, and that therefore an associative link between cue X and the outcome cannot form (i.e., is “blocked”), which then (one way or another; see later discussion in this chapter) results in the judgment that X is not causally linked to the outcome. Going back to our example: people are supposed to display blocking because they fail to acquire an association between the occurrence of eating coconut and the occurrence of the allergic reaction.

The observation of blocking in human causal learning prompted a wave of empirical studies, attesting to the popularity of this framework (for reviews, see De Houwer & Beckers, 2002; Shanks, 2010). Successes of associative theories in explaining phenomena of causal learning did not remain limited to blocking but extended to a range of phenomena, further fueling the view that causal learning might be understood in terms of association formation (for a detailed discussion, see Le Pelley et al., Chapter 2 of this volume). However, for people who have taken part in such a causal learning experiment, the associative explanation is rather counterintuitive: if you take part in such an experiment, you might experience reasoning about whether eating coconut causally results in an allergic reaction. It is in this context that we must consider the significance of the inferential learning theory, which we discuss in the following section.
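The error-prediction account of blocking is easy to state computationally. Below is a minimal Python simulation of the Rescorla-Wagner learning rule applied to the A+, AX+ design; the learning-rate and asymptote values are arbitrary illustrations, not fitted parameters:

    # Minimal Rescorla-Wagner simulation of blocking (illustrative parameters).
    # On each reinforced trial, every present cue is updated by a share of the
    # common prediction error: lambda minus the summed strength of present cues.

    ALPHA_BETA = 0.3   # combined learning-rate parameter (arbitrary)
    LAMBDA = 1.0       # asymptotic outcome strength

    V = {"A": 0.0, "X": 0.0}   # associative strengths

    def reinforced_trial(cues_present):
        error = LAMBDA - sum(V[cue] for cue in cues_present)
        for cue in cues_present:
            V[cue] += ALPHA_BETA * error

    for _ in range(20):               # Phase 1: A+ trials
        reinforced_trial(["A"])       # V(A) climbs toward lambda
    for _ in range(20):               # Phase 2: AX+ trials; A already predicts
        reinforced_trial(["A", "X"])  # the outcome, so error is ~0 and V(X)
                                      # barely grows

    print(round(V["A"], 2), round(V["X"], 2))   # -> 1.0 0.0: X is "blocked"

The outcome is no longer surprising on AX+ trials, so virtually no association to X forms; how this low V(X) is then translated into a causal judgment is the "one way or another" issue flagged above.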

The Inferential Theory of Causal Learning

As an alternative to the associative view, inferential reasoning theory holds that causal learning involves inferential reasoning, which can be defined as a slow and effortful process that starts from premises and returns a conclusion. In a blocking procedure, for example, participants supposedly infer that the blocked cue is unlikely to be causally related to the outcome, because the relation between the blocked cue and the outcome disappears if one controls for the relation between the blocking cue and the outcome (De Houwer, Beckers, & Vandorpe, 2005; Waldmann, 2000). More formally, the inferential reasoning process underlying blocking can be represented as a modus tollens argument (Beckers, De Houwer, Pineño, & Miller, 2005):

I. [if p, then q] If A and X are both causes of the outcome, then the outcome should be stronger when these causes are both present than when only one cause is present.

II. [not q] The outcome is not stronger when A and X are both present than when A is presented alone.

III. [therefore, not p] Thus, A and X cannot both be causes of the outcome.

People can infer that cue X is not a cause of the outcome because they experienced that A results in the outcome when presented alone and therefore must be a cause. However, the validity of this modus tollens rule depends on a number of constraints. Hence, if blocking results from applying a modus tollens rule, then the blocking effect should vary as a function of those constraints. Manipulating the validity of the constraints therefore allows us to empirically evaluate the inferential reasoning theory in at least two ways.

First, the conclusion of the modus tollens argument does not follow if premise I does not hold. In that case, blocking should not be observed. To test this prediction, Beckers et al. (2005) presented half of their participants with pretraining that confirmed premise I. They did this by showing that two cues that resulted in a single outcome when presented individually resulted in a double outcome when presented together (e.g., eating cheese results in a moderate allergic reaction, eating mushrooms results in a moderate allergic reaction, and eating cheese and mushrooms together results in a severe allergic reaction). The remaining participants received training that suggested that outcomes would be non-additive: two cues that resulted in a single outcome when presented individually still resulted in a single outcome when presented together (e.g., eating cheese results in a moderate allergic reaction, eating mushrooms results in a moderate allergic reaction, and eating both cheese and mushrooms also results in a moderate allergic reaction). Only participants who received the additive pretraining displayed blocking when subsequently exposed to a blocking contingency involving new food cues. This finding suggests that assumptions about cue additivity (sometimes) control blocking (for related evidence, see Lovibond, Been, Mitchell, Bouton, & Frohardt, 2003; Vandorpe, De Houwer, & Beckers, 2007). The additivity pretraining supposedly urges participants to move from a noisy-OR causal integration rule to a linear-sum causal integration rule. The noisy-OR integration rule assumes that causal influences are independent in producing a binary outcome (which should result in weak blocking), whereas the linear-sum causal integration rule assumes that causal influences are additive in producing a continuous outcome (which should result in strong blocking; Lu, Rojas, Beckers, & Yuille, 2016). A schematic comparison of these two rules follows at the end of this section.

Second, the modus tollens argument is not applicable if outcomes occur to a maximal extent on all trials. If cue A causes the outcome to a maximal extent and if cues A and X together cause the outcome to the same maximal extent, one cannot be sure that X has no additive effect on top of A, because of a ceiling effect. Stated differently, participants cannot verify the veracity of premise II. Therefore, if blocking depends on the modus tollens argument, one should observe weak blocking if outcomes that occur to a maximal extent are used during training. Beckers et al. (2005) tested this prediction by showing all participants outcomes of maximal and of submaximal intensity (i.e., allergic reactions of severe and moderate intensity) before presenting them with blocking training. In one condition, the intensity of the outcome used during blocking training corresponded with the maximal intensity shown during pretraining, whereas in the second condition this intensity corresponded with the submaximal intensity shown previously. As predicted, blocking was much stronger when the outcome occurred with submaximal strength during blocking training than when it occurred with maximal strength.

It is of interest that Beckers, Miller, De Houwer, and Urushihara (2006) later demonstrated that blocking in rats is also modulated by additivity and outcome maximality information, which suggests that inferential reasoning is involved not only in human causal learning but also in animal conditioning. This creates an interesting situation: whereas Dickinson et al. (1984) argued that associative models developed on the basis of animal conditioning research can also account for human causal learning, Beckers et al. (2006) suggested that it might be the other way around: inferential reasoning theory, originally developed to explain human causal learning, might also apply to animal conditioning.
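As promised above, here is a schematic comparison of the two integration rules (our own illustration in Python, with arbitrary causal strengths). Under the linear-sum rule, two effective causes should yield a clearly stronger compound outcome, so observing no increase licenses the modus tollens conclusion; under the noisy-OR rule the expected increase is small, so the same observation is only weak evidence against X:

    # Expected outcome for the AX compound under two integration rules,
    # assuming (counterfactually) that A and X were both effective causes.

    def noisy_or(p_a, p_x):
        """Probability of a binary outcome given two independent causes."""
        return 1.0 - (1.0 - p_a) * (1.0 - p_x)

    def linear_sum(m_a, m_x, ceiling=10.0):
        """Magnitude of a continuous outcome given two additive causes."""
        return min(ceiling, m_a + m_x)

    print(noisy_or(0.9, 0.9))    # 0.99: scarcely above A alone (0.9)
    print(linear_sum(5.0, 5.0))  # 10.0: "severe", clearly above A alone (5.0)

    # On actual AX+ trials the outcome equals what A produces alone. Under the
    # linear-sum rule that contradicts the consequent of premise I (strong
    # blocking); under noisy-OR it barely does (weak blocking).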


Reasoning as an Effortful Cognitive Process

As stated, inferential reasoning theory assumes that causal learning (e.g., the sensitivity of causal learning to blocking) results from inferential reasoning, defined as the slow and effortful production of propositional conclusions on the basis of propositional premises. We now briefly consider the two core elements in our definition of inferential reasoning. In this section, we discuss reasoning as an effortful cognitive process; in the next section, we discuss the propositional nature of the input (premises) and output (conclusions) of this cognitive process.

Inferential reasoning as a cognitive process is presumed to be effortful. Traditional reasoning theories invoke the idea of limited working memory capacity when explaining reasoning performance (e.g., Baddeley & Hitch, 1974; Johnson-Laird & Byrne, 1991; Rips, 1994). In line with this idea, research demonstrates that the number of errors in syllogistic reasoning tasks increases when working memory is overloaded (e.g., De Neys, Schaeken, & d’Ydewalle, 2005; Toms, Morris, & Ward, 1993). Therefore, if the occurrence of blocking in causal learning results from inferential reasoning, one would expect blocking to be reduced under working memory load, which turns out to be the case (De Houwer & Beckers, 2003; Liu & Luhmann, 2013; Waldmann & Walker, 2005). Although this finding suggests that an effortful process is involved in blocking, it does not necessarily imply that inferential reasoning is involved. More direct evidence for the role of inferential reasoning was provided by Vandorpe et al. (2005), who asked participants to explain how they arrived at their causal ratings in a blocking task. Crucially, working memory load modulated the number of participants who were able to verbally report a valid blocking inference (i.e., “X did not add to the effect of A, hence it is not a cause of the outcome”). Also of interest is that the presence of blocking effects in children’s causal learning seems to go hand in hand with the development of working memory and general reasoning abilities (McCormack, Simms, McGourty, & Beckers, 2013a; Simms, McCormack, & Beckers, 2012). Moreover, encouraging children to engage in counterfactual reasoning, which is known to enhance inferential reasoning performance, enhances their propensity to show blocking in a causal learning task (McCormack, Simms, McGourty, & Beckers, 2013b).

Of note, although we argue that the preceding evidence points to a central role of inferential reasoning in causal learning, we by no means claim that inferential reasoning is the only process involved (see the section “Evaluating the Inferential Reasoning Theory of Causal Learning” in this chapter).

Propositions

We now focus on the second element in our definition of inferential reasoning: the propositional nature of the input it operates on (premises) and the output it generates (conclusions). Indeed, in logic, reasoning involves a set of propositions known as the premises along with another proposition known as the conclusion. When describing propositions, we will, for clarity, also focus on how they differ from associations.

A proposition is a bearer of truth-value (i.e., it can be true or false; e.g., “smoking causes cancer”) and the object of propositional attitudes (e.g., “I believe that smoking causes cancer”). This is different from an association, which cannot be true or false, but just does what it does: transmit activation (e.g., the representation of smoking activates the representation of cancer). In addition to bearing truth-value, propositions are compositional, meaning that they are composed of parts that can be recombined, just like words in sentences can be recombined to form other sentences (Moors, 2014). For example, the propositions “smoking causes cancer” and “cancer causes smoking” both comprise the elements “smoking,” “causes,” and “cancer”; yet they have distinctly different meanings. Most important for the present purposes is that propositions can contain qualified, relational information. For example, the propositions “smoking causes cancer” and “yellow fingers predict cancer” both specify the type of relation between cue and outcome. Relational representations are conceptualized as a set of elements (e.g., smoking and cancer, or elephant and mouse) that are bound together by a relational symbol (e.g., causes or is larger than; Halford, Wilson, & Phillips, 2010). In this regard, propositions again differ from associations: an association is an unqualified link that can at most have a direction, but even then, this direction is determined only by the temporal order in which events are presented during training. Associations can differ on only one variable (i.e., the strength of the association). Variations in the events’ frequency, probability, and so forth can be mapped onto this single variable, but variations in the type of relation (e.g., predictive or causal) between events cannot be coded (Holland, 1993). It is worth noting that for this theoretical reason alone, it is difficult for associative theories to provide a full account of causal learning: simply activating the outcome is not sufficient to form a statement about causality specifically (i.e., as different from non-causal relations). In response, theorists have suggested that the mind contains both associative and propositional representations. Low-level associations could then underlie higher-level propositions (Gawronski & Bodenhausen, 2014). For example, the representation of smoking activating the representation of cancer would give rise to the conscious proposition “smoking causes cancer.” Although this makes sense intuitively, the question remains how the organism can know which type of relation binds the elements: Does smoking predict cancer, cause cancer, enable cancer, prevent cancer, or is there still another relation at play (Moors, 2014)? We refer the interested reader to Johnson-Laird and Khemlani (Chapter 10 in this volume) for a more extensive discussion of the representation of causal relations.

With respect to the acquisition of propositions, the inferential theory of causal learning holds that the propositions on which causal inferential reasoning operates may themselves originate from inference, as well as from experience, observation, and instruction (Gopnik, Sobel, Schulz, & Glymour, 2001). The underlying idea is that all knowledge is represented in a propositional form, in principle rendering it irrelevant how it is acquired (but see Perales et al., Chapter 3 in this volume). Accordingly, blocking can be modulated not only by experience that contradicts the proposition of causal additivity (Beckers et al., 2005; see preceding discussion) but also by mere instructions that causes do not summate (e.g., Lovibond et al., 2003; Mitchell & Lovibond, 2002). Likewise, verbal information about the occurrence or absence of the outcome on A-alone trials can retrospectively modulate blocking when outcome information was masked during the first phase of a blocking task (De Houwer, 2002; for similar findings, see Boddez, Baeyens, Hermans, Van der Oord, & Beckers, 2013). It is difficult to see how associative models can account for learning by instruction or inference, as these models simply lack a plausible mechanism for this type of learning (for a detailed discussion, see Lovibond, 2003).

Perhaps because inferential reasoning normally operates on and produces propositions, the terms “inferential reasoning theory” and “propositional theory” have often been treated as interchangeable. However, there is a fundamental asymmetry between inferences and propositions: in order to reason, one typically needs propositional premises. Therefore, an inferential reasoning account of causal learning presupposes propositional representations. Propositions, however, can be acquired and can influence behavior in the absence of inferences (and hence, in the absence of inferential reasoning; De Houwer, 2014a, 2014b; Moors, 2014). Accordingly, whereas evidence against the involvement of propositional representations casts doubt on inferential reasoning models, evidence against the involvement of inferential processes in causal learning does not necessarily invalidate the idea that propositional representations mediate causal learning and does not necessarily require invoking an associative system. In principle, one could even go so far as to develop and defend a propositional theory that does not make any claim about cognitive processes such as inferential reasoning.
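The representational contrast drawn in this section can be summarized in a schematic sketch (our own illustration in Python; the class names are hypothetical and not part of any published model). An association supports exactly one varying quantity, whereas a proposition binds its elements with a relational symbol and bears a truth-value:

    from dataclasses import dataclass

    @dataclass
    class Association:
        # An unqualified link: the only thing that can vary is its strength.
        cue: str
        outcome: str
        strength: float

    @dataclass
    class Proposition:
        # Compositional and truth-evaluable: elements bound by a relational
        # symbol that can be recombined without losing their identity.
        subject: str
        relation: str   # e.g., "causes", "predicts", "prevents"
        target: str
        believed: bool  # propositions are objects of propositional attitudes

    link = Association(cue="smoking", outcome="cancer", strength=0.8)
    p1 = Proposition("smoking", "causes", "cancer", believed=True)
    p2 = Proposition("cancer", "causes", "smoking", believed=False)
    # p1 and p2 recombine the same parts into distinct, separately evaluable
    # claims; no value of link.strength could express that difference, nor
    # whether the link means "causes" rather than "predicts".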

From an Inferential Reasoning to a Multi-Process Account of Causal Learning

In the previous section, we argued for the role of inferential reasoning in causal learning. However, we do not claim that inferential reasoning is the only process involved in causal learning. In fact, learning principles traditionally associated with association-formation models might also play a role in the formation and retrieval of the propositional representations involved in causal learning. For example, updating due to error correction, a hallmark feature of many associative-learning models, might motivate the formation of propositions. Consider the example of scientific progress: an experimental observation that falsifies a prediction ideally results in an updated theory that can account for the observation that falsified the initial theory. So, falsification of theories results in updates of theories, which incrementally leads to increasingly better predictions. Yet scientific theories consist of propositions, not associations.

Next, we will discuss some cognitive processes—other than inferential reasoning—that may play a role in the formation and retrieval of propositions about causal relations. Our aim is to illustrate that inferential reasoning is embedded in a variety of cognitive processes that affect the formation and retrieval of propositions involved in causal learning (Boddez, Haesen, Baeyens, & Beckers, 2014). This discussion will further clarify that claims about the involvement of propositional representations in causal learning are different from claims about cognitive processes.

Perception

There is an extensive empirical literature showing that the visual world is interpreted in terms of causality before slow, non-automatic causal reasoning processes begin to operate (Hubbard, 2013; Muentener & Bonawitz, Chapter 33 in this volume; White, Chapter 14 in this volume). If participants view a moving stimulus that strikes a stationary stimulus, and that latter stimulus then begins moving, there is a clear and immediate perception that this movement was caused by the initially moving object. This effect is termed the “launching effect” (e.g., Michotte, 1946/1963), and it has been argued that it concerns an automatic impression of causality that does not involve inference (Hubbard, 2013). Although one might dispute whether the launching effect is a learning effect (because most observers would accurately predict that the one stimulus would launch the other before observing such a launching trial, so nothing new would be learned), the phenomenon of causal perception at least illustrates that inferential reasoning is not the only process that plays a role in forming propositions about causal relations. For the sake of completeness, it nonetheless deserves mention that there are theories assuming that certain aspects of perception are produced by processes of automatic, probabilistic Bayesian inference (Knill & Pouget, 2004; Lee & Mumford, 2003).

Attention

Attention is another example of a process other than inferential reasoning that may play a role in the formation or retrieval of propositions about causal relations. Attention has long been the focus of theoretical consideration in the associative learning field. Evidence for the involvement of attention in, for example, blocking includes demonstrations that new learning about a blocked cue is slowed down: blocking treatment interferes with subsequent learning, even when an outcome different from the one during blocking training is used. This interference effect is presumably due to a decrease in attention paid to the blocked cue, caused by the preceding blocking treatment (e.g., Le Pelley, Beesley, & Suret, 2007; Mackintosh & Turner, 1971). In line with this, studies that used the eye-tracking method found that participants spent less time gazing at blocked cues (e.g., Beesley & Le Pelley, 2011; Eippert, Gamer, & Büchel, 2012; Kruschke, Kappenman, & Hetrick, 2005; also see Wills, Lavric, Croft, & Hodgson, 2007). Associative models explain these findings by assuming that the amount of attention determines the amount of associative learning. For example, Mackintosh (1975) proposed that a blocked cue is a poorer predictor of the outcome than a blocking cue, and that this leads to lower attention to, and a failure to learn about, blocked cues (see the sketch at the end of this section).

However, attention may impact the formation not only of associations, but also of the propositions involved in causal learning. Whereas blocking could be due to inferential reasoning (i.e., about how propositions are related after they have been formed) under some training conditions, similar blocking effects could be due to the effect of attentional processes on the initial formation of propositions. That is, attentional failure might cause the elements needed to form a proposition not to be represented, so one would simply be unable to form the proposition that X is causally related to the outcome. Note that this interpretation differs substantially from the way in which inferential reasoning theorists have previously explained the effects of attention. Inferential reasoning theory holds that attentional shifts in a blocking procedure do not cause blocking, but rather that they are a consequence of the organism’s non-automatic reasoning that the blocked cue is redundant. Obviously, those possibilities are not mutually exclusive: selective attention may sometimes cause blocking, and blocking may sometimes give rise to selective attention.
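As anticipated above, the following Python sketch is a deliberately simplified rendering of a Mackintosh-style (1975) attentional mechanism (the update rule and parameter values are our own illustrative simplifications, not the published model): attention to a cue rises when it is the relatively better predictor of the outcome among the cues present and falls otherwise, so the blocked cue ends training commanding little attention.

    # Simplified Mackintosh-style attentional learning (illustrative only).
    # Learning about a cue scales with its attention (alpha); alpha rises for
    # the relatively better predictor on a trial and falls for poorer ones.

    LAMBDA, BETA, ATTN_STEP = 1.0, 0.3, 0.1

    V = {"A": 0.0, "X": 0.0}       # associative strengths
    alpha = {"A": 0.5, "X": 0.5}   # cue-specific attention

    def reinforced_trial(cues):
        for cue in cues:
            own_error = abs(LAMBDA - V[cue])
            others_error = abs(LAMBDA - sum(V[o] for o in cues if o != cue))
            step = ATTN_STEP if own_error < others_error else -ATTN_STEP
            alpha[cue] = min(1.0, max(0.05, alpha[cue] + step))
        error = LAMBDA - sum(V[cue] for cue in cues)
        for cue in cues:
            V[cue] += alpha[cue] * BETA * error

    for _ in range(20):
        reinforced_trial(["A"])        # A+ : A becomes the best predictor
    for _ in range(20):
        reinforced_trial(["A", "X"])   # AX+: X predicts worse than A, so
                                       # attention to X decays to its floor

    print(round(alpha["A"], 2), round(alpha["X"], 2))  # -> 1.0 0.05

The low final attention to X is what would slow any subsequent learning about the blocked cue, that is, the interference effect described above.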

Memory

Mitchell, Lovibond, Minard, and Lavis (2006) devised a blocking task with many different foods as the cues and with allergies as the outcomes, such that recall could be tested. In addition to revealing blocking in causal judgments, the results clearly showed that recall of the outcome related to the blocked cue was poor in comparison with appropriate control cues. Shanks (2010) argues that these results provide positive evidence for associative explanations but challenge inferential accounts. The idea is that participants would need to remember that the blocked cue has been paired with the outcome in order to successfully make the modus tollens inference. Indeed, the proposition that A and X have been paired with the outcome is a crucial premise in the modus tollens argument described in the previous section of this chapter. However, in contrast to Shanks’s claim, not only associative learning theory but also theories invoking propositions (e.g., inferential reasoning theory) can account for such findings: when many cues and outcomes are presented, as in Mitchell et al.’s (2006) study, these may not gain access to memory (e.g., Kastner & Ungerleider, 2000), so upon presentation of the blocked cue during testing, no proposition about its causal efficacy would be retrieved from memory.

Until now we have focused on memory encoding; let us now turn to the role of memory retrieval processes. Boddez, Baeyens, Hermans, and Beckers (2011) investigated the effect of extinguishing a blocking cue on the causal judgment about the blocked cue (i.e., A+ and AX+ training followed by A– training) in a causal learning task. The results indicated that extinguishing A increased causal judgments about X. Crucially, this increase was context dependent: increased judgment about X was limited to the context in which extinction of A took place. This finding can be used to illustrate that the blocking effect depends on memory retrieval of the blocking cue as an effective cause of the outcome, since memory accessibility is known to be context dependent (contexts can either facilitate or hamper retrieval of specific memories; Bouton, 2002). More precisely, our interpretation assumes that the proposition “A produces the outcome” is retrieved only in contexts that differ from the context in which A is extinguished. Because this propositional premise is required to come to the conclusion that X does not result in the outcome, recovery from blocking would be observed in the extinction context, whereas blocking would be observed in contexts that differ from the context in which A is extinguished.


In summary, we argue that causal learning is not a unitary phenomenon, but that different cognitive processes may be at play. We have focused on additional roles of perception, attention, and memory, but future research should obviously focus on the possible role of still other processes (e.g., the role of inhibition; for an extensive discussion, see Boddez et al., 2014).

Evaluating the Inferential Reasoning Theory of Causal Learning

We have described the nature of inferential reasoning theory and have clarified that (1) inferential theories should be distinguished from propositional theories, and (2) inferential reasoning is never the only process involved in causal learning. Based on those clarifications, we now turn to the evaluation of five of the most important arguments raised against inferential reasoning theory (e.g., McLaren et al., 2014; Penn & Povinelli, 2007; Shanks, 2010).

First, many people have intuitive sympathy for the argument that an associative system might better account for learning of non-verbal material or for trial-by-trial learning. However, several of the findings used as evidence for the inferential reasoning account come from experiments that make use of non-verbal stimulus material presented in a trial-by-trial fashion (e.g., the previously discussed studies in which assumptions about outcome additivity and outcome maximality were manipulated; Beckers et al., 2005). So, explaining findings obtained with this type of training procedure does not require invoking associative theories. In fact, even when trained using this type of procedure, people show behavior that goes beyond the scope of associative theories (also see Meder, Hagmayer, & Waldmann, 2008).

Second, associative learning theorists (e.g., McLaren et al., 2014) have argued that irrational behavior is out of scope for inferential learning theory and that, therefore, an additional associative system is needed to explain the often irrational behavior of humans. However, adhering to an inferential account does not imply that we should assume that people always behave in a perfectly rational manner. Indeed, inferential processes and propositional representations can produce irrational learning effects (De Houwer, 2014b; Mitchell, De Houwer, & Lovibond, 2009). People can make errors when forming propositions (e.g., seeing relations where there are none) or make incorrect inferences on the basis of justified premises, and act irrationally on the basis of the resulting false beliefs. Still another possible reason for irrational inferences may be that subjects do not know the necessary logical inference rules (Johnson-Laird & Khemlani, Chapter 10 in this volume). The extent to which inferences are logically valid can also vary with a number of factors, such as time and effort. For instance, when given sufficient time, it may be evident that one line of reasoning is valid whereas another is logically weak; but when time is limited, the weaker line of reasoning may nonetheless prevail (Chater, 2009). That is, different lines of argument or different sampling from all potentially relevant pieces of information can result in the formation or behavioral expression of different propositions, thus leading to dissociations in behavior. People may also be inconsistent; for example, automatic retrieval of propositional beliefs that were endorsed in the past (e.g., spiders are dangerous) might guide behavior at times when they are no longer endorsed. Verbal-autonomic response dissociations, that is, contradicting responses in different response channels (e.g., Perruchet, 1985), may also be explained in that way: a proposition that is verbally evaluated as false (e.g., spiders are not dangerous) may activate a (behavioral or physiological) response before being evaluated as false. In such cases, behavior can come to differ from the beliefs that people report, therefore appearing irrational. Such effects are unlikely to be due to inferences and thus contradict purely inferential theories of causal learning. However, as pointed out earlier, these effects do not contradict propositional theories of causal learning.

Third, researchers have argued that an exclusively propositional account, let alone an inferential account, is incompatible with neuroscientific evidence. For example, McLaren et al. argue that there is neuroscientific evidence that associations exist in at least some animals (e.g., Aplysia californica), so that it must be the case that associative learning has evolved and that associations must exist in other animals, including humans (p. 60) (McLaren et al., 2014). However, McLaren and colleagues do not distinguish between the neurological and the psychological level of analysis when they state that observed changes in the strength of synaptic connections in Aplysia provide neuroscientific evidence for the existence of associations. Inferential and propositional accounts are situated at the psychological level of analysis in that they specify assumptions about the mental processes and representations that mediate learning effects. Multiple candidate psychological representations and processes are compatible with any finite set of neuroscientific data: there is no way of knowing whether the changes in the strength of synaptic connections correspond with associations or not. Nonetheless, knowledge about the neural level does constrain theories at the mental level and, in contrast with McLaren’s claim, there is actually strong evidence that the properties of the changes in synaptic transmission align poorly with the properties of associative learning as revealed by behavioral experimentation (for a review, see Gallistel & Matzel, 2013).

Fourth, Penn and Povinelli (2007) argued that it is problematic that inferential reasoning theory so far lacks a formal, computational specification, because this renders it impossible to refute these accounts. A couple of points can be made here. Most important, a model like Rescorla–Wagner is a rule that mathematically predicts behavioral output from environmental input. The same holds for Bayesian models (Kruschke, 2008). However, at a psychological level, that rule can in principle be described as the operation of associations (i.e., Va is the strength of the association) as well as the operation of propositions (e.g., Va is the extent to which the cue predicts or causes the US). So, in principle, existing formal computational models could be conceptualized equally well in terms of associations and propositions (Shanks, 2007). In fact, a Bayesian account of some of the inferential reasoning predictions concerning blocking has recently been proposed (Lu et al., 2016).
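To make this point concrete, here is a minimal simulation of the Rescorla–Wagner (1972) rule applied to a forward blocking design (A+ trials followed by AX+ trials); the parameter values are illustrative, and nothing in the code commits one to an associative or a propositional reading of V:

```python
# A minimal Rescorla-Wagner simulation of forward blocking (A+ trials,
# then AX+ trials). Parameter values are illustrative.

def rescorla_wagner(trials, alpha=0.3, lam=1.0):
    """Prediction error on each trial is shared among all cues present."""
    V = {}
    for cues, outcome in trials:
        error = (lam if outcome else 0.0) - sum(V.get(c, 0.0) for c in cues)
        for c in cues:
            V[c] = V.get(c, 0.0) + alpha * error
    return V

training = [(("A",), True)] * 10 + [(("A", "X"), True)] * 10
V = rescorla_wagner(training)
print(V)  # V["A"] approaches 1.0; V["X"] stays near 0 -- the blocking effect
```

The rule itself is silent on whether V is "associative strength" or "the extent to which the cue is judged to predict the outcome"; that interpretive choice is made at the psychological level, not by the mathematics.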


Moreover, formalization and the accompanying degree of precision would indeed allow one to easily refute computational models if predictions turn out to be incorrect. However, this is not what seems to happen in scientific practice: many associative models, in particular the Rescorla–Wagner model, remain highly influential despite a history of falsification (Le Pelley et al., Chapter 2 in this volume; Miller, Barnet, & Grahame, 1995). Likewise, showing that causal judgments do not conform to the predictions of a specific formalized propositional theory would probably not make the research community refute the concept of inferential reasoning or propositional representations altogether. There are so many ways to formalize psychological concepts, such as inferential reasoning and association formation, that the refutation of one of those formalizations will have little or no impact on more abstract psychological theorizing.

It is also worth noting that the proliferation of computational models developed in the associative tradition has actually made it difficult to make precise predictions. Suppose one presents two cues with an outcome (e.g., coconut oil and paprika followed by an allergic reaction) and subsequently presents one of those cues without the outcome (i.e., extinction training; e.g., paprika without an allergic reaction) before testing the other cue for its relation with the outcome (i.e., the cue of interest; e.g., coconut oil). Computational models exist that predict that the extinction training will strengthen (Dickinson & Burke, 1996; Van Hamme & Wasserman, 1994), weaken (Holland, 1983, 1990; also see Dwyer, Mackintosh, & Boakes, 1998), or not affect (Rescorla & Wagner, 1972; Wagner, 1981) the association between the cue of interest and the outcome. More generally, the computational models developed in the associative tradition form a family of models that often make divergent predictions. Although having diverging predictions is not a problem if there is an empirical way of distinguishing between the candidate models, this realization still puts associationists’ claims of formal specification in perspective: if an effect does not follow a specific model, this is typically not seen as a falsification of the associative account in general (e.g., Le Pelley et al., Chapter 2 in this volume).

Importantly, however, formalization of inferential and propositional theories would still be useful, because it is bound to further increase their predictive and heuristic function. An important future challenge for inferential and propositional accounts is to identify which environmental conditions favor the formation of which propositions and to identify how sets of simultaneously retrieved (and potentially (p. 61) contradictory) propositions affect the different response channels. Although post hoc propositional accounts of dissociations can already be tested empirically, current accounts are too vague to allow for strong a priori predictions about when behavioral dissociations will occur.
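The divergence within the associative family on the extinction design just described can be illustrated with a toy comparison. The sketch below contrasts the standard Rescorla–Wagner rule, which leaves an absent cue untouched, with a rough approximation of the Van Hamme and Wasserman (1994) variant, in which an expected-but-absent cue receives a negative learning rate; all parameter values are made up for illustration:

```python
# Two members of the associative family applied to the same design:
# AX+ training, then A- extinction, then test X (the cue of interest).
# Under Rescorla-Wagner, the absent cue X is simply skipped on A- trials;
# under a Van Hamme-Wasserman-style variant, an expected-but-absent cue
# gets a negative learning rate, so extinguishing A *strengthens* X.

def train(trials, alpha_present=0.3, alpha_absent=-0.15, lam=1.0, revaluation=False):
    V = {"A": 0.0, "X": 0.0}
    for cues, outcome in trials:
        error = (lam if outcome else 0.0) - sum(V[c] for c in cues)
        for c in V:
            if c in cues:
                V[c] += alpha_present * error
            elif revaluation and V[c] != 0.0:  # crude proxy for "expected via within-compound link"
                V[c] += alpha_absent * error
    return V

trials = [(("A", "X"), True)] * 10 + [(("A",), False)] * 10
print(train(trials, revaluation=False)["X"])  # unchanged by A- trials (Rescorla-Wagner)
print(train(trials, revaluation=True)["X"])   # increased by A- trials (retrospective revaluation)
```

A Holland-style mediated-learning model would instead predict weakening, which is precisely the point: the family’s formal specification does not pin down a single prediction.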
At this point, it is probably also worth noting that explaining response generation in general is a challenge for propositional learning theory (the same holds for associative theories, though): responses are assumed to be the behavioral expression of propositions entertained by the subject, but how this translation is accomplished is poorly understood (Baeyens, Vansteenwegen, & Hermans, 2009; Mitchell et al., 2009).


Fifth, contrary to popular belief (e.g., McLaren et al., 2014), a convincing demonstration of automatic learning effects will not lead to a rejection of the inferential reasoning or propositional theory of causal learning: as argued extensively by Mitchell et al. (2009), automatic retrieval of propositions is entirely possible. For example, if subjects in a blocking procedure are able to form a proposition about the blocked cue before testing, they should be able to retrieve that acquired knowledge automatically. Interestingly, Morís, Cobos, Luque, and López (2014) recently demonstrated blocking using a priming test (a test that depends on automatic retrieval processes), providing some evidence for this possibility (note that the authors’ interpretation differs from ours). They used a standard learning task but, instead of asking for verbal judgments at test, assessed learning with a priming task. Results showed that a cue that had undergone blocking training facilitated recognition of its outcome to a lesser extent than a control cue did. This reveals that blocking can be detected through measures that are based on automatic retrieval processes.

With respect to the learning itself, association formation is often said to occur automatically, and the formation of propositions is often said to occur non-automatically. It is, however, questionable whether mapping these dichotomies onto each other is justified (Moors, 2014). One could, for example, readily build and program a (presumably) automatic and unaware robot that can work with propositional representations (also see Shanks, 2007). Interestingly, a recent study confirms that awareness is not a prerequisite for the formation of propositions: in a set of experiments, it was demonstrated that predictive relations can be formed even when awareness of the relation is actively prevented (Custers & Aarts, 2011). As discussed earlier, the launching effect could also be regarded as a proof of principle that propositions about causal relations can be formed automatically (Hubbard, 2013). One important exception, where the formation of a proposition must be non-automatic, is the case in which a proposition results from a non-automatic inferential reasoning process. For example, if blocking is achieved through inferential reasoning with the modus tollens rule, then the proposition “cue X does not cause the outcome” will have been formed non-automatically. Under such circumstances, then, blocking should be reduced when effortful processing is prevented (e.g., under working memory load; see the earlier discussion).

In summary, we argue against the use of stimulus material, stimulus presentation mode, rationality, isomorphisms between psychological and neural mechanisms, formal specification, and automaticity as criteria for deciding whether causal learning depends on inferential processes and propositional representations. How to proceed, then? Even in their present, non-formalized state, inferential and propositional theories do allow for interesting novel predictions. For instance, propositional theories allow for the impact of relational information. As noted earlier, propositions, but not associations, encode information about how events are related (see Lagnado, Waldmann, Hagmayer, & Sloman, 2007). Accordingly, future efforts to distinguish between propositional and non-propositional (e.g., associative) accounts should focus on whether learned behavior is always moderated by relational information (see Zanon, De Houwer, Gast, & Smith, 2014, for a recent example). So, we believe that there is merit in proposing an inferential and propositional theory of causal learning, because of its ability to provide an interpretation for previous findings (i.e., heuristic value) and because of its potential to generate new predictions that can lead to new empirical knowledge (i.e., predictive value).

Summary, Further Thoughts, and Conclusions

The present chapter is built on two pillars. First, we situated the inferential theory of causal learning within the context of a propositional account of behavior. It is difficult for associative theories to provide a full account of causal learning, because simply activating the outcome is not sufficient to form a statement about causality. Such a statement requires representation of the type of relation that exists between cue and outcome (e.g., causes, enables, etc.), which necessitates a propositional (p. 62) representation. Second, we argued that inferential reasoning does not suffice if one wants a cognitive account of causal learning: a variety of processes will need to be considered when writing the final story of causal learning.

Tension between higher-order and associative views of learning has existed for over a century (Shanks, 2010). Regrettably, this debate sometimes seems to come down to a matter of personal preference—are we more impressed by the finding that causal learning often seems to follow associative principles of the sort formalized in the Rescorla–Wagner model, or by the fact that this form of learning shows properties that lie outside the scope of models of this sort (see Hall, 2009, p. 210)? Arguably, clearer understandings and concepts are critical to break the stasis in this debate and to “prevent the field from wasting time chasing after ever-more-nuanced predictions in an attempt to differentiate ever-more-similar theories” (Liu & Luhmann, 2015, p. 10). Contrasting associations with propositions might be more revealing than contrasting associative models with inferential reasoning models, because the latter debate mixes up aspects of representations and processes (Moors, 2014). Distinctions in terms of representational content (e.g., propositions or associations) do not necessarily map onto distinctions in terms of the processes that operate on these representations. Following this line of reasoning, we argued that learning principles and processes typically invoked by associative theories (e.g., prediction error and attention) might very well facilitate the formation or retrieval of propositional representations. In principle, one could even go on to develop a propositional theory that remains completely silent with respect to the cognitive processes that operate on said propositions. The central tenet of such a theory would be simply that behavior is mediated by propositional representations, that is, by representational units that contain relational information. Interestingly, such a theory would make fewer assumptions than the inferential learning theory, because it would not make any assumptions about the nature of the cognitive processes involved.

Most learning researchers strongly resist an account that explains all learning effects through inferential reasoning (McLaren et al., 2014; Penn & Povinelli, 2007; Shanks, 2010). For example, McLaren et al. (2014) argued that associations are still needed to provide a full account of human associative learning. More precisely, they claimed that “no one disputes that we solve problems by testing hypotheses and inducing underlying rules, so the issue amounts to deciding whether there is evidence that we (and other animals) also rely on a simpler, associative system, that detects the frequency of occurrence of different events in our environment and the contingencies between them” (p. 185). Thus, the alternative to the propositional approach advocated in this chapter is the dual-system approach: behavior is determined by both propositional and associative representations. However, we have made clear that the divide between propositional and associative representations does not parallel the divide between rational and irrational behavior, between non-automatic and automatic behavior, or between different kinds of stimulus material (e.g., verbal versus non-verbal stimuli, tabulated data versus trial-by-trial data). In summary, many arguments for a dual-process account disappear if one considers an inferential reasoning theory of causal learning tasks in tandem with a propositional theory of learning, and if one considers inferential reasoning as one in a series of cognitive processes that contribute to causal learning.

References

Baddeley, A. D., & Hitch, G. J. (1974). Working memory. In G. Bower (Ed.), Recent advances in learning and motivation (pp. 47–90). New York: Academic Press.
Baeyens, F., Vansteenwegen, D., & Hermans, D. (2009). Associative learning requires associations, not propositions. Behavioral and Brain Sciences, 32, 198–199.
Beckers, T., De Houwer, J., Pineño, O., & Miller, R. R. (2005). Outcome additivity and outcome maximality influence cue competition in human causal learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 238.
Beckers, T., Miller, R. R., De Houwer, J., & Urushihara, K. (2006). Reasoning rats: Forward blocking in Pavlovian animal conditioning is sensitive to constraints of causal inference. Journal of Experimental Psychology: General, 135, 92–102.
Beesley, T., & Le Pelley, M. E. (2011). The influence of blocking on overt attention and associability in human learning. Journal of Experimental Psychology: Animal Behavior Processes, 37, 114–120.
Boddez, Y., Baeyens, F., Hermans, D., & Beckers, T. (2011). The hide-and-seek of retrospective revaluation: Recovery from blocking is context dependent in human causal learning. Journal of Experimental Psychology: Animal Behavior Processes, 37, 230.
Boddez, Y., Baeyens, F., Hermans, D., Van der Oord, S., & Beckers, T. (2013). Increasing the selectivity of threat through post-training instructions: Identifying one stimulus as source of danger reduces the threat value of surrounding stimuli. Journal of Experimental Psychopathology, 4, 315–324.
Boddez, Y., Haesen, K., Baeyens, F., & Beckers, T. (2014). Selectivity in associative learning: A cognitive stage framework for blocking and cue competition phenomena. Frontiers in Psychology, 5, 1305.
Bouton, M. E. (2002). Context, ambiguity, and unlearning: Sources of relapse after behavioral extinction. Biological Psychiatry, 52, 976–986.
Chater, N. (2009). Rational models of conditioning. Behavioral and Brain Sciences, 32, 198–199.
Custers, R., & Aarts, H. (2011). Learning of predictive relations between events depends on attention, not on awareness. Consciousness and Cognition, 20, 368–378.
De Houwer, J. (2002). Forward blocking depends on retrospective inferences about the presence of the blocked cue during the elemental phase. Memory & Cognition, 30, 24–33.
De Houwer, J. (2014a). A propositional model of implicit evaluation. Social and Personality Psychology Compass, 8, 342–353.
De Houwer, J. (2014b). Why a propositional single-process model of associative learning deserves to be defended. In J. W. Sherman, B. Gawronski, & Y. Trope (Eds.), Dual process theories of the social mind (pp. 530–541). New York: Guilford.
De Houwer, J., & Beckers, T. (2002). A review of recent developments in research and theories on human contingency learning. The Quarterly Journal of Experimental Psychology, 55B, 289–310.
De Houwer, J., Beckers, T., & Vandorpe, S. (2005). Evidence for the role of higher order reasoning processes in cue competition and other learning phenomena. Learning & Behavior, 33, 239–249.
De Neys, W., Schaeken, W., & d’Ydewalle, G. (2005). Working memory and everyday conditional reasoning: Retrieval and inhibition of stored counterexamples. Thinking & Reasoning, 11, 349–381.
Dickinson, A., & Burke, J. (1996). Within-compound associations mediate the retrospective revaluation of causality judgments. Quarterly Journal of Experimental Psychology, 49B, 60–80.
Dickinson, A., Shanks, D., & Evenden, J. (1984). Judgement of act-outcome contingency: The role of selective attribution. The Quarterly Journal of Experimental Psychology, 36, 29–50.
Dwyer, D. M., Mackintosh, N. J., & Boakes, R. A. (1998). Simultaneous activation of the representations of absent cues results in the formation of an excitatory association between them. Journal of Experimental Psychology: Animal Behavior Processes, 24, 163–171.
Eippert, F., Gamer, M., & Büchel, C. (2012). Neurobiological mechanisms underlying the blocking effect in aversive learning. The Journal of Neuroscience, 32, 13164–13176.
Gallistel, C. R., & Matzel, L. D. (2013). The neuroscience of learning: Beyond the Hebbian synapse. Annual Review of Psychology, 64, 169–200.
Gawronski, B., & Bodenhausen, G. V. (2014). Implicit and explicit evaluation: A brief review of the Associative-Propositional Evaluation Model. Social and Personality Psychology Compass, 8, 448–462.
Gopnik, A., Sobel, D. M., Schulz, L. E., & Glymour, C. (2001). Causal learning mechanisms in very young children: Two-, three-, and four-year-olds infer causal relations from patterns of variation and covariation. Developmental Psychology, 37, 620–629.
Halford, G. S., Wilson, W. H., & Phillips, S. (2010). Relational knowledge: The foundation of higher cognition. Trends in Cognitive Sciences, 14, 497–505.
Hall, G. (2009). Learning in simple systems. Behavioral and Brain Sciences, 32, 210–211.
Holland, P. C. (1983). Representation-mediated overshadowing and potentiation of conditioned aversions. Journal of Experimental Psychology: Animal Behavior Processes, 9, 1–13.
Holland, P. C. (1990). Event representation in Pavlovian conditioning: Image and action. Cognition, 37, 105–131.
Holland, P. C. (1993). Cognitive aspects of classical conditioning. Current Opinion in Neurobiology, 3, 230–236.
Hubbard, T. L. (2013). Phenomenal causality I: Varieties and variables. Axiomathes, 23, 1–42.
Johnson-Laird, P. N., & Byrne, R. M. J. (1991). Deduction. Hove: Lawrence Erlbaum Associates.
Kamin, L. J. (1967). Predictability, surprise, attention, and conditioning. Unpublished technical report No. 13, Department of Psychology, McMaster University.
Kastner, S., & Ungerleider, L. G. (2000). Mechanisms of visual attention in the human cortex. Annual Review of Neuroscience, 23, 315–341.
Knill, D. C., & Pouget, A. (2004). The Bayesian brain: The role of uncertainty in neural coding and computation. Trends in Neurosciences, 27, 712–719.
Kruschke, J. K. (2008). Bayesian approaches to associative learning: From passive to active learning. Learning & Behavior, 36, 210–226.
Kruschke, J. K., Kappenman, E. S., & Hetrick, W. P. (2005). Eye gaze and individual differences consistent with learned attention in associative blocking and highlighting. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 830–845.
Lagnado, D. A., Waldmann, M. R., Hagmayer, Y., & Sloman, S. A. (2007). Beyond covariation: Cues to causal structure. In A. Gopnik & L. Schulz (Eds.), Causal learning: Psychology, philosophy, and computation (pp. 154–172). Oxford: Oxford University Press.
Le Pelley, M. E., Beesley, T., & Suret, M. B. (2007). Blocking of human causal learning involves learned changes in stimulus processing. The Quarterly Journal of Experimental Psychology, 60, 1468–1476.
Lee, T. S., & Mumford, D. (2003). Hierarchical Bayesian inference in the visual cortex. JOSA A, 20, 1434–1448.
Liu, P. P., & Luhmann, C. C. (2013). Evidence that a transient but cognitively demanding process underlies forward blocking. The Quarterly Journal of Experimental Psychology, 66, 744–766.
Liu, P. P., & Luhmann, C. C. (2015). Evidence for online processing during causal learning. Learning & Behavior, 43, 1–11.
Lovibond, P. F. (2003). Causal beliefs and conditioned responses: Retrospective revaluation induced by experience and by instruction. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 97–106.
Lovibond, P. F., Been, S.-L., Mitchell, C. J., Bouton, M. E., & Frohardt, R. (2003). Forward and backward blocking of causal judgement is enhanced by additivity of effect magnitude. Memory & Cognition, 31, 133–142.
Lu, H., Rojas, R. R., Beckers, T., & Yuille, A. (2016). A Bayesian theory of sequential causal learning and abstract transfer. Cognitive Science, 40, 404–439.
Mackintosh, N. J. (1975). A theory of attention: Variations in the associability of stimuli with reinforcement. Psychological Review, 82, 276.
Mackintosh, N. J., & Turner, C. (1971). Blocking as a function of novelty of CS and predictability of UCS. The Quarterly Journal of Experimental Psychology, 23, 359–366.
McCormack, T., Simms, V., McGourty, J., & Beckers, T. (2013a). Blocking in children’s causal learning depends on working memory and reasoning abilities. Journal of Experimental Child Psychology, 115, 562–569.
McCormack, T., Simms, V., McGourty, J., & Beckers, T. (2013b). Encouraging children to think counterfactually enhances blocking in a causal learning task. Quarterly Journal of Experimental Psychology, 66, 1910–1926.
McLaren, I. P. L., Forrest, C. L. D., McLaren, R. P., Jones, F. W., Aitken, M. R. F., & Mackintosh, N. J. (2014). Associations and propositions: The case for a dual-process account of learning in humans. Neurobiology of Learning and Memory, 108, 185–195.
Meder, B., Hagmayer, Y., & Waldmann, M. R. (2008). Inferring interventional predictions from observational learning data. Psychonomic Bulletin & Review, 15, 75–80.
Michotte, A. (1946/1963). The perception of causality. New York: Basic Books.
Miller, R. R., Barnet, R. C., & Grahame, N. J. (1995). Assessment of the Rescorla-Wagner model. Psychological Bulletin, 117, 363.
Mitchell, C. J., De Houwer, J., & Lovibond, P. F. (2009). The propositional nature of human associative learning. Behavioral and Brain Sciences, 32, 183–198.
Mitchell, C. J., & Lovibond, P. F. (2002). Backward and forward blocking in human autonomic conditioning requires an assumption of outcome additivity. Quarterly Journal of Experimental Psychology, 55B, 311–329.
Mitchell, C. J., Lovibond, P. F., Minard, E., & Lavis, Y. (2006). Forward blocking in human learning sometimes reflects the failure to encode a cue–outcome relationship. The Quarterly Journal of Experimental Psychology, 59, 830–844.
Moors, A. (2014). Examining the mapping problem in dual process models. In J. Sherman, B. Gawronski, & Y. Trope (Eds.), Dual process theories of the social mind (pp. 20–34). New York: Guilford Press.
Morís, J., Cobos, P. L., Luque, D., & López, F. J. (2014). Associative repetition priming as a measure of human contingency learning: Evidence of forward and backward blocking. Journal of Experimental Psychology: General, 143, 77.
Pearce, J. M., & Hall, G. (1980). A model for Pavlovian learning: Variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychological Review, 87, 532–552.
Penn, D. C., & Povinelli, D. J. (2007). Causal cognition in human and nonhuman animals: A comparative, critical review. Annual Review of Psychology, 58, 97–118.
Perruchet, P. (1985). A pitfall for the expectancy theory of human eyelid conditioning. Pavlovian Journal of Biological Science, 20, 163–170.
Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64–99). New York: Appleton-Century-Crofts.
Rips, L. J. (1994). The psychology of proof: Deductive reasoning in human thinking. Cambridge, MA: MIT Press.
Shanks, D. R. (2007). Associationism and cognition: Human contingency learning at 25. The Quarterly Journal of Experimental Psychology, 60, 291–309.
Shanks, D. R. (2010). Learning: From association to cognition. Annual Review of Psychology, 61, 273–301.
Simms, V., McCormack, T., & Beckers, T. (2012). Additivity pretraining and cue competition effects: Developmental evidence for a reasoning-based account of causal learning. Journal of Experimental Psychology: Animal Behavior Processes, 38, 180–190.
Toms, M., Morris, N., & Ward, D. (1993). Working memory and conditional reasoning. The Quarterly Journal of Experimental Psychology, 46, 679–699.
Van Hamme, L. J., & Wasserman, E. A. (1994). Cue competition in causality judgments: The role of nonpresentation of compound stimulus elements. Learning and Motivation, 25, 127–151.
Vandorpe, S., De Houwer, J., & Beckers, T. (2005). Further evidence for the role of inferential reasoning in forward blocking. Memory & Cognition, 33, 1047–1056.
Vandorpe, S., De Houwer, J., & Beckers, T. (2007). Outcome maximality and additivity training also influence cue competition in causal learning when learning involves many cues and events. The Quarterly Journal of Experimental Psychology, 60, 356–368.
Wagner, A. R. (1981). SOP: A model of automatic memory processing in animal behavior. In N. E. Spear & R. R. Miller (Eds.), Information processing in animals: Memory mechanisms (pp. 5–47). Hillsdale, NJ: Lawrence Erlbaum Associates.
Waldmann, M. R. (2000). Competition among causes but not effects in predictive and diagnostic learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 53–76.
Waldmann, M. R., & Walker, J. M. (2005). Competence and performance in causal learning. Learning & Behavior, 33, 211–229.
Wills, A. J., Lavric, A., Croft, G. S., & Hodgson, T. L. (2007). Predictive learning, prediction errors, and attention: Evidence from event-related potentials and eye tracking. Journal of Cognitive Neuroscience, 19, 843–854.
Zanon, R., De Houwer, J., Gast, A., & Smith, C. (2014). When does relational information influence evaluative conditioning? Quarterly Journal of Experimental Psychology, 67, 2105–2122.

Yannick Boddez
Centre for the Psychology of Learning and Experimental Psychopathology, KU Leuven, Leuven, Belgium

Jan De Houwer
Department of Experimental, Clinical, and Health Psychology, Ghent University, Ghent, Belgium

Tom Beckers
Department of Psychology, KU Leuven, Leuven, Belgium


Causal Invariance as an Essential Constraint for Creating a Causal Representation of the World: Generalizing the Invariance of Causal Power

Causal Invariance as an Essential Constraint for Creating a Causal Representation of the World: Generalizing the Invariance of Causal Power   Patricia W. Cheng and Hongjing Lu The Oxford Handbook of Causal Reasoning Edited by Michael R. Waldmann Print Publication Date: Jun 2017 Subject: Psychology, Cognitive Psychology Online Publication Date: May 2017 DOI: 10.1093/oxfordhb/9780199399550.013.9

Abstract and Keywords

This chapter illustrates the representational nature of causal understanding of the world and examines its implications for causal learning. The vastness of the search space of causal relations, given the representational aspect of the problem, implies that powerful constraints are essential for arriving at adaptive causal relations. The chapter reviews (1) why causal invariance—the sameness of how a causal mechanism operates across contexts—is an essential constraint for causal learning in intuitive reasoning, (2) a psychological causal-learning theory that assumes causal invariance as a defeasible default, (3) some ways in which the computational role of causal invariance in causal learning can become obscured, and (4) the roles of causal invariance as a general aspiration, a default assumption, a criterion for hypothesis revision, and a domain-specific description. The chapter also reviews a puzzling discrepancy in the human and non-human causal and associative learning literatures and offers a potential explanation.

Keywords: causal invariance, psychology, learning, representation, constraints

Perceived Causality and the Postulate of Relativity

Consider the perception of causality in each of three horizontal motion episodes involving the collision of two balls, of the type studied by Michotte (1946/1963; see White, Chapter 14 in this volume; see Figure 5.1 or the corresponding animated episodes at http://reasoninglab.psych.ucla.edu/PerceivedCausalityAndRelativity.html). Assume an idealized world in which there is no friction and no background scene to convey the position of the balls relative to the background. Episode 1 (top of Figure 5.1): A light-colored ball is stationary (at the center). A dark ball appears on the left, moves toward the light ball with constant velocity v, and collides with it. The dark ball stops and the light ball moves away with velocity v. Episode 2: Now the dark ball instead is stationary. The light ball appears on the right, moves toward the dark ball with velocity –v, and collides with it. (The negative sign indicates movement from right to left.) The light ball stops as the dark ball moves away with velocity –v. According to Michotte’s findings, virtually everyone perceives that in Episode 1 the dark ball “causes” the light ball to move, “launches” it, or “transfers its momentum” to the light ball. The reverse holds in Episode 2: here the light ball “causes” the dark ball to move. Now, Episode 3: The dark ball and the light ball simultaneously enter from the left and from the right, respectively, at half the speed of the previous episodes. They collide, and after the collision each moves away in the opposite direction at the same speed as before. This time, the perception is that each ball causes the other to rebound after (p. 66) their collision. If the balls were real objects rather than cartoons, the preceding perceptions of causality would hold just the same.

Figure 5.1 Views of the same collision event from three inertial reference frames.

Simple collision events such as these give a compelling perception of causality. They are often regarded as a paragon of situations in which causality is directly perceived, rather than inferred from repeated associations (also see Scholl & Tremoulet, 2000). Although we perceive the three collision episodes as involving different configurations of causal roles, these episodes are identical in Newtonian mechanics. Since Copernicus’s (1543/1992) heliocentric theory of the solar system, astronomers have come to realize that the Earth is not an absolute reference point for motion. This realization was followed by the formulation of the concept of an inertial frame of reference, a concept that rests on the relativity of motion (Newton, 1687/1713/1726/1999). An inertial reference frame is sometimes defined as a system of coordinates that moves at a constant velocity. A feature of Newtonian mechanics is that it is invariant across inertial reference frames. Einstein further generalized the invariance to apply to all physical laws. His postulate of relativity states, “If a system of coordinates K is chosen so that, in relation to it, physical laws hold good in their simplest form, the same laws hold good in relation to any other system of coordinates K′ moving in uniform translation relative to K” (Einstein, 1916, section 1). Under this postulate, different inertial frames are equally good viewpoints.


The three episodes are views of an identical physical event from three inertial reference frames. To see that, imagine watching the top episode from two clouds (inertial frames) moving respectively with constant velocity v and v/2. Watching the top episode from these two “clouds” transforms that episode respectively into the middle and bottom episodes. If the three inertial frames give equally valid depictions of the same event, the balls’ causal roles cannot possibly differ across episodes.

Our three episodes illustrate that, counterintuitively, even in this compelling case of colliding balls, our perception of causation is not a direct reflection of nature. Nature does not come defined by variables or concepts. The concept of an inertial reference frame, for example, is not something nature has to know in order to operate “correctly” in an invariant manner. The concept is a human construct. Perceived or conceived causation is a matter of how we choose to represent reality, in everyday thinking and in science. Whereas our perception chooses variables and concepts that describe these episodes as different events involving different causal roles, Newton’s laws of motion choose concepts that treat the three episodes as equivalent. The latter choice yields greater causal invariance—Newton’s laws operate over a greater range of contexts, covering both terrestrial and celestial motion.

Our illustration implies that the reasoner’s goal cannot be to “accurately” represent reality, to find “the truth.” We and our colleagues (Carroll & Cheng, under review; Cheng, Liljeholm, & Sandhofer, 2013) distinguish between classical realism and what Hawking and Mlodinow (2010) termed model-dependent realism. Under classical realism, there is a predefined reality for the reasoner to accurately capture. Finding “hidden truths” is like finding pre-packaged treasures that reality has placed in a treasure hunt. Under model-dependent realism, however, our only access to reality is through our representations. The physicist Niels Bohr succinctly notes the distinction between classical and model-dependent realism: “It is wrong to think that the task of physics is to find out how nature is. Physics concerns what we say about nature” (Pais, 2000, p. 24). Likewise, intuitive causal reasoning concerns what we “say” about everyday reality. The physicist David Mermin (2005, p. 179) draws a similar distinction: “At the solid, unshakable core [of Einstein’s theories of relativity] … is Einstein’s great 1905 discovery that the simultaneity of two events that happen in different places is … but a way of talking about them, appropriate to a particular frame of reference, and inappropriate to frames of reference moving with respect to that particular frame along the line joining the events.” He continues, “An important lesson of relativity is that there is less that is intrinsic in things than we once believed. Much of what we used to think was inherent in (p. 67) phenomena turns out to be merely a manifestation of how we choose to talk about them” (p. 186). Even the fundamental concepts of space and time are our representations. In this view, representing reality is a matter of weaving the most beautiful story—one that is logically consistent, parsimonious, and best predicts the desired outcomes over the widest range of contexts.
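The “clouds” transformation just described is a Galilean change of reference frame, and it can be checked with a few lines of code; the episode encoding below is ours:

```python
# A sketch of the frame transformations described above. Each episode lists
# the (before-collision, after-collision) velocity of each ball; positive
# values denote left-to-right motion. Subtracting the frame's velocity from
# every velocity gives the view from that frame (a Galilean transformation).

def view_from_frame(episode, frame_velocity):
    return {ball: (before - frame_velocity, after - frame_velocity)
            for ball, (before, after) in episode.items()}

v = 1.0
episode_1 = {"dark": (v, 0.0), "light": (0.0, v)}    # dark ball "launches" light ball

print(view_from_frame(episode_1, v))       # Episode 2: light ball moves at -v and is stopped
print(view_from_frame(episode_1, v / 2))   # Episode 3: both approach at v/2 and rebound
```

The same (before, after) event thus yields all three perceived causal configurations, depending only on the frame from which it is described.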
Model-dependent realism’s thesis that an intelligent agent’s sole access to reality is through its representations (Kant, 1781) may be obvious once explicitly discussed, but it is sometimes forgotten, and often ignored for practical purposes. The Causal Bayes nets approach (Pearl, 2000; Rehder, Chapters 20 and 21 in this volume; Rottman, Chapter 6 in this volume; Spirtes et al., 1993/2000), for example, begins causal inference with variables supplied by the human users. But model-dependent realism is inherent to the problem of causal learning that humans and other animals face. If reality does not come with predefined variables and concepts, the search space of causal relations is massively increased. A new hierarchical layer consisting of the space of the possible formulations of variables and concepts is added. Classical realism’s search space of predefined ideas is multiplied by a combinatorial expression of the possible formulations in the presumably infinitely large search space in this new layer. In this vast search space, constraints are clearly essential.

What constraints guide an intelligent agent in its representation of reality to arrive at an adaptive solution? Three constraints have often been noted by philosophers and scientists: logical consistency, parsimony, and causal invariance (Einstein, 1916; Newton, 1687/1713/1726/1999; Woodward, 2000, 2003). Logical consistency is the first and foremost constraint, so basic that it requires no mention. Einstein’s postulate of relativity quoted earlier explicitly incorporates parsimony and invariance. In his conception, the parsimony of physical laws is a precondition that defines a system of coordinates in which the invariance of physical laws obtains. Likewise, Newton writes, “nature is always simple and ever consonant with itself” (Principia, 1687/1713/1726/1999, p. 795), which may be interpreted to involve all three constraints. Whether or not nature is simple, logically consistent, and causally invariant, the reasoner assumes that it is. We believe that these constraints, for the same reasons, are essential in intuitive causal inference, and must have shaped its evolution.

In the present chapter, we review arguments and empirical evidence from related papers relevant to understanding the concept of causal invariance and its implications for causal learning. Causal invariance is the sameness of how a cause operates on an outcome across contexts in which different other causes of the outcome occur. We first examine the essential role of causal invariance in causal learning, particularly in the construction of a representation of the world that enables a reasoner to achieve desired outcomes (Cheng, Liljeholm, & Sandhofer, 2013; Liljeholm & Cheng, 2007). Causal invariance is the most interesting of the four assumptions in the causal power theory of the probabilistic contrast model (Cheng, 1997; abbreviated as the causal power theory, or the power PC theory) and its extensions (Lu, Yuille, Liljeholm, Cheng, & Holyoak, 2008; Novick & Cheng, 2004), which we briefly review here (for a more detailed review of the theory and its empirical support, see Cheng & Buehner, 2012). We also review six issues that may obscure the role of causal invariance in causal learning (Carroll & Cheng, under review). We then review an explanation of what is causal about causal invariance using binary causes and effects as an example (Cheng, Liljeholm, & Sandhofer, 2013). Finally, we review recent evidence showing that untutored reasoners are sensitive to two mathematical manifestations of causal invariance depending on the type of outcome variable: the noisy-logical functions for binary outcome variables, and the additive function for continuous outcome variables (Lu, Rojas, Beckers, & Yuille, 2015).


Why Causal Invariance Is Essential to an Adaptive Causal Representation

Whenever we humans or other animals apply prior causal knowledge to achieve a desired outcome, we tacitly assume that the acquired knowledge will generalize to the application context (Hume, 1748). Causal contexts, however, are like rivers: we never step into the same causal context twice. We cannot know what unknown causes may lurk in the application context. Specifically, we cannot know whether unobserved causes operating in the background are the same across the learning and application contexts (Liljeholm & Cheng, 2007). Predicting the occurrence of the outcome in the application context therefore requires an assumption about how the cause in question combines its influence with the influences of the unknown background causes in the new context. Given that there is no relevant prior knowledge about unobserved causes of unknown types, a default assumption (held until there is contradicting evidence) would (p. 68) be that the target cause operates the same way across the learning and application contexts. This assumption may or may not hold, but there is no reason to expect other assumptions to work better.

Not only is it impossible to know about unobserved causes in a new context; even for known causes, there are infinitely many combinations in which these causes can occur in an application context. It would therefore be desirable for causes to operate the same way regardless of the other causes occurring in a context. In other words, it would be desirable for causes to operate in a compositional manner—once we know what the individual causes are, we can assume causal invariance and analytically derive the outcome due to their combined occurrence based on knowledge of the influence of each component in isolation (Cheng, Novick, Liljeholm, & Ford, 2007).

Let us illustrate the usefulness of compositionality with Newton’s law of universal gravitation (Carroll & Cheng, under review). This law can be applied to any set of celestial bodies. Because celestial bodies can be configured in an infinite variety of ways, a great appeal of Newton’s law is that it supports compositionality—it allows the combined gravitational force on a celestial body x to be calculated from the superposition of the separate gravitational forces from each of the other celestial bodies close enough to have a significant influence. The force from each nearby celestial body on x is obtained by applying Newton’s law to the pair. The reasoner may never have observed the resulting force vectors occur in combination. Nonetheless, based on the invariance of each force vector regardless of the presence of other force vectors, the total gravitational force on x is just the vector sum of the gravitational forces from these nearby bodies on x. Similarly, at a more general level, force vectors of different types—gravitational, electromagnetic, nuclear forces—also combine in a compositional manner.

The discovery of Neptune by nineteenth-century astronomers was made possible by the compositionality of Newton’s law of universal gravitation. Astronomers had noticed discrepancies between the observed orbit of Uranus and that predicted by the superposition of gravitational forces from the Sun and the other known planets according to Newton’s law. Astronomers speculated that gravitational perturbations from a yet-undiscovered planet might explain the anomalous orbit. When a French astronomer deduced the precise position of the undiscovered planet, again making use of the superposition of gravitational forces, astronomers were able to closely attend to a specific region of the sky and promptly sighted the planet through a telescope. Our understanding of the solar system owes this discovery to the compositionality in Newton’s law of universal gravitation.
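As an illustration of the compositionality just described, the following sketch computes the net gravitational force on a body as the vector sum of pairwise forces, each obtained by applying the law to one pair in isolation; the masses and positions are invented for the example:

```python
# A sketch of compositionality in Newton's law of universal gravitation:
# the net force on body x is the vector sum of pairwise forces, each
# computed from the law applied to one pair in isolation. The masses and
# positions below are invented for illustration.

G = 6.674e-11  # gravitational constant in N m^2 / kg^2

def pairwise_force(m1, m2, p1, p2):
    """Force vector on body 1 from body 2: magnitude G*m1*m2/r^2, directed toward body 2."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    r = (dx ** 2 + dy ** 2) ** 0.5
    f = G * m1 * m2 / r ** 2
    return (f * dx / r, f * dy / r)

def net_force(target, others):
    """Superposition: each pairwise force is invariant to the presence of the rest."""
    forces = [pairwise_force(target["m"], o["m"], target["p"], o["p"]) for o in others]
    return (sum(fx for fx, _ in forces), sum(fy for _, fy in forces))

x = {"m": 6.0e24, "p": (0.0, 0.0)}              # an Earth-like body at the origin
others = [{"m": 2.0e30, "p": (1.5e11, 0.0)},    # a Sun-like body
          {"m": 7.3e22, "p": (0.0, 3.8e8)}]     # a Moon-like body
print(net_force(x, others))
```

Because each pairwise force is invariant to the presence of the other bodies, configurations never observed before can still be predicted analytically—the property the Neptune episode exploited.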

Review of the Causal Power Theory of the Probabilistic Contrast Model and Its Extensions

Not only does causal invariance motivate sublime scientific progress; it also underlies mundane everyday reasoning. For example, a child learning to ignite a match may notice that she usually has to strike the match three times before it ignites. On a rainy camping trip, she may notice that on the campground she strikes the match many more than three times and it still does not ignite. The deviation from what is expected based on generalization from experience in her kitchen at home may jolt her into searching for an explanation, a cause for the change. She may observe that rain dampened the matchbox and its content, and she may accordingly revise her causal structure regarding match ignition, adding the dryness of the match as a new component in her conjunctive cause of ignition. The more complex cause with interacting components holds across more contexts.

Remarkably, the psychological literature shows that people spontaneously assume causal invariance in their causal judgments, both for causal strength (e.g., Buehner, Cheng, & Clifford, 2003; Lu et al., 2008; Novick & Cheng, 2004) and for causal structure (evaluating whether or not a candidate is a cause of an effect; Liljeholm & Cheng, 2007; Lu et al., 2008; Lu et al., 2015; Wu & Cheng, 1999), even though they are probably unaware of making the assumption (Park, McGillivray, & Cheng, under review). People’s causal judgments are better explained by the causal power theory (Cheng, 1997; Lu et al., 2008; Novick & Cheng, 2004)—a causal induction theory that makes the causal-invariance assumption—than by theories that do not make that assumption. In the causal power theory, the causal-invariance assumption is embedded in the causal learning process.

Next we review the causal power theory (Cheng, 1997) and explain how people’s causal judgments reflect the causal invariance assumption. This theory is both a normative and a descriptive theory for the evaluation of the strength of a candidate cause c to produce, or to prevent, an effect e. The mathematics in the theory applies to candidate (p. 69) causes and effects that are binary variables with a “present” and an “absent” value. The theory is causal rather than merely associative in that it assumes the construct of causal power, an unobservable enduring capacity to influence the occurrence of e. (Binary cause and effect variables most readily illustrate the need for postulating causal power; however, the construct of causal power is not limited to binary variables, as our examples from science in this chapter illustrate.) The theory partitions all causes of effect e into the candidate cause in question, c, and a, a composite of all (observed and unobserved, known or unknown) alternative causes of e. “Alternative causes” of e include all and only those causes of e that are not on the same causal path to e as c. This partitioning is a general common-effect causal structure that maps onto all learning situations.

The generative causal power of c with respect to e is the unobservable probability with which c produces e (i.e., the probability that e occurs due to c occurring). Denoted by qc, it is the desired unknown, the causal strength to be estimated, when e occurs equally or more often in the presence of c than in its absence. But, when e occurs equally or less often in the presence of c than in its absence, the preventive causal power of c, denoted by pc, is the desired unknown. The idea of a cause producing an effect and of a cause preventing an effect are primitives in the theory. Two other relevant theoretical unknowns are qa, the probability with which a produces e when it occurs, and P(a), the probability with which a occurs. Because any causal power variable may have a value of 0, these variables are merely hypotheses—they do not presuppose that c and a indeed have causal influence on e.

The theory assumes four general prior beliefs (Cheng, 1997; Novick & Cheng, 2004; Lu et al., 2008):

1. c and a influence e independently;
2. a could produce e but not prevent it;
3. the power of a cause is independent of the frequency of occurrence of the cause; and
4. e does not occur unless it is caused.

Assumption 1 is a leap of faith inherent to this incremental learning variant of causal discovery. It is the assumption we termed “causal invariance” earlier and use throughout the rest of this chapter. This assumption enables causal relations to be learned one at a time, when there is information on only the occurrences of a single candidate cause and of an effect, the minimal types of information that support causal induction. The type of learning described by this theory therefore requires less processing capacity than standard causal Bayes nets (Pearl, 2000; Spirtes et al., 1993/2000), which do not make as strong an assumption. If two causes influence effect e independently, then the influence of each cause on e, as indicated by its causal power, remains unchanged regardless of whether e is influenced by the other cause. Like Assumption 1, Assumption 2 is a defeasible default, adopted until evidence discredits it. (Alternative models apply if either assumption is discredited; see Novick & Cheng, 2004, on conjunctive causes; see Cheng, 2000, for implications of the relaxation of these assumptions.) It is assumed that, as in associative models (see Le Pelley, Griffiths, & Beesley, Chapter 2 in this volume), for an effect in question the causal learner iterates through candidate causes of the effect, grouping all potential causes other than the candidate in question as the composite alternative cause.

These four assumptions imply a specific function for integrating the influences of multiple causes (Cheng, 1997; Glymour, 2001), different from the additive function assumed by associative models. For the situation in which a potentially generative candidate cause c occurs independently of other causes, the probability of observing the effect e is given by a noisy-OR function,

$$P(e = 1 \mid c) = q_c\,c + w_a - q_c\,c\,w_a \tag{1}$$

where c ∈ {0, 1} denotes the absence and the presence of candidate cause c, e ∈ {0, 1} denotes the absence and the presence of effect e, and wa represents P(a = 1) · qa. As just mentioned, the variable qc represents the generative power of the candidate cause c. The variable wa represents P(a = 1) · qa because P(a = 1) is unknowable (some alternative cause may be unobserved) and one cannot estimate the causal power of an unobserved cause. In the preventive case, the same assumptions are made, except that c is potentially preventive. The resulting noisy-AND-NOT integration function for preventive causes is

$$P(e = 1 \mid c) = w_a\,(1 - p_c\,c) \tag{2}$$

where pc is the preventive causal power of c.

Using these “noisy-logical” integration functions (terminology due to Yuille & Lu, 2008), Cheng (1997) derives normative quantitative predictions for judgments of causal strength. She shows that (p. 70) when there is confounding [i.e., when P(a = 1) differs in the presence and absence of c], qc and pc cannot be estimated: in each case there are four unknowns in one equation. But, under the four assumptions just listed, together with the assumption that there is no confounding [i.e., when P(a = 1 | c = 1) = P(a = 1 | c = 0)], the causal power theory explains the two observable probabilities as follows:

$$P(e = 1 \mid c = 1) = q_c + w_a - q_c\,w_a \tag{3}$$

$$P(e = 1 \mid c = 0) = w_a \tag{4}$$

Equation 3 “explains” that when c has occurred, e is produced by c or by the composite a, non-exclusively (e is jointly produced by both with a probability that follows from the independent influence of c and a on e). Equation 4 “explains” that when c does not occur, e is produced by a alone. Now, if one is willing to assume “no confounding,” substituting wa with P(e = 1 | c = 0) in Equation 3 (by making use of Equation 4) and rearranging the resulting equation gives Equation 5,


$$q_c = \frac{P(e = 1 \mid c = 1) - P(e = 1 \mid c = 0)}{1 - P(e = 1 \mid c = 0)} \tag{5}$$

in which all variables other than qc are observable. Therefore, qc can be solved for. Thus, the theory explains why “no confounding” is a prerequisite for causal inference in the principles of experimental design. Being able to solve for qc only under the condition of independent occurrence also explains why manipulation by free will encourages causal inference in everyday life—reasoners believe that alternative causes are less likely to covary with their action if they are free to decide when to act. An analogous explanation yields pc, the power of c to prevent e:

$$p_c = \frac{P(e = 1 \mid c = 0) - P(e = 1 \mid c = 1)}{P(e = 1 \mid c = 0)} \tag{6}$$
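For readers who want to verify the algebra, here is a small sketch of Equations 1, 2, 5, and 6; the function names are ours, and the worked numbers anticipate the Buehner et al. (2003) condition discussed in the next paragraphs:

```python
# A sketch of the noisy-logical integration functions and the resulting
# causal power estimates (Equations 1, 2, 5, and 6), assuming binary cause
# and effect and "no confounding". Function names are ours.

def noisy_or(c, q_c, w_a):
    """Equation 1: P(e=1|c), where c is 0 or 1 and w_a = P(a=1) * q_a."""
    return q_c * c + w_a - q_c * c * w_a

def noisy_and_not(c, p_c, w_a):
    """Equation 2: P(e=1|c) for a potentially preventive candidate cause."""
    return w_a * (1.0 - p_c * c)

def generative_power(p_e_c1, p_e_c0):
    """Equation 5: q_c from the two observable probabilities."""
    return (p_e_c1 - p_e_c0) / (1.0 - p_e_c0)

def preventive_power(p_e_c1, p_e_c0):
    """Equation 6: p_c from the two observable probabilities."""
    return (p_e_c0 - p_e_c1) / p_e_c0

# Worked example (Buehner et al., 2003, discussed below): e occurred in
# 12/36 trials without c and in 30/36 trials with c.
q_c = generative_power(30 / 36, 12 / 36)
print(q_c)                         # 0.75, the modal human judgment
print(noisy_or(1, q_c, 12 / 36))   # 5/6, reproducing the observed 30 of 36 trials
# The additive (associative) estimate is 30/36 - 12/36 = 0.5; plugging it
# back in, noisy_or(1, 0.5, 12/36) = 2/3, i.e., only 24 of 36 trials,
# contrary to the data -- that strength does not transfer across contexts.
```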

Empirical evidence across a variety of experimental methods supports the causal power theory and its extensions (e.g., Buehner et al., 2003; Cheng, 1997; Liljeholm & Cheng, 2007; Lu et al., 2008; Lu et al., 2015; Wu & Cheng, 1999). For example, Buehner et al. (2003, Experiment 2) studied human adult causal inference in the simplest of situations: inference regarding the strength of a candidate cause based on information regarding the occurrence of an effect in the presence and absence of the candidate cause. (Participants were encouraged to believe, and apparently did believe, that the situation involved no confounding.) Buehner et al. found that across 10 experimental conditions varying in data patterns, participants' median and modal generative and preventive causal judgments exactly matched the predictions according to causal power. In one condition, for instance, the data pattern instantiated in 72 independent trials showed that effect e occurred in 12 of the 36 trials in the absence of candidate cause c, and in 30 of the 36 trials in the presence of c. In accord with Equation 5, participants estimated that e occurs in 75 out of 100 trials in a new context in which c is the only cause of e present; that is, c produces e with a .75 probability.

The observed causal judgment in the preceding condition is consistent with the causal invariance assumption: specifically, that (a) the background cause (composite a in the theory) produces e with the same probability of 1/3 across two contexts, in the presence of c and in its absence, and (b) c produces e with the same probability of .75 across two contexts, in the presence of a and in its absence. Under the invariance assumption, c and the background cause in combination would produce e 5/6 of the time (see the noisy-OR function in Equation 1), that is, in 30 of the 36 trials in the presence of c, as shown in the data pattern.

The role played by the causal-invariance assumption in arriving at intuitive causal judgments may become clearer if we contrast the causal-power prediction with the prediction according to the additive function assumed by dominant associative models (e.g., Rescorla & Wagner, 1972) for the just-mentioned condition in Buehner et al. (2003). The latter models predict that c produces e with a .50 probability (5/6 − 1/3 = 1/2); that is, participants should predict that in 100 new trials in the new context in which no other cause of e occurs, the presence of c would result in e occurring in 50 trials. To see how this associative prediction violates causal invariance, recall that in the data pattern, in the absence of c, e did not occur in 24 of the 36 trials. Assuming “no confounding” and the invariance of the background cause a, one would expect that in the presence of c, e would likewise not have occurred, setting aside c's own influence, in about 24 of the 36 trials. Therefore, if c produces e with a .50 probability in each of these 24 trials, e should occur in 12 of them. That is, contrary to the data pattern, in the presence of c, e would have occurred in only 24 rather than 30 of the 36 trials (p. 71) (12 caused by c in addition to the 12 caused by the background). In other words, c produces e with two different probabilities (.75 and .50, respectively) across contexts, in the presence versus the absence of background cause a. The causal strength inferred without the causal-invariance assumption should not and does not transfer to a new context.

Griffiths and Tenenbaum (2005) show that if one represents prior uncertainty about the possible causal strengths of all causes by a uniform prior distribution of causal strengths, then Equations 5 and 6, respectively, give maximum likelihood point estimates of the generative and preventive powers of the candidate cause; that is, they are the peak of the posterior likelihood distributions. In a meta-analysis of causal strength judgments across 114 conditions evaluated in Perales and Shanks (2007), Lu et al. (2008) show that a Bayesian causal-strength model incorporating causal invariance functions (noisy-OR and noisy-AND-NOT) and a uniform prior, with no free parameters, obtains a high correlation with human causal strength judgments (r = .96), higher than that shown by any existing model that does not adopt the causal-invariance assumption, including post hoc models with as many as four free parameters.

Lu et al. (2008) developed a Bayesian variant of the causal power theory (Cheng, 1997). Their SS (Sparse and Strong) power model assumes that, other things being equal, causal learning favors stronger causes and fewer causes. These assumptions are incorporated as a prior joint distribution of the causal strengths of candidate c and alternative composite a (see Griffiths, Chapter 7 in this volume, for further discussion of causal priors). Returning to Buehner et al.'s (2003) Experiment 2, although the median and modal causal judgments were exactly as predicted by causal power, the mean judgments show small but systematic deviations from the predicted values. In particular, for conditions in which alternative causes produce the effect with a high probability, judgments that a candidate has high generative causal strength tended to have lower values and to deviate more from causal-power predictions than the analogous preventive strength judgments. Lu et al.'s SS power model explains this asymmetry, as well as the finding that human judgments of causal structure (relative to the chi-square test and to a Bayesian model with uniform priors) are influenced more by the causal powers of the candidate and the alternative cause and less by sample size.
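As a concrete check, the condition just discussed can be computed directly; the following sketch (our illustration) reproduces the .75 causal-power estimate and contrasts it with the additive ΔP prediction:

```python
p_e_without_c = 12 / 36  # P(e=1|c=0): background a alone, 12 of 36 trials
p_e_with_c    = 30 / 36  # P(e=1|c=1): c and a together, 30 of 36 trials

# Causal power (Equation 5): matches participants' median judgment.
q_c = (p_e_with_c - p_e_without_c) / (1 - p_e_without_c)
print(q_c)  # 0.75

# Invariance check: the noisy-OR combination of the two unchanged powers
# recovers the observed rate in the presence of c (5/6, i.e., 30 of 36).
print(q_c + p_e_without_c - q_c * p_e_without_c)  # 0.8333...

# The additive (associative) estimate is delta-P, which, as explained in
# the text, implies two different strengths for c across contexts.
print(p_e_with_c - p_e_without_c)  # 0.5
```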
The causal-invariance assumption that people use as a default assumption in their intuitive causal judgments is the same assumption that motivates many groundbreaking scientific advances. The mathematics expressing causal invariance take different forms depending on the type of variables representing the causes and effects, but these forms share the core concept of a causal mechanism operating the same way across contexts in which other causes of the outcome occur with different probabilities.

Causal Invariance as an Aspiration, a Default Assumption, and a Criterion for Hypothesis Revision: Three Roles of an Exit Condition in the While Loop of Causal Knowledge Construction

Given that developing usable causal knowledge is a goal for any intelligent adaptive system, and causal invariance is essential for usable causal knowledge, constructing invariant causal representations should be an aspiration. It may seem that causal invariance as an aspiration is unrelated to causal invariance as a default assumption. When scientists do a statistical test to measure and describe whether manipulated factors in an experiment interact, the results can turn out either way. The factors may, or may not, show “no interaction.” When causal invariance as a default is violated, a scientist may well settle for the interaction observed and give up the aspiration (Luhmann & Ahn, 2005).

But consider the issue from a top-down perspective. Carroll and Cheng (under review) note that if causal invariance is an aspiration, it should also be a default assumption and a criterion for hypothesis revision: one would expect a reasonable algorithm of the causal learning process to check whether causal invariance already holds before revising a hypothesis. The checking requires causal invariance to be a default. If and only if causal invariance is violated would there be a need to reformulate one's causal knowledge toward the goal of greater causal invariance. Revising contingent on the violation of causal invariance makes use of the criterion role of the assumption. Under this conception, the aspiration, default, and criterion aspects of causal invariance merge into the same unitary role as the stopping condition in the iterative process of representation construction. The violation of the default assumption of causal invariance is a criterion for hypothesis revision aimed at recapturing causal invariance. Using the metaphor of a while loop in computer programming, causal invariance (p. 72) is the “exit condition” in our while loop of causal knowledge construction.

The algorithm is as follows: Begin with the most parsimonious causal explanation available for a target phenomenon (a desired outcome or a set of observations of interest). If applications of this explanation result in evidence consistent with causal invariance, exit the while loop. Otherwise, modify the old explanation in some way (when resources are available) to attain the most parsimonious invariant explanation, which will supersede the refuted explanation. This process is iterated for further applications of causal knowledge. As the exit condition in a while loop, the concept of causal invariance is a navigating device for the journey of formulating the simplest explanation of phenomena that generalizes across contexts (Carroll & Cheng, 2010). The device gives a signal whenever the reasoner deviates from the path. (See Waldmann & Hagmayer, 2006, for situations under which people prefer not to revise a representation.)
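Rendered as actual code, the loop looks as follows (a schematic sketch; the helper functions named here are hypothetical placeholders, not a committed implementation):

```python
# A schematic rendering of the while loop described above. The helpers
# apply_and_observe, consistent_with_invariance, and revise are
# hypothetical placeholders.
def construct_causal_knowledge(explanation, apply_and_observe,
                               consistent_with_invariance, revise,
                               max_revisions=100):
    """Iterate until causal invariance (the exit condition) is attained."""
    for _ in range(max_revisions):          # "when resources are available"
        evidence = apply_and_observe(explanation)
        if consistent_with_invariance(explanation, evidence):
            return explanation              # exit the while loop
        # Seek a more parsimonious invariant account and iterate again.
        explanation = revise(explanation, evidence)
    return explanation
```

Nothing in the loop fixes what `revise` does; the point is only that invariance supplies both the exit test and the trigger for revision.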

Other Potential Puzzles Obscuring the Essential Role of Causal Invariance: Is Causal Invariance Merely Wishful Thinking?

As just mentioned, scientists often obtain experimental results that show an interaction between hypothesized factors, and causal invariance as a default may seem to be no more than a weak heuristic. To be sure, causal invariance would be wishful thinking if our representation were arbitrary and fixed. But, in science and in everyday life, reasoners revise their representations toward greater causal invariance, to form categories or concepts that obey causal laws (Lewis, 1929; Lien & Cheng, 2000). The philosopher of science James Woodward (2000, 2003) analyzes a broad range of scientific theories to show that causal invariance across contexts, rather than subsumption under an exceptionless general law, is the critical feature underlying scientific explanations. A more satisfying explanation means greater causal invariance. Earlier we saw that to restore the invariance of Newton's law of universal gravitation, astronomers modified their representation of the solar system by positing a hidden cause, a previously unobserved planet. In our match ignition example, the child striking a damp match notices a deviation from her expectation under causal invariance, and adds the dryness of the match as a component cause to arrive at a more invariant causal representation. Lien and Cheng asked college students to play the role of a gardener's assistant whose task is to learn to predict which fertilizers make plants bloom, given frequency data regarding blooming. The experiment shows that participants revised their representation (of potential fertilizers of unknown chemical compositions) from their original representation in terms of familiar perceptual features (e.g., blue fertilizers cause blooming) to one involving novel abstract perceptual features that they spontaneously formulated in the learning context of the experiment (e.g., cool-colored fertilizers cause blooming, or fertilizers with bilaterally asymmetrical granules cause blooming). Their novel representation best supported the generalization of their acquired causal knowledge in transfer tests.

Soccer Balls Versus Carbon Molecules Traveling Through Two Slits

Here we illustrate with an example from science in which aspiring toward causal invariance led to a radical representational change. We borrow Hawking and Mlodinow's (2010) explanation of the phenomenon. The example is highly intuitive at a qualitative level despite the technical quantitative background information.

Consider two analogous scenarios involving a soccer player shooting soccer balls one at a time through two slits in a wall. On the far side is a long net parallel to the wall that catches the balls (see Figure 5.2 a). If the soccer player has somewhat shaky aim but launches balls with a consistent speed, a time-lapse photograph in that case might show the obvious pattern of landings at the net illustrated in Figure 5.2 a. In that situation, if one were to close off one slit, so that the corresponding stream of soccer balls through that slit would no longer get through, this would have no influence on the other stream. In other words, what we expect with both slits open (p. 73) is the superposition of what we would expect with each slit in the wall separately opened.

Figure 5.2 (a) Two-slit soccer. A soccer player kicking balls at slits in a wall would produce an obvious pattern. Source: Hawking & Mlodinow (2010).

Figure 5.2 (b) Buckyball soccer. When molecular soccer balls are fired at slits in a screen, the resulting pattern reflects unfamiliar quantum laws. Source: Hawking & Mlodinow (2010).

Now, consider a second scenario. Suppose that when only one slit is open, the pattern of balls collected at the net is exactly as would be expected based on the preceding scenario. But, when both slits are open, the pattern of balls at the net is instead as shown in Figure 5.2 b.

If we observed these stunning results, how should we revise our model of how balls behave? The landing pattern depicted in Figure 5.2 b is in fact what occurred when the particles shot one at a time through the two slits were much smaller objects, ranging from photons and electrons to ball-shaped molecules each made of 60 carbon atoms (the last are labeled “buckyballs” in Figure 5.2 b; for a more detailed explanation for non-physicists, see Hawking & Mlodinow, 2010, chapter 4). Some physicists proposed that particles that are very small behave as if they are also waves when there is uncertainty regarding their position, such as which slit they pass through. According to this theory, the phases of waves passing through each of the slits superpose to form the outcome pattern. At positions on the net where the waves of the buckyballs traveling through the two slits are in phase, the amplitudes of the waves sum up, resulting in a pile-up of buckyballs. At positions where the waves from the two slits are out of phase (a crest of one wave coincides with the trough of another wave), the amplitudes of the waves cancel each other out, resulting in the absence of buckyballs. When only one slit is open, there is no wave interference effect, in which case waves passing through the slit arrive at the net in a pattern similar to that of particles.

Revising one's representation toward greater causal invariance is not the only hypothesis-revision strategy. One might instead represent the balls through the two slits as interacting, and attempt to construct a function describing the particular form in which balls from the two streams interact. For example, one might develop alternative functional forms by applying a hierarchical Bayesian causal model that learns different functional forms depending on the value of a parameter in the model (cf. Griffiths, Chapter 7 in this volume; Lucas & Griffiths, 2010). Functional forms, sometimes called parameterizations, are mathematical specifications describing how types of variables combine their influences. Causal invariance functions are functional forms that specify that causes combine their capacities by superposing them, each cause retaining its capacity in the combination as if the other causes were not there.

Although learning functional form is clearly important in causal inference, providing a way of specifying interacting causal factors, understanding causal learning only in terms of learning functional forms would be problematic for two reasons. First, the resulting function or structure may not show compositionality. For example, if one of the slits in the wall is shifted slightly to the left, or if a third slit is added, different functional forms would be required. Because the pattern of balls landing at the net changes with each modification of the slits, the acquired functional forms would not be useful for predicting the pattern for any novel situation. Second, without causal invariance as the default expectation (e.g., as in Figure 5.2 a), there is no deviation from expectation to guide revision. This second problem is general across approaches that do not treat causal invariance as an aspiration and instead aim to form the best post hoc models.

In our examples, scientists confronted with a violation of causal invariance sought to formulate new concepts, by revising their representation, to re-attain causal invariance. Concept formation and tests of causal invariance go hand in hand to enable the achievement of causal invariance.


A Gap Between Causal Invariance as Aspiration and as Description: In What Way Can Interacting Causal Factors Be Invariant?

Carroll and Cheng (under review) note that, representational changes notwithstanding, there is a gap between the roles of causal invariance as aspiration and as a description of findings. Description is a fourth role of the concept of causal invariance, the role most familiar to us in an explicit form. Even (p. 74) allowing for representation revision, causal invariance may still be wishful thinking. Elemental forces that operate independently of other forces are the exception rather than the rule. Most causes we encounter are represented as complex causes, with multiple factors interacting to produce an outcome. Carroll and Cheng explain that because causal invariance concerns invariance across contexts, it does not concern the independence of the factors constituting a potentially complex cause with respect to each other. Instead, the complex cause as a whole is potentially what is invariant across contexts.

Returning to our gravitation example, the gravitational attraction F between two masses m1 and m2 is an interaction specified as follows, according to Newton's law of universal gravitation:
$$F = G \, \frac{m_1 m_2}{r^2}$$
where G is the gravitational constant and r is the distance between the centers of the masses. The three variables m1, m2, and r do not exert separate influences that superpose to yield the overall force. Invariance holds at the level of the “whole cause” rather than at the level of its parts, for F rather than for m1, m2, and r. The gravitational forces on a celestial body due to other celestial bodies do superpose; the total gravitational force on a planet (e.g., Uranus) is the vector sum of the forces from each of the nearby celestial bodies (the sun and other planets). The gravitational force from each nearby body remains the same (as specified by the law) regardless of the forces from other bodies that happen to be nearby.

The complex cause that is ideally invariant across contexts can also take the form of a conjunction of necessary factors that are not individually sufficient to produce the effect (Mackie, 1980). For example, in our match ignition example, the interacting factors in the girl's complex cause might be the following: the match is made of flammable material, the match head and strike pad are made of materials that are chemically reactive when combined, the striking creates enough friction to generate high heat, the match has to be dry, and so on.

To summarize, whereas the concept of independence in its statistical usage is applied to describe the relation between factors that are potentially merely components of a cause, causal invariance as aspiration is independence at the level of the “whole cause” with respect to other whole causes that occur in different contexts. Thus, the candidate cause c in the causal power theory is not necessarily an elemental cause. It can stand for a complex cause that is potentially invariant across contexts, whose components are identified and described by an interaction among the components.
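A small numerical sketch (our illustration, with rough masses and distances, reduced to one dimension) keeps the two levels apart: F is an interaction of its components, yet whole-cause forces superpose across bodies.

```python
G = 6.674e-11  # gravitational constant, N m^2 kg^-2

def gravity(m1, m2, r):
    # The "whole cause": an interaction of m1, m2, and r; these components
    # do not superpose among themselves.
    return G * m1 * m2 / r**2

# Whole-cause forces do superpose: the total force on Uranus is the sum of
# the pairwise forces, each unchanged by the presence of the other body.
# (Rough illustrative values for the Sun-Uranus and Saturn-Uranus pairs.)
f_from_sun = gravity(2.0e30, 8.7e25, 2.9e12)
f_from_saturn = gravity(5.7e26, 8.7e25, 1.5e12)
total = f_from_sun + f_from_saturn  # one-dimensional stand-in for vector sum
print(f_from_sun, f_from_saturn, total)
```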

Is Causal Invariance Unnecessary if We Do Not Seek to Generalize, Applying Causal Knowledge Only Within the Sampled Population?

Because extrapolation is speculative, scientists are careful not to extrapolate. For example, the effect of a drug treatment on rats may not generalize to humans. In fact, the default assumption in scientists' application of statistics is to avoid generalizing beyond the population sampled. In that sense, generalization is not normative. Now, in everyday inference, where representative sampling does not take place, there is no choice but to extrapolate beyond the learning sample (see Luhmann & Ahn, 2005, for an alternative view). But, for a scientist using statistical inference who does not seek to generalize, would there be no role for causal invariance as a default assumption?

Even an intended interpolative application of causal knowledge inevitably assumes causal invariance. Generalization in an experimental science using statistics and everyday causal inference are the same in two respects. First, time does not stand still. Time elapses between data collection and the application of the acquired causal knowledge, potentially changing some causes of the target outcome across the two episodes. The aging of a population sampled in a country, technological innovation, or climate change may alter the relevant causal factors from the learning to the application context. Thus, one typically has no choice but to generalize to a new causal context.

Second, the acquisition of causal knowledge itself already involves the causal invariance assumption, as we illustrated earlier with a data pattern used in Buehner et al. (2003). Specifically, contrasting the occurrence of an effect in two contexts requires the assumption that some component of the background cause operates the same way across the two contexts. For the acquisition of usable causal knowledge involving binary cause and effect variables, these two contexts are the contexts in which the target candidate cause is respectively present and absent. (Here we treat the background cause as “figure” and the candidate cause as “ground.”)

Thus, assuming invariance versus assuming a particular non-invariance function would lead to different statistical conclusions. We again use (p. 75) causation involving binary variables to illustrate our point. Bayesian causal-structure learning arrives at different conclusions depending on whether the models assume that the background cause and candidate cause (1) operate independently, or (2) combine their effects additively (Griffiths & Tenenbaum, 2005, 2009; Tenenbaum & Griffiths, 2001). The additive variant of the Bayesian model yields results analogous to associative measures such as the chi-square test.

Consider the left and right data sets depicted in Figure 5.3. Assume that the data for each set were collected from allergy patients who were randomly assigned to the experimental and control groups. In the figure, taking Medicine H versus not taking the medicine is the candidate cause variable, and headache versus no headache is the binary outcome variable. Each face in the figure represents the outcome for an allergy patient. A frowning face represents a patient who has a headache; a smiling face represents a patient who does not have a headache. The data sets are transpositions of each other: the two values of the candidate cause are transposed across sets, as are the two values of the outcome. These data sets are treated as equivalent by associative measures such as the chi-square statistic or an associative Bayesian structure-learning model that uses the additive generating function (Griffiths & Tenenbaum, 2005, 2009).

In assessing whether Medicine H is a cause of headaches, an associative measure arrives at an identical output value across sets: for both data sets, χ²(1, N = 36) = 4.5, p = .03, and additive Bayesian causal support = 1.85. The identity rests on assuming a specific form of interaction between some component of the background cause and the candidate cause. As we explained earlier, the additive function specifies non-invariance across contexts for causes of a binary outcome. In contrast, a Bayesian structure-learning model (Griffiths & Tenenbaum, 2005, 2009) that assumes the noisy-OR function, the function consistent with the causal-invariance assumption, yields different output values (noisy-OR Bayesian causal support = 1.38 and 1.89, respectively, for the left and right data sets), indicating less confidence in the causal judgment that Medicine H is a cause of headaches for the left data set than for the right data set. Thus, a decision has to be made regarding causal invariance even if we only interpolate, applying causal knowledge within the sampled population only: data sets that appear symmetrical to associative models yield different (p. 76) values in judgments of causal structure when invariance is assumed.

Figure 5.3 Two data sets illustrating the essential role of the causal invariance assumption in statistical inference regarding causal structure.
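This asymmetry can be verified numerically. The sketch below (our illustration; the 2 × 2 counts are hypothetical stand-ins chosen to yield the reported χ²(1, N = 36) = 4.5, not the actual frequencies in Figure 5.3) shows that the chi-square statistic is identical for a data set and its transposition, whereas Bayesian causal support computed with a noisy-OR likelihood and uniform priors (after Griffiths & Tenenbaum, 2005) differs between them:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2 x 2 counts standing in for Figure 5.3 (rows: cause present
# / absent; columns: effect present / absent). The right set transposes both
# the cause values and the outcome values of the left set.
left = np.array([[9, 9], [3, 15]])
right = np.array([[15, 3], [9, 9]])

for data in (left, right):
    chi2, p, _, _ = chi2_contingency(data, correction=False)
    print(round(chi2, 2), round(p, 3))  # identical for both: 4.5, 0.034

def causal_support(table, grid=np.linspace(0.005, 0.995, 200)):
    """log P(D | c -> e, noisy-OR) - log P(D | no link), uniform priors
    over the strengths, approximated on a grid."""
    (a, b), (c, d) = table            # a: c&e, b: c&~e, c: ~c&e, d: ~c&~e
    w0, w1 = np.meshgrid(grid, grid)  # background and candidate strengths
    p1 = w0 + w1 - w0 * w1            # noisy-OR: P(e=1 | c present)
    lik1 = (p1**a * (1 - p1)**b * w0**c * (1 - w0)**d).mean()
    lik0 = (grid**(a + c) * (1 - grid)**(b + d)).mean()
    return float(np.log(lik1 / lik0))

print(causal_support(left), causal_support(right))  # unequal values
```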

Empirical Knowledge of How Causes Do Combine Their Influences on an Outcome Versus Analytic Knowledge of How Causes Would Combine Their Influences if Their Influences Are Independent

There is a considerable psychological literature showing that domain-specific knowledge of how causes of an outcome combine their influences on that outcome affects subsequent learning on relations involving novel variables of the same type or in similar contexts (e.g., Beckers, De Houwer, Pineño, & Miller, 2005; Beckers, Miller, De Houwer, & Urushihara, 2006; Boddez, De Houwer, & Beckers, Chapter 4 in this volume; Griffiths & Tenenbaum, 2009; Lovibond, Been, Mitchell, Bouton, & Frohardt, 2003; Lucas & Griffiths, 2010; Mitchell, De Houwer, & Lovibond, 2009; Shanks & Darby, 1998; Sobel, Tenenbaum, & Gopnik, 2004; Tenenbaum & Griffiths, 2003). We certainly do not dispute that it is useful to learn how types of causes combine their influences and to let that knowledge guide prediction and subsequent learning regarding novel causes of those types. Much of our causal reasoning no doubt involves applying prior causal knowledge to novel instances of the categories we form (Ahn, Kalish, Medin, & Gelman, 1995; Ahn, Kim, & Lebowitz, Chapter 30 in this volume). But how was the prior causal knowledge created in the first place?

Note that when causal invariance for a specific hypothesis is violated, what is rejected is only causal invariance in its domain-specific descriptive role. Causal invariance in its “exit condition” role remains unchanged. The newly revised hypothesis will become the current input to the while loop of causal-knowledge construction. Causal invariance in its exit-condition role is driven top-down by the aspiration of formulating usable causal knowledge and is not rejected unless one gives up that aspiration.

It may be helpful here to distinguish between two kinds of knowledge regarding how causes combine their influences: (1) empirical knowledge of how specific causes (and types of causes) do combine their influences, and (2) analytic knowledge of what the combined influences of two causes would or should be if these causes operate independently, given information on how they each operate in isolation and on the mathematical properties of the variables involved (e.g., the outcome variable is a binary scalar, a continuous scalar, a vector, or a wave; the measurement scale is interval; the quantity measured is intensive versus extensive; Waldmann, 2007). Observations of outcomes in the world can only give us empirical knowledge, which concerns the truth of claims, but not analytic knowledge, which concerns the validity of inferences under a set of premises. Specifically, observing how causes combine their influences need not inform the reasoner on whether or not the causes did combine their influences independently.

Consider the hypothetical data sets in Figures 5.4 and 5.5 in turn (reprinted from Liljeholm & Cheng, 2007). Each figure displays results from two studies, Study I and Study II, testing allergy medicines for a potential side effect of causing headache. The representation is the same as in Figure 5.3 (a frowning face indicates that the patient has a headache). Assume that within Study I and within Study II, but not across studies, allergy patients were randomly assigned to an experimental group that received a treatment and a control group that did not receive the treatment. In Study I, the treatment was Medicine A. In Study II, the treatment was Medicines A and B. Allergy medicines A and B are the candidate causes. Based on the data in Studies I and II in Figure 5.4, the question is: What would be a reasoner's best bet that Medicine B has a side effect of causing headaches?

Presumably, the rationale is that if the influence of the treatment was the same across the two studies, then the influence can be accounted for by Medicine A alone, and Medicine B is not needed to explain the data. But what does the “same” influence mean? Notice that across the four panels in Figure 5.4, the relative frequency of headaches varies: the sameness of causal influences is not apparent in the data. No two panels have the same relative frequency of headache. Likewise for Figure 5.5. Because causation itself is not observable, it follows that the sameness of causal influence is not observable.

Liljeholm and Cheng (2007) showed that most adult participants presented with the data sets in Figure 5.4 bet that Medicine B has a side effect of causing headache, but most participants presented with the data sets in Figure 5.5 bet that Medicine B has no effect on headache. It is the reasoners' analytic understanding of what to expect for binary outcome variables (headache or no headache here) assuming causal invariance that explains their judgments. They use causal invariance as a criterion to (p. 77) determine whether to revise their causal structure by adding B as a cause of headache.

Figure 5.4 Two hypothetical data sets illustrating that judging the sameness of the influence of Medicine A across Study I and Study II (with random assignment of patients within but not across studies) requires analytic knowledge of the outcome pattern predicted by sameness. Only this knowledge allows the reasoner to judge that the observed pattern in this figure deviates from the expected pattern assuming sameness.

Figure 5.5 can provide a more direct illustration of this analytic understanding if we modify Study II in the figure so that the treatment is Medicine A only (i.e., omit Medicine B from the bottom right panel). Based on the modified study, one can estimate what Medicine A does in each patient in Study II. Now, the critical conditional question is: If Medicine A works the same way in Study I as in Study II, and we give Medicine A to every patient in the Study I control group (top left panel), what would be the expected pattern of outcome in the bottom panel? This expectation is an inference based on an analytic understanding of what sameness of causal capacity means.

To summarize, we have just illustrated that observing how causes do combine their influences does not inform the reasoner about whether or not the causes combine their influences independently. Instead, the reasoner's analytic knowledge of the causal invariance function for a binary outcome variable is what allows the reasoner to judge whether an observed pattern in the data confirms the expected pattern assuming causal invariance. Empirical knowledge is justified by experience and hence can be modified by experiencing patterns of events (e.g., as in Beckers et al., 2006, and similar experiments). Analytic knowledge is justified by reason and cannot be modified by such experience.
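The conditional question just posed can be answered mechanically once the invariance function for a binary outcome is fixed. A minimal sketch (with hypothetical counts, not the actual frequencies in Figure 5.5):

```python
# Hypothetical counts (not the actual frequencies in Figure 5.5). Suppose
# 4 of 24 control patients in Study I have headaches from context alone,
# and the modified Study II puts Medicine A's generative power at .5.
p_background = 4 / 24
q_a = 12 / 24

# Under causal invariance for a binary outcome (noisy-OR), giving A to the
# Study I control group should produce headaches at this rate:
expected = p_background + q_a * (1 - p_background)
print(expected)  # ~0.58, i.e., about 14 of 24 patients
```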

Is the “Exit Condition” Conception Simply Saying “Revise a Model When It Fails”?

A folk saying goes, “if it ain't broke, don't fix it.” Revising a hypothesis when causal invariance is violated may therefore seem obvious. A close examination, however, shows what may not be immediately obvious. First, violation of causal invariance is not the same as a (p. 78) causal law failing to work. A probabilistic hypothesis (e.g., smoking causes lung cancer) may fail in many instances, but if it is predictive, it will not be abandoned until a more invariant replacement comes along. As the philosopher of science Thomas Kuhn (1962) documented, even for deterministic phenomena, when a physical theory is well known to fail in certain ways (e.g., Ptolemy's geocentric model of the cosmos, Newton's theory of gravitation), it can nonetheless be retained for centuries until a better theory emerges. Kuhn's work clarified that a theory working or not working is not a criterion for theory change in the history of science.

Figure 5.5 Two hypothetical data sets illustrating that judging the sameness of the influence of Medicine A across Study I and Study II (with random assignment of patients within but not across studies) requires analytic knowledge of the outcome pattern that sameness predicts.

Second, although we suspect that scientists, like everyday reasoners, intuitively use causal invariance as a criterion for hypothesis revision, neither current machine learning nor conventional statistics, which is associative, adopts the criterion (Cheng et al., 2013). Three approaches to causal learning, namely the causal Bayes nets approach, the associative approach, and the causal power approach, adopt three different criteria for hypothesis revision (Carroll & Cheng, under review).

What Is Causal About Causal Invariance? An Illustration: Causal Invariance for Binary Cause-and-Effect Variables

So far we have just assumed an intuitive grasp of causal invariance, without explaining what is causal about causal invariance. Here we review and explain the concept of causal invariance according to the causal view (Cartwright, 1989; Cheng, 1997; Kant, 1781; Lu et al., 2008; Novick & Cheng, 2004). This view agrees with David Hume (1739) that causal knowledge is induced from non-causal data. Intervening between the observable input and the causal output, however, the causal view adds a causal explanation of the data. This explanation, under Kant's (1781) domain-general causal framework, posits the existence of such things as causal relations: theoretical events that yield observed phenomena. Why do we need a causal framework to interpret the data, intervening between the data and its causal output? Why should we not be content with associative definitions of invariance, which are simpler in that they forgo the intervening causal explanation?

To understand the motivation for a causal framework, consider the definition of causal invariance for discrete cause-and-effect variables. Recall the statistical definition of interaction as a second-order observed change: the observed change on the dependent variable due to one independent variable depends on the value or level of another independent variable. As Park et al. (under review) explain, this definition of interaction in terms of a violation of additivity is appropriate for defining the violation of causal invariance for continuous outcome variables. For such variables, in the range where the outcome is on an interval scale, every change in the value of the cause is accompanied by a corresponding observable change in the intensity of the outcome variable. Thus, causal invariance for continuous outcome variables manifests itself as additivity. For discrete outcome variables, however, representation in terms of observable changes alone cannot capture the constancy of a causal relation across contexts. To explain the inadequacy, we again make use of causes and effects that are either “present” or “absent.” (p. 79)

When effect e is binary, a factor's capacity to influence e may have no observable manifestations, even when there is no confounding. Suppose c is a cause of e that does not interact with any other cause of e. Yet, whenever e is already present (regardless of which other cause produced it), introducing c will be indistinguishable from introducing a noncausal factor: both interventions will yield no change in the state of e. Suppose someone is already dead (the binary outcome in question) from being hit by a car. Being hit by another car will show no change in the outcome (the person is still dead), despite the sameness of the forces underlying car accidents (the second car would have killed the person, too). In such occlusion events, unobservable causal capacities lose their mapping onto observable changes. The realization of the capacity of one factor (collision by the first car killing the victim) occludes another factor's capacity (collision by the second car).

Given the lack of constancy in this mapping, postulating capacities becomes crucial for representing a stable causal world. Not only would observable changes, as used in associative models, be inadequate; even actual causation in an episode, as used in epidemiological causal models (Rothman, Greenland, & Lash, 2008), would still be inadequate: actual causation changes depending on whether outcome e is already produced by some other cause. Just as we assume that objects occluded in the two-dimensional visual input on our retinas continue to exist in the spatial world, so should we assume that occluded causal capacities continue to exist in the causal world. According to the statistical definition of interaction, all causes of a binary outcome would interact with all other causes of the outcome. They can never be invariant across contexts. Such a description is uninformative. Thus, for binary outcome variables, the very definition of causal invariance, the basis for interpreting data during causal inference, requires the postulation of causal capacities. There is therefore a need for a generic, domain-general, prior assumption that causal capacities exist in the world.
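The occlusion point can be made concrete with the noisy-OR function (a short illustration of ours):

```python
def noisy_or(*capacities):
    # Independent capacities superpose: P(e=1) = 1 - prod(1 - w_i).
    p_e_absent = 1.0
    for w in capacities:
        p_e_absent *= 1.0 - w
    return 1.0 - p_e_absent

print(noisy_or(1.0))        # background alone already produces e: P(e) = 1
print(noisy_or(1.0, 0.75))  # adding c with capacity .75 changes nothing
print(noisy_or(0.0, 0.75))  # c's occluded capacity reappears when the
                            # background cause is absent
```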

Empirical Evidence in Humans and Rats for Causal Invariance Functions

We review two recent studies using Bayesian sequential learning models that provide evidence consistent with the assumption of causal invariance by humans and non-human animals (Lu et al., 2015). The assumption allows the sequential models to explain a long-standing discrepancy in the human and non-human causal and associative learning literatures involving the well-studied blocking paradigm (Kamin, 1969). The sequential models make different predictions depending on the appropriate causal-invariance function for an outcome variable type. Studies of human performance, as well as conditioning experiments with rats and other non-human animals (which by necessity involve sequential data), show that the order of data presentation can dramatically influence causal learning. A traditional blocking procedure, forward blocking, can serve as an example of a paradigm in which it is possible to compare the performance of non-human and human learners. In the forward blocking paradigm, the experimental group is presented with a number of A+ trials (i.e., cue A coupled with an outcome) in an initial learning phase, whereas the control group is not exposed to these pairings. Then, in a second learning phase, both groups are presented with AX+ trials (i.e., cue A and cue X presented together and coupled with the outcome). The transfer test concerns whether cue X comes to be considered a cause of the outcome. The blocking paradigm thus provides an opportunity to assess whether reasoners use causal invariance as a criterion for revising their causal structure to add cue X as a cause of the outcome.

The common finding from animal conditioning studies is that in the experimental condition, cue X is identified as clearly non-causal, even though it is always paired with the outcome, as evidenced (p. 80) by much weaker responses to cue X in the experimental group than in a control group (Kamin, 1969). Near-complete forward blocking has been demonstrated with non-human animals across a wide variety of procedures and species (Good & Macphail, 1994; Kamin, 1969; Kehoe, Schreurs, & Amodei, 1981; Le Pelley et al., Chapter 2 in this volume; Merchant & Moore, 1973). In contrast, when forward blocking is used in experiments on human causal learning, numerous studies have yielded blocking effects that were relatively weak (i.e., partial rather than complete blocking), or even failures to obtain this effect (Glautier, 2002; Lovibond et al., 2003; Vandorpe & De Houwer, 2005; Waldmann & Holyoak, 1992).

We review simulation results to explain the paradoxical findings in human and rat causal learning. As explained earlier, causal invariance for binary outcome variables is represented by the noisy-OR function, whereas causal invariance for continuous outcome variables is represented by an additivity or linear-sum function. Figure 5.6 shows the predicted mean weights of each cue as a function of the training trials in a forward blocking paradigm based on the causal invariance function for a binary outcome variable and for a continuous outcome variable (Lu et al., 2015). In stage 1, with six A+ trials, both sequential models capture the gradual increase of estimated causal strength for cue A as the number of observations increases. However, after six AX+ trials in stage 2, the model adopting the noisy-OR function generates predictions distinct from those of the model with the additivity function. The simulation with the additive model (right panel in Figure 5.6) indeed yields near-complete forward blocking (i.e., the predicted causal weight for cue X approaches the lowest possible rating), indicating the absence of a causal relation between cue X and the outcome.

The model assuming the noisy-OR function consistent with causal invariance (left panel in Figure 5.6) predicts no blocking effect, consistent with the human ratings observed by Vandorpe and De Houwer (2005). Vandorpe and De Houwer's study involves a binary outcome (“allergic reaction” vs. “no allergic reaction”). Thus, human sequential causal learning in a forward blocking paradigm involving a binary outcome is, appropriately, consistent with causal invariance for binary outcome variables as represented by the sequential model assuming the noisy-OR function. In contrast, the additive model is consistent with the common finding of complete or near-complete blocking in non-human species (Good & Macphail, 1994; Kamin, 1969; Kehoe et al., 1981; Merchant & Moore, 1973). Our interpretation is that the outcome variables in the non-human studies (p. 81) (e.g., amount of food, intensity of shock) were apparently perceived as continuous.
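Why the two functions pull apart in blocking can be seen in a toy Bayesian computation (our sketch, not the sequential model of Lu et al., 2015; the Gaussian noise level is an arbitrary assumption):

```python
import numpy as np

# Grid posteriors over the strength of the blocked cue X after training.
w_a = 1.0                      # strength of cue A after the A+ phase
w_x = np.linspace(0, 1, 101)   # candidate strengths for cue X
n_compound = 6                 # six AX+ trials, outcome present on each

# Binary outcome, noisy-OR: P(e=1 | A, X) = w_A + w_X - w_A * w_X.
p_e = w_a + w_x - w_a * w_x            # equals 1 everywhere when w_A = 1
lik_noisy_or = p_e ** n_compound
post_or = lik_noisy_or / lik_noisy_or.sum()
# Flat posterior: the AX+ data cannot constrain w_X, hence no blocking.

# Continuous outcome, linear-sum: predicted magnitude w_A + w_X for an
# observed magnitude of 1, with Gaussian noise (sd = 0.1, an assumption).
lik_linear = np.exp(-0.5 * ((w_a + w_x - 1.0) / 0.1) ** 2) ** n_compound
post_lin = lik_linear / lik_linear.sum()
# Posterior peaked at w_X = 0: near-complete blocking.

print(post_or.min(), post_or.max())   # equal: the posterior is flat
print(w_x[np.argmax(post_lin)])       # 0.0
```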


Figure 5.6 Model simulations of mean causal weights of each cue as a function of the number of training trials in a forward blocking paradigm (six A+ trials followed by six AX+ trials). The asterisks indicate the human causal rating for the target cue X reported by Vandorpe and De Houwer (2005). Left: model simulation with the noisy-OR function consistent with causal invariance for a binary outcome variable. Right: model simulation with the additive function consistent with causal invariance for a continuous outcome variable. The black solid lines show the predicted weights for the target cue X; the gray dashed lines show the predicted weights for cue A.


Table 5.1 Experimental Design in Blocking Paradigm with Pretraining

Condition            Group          Phase 1: Pretraining   Phase 2   Phase 3   Test
Subadditive          Experimental   4C+/4D+/4CD+           12A+      4AX+      X
                     Control        4C+/4D+/4CD+           12B+      4AX+      X
Irrelevant Element   Experimental   4C+/4D+/4E+            12A+      4AX+      X
                     Control        4C+/4D+/4E+            12B+      4AX+      X

Note: A, B, C, D, E, and X represent various visual and auditory cues as conditioned stimuli; + indicates a footshock as outcome. The numerical values indicate the number of trials. Source: Beckers et al. (2006, Experiment 1).


We consider in greater detail Beckers et al.'s (2006) experiments on forward blocking in rats. Animals were presented with cues that were associated with shocks while they pressed a lever for water. Because shock can occur with a range of intensities, it may be perceived as a continuous outcome even though the experimenters presented the shocks at a constant intensity level. The same Bayesian sequential model (Lu et al., 2015) was adopted to account for the results.

Table 5.1 schematizes the design of Experiment 1 in Beckers et al. (2006). Animals in the experimental group received forward blocking training (A+ followed by AX+); control animals did not receive blocking training (B+ followed by AX+). Before the actual blocking training (Phases 2 and 3), experimental and control animals were both exposed to a demonstration of two effective cues, C and D, that had subadditive pretraining (i.e., C+, D+, CD+) or to an irrelevant-element pretraining (i.e., C+, D+, E+). The number of lever-press responses to X after Phase 3 was measured for the animals in all four groups. The irrelevant-element condition is our focus here: it assesses whether rats perform in accordance with causal invariance for a continuous outcome as a default, as represented by a linear-sum function. The subadditive condition provides a contrasting baseline in which experience with how causes combine their influences in an interactive manner overrode the rats' default causal invariance assumption.

Beckers et al. (2006) used the suppression ratio of cue X as a measure of rats' causal judgment about cue X. A value of 0 for the suppression ratio corresponds to complete suppression of bar pressing (i.e., high fear of cue X), and a value of 0.5 corresponds to a complete lack of suppression (i.e., no fear of X). Figure 5.7 shows the mean suppression ratios for experimental and control animals in Experiment (p. 82) 1 of Beckers et al. (2006) and the predictions of selected models. The suppression ratio in the irrelevant-element condition was computed by the linear-sum model, because additivity, the causal invariance function for a continuous outcome, should be the default model according to our analysis. The noisy-MAX model, which represents an interaction between causes of a continuous outcome variable, was selected to model the suppression ratio for the subadditive condition. As the model simulation results in Figure 5.7 show, there is little difference in the suppression ratio between the experimental and control conditions for the noisy-MAX model, in agreement with the rat data for the subadditive pretraining condition. In contrast, the suppression ratios differ between the experimental and control groups under the linear-sum model, in agreement with the rat data for the irrelevant-element pretraining condition.
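The two combination rules and the suppression-ratio measure can be made concrete in a short sketch (hypothetical magnitudes and response counts; the deterministic max here merely stands in for the noisy-MAX rule):

```python
def linear_sum(w_a, w_x):
    # Causal invariance for a continuous outcome: influences superpose.
    return w_a + w_x

def noisy_max(w_a, w_x):
    # A deterministic stand-in for the noisy-MAX rule: the compound is no
    # stronger than its strongest component (an interactive combination).
    return max(w_a, w_x)

# Suppose Phase 2 establishes w_A = 1 (in shock-magnitude units) and the
# AX+ compound in Phase 3 also yields an outcome of magnitude 1. Under
# linear-sum, the compound prediction matches the data only if w_X = 0
# (blocking); under the max rule, any w_X <= 1 fits (no blocking).
print(linear_sum(1.0, 0.0))  # 1.0
print(noisy_max(1.0, 0.8))   # 1.0

def suppression_ratio(cs_responses, pre_cs_responses):
    # Standard conditioned-suppression measure:
    # responses during the cue / (during-cue + pre-cue responses).
    return cs_responses / (cs_responses + pre_cs_responses)

print(suppression_ratio(0, 20))   # 0.0: complete suppression (high fear)
print(suppression_ratio(20, 20))  # 0.5: no suppression (no fear)
```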



Figure 5.7 Mean suppression ratio for cue X in the experimental and control groups by pretraining conditions in a rat experiment (Beckers et al., 2006, Expt. 1). Black/white bars indicate the experimental/control group, respectively. (a) Suppression ratio in rat results. (b) Suppression ratio predicted by the noisy-MAX model (corresponding to the subadditive condition) and predicted by the linear-sum model (corresponding to the irrelevant element condition).

Causal invariance plays such an essential role in our construction of the causal representation of the world that it appears to be hardwired in both humans and rats. However, although the human and rat participants in the two experiments discussed apparently showed remarkable sensitivity to the two causal invariance functions, the use of the different functions may be due to differences between species. Stronger evidence for the two functions expressing causal invariance awaits experimental manipulation in each species.



Summary and Conclusions

In this chapter, we have reviewed explanations of the essential role of the concept of causal invariance in interpreting data and developing causal representations. We also have reviewed examples of belief revisions in science, as well as experimental findings on intuitive causal reasoning in humans and rats relevant to assessing the roles that causal invariance plays in causal knowledge construction. Given the unobservability of causal relations, the inexorably changing causal contexts, and the vast problem space of causal representation, where reality does not come with ready-made concepts, causal invariance is an essential constraint for representing how things work. The constraint shapes the representations to enable generalizing from learning contexts to application contexts, in support of the acquisition of usable causal knowledge. Causal invariance manifests itself as different mathematical functions for different types of outcome variables, such as vector addition, addition of the phases of waves, and the noisy-logicals for scalar binary outcome variables. Unifying the various manifestations is the concept of superposition: causes superpose their capacities to influence an outcome if the capacity of each cause to influence the outcome remains unchanged in the presence of other causes, as if the other causes were not there. Our review of empirical evidence supports our analysis, suggesting that in both scientific and intuitive causal reasoning, the latter in both humans and rats, reasoners construct their causal representations under the abstract constraint of causal invariance in its various mathematical manifestations.

One may ask, do our perceptual systems not provide us with the rudimentary variables and concepts, narrowing the search space? They do, but the computational issue regarding constraints for causal learning is simply pushed one step back: What constraints govern the evolution of our perceptual systems so that they guide our causal learning process to arrive at usable causal knowledge? Finally, what if we do not have causal invariance as our aspiration, or analytic knowledge of causal-invariance functions for at least the outcome variable types that are important for survival? As Park et al. note, in that case, our hypothesis testing would be like Alice asking the Cheshire cat for directions without knowing where she wants to go (Carroll, 1920).

Acknowledgments

Preparation of this chapter was in part supported by AFOSR grant FA9550-08-1-0489 to Cheng and NSF grant BCS-1353391 to Lu. We thank Chris Carroll and Jeffrey Bye for valuable discussion. We thank Michael Waldmann and James Woodward for insightful comments on an earlier draft of our chapter. Part of this chapter was presented at the 64th Annual Meeting of the Psychonomic Society, Long Beach, November 2014.

References

Ahn, W., Kalish, C. W., Medin, D. L., & Gelman, S. A. (1995). The role of covariation versus mechanism information in causal attribution. Cognition, 54, 299–352.

Beckers, T., De Houwer, J., Pineño, O., & Miller, R. R. (2005). Outcome additivity and outcome maximality influence cue competition in human causal learning. Journal of Experimental Psychology: Learning, Memory, & Cognition, 31, 238–249.

Beckers, T., Miller, R. R., De Houwer, J., & Urushihara, K. (2006). Reasoning rats: Forward blocking in Pavlovian animal conditioning is sensitive to constraints of causal inference. Journal of Experimental Psychology: General, 135, 92–102.

Buehner, M. J., Cheng, P. W., & Clifford, D. (2003). Covariation to causation: A test of the assumptions of causal power. (p. 83) Journal of Experimental Psychology: Learning, Memory, & Cognition, 29(6), 1119–1140.

Carroll, C. D., & Cheng, P. W. (2010). The induction of hidden causes: Causal mediation and violations of independent causal influence. In S. Ohlsson & R. Catrambone (Eds.), Proceedings of the 32nd annual conference of the Cognitive Science Society (pp. 913–918). Austin, TX: Cognitive Science Society.

Carroll, C. D., & Cheng, P. W. (under review). Causal invariance, hypothesis revision, and our representation of the causal world.

Carroll, L. (1920). Alice's adventures in Wonderland. New York: Macmillan.

Cartwright, N. (1989). Nature's capacities and their measurement. Oxford: Clarendon.

Cheng, P. W. (1997). From covariation to causation: A causal power theory. Psychological Review, 104, 367–405.

Cheng, P. W. (2000). Causality in the mind: Estimating contextual and conjunctive causal power. In F. Keil & R. Wilson (Eds.), Explanation and cognition (pp. 227–253). Cambridge, MA: MIT Press.

Cheng, P. W., & Buehner, M. (2012). Causal learning. In K. J. Holyoak & R. G. Morrison (Eds.), Oxford handbook of thinking and reasoning (pp. 210–233). New York: Oxford University Press.

Cheng, P. W., Novick, L. R., Liljeholm, M., & Ford, C. (2007). Explaining four psychological asymmetries in causal reasoning: Implications of causal assumptions for coherence. In M. O'Rourke (Ed.), Topics in contemporary philosophy, Vol. 4: Causation and explanation (pp. 1–32). Cambridge, MA: MIT Press.

Cheng, P. W., Liljeholm, M., & Sandhofer, C. (2013). Logical consistency and objectivity in causal learning. In Proceedings of the 35th annual conference of the Cognitive Science Society (pp. 2034–2039). Austin, TX: Cognitive Science Society.

Copernicus, N. (1543/1992). On the revolutions (E. Rosen, Trans.). Baltimore, MD: The Johns Hopkins University Press.


Einstein, A. (1916). The foundation of the general theory of relativity. In A. Einstein, H. A. Lorentz, H. Minkowski, & H. Weyl (1952), The principle of relativity (pp. 109–164). New York, NY: Dover. (Original work published 1923)

Glautier, S. (2002). Spatial separation of target and competitor cues enhances blocking of human causality judgements. Quarterly Journal of Experimental Psychology B, 55(2), 121–135.

Glymour, C. N. (2001). The mind's arrows: Bayes nets and graphical models in psychology. Cambridge, MA: MIT Press.

Good, M., & Macphail, E. M. (1994). Hippocampal lesions in pigeons (Columba livia) disrupt reinforced preexposure but not overshadowing or blocking. Quarterly Journal of Experimental Psychology, 47(3), 263–291.

Griffiths, T. L., & Tenenbaum, J. B. (2005). Structure and strength in causal induction. Cognitive Psychology, 51, 334–384.

Griffiths, T. L., & Tenenbaum, J. B. (2009). Theory-based causal induction. Psychological Review, 116(4), 661–716.

Hawking, S., & Mlodinow, L. (2010). The grand design. New York: Bantam Books.

Hume, D. (1739/1987). A treatise of human nature (2nd ed.). Oxford: Clarendon Press.

Hume, D. (1748/1975). An enquiry concerning human understanding (L. A. Selby-Bigge & P. H. Nidditch, Eds.; 3rd ed.). Oxford: Clarendon Press.

Kamin, L. J. (1969). Predictability, surprise, attention, and conditioning. In B. A. Campbell & R. M. Church (Eds.), Punishment and aversive behavior (pp. 279–296). New York: Appleton-Century-Crofts.

Kant, I. (1781/1965). Critique of pure reason. London: Macmillan.

Kehoe, E. J., Schreurs, B., & Amodei, N. (1981). Blocking acquisition of the rabbit's nictitating membrane response to serial conditioned stimuli. Learning and Motivation, 12(1), 92–108.

Kuhn, T. S. (1962/2012). The structure of scientific revolutions (4th ed.). Chicago: University of Chicago Press.

Lewis, C. I. (1929). Mind and the world order (Chapter XI). New York: Scribner.

Lien, Y., & Cheng, P. W. (2000). Distinguishing genuine from spurious causes: A coherence hypothesis. Cognitive Psychology, 40, 87–137.

Liljeholm, M., & Cheng, P. W. (2007). When is a cause the “same?” Psychological Science, 18(11), 1014–1021.


Lovibond, P. F., Been, S.-L., Mitchell, C. J., Bouton, M. E., & Frohardt, R. (2003). Forward and backward blocking of causal judgement is enhanced by additivity of effect magnitude. Memory & Cognition, 31, 133–142.

Lu, H., Rojas, R. R., Beckers, T., & Yuille, A. L. (2015). A Bayesian theory of sequential causal learning and abstract transfer. Cognitive Science, 40, 404–439. doi:10.1111/cogs.12236

Lu, H., Yuille, A. L., Liljeholm, M., Cheng, P. W., & Holyoak, K. J. (2008). Bayesian generic priors for causal learning. Psychological Review, 115(4), 955–984.

Lucas, C. G., & Griffiths, T. L. (2010). Learning the form of causal relationships using hierarchical Bayesian models. Cognitive Science, 34, 113–147.

Luhmann, C., & Ahn, W.-k. (2005). The meaning and computation of causal power: Comment on Cheng (1997) and Novick and Cheng (2004). Psychological Review, 112, 685–693.

Mackie, J. L. (1974/1980). The cement of the universe: A study of causation. Oxford: Clarendon Press.

Merchant, H. G., & Moore, J. W. (1973). Blocking of the rabbit's conditioned nictitating membrane response in Kamin's two-stage paradigm. Journal of Experimental Psychology, 101(1), 155.

Mermin, N. D. (2005). It's about time: Understanding Einstein's relativity. Princeton, NJ: Princeton University Press.

Michotte, A. (1946/1963). The perception of causality. New York: Basic Books.

Mitchell, C. J., De Houwer, J., & Lovibond, P. F. (2009). The propositional nature of human associative learning. Behavioral and Brain Sciences, 32, 183–198.

Newton, I. (1687/1713/1726/1999). The Principia: Mathematical principles of natural philosophy (I. B. Cohen & A. Whitman, Eds.). Berkeley and Los Angeles: University of California Press.

Novick, L. R., & Cheng, P. W. (2004). Assessing interactive causal influence. Psychological Review, 111, 455–485.

Pais, A. (2000). The genius of science: A portrait gallery. Oxford: Oxford University Press.

Park, J. Y., McGillivray, S., & Cheng, P. W. (under review). Causal invariance as an aspiration: Analytic knowledge of invariance functions.

Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. San Mateo, CA: Morgan Kaufmann.


Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge, UK: Cambridge University Press.
Perales, J. C., & Shanks, D. R. (2007). Models of covariation-based causal judgment: A review and synthesis. Psychonomic Bulletin & Review, 14, 577–596.

Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and non-reinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64–99). New York: Appleton-Century-Crofts.
Rothman, K. J., Greenland, S., & Lash, T. L. (2008). Modern epidemiology (3rd ed.). Philadelphia: Lippincott, Williams & Wilkins.
Scholl, B. J., & Tremoulet, P. D. (2000). Perceptual causality and animacy. Trends in Cognitive Sciences, 4(8), 299–309.
Shanks, D. R., & Darby, R. J. (1998). Feature- and rule-based generalization in human associative learning. Journal of Experimental Psychology: Animal Behavior Processes, 24(4), 405–415. doi:10.1037/0097-7403.24.4.405
Sobel, D. M., Tenenbaum, J. B., & Gopnik, A. (2004). Children's causal inferences from indirect evidence: Backwards blocking and Bayesian reasoning in preschoolers. Cognitive Science, 28, 303–333.
Spirtes, P., Glymour, C., & Scheines, R. (1993/2000). Causation, prediction and search (2nd ed.). Cambridge, MA: MIT Press.
Tenenbaum, J. B., & Griffiths, T. L. (2001). Structure learning in human causal induction. Advances in Neural Information Processing Systems, 13, 59–65. Cambridge, MA: MIT Press.
Tenenbaum, J. B., & Griffiths, T. L. (2003). Theory-based causal inference. In S. Becker, S. Thrun, & K. Obermayer (Eds.), Advances in Neural Information Processing Systems, 15, 35–42. Cambridge, MA: MIT Press.
Vandorpe, S., & De Houwer, J. (2005). A comparison of forward blocking and reduced overshadowing in human causal learning. Psychonomic Bulletin and Review, 12(5), 945–949.
Waldmann, M. R. (2007). Combining versus analyzing multiple causes: How domain assumptions and task context affect integration rules. Cognitive Science, 31, 233–256.
Waldmann, M. R., & Hagmayer, Y. (2006). Categories and causality: The neglected direction. Cognitive Psychology, 53, 27–58.


Waldmann, M. R., & Holyoak, K. J. (1992). Predictive and diagnostic learning within causal models: Asymmetries in cue competition. Journal of Experimental Psychology: General, 121, 222–236.
Woodward, J. (2000). Explanation and invariance in the special sciences. British Journal of the Philosophy of Science, 51, 197–254.
Woodward, J. (2003). Making things happen: A theory of causal explanation. Oxford Studies in the Philosophy of Science. Oxford: Oxford University Press.
Wu, M., & Cheng, P. W. (1999). Why causation need not follow from statistical association: Boundary conditions for the evaluation of generative and preventive causal powers. Psychological Science, 10, 92–97.
Yuille, A., & Lu, H. (2008). The noisy-logical distribution and its application to causal inference. Advances in Neural Information Processing Systems, 20, 1673–1680. Cambridge, MA: MIT Press.

Patricia W. Cheng
Department of Psychology, University of California, Los Angeles, Los Angeles, California, USA

Hongjing Lu
Department of Psychology, University of California, Los Angeles, Los Angeles, California, USA



The Acquisition and Use of Causal Structure Knowledge
Benjamin Margolin Rottman
The Oxford Handbook of Causal Reasoning
Edited by Michael R. Waldmann
Print Publication Date: Jun 2017
Subject: Psychology, Cognitive Psychology
Online Publication Date: May 2017
DOI: 10.1093/oxfordhb/9780199399550.013.10

Abstract and Keywords

This chapter provides an introduction to how humans learn and reason about multiple causal relations connected together in a causal structure. The first half of the chapter focuses on how people learn causal structures. The main topics involve learning from observations versus interventions, learning temporal versus atemporal causal structures, and learning the parameters of a causal structure, including individual cause-effect strengths and how multiple causes combine to produce an effect. The second half of the chapter focuses on how individuals reason about a causal structure once it has been learned, such as making predictions about one variable given knowledge about other variables. Some of the most important topics involve reasoning about observations versus interventions, how well people reason compared to normative models, and whether causal structure beliefs bias reasoning. In both sections the author highlights open empirical and theoretical questions.

Keywords: causal structure, learning, reasoning, cause-effect, empirical, theoretical

Introduction

In the past two decades, psychological research on causal learning has been strongly influenced by a normative framework developed by statisticians, computer scientists, and philosophers called causal Bayesian networks (CBN), or probabilistic directed acyclic graphical models. The psychological adoption of this computational approach is often called the CBN framework or causal models. The CBN framework provides a principled way to learn and reason about complex causal relations among multiple variables.

For example, Thornley (2013) used causal learning algorithms to extract the causal structure in Figure 6.1 from medical records. Having the causal structure is useful for experts such as epidemiologists and biologists to understand the disease and make predictions for groups of patients (e.g., the likelihood of having cardiovascular disease among 70-year-old smokers). It is also useful for scientists when planning future research; when researching cardiovascular disease as the primary outcome, it is critical to measure and account for smoking status and age, but it is not important to measure or statistically control for systolic blood pressure.

Though causal structures are surely useful for scientists, the causal models approach to causal reasoning hypothesizes that lay people also have an intuitive understanding of causal structures and use a framework similar to CBNs to learn and reason about causal relations. Adopting a "man as intuitive statistician" or "intuitive scientist" approach (Peterson & Beach, 1967), we can also contemplate how a doctor might develop a set of causal beliefs about cardiovascular disease somewhat akin to Figure 6.1. Of course the doctor likely has some knowledge of specific causal links from medical school and research articles. But these links may be reinforced or contradicted by personal experience, such as noticing which patients have which symptoms and diseases, and tracking how patients' symptoms change after starting a treatment. Developing a set of causal beliefs such as in Figure 6.1 would allow a physician to make prognoses and treatment plans tailored to individual patients.

Figure 6.1 Causal structure of cardiovascular disease. Source: Adapted from Thornley (2013).

The CBN framework supports all of these different functions: learning, prediction, explanation, and intervention. The rest of this chapter will explain what the CBN framework entails, the evidence pertaining to how people learn and reason about causal networks, and how closely humans appear to mimic the normative CBN framework.

The outline of this chapter is as follows. I first explain what CBNs are, both normatively and as a model of human learning and reasoning. The bulk of the first half of this chapter is devoted to evidence about how people learn about causal networks, including the structure, strength, and integration function. I then discuss evidence suggesting that instead of using the basic CBN framework, people may be using something akin to a generalized version of the CBN framework that allows for reasoning about time. The second half of the chapter is devoted to evidence on how people reason about their causal beliefs. At the end of the chapter I raise some questions for future research.



What Are Causal Bayesian Networks?

A causal Bayesian network (CBN) is a compact visual way to represent the causal relations between variables. Each variable is represented as a node, and arrows represent causal relations from causes to effects. The absence of an arrow implies the absence of a causal relation.

Though CBNs capture the causal relations among variables, they also summarize the statistical relations between the variables. The CBN framework explains how causal relations should be learned from statistical relations. For example, given a data set with a number of variables, the CBN framework has rules for figuring out the causal structure(s) that are most likely to have produced the data. Conversely, a causal structure can be read such that if the causal structure is believed to be true, it implies certain statistical relations between the variables: some sets of variables will be correlated and others will not be. In order to understand how to "read" a CBN, it is important to understand these relations between the causal arrows and the statistical properties they imply.

First, it is critical to understand some basic statistical terminology. "Unconditional dependence" is whether two variables are statistically related to (or "dependent on") each other (e.g., correlated) without controlling for any other variables. If they are correlated, they are said to be dependent, and if they are not correlated, they are said to be independent. Conditional dependence is whether two variables are statistically related to each other after controlling for one or more variables (e.g., whether there is a significant relation between two variables after controlling for a third variable in a multiple regression). Conditional and unconditional dependence and independence are critical for understanding a CBN, so it is important to be fluent with these terms before moving on.

There are two properties, the Markov property and the faithfulness assumption, that explain the relations between the causal arrows in a CBN and the statistical dependencies between the variables in a data set that is summarized by the CBN (also see Rehder, Chapters 20 and 21 in this volume). The Markov property states that once all the direct causes of a variable X are controlled for or held constant, X is statistically independent of every variable in the causal network that is not a direct or indirect effect of X.

For example, consider Figure 6.2b. The Markov assumption states that X will be independent of all variables (e.g., Z) that are not direct or indirect effects of X (X has no effects in Figure 6.2b) once controlling for all direct causes of X (Y is the only direct cause of X). Similar analyses can be used to see that X and Z are also independent conditional on Y in Figures 6.2a and 6.2c.

In regard to Figure 6.2d, the Markov assumption implies that X and Z are unconditionally independent (not correlated). Neither X nor Z has any direct causes, so X will be independent of all variables (such as Z) that are not a direct or indirect effect of X (e.g., Y). The Markov property is symmetric; if X is independent of Z, Z is independent of X: they are uncorrelated.
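To make the Markov property concrete, here is a minimal sketch (not from the chapter; the link probabilities 0.8 and 0.2 are hypothetical) that samples data from a chain X→Y→Z like the one in Figure 6.2a and checks that X and Z are unconditionally dependent but approximately independent once Y is held constant:

    import random

    random.seed(0)

    def sample_chain(n):
        """Sample n observations from the chain X -> Y -> Z with made-up CPTs."""
        data = []
        for _ in range(n):
            x = 1 if random.random() < 0.5 else 0
            y = 1 if random.random() < (0.8 if x else 0.2) else 0
            z = 1 if random.random() < (0.8 if y else 0.2) else 0
            data.append((x, y, z))
        return data

    def p_z_given(data, x=None, y=None):
        """Estimate P(Z = 1) among observations matching the given X and/or Y values."""
        rows = [r for r in data
                if (x is None or r[0] == x) and (y is None or r[1] == y)]
        return sum(r[2] for r in rows) / len(rows)

    data = sample_chain(100_000)
    # Unconditionally, X and Z are dependent: P(Z=1 | X=1) != P(Z=1 | X=0).
    print(p_z_given(data, x=1), p_z_given(data, x=0))
    # Conditional on Y, X carries no further information about Z (Markov property):
    print(p_z_given(data, x=1, y=1), p_z_given(data, x=0, y=1))

With enough samples, the two estimates in the last line converge to the same value, which is the sense in which Y "screens off" X from Z.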


Figure 6.2 Four CBNs.

The faithfulness assumption states that the only independencies between variables in a causal structure must be those implied by the Markov assumption (Glymour, 2001; Spirtes, Glymour, & Scheines, 1993). Stated another way, all variables in the structure will be dependent (correlated), except when the Markov property states that they would not be. This means that if we collect a very large amount of data from the structures in Figures 6.2a, 6.2b, or 6.2c, then X and Y, Y and Z, and X and Z would all be unconditionally dependent; the only independency between the variables arises from the Markov assumption, namely that X and Z are conditionally independent given Y. If we collected a large amount of data and noticed that X and Z were unconditionally independent, this independency in the data would not be "faithful" to Figures 6.2a, 6.2b, or 6.2c, implying that the data do not come from one of these structures. For Figure 6.2d, the only independency implied by the Markov property is that X and Z are unconditionally independent. If a large amount of data were collected from structure 6.2d, then X and Y, and Z and Y, would be dependent (according to the faithfulness assumption).

In sum, causal models provide a concise, intuitive, visual language for reasoning about complex webs of causal relations. The causal network diagram intuitively captures how the variables are causally and statistically related to each other. But causal networks can do much more than just describe the qualitative causal and statistical relations; they can precisely capture the quantitative relations between the variables.

To capture the quantitative relations among variables, causal networks need to be specified with a conditional probability distribution for each variable in the network given its direct causes. A conditional probability distribution establishes the likelihood that a variable such as Y will have a particular value given that another variable X (Y's cause) has a particular value. Additionally, exogenous variables, variables that have no known causes in the structure, are specified by a probability distribution representing the likelihood that the exogenous variable assumes a particular state.

For example, the CBN in Figure 6.2a would be specified by a probability distribution for X, a conditional distribution of Y given X, and a conditional distribution of Z given Y. If X, Y, and Z are binary (0 or 1) variables, the distribution for X would simply be the probability that x = 1, P(x = 1). The conditional probability of Y given X would be the probability that y = 1 given that x = 1, P(y = 1 | x = 1), and the probability that y = 1 given that x = 0, P(y = 1 | x = 0), and likewise for the conditional probability of Z given Y. (There is also another way to specify these conditional distributions with "causal strength" parameters, which will be discussed in later sections, and summarized in the section "Alternative Representations for Causal Reasoning." See also, in this volume, Cheng & Lu, Chapter 5; Griffiths, Chapter 7; and Rehder, Chapters 20 and 21, for more details about parameterizing a structure with causal strengths.)

If the variables are normally distributed continuous variables, the distribution for X would be captured by the mean and standard deviation of X. Then, the conditional distribution of Y would be captured by a regression coefficient of Y given X (e.g., the probability that y = 2.3 given that x = 1.7), as well as a parameter to capture the amount of error variance. The CBN in Figure 6.2c would be specified by a distribution for Y, and conditional probability distributions for X given Y and Z given Y. The CBN in Figure 6.2d would be specified by probability distributions for X and Z, and a conditional probability distribution for Y given both X and Z. In this way, a large causal structure is broken down into small units. Once all the individual probability distributions are specified, Bayesian inference can be used to make inferences about any variable in the network given any set of other variables, for example, the probability that y = 3.5 given that x = –0.7 and z = 1.1. CBNs also support inferring what would happen if one could intervene and set a node to a particular value. Being able to predict the result of an intervention allows an agent to choose the action that produces the most desired outcome.
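As a sketch of how such a parameterization supports inference, the following code (the parameter values are hypothetical, not the chapter's) specifies the binary chain of Figure 6.2a with one distribution per variable and computes P(z = 1 | x = 1) by summing the factorized joint over the unobserved variable Y:

    # Parameters for the chain X -> Y -> Z (Figure 6.2a), binary variables.
    p_x = 0.4                          # P(x = 1)
    p_y_given_x = {1: 0.75, 0: 0.10}   # P(y = 1 | x)
    p_z_given_y = {1: 0.67, 0: 0.05}   # P(z = 1 | y)

    def prob(value, p):
        """P(V = value) for a Bernoulli parameter p = P(V = 1)."""
        return p if value == 1 else 1.0 - p

    def joint(x, y, z):
        """The joint probability factorizes parent-by-parent: P(x) P(y|x) P(z|y)."""
        return (prob(x, p_x)
                * prob(y, p_y_given_x[x])
                * prob(z, p_z_given_y[y]))

    # P(z = 1 | x = 1) by summing the joint over the unobserved variable Y:
    numer = sum(joint(1, y, 1) for y in (0, 1))
    denom = sum(joint(1, y, z) for y in (0, 1) for z in (0, 1))
    print(numer / denom)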

CBNs have been tremendously influential across a wide range of fields, including computer science, statistics, engineering, epidemiology, management sciences, and philosophy (Pearl, 2000; Spirtes, Glymour, & Scheines, 2000). The CBN framework is extremely flexible and supports many different sorts of tasks. CBNs can be used to make precise predictions (including confidence intervals), and they can incorporate background knowledge or uncertainty (e.g., uncertainty in the structure, or uncertainty in the strengths of the causal relations) for sensitivity analysis. They can be extended to handle processes that occur over time. And since CBNs are an extension of probability theory, they can incorporate any probability distribution (logistic, multinomial, Gaussian, exponential). In sum, the CBN framework is an extremely flexible way to represent and reason about probabilistic causal relations.

What Is the Causal Bayesian Network Theory of Learning and Reasoning?

Most generally, the causal model theory of human learning and reasoning is that humans learn and reason about causal relations in ways that are similar to formal CBNs. This theory is part of a broader movement in psychology of using probabilistic Bayesian models as models of higher-level cognition.1

The broader movement of using probabilistic models as models of higher-level cognition is typically viewed at Marr's computational level of analysis: identifying the problem to be solved. Indeed, articles appealing to causal networks have fulfilled the promise of a computational-level model; for example, they have reframed the problem of human causal reasoning by clarifying the distinction between causal strength versus structure (Griffiths & Tenenbaum, 2005), and by identifying causal structure learning as a goal unto itself (Gopnik et al., 2004; Steyvers, Tenenbaum, Wagenmakers, & Blum, 2003).


Though the flexibility of the CBN framework is obviously a tremendous advantage for its utility, the flexibility makes it challenging to specify a constrained descriptive theory of human learning and reasoning. The theoretical underpinning of the CBN framework (e.g., learning algorithms, inference algorithms) is an active area of research, rather than a static theory. Additionally, there are many different instantiations of how to apply the framework in a specific instance (e.g., alternative learning algorithms, alternative parameterizations of a model).

Because of the flexibility and multifaceted nature of the CBN framework, it is not particularly useful to talk about the CBN framework as a whole. Instead, in the current chapter I focus on the fit between specific aspects of the framework and human reasoning, within a specific task.

Learning

Learning Causal Structure

Learning a Causal Structure from Observations

One of the most dramatic ways that the CBN framework has changed the field of human causal reasoning is by identifying causal structure learning as a primary goal for human reasoning (Steyvers et al., 2003). A fundamental principle of learning causal structure from observation is that it is often not possible to identify the exact causal structure; two or more structures may explain the data equally well. This is essentially a more sophisticated version of "correlation does not imply causation."

Consider the nine observations in Table 6.1, which summarizes the contingency between two variables, X and Y. The correlation between these two variables is .79. Just knowing that X and Y are correlated cannot tell us whether X causes Y [X→Y] or whether Y causes X [X←Y]. (Technically it is also possible that a third factor causes both X and Y, but I ignore this option for simplicity of explanation.) One simple way to see this is that under both of these causal structures, [X→Y] and [X←Y], we would expect X and Y to be correlated, so the fact that they are correlated cannot tell us anything about the causal structure. A more sophisticated way to understand how it is impossible to determine the true causal structure just from observing the data in Table 6.1 is to see how parameters can be created for both causal structures that fit the data equally well. This means that the structures are equally likely to have produced the data.


Table 6.1 Sample Data for Two Variables

X   Y   Number of Observations
1   1   3
1   0   1
0   1   0
0   0   5

Parameters That Fit the Data Perfectly for Each Causal Structure

X→Y: P(x = 1) = 4/9; P(y = 1 | x = 1) = 3/4; P(y = 1 | x = 0) = 0
X←Y: P(y = 1) = 3/9; P(x = 1 | y = 1) = 1; P(x = 1 | y = 0) = 1/6

The right side of Table 6.1 shows parameters for the respective causal structure that fit the data perfectly. For example, for [X→Y], we need to find three parameters to specify the structure. The base rate of X, P(x = 1), can be obtained by calculating the percentage of times that X = 1 regardless of Y ((3+1)/9). P(y = 1 | x = 1) is simply the percent of times that Y = 1 given that X = 1, and P(y = 1 | x = 0) is the percent of times that Y = 1 given that X = 0. Parameters can be deduced for [X←Y] in a similar fashion. If we simulated a large number of observations that we would expect to see from each causal structure with the parameters specified in Table 6.1, we would find that both structures would produce data with proportions that look similar to the data in Table 6.1. Specifically, in the long run we would observe trials in which both variables are 1 about 3 out of 9 times, trials in which both variables are 0 about 5 out of 9 times, and trials in which X = 1 and Y = 0 about 1 in 9 times. Because we were able to find parameters for these structures that produce data very similar to the data we observed, these two structures are equally likely given the observed data.

The same logic also applies with more variables. Consider the data in Table 6.2 with three variables. If you ran a correlation between each pair of variables, you would find that X and Y are correlated (r = .25), Y and Z are correlated (r = .65), and X and Z are correlated (r = .17) but are independent (r = 0) once Y is controlled for. According to the Markov and faithfulness assumptions, this pattern of dependencies and conditional independencies is consistent with three and only three causal structures: X→Y→Z, X←Y←Z, and X←Y→Z. Table 6.2 shows parameters for each of these causal structures that fit the data perfectly. If we sampled a large amount of data from any of the three structures in Table 6.2 with the associated parameters, the proportions of the eight types of observations of X, Y, and Z would be very similar to the proportions in Table 6.2. These three causal structures are said to form a Markov class because they are all equally consistent with (or likely to produce) the set of conditional and unconditional dependencies in the observed data. Thus, it is impossible to know which of these three structures produced the set of data in Table 6.2.

Importantly, non-Markov-equivalent causal structures can be distinguished from one another with observations. For example, common effect structures such as X→Y←Z are in their own Markov equivalence class, so they can be uniquely identified. According to the Markov assumption for [X→Y←Z], X and Y are dependent, and Z and Y are dependent, but X and Z are independent. There are no other three-variable structures with this particular set of conditional and unconditional dependencies. This means that even though X→Y→Z, X←Y←Z, and X←Y→Z are all equally likely to produce the data in Table 6.2, X→Y←Z is much less likely. Suppose we tried to find parameters for X→Y←Z to fit the data in Table 6.2. It would be possible to choose parameters such that Y and X are correlated roughly around r = .25, and that Y and Z are correlated roughly around r = .65, matching the data in Table 6.2 fairly closely. But critically, we would find that no matter what parameters we chose, X and Z would always be uncorrelated, and thus it would be very unlikely that the data from Table 6.2 would come from X→Y←Z.
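The equal-likelihood claim for Table 6.1 can be verified directly. The sketch below (my own illustration, not part of the original text) scores the nine observations under both structures with the parameters from the table; the two likelihoods come out identical, which is why the data cannot favor either causal direction:

    from math import prod

    # The nine observations from Table 6.1: (x, y) pairs with their counts.
    counts = {(1, 1): 3, (1, 0): 1, (0, 1): 0, (0, 0): 5}

    def bern(value, p):
        return p if value == 1 else 1.0 - p

    def lik_x_causes_y():
        # Maximum-likelihood parameters for X -> Y, as in Table 6.1.
        px, py1, py0 = 4/9, 3/4, 0.0
        return prod(
            (bern(x, px) * bern(y, py1 if x else py0)) ** n
            for (x, y), n in counts.items())

    def lik_y_causes_x():
        # Maximum-likelihood parameters for X <- Y, as in Table 6.1.
        py, px1, px0 = 3/9, 1.0, 1/6
        return prod(
            (bern(y, py) * bern(x, px1 if y else px0)) ** n
            for (x, y), n in counts.items())

    print(lik_x_causes_y(), lik_y_causes_x())  # identical values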


The Acquisition and Use of Causal Structure Knowledge Table 6.2 Sample Data for Three Variables X

Y

Z

Number of Observations

1

1

1

6

1

1

0

3

1

0

1

0

1

0

0

3

0

1

1

4

0

1

0

2

0

0

1

0

0

0

0

6

Page 9 of 51

Parameters That Fit the Data Perfectly for Each Causal Structure X→Y→Z

X←Y←Z

X←Y→Z

P(x = l) = l/2 P(y = l|x = l) = 3/4 P(y = l|x = 0) = l/2 P(z = l|y = l) = 2/3 P(z = l|y = 0) = 0

P(z = l) = 5/12 P(y = l|z = l) = l P(y = l|z = 0) = 5/14 P(x = l|y = l) = 3/5 P(x = l|y = 0) = l/3

P(y = l) = 5/8 P(x = l|y = l) = 3/5 P(x = l|y = 0) = l/3 P(z = l|y = l) = 2/3 P(z = l|y = 0) = 0

The Acquisition and Use of Causal Structure Knowledge In sum, by examining the dependencies between variables, it is possible to identify which types of structures are more or less likely to have produced the observed data. Structures within the same Markov equivalence class always have the exact same likelihood of pro­ ducing a particular set of data, which means that they cannot be distinguished, but struc­ tures from different Markov equivalence classes have different likelihoods of producing a particular set of data. Steyvers et al. (2003) conducted a set of experiments to test whether people understand Markov equivalence classes and could learn the structure from purely observational data. First, they found that given a particular set of data, participants detected the correct Markov class at rates above chance. Furthermore, people seem to be fairly good at under­ standing that observations cannot distinguish X→Y from X←Y. And people also seem to un­ derstand to some extent that common effect structures X→Y←Z belong to their own Markov equivalence class. However, Steyvers et al.’s participants were not good at distinguishing chain and com­ mon cause structures, even when they were from different equivalence classes (e.g., X→Y→Z vs. X→Z→Y). Distinguishing these structures was made more difficult in the exper­ iment because the most common type of observation for all these structures was for the three variables to have the same state. Still, the participants did not appear to use the tri­ als when two of the variables shared a state different from a third to discriminate causal structures (e.g., the observation X = Y ≠ Z is more consistent with X→Y→Z than X→Z→Y). Given that Markov equivalence class is so important for theories of causal structure learning from observation, it is surprising that there is not more work on how well lay people understand Markov equivalence. One important future direction would be to give participants a set of learning data that unambiguously identifies a particular Markovequivalent class, and test the percent of participants who (1) identify the correct class, (2) identify all the structures in the Markov class, and (3) include incorrect structures outside the class. Such an experiment would help clarify how good or bad people are at learning causal structure from observations. Additionally, Steyvers et al. used categorical variables with a large number of categories and nearly deterministic causal relations, which likely facilitated accurate learning because it was very unlikely for two variables to have the same value unless they were causally related. It would be informative to examine how well people understand Markov equivalence classes with binary or Gaussian variables, which will likely be harder. Another question raised by this article is to what extent heuristic strategies may be able to explain the psychological processes involved in this in­ ference. In the studies by Steyvers et al. (2003) there are some simple rules that can dis­ tinguish the Markov equivalence classes fairly successfully. For example, upon observing a trial in which X = Y = Z, X←Y→Z is much more likely than X→Y←Z, but upon observing a trial in which X ≠ Y = Z, the likelihoods flip. But in other types of parameterizations, such as noisy binary data or Gaussian data, this discrimination would not be so easy. Even though Markov equivalence is a core feature of causal structure learning from ob­ servations, as far as I know this study by Steyvers et al. 
is the only study to test how well Page 10 of 51

The Acquisition and Use of Causal Structure Knowledge people learn causal structures purely from the correlations between the variables. There are a number of other studies that have investigated other observational cues to causali­ ty. For example, a number of studies have found that if X occurs followed by Y, people quickly and robustly use this temporal order or delay cue to infer that X causes Y (Lagna­ do & Sloman, 2006; McCormack, Frosch, Patrick, & Lagnado, 2015). This inference oc­ curs despite that fact that the temporal order may not necessarily represent the order in which these variables actually occurred, but instead it might reflect the order in which they become available for the subject to observe them. Another cue that people use to in­ fer causal direction are beliefs about necessity and sufficiency. If a learner believes that (p. 91) all causes are sufficient to produce their effects, that whenever a cause is present, its effects will be present, then observing that X = 1 and Y = 0 implies that X is not a cause of Y, otherwise Y would be 1 (Mayrhofer & Waldmann, 2011). In the section “Learn­ ing Temporal Causal Structures” I will discuss one other way that people learn causal di­ rection from observation. But the general point of all of these studies is that when these other cues to causality are pitted against pure correlations between the variables, people tend to use these other cues to causality (see Lagnado, Waldmann, Hagmayer, & Sloman, 2007, for a summary). In sum, though there is some evidence that people do understand Markov equivalence class to some extent, this understanding appears limited. Furthermore, there are not many studies on how well people learn causal structures in a bottom-up fashion purely from correlational data when covariation is the only cue available. In contrast, what is clear is that when other strategies are available, people tend to use them instead of infer­ ring causal structure purely from the dependencies and conditional independencies.

Learning Causal Structure from Interventions: Choosing Interventions Another core principle underlying causal structure learning is that interventions (manipu­ lations) have the capability to discriminate causal structures that would not be distin­ guishable from observation; this is the same reason that experiments are more useful than observational studies. Going back to the example in Figure 6.1, if we were trying to figure out the causal relations between diabetes (D), statin use (S), and systolic blood pressure (BP), observational data would only be able to narrow the possibilities down to three structures: D→S→BP, D←S←BP, and D←S→BP. However, if we could do a randomized experiment such that half the patients take a statin and the other half do not, we could in­ fer the causal structure. If D→S→BP is the true causal structure, then the patients who take a statin would have lower BP than those who do not, but there would not be any dif­ ference in D across the two groups. In contrast, if D←S←BP is the true causal structure, there would be a difference in D across the two groups, but there would not be a differ­ ence in BP across the two groups. Finally, if D←S→BP is the true causal structure, there would be a difference in both D and BP across the two groups. The rest of this section will explain in more detail how interventions can be used to precisely identify a causal struc­ ture, and how humans use interventions.

Page 11 of 51

The Acquisition and Use of Causal Structure Knowledge The language of causal structure diagrams has a simple notation to represent interven­ tions. When an intervention sets the state of a variable, all other variables that would oth­ erwise be causes of the manipulated variable are no longer causes, so those links get re­ moved. For example, when the patients in our example are randomly assigned to take a statin, even though normally having diabetes is a cause of taking a statin, now because of the random assignment diabetes is no longer a cause of taking a statin. More generally, the reason that interventions can make causal structures that are in the same Markov equivalence class distinguishable is that the intervention changes the causal structure. For this reason, interventions are sometimes called “graph surgery” (Pearl, 2000). Figure 6.3 a–c shows three causal structures that are not differen­ tiable from observation. Figure 6.3 d–f and g–i show the same three causal structures un­ der either an intervention on Y or an intervention on X; the i nodes represent the (p. 92) intervention. (The intervention on Y is analogous to the previous example of the random­ ized experiment about taking a statin; it could be useful to compare these two examples for generality.)

Figure 6.3 Three causal structures with different types of interventions.

Under the intervention on Y, all three causal structures now have different dependence relations. In graph d (Figure 6.3), Z and Y would still be correlated, but neither would be correlated with X. In graph e, X and Y would be correlated, but neither would be correlat­ ed with Z. And in graph f, all three variables would be correlated, but X and Z would be­ come uncorrelated conditional on Y. In sum, interventions on Y change the causal struc­ ture such that the resulting structures no longer fall within the same Markov equivalence class, so they can be discriminated. In contrast, an intervention on X can discriminate graph g from graph h, but cannot discriminate graph h from graph i. This means that an intervention on X does not provide as much information for discriminating these three causal structures as does an intervention on Y. Do people choose interventions that maximize “information gain,” the ability to discrimi­ nate between multiple possible structures? Before getting to the evidence, it is useful to consider an alternative strategy for choosing interventions to learn about causal struc­ ture, aside from maximizing information gain: selecting interventions that have the largest influence on other variables. For example, consider again the three structures in the “No interventions” row in Figure 6.3. In graph a, X influences two variables directly or Page 12 of 51

The Acquisition and Use of Causal Structure Knowledge indirectly, and in graphs b and c, X does not influence either other variable, for a total “centrality” rating of 2. Z has the same centrality rating—2. Y, in contrast, influences Z in graph a, X in graph b, and X and Y in graph c, for a total centrality rating of 4. In sum, looking across all three possible structures, Z is more “central” or more of a “root cause.” If a learner chooses to intervene to maximize the amount of changes in other variables, she will tend to intervene on Y instead of X or Z. Sometimes, as in the example in Figure 6.3, the information gain strategy and the root cause strategy produce the same interventions; Y is the most central variable, and inter­ ventions on Y help discriminate the three structures the most. However, sometimes the two strategies lead to different interventions. For example, when trying to figure out whether [X→Z→Y] or [X→Y→Z] is the true structure, X has the highest centrality rating. However, intervening on X would not discriminate the structures well (low information gain) because for both structures it would tend to produce data in which X = Y = Z. Inter­ vening on Y or Z would more effectively discriminate the structures. For example, inter­ vening on Y would tend to produce data in which X = Z ≠ Y for [X→Z→Y] but would tend to produce data in which X ≠ Y = Z for [X→Y→Z], effectively discriminating the two struc­ tures. The root cause strategy, intervening on X, can be viewed as a type of positive or confirmatory testing strategy in the sense that it confirms the hypothesis that X has some influence on Y and Z, but does not actually help discriminate between the remaining hy­ potheses. Coenen et al. (2015) tested whether people use these two strategies and found that most people use a mixture of both, though some appear to use mainly one or the other. In an­ other experiment, Coenen tested whether people can shift toward primarily using the in­ formation gain strategy if they are first trained on scenarios for which the root cause pos­ itive testing strategy was very poor at discriminating the causal structures. Even without feedback, over time participants switched more toward using information gain. They also tended to use the root cause strategy more when answering faster. In sum, root cause positive testing is a heuristic that sometimes coincides with information gain, and it ap­ pears that people sometimes can overcome the heuristic when it is especially unhelpful. Though Steyvers et al. (2003) did not describe it in the same way as Coenen et al. (2015), they actually have evidence for a fairly similar phenomenon to the positive test strategy. They found that people tended to intervene on root causes more than would be expected by purely using information gain. In their study, participants first saw 10 observational learning trials, and then chose the causal structure that they thought was most plausible, for example [X→Y→Z]. Technically, since the data was observational, they could not distin­ guish models within the same Markov equivalent class at this stage. Next they selected one intervention on either X, Y, or Z and would get 10 more trials of the same interven­ tion repeatedly. Steyvers et al. found that their participants tended to select root cause variables to intervene upon. If they thought that the chain [X→Y→Z] structure was most plausible, they most frequently intervened on X, then Y, and then Z. This pattern fits bet­

Page 13 of 51

The Acquisition and Use of Causal Structure Knowledge ter with the root cause heuristic than the information gain strategy, which suggests inter­ vening on Y. Because this finding cannot be explained by information gain alone, Steyvers et al. creat­ ed two additional models; here I only discuss rational test (p. 93) model 2. This model makes two additional assumptions. First, it assumes that participants had a distorted set of hypotheses about the possible causal structure. Normatively their hypothesis space for the possible causal structures should have been the Markov equivalent class; if they se­ lected [X→Y→Z] as the most likely structure after the 10 observational trials, they should have also viewed [X←Y←Z] and [X←Y→Z] as equally plausible, and the goal should have been to try to discriminate among these three. However, this model assumes that instead of trying to discriminate between the three Markov equivalent structures, participants were trying to discriminate between [X→Y→Z], [X→Y; Z], [X; Y→Z], and [X; Y; Z]; the latter three are the subset of the chain structures that include the same causal links or fewer links. I call this assumption the “alternate hypothesis space assumption” in that the set of possible structures (hypothesis space) is being changed. Under this new hypothesis space, when X is intervened upon, each of these four struc­ tures would produce different patterns of data, making it possible to determine which of these four structures is most likely (see Table 6.3. This means that X has high information gain. In contrast, when Y or Z is manipulated, some of the structures produce the same patterns of data, meaning that they provide less information gain. Steyvers et al. (2003) introduced another assumption as well; they assumed that people only attend to variables that take on the same state as the intervened-upon variable and ignore any variables that have a different state from the manipulated variable. The combi­ nation of the two assumptions is detailed in the right column of Table 6.3. When X is inter­ vened upon, it would produce three different patterns of data for the four causal struc­ tures, which means that it has fairly high information gain; it can distinguish all but the bottom two structures in Table 6.3. The reason that an intervention on X can no longer distinguish between the bottom two structures is because of the assumption that the oth­ er two variables that do not equal X—Y and Z—are ignored. When Y is intervened upon, it can narrow down the space of 4 structures to 2, a medium amount of information gain. When Z is intervened upon, all the structures produce the same pattern of data, so an intervention on Z does not help at all to identify the true structure. In sum, the combination of these two hypotheses now makes it such that inter­ vening on X is more informative than Y, which is more informative than inventing on Z. This pattern matches the frequency of participants’ interventions, which were most fre­ quently on X, then on Y, and lastly on Z. There are two key points made by this analysis of the similarities between the findings of Coenen et al. (2015) and Steyvers et al. (2003). First, even though they approach the re­ sults from different perspectives and talk about the results in different ways, they both found that people tended to intervene on root causes. Second, even though Steyvers’s model has rational elements, the resulting model is not very close to the ideal model, for Page 14 of 51

The Acquisition and Use of Causal Structure Knowledge which Y is the most informative intervention. Finally, by comparing different models with different assumptions, it can be (p. 94) seen how the two assumptions made by Steyvers et al. effectively amount to the positive test strategy put forth by Coenen at al. Restated, the same behavioral pattern of intervening primarily on root causes could be explained in more than one way.

Page 15 of 51
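The logic by which some interventions discriminate rival structures and others do not can be sketched in a few lines. The following toy code (my own illustration, assuming deterministic links and a baseline state of 0 for exogenous variables) reproduces the chain example above: intervening on X yields the same modal pattern under [X→Z→Y] and [X→Y→Z], while intervening on Y or Z separates them:

    def outcome(structure, node, value):
        """Modal observation after setting `node` to `value` (deterministic sketch)."""
        vals = {"X": 0, "Y": 0, "Z": 0}
        vals[node] = value
        order = list(structure)   # e.g., ("X", "Z", "Y") encodes the chain X -> Z -> Y
        for up, down in zip(order, order[1:]):
            if down != node:      # graph surgery: never overwrite the intervened node
                vals[down] = vals[up]
        return tuple(sorted(vals.items()))

    h1, h2 = ("X", "Z", "Y"), ("X", "Y", "Z")
    for node in ("X", "Y", "Z"):
        same = outcome(h1, node, 1) == outcome(h2, node, 1)
        print(node, "does not discriminate" if same else "discriminates")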

Table 6.3 Most Common Pattern of Data Produced by an Intervention on a Particular Node for a Particular Structure

                           Alternate Hypothesis          Alternate Hypothesis Space Assumption and
                           Space Assumption              Only Attending to Variables with Same State
                                                         as Manipulated Variable
Structure                  Intervention on               Intervention on
Hypothesis Space           X        Y        Z           X        Y        Z
X→Y→Z                      X=Y=Z    X≠Y=Z    X=Y≠Z       X=Y=Z    Y=Z      Z
X→Y; Z                     X=Y≠Z    X≠Y≠Z    X=Y≠Z       X=Y      Y        Z
X; Y→Z                     X≠Y=Z    X≠Y=Z    X≠Y≠Z       X        Y=Z      Z
X; Y; Z                    X≠Y≠Z    X≠Y≠Z    X≠Y≠Z       X        Y        Z
How informative this
intervention is            High     Medium   Medium      High     Medium   Low


Bramley et al. (2015) also studied a number of important factors related to learning from interventions. Overall, they found that humans were highly effective causal learners, and were able to select and make use of interventions for narrowing down the number of possible structures. One particular factor they introduced was the possibility of intervening on two variables simultaneously rather than just one. Double interventions are particularly helpful to distinguish between [X→Y→Z and X→Z] versus [X→Y→Z]. With a single intervention, these two structures are likely to produce very similar outcomes. For example, an intervention on X is likely to produce data in which X = Y = Z, an intervention on Y is likely to produce data in which X ≠ Y = Z, and an intervention on Z is likely to produce data in which X = Y ≠ Z.

Learning Causal Structures from Interventions: Interpreting Interventions and Updating Beliefs about the Structure The prior section discussed how people choose interventions to learn a causal structure. This section examines how people interpret the outcome after making an intervention. Four patterns have been proposed to explain how people interpret the outcomes of inter­ ventions. The first is that if a variable X is manipulated and another variable Z assumes the same state of X, people tend to infer a direct link from X to Z. Though this heuristic makes sense when learning the relations between two variables, it can lead to incorrect inferences in cases involving three or more variables linked together in a chain structure such as X→Y→Z because it can lead people to infer additional links that are not in the structure. If a learner intervenes on X such that it is 1, and subsequently Y and Z are both 1, this heuristic implies that X→Y and X→Z. Indeed, people often infer that there is an X→Z link above and beyond X→Y→Z, even in cases when there is not a direct link from X to Z (Bramley et al., 2015; Fernbach & Sloman, 2009; Lagnado & Sloman, 2004; Rottman & Keil, 2012). (In reality, the correct way to determine whether there is an X→Z above and beyond X→Y→Z is to see whether the probability of Z is correlated with the state of X

Page 17 of 51

The Acquisition and Use of Causal Structure Knowledge within trials in which Y is 1 or within trials in which Y is 0, or to use double interventions, as explained previously.) This heuristic is problematic for two reasons. At a theoretical level, it suggests that peo­ ple fail to pay attention to the fact that X and Z are independent conditional on Y. As al­ ready discussed, attending to statistical independencies is critical for understanding Markov equivalence classes, and this finding suggests that people do not fully understand the relations between statistical independence and causal Markov equivalence class. At a more applied level, adding this additional link X→Z could lead to incorrect inferences (see the section “Do People Adhere to the Markov Condition When Reasoning About Causal Structures?”). In particular, when inferring the likelihood that Z will be present given that Y is present, people tend to think that X has an influence on Z above and beyond Y. In ref­ erence to a subset of Figure 6.1, even though the true causal structure is Ethnicity → Smoking → Cardiovascular Disease, this heuristic could lead doctors to incorrectly predict that people of certain ethnicities are more likely to develop cardiovascular disease even after knowing their smoking status, even though ethnicity has no influence on cardiovas­ cular disease above and beyond smoking (according to Thornley, 2013). Such a misper­ ception could lead people of those ethnicities to feel unnecessarily worried that their (p. 95) ethnicity will cause them to have cardiovascular disease. The second pattern of reasoning was already discussed in the previous section. Steyvers et al. (2003) proposed that when a person intervenes on a variable, that he only attends to other variables that assume the same state as the manipulated variable. The previous section explained how this tendency would bias reasoners to intervene on root causes (Ta­ ble 6.3). But this tendency would also decrease the effectiveness of learning from inter­ ventions. If one intervenes on Z and the resulting observation is X = Y ≠ Z, the fact that X and Y have the same state should increase the likelihood that there is some causal rela­ tion between X and Y; however, this heuristic implies that people would not learn any­ thing about X or Y because people only attend to variables with the same state as the in­ tervened-upon variable (Z). In sum, this simplification means that people do not extract as much information from interventions as they could. The third and fourth habits of updating causal beliefs after interventions come from the study by Bramley et al. (2015). In this study, participants made a series of interventions on X, Y, or Z, and after each intervention they drew the causal structure that they be­ lieved to be the most plausible structure given the evidence up to that point. They discov­ ered two interrelated habits. First, participants updated their drawings of the causal structure slowly. This can be explained as a conservative tendency; people need consider­ able evidence before adding or deleting a causal relation to their set of beliefs. The sec­ ond pattern is that when drawing the causal structures, participants were influenced by the most recent intervention and appeared to forget many of the outcomes of prior inter­ ventions. The combination of these two habits, conservatism and forgetting, can be ex­ plained with an analogy to balancing a checkbook. 
After each transaction, one updates the current balance by adding the most recent transaction to the prior balance, but one does not recalculate the balance from all past transactions after each transaction. Keep­ Page 18 of 51

The Acquisition and Use of Causal Structure Knowledge ing the running balance is a way to simplify the calculation. Likewise, storing a represen­ tation of the causal structure as a summary of the past experience allows the learner to get by without remembering all the past experiences; the learner just has to update the prior causal structure representation. In a related vein, Fernbach and Sloman (2009) found that people have a recency bias—they are most influenced by the most recent data, which is similar to forgetfulness. Understanding the interplay between all of these habits will provide insights into how people learn causal structures from interventions in ways that are cognitively tractable.

Learning Temporal Causal Structures So far this chapter has focused on how people learn about atemporal causal networks in which each observation is assumed to be temporally independent. In the example at the beginning of the chapter about cardiovascular disease, each observation captured the age, sex, smoking status, diabetes status, and other variables of an individual patient. The causal link between smoking and cardiovascular disease, for example, implies that across patients, those who smoke are more likely to have cardiovascular disease. However, often it is important to understand how variables change over time. For exam­ ple, a physician treating patients with cardiovascular disease is probably less interested in population-level effects, and instead is more interested in understanding how a change in smoking would influence an individual patient’s risk of developing cardiovascular dis­ ease. Temporal versions of CBNs can be used to represent learning and reasoning about changes over time (Ghahramani, 1998; Murphy, 2002 also see Rehder, Chapter 21 in this volume). Temporal CBNs are very similar to standard CBNs, except each variable is rep­ resented by a series of nodes for each time point t. The causal structure is often assumed to be the same across time, in which case the causal structure is repeated at each time point. Additionally, often variables are assumed to be influenced by their past state; posi­ tive autocorrelation means that if the variable was high at time t, it is likely to be high at t+1. For example, Figure 6.4 shows a causal network representing the influence of using an antihypertensive on blood pressure: 1 represents using an antihypertensive or having high blood pressure, whereas 0 represents not using an antihypertensive or having nor­ mal blood pressure. Instead of just having one node that represents using an antihyper­ tensive and another for blood pressure, now the structure is repeated at each time point. Additionally, the autocorrelation can be seen with the horizontal arrows. All things being equal, if a patient’s blood pressure is high, it will tend to stay high for periods of time. Likewise, if a patient starts using an antihypertensive, they might continue to use it for a while. Like all CBNs, temporal CBNs follow the same rules and conventions. Here, instead of us­ ing i (p. 96) nodes to represent interventions, I used text to explain the intervention (e.g., a physician prescribed an antihypertensive). The interventions are the reason that some Page 19 of 51

The Acquisition and Use of Causal Structure Knowledge of the vertical and horizontal arrows are removed in Figure 6.4 because an intervention modifies the causal structure. The Markov condition still holds in exactly the same way as in temporal CBNs. For example, a patient’s blood pressure (BP) at age 73 is influenced by his BP at 72, but his BP at 71 does not have an influence on his BP at 73 above and be­ yond his BP at age 72. Causal learning from interventions works in essentially the same way in temporal and atemporal causal systems. In Figure 6.4 it is easy to learn that using an antihypertensive influences blood pressure, not the reverse. When the drug is started, the patient’s BP de­ creases, and when the drug is stopped, the patient’s BP increases. But when another in­ tervention (e.g., exercising) changes the patient’s blood pressure, it does not have an ef­ fect on whether the patient uses a statin. One interesting aspect about temporal causal systems is that is possible to infer the direc­ tion of a causal relationship from observations, which is not possible with atemporal sys­ tems. Consider the data in Figure 6.5; the direction of the causal relation is not shown in the figure. There is an asymmetry in the data; sometimes X and Y change together, and sometimes Y changes without X changing, but X never changes without Y changing. Col­ leagues and I have found that both adults and children notice this asymmetry and use it to infer that X causes Y (Rottman & Keil, 2012; Rottman, Kominsky, & Keil, 2014; Soo & Rottman, 2014). The logic is that Y sometimes changes on its own, implying that whatever caused Y to change did not carry over to X; Y does not influence X. Furthermore, some­ times X and Y change together. Since we already believe that Y does not influence X, one way to explain the simultaneous change in X and Y is that a change in X caused the change in Y.2 This is one way in which human causal learning seems more akin to learn­ ing a temporal CBN rather than an atemporal CBN; the temporal aspect of this data is critical for inferring the causal direction.

Figure 6.4 A temporal CBN.

Page 20 of 51

The Acquisition and Use of Causal Structure Knowledge

Figure 6.5 Example of learning causal direction from temporal data.

A number of other phenomena fit well into the temporal CBN framework. Consider the data in Figure 6.6 “Observed data.” In this situation, subjects know that C is a potential cause of E, not the reverse, and the goal is to judge the extent that C influences E. Over­ all, there is actually zero correlation between C and E. The faithfulness assumption states that the only independencies in the data arise through the Markov assumption. If C and E are unconditionally independent, it means that C cannot be a direct cause of E. Instead, another possibility (“Possible structure 1” in Figure 6.6) is that some unobserved third variable U is entirely responsible for E. However, when faced with data like that in Figure 6.6, people do not conclude that C is unrelated to E; instead, they notice that there are periods of time in which C has a posi­ tive influence on E (times 0–3), and other periods of time in which C has a negative influ­ ence on E (times 4–7). They subsequently tend to infer that C does actually have a strong influence on E, but that there is some unobserved factor that is fairly stable over time, and C and the unobserved factor (U) interact to produce E (Rottman & Ahn, 2011). This explanation is represented in Figure 6.6 “Possible structure 2.” In this structure, both C and U influence E, and there is an ark between the two links, which represents an interac­ tion; in this case the interaction is a perfect cross-over such that E is 1 if both C and U are 1 or both are 0. The reason people appear to make this inference about the crossover in­ teraction with an unobserved cause rather than inferring that C is unrelated to E is be­ cause the data are grouped into distinct periods such that there are periods during which there is sometimes a positive relation and (p. 97) other times a negative relation between C and E. This allows the reasoner to infer that some unobserved factor U must account for the switch. If the same eight trials were randomized, then people tend to infer that only U is a cause of E, not C. This inference again suggests that people tend to represent causal systems as temporally extended (that variables such as U tend to be autocorrelated) rather than atemporal (see Rottman & Ahn, 2009, for another example).


Figure 6.6 Learning about an interaction with an unobserved factor.

Elsewhere, colleagues and I have argued (Rottman et al., 2014) that many of the causal learning phenomena that have been used as evidence that people learn about causal relations in ways akin to CBNs are even better explained by temporal CBNs. For example, one study found that children can learn about bidirectional causal relations in which two variables both cause each other (Schulz, Gopnik, & Glymour, 2007). Bidirectional causal structures can only be represented through temporal, not atemporal, causal networks (Griffiths & Tenenbaum, 2009; Rottman et al., 2014).

In conclusion, there is growing evidence that, at least in certain situations, people appear to be learning something similar to a temporal causal network, and the temporal aspect of reasoning allows them to infer quite sophisticated causal relations that would otherwise be impossible to learn.

Learning About the Integration Function

Another aspect of a CBN that must be learned in addition to the structure is the integration function: the way that multiple causes combine to influence an effect (also see Griffiths, Chapter 7 in this volume, and Rehder, Chapters 20 and 21 in this volume). For example, in regression, the predictors are typically assumed to combine linearly. The CBN framework allows for the possibility that causes can potentially combine in any conceivable way, and humans are extremely flexible as well. For example, Waldmann (2007) demonstrated that people naturally reason both about causes that add (e.g., the effect of taking two medicines is the sum of the two individual effects) and about causes that average (e.g., the taste of two chemicals mixed together is the average of the two). Furthermore, people use background knowledge (e.g., about medicines and taste) to decide which type of integration function is more plausible in a given situation.

Most research on causal learning has focused on binary variables. The most prominent integration function for binary variables, called noisy-OR, describes situations in which there are multiple generative causes (Cheng, 1997; Pearl, 1988). It stipulates that the probability of the effect being absent is equal to the probability that all the causes happen to simultaneously fail to produce the effect. If there are two causes, each of which produces the effect 50% of the time on its own (a causal strength of .5), then both would fail simultaneously 25% of the time; the effect should occur 75% of the time. If there are three causes, each of which produces the effect 50% of the time, then all three would simultaneously fail .5³ = 12.5% of the time; the effect would be present 87.5% of the time. An analogous integration function called noisy-AND-NOT can be used to describe inhibitory causes that combine in a similar fashion. It is not difficult to imagine other sorts of integration functions, (p. 98) and the following studies have examined how people learn about the integration function from data.

Beckers et al. (2005; see also Chapter 4 in this volume by Boddez, De Houwer, & Beckers) studied how beliefs about the integration function influence learning. In one study, participants first learned about two causes, G and H, both of which produce an outcome of 1 on their own. In the "additive" condition, they saw that G and H together produce an outcome of 2, which is consistent with an integration function in which two causes add together. In another condition, they saw that G and H together produce an outcome of 1. This is inconsistent with the notion that the two causes add together; instead, it suggests some sort of "subadditive" integration function in which the effect can never be higher than 1. Subsequently, participants in both conditions experienced a blocking paradigm in which they learned that A by itself produces an outcome of 1, and A plus X produces an outcome of 1. In the subadditive condition, participants still thought that X might be a cause, because they believed that the effect could never go higher than 1. In contrast, in the additive condition they concluded that X was not a cause; if it were, then presumably the effect would have been 2.

Lucas and Griffiths (2010) investigated a similar phenomenon: that initial training about how causes combine can influence whether subjects interpret a variable as a cause or not. They first presented people with data suggesting that the causes worked conjunctively (multiple causes needed to be present for the effect to occur) or through the noisy-OR function (a single cause was sometimes sufficient to produce the effect). Afterward, participants saw a cause D never produce the effect, and saw that two causes in combination, D and F, produced the effect. Participants in the conjunctive condition tended to conclude that both D and F were causes, whereas participants in the noisy-OR condition tended to infer that only F was a cause.

In sum, these results show that people quickly and flexibly learn about how causes combine to produce an effect, and the integration rule that they learn dramatically influences subsequent reasoning about the causal system.
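As a minimal sketch of these two integration functions (a standard formulation of noisy-OR and one common reading of noisy-AND-NOT, not code from the studies discussed):

```python
def noisy_or(strengths):
    """P(e = 1) when all the generative causes are present: the effect is
    absent only if every cause independently fails to produce it."""
    p_all_fail = 1.0
    for s in strengths:
        p_all_fail *= (1.0 - s)
    return 1.0 - p_all_fail

print(noisy_or([0.5, 0.5]))        # 0.75  (two causes of strength .5)
print(noisy_or([0.5, 0.5, 0.5]))   # 0.875 (all three fail .5**3 of the time)

def noisy_and_not(p_generative, inhibitor_strengths):
    """Each inhibitory cause independently blocks the effect with its own
    strength, scaling down the generative probability."""
    p = p_generative
    for s in inhibitor_strengths:
        p *= (1.0 - s)
    return p

print(noisy_and_not(0.875, [0.5]))  # 0.4375: an inhibitor of strength .5
```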

Learning Causal Strength

So far this chapter has focused on how people learn causal structure and, to a lesser extent, integration functions. One other important component of causal relations is causal strength, our internal measurement of how important a cause is. For example, if a medicine works very well to reduce a symptom, it has high causal strength, but if it does not reduce the symptom at all, it has zero causal strength.


Prior to the CBN framework, theories of causal strength learning were based on simple measures of the contingency between the cause and effect. For example, the ΔP model computes the strength of the influence of a cause (C) on an effect (E) as the difference in the probability of the effect when the cause is present versus absent: P(e = 1 | c = 1) – P(e = 1 | c = 0) (Cheng & Novick, 1992; Jenkins & Ward, 1965). This same contrast is calculated at asymptote by one of the most influential models of conditioning as a way to capture how strongly a cue and outcome become associated by an animal (Danks, 2003; Rescorla & Wagner, 1972). This same model has also been proposed as a model of causal learning, the idea being that the more strongly a cue is associated with an outcome, the more strongly humans would infer that the cue causes the outcome (Shanks & Dickinson, 1987).

With the introduction of the CBN framework, a number of theories of causal learning were proposed that incorporate different sorts of top-down causal beliefs into the learning process. A number of other chapters in this volume discuss causal strength learning, including those by Griffiths (Chapter 7), Cheng and Lu (Chapter 5), and Perales, Catena, Maldonado, and Cándido (Chapter 3). Thus, I briefly discuss the connections between the CBN framework and theories of causal strength learning, while leaving the details to those other chapters.

Elemental Causal Induction: Learning Causal Strength Between Two Variables

One of the most important developments in models of causal strength learning is the Power PC model (Cheng, 1997). This model builds on the ΔP model by incorporating causal beliefs and assumptions. It assumes that one generative cause combines through the noisy-OR integration function with another, unobserved cause. For example, imagine that the effect E occurs 25% of the time without the observed cause C; P(e = 1 | c = 0) = .25. We can attribute this 25% to some background cause that has a strength of .25. Further, imagine that the observed cause has a strength of 2/3. When the observed cause is present, the effect should occur 75% of the time if C and the background cause combine through a noisy-OR function; P(e = 1 | c = 1) = .75. (The effect would fail with a probability of 1/3 × 3/4 = 1/4.)

Cheng used this sort of logic, in reverse, to deduce that if an observed cause combines with a (p. 99) background cause through a noisy-OR integration function, the correct way to calculate causal strength involves dividing ΔP by P(e = 0 | c = 0). Consider now the probabilities just presented, without knowing the causal strength: P(e = 1 | c = 1) = .75 and P(e = 1 | c = 0) = .25. According to ΔP, the causal strength is .5; the cause raises the probability of the effect by .5. According to Power PC, the causal strength of C is (.75 – .25)/(1 – .25) = .67; the cause increases the effect by two-thirds (from .25 to .75). In sum, by specifying a set of prior beliefs about the causal relation, Cheng specified how causal strength should be induced given those beliefs.
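The worked example can be restated in a few lines of code; this sketch implements only the two definitions just given, applied to P(e = 1 | c = 1) = .75 and P(e = 1 | c = 0) = .25:

```python
def delta_p(p_e_c1, p_e_c0):
    # Delta-P: the raw contrast between effect probabilities
    return p_e_c1 - p_e_c0

def causal_power(p_e_c1, p_e_c0):
    # Power PC (Cheng, 1997): Delta-P rescaled by the headroom the
    # background cause leaves, P(e = 0 | c = 0)
    return delta_p(p_e_c1, p_e_c0) / (1 - p_e_c0)

print(delta_p(0.75, 0.25))       # 0.50: the cause raises P(e) by .5
print(causal_power(0.75, 0.25))  # 0.666...: two-thirds of the way from
                                 # .25 up to certainty
```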


Another influential development in causal strength learning is the causal support model. Griffiths and Tenenbaum (2005) proposed that when people estimate causal strength, they are not actually judging the magnitude of the influence of the cause on the effect, similar to effect sizes in inferential statistics, but rather are judging the extent to which there is evidence for any causal relation at all, similar to the function of a p-value in hypothesis testing. At a theoretical level, this model is calculated by determining the relative likelihood that the true causal structure is [C→E←U], in which both C and an unobserved factor U influence E, versus [C; E←U], in which C does not influence E and E is determined by an unobserved factor U. Thus, causal support treats causal strength learning as discriminating between two possible causal structures, one in which C actually is a cause of E, and one in which C is not a cause of E.

Causal support has a number of behavioral implications, but the most obvious one, and the easiest to think about, is sample size. Whereas ΔP and Power PC are unaffected by sample size, causal support is influenced by sample size. Going back to the analogy of causal support as a p-value, whereas ΔP and Power PC are effect size measures: with a large enough sample size it is possible to have a very low p-value (confidence that there is a causal relation) even if the effect size is small.

In sum, Power PC and causal support were both motivated by understanding causality through a CBN perspective, involving top-down beliefs about how an observed cause combines with other unobserved factors.
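As a toy illustration of this sample-size sensitivity, the sketch below approximates causal support with uniform priors on the parameters and crude grid integration; it is my own minimal implementation, not Griffiths and Tenenbaum's model code:

```python
import numpy as np

def marginal_with_link(k1, n1, k0, n0):
    # Structure [C -> E <- U]: background strength b plus causal strength w,
    # combined by noisy-OR; average the likelihood over a uniform grid prior.
    g = np.linspace(0.005, 0.995, 100)
    b, w = np.meshgrid(g, g)
    p1 = w + b - w * b                        # P(e = 1 | c = 1) under noisy-OR
    lik = p1**k1 * (1 - p1)**(n1 - k1) * b**k0 * (1 - b)**(n0 - k0)
    return lik.mean()

def marginal_no_link(k1, n1, k0, n0):
    # Structure [C ; E <- U]: E depends only on the background rate b.
    b = np.linspace(0.005, 0.995, 100)
    k, n = k1 + k0, n1 + n0
    return (b**k * (1 - b)**(n - k)).mean()

def support(k1, n1, k0, n0):
    # Log ratio of the two marginal likelihoods (a Bayes-factor analogue)
    return np.log(marginal_with_link(k1, n1, k0, n0)
                  / marginal_no_link(k1, n1, k0, n0))

# Same contingency (P(e|c=1) = .7 vs. P(e|c=0) = .3), ten times the data:
print(support(7, 10, 3, 10))      # modest support for a causal link
print(support(70, 100, 30, 100))  # much stronger support, same ΔP and power
```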

Inferring Causal Strength: Controlling for Other Causes

The previous section focused on how people infer causal strength given observations of just a single cause and effect (elemental causal induction). However, often there are more than two variables. When inferring the strength of one cause on an effect, it is important to control for certain types of third variables (and not others), depending on the causal structure. Consider Figure 6.1. When studying the strength of the effect of a new drug on cardiovascular disease, it is important to control for age and smoking habits, either statistically or through the design of the study. One should not control for statin use because it is not a direct cause of cardiovascular disease.


Figure 6.7 Possible third variables when learning the causal relation from C to E.

More generally, consider trying to learn if there is a causal link from a potential cause C to a potential effect E, and if so, how strong the relation is. Figure 6.7 presents eight different third variables (S–Z); the question is which of these variables should be controlled for. For readers familiar with multiple regression, you can think of C as the one predictor in the regression that you are primarily interested in, and E as the outcome variable. The question about controlling for alternative variables is: which of these variables should be included as predictors or covariates in the analysis? The following bullets systematically explain each of the third factors and whether it should be controlled for when inferring the strength of C on E:

• V and X are confounds and must be controlled for when inferring the relation of C on E. If they are not controlled for, there would be a spurious correlation between C and E even if there is no causal relation between C and E. (X represents the case in which some unobserved factor causes both C and X.)

• W represents an alternative mechanism from C to E. In order to test whether there is a direct influence of C on E above and beyond W, it must be controlled for.

• Y is a noise variable. Accounting for it increases our power to detect a relation between C and E. (p. 100)

• U and Z should not be controlled for. The logic is a bit opaque (Eells, 1991, p. 203), but consider the simple case in which E deterministically causes Z such that they are perfectly correlated. Controlling for Z explains all the variance in E, and there will be no variance left over for C to explain. Controlling for Z and U can distort the apparent relation between C and E.

• S and T never need to be controlled for. With large sample sizes, it does not matter whether S and T are controlled for when inferring the influence of C on E. The reason is that even though S and T are correlated with C, since S and T are screened off from E (S and T are independent of E after controlling for C), they will not have any predictive power in a regression above and beyond C. However, with small sample sizes, S and T will most likely not be perfectly uncorrelated with E controlling for C, in which case they can change the estimated influence of C on E. Thus, they should not be controlled for.

In sum, the overall rule is that when inferring the strength of a relation of C on E, third variables that are believed to be potential direct causes of E should be controlled for; other variables should not be controlled for (Cartwright, 1989; Eells, 1991; Pearl, 1996). This rule nicely dovetails with how causal structures are defined; each variable is modeled using a conditional probability distribution incorporating all of its direct causes.

Remarkably, a variety of research suggests that people have the ability to appropriately control for third variables when inferring causal strength. In fact, research on this topic was the first to ask whether people intuitively use beliefs about causal structure when reasoning about causality (Waldmann, 1996, 2000; Waldmann & Holyoak, 1992). Michael Waldmann and colleagues called this theory the Causal Model theory; the idea was that when inferring causal strength, people use background knowledge about the causal structure ("model") to determine which variables to control for. In the first study on this topic, a scenario with three variables (X, Y, and Z) was set up. Based on the cover story, the three variables were causally related either in a common effect structure [X→Y←Z] or in a common cause structure [X←Y→Z]. In the common effect condition [X→Y←Z], the goal for participants was to decide the extent to which X and Z were causes of Y; normatively, people should control for alternative causes (e.g., control for X when determining whether Z is a cause of Y). In the common cause condition [X←Y→Z], the goal for participants was to decide the extent to which X and Z are effects of Y; normatively, these two decisions should be made separately (e.g., one should ignore X when determining the influence of Y on Z).

After the cover story manipulating the believed causal structure, participants first experienced a set of data in which X and Y were perfectly correlated; Z was not displayed. This training made it seem that there is a strong causal relation between X and Y. Then they experienced a set of data in which X, Y, and Z were all perfectly correlated; now Z is a redundant predictor of Y because X is entirely sufficient to predict Y. In sum, participants experienced exactly the same data, and the only difference between the two conditions was their belief about the causal structure.

In the common effect condition [X→Y←Z], participants controlled for X when interpreting whether Z was a cause of Y, and consequently concluded that Z is not a cause of Y because X is entirely sufficient to predict whether Y was present or absent. In contrast, in the common cause condition [X←Y→Z], participants did not control for X, and concluded that Y was a cause of both X and Z.

Subsequently, a number of other studies have also shown that people control for alternative causes (V–Y in Figure 6.7) of the main effect and not alternative effects of the main cause (T in Figure 6.7) (Goodie, Williams, & Crooks, 2003; Spellman, Price, & Logan, 2001; Waldmann, 2000). There is even work suggesting that people do not control for variables like S and Z (Waldmann & Hagmayer, 2001); however, there has not been research on whether people control for variables like U.

In sum, when learning about a causal relation between C and E, people have some core intuitions to control for variables that they believe to be alternative causes of E, and not variables in other roles, which is critical for correct causal learning (Glymour, 2001). This research is some of the most dramatic in showing how top-down beliefs about causal structure influence learning, and consequently is some of the strongest evidence that human causal reasoning involves structured directional representations beyond just associations between variables (Waldmann, 1996).
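The control rule stated above can be illustrated with a small regression simulation; the structure and coefficients below are my own toy choices, covering only a confound (V) and an effect of E (Z):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
v = rng.normal(size=n)                       # confound: V -> C and V -> E
c = v + rng.normal(size=n)                   # C depends on V
e = 1.0 * c + 2.0 * v + rng.normal(size=n)   # true strength of C on E is 1.0
z = e + 0.1 * rng.normal(size=n)             # Z: near-deterministic effect of E

def coef_of_c(*predictors):
    """OLS coefficient on C (the first predictor) when regressing E."""
    X = np.column_stack(predictors)
    beta, *_ = np.linalg.lstsq(X, e, rcond=None)
    return beta[0]

print(coef_of_c(c))        # ~2.0: biased upward; the confound V is omitted
print(coef_of_c(c, v))     # ~1.0: correct once the confound is controlled
print(coef_of_c(c, v, z))  # ~0.0: distorted; Z soaks up the variance in E
```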

Reasoning with the Causal Structure

So far this chapter has focused on how people learn about a causal network: the structure of the (p. 101) network, the parameters or causal strengths, and the functional form. The remainder of the chapter discusses how people use this knowledge (see also Oaksford and Chater, Chapter 19 in this volume). Going back to Figure 6.1, one might desire to explain whether a person's cardiovascular disease was caused by his age or his smoking. One might desire to predict whether his cardiovascular disease will get worse as he ages. And one might desire to know which intervention, stopping smoking or starting to take a statin, would have the largest influence on his cardiovascular disease, in order to choose the action with the greatest rewards.

Though this second half of the chapter focuses on reasoning about the causal network, rather than learning, it is impossible to completely divorce learning and reasoning. In the real world we learn about causal relations both from first-hand experience with data (e.g., did starting the antihypertensive lower my blood pressure?) and also from communicated knowledge (e.g., from family members, teachers, doctors, newspaper articles). Research in psychology has used both personal experience and communicated knowledge, often in combination, to teach subjects about the causal structure before they reason about the structure. Typically, words and pictures are used to convey the causal structure to participants, although the structural information is sometimes conveyed through or supplemented with experienced data. If the participants learn anything about the parameters (causal strengths) of the causal structure, it is usually conveyed through data-driven experience, though sometimes the parameters are conveyed textually. The integration function is often not mentioned at all, though sometimes it is mentioned.

One of the challenges with studying how well people reason about causal structures is that apparent flaws in reasoning can be explained either as reasoning biases or as poor, biased, or insufficient learning about the causal structure. It is not clear how to cleanly differentiate the two, because checking that the causal structure has been learned appropriately involves questions that are typically viewed as reasoning about the causal structure. This sets up a difficult situation because any observed reasoning bias can potentially be explained away by claiming that the researcher failed to sufficiently convey the causal structure to the participants. Here I do not try to solve this problem, but instead just present the empirical findings of how closely reasoning appears to fit with the causal structures presented to subjects. These conclusions are based on a much more thorough analysis of the literature than can be presented here (Rottman & Hastie, 2014), though this chapter includes some newly published evidence.

Reasoning Based on Observations Versus Interventions

In the section "Learning," I explained how the CBN framework treats observations and interventions very differently for learning a causal structure. Interventions change the causal structure by removing links from variables that were previously causes of the manipulated variable. For example, given the structure X→Y→Z, if Y is intervened upon, Y gets severed from X, resulting in [X; Y→Z]. Under an intervention on Y, X would be statistically independent of, and uncorrelated with, Y, even though Z would still be dependent upon Y.

Practically, given the structure X→Y→Z, if a reasoner can observe the state of Y, she can make a prediction about both X and Z. In the types of situations typically studied in the lab, with binary variables and positive causal relations, if Y is observed as 1, then X and Z are both likely to be 1 as well. However, if a reasoner intervenes on Y and sets its value to 1, then Z is likely to be 1, but this intervention would have no influence on X, so the best estimate of X is simply its base rate. In sum, interventions only influence variables downstream from the manipulated variable, not upstream (but see Hiddleston, 2005, for an alternative approach, and also see Over, Chapter 18 in this volume, on whether "if … then" conditionals are interpreted as interventions).

A number of researchers have found that people discriminate between observations and interventions when making inferences based on a causal structure. Sloman and Lagnado (2005) set up simple verbal descriptions in which one event (X) causes another (Y), and found that when Y was observed to have a particular value, X would be inferred to have the same value, but when Y was intervened upon to have a particular value, X was inferred to have its normal default value. In sum, when it was made very clear whether there was an observation versus an intervention, subjects' judgments largely followed the prescriptions of the CBN framework. In contrast, when more ambiguous language was used, such that the value of a variable could be known either through an observation or an intervention, the responses looked muddier (see also Rips, 2010).

Another set of studies took this basic finding a step further by demonstrating that this difference (p. 102) between interventions and observations also holds in contexts in which participants are told the causal structure and then learn the parameters (e.g., the base rates and the causal strengths) from experience. Consider a set of studies that investigated reasoning on a diamond structure [X←W→Y and X→Z←Y] (Meder, Hagmayer, & Waldmann, 2008, 2009; Waldmann & Hagmayer, 2005). These studies are unique for involving more than three variables, and also for having two causal routes, W→X→Z and W→Y→Z. Despite the complexities involved in these studies, the participants showed remarkable subtlety in reasoning about the causal structures and in distinguishing between interventions and observations.

Consider observing a low value of X and trying to infer the value of Z. In the diamond structure there are two routes from X to Z: X←W→Y→Z and X→Z. Due to these two routes, X and Z should be strongly correlated, and thus Z should be quite low when X is observed to be low. In contrast, if X is intervened upon and set to a low value, the route X←W→Y→Z is destroyed; the link from W to X is cut. The X→Z route is still open, so the predicted value of Z is still low, but it should not be as low as when X is observed. In fact, this is the exact pattern of reasoning that was observed; the inference of Z after an observation of X was lower than after an intervention on X. This finding further suggests that people reason about observations both downstream and upstream, but they reason about interventions only downstream. This research also shows how people can reason about observations and interventions on more complex structures.

So far this section has focused on "perfect" interventions, in which the intervention completely determines the state of the manipulated variable and completely severs all other influences. However, often interventions are not perfect. For example, after prescribing a patient an antihypertensive to treat high blood pressure, the patient may not actually take it, or may not take it exactly as prescribed (e.g., as frequently as he should, or at the right dose). Furthermore, even if the patient does take the medicine as prescribed, the medicine does not guarantee that all patients will have a 120/80 blood pressure. Patients who initially had very high blood pressure will probably still tend to have higher blood pressure than those who initially had moderately high blood pressure. Or perhaps the medicine only succeeds in bringing the blood pressure into a normal range for a certain percentage of patients, but not for others. In these ways, taking an antihypertensive is an "imperfect" intervention on blood pressure; a patient's blood pressure is not completely determined by the intervention. In such cases of imperfect interventions, reasoning upstream is warranted to some extent, similar to observations. Unfortunately, there has been fairly little work examining how people reason about imperfect interventions (Meder, Gerstenberg, Hagmayer, & Waldmann, 2010; Meder & Hagmayer, 2009).

In sum, the existing research has found that people do distinguish between interventions and observations when reasoning about causal systems; in particular, interventions only influence variables downstream from the intervened-upon variable. An important direction for future research is to examine how people reason about imperfect interventions. This seems especially important given that many of the actions or "interventions" that humans perform are not perfect interventions.
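A Monte Carlo sketch of the observation/intervention asymmetry on the chain X→Y→Z, with illustrative parameters of my own choosing (base rate of X = .5, each link succeeds with probability .8, no other causes of Y):

```python
import random

random.seed(1)

def sample(do_y=None):
    """One case from the chain X -> Y -> Z; do_y forces Y (an intervention)."""
    x = 1 if random.random() < 0.5 else 0
    y = do_y if do_y is not None else (1 if x and random.random() < 0.8 else 0)
    z = 1 if y and random.random() < 0.8 else 0
    return x, y, z

# Observation: among cases where Y happens to be 1, X must have been 1,
# since X is Y's only cause in this toy scheme.
obs = [s for s in (sample() for _ in range(100_000)) if s[1] == 1]
print(sum(x for x, _, _ in obs) / len(obs))    # 1.0

# Intervention: do(Y = 1) severs the X -> Y link, so X stays at its base rate.
ints = [sample(do_y=1) for _ in range(100_000)]
print(sum(x for x, _, _ in ints) / len(ints))  # ~0.5
```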

Do People Adhere to the Markov Condition When Reasoning About Causal Structures?

Recall that the Markov condition states that once all the direct causes of a variable Z are controlled for or held constant, Z is statistically independent of every variable in the causal network that is not a direct or indirect effect of Z. For example, in the structure X→Y→Z, Z is conditionally independent of X once Y (the only direct cause of Z) is held constant. People have often been found to violate the Markov assumption; their inferences about the state of Z are influenced by the state of X even when they already know the state of Y (Mayrhofer & Waldmann, 2015; Park & Sloman, 2013; Rehder & Burnett, 2005; Rehder, 2014; Rehder, Chapter 21 in this volume; Walsh & Sloman, 2008). Specifically, people tend to infer that P(z = 1 | y = 1, x = 1) > P(z = 1 | y = 1, x = 0), even though the two should be equivalent. Likewise, they use Z when inferring X, even after knowing the state of Y. Going back to the section "Learning Causal Structures from Interventions: Interpreting Interventions and Updating Beliefs About the Structure," such a mistake could lead a doctor to incorrectly believe that ethnicity has an influence on cardiovascular disease above and beyond smoking, even when the true causal structure is Ethnicity → Smoking → Cardiovascular Disease.

There are a variety of possible explanations for why inferences violate the Markov condition, and most of the explanations have attempted to find (p. 103) rationalizations for the violations: reasons that such judgments would make sense according to the CBN framework, assuming some modification to the structure due to prior knowledge. For example, if subjects believe that there is some other causal link between X and Z (e.g., X→Z, X←Z, or X←W→Z) in addition to the causal structure told to them by the experimenter (X←Y→Z), such additional information could justify their inferences. Three specific proposals are that people infer an unobserved factor that inhibits both X and Z; an unobserved factor that influences X, Y, and Z; or an intermediary mechanism M such that Y causes M, which in turn causes X and Z. Different articles in the preceding list have supported different accounts. For example, Rehder and Burnett (2005) argued for the account in which an unobserved factor influences X, Y, and Z. Park and Sloman (2013) found that people only make the Markov violation when the middle variable is present, not absent: P(x = 1 | y = 1, z = 1) > P(x = 1 | y = 1, z = 0), but P(x = 1 | y = 0, z = 1) = P(x = 1 | y = 0, z = 0). This finding is most consistent with the account that people infer an unobserved factor that inhibits X and Z. They also found that the size of the Markov violation was larger when participants believed that the two effects (X and Z) are both caused through the same mechanism (e.g., Y causes mechanism A, which in turn causes X and Z) than through separate mechanisms (e.g., X←A←Y→B→Z, where A and B are the two mechanisms that explain how X and Z are each caused by Y). Mayrhofer and Waldmann (2015) have also found evidence that people infer an unobserved inhibitory factor that influences multiple effects of the same cause. And they further found that the size of the Markov violation was influenced by whether the causes and effects were described as agents versus patients (e.g., the cause "sending" information to the effect versus the effect "reading" information from the cause).

Rehder (2014) found some support for both the unobserved-inhibitor account and the one-versus-two-mechanisms account, though more generally he found that none of these rationalizations provides a parsimonious and comprehensive explanation for all the reasoning errors. He argued that it is indeed highly likely that people embellish causal structures given in experiments with additional nodes and links based on their own prior knowledge. However, Rehder proposed that in addition to any embellishments due to background knowledge, some judgments followed an associative style of reasoning that does not obey the Markov assumption. He proposed taking an individual-differences approach to understanding why certain people are more likely to use an associative style of reasoning.

The Acquisition and Use of Causal Structure Knowledge One surprising aspect about the work on whether people uphold the Markov condition is that there have been very few studies in which people learn the parameters of the causal structure through trial-by-trial experience, and then make judgments (though recently there have been a few more studies by Rottman & Hastie, 2016).3 Giving participants sta­ tistical experience with the correlations between the variables provides them with direct evidence that X and Z are statistically independent given Y. Park and Sloman (2013) conducted one experiment of this sort. Their participants inferred that P(z = 1 | y = 1, x1) > P(z = 1 | y = 1, x = 0), though P(z = 1 | y = 0, x = 1) = P(z = 1 | y = 0, x = 0); a violation of the Markov condition only when y = 1. As discussed earlier, this pattern actually fits the proposal that people infer an unobserved inhibitory cause of both X and Y. However, the modified structure with the unobserved inhibitory cause is still unfaithful to the data that they observed; in the learning data, X and Z were independent when y = 1. This rais­ es a question for future research: If being told the structure and experiencing data faith­ ful to the structure are not sufficient to stamp out violations of the Markov assumption, what is?

Qualitative and Quantitative Inferences When Reasoning About Causal Structures Rottman and Hastie (2014) reviewed inferences on many different types of causal struc­ tures, including one link [X→Y], chains [X→Y→Z], common cause [X←Y→Z], common effect [X→Y←Z], and diamond [X←W→Y and X→Z←Y] structures. For each of these structures we reviewed evidence about how well people make inferences on one variable given different observed combinations of the others (e.g., X given knowledge about Y, or Y given knowl­ edge of X and Z, etc.). We concluded that for almost all the causal structures (see the section “Reasoning About Explaining Away Situations”) the inferences tend to go in the right direction. For exam­ ple, for the chain [X→Y→Z], if both causal relations between X→Y and Y→Z were positive or both were negative, people tended to infer a positive relation between X and Z. But if one of the links was positive and the other negative, people inferred a negative causal rela­ tion (Baetu & Baker, 2009). The previously mentioned studies involving interventions and observations on a diamond structure [X←W→Y and X→Z←Y] also reveal how sensitive people are to the para­ meters of the structure (Meder et al., 2008, 2009). These studies systematically manipu­ lated the base rates of some of the variables, and also the strengths of some of the causal links. Even though the causal structures involved four variables, and the inference re­ quired reasoning with two routes from X to Z, all of these manipulations had influences on subjects’ inferences in the predicted directions. In sum, reasoning habits often corre­ spond to the qualitative predictions of the CBN framework. (p. 104)

Yet, despite the qualitative correspondence between human inferences and the normative judgments based on the CBN framework, the quantitative correspondence is not so tight. For example, in one condition when inferring the probability of Z given X for the preced­ Page 32 of 51

The Acquisition and Use of Causal Structure Knowledge ing study, the normative answer was 12.5%, yet subjects answered on average 37%. Giv­ en that 50 is the middle of the scale, 37% is actually considerably closer to a default of 50% than the normative answer. This pattern of conservative results, judgments too close to the center of the scale, was very common across many studies reviewed in Rottman and Hastie (2014). For example, for both chain [X→Y→Z] and common cause [X←Y→Z] structures, people do typically infer a correlation between A and C; however, often the correlation is considerably weaker than the correlation in the data that the subjects ob­ served (Baetu & Baker, 2009; Bes, Sloman, Lucas, & Raufaste, 2012; Hagmayer & Wald­ mann, 2000; Park & Sloman, 2013). There are multiple possible interpretations of such ef­ fects, such as response biases or memory errors (Costello & Watts, 2014; Hilbert, 2012) or potentially priors on the parameters (Lu, Yuille, Liljeholm, Cheng, & Holyoak, 2008; Ye­ ung & Griffiths, 2011). More evidence is needed to understand why these effects occur, and also to understand the accuracy when reasoning with more than three or four vari­ ables (see also Rottman & Hastie, 2016).

Reasoning About Explaining Away Situations The previous section already addressed quantitative inferences on causal networks, and the conclusion is that for the most part, people are fairly good at making inferences, though there is a conservative bias. However, there is one type of inference called “ex­ plaining away” that stands out as particularly difficult. Explaining away inferences in­ volve judgments of P(x = 1 | y = 1, z = 1) and P(x = 1 | y = 1, z = 0) on a common effect structure [X→Y←Z]. The reason that explaining away is so challenging is that once the state of Y is known, X and Z actually become negatively dependent, so the normative pat­ tern of inference is P(x = 1 | y = 1, z = 1) < P(x = 1 | y = 1, z = 0). This is unlike any other type of inference. For example, on a chain structure [X→Y→Z], positive relations between X and Y and Y and Z mean that there is a positive relation between X and Z; P(x = 1 | z = 1) > P(x = 1 | z = 0), and because of the Markov assumption P(x = 1 | y = 1, z = 1) = P(x = 1 | y = 1, z = 0). In terms of Figure 6.1, [smoke → cardiovascular disease ← age], explaining away could in­ volve inferring the probability that someone smokes given their age and knowing that they have cardiovascular disease. Out of patients who have cardiovascular disease, know­ ing that a given patient is old means that it is less necessary to infer that he smokes in or­ der to explain the cardiovascular disease; old age “explains away” the cardiovascular dis­ ease. If the patient is young, it becomes more necessary to infer that he smokes—other­ wise what explains the cardiovascular disease? In sum, when the two causes have a posi­ tive influence on the effect, the causes become negatively related controlling for the ef­ fect. Prior evidence did not decisively identify how well people explain away (Morris & Larrick, 1995; Sussman & Oppenheimer, 2011). The newest and clearest evidence suggests that people have considerable difficulties when making explaining-away judgments (Rehder, 2014). Though sometimes people get the direction of the inference correct, P(x = 1 | y = 1, z = 1) < P(x = 1 | y = 1, z = 0), they often are ambivalent about the direction of the in­ Page 33 of 51

The Acquisition and Use of Causal Structure Knowledge ference, and sometimes think that Z would have a positive effect on X, P(x = 1 | y = 1, z = 1) > P(x = 1 | y = 1, z = 0). Rehder proposed that this type of reasoning is more akin to an associative spreading-activation network than causal reasoning. Reid Hastie and I (Rottman & Hastie, 2016) have also recently collected data on explaining away; unlike the previous research, we gave participants learning data so that they could reason from experience rather than just from the causal structure, and so that they also have direct evidence that P(x = 1 | y = 1, z = 1) < P(x = 1 | y = 1, z = 0). We sometimes found explain­ ing away that was much weaker than normatively predicted by the CBN framework, (p. 105) and other times inference patterns in the opposite direction from explaining away. The challenge people have with explaining away is somewhat mysterious. There are no other types of causal inference that give reasoners so much trouble, yet at the same time, explaining away has also been touted as a fundamental strength of human reasoning (Jones, 1979; Kelley, 1972; Pearl, 1988, p. 49). There are also other results in which ex­ plaining away does occur. Oppenheimer et al. (2013) created stories to elicit explaining away. For example, participants were told about an animal with three features—feathers, lays eggs, and cannot fly—and asked to rate how likely this animal is to be an ostrich. Be­ ing an ostrich is a plausible explanation for why this bird cannot fly. Other participants were given the same three features with one additional feature, that it has a broken wing, which is an alternative cause for not being able to fly. These participants judged the likeli­ hood of being an ostrich as lower than the participants who were not given this feature, suggesting explaining away (see also Oppenheimer & Monin, 2009). So sometimes people do get the direction of the inference correct.4 An additional complexity is that explaining away is related to another phenomenon. Ex­ plaining away involves inferring the probability of X given knowledge of Y and Z on the structure [X→Y←Z]. Another much-studied topic is inferring the causal strength of X on Y. As already discussed, people know that they must control for Z when inferring the causal strength of X on Y. However, when Z is a very strong cause of Y, it is not uncommon for people to infer that the strength of X is very weak, weaker than it actually is; sometimes this is called “discounting” (Goedert & Spellman, 2005). This discounting effect is related to explaining away in that both phenomena require understanding that two causes are competing to explain an effect. In sum, there is conflicting evidence as to when, whether, and how much people explain away. Despite the fact that explaining away has been studied for 40 years, there is still important work to be done to reconcile these findings.

Do Causal Relations Bias Reasoning? It is a fairly common view in psychology that it is easier for people to reason from causes to effects than from effects to causes (Pennington & Hastie, 1993; White, 2006), and this hypothesis is supported by evidence that cause-to-effect judgments are made faster than effect-to-cause judgments (Fernbach & Darlow, 2010). The question in this section is whether cognitive ease has an influence on the inferences themselves. Page 34 of 51

The Acquisition and Use of Causal Structure Knowledge Tversky and Kahneman (1980) found that causal inferences are higher when reasoning from causes to effects. Similarly, Bes et al. (2012) found that when making inferences on the chain [X→Y→Z], inferences of P(z = 1 | x = 1) were higher than P(x = 1 | z = 1). Addi­ tionally, both of these inferences were higher than inferences P(z = 1 | x = 1) or P(x = 1 | z = 1) on a common cause [X←Y→Z] structure. These differences are especially instructive because their participants received trial-by-trial training, according to which all the infer­ ences mentioned earlier should have been equivalent. They speculate that making infer­ ences between X and Z on the common cause is harder because one must reason about causal relations going in two different directions, and this increased difficulty could lower the final judgment. This study reaches a very different conclusion than most of the rest of the articles pre­ sented in this chapter. The conclusion is that strength of the inferences is determined by the ease of explaining how the two variables are connected, and that this cognitive ease overwhelms the probabilities that participants experience. Even though the explanations for these findings appeal to causal structure and causal direction, they are inconsistent with the CBN framework; the CBN framework predicts that all the inferences mentioned earlier would be equal given the parameters used in the study. Though the effects of causal direction were found consistently across three experiments, there are other results that do not entirely fit with the story that cause-to-effect judg­ ments are higher than effect-to-cause judgments. First, Fernbach et al. (2011, p. 13) failed to replicate the study by Tversky and Kahneman (1980). More broadly, Fernbach et al. have found that inferences from causes to effects tend to be lower than the normative standard, but inferences from effects to causes tend to be roughly normative (Fernbach, Darlow, & Sloman, 2010; Fernbach et al., 2011; Fernbach & Rehder, 2013; see also Re­ hder, Chapter 21 in this volume). The explanation is that when reasoning from causes to effects, people sometimes forget that alternative causes could produce the target effect aside from the main cause, though they do not forget about alternative causes when rea­ soning from the effect to a target cause. There is some tension between these two sets of findings; Bes et al. (2012) found that effect-to-cause judgments are too low (lower than cause-to-effect judgments), where­ as Fernbach et al. (2011) found that cause-to-effect judgments are too low. However, these results cannot be directly compared because they differ on a variety of dimensions.5 (p. 106)

Fernbach et al. used real-world cover stories, asked participants their beliefs about the parameters of the causal structure, and then used those parameters to calculate the nor­ mative answers. Because of this approach, Fernbach et al. could not directly compare the cause-to-effect and effect-to-cause inferences and instead compared each inference to the normative standard for that inference.6 In contrast, Bes et al. (Experiment 3) gave partici­ pants trial-by-trial learning data; because the learning data were symmetric, the cause-toeffect and effect-to-cause inferences could be directly compared (although the cover story labels for the variables were not counterbalanced).

Page 35 of 51

The Acquisition and Use of Causal Structure Knowledge In sum, though it is intuitive that it is easier to reason from causes to effects rather than vice versa, it is still unclear whether or how cognitive fluency and neglect of alternative causes manifest in judgments; it is not clear exactly whether or when cause-to-effect judgments are higher than effect-to-cause judgments. It is especially important to come to consensus on these results, or to explain why different patterns of reasoning are found in different situations, because both of the patterns of findings imply deviations from the CBN framework.

Alternative Representations for Causal Reasoning So far this chapter has presented the CBN framework as a single method of learning causal structures and making inferences. However, like most sophisticated modeling tools, there are actually many choices that the modeler can make. Assuming that human cognitive representations of causality are somehow similar to the representation of a causal Bayesian network (directed representations of causality, parameters to capture the strength of causal relations and base rates), these choices correspond to different cogni­ tive representations of the task and background knowledge. An accurate description of causal reasoning requires clarifying the representations being used. In the next two sec­ tions I discuss some representational options, and whether they can be empirically distin­ guished. Consider the case that you are told that X and Z both cause Y [X→Y←Z], you experience a set of learning trials that instantiate the statistical relations between these variables, and are subsequently asked to infer P(x = 1 | y = 1, z = 1). Figure 6.8 details four possible processes for making the judgment. The first route, the dashed line, involves making the inference directly from the experi­ enced data. Whenever a learner experiences data that instantiates the causal structure, it is possible to come to the correct inference by focusing on the experienced data and ig­ noring the causal structure. For example, in order to calculate P(x = 1 | y = 1, z = 1), a reasoner just needs to remember the total number of observations in which all three vari­ ables were 1, N(x = 1, y = 1, z = 1), and divide this by the total number of observations in which y = 1 and z = 1, ignoring X, N(y = 1, z = 1) (see Figure 6.8. This reasoning process can be thought of as similar to exemplar models of categorization; inference is performed by recalling specific exemplars. The remaining three options all involve elaborating the causal structure with different kinds of parameters, and inference is performed through a computation on the parame­ ters. Though in some ways the inference itself seems more complicated, the cognitive benefit is that the learner only needs to store the structure and the parameters, not all the individual instances. The difference between these three options is how they repre­ sent the conditional probability distribution of Y, the probability of Y given the causes X and Z. This conditional probability distribution is denoted as P(Y = y | X = x, Z = z), which means the probability that Y is in a particular state (y = 0 or 1), given that X and Z are each in particular states, x and z. Page 36 of 51

The Acquisition and Use of Causal Structure Knowledge Representation 1 involves calculating the conditional probability distribution P(Y = y | X = x, Z = z) directly from the experienced data. For example, the probability that y = 1 given that x = 1 and z = 1, is calculated directly from rows 1 and 3 from the experience table. Inference can then proceed through simple probability theory (Figure 6.8). Heckerman (1998) provides a tutorial on this approach, and provides citations to other exact and ap­ proximate inference algorithms. Representation 2 does not directly represent the conditional probability distribution P(Y = y | X = x, Z = z), but instead assumes that people spontaneously infer causal strengths from the learning data. SX→Y and SZ→Y refer to the strength of X on Y and Z on Y, respec­ tively. The most popular way to represent causal strengths in the normative psychological (p. 107) literature is using causal power theory, which assumes that causes combine through a noisy-OR function (Cheng, 1997; Novick & Cheng, 2004; also see sections “Learning About the Integration Function” and “Learning Causal Strength” in this chap­ ter). This approach also requires the learner to estimate the probability that the effect is present without any of its causes, P(Y = 1 | x = 0, z = 0). The causal strengths and the functional form (noisy-OR) subsequently allow a reasoner to deduce the conditional distri­ bution P(Y = y | X = x, Z = z), which would be used for making the inference P(x = 1 | y = 1, z = 1). The critical difference between Representation 1 versus 2 is that Representa­ tion 2 embodies the assumption that X and Z combine through a noisy-OR function and do not interact (Novick & Cheng, 2004); the noisy-OR assumption is the reason that Repre­ sentation 2 has only 5 parameters instead of the 6 parameters in Representation 1.

Figure 6.8 Four possible processes for making an in­ ference. Note: N refers to the number of trials of observations of a particular type.

Page 37 of 51

The Acquisition and Use of Causal Structure Knowledge Representation 3 is very similar to Representation 2; however, instead of representing the parameter (p. 108) P(Y = 1 | x = 0, z = 0), an additional background cause B is added that explains the cases when Y = 1 but X and Z are 0. In Figure 6.8, B is assumed to always be present, and to have a strength of 1/3. The question raised by these four options is whether some sort of representation of causal structure and strength mediates the process of making an inference based on ex­ perienced data, or whether the inference is made directly from the experienced data (dashed line). If indeed some sort of causal structure representation mediates the infer­ ence, which form of representation gets used? All four approaches make the exact same predictions, so they are difficult to distinguish empirically. I do not know of any studies that address the first question, whether a causal structure representation mediates the process of making an inference based on experience data. However, there are some studies that have attempted to distinguish the nature of the CBN representation, specifically the difference between Representations 2 versus 3. Krynski and Tenenbaum (2007) studied how well people make inferences on the famous mammogram problem. In this problem, participants are told that breast cancer (cause) al­ most always results in a positive mammogram test (effect), and they are told the base rate of breast cancer. They are also told that mammograms have false positives 6% of the time. Critically, this false positive rate is framed either as inherent randomness (Repre­ sentation 2, which has a parameter to represent the probability of the effect when the known cause is absent), or due to a benign cyst (an explicit background cause, as in Rep­ resentation 3). Krynski and Tenenbaum found that participants’ judgments about the probability of breast cancer given a positive mammogram were considerably more accu­ rate when the false positive rate was framed as being caused by a benign cyst, suggest­ ing that Representation 3 may be the most intuitive. A number of recent studies help to clarify this finding by Krynski and Tenenbaum. First, though this facilitation of Bayesian responding by a causal framing has sometimes been found, the effect has not always been consistent (Hayes et al., 2015; Hayes, Newell, & Hawkins, 2013; McNair & Feeney, 2014, 2015). There appear to be two main reasons for the inconsistency. First, the causal framing has a bigger influence for participants who have higher mathematical abilities (McNair & Feeney, 2015). Second, the facilitation ef­ fect is often seen in a reduction in extreme overestimations (called base rate “neglect”); however, the final judgments are often lower, closer to the normative response, but still not quite “normative” (McNair & Feeney, 2014). A plausible explanation for this effect was put forth by Hayes, Hawkins, and Newell (2014, 2015), who found that the causal framing increases the perceived relevance of the false positive information. They conclud­ ed that the causal framing mainly has an influence on the attention paid to the false posi­ tive rate and possibly the construction of a representation of the problem, but does not necessarily help participants to actually use the false positive rate in a normative way when calculating the posterior inference.

Page 38 of 51

The Acquisition and Use of Causal Structure Knowledge In sum, it seems like having explicit alternative causes (Representation 3) may facilitate accurate causal inference. That said, this finding raises a worrying prospect that causal reasoning is apparently fragile enough that it can be harmed by a small difference in framing. If causal reasoning is robust, why can’t people translate between these repre­ sentations by mentally generating an alternative cause to represent the false positive rate? More broadly, the purpose of this analysis in Figure 6.8 was to show that the CBN frame­ work can be instantiated in multiple possible ways. Different articles present different versions. Even though they all make similar, if not identical, predictions, these alternative versions present different cognitive processes involved in making the inference. In order to move from a computational-level theory to an algorithmic-level theory, it will be neces­ sary to further clarify the representations and inference process. It is especially critical to clarify whether a causal structure representation mediates causal inference when a rea­ soner has experienced learning data because in such instances it is possible to make in­ ferences directly from the remembered experiences without thinking about the causal structure at all.

Even More Complicated Alternative Models for Causal Reasoning The previous section discussed four possible implementations of the CBN framework. However, in reality there are many more possibilities. A fully Bayesian treatment of learn­ ing and inference allows for a way for prior knowledge to influence the learning and in­ ference processes. In regard to a causal structure, there are three possible roles of prior information: prior beliefs about the network, about the integration function, and about the strengths or parameters. First, whereas Representations 2 and 3 in Figure 6.8 both assume one particular function­ al (p. 109) form, the noisy-OR, in reality, learning is not this simple. The section “Learning About the Integration Function” on functional forms already has covered experiments on how people learn the specific way in which multiple causes combine to produce an effect, and how this belief shapes further learning and reasoning about the causal system. (Beck­ ers & Miller, 2005; Lucas & Griffiths, 2010; Waldmann, 2007). Thus, a fully Bayesian ver­ sion of Figure 6.8 would allow for multiple possible integration functions and priors on those functions. Second, the parameters in Figure 6.8 were calculated by using point estimates. For exam­ ple, the parameter P(y = 1 | x = 0, z = 0) for Representations 1 and 2, and the SB→Y parameter in Representation 3, are all given as exactly 1/3 in Figure 6.8, which was cal­ culated by comparing rows 6 and 8 in the data table. If a point estimate of the parameters is used, then all four approaches produce exactly the same inferences. Alternatively, an­ other option is that people represent uncertainty about all of the parameters based on the amount of data experienced. If this second approach is used, then Representation 1 will make somewhat weaker inferences than Representations 2 and 3, because Representa­ tion 1 requires inferring an additional parameter. Additionally, people may have prior be­ Page 39 of 51

The Acquisition and Use of Causal Structure Knowledge liefs about causal strengths that may bias the learning and inference process. For exam­ ple, Lu et al. (2008) argued that people believe causes to be sparse and strong. Given the data in Figure 6.8, the sparse and strong priors pull the strengths downward; instead of a strength of .50, the sparse and strong priors would produce a strength estimate of .43, and with more data the estimate gets closer to .50. In contrast, Yeung and Griffiths (2015) found that people have priors such that they believe that most candidate causes are very strong. If people had such priors, it would result in causal strength estimates above .50. Priors on strength would have a down-stream influence on inference; the stronger the causal strength beliefs, the stronger the inferences should be. Third, people often have prior beliefs about the causal network. Lu et al.’s sparse and strong prior suggests that people believe that fewer causes are more likely than many causes (Lu et al., 2008). In a related vein, Meder et al. (2014) proposed that when per­ forming an inference, even if told a causal structure, people may entertain the possibility that another causal structure could actually be the true structure, which can influence the judgment. In particular, Meder et al. told participants the structure [X→Y], had them observe contingency data so that they could learn the statistical relation between X and Y, and then had them make an inference of P(x = 1 | y = 1). They found evidence that when the causal strength of X on Y is fairly weak, people may not believe the structure [X→Y] and instead entertain the possibility that X and Y may be unrelated. This general ap­ proach, that people may entertain the possibility that the causal structure presented by the experimenter may not actually be the true causal structure, has also been used to ex­ plain violations of the Markov assumption (see section “Do People Adhere to the Markov Condition When Reasoning About Causal Structures?”). One problem with this account, however, is that when there are more than two variables, it is unclear what set of alterna­ tive structures is entertained, and considering multiple possibilities would quickly be­ come cognitively unwieldy. In sum, allowing for the possibility that people think about multiple possible strengths, functional forms, and causal structures makes the CBN framework very flexible, and on a case-by-case level it seems plausible that people may actually have priors for any of these aspects of the network. However, incorporating all of these priors makes the reasoning task much harder than any of the options in Figure 6.8, and it seems unlikely that people are always engaged in reasoning with all these priors simultaneously. Thus, it will be im­ portant to understand when people make use of the priors and how well they incorporate priors with observed data for making inferences.

Final Questions, Future Directions, and Conclusions
Throughout the chapter I have highlighted questions and future directions. In this section I repeat some of those questions and add some new ones.


I believe that these questions are critical for having a thorough and accurate understanding of human causal learning and reasoning.
1. Though recently there have been more attempts to explore other functional forms, the vast majority of research on the CBN framework has investigated binary variables that combine through a noisy-OR function. There has been very little theorizing about what causal strength means, for example, when causes and/or effects are multilevel (p. 110) (Pacer & Griffiths, 2011; Rottman, 2016; White, 2001). For example, is the human interpretation of causal strength for multilevel (e.g., Gaussian) variables analogous to effect size measures for linear regression? What is the relation between function learning and causal strength learning? Do people face any challenges or use different heuristics when learning causal structures from multilevel rather than binary variables? In sum, causal reasoning is extremely diverse, and it will be critical to broaden our experimental paradigms to capture this diversity.
2. One of the goals of cognitive psychology is to understand the representations that people use for thought. As Figure 6.8 demonstrates, there are multiple possible representations for how people reason about causal structures, and many of these representations make exactly the same (or very similar) predictions. Clarifying which sorts of representations are used will help develop a more precise descriptive account of causal reasoning.
3. So far the CBN framework has been framed as a computational-level theory of human causal reasoning. However, the computations involved in inferring a causal structure from data, or making inferences on a network (e.g., Figure 6.8), are very complex. Thus, an important goal is to develop a process-level account of how people actually perform these inferences. A number of theorists have proposed various heuristics for causal learning, which often come close to the optimal solution, and often have equal or better fit to participants' inferences (Bramley et al., 2015; Coenen et al., 2015; Lagnado & Sloman, 2004; Rottman & Keil, 2012; Rottman et al., 2014; Rottman & Hastie, 2016; Steyvers et al., 2003). Yet so far this heuristics approach has been disconnected and has often taken a back seat to proof-of-concept demonstrations that the CBN framework can model human learning. More attention to how these inferences are actually made through a process-level account will help provide psychological insight into this fascinating and complex reasoning process.
4. Finally, all of the studies on human causal reasoning give participants toy examples and sample data in short periods of time. It is unclear how well this research strategy captures actual causal reasoning in the real world, which involves long-term accumulation of data and many more variables. An ideal approach would be to find a real-world domain involving causes and effects that includes records of experiences. For example, a highly accurate electronic medical records system might in the future permit us to track a doctor's experiences with all the variables in Figure 6.1 to see if the doctor's judgments fit closely with his or her personal experiences.
The causal Bayesian network framework has entirely reshaped the landscape of research on causality, to the point that it is now rare to see articles that investigate causal learning without mentioning the CBN framework.

Whereas research on causal reasoning used to be primarily about inferences between a single cause and effect, now the central questions are about larger causal structures. Thus, the new focus is on how people learn the structure and determine causal directionality, how people simplify complex structures into smaller units using the Markov assumption, and how various beliefs captured in the network, such as the integration function, influence learning and reasoning. Even older questions such as elemental causal learning have benefited tremendously from the CBN framework by reinterpreting strength as a parameter in the causal network.
On the descriptive side, the most important fact about human causal reasoning is that humans are remarkably good causal reasoners: we adeptly incorporate many different beliefs when learning and reasoning (e.g., integration functions, autocorrelation, causal directionality); we can learn about quite complicated causal relations (e.g., unobserved causes that interact with observed causes); and we often do so with remarkably little data. The introduction of the CBN framework has revealed many of these capacities that were previously unknown and has also raised important questions: How can a process-level account of these sophisticated inferences be developed? How closely do the representations of the CBN framework map onto the actual representations that we use for causal reasoning? How does causal reasoning occur with more diverse sorts of stimuli and in more naturalistic environments? Answering these questions not only will help us develop a more accurate and complete picture of human causal reasoning, but also may identify ways to help people become even better causal reasoners.

Author Note
This research was supported by NSF BCS-1430439. I thank Michael Waldmann and Bob Rehder for providing helpful comments and suggestions.

References
Baetu, I., & Baker, A. G. (2009). Human judgments of positive and negative causal chains. Journal of Experimental Psychology: Animal Behavior Processes, 35(2), 153–168. http://doi.org/10.1037/a0013764
Beckers, T., & Miller, R. R. (2005). Outcome additivity and outcome maximality influence cue competition in human causal learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31(2), 238–249. http://doi.org/10.1037/0278-7393.31.2.238
Bes, B., Sloman, S. A., Lucas, C. G., & Raufaste, E. (2012). Non-Bayesian inference: Causal structure trumps correlation. Cognitive Science, 36(7), 1178–1203. http://doi.org/10.1111/j.1551-6709.2012.01262.x
Bramley, N. R., Lagnado, D. A., & Speekenbrink, M. (2015). Conservative forgetful scholars: How people learn causal structure through sequences of interventions. Journal of Experimental Psychology: Learning, Memory, and Cognition, 41(3), 708–731. http://doi.org/10.1037/xlm0000061
Cartwright, N. (1989). Nature's capacities and their measurement. Oxford: Clarendon Press.
Cheng, P. W. (1997). From covariation to causation: A causal power theory. Psychological Review, 104(2), 367–405. http://doi.org/10.1037//0033-295X.104.2.367
Cheng, P. W., & Novick, L. R. (1992). Covariation in natural causal induction. Psychological Review, 99(2), 365–382. http://doi.org/10.1037/0033-295X.99.2.365
Coenen, A., Rehder, B., & Gureckis, T. (2015). Strategies to intervene on causal systems are adaptively selected. Cognitive Psychology, 79, 102–133. http://doi.org/10.1016/j.cogpsych.2015.02.004
Costello, F., & Watts, P. (2014). Surprisingly rational: Probability theory plus noise explains biases in judgment. Psychological Review, 121(3), 463–480.
Danks, D. (2003). Equilibria of the Rescorla–Wagner model. Journal of Mathematical Psychology, 47(2), 109–121. http://doi.org/10.1016/S0022-2496(02)00016-0
Eells, E. (1991). Probabilistic causality. Cambridge, UK: Cambridge University Press.
Fernbach, P. M., & Darlow, A. (2010). Causal conditional reasoning and conditional likelihood. In Proceedings of the 32nd annual conference of the Cognitive Science Society (p. 305). Austin, TX: Cognitive Science Society. http://doi.org/10.1177/0272989X9101100408
Fernbach, P. M., Darlow, A., & Sloman, S. A. (2010). Neglect of alternative causes in predictive but not diagnostic reasoning. Psychological Science, 21(3), 329–336. http://doi.org/10.1177/0956797610361430
Fernbach, P. M., Darlow, A., & Sloman, S. A. (2011). Asymmetries in predictive and diagnostic reasoning. Journal of Experimental Psychology: General, 140(2), 168–185. http://doi.org/10.1037/a0022100
Fernbach, P. M., & Rehder, B. (2013). Cognitive shortcuts in causal inference. Argument & Computation, 4(1), 64–88. http://doi.org/10.1080/19462166.2012.682655
Fernbach, P. M., & Sloman, S. A. (2009). Causal learning with local computations. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35(3), 678–693. http://doi.org/10.1037/a0014928
Ghahramani, Z. (1998). Learning dynamic Bayesian networks. In Adaptive processing of sequences and data structures (Vol. 1387, pp. 168–197). Berlin; Heidelberg: Springer.
Glymour, C. (2001). The mind's arrows. Cambridge, MA: MIT Press.
Goedert, K. M., & Spellman, B. A. (2005). Nonnormative discounting: There is more to cue interaction effects than controlling for alternative causes. Animal Learning & Behavior, 33(2), 197–210. http://doi.org/10.3758/BF03196063
Goodie, A. S., Williams, C. C., & Crooks, C. L. (2003). Controlling for causally relevant third variables. The Journal of General Psychology, 130(4), 415–430. http://doi.org/10.1080/00221300309601167
Gopnik, A., Glymour, C., Sobel, D. M., Schulz, L. E., Kushnir, T., & Danks, D. (2004). A theory of causal learning in children: Causal maps and Bayes nets. Psychological Review, 111(1), 3–32. http://doi.org/10.1037/0033-295X.111.1.3
Griffiths, T. L., & Tenenbaum, J. (2005). Structure and strength in causal induction. Cognitive Psychology, 51(4), 334–384. http://doi.org/10.1016/j.cogpsych.2005.05.004
Griffiths, T. L., & Tenenbaum, J. (2009). Theory-based causal induction. Psychological Review, 116(4), 661–716. http://doi.org/10.1037/a0017201
Hagmayer, Y., & Waldmann, M. R. (2000). Simulating causal models: The way to structural sensitivity. In L. R. Gleitman & A. K. Joshi (Eds.), Proceedings of the 22nd annual conference of the Cognitive Science Society (pp. 214–219). Austin, TX: Cognitive Science Society.
Hayes, B. K., Hawkins, G. E., & Newell, B. R. (2015). Consider the alternative: The effects of causal knowledge on representing and using alternative hypotheses in judgments under uncertainty. Journal of Experimental Psychology: Learning, Memory, and Cognition, 41(6), 723–739. http://doi.org/10.1037/xlm0000205
Hayes, B. K., Hawkins, G. E., Newell, B. R., Pasqualino, M., & Rehder, B. (2014). The role of causal models in multiple judgments under uncertainty. Cognition, 133(3), 611–620. http://doi.org/10.1016/j.cognition.2014.08.011
Hayes, B. K., Newell, B. R., & Hawkins, G. E. (2013). Causal model and sampling approaches to reducing base rate neglect. In Proceedings of the 35th annual conference of the Cognitive Science Society (pp. 567–572). Austin, TX: Cognitive Science Society.
Heckerman, D. (1998). A tutorial on learning with Bayesian networks. In M. I. Jordan (Ed.), Learning in graphical models (pp. 301–354). Berlin: Springer.
Hiddleston, E. (2005). A causal theory of counterfactuals. Nous, 39, 632–657. http://doi.org/10.1111/j.0029-4624.2005.00542.x
Hilbert, M. (2012). Toward a synthesis of cognitive biases: How noisy information processing can bias human decision making. Psychological Bulletin, 138(2), 211–237. http://doi.org/10.1037/a0025940
Jenkins, H. M., & Ward, W. C. (1965). Judgment of contingency between responses and outcomes. Psychological Monographs: General and Applied, 79(1), 1–17.
Jones, E. (1979). The rocky road from acts to dispositions. The American Psychologist, 34(2), 107–117.
Kelley, H. H. (1972). Causal schemata and the attribution process. In E. Jones, D. E. Kanouse, H. H. Kelley, R. E. Nisbett, S. Valins, & B. Weiner (Eds.), Attribution: Perceiving the causes of behavior (pp. 151–174). Morristown, NJ: General Learning Press.
Krynski, T. R., & Tenenbaum, J. (2007). The role of causality in judgment under uncertainty. Journal of Experimental Psychology: General, 136(3), 430–450. http://doi.org/10.1037/0096-3445.136.3.430
Lagnado, D. A., & Sloman, S. A. (2004). The advantage of timely intervention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30(4), 856–876. http://doi.org/10.1037/0278-7393.30.4.856
Lagnado, D. A., & Sloman, S. A. (2006). Time as a guide to cause. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32(3), 451–460. http://doi.org/10.1037/0278-7393.32.3.451
Lagnado, D. A., Waldmann, M. R., Hagmayer, Y., & Sloman, S. A. (2007). Beyond covariation: Cues to causal structure. In A. Gopnik & L. Schulz (Eds.), Causal learning: Psychology, philosophy, and computation (pp. 154–172). Oxford: Oxford University Press.
Lu, H., Yuille, A. L., Liljeholm, M., Cheng, P. W., & Holyoak, K. J. (2008). Bayesian generic priors for causal learning. Psychological Review, 115(4), 955–984. http://doi.org/10.1037/a0013256
Lucas, C. G., & Griffiths, T. L. (2010). Learning the form of causal relationships using hierarchical Bayesian models. Cognitive Science, 34(1), 113–147. http://doi.org/10.1111/j.1551-6709.2009.01058.x
Mayrhofer, R., & Waldmann, M. R. (2011). Heuristics in covariation-based induction of causal models: Sufficiency and necessity priors. In C. H. Carlson & T. Shipley (Eds.), Proceedings of the 33rd annual conference of the Cognitive Science Society (pp. 3110–3115). Austin, TX: Cognitive Science Society.
Mayrhofer, R., & Waldmann, M. R. (2015). Agents and causes: Dispositional intuitions as a guide to causal structure. Cognitive Science, 39(1), 65–95. http://doi.org/10.1111/cogs.12132
McCormack, T., Frosch, C., Patrick, F., & Lagnado, D. A. (2015). Temporal and statistical information in causal structure learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 41(2), 395–416.
McNair, S., & Feeney, A. (2014). When does information about causal structure improve statistical reasoning? Quarterly Journal of Experimental Psychology, 67(4), 625–645. http://doi.org/10.1080/17470218.2013.821709
McNair, S., & Feeney, A. (2015). Whose statistical reasoning is facilitated by a causal structure intervention? Psychonomic Bulletin & Review, 22(1), 258–264. http://doi.org/10.3758/s13423-014-0645-y
Meder, B., Gerstenberg, T., Hagmayer, Y., & Waldmann, M. R. (2010). Observing and intervening: Rational and heuristic models of causal decision making. The Open Psychology Journal, 3, 119–135.
Meder, B., & Hagmayer, Y. (2009). Causal induction enables adaptive decision making. In N. A. Taatgen & H. van Rijn (Eds.), Proceedings of the 31st annual conference of the Cognitive Science Society (pp. 1651–1656). Austin, TX: Cognitive Science Society.
Meder, B., Hagmayer, Y., & Waldmann, M. R. (2008). Inferring interventional predictions from observational learning data. Psychonomic Bulletin & Review, 15(1), 75–80. http://doi.org/10.3758/PBR.15.1.75
Meder, B., Hagmayer, Y., & Waldmann, M. R. (2009). The role of learning data in causal reasoning about observations and interventions. Memory & Cognition, 37(3), 249–264. http://doi.org/10.3758/MC.37.3.249
Meder, B., Mayrhofer, R., & Waldmann, M. R. (2014). Structure induction in diagnostic causal reasoning. Psychological Review, 121(3), 277–301. http://doi.org/10.1037/a0035944
Morris, M. W., & Larrick, R. P. (1995). When one cause casts doubt on another: A normative analysis of discounting in causal attribution. Psychological Review, 102(2), 331–355. http://doi.org/10.1037/0033-295X.102.2.331
Murphy, K. P. (2002). Dynamic Bayesian networks: Representation, inference and learning. Berkeley: University of California Press.
Novick, L. R., & Cheng, P. W. (2004). Assessing interactive causal influence. Psychological Review, 111(2), 455–485. http://doi.org/10.1037/0033-295X.111.2.455
Oppenheimer, D. M., & Monin, B. (2009). Investigations in spontaneous discounting. Memory & Cognition, 37(5), 608–614. http://doi.org/10.3758/MC.37.5.608
Oppenheimer, D. M., Tenenbaum, J., & Krynski, T. R. (2013). Categorization as causal explanation: Discounting and augmenting in a Bayesian framework. In B. H. Ross (Ed.), Psychology of learning and motivation: Advances in research and theory (Vol. 58, pp. 203–231). Waltham, MA: Elsevier. http://doi.org/10.1016/B978-0-12-407237-4.00006-2
Pacer, M. D., & Griffiths, T. L. (2011). A rational model of causal induction with continuous causes. In J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, & K. Q. Weinberger (Eds.), Advances in neural information processing systems (pp. 2384–2392). La Jolla, CA: Neural Information Processing Systems Foundation.
Park, J., & Sloman, S. A. (2013). Mechanistic beliefs determine adherence to the Markov property in causal reasoning. Cognitive Psychology, 67(4), 186–216. http://doi.org/10.1016/j.cogpsych.2013.09.002
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. San Mateo, CA: Morgan Kaufmann.
Pearl, J. (1996). Structural and probabilistic causality. In D. Shanks, K. J. Holyoak, & D. L. Medin (Eds.), Psychology of learning and motivation: Causal learning (Vol. 34, pp. 393–435). San Diego: Academic Press.
Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge, UK: Cambridge University Press.
Pennington, N., & Hastie, R. (1993). Reasoning in explanation-based decision making. Cognition, 49, 123–163.
Peterson, C. R., & Beach, L. R. (1967). Man as an intuitive statistician. Psychological Bulletin, 68(1), 29–46.
Rehder, B. (2014). Independence and dependence in human causal reasoning. Cognitive Psychology, 72, 54–107. http://doi.org/10.1016/j.cogpsych.2014.02.002
Rehder, B., & Burnett, R. C. (2005). Feature inference and the causal structure of categories. Cognitive Psychology, 50(3), 264–314. http://doi.org/10.1016/j.cogpsych.2004.09.002
Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64–99). New York: Appleton-Century-Crofts.
Rips, L. J. (2010). Two causal theories of counterfactual conditionals. Cognitive Science, 34(2), 175–221. http://doi.org/10.1111/j.1551-6709.2009.01080.x
Rottman, B. M. (2016). Searching for the best cause: Roles of mechanism beliefs, autocorrelation, and exploitation. Journal of Experimental Psychology: Learning, Memory, and Cognition. http://doi.org/10.1037/xlm0000244
Rottman, B. M., & Ahn, W. (2009). Causal learning about tolerance and sensitization. Psychonomic Bulletin & Review, 16(6), 1043–1049. http://doi.org/10.3758/PBR.16.6.1043
Rottman, B. M., & Ahn, W. (2011). Effect of grouping of evidence types on learning about interactions between observed and unobserved causes. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37(6), 1432–1448. http://doi.org/10.1037/a0024829
Rottman, B. M., & Hastie, R. (2014). Reasoning about causal relationships: Inferences on causal networks. Psychological Bulletin, 140(1), 109–139. http://doi.org/10.1037/a0031903
Rottman, B. M., & Hastie, R. (2016). Do people reason rationally about causally related events? Markov violations, weak inferences, and failures of explaining away. Cognitive Psychology, 87, 88–134.
Rottman, B. M., & Keil, F. C. (2012). Causal structure learning over time: Observations and interventions. Cognitive Psychology, 64(1–2), 93–125. http://doi.org/10.1016/j.cogpsych.2011.10.003
Rottman, B. M., Kominsky, J. F., & Keil, F. C. (2014). Children use temporal cues to learn causal directionality. Cognitive Science, 38(3), 1–25. http://doi.org/10.1111/cogs.12070
Schulz, L. E., Gopnik, A., & Glymour, C. (2007). Preschool children learn about causal structure from conditional interventions. Developmental Science, 10(3), 322–332. http://doi.org/10.1111/j.1467-7687.2007.00587.x
Shanks, D. R., & Dickinson, A. (1987). Associative accounts of causality judgment. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 21, pp. 229–261). San Diego: Academic Press.
Sloman, S. A., & Lagnado, D. A. (2005). Do we "do"? Cognitive Science, 29, 5–39. http://doi.org/10.1207/s15516709cog2901_2
Soo, K., & Rottman, B. M. (2014). Learning causal direction from transitions with continuous and noisy variables. In P. Bello, M. Guarini, M. McShane, & B. Scassellati (Eds.), Proceedings of the 36th annual conference of the Cognitive Science Society (pp. 1485–1490). Austin, TX: Cognitive Science Society.
Spellman, B. A., Price, C. M., & Logan, J. M. (2001). How two causes are different from one: The use of (un)conditional information in Simpson's paradox. Memory & Cognition, 29(2), 193–208. http://doi.org/10.3758/BF03194913
Spirtes, P., Glymour, C., & Scheines, R. (1993). Causation, prediction, and search. New York: Springer-Verlag.
Spirtes, P., Glymour, C., & Scheines, R. (2000). Causation, prediction, and search (2nd ed.). Cambridge, MA: MIT Press.
Steyvers, M., Tenenbaum, J., Wagenmakers, E., & Blum, B. (2003). Inferring causal networks from observations and interventions. Cognitive Science, 27(3), 453–489. http://doi.org/10.1016/S0364-0213(03)00010-7
Sussman, A., & Oppenheimer, D. (2011). A causal model theory of judgment. In C. Hölscher & T. Shipley (Eds.), Proceedings of the 33rd annual conference of the Cognitive Science Society (pp. 1703–1708). Austin, TX: Cognitive Science Society.
Thornley, S. (2013). Using directed acyclic graphs for investigating causal paths for cardiovascular disease. Journal of Biometrics & Biostatistics, 4(182), 1–6. http://doi.org/10.4172/2155-6180.1000182
Tversky, A., & Kahneman, D. (1980). Causal schemata in judgments under uncertainty. In M. Fishbein (Ed.), Progress in social psychology (pp. 49–72). Hillsdale, NJ: Lawrence Erlbaum Associates.
Waldmann, M. R. (1996). Knowledge-based causal induction. In D. R. Shanks, K. J. Holyoak, & D. L. Medin (Eds.), The psychology of learning and motivation: Causal learning (Vol. 34, pp. 47–88). San Diego: Academic Press.
Waldmann, M. R. (2000). Competition among causes but not effects in predictive and diagnostic learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26(1), 53–76. http://doi.org/10.1037//0278-7393.26.1.53
Waldmann, M. R. (2007). Combining versus analyzing multiple causes: How domain assumptions and task context affect integration rules. Cognitive Science, 31(2), 233–256. http://doi.org/10.1080/15326900701221231
Waldmann, M. R., & Hagmayer, Y. (2001). Estimating causal strength: The role of structural knowledge and processing effort. Cognition, 82(1), 27–58. http://doi.org/10.1016/S0010-0277(01)00141-X
Waldmann, M. R., & Hagmayer, Y. (2005). Seeing versus doing: Two modes of accessing causal knowledge. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31(2), 216–227. http://doi.org/10.1037/0278-7393.31.2.216
Waldmann, M. R., & Holyoak, K. J. (1992). Predictive and diagnostic learning within causal models: Asymmetries in cue competition. Journal of Experimental Psychology: General, 121(2), 222–236. http://doi.org/10.1037/0096-3445.121.2.222
Walsh, C., & Sloman, S. A. (2008). Updating beliefs with causal models: Violations of screening off. In M. A. Gluck, J. R. Anderson, & S. M. Kosslyn (Eds.), Memory and mind: A festschrift for Gordon H. Bower (pp. 345–358). Mahwah, NJ: Lawrence Erlbaum Associates.
White, P. A. (2001). Causal judgments about relations between multilevel variables. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27(2), 499–513.
White, P. A. (2006). The causal asymmetry. Psychological Review, 113(1), 132–147. http://doi.org/10.1037/0033-295X.113.1.132
Yeung, S., & Griffiths, T. L. (2011). Estimating human priors on causal strength. In Proceedings of the 33rd annual conference of the Cognitive Science Society (pp. 1709–1714). Austin, TX: Cognitive Science Society.
Yeung, S., & Griffiths, T. L. (2015). Identifying expectations about the strength of causal relationships. Cognitive Psychology, 76, 1–29. http://doi.org/10.1016/j.cogpsych.2014.11.001

Notes
(1.) Here "model" is serving two purposes. First, probabilistic Bayesian models are intended to be objective models of how the world works (e.g., Figure 6.1 is an objective model of cardiovascular disease). The second sense of "model," as used by the psychologist, is that the same probabilistic model could also serve as a model of human reasoning—treating Figure 6.1 as a representation of how a doctor thinks about cardiovascular disease.
(2.) Technically, the reason why it is possible to learn the direction of the causal relation is the autocorrelation, the belief that Yt→Yt+1 and that Xt→Xt+1. Thus, the learner is really discriminating between [Yt→Yt+1←Xt+1←Xt] and [Yt→Yt+1→Xt+1←Xt], which are in different Markov equivalence classes. I thank David Danks for pointing this out.
(3.) In the previous sections on learning causal structures, when the true structure is X→Y→Z, people tend to also infer the link X→Z, suggesting that they are not fully aware of the conditional independence. This section focuses on reasoning about the causal structure rather than learning, though of course they are related.
(4.) This study is different from the preceding ones in two ways. First, this study did not have a normatively correct quantitative answer to compare human inferences against. Second, this study tests the comparison P(ostrich | feathers, lays eggs, cannot fly, broken wing) vs. P(ostrich | feathers, lays eggs, cannot fly), not P(ostrich | feathers, lays eggs, cannot fly, no broken wing). This is analogous to P(x = 1 | y = 1, z = 1) vs. P(x = 1 | y = 1) instead of P(x = 1 | y = 1, z = 0), so it is a slightly different comparison.
(5.) I thank Michael Waldmann for highlighting these differences.
(6.) Assuming a world in which causes and effects have the same base rates, on average, Fernbach et al.'s findings imply that cause-to-effect judgments would be lower than effect-to-cause judgments. However, Fernbach et al. actually assume a world in which effects have higher base rates than causes on average. Fernbach et al. (2011, p. 13) claim that a normative CBN analysis shows that inferences of P(effect = 1 | cause = 1) should be higher than P(cause = 1 | effect = 1) 65% of the time when integrating across the entire parameter space with uniform priors. The reason for this finding is that they assumed that there are alternative factors that can generate effects but not inhibit effects. This same analysis shows that even though causes have a base rate of .5 on average, effects have a base rate of .625. So their analysis is only appropriate in worlds in which there are no inhibitory factors.

Benjamin Margolin Rottman

Learning Research and Development Center, University of Pittsburgh, Pittsburgh, Pennsylvania, USA



Formalizing Prior Knowledge in Causal Induction   Thomas L. Griffiths The Oxford Handbook of Causal Reasoning Edited by Michael R. Waldmann Print Publication Date: Jun 2017 Subject: Psychology, Cognitive Psychology Online Publication Date: May 2017 DOI: 10.1093/oxfordhb/9780199399550.013.38

Abstract and Keywords
Prior knowledge plays a central role in causal induction, helping to explain how people are capable of identifying causal relationships from small amounts of data. Bayesian inference provides a way to characterize the influence that prior knowledge should have on causal induction, as well as an explanation for how that knowledge could itself be acquired. Using the theory-based causal induction framework of Griffiths and Tenenbaum (2009), this chapter reviews recent work exploring the relationship between prior knowledge and causal induction, highlighting some of the ways in which people's expectations about causal relationships differ from approaches to causal learning in statistics and computer science.
Keywords: prior knowledge, causal induction, Bayesian, intuitive theories, Bayesian models, statistics

When you walk into a hotel room, flip a light switch, and see the lights go on, you have no doubt that a causal relationship exists between the two events. When you get abdominal pain after taking a new medicine, you stop taking the medicine. When you see two objects rebound from a collision, it is straightforward to infer that one caused the motion of the other. The problems of causal induction that people face are quite different from those that arise in artificial intelligence and machine learning: rather than dealing with "big data," people have to navigate through "small data," often making inferences from just a handful of observations.
Cheng and Lu (Chapter 5 in this volume) and Rottman (Chapter 6 in this volume) have introduced the ideas behind causal graphical models, a formalism for representing causal relationships that is widely used to solve problems of causal induction in artificial intelligence and machine learning (Pearl, 1988, 2000). This formalism has proven valuable in modeling human cognition, providing a new approach to developing quantitative models of how people learn causal relationships (e.g., Griffiths & Tenenbaum, 2005). However, causal graphical models alone are not enough to explain the speed with which this learning takes place—the fact that people can learn so much from so little.

Giving a formal account of these inferences is important not only for understanding human cognition, but also for bringing automated systems for causal induction into closer correspondence with human intuitions and human performance.
People are able to learn causal relationships from limited amounts of data because they have prior knowledge that shapes their expectations about where causal relationships will exist and how those relationships will manifest. The idea that human causal learning is informed by prior knowledge has been proposed by multiple researchers (e.g., Lagnado & Sloman, 2004; Waldmann, 1996). The challenge is developing an appropriate formalism for capturing the content of this knowledge, and for articulating how it influences causal induction.
In this chapter, I review some of the advances that have been made in understanding how prior knowledge informs human causal induction, organizing this review using the "theory-based causal induction" framework introduced by (p. 116) Griffiths and Tenenbaum (2009). Inspired by work on the role of intuitive causal theories in cognitive development and categorization (Carey, 1985; Gopnik & Meltzoff, 1997; Karmiloff-Smith, 1988; Keil, 1989; Murphy & Medin, 1985), Griffiths and Tenenbaum proposed that causal theories can be viewed as specifying a stochastic procedure for generating causal graphical models in a given domain. This procedure defines a probability distribution over graphical models, which can be used as a prior distribution in Bayesian inference. The challenge of capturing human prior knowledge then becomes a matter of identifying the theories that people bring to bear in different domains and accounting for how this knowledge itself might be acquired.
The theory-based causal induction approach advocated by Griffiths and Tenenbaum divides theories into three components: an ontology, which specifies entities and their properties; a set of plausible relations that indicate what relationships might exist; and the functional form of those relationships, determining how causes influence their effects. These three components specify how to generate a causal graphical model. Respectively, they identify the variables that the model is defined over, the prior probability of each causal link, and the conditional probability distributions associated with each variable. In the remainder of the chapter, I introduce the idea of hierarchical Bayesian models that motivates this approach, and then review recent work, exploring each of these components of causal theories in turn.

Hierarchical Bayesian Models
Models of human causal learning based on causal graphical models often use Bayesian inference to identify the causal model that seems most likely given observed data. Formally, if we have hypotheses h ∈ H about the underlying causal model and observed data d, Bayes's rule indicates that the probability we should assign to each hypothesis h after observing d is



\[ p(h \mid d) \;=\; \frac{p(d \mid h)\, p(h)}{\sum_{h' \in H} p(d \mid h')\, p(h')} \tag{1} \]

where p(h) is the prior probability of the hypothesis h (the degree of belief assigned to that hypothesis before observing any data), p(d|h) is the likelihood (the probability of seeing d if h were true), and p(h|d) is the posterior probability of h after observing d.
The prior distribution over hypotheses, p(h), determines which causal models we even consider when interpreting observed data, and how plausible we consider those models to be. It thus provides a way of encapsulating the prior knowledge that learners bring to a causal learning problem. Since each hypothesis is a causal graphical model—with a set of nodes corresponding to variables, a set of edges denoting causal relationships, and a conditional probability distribution for each variable given its parents in the resulting graph—our prior knowledge provides a distribution over these models. Griffiths and Tenenbaum (2009) argued that a natural way to express such a distribution is via a theory that provides a recipe for generating causal graphical models. By specifying the set of variables under consideration, the probability that edges exist between those variables, and the form of the resulting relationship, we have all the information we need to define a distribution over causal graphical models.
In the theory-based causal induction framework, the three components of causal graphical models are each generated by a different component of a theory. The ontology of the theory picks out the variables in a domain. The plausible relations identified by the theory specify a distribution over edges between those variables. The functional form of those relationships determines the conditional probability distribution associated with each variable. A theory with these three components thus provides a complete stochastic generative process for producing causal graphical models, and hence fully specifies the prior distribution p(h).
Of course, theories do not just guide causal learning—they are themselves learned through experience. This kind of learning can be captured using hierarchical Bayesian models. In a standard Bayesian model, a learner makes an inference about how well each of a set of hypotheses might account for observed data. A hierarchical Bayesian model adds another level of abstraction above this—in our case, a theory that generates those hypotheses. The same statistical principles that are used to evaluate hypotheses are used to evaluate these theories—based on data that are potentially accumulated across multiple learning settings, the learner makes an inference about which theory best accounts for the observed data. This hierarchical Bayesian approach provides a way to explain how people might learn abstract theories at the same time as learning more concrete causal models.


More formally, a hierarchical Bayesian model provides a way to learn a prior distribution over hypotheses. In the place of our prior p(h), we have a prior that is explicitly conditioned on a particular theory, p(h|t). This theory makes predictions about the data that will be observed by averaging over possible hypotheses, \( p(d \mid t) = \sum_{h} p(d \mid h)\, p(h \mid t) \).

As a (p. 117) consequence, we can apply Bayes's rule as in Equation 1, substituting t for h, and calculate a posterior distribution over theories p(t|d). This posterior distribution can be used to inform future learning—now when we evaluate the probability of a hypothesis, we can do so by averaging the prior distribution for each theory p(h|t) over the posterior distribution on theories p(t|d).
Bayesian inference provides a way to capture the effects of prior knowledge on learning, and hierarchical Bayesian models let us explain how that knowledge can itself be acquired. Together, these tools have been useful for investigating how people use and learn the three components of causal theories: ontology, plausible relations, and functional form.
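The two levels of this computation can be made concrete in a small sketch. Everything here is an invented toy problem: the fixed causal strengths, the two theories and their link probabilities, and the data are all assumptions chosen to make the arithmetic visible, not values from the chapter.

```python
from math import prod

W0, W1 = 0.1, 0.8  # fixed background and causal strengths (assumed)

def likelihood(h, data):
    """P(data | h) for a single candidate link C -> E under a noisy-OR."""
    ps = []
    for c, e in data:
        p_e = 1 - (1 - W0) * (1 - W1) ** c if h == "link" else W0
        ps.append(p_e if e else 1 - p_e)
    return prod(ps)

# Two candidate theories that assign different prior probabilities to links
theories = {"links_rare":   {"link": 0.1, "no_link": 0.9},
            "links_common": {"link": 0.6, "no_link": 0.4}}

data = [(1, 1), (1, 1), (0, 0), (1, 1)]  # cause mostly followed by effect

# p(d | t) = sum over h of p(d | h) p(h | t); then Bayes' rule over theories
# (Equation 1 with t substituted for h), with a uniform prior over theories
p_d_t = {t: sum(likelihood(h, data) * p_h for h, p_h in hs.items())
         for t, hs in theories.items()}
norm = sum(0.5 * v for v in p_d_t.values())
p_t_d = {t: 0.5 * v / norm for t, v in p_d_t.items()}
print(p_t_d)  # data with a strong C-E contingency favor "links_common"
```

The same averaging that scores hypotheses within a theory scores the theories themselves, which is all that "adding a level of abstraction" amounts to computationally.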

Ontology
The ontology of a theory expresses the variables that might participate in causal relationships, organizing those variables into types that determine the plausibility of particular relationships. But a basic ontological prerequisite for causal learning is the notion of causality itself. I will start by considering this prerequisite, then turn to the question of how variables and types might be identified.
Can the notion of causality—specifically, the kind of intervention-based causality that Pearl (2000) used to define causal graphical models—itself be learned? Goodman, Ullman, and Tenenbaum (2011) explored this question by examining whether a Bayesian learner could form an appropriate "theory of causality" at the same time as learning causal models. Essentially, the learner was equipped with a set of possible theories characterized by logical statements that identified the relationships that could hold among variables. Two of these statements contained the key assumptions behind the intervention-based notion of causality. Exposed to data generated from causal models that followed these assumptions, a hierarchical Bayesian model was able to identify the appropriate theory of causality, along with the correct causal models. In fact, it formed the appropriate theory of causality before it became certain about the specific causal models—something that the authors dubbed the "blessing of abstraction."
Having developed an appropriate theory of causality, the next challenge is determining what constitutes a variable. In many experiments on human causal learning, the variables are pre-identified for the participants: they know that they are going to be observing well-delineated events, such as whether or not a person took some medicine and then displayed a particular symptom. Constructing a causal model is then just a matter of determining the values of the variables and the relationships between them.


But in many real-world causal learning situations, the variables themselves are ill-defined—how can you learn that tickling makes people laugh without knowing exactly what constitutes tickling? Goodman, Mansinghka, and Tenenbaum (2007) tackled this problem by exploring a class of causal graphical models in which discrete variables translated into events in a continuous space. The presence or absence of a variable corresponded to a point appearing in a particular region of a two-dimensional space. The challenge for the learner was thus to identify the correspondence between the locations in the continuous space and the discrete variables to be used in causal learning—"grounding" the causal model in low-level perceptual dimensions. Using Bayesian inference, this learning can be done in parallel with identifying the underlying causal structure. The exact regions corresponding to each variable do not need to be pinned down before causal learning can take place, another instance of the blessing of abstraction.
Learning an ontology is not just a matter of recognizing what variables participate in causal relationships—those variables need to be organized into different types that can influence one another. Kemp, Tenenbaum, Niyogi, and Griffiths (2010) presented a formal framework that addresses this problem. Based on a kind of statistical model known as a "stochastic blockmodel" (Nowicki & Snijders, 2001; Wang & Wong, 1987), the framework assumes that entities are arranged into an unknown number of types and that the probability that a relationship exists between two entities depends only on the types of those entities. These assumptions are the basic ingredients of an ontology, as specified by Griffiths and Tenenbaum (2009). Using Bayesian inference, it is possible to compute a posterior distribution over both the type assigned to each entity and the probability that a relationship exists between them. As might be expected from the other examples discussed in this section, identifying the types is often easier than resolving individual causal relationships.

the formal framework, such as the reliability of the causal relationship. Learning ontologies of even this most basic kind plays an important role in facilitating causal learning. Mansinghka, Kemp, Tenenbaum, and Griffiths (2006) demonstrated this by applying a similar formal framework to a set of causal induction problems from ma­ chine learning. In these problems, an automated learner was presented with data sam­ pled from a particular causal structure. The learner was able to identify the correct causal structure faster—from less data—when it had the capacity to learn a basic ontol­ ogy, organizing variables into types. Just having this capacity induces a different prior dis­

Page 5 of 20

Formalizing Prior Knowledge in Causal Induction tribution on the possible causal structures that the learner might encounter, and that pri­ or distribution guides the learner toward correct solutions more often. These results reflect significant progress toward explaining how people might learn on­ tologies relevant to causal induction and illustrate how insights from human causal learn­ ing can be applied in artificial intelligence and machine learning. However, there are still interesting problems to be solved in the area of ontology learning. For example, changing the ontological distinctions made in a given domain seems to be one element of conceptu­ al change—a difference in the basic terms used to describe the domain that can make theories incommensurable (Carey, 1985). One mechanism that has been held up as poten­ tially playing a role in this kind of conceptual change is analogy: recognizing that the structure of one domain mimics another (Carey, 2009; Gentner, 1983; Gick & Holyoak, 1980). Formalizing this process—and more generally, considering how ontologies in dif­ ferent domains relate to and inform one another—is an important part of gaining a deep­ er understanding of the nature of intuitive theories.

Plausible Relations The methods for learning ontologies reviewed in the previous section also identify the probabilities with which relationships exist between different types of entities, simply be­ cause the propensities of these relationships are the best cue to the structure of the on­ tology. The importance of this kind of knowledge has been illustrated by several recent studies. These studies also provide insight into the way in which knowledge about plausi­ ble relations is represented, and the range of settings in which it is drawn upon. Ontological knowledge should directly influence the beliefs that people have about which causal relationships are plausible. Support for this idea comes from a series of experi­ ments by Lien and Cheng (2000), which showed that people used categories to interpret contingency information that might otherwise be causally ambiguous. In these experi­ ments, participants judged whether a substance caused flowers to bloom. The experiment was designed so that participants did not obtain enough information to clearly identify whether this causal relationship existed for any single flower. However, the visual fea­ tures of the flowers were manipulated to suggest that certain flowers formed a category. This category structure provided a way to aggregate evidence for a causal relationship, and when a coherent category structure existed, people inferred that the substance did cause flowers to bloom. Waldmann and Hagmayer (2006) built on these results by conducting a series of experi­ ments in which people were explicitly trained on category structures, and then learned about causal relationships involving the objects that belong to those categories. They found that people do tend to use category structures in evaluating causal relationships, particularly when the category structures are coherent and have labels corresponding to natural kinds (e.g., viruses). People tended to use the category information even when the category structure that they had learned was not ideal for understanding the causal prop­ erties of the objects. However, when the learned categories consisted of arbitrary objects Page 6 of 20

Formalizing Prior Knowledge in Causal Induction rather than having a more naturalistic structure, people were more willing to abandon the categories when learning about causal relationships. Going beyond the effect of ontologies on the plausibility of causal relationships, a natural question to ask is how we decide whether or not a (p. 119) relationship is plausible. To ex­ plore this question, Perales, Catena, Maldonado, and Cándido (2007) conducted an exper­ iment in which the expectations that participants had about whether or not a causal rela­ tionship existed were manipulated, and then examined the consequences of providing ad­ ditional covariational evidence. Prior expectations were manipulated either by providing a putative mechanism, providing details of the extent to which cause and effect tended to co-occur, or giving both mechanism and covariation information. The resulting prior be­ liefs about the relationship were then assessed. Finally, participants were presented with covariational evidence that was either strong or weak, and were asked to assess the relia­ bility of that evidence, as well as the degree to which the cause produced the effect given all of the information that had been provided. As might be expected on a theory-based ac­ count of causal induction, these final judgments reflected a combination of the prior be­ liefs and the strength of the evidence, modulated by the perceived reliability of that evi­ dence. The results of Perales et al. (2007) illustrate one way that mechanism information can play a role in causal induction: in determining the prior probability that a causal relation­ ship exists, through the plausibility of possible mechanisms.1 These results show that mechanism information and covariational evidence can be cashed out in a common cur­ rency—belief about whether a causal relationship exists. Further support for this idea was provided by a second study by Catena, Maldonado, Perales, and Cándido (2008), in which they showed that strong covariational evidence can trump prior beliefs established by mechanism information, but that weak covariational evidence can in turn be trumped by those prior beliefs. The plausibility of causal relationships also plays a role in other aspects of human reason­ ing. For example, Griffiths and Tenenbaum (2007) argued that our sense of “coincidence” arises in a very specific situation: when we get strong evidence for a causal relationship in a context where we think no such relationship exists. This suggests an interesting methodology for exploring people’s beliefs about the plausibility of different kinds of rela­ tionships: explore the kinds of events that elicit that sense of coincidence. Griffiths (2015) used a variant on this methodology to explore people’s intuitions about the sorts of causal forces that might exist in the world. Rather than asking people to eval­ uate coincidences, I asked people to evaluate magic tricks. Arguably, a good magic trick meets the same criterion as a good coincidence: it makes us believe something we think is impossible could be possible. So, exploring the quality of magic tricks is a guide to the graded degrees of impossibility we assign to different kinds of causal relationships. In particular, I found that magical transformations that move down an ontological hierarchy —adding features such as animacy and agency—are considered better tricks than those that move up an ontological hierarchy. As a magician, you are better off transforming a Page 7 of 20

Formalizing Prior Knowledge in Causal Induction statue of a tiger into a live tiger than vice versa. These transformations push harder against our sense of what is causally possible, revealing subtle aspects of our intuitive theories that might otherwise be hard to identify. There are undoubtedly many other com­ mitments about the plausibility of causal relationships that could be discovered in a simi­ lar fashion.

Functional Form The third component of causal theories, after ontology and plausible relations, is the func­ tional form of causal relationships. Functional form has two parts: how causes combine together, and how strongly those causes influence their effects. In this section I first con­ sider these two aspects of functional form for relationships in which both cause and effect are binary, being either present or absent. I then highlight ways in which recent research has gone beyond this assumption.

How Causes Combine For elemental causal induction—determining whether a causal relationship exists be­ tween a single binary cause and a single binary effect—there is a substantial literature supporting the idea that causes combine via a relationship that is known in the artificial intelligence and machine learning literature as a “noisy-OR” (Pearl, 1988). Specifically, for a cause C and effect E, the probability that the effect occurs (e+) is given by

(2)

where w1 is the strength of the cause C, c = 1 if the cause is present and 0 otherwise, and w0 is the strength of the background—the probability that the effect occurs in the ab­ sence of the cause. Since the cause can only increase the probability of the (p. 120) effect, Equation 1 applies to generative causes. For preventive causes, the analogous equation is that of the “noisy-AND-NOT”,

(3)

The idea that people combine causes in this way has its roots in Cheng’s (1997) theory of causal power, which Griffiths and Tenenbaum (2005) showed to be an estimator of w1 (see Cheng & Lu, Chapter 5 in this volume). Further support comes from developmental stud­ ies in which children identify which objects cause a “blicket detector” to activate— Bayesian inference over causal graphical models with this parameterization can explain how children are able to make these inferences from only a few observations (Griffiths, Sobel, Tenenbaum, & Gopnik, 2011). However, some of the strongest evidence for this form of combining causes comes from studies in which people were explicitly asked to Page 8 of 20

Formalizing Prior Knowledge in Causal Induction evaluate multiple causal relationships, using designs in which different functional forms were pitted against one another. The key assumption behind Equations 2 and 3 is that each cause (in these cases, C and the background) has an independent opportunity to generate (or prevent) the effect. This as­ sumption of independence leads to some natural predictions. Hagmayer and Waldmann (2007) and Carroll and Cheng (2010) both used designs in which people should be led to postulate additional hidden causes if they assume that causes independently influence their effects, and both found support for this assumption. But perhaps the most com­ pelling evidence comes from a study by Liljeholm and Cheng (2007), in which people were asked to evaluate the influence of a cause that had only been observed in circum­ stances that were confounded with other causes. Specifically, they presented participants with information about two causes A and B (in this case, medicines being evaluated for side effects). Participants saw information about the rate of headaches in the absence of medicine, and patients who were exposed to either A or both A and B. Finally, they were asked to judge whether B had an effect on headaches. Crucially, the contingency informa­ tion was constructed so that in one condition the rate of headaches followed Equation 1, while in the other condition it violated Equation 1. A higher proportion of participants judged B to be a cause when Equation 1 was violated—that is, when the evidence could not be accounted for by an independent influence of A and the background. These studies make it clear that people have a strong tendency to assume that causes combine independently—a “disjunctive” view of generative causes, as expressed in the noisy-OR. A natural question to ask is how this tendency is acquired: Can we learn an ap­ propriate functional form for a causal system? Lucas and Griffiths (2010) explored this question, and found that people can learn the functional form of causal relationships from relatively little evidence. They conducted a series of studies in which people interacted with a machine that would light up and play music when blocks were placed on it. The question was how the machine worked—what the underlying causal relationship was be­ tween the blocks and the machine lighting up and playing music. Lucas and Griffiths gave people some initial experience with the machine, and then asked them questions about a new set of blocks. The initial experience was designed to encourage people to think about the machine following either a disjunctive (noisy-OR-like) or conjunctive (noisy-AND-like) functional form. Table 7.1 shows the initial experiences that people had in the different conditions of Ex­ periment 1 of Lucas and Griffiths (2010). Participants were first presented with one of three sets of events involving three objects (A, B, C), suggesting either a deterministic disjunctive, conjunctive, or noisy disjunctive theory, depending on condition. Next, partici­ pants in all conditions saw a set of events with three new objects (D, E, F) that were com­ patible with all three functional forms: D– D– D– E– DF+ DF+ (i.e., object D failing to acti­ vate the machine three times, E failing once, and a combination of D and F succeeding twice). This second set of events was selected so that people would make different judg­

Page 9 of 20

Formalizing Prior Knowledge in Causal Induction ments about which blocks could activate the machine, depending on the conclusions they had (p. 121) drawn about the functional form of the causal relationship. Table 7.1 Evidence Presented to Participants in Experiment 1 of Lucas and Griffiths (2010) Block

Evidence

Causes

Conjunctive training

A− B− C− AB− AC+ BC−

A,C

Noisy disjunctive training

A+ B− C− AB− AC + BC−

A

Deterministic disjunctive training

A+ B− C− AB+ AC+ BC−

A

Test

D− D− D− E− DF+ DF +

Figure 7.1 Results of Experiment 1 from Lucas and Griffiths (2010), showing mean human ratings of the probability that test objects cause the machine to activate and the predictions of a hierarchical Bayesian model that infers the appropriate functional form.

Figure 7.1 shows the results of the experiment. When people were asked to rate how likely it was that each block had a causal relationship to activating the machine, the answers were very different, depending on which condition people were assigned to. Their initial experience, suggesting different functional forms, colored the inferences that the participants made about new causal relationships. As shown in the figure, these inferences could be captured by a hierarchical Bayesian model, in which the functional form of the causal relationship was expressed as an additional abstract variable along with the hypotheses about the specific causal structure that applied to the blocks that had been observed. The tendency to favor disjunctive causal relationships documented in other studies could thus be explained as the result of learning—people have had a lot of experience suggesting that causes act independently, which they apply in appropriate situations.
Extending this theory of how people might learn the functional form of causal relationships to a wider range of situations and possible forms is an important direction for future research. Undoubtedly, people assume different functional forms in other contexts. For example, Waldmann (2007) used a paradigm in which participants made inferences about the consequences of multiple causes combining, where the same statistical information was presented in different contexts—in one case participants were told that the taste of a medicine modulated its effect, in the other case the strength of the medicine. Waldmann found that when people are told that the underlying mechanism is based on taste they average the strengths of the causes, but when they are told the underlying mechanism is based on strength they imagine the influence of the causes to be additive. This leaves a number of open questions about the range of functional forms that people entertain, the contexts in which they entertain them, and the mechanism knowledge that guides these expectations.
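The hierarchical structure of this inference can be sketched in a few dozen lines. This is a deliberately simplified, non-authoritative rendering: the near-deterministic activation probabilities (.9/.1), the .6 cause strength for the noisy disjunctive form, and the uniform priors are all assumptions standing in for the full model of Lucas and Griffiths, which also places priors over the strength parameters themselves.

```python
from itertools import chain, combinations

def powerset(xs):
    return list(chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1)))

def p_activate(k, form):
    # P(machine activates | k causal blocks present); parameter values assumed
    if form == "conjunctive":
        return 0.9 if k >= 2 else 0.1
    if form == "deterministic disjunctive":
        return 0.9 if k >= 1 else 0.1
    return 1 - 0.9 * 0.4 ** k  # noisy disjunctive: each cause works w.p. 0.6

def likelihood(data, causes, form):
    p = 1.0
    for blocks, activated in data:
        pe = p_activate(len(set(blocks) & set(causes)), form)
        p *= pe if activated else 1 - pe
    return p

forms = ["conjunctive", "deterministic disjunctive", "noisy disjunctive"]
train = [("A", 1), ("B", 0), ("C", 0), ("AB", 0), ("AC", 1), ("BC", 0)]  # noisy disjunctive condition of Table 7.1
test = [("D", 0), ("D", 0), ("D", 0), ("E", 0), ("DF", 1), ("DF", 1)]

# Posterior over functional forms given the training blocks (uniform priors)
post_form = {f: sum(likelihood(train, c, f) for c in powerset("ABC")) for f in forms}
z = sum(post_form.values())
post_form = {f: v / z for f, v in post_form.items()}

# P(D is a cause | test data), averaging over the form learned in training
p_d = sum(post_form[f]
          * sum(likelihood(test, c, f) for c in powerset("DEF") if "D" in c)
          / sum(likelihood(test, c, f) for c in powerset("DEF"))
          for f in forms)
print(post_form)
print(p_d)
```

Because the functional form is inferred from the training blocks and then reused for the test blocks, changing only the training condition changes the judgment about D, which is the qualitative signature of the experiment.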

Formalizing Prior Knowledge in Causal Induction mation was presented in different contexts—in one case participants were told that the taste of a medicine modulated its effect, in the other case the strength of the medicine. Waldmann found that when people are told that the underlying mechanism is based on taste they average the strengths of the causes, but when they are told the underlying mechanism is based on strength they imagine the influence of the causes to be additive. This leaves a number of open questions about the range of functional forms that people entertain, the contexts in which they entertain them, and the mechanism knowledge that guides these expectations.
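To make the contrast between these functional forms concrete, here is a minimal sketch (our illustration, not code from any of the studies above) of how two generative causes might be combined under a disjunctive (noisy-OR), conjunctive, averaging, or additive rule. All strength values are invented, and the conjunctive and averaging rules are simplified stand-ins for the forms discussed in the text.

```python
# Each function returns P(effect) for binary causes c1, c2 (1 = present).
# w1, w2 are causal strengths; w0 is the background strength (illustrative values).

def noisy_or(c1, c2, w1=0.8, w2=0.6, w0=0.1):
    # Disjunctive: each cause has an independent opportunity to produce the effect.
    return 1 - (1 - w0) * (1 - w1) ** c1 * (1 - w2) ** c2

def conjunctive(c1, c2, w12=0.9, w0=0.1):
    # Conjunctive: the causes are only effective when both are present.
    return 1 - (1 - w0) * (1 - w12) ** (c1 * c2)

def averaging(c1, c2, w1=0.8, w2=0.6):
    # Averaging: the strengths of whichever causes are present are averaged.
    present = [w for c, w in ((c1, w1), (c2, w2)) if c]
    return sum(present) / len(present) if present else 0.0

def additive(c1, c2, w1=0.8, w2=0.6):
    # Additive: strengths sum, capped at 1 so the result stays a probability.
    return min(1.0, c1 * w1 + c2 * w2)

for c1, c2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(c1, c2, noisy_or(c1, c2), conjunctive(c1, c2),
          averaging(c1, c2), additive(c1, c2))
```

The rules diverge most clearly when both causes are present: the noisy-OR gives 1 − (1 − 0.1)(1 − 0.8)(1 − 0.6) = 0.928, averaging gives 0.7, and the additive rule saturates at 1.0.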

Strength of Causes

For causal relationships that follow a functional form like that given in Equations 1 and 2, fully specifying people's prior knowledge also requires characterizing beliefs about the strength of causes—the parameters w1 and w0 denoting the strength of the cause and the background, respectively. In a Bayesian model of causal induction, this knowledge is specified as a prior distribution over these parameters.

In the Bayesian model of causal induction proposed by Griffiths and Tenenbaum (2005), the prior distribution on w1 and w0 was assumed to be uniform—not favoring any particular value of these parameters over another. This was not intended to be a strong claim about human cognition, but rather a simplifying assumption to reduce the complexity of the resulting model. Subsequently, Lu and colleagues (Lu, Yuille, Liljeholm, Cheng, & Holyoak, 2006, 2007, 2008) demonstrated that a closer fit to human judgments could be obtained by using a different form for this prior distribution.

Lu and colleagues argued that people might assume that causes trade off with one another—that individually causes tend to be strong, but that for any given phenomenon with multiple potential causes only one of those causes is likely to be strong. The resulting "sparse and strong" prior makes very specific predictions for the distribution of w1 and (p. 122) w0, as shown in Figure 7.2. In several experiments, a Bayesian model using this prior produced predictions that were closer to human judgments than a model using the uniform prior assumed by Griffiths and Tenenbaum.

The results of Lu and colleagues indicated that a sparse and strong prior does a better job of capturing people's expectations about the strength of causal relationships than a uniform prior. However, it remains possible that there are other prior distributions that will result in an even closer match to human judgments. Exploring the space of possible prior distributions over w1 and w0 is potentially challenging—there are literally uncountably many probability distributions on two variables that could be tried. As a consequence, we need a different procedure for exploring the space of possible distributions.

Yeung and Griffiths (2015) used just such a procedure, building on previous work that suggested that the process of "iterated learning" can reveal human prior distributions (Canini, Griffiths, Vanpaemel, & Kalish, 2014; (p. 123) Griffiths, Christian, & Kalish, 2006; Kalish, Griffiths, & Lewandowsky, 2007; Lewandowsky, Griffiths, & Kalish, 2009). In iterated learning, the choices that a participant makes on one trial determine the data that are seen on the next trial. For example, in one of the experiments conducted by Yeung and Griffiths, participants saw contingency data (the frequency with which the effect occurred in the presence and absence of the cause) and were asked questions that required them to estimate the causal strengths w1 and w0. Those estimates were then plugged into Equation 1 for generative causes and Equation 2 for preventive causes, and the resulting probabilities were used to generate the contingencies presented on a subsequent trial. Assuming the estimates of the strengths were made by sampling from the posterior distribution given by applying Bayes's rule to the contingency data, this procedure will converge over time to the prior distribution on w1 and w0.
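The logic of this convergence can be seen in a toy simulation (our own sketch, not Yeung and Griffiths's materials; generative causes only, for brevity). Each "participant" is idealized as a Bayesian learner who samples (w1, w0) from the posterior over a discrete grid; under that assumption the procedure forms a Markov chain whose stationary distribution is the prior, so the chain of estimates drifts toward samples from the prior. The grid size, trial counts, and uniform prior here are all illustrative choices.

```python
import random

GRID = [i / 20 for i in range(21)]   # discretized values for w1 and w0
N = 20                               # trials with and without the cause

def likelihood(w1, w0, e_c, e_nc):
    """P(contingency data | w1, w0) under the noisy-OR for generative causes."""
    p_c = w1 + w0 - w1 * w0          # P(effect | cause present)
    p_nc = w0                        # P(effect | cause absent)
    return (p_c ** e_c) * ((1 - p_c) ** (N - e_c)) * \
           (p_nc ** e_nc) * ((1 - p_nc) ** (N - e_nc))

def sample_posterior(e_c, e_nc, prior):
    """Draw (w1, w0) from the posterior implied by the prior and the data."""
    pairs = [(w1, w0) for w1 in GRID for w0 in GRID]
    weights = [prior(w1, w0) * likelihood(w1, w0, e_c, e_nc) for w1, w0 in pairs]
    return random.choices(pairs, weights=weights)[0]

def iterated_learning(prior, steps=50):
    w1, w0 = 0.5, 0.5
    for _ in range(steps):
        p_c, p_nc = w1 + w0 - w1 * w0, w0
        e_c = sum(random.random() < p_c for _ in range(N))    # generate data from
        e_nc = sum(random.random() < p_nc for _ in range(N))  # the last estimates
        w1, w0 = sample_posterior(e_c, e_nc, prior)           # next "participant"
    return w1, w0   # after many steps, approximately a draw from the prior

samples = [iterated_learning(lambda w1, w0: 1.0) for _ in range(100)]
```

With a uniform prior plugged in, the histogram of samples stays flat; substituting a prior that concentrates mass near w1 = 1 produces the kind of strong-cause distribution discussed next.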

Figure 7.2 Prior distributions on causal strength. (A) Sparse and strong (SS) priors proposed by Lu et al. (2008) for generative and preventive causes. The horizontal axes correspond to the strength of the cause (w1) and the background (w0). The vertical axis indicates the relative probability density at each (w0, w1) pair. (B) Smoothed empirical estimates of human priors on causal strength from Yeung and Griffiths (2015).

By examining the distribution of the values of w1 and w0 that emerged from this process, Yeung and Griffiths were able to estimate a prior distribution on causal strength. This distribution is shown in Figure 7.2. Like the sparse and strong prior suggested by Lu and colleagues, this prior favors w1 being strong—near-deterministic causal relationships are considered more probable. Unlike the sparse and strong prior, there is no preference for the value of w0, and no competition between w1 and w0. A Bayesian model using this prior distribution outperformed models using both the uniform and the sparse and strong prior in predicting people's judgments of causal strength across a very large range of contingencies.
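The effect of a prior on strength judgments can be illustrated with a small grid-approximation sketch (ours; the contingency counts are invented, and sparse_strong is a simplified rendering of the flavor of the Lu et al. (2008) generative SS prior, not a reimplementation of their model). It computes the posterior mean of w1 under each prior for the same data.

```python
import math

GRID = [i / 50 for i in range(51)]

def posterior_mean_w1(e_c, n_c, e_nc, n_nc, prior):
    """Posterior mean of w1 given effect counts with and without the cause."""
    num = den = 0.0
    for w1 in GRID:
        for w0 in GRID:
            p_c = w1 + w0 - w1 * w0                     # noisy-OR
            like = (p_c ** e_c) * ((1 - p_c) ** (n_c - e_c)) * \
                   (w0 ** e_nc) * ((1 - w0) ** (n_nc - e_nc))
            weight = prior(w1, w0) * like
            num += w1 * weight
            den += weight
    return num / den

def uniform(w1, w0):
    return 1.0

def sparse_strong(w1, w0, alpha=5.0):
    # Favors one strength near 1 and the other near 0 (either assignment).
    return math.exp(-alpha * (1 - w1) - alpha * w0) + \
           math.exp(-alpha * w1 - alpha * (1 - w0))

# e.g., effect on 14 of 20 cause-present trials, 6 of 20 cause-absent trials
print(posterior_mean_w1(14, 20, 6, 20, uniform))
print(posterior_mean_w1(14, 20, 6, 20, sparse_strong))
```

For data like these, the sparse-and-strong prior tends to pull the estimate of w1 up and w0 down, which is how it captures the preference for strong causes described in the text.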


Finding that people favor strong causal relationships—indeed, relationships that are near deterministic—makes other predictions about human causal learning. For example, people should favor causal structures that result in more deterministic causal relationships. Mayrhofer and Waldmann (2015) explored this hypothesis, using a task in which people had to identify the direction of a causal relationship. Under a generic view of causality in which a causal relationship corresponds only to statistical dependence between variables, the direction of relationships cannot be resolved. But if causal relationships should be near-deterministic, we should prefer the direction that results in a greater degree of determinism. Mayrhofer and Waldmann found support for this idea, illustrating a way in which expectations about the strength of causal relationships can influence causal structure learning.

More generally, iterated learning is a powerful method for exploring the prior knowledge that people bring to bear on inductive problems and can potentially be used to explore other questions about prior knowledge in causal induction. Similar experiments could be conducted to identify priors on functional form, causal structures, and ontologies. Ultimately, developing a complete picture of the prior knowledge that informs causal induction is going to require putting these different pieces together. A first step in this direction was taken by Kemp, Goodman, and Tenenbaum (2010), who specified a framework in which ontologies, plausible relations, and the strength of causal relationships were all learned in parallel—a representation that the authors called "causal schemas." Iterated learning provides a tool that can be used to investigate the structure of these causal schemas and how they vary across different domains.

Expanding the Scope of Models of Causal Induction

People's expectations about the functional form of causal relationships between binary variables are now well understood. However, the bigger challenge for research on human causal induction is to capture the complexity of the circumstances in which people infer causal relationships. Binary variables are over-represented in the laboratory relative to their prevalence in the world—in many situations, people infer causal relationships from continuous variables, unfolding in continuous time. Part of the goal of the theory-based approach to causal induction was to be able to define models that engage with this complexity.

One natural extension beyond binary variables is to consider continuous causes: causes that vary continuously along a single dimension, rather than being either present or absent. Marsh and Ahn (2009) ran a series of experiments in which they explored how people reasoned about situations that involved continuous causes, but argued that these continuous quantities were discretized into categories before evaluating causal relationships. Griffiths and Pacer (2011) showed that the results of Marsh and Ahn could be captured by a simple Bayesian model in which the continuous causes were assumed to be converted into a probabilistic strength (the analogue of w1 in Equation 1). Furthermore, this model provided better predictions of people's inferences about causal relationships involving continuous causes in a novel experiment than several other standard statistical models for hypothesis testing.

Lu, Rojas, Beckers, and Yuille (2016) considered the converse setting, where binary causes influence a continuous effect. In this context there are many different possible functional forms, and Lu et al. considered models in which causes combine linearly, undergo a "noisy-MAX" operation in which their values are mixed together in a way that favors the largest, or participate in a noisy-OR. They found that the particular integration rule that people tended to favor depended on the structure of the problem—essentially, whether people had received evidence that the effect was binary or continuous. This pattern could be accounted for by a hierarchical Bayesian model in which functional form was inferred from past experience. Further details of this and related work appear in Boddez, De Houwer, and Beckers (Chapter 4 in this volume). (p. 124)
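A small sketch (our illustration, with invented magnitudes and noise levels) of the three integration rules compared by Lu et al. (2016) for binary causes acting on a continuous effect:

```python
import random

# w1, w2: the effect magnitude each cause would produce on its own (illustrative).

def linear(c1, c2, w1=4.0, w2=3.0):
    # Linear: the contributions of the present causes simply add.
    return c1 * w1 + c2 * w2

def noisy_max(c1, c2, w1=4.0, w2=3.0, noise=0.5):
    # Noisy-MAX: each cause's noisy contribution competes; the largest wins.
    contributions = [c1 * w1 + random.gauss(0, noise),
                     c2 * w2 + random.gauss(0, noise)]
    return max(contributions)

def noisy_or(c1, c2, w1=0.8, w2=0.6, w0=0.1):
    # Noisy-OR: appropriate if the effect is really binary under the hood.
    return 1 - (1 - w0) * (1 - w1) ** c1 * (1 - w2) ** c2

print(linear(1, 1), noisy_max(1, 1), noisy_or(1, 1))
```

With both causes present, the linear rule predicts an effect magnitude near 7, the noisy-MAX predicts one near 4 (the stronger cause dominates), and the noisy-OR predicts a probability rather than a magnitude, which is why evidence about whether the effect is binary or continuous is diagnostic of the functional form.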

Learning causal relationships from data that unfold in continuous time poses a different kind of challenge: without discrete trials on which events occur, it is hard to apply traditional models of human learning. As a consequence, papers that have looked at events that evolve over time have tended to try to discretize those events (e.g., Greville & Buehner, 2010). Griffiths and Tenenbaum (2009) indicated how the theory-based causal induction framework could be extended to capture continuous-time events, and used this approach to model causal learning from the rates of events (e.g., Griffiths & Tenenbaum, 2005) and simple physical interactions. Pacer and Griffiths (2012, 2015) developed this formalism into a complete continuous-time analog of causal graphical models based on Poisson processes, and showed that this approach could account for results from several recent studies that have explored this richer form of causal learning (e.g., Lagnado & Speekenbrink, 2010). The topic of time is explored in more detail by Buehner (Chapter 28 in this volume).

Continuous variables and continuous time bring causal induction closer to intuitive physics: reasoning about the properties of physical objects and their interactions. Michotte (1963) conducted a classic series of experiments exploring the circumstances under which people would attribute causality to collisions: when people would view one object as causing another to move. Subsequent work on the perception of collisions argued that people's judgments seemed to be guided by simple heuristics, rather than any real intuitive understanding of Newton's laws. Sanborn, Mansinghka, and Griffiths (2009, 2013) showed that taking a Bayesian approach to intuitive physics—in which uncertainty in the location and velocity of objects is appropriately modeled, but objects are assumed to move according to Newtonian dynamics—could account for many of the phenomena taken as providing evidence for heuristic-based reasoning. In addition, this approach could be used to model the circumstances under which people attribute causality to collisions. Subsequently, this "noisy Newton" approach has been used to model a variety of other phenomena at the intersection of physics and causal reasoning (e.g., Gerstenberg, Goodman, Lagnado, & Tenenbaum, 2012). Exploring this territory further will deepen our understanding of the physical roots of causality and may provide insights that will propagate to other aspects of causal induction (see Wolff & Thorstad, Chapter 9 in this volume; Gerstenberg & Tenenbaum, Chapter 27 in this volume).
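To give a flavor of the continuous-time approach, here is a toy likelihood comparison (our own simplification, not the Pacer and Griffiths model) for event counts observed with and without a candidate cause active. A homogeneous Poisson process with a fixed rate stands in for the richer process models described above, and all rates and counts are invented.

```python
import math

def poisson_log_prob(n_events, rate, duration):
    """Log-probability of n events in a window under a homogeneous Poisson process."""
    expected = rate * duration
    return n_events * math.log(expected) - expected - math.lgamma(n_events + 1)

# invented data: 3 events in 10 s at baseline, 9 events in 10 s with the cause on
h0 = poisson_log_prob(3, 0.3, 10) + poisson_log_prob(9, 0.3, 10)  # one shared rate
h1 = poisson_log_prob(3, 0.3, 10) + poisson_log_prob(9, 0.9, 10)  # rate rises with cause

print(h1 - h0)   # a positive log-likelihood ratio favors the causal hypothesis
```

Rather than asking how often the effect co-occurs with the cause on discrete trials, the learner asks whether the rate of events changes while the cause is active: the continuous-time analogue of contingency.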

The Development of Prior Knowledge About Causal Relationships

The results summarized in the previous sections have charted the state of prior knowledge about causal relationships that guides adults when they perform causal induction. But how is that adult state reached, and how do children differ from adults in their assumptions about causal relationships? The hierarchical Bayesian models used to explain learning of ontologies, plausible relations, and functional form provide a possible answer to the first question—through exposure to causal relationships in the world, people gradually form expectations at an abstract level that aid in future learning. This should lead us to expect that the prior knowledge of children might be less tightly coupled to that world.

Prior knowledge helps learners when it accurately characterizes the inductive problem that they face. But it can also impede learning when this is not the case. Having strong expectations that are simply incorrect can make it harder to reach the right conclusion. This leads to an interesting prediction: if adults have internalized certain properties of causal relationships, such as the idea that causes independently influence their effects, then they will be slower to learn in situations where those properties do not hold. Moreover, if children have not yet internalized those properties, then children might be capable of learning faster than adults.

Lucas, Bridgers, Griffiths, and Gopnik (2014) tested this hypothesis, using a variant of the task developed by Lucas and Griffiths (2010) with both adults and children. Consistent with the previous results of Lucas and Griffiths, they found that adults (p. 125) could learn to distinguish between disjunctive and conjunctive causal relationships. They also found that children could learn to make the same distinction. But, critically, children identified conjunctive causal relationships from limited data more reliably than adults—they did not have the bias toward disjunctive causal relationships that adults displayed.

These results support the idea that prior knowledge about causal relationships is learned through experience, with that experience gradually bringing our expectations into alignment with the world we face. However, functional form is just one part of this prior knowledge. Understanding how richer forms of prior knowledge develop is an important challenge for future research, helping to complete the picture of how people become so effective at causal learning. Further discussion of the development of this knowledge is provided by Muentener and Bonawitz (Chapter 33 in this volume).



Conclusions

Prior knowledge about ontologies, plausible relations, and functional form informs the inferences that people make about causal relationships. Each of these kinds of knowledge helps to constrain the hypotheses that people might entertain, and reduces the amount of data that human learners need in order to pick out the relationships in the world around them. Incorporating this prior knowledge into automated systems for identifying causal relationships has the potential to make those automated systems more efficient, and to bring them into closer alignment with human cognition. In particular, assuming that causes independently influence their effects and that causes are strong seems to be key to capturing human causal learning, and is at odds with standard methods in both Bayesian (e.g., Friedman & Koller, 2000) and constraint-based (e.g., Pearl, 2000; Spirtes, Glymour, & Scheines, 2001) machine-learning approaches. Causality is a richer concept than a lack of independence, or even sensitivity to a particular pattern of intervention: when people say that one thing causes another, they implicitly assume the relationship to be independent and strong. By capturing these implicit assumptions, we can come closer to a formal definition of causality that aligns with human intuition.

Author Note

The writing of this chapter was supported in part by a grant from the Air Force Office of Scientific Research (grant number FA-9550-13-1-0170).

References

Canini, K. R., Griffiths, T. L., Vanpaemel, W., & Kalish, M. L. (2014). Revealing human inductive biases for category learning by simulating cultural transmission. Psychonomic Bulletin & Review, 21, 785–793.

Carey, S. (1985). Conceptual change in childhood. Cambridge, MA: MIT Press.

Carey, S. (2009). The origin of concepts. Oxford: Oxford University Press.

Carroll, C. D., & Cheng, P. W. (2010). The induction of hidden causes: Causal mediation and violations of independent causal influence. In Proceedings of the 32nd annual conference of the Cognitive Science Society (pp. 913–918). Austin, TX: Cognitive Science Society.

Catena, A., Maldonado, A., Perales, J. C., & Cándido, A. (2008). Interaction between previous beliefs and cue predictive value in covariation-based causal induction. Acta Psychologica, 128, 339–349.

Cheng, P. (1997). From covariation to causation: A causal power theory. Psychological Review, 104, 367–405.


Friedman, N., & Koller, D. (2000). Being Bayesian about network structure. In Proceedings of the 16th annual conference on uncertainty in AI (pp. 201–210). Stanford, CA: Morgan Kaufmann.

Gentner, D. (1983). Structure-mapping: A theoretical framework for analogy. Cognitive Science, 7, 155–170.

Gerstenberg, T., Goodman, N., Lagnado, D. A., & Tenenbaum, J. B. (2012). Noisy Newtons: Unifying process and dependency accounts of causal attribution. In Proceedings of the 34th annual conference of the Cognitive Science Society (pp. 378–383). Austin, TX: Cognitive Science Society.

Gick, M. L., & Holyoak, K. J. (1980). Analogical problem solving. Cognitive Psychology, 12, 306–355.

Goodman, N. D., Mansinghka, V. K., & Tenenbaum, J. B. (2007). Learning grounded causal models. In Proceedings of the 29th annual conference of the Cognitive Science Society (pp. 305–310). Austin, TX: Cognitive Science Society.

Goodman, N. D., Ullman, T. D., & Tenenbaum, J. B. (2011). Learning a theory of causality. Psychological Review, 118, 110–119.

Gopnik, A., & Meltzoff, A. N. (1997). Words, thoughts, and theories. Cambridge, MA: MIT Press.

Greville, W. J., & Buehner, M. J. (2010). Temporal predictability facilitates causal learning. Journal of Experimental Psychology: General, 139(4), 756–771.

Griffiths, T. L. (2015). Revealing ontological commitments by magic. Cognition, 136, 43–48.

Griffiths, T. L., Christian, B. R., & Kalish, M. L. (2006). Revealing priors on category structures through iterated learning. In Proceedings of the 28th annual conference of the Cognitive Science Society. Mahwah, NJ: Lawrence Erlbaum Associates.

Griffiths, T. L., & Pacer, M. (2011). A rational model of causal inference with continuous causes. In Advances in neural information processing systems (pp. 2384–2392). (p. 126)

Griffiths, T. L., Sobel, D. M., Tenenbaum, J. B., & Gopnik, A. (2011). Bayes and blickets: Effects of knowledge on causal induction in children and adults. Cognitive Science, 35, 1407–1455.

Griffiths, T. L., & Tenenbaum, J. B. (2005). Structure and strength in causal induction. Cognitive Psychology, 51, 354–384.

Griffiths, T. L., & Tenenbaum, J. B. (2007). From mere coincidences to meaningful discoveries. Cognition, 103, 180–226.


Griffiths, T. L., & Tenenbaum, J. B. (2009). Theory-based causal induction. Psychological Review, 116, 661–716.

Hagmayer, Y., & Waldmann, M. R. (2007). Inferences about unobserved causes in human contingency learning. Quarterly Journal of Experimental Psychology, 60, 330–355.

Kalish, M. L., Griffiths, T. L., & Lewandowsky, S. (2007). Iterated learning: Intergenerational knowledge transmission reveals inductive biases. Psychonomic Bulletin & Review, 14, 288–294.

Karmiloff-Smith, A. (1988). The child is a theoretician, not an inductivist. Mind and Language, 3, 183–195.

Keil, F. C. (1989). Concepts, kinds, and cognitive development. Cambridge, MA: MIT Press.

Kemp, C., Tenenbaum, J. B., Niyogi, S., & Griffiths, T. L. (2010). A probabilistic model of theory formation. Cognition, 114, 165–196.

Lagnado, D., & Sloman, S. A. (2004). The advantage of timely intervention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 856–876.

Lagnado, D. A., & Speekenbrink, M. (2010). The influence of delays in real-time causal learning. The Open Psychology Journal, 3(2), 184–195.

Lewandowsky, S., Griffiths, T. L., & Kalish, M. L. (2009). The wisdom of individuals: Exploring people's knowledge about everyday events using iterated learning. Cognitive Science, 33, 969–998.

Lien, Y., & Cheng, P. W. (2000). Distinguishing genuine from spurious causes: A coherence hypothesis. Cognitive Psychology, 40, 87–137.

Liljeholm, M., & Cheng, P. W. (2007). When is a cause the "same"? Coherent generalization across contexts. Psychological Science, 18, 1014–1021.

Lu, H., Rojas, R. R., Beckers, T., & Yuille, A. L. (2016). A Bayesian theory of sequential causal learning and abstract transfer. Cognitive Science, 40, 404–439.

Lu, H., Yuille, A., Liljeholm, M., Cheng, P. W., & Holyoak, K. J. (2006). Modeling causal learning using Bayesian generic priors on generative and preventive powers. In R. Sun & N. Miyake (Eds.), Proceedings of the 28th annual conference of the Cognitive Science Society (pp. 519–524). Mahwah, NJ: Lawrence Erlbaum Associates.

Lu, H., Yuille, A., Liljeholm, M., Cheng, P. W., & Holyoak, K. J. (2007). Bayesian models of judgments of causal strength: A comparison. In D. S. McNamara & G. Trafton (Eds.), Proceedings of the 29th annual conference of the Cognitive Science Society (pp. 1241–1246). Mahwah, NJ: Lawrence Erlbaum Associates.


Lu, H., Yuille, A. L., Liljeholm, M., Cheng, P. W., & Holyoak, K. J. (2008). Bayesian generic priors for causal learning. Psychological Review, 115(4), 955–984.

Lucas, C. G., Bridgers, S., Griffiths, T. L., & Gopnik, A. (2014). When children are better (or at least more open-minded) learners than adults: Developmental differences in learning the forms of causal relationships. Cognition, 131, 284–299.

Lucas, C. G., & Griffiths, T. L. (2010). Learning the form of causal relationships using hierarchical Bayesian models. Cognitive Science, 34, 113–147.

Marsh, J. K., & Ahn, W.-k. (2009). Spontaneous assimilation of continuous values and temporal information in causal induction. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35, 334–352.

Mayrhofer, R., & Waldmann, M. R. (2015). Sufficiency and necessity assumptions in causal structure induction. Cognitive Science, 1–14.

Michotte, A. (1963). The perception of causality. New York: Basic Books.

Murphy, G. L., & Medin, D. L. (1985). The role of theories in conceptual coherence. Psychological Review, 92, 289–316.

Nowicki, K., & Snijders, T. A. B. (2001). Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, 96, 1077–1087.

Pacer, M., & Griffiths, T. L. (2012). Elements of a rational framework for continuous-time causal induction. In Proceedings of the 34th annual conference of the Cognitive Science Society (pp. 833–838). Austin, TX: Cognitive Science Society.

Pacer, M. D., & Griffiths, T. L. (2015). Upsetting the contingency table: Causal induction over sequences of point events. In Proceedings of the 37th annual conference of the Cognitive Science Society (pp. 1805–1810). Austin, TX: Cognitive Science Society.

Pearl, J. (1988). Probabilistic reasoning in intelligent systems. San Francisco: Morgan Kaufmann.

Pearl, J. (2000). Causality: Models, reasoning and inference. Cambridge, UK: Cambridge University Press.

Perales, J. C., Catena, A., Maldonado, A., & Cándido, A. (2007). The role of mechanism and covariation information in causal belief updating. Cognition, 105, 704–714.

Sanborn, A. N., Mansinghka, V. K., & Griffiths, T. L. (2009). A Bayesian framework for modeling intuitive dynamics. In Proceedings of the 31st annual conference of the Cognitive Science Society (pp. 1145–1150). Austin, TX: Cognitive Science Society.

Sanborn, A. N., Mansinghka, V. K., & Griffiths, T. L. (2013). Reconciling intuitive physics and Newtonian mechanics for colliding objects. Psychological Review, 120, 411–437.

Spirtes, P., Glymour, C., & Scheines, R. (2001). Causation, prediction, and search (2nd ed.). Cambridge, MA: MIT Press.

Waldmann, M. R. (1996). Knowledge-based causal induction. In The psychology of learning and motivation (Vol. 34, pp. 47–88). San Diego: Academic Press.

Waldmann, M. R. (2007). Combining versus analyzing multiple causes: How domain assumptions and task context affect integration rules. Cognitive Science, 31, 233–256.

Waldmann, M. R., & Hagmayer, Y. (2006). Categories and causality: The neglected direction. Cognitive Psychology, 53, 27–58.

Wang, Y. J., & Wong, G. Y. (1987). Stochastic blockmodels for directed graphs. Journal of the American Statistical Association, 82, 8–19.

Yeung, S., & Griffiths, T. L. (2015). Identifying expectations about the strength of causal relationships. Cognitive Psychology, 76, 1–29.

Notes:

(1.) Another way in which causal mechanisms can be relevant is in determining the expectations people have about the form that a causal relationship will take—how the cause is expected to influence the effect. I return to this kind of prior knowledge in the next section. See also Johnson and Ahn (Chapter 8 in this volume) for a more detailed discussion of mechanisms.

Thomas L. Griffiths

Department of Psychology, University of California, Berkeley, Berkeley, California, USA


Causal Mechanisms

Causal Mechanisms   Samuel G. B. Johnson and Woo-kyoung Ahn The Oxford Handbook of Causal Reasoning Edited by Michael R. Waldmann Print Publication Date: Jun 2017 Subject: Psychology, Cognitive Psychology Online Publication Date: May 2017 DOI: 10.1093/oxfordhb/9780199399550.013.12

Abstract and Keywords

This chapter reviews empirical and theoretical results concerning knowledge of causal mechanisms—beliefs about how and why events are causally linked. First, it reviews the effects of mechanism knowledge, showing that mechanism knowledge can override other cues to causality (including covariation evidence and temporal cues) and structural constraints (the Markov condition), and that mechanisms play a key role in various forms of inductive inference. Second, it examines several theories of how mechanisms are mentally represented—as associations, forces or powers, icons, abstract placeholders, networks, or schemas—and the empirical evidence bearing on each theory. Finally, it describes ways that people acquire mechanism knowledge, discussing the contributions from statistical induction, testimony, reasoning, and perception. For each of these topics, it highlights key open questions for future research.

Keywords: causal mechanisms, causal learning, Markov condition, induction, perception

Introduction

Our causal knowledge not only includes beliefs about which events are caused by other events, but also an understanding of how and why those events are related. For instance, when a soprano hits an extremely high note, the sound can break a wine glass due to the high frequency of the sound waves. Although people may not know the detailed mechanisms underlying this relationship (Rozenblit & Keil, 2002), people believe that some mechanism transmits a force from the cause to the effect (White, 1989). Likewise, people believe in causal mechanisms underlying interpersonal relations (see Hilton, Chapter 32 in this volume). When Romeo calls to the balcony, Juliet comes, and she does so because of her love. When Claudius murders the king, Hamlet seeks revenge, because Hamlet is filled with rage. We use mechanisms to reason about topics as grand as science (Koslowski, 1996) and morality (Cushman, 2008; see Lagnado & Gerstenberg, Chapter 29 in this volume); and domains as diverse as collision events (Gerstenberg & Tenenbaum, Chapter 27 in this volume; White, Chapter 14 in this volume) and psychopathology (Ahn, Kim, & Lebowitz, Chapter 30 in this volume). Causal mechanisms pervade our cognition through and through.

Indeed, when a person tries to determine the cause of an event, understanding the underlying causal mechanism appears to be the primary concern. For instance, when attempting to identify the cause of "John had an accident on Route 7 yesterday," participants in Ahn, Kalish, Medin, and Gelman (1995) usually asked questions aimed at testing possible mechanisms (e.g., "Was John drunk?" or "Was there a mechanical problem with the car?") rather than which factor was responsible for the effect (e.g., "Was there something special about John?" or "Did other people also have a traffic accident last night?").

In this chapter, we describe the state of current research on mechanism knowledge. After defining (p. 128) terms, we review the effects of mechanism knowledge. We summarize studies showing (1) that mechanism knowledge can override other important cues to causality, and (2) that mechanism knowledge is critical for inductive inference. Next, we examine how mechanisms might be mentally represented, and summarize the empirical evidence bearing on each of several approaches. We then turn to how mechanisms are learned, parsing the contributions from statistical induction, testimony, reasoning, and perception. For each of these broad topics, we discuss potential avenues of future research.

What Is a Causal Mechanism?

A causal mechanism is generally defined as a (1) system of physical parts or abstract variables that (2) causally interact in systematically predictable ways so that their operation can be generalized to new situations (e.g., Glennan, 1996; Machamer, Darden, & Craver, 2000). We use the term mechanism knowledge to refer to a mental representation of such a system.

Mechanism knowledge is critical in cognition because we use it to understand other causal relations (Ahn & Kalish, 2000). Thus, we are motivated to seek out the mechanisms that underlie a causal relationship. The mechanism underlying the relation "X caused Y" (e.g., a soprano's singing caused a wine glass to break) will involve constructs other than X and Y (e.g., high frequency of the voice), but which can connect those events together. For this reason, mechanisms have a close relationship to explanations (Lombrozo, 2010; Lombrozo & Vasilyeva, Chapter 22 in this volume). For instance, the causal relation "Mary was talking on her cell phone and crashed into a truck" can be explained through its underlying mechanism, "Mary was distracted and didn't see the red light." However, because causal knowledge is organized hierarchically (Johnson & Keil, 2014; Simon, 1996), this entire causal system could be embedded into a larger system such that more specific events might act as mechanisms underlying more general events. That is, "Mary was talking on her cell phone and crashed into a truck" might be a mechanism underlying "Mary's driving caused a traffic accident," which in turn might be a mechanism underlying "Mary caused delays on I-95," and so on. Thus, mechanism knowledge is not merely a belief about what caused some event, but a belief about how or why that event was brought about by its cause, which can itself be explained in terms of another underlying mechanism, ad infinitum. Although we adopt this understanding of mechanism as a working definition, other factors, such as the organization of memory, appear to play a role in how mechanism knowledge is used and in what counts as a mechanism (Johnson & Ahn, 2015). We discuss some of these factors later in this chapter (see the section "Representing Causal Mechanisms").

The term "mechanism" has also been used in several other ways in the literature, which are somewhat different from our use. First, the term "mechanistic explanation" is used to refer to backward-looking explanations (e.g., the knife is sharp because Mark filed it), as opposed to forward-looking, teleological explanations (the knife is sharp because it is for cutting; Lombrozo, 2010). However, this distinction does not map onto our sense of mechanism, because teleological explanations can often be recast in mechanistic terms, in terms of causally interacting variables (e.g., the knife is sharp because human agents wanted to fashion a sharp object, and forging a sharp piece of metal was the best way to accomplish this goal; Lombrozo & Carey, 2006).

Second, some have argued that our knowledge of mechanisms underlying two causally related events, say A and B, includes not only the belief that there is a system of causally related variables mediating the relationship between A and B (a "mechanism," as defined in the current chapter), but also an assumption that a force or causal power is transmitted from A to B (Ahn & Kalish, 2000; White, 1989). This is an independent issue because knowledge about a system of causally interconnected parts does not have to involve the notion of causal power or force. In fact, many of the studies reviewed in this chapter demonstrating effects of mechanism knowledge did not test whether the assumptions of causal force are required to obtain such effects. In this chapter, we separate these two issues when defining mechanism knowledge. Thus, our discussion of the effects of mechanism knowledge does not take a position on the debate concerning causal force, and our discussions of how people represent and learn mechanisms do not beg the question against statistical theories.

Using Mechanism Knowledge

A major purpose of high-level cognition is inductive inference—predicting the unknown from the known. Here, we argue that mechanism knowledge plays a critical role in (p. 129) people's inductive capacities. We describe studies on how mechanism knowledge is used in a variety of inductive tasks, including causal inference, category formation, category-based induction, and probability judgment.

Mechanisms and Causal Inference

David Hume (1748/1977) identified two cues as critical to identifying causal relationships—covariation (the cause and effect occurring on the same occasions more often than would be expected by chance) and temporal contiguity (the cause and effect occurring close together in time). Both of these factors have received considerable empirical attention in recent years, and it has become increasingly clear that neither of these cues acts alone, but rather in conjunction with prior knowledge of causal mechanisms. In this section, we first describe how mechanism knowledge influences the interpretation of covariation information. We then describe how mechanism knowledge can result in violations of the causal Markov condition, a key assumption of modern Bayesian approaches to causal inference. Finally, we review evidence that even the seemingly straightforward cue of temporal contiguity is influenced in a top-down manner by mechanism knowledge.

Covariation

Scientists must test their hypotheses using statistical inference. To know whether a medical treatment really works, or a genetic mutation really has a certain effect, or a psychological principle really applies, one must test whether the cause and effect are statistically associated. This observation leads to the plausible conjecture that laypeople's everyday causal reasoning also depends on an ability to test for covariation between cause and effect.

But consider the following (real) research finding from medical science (Focht, Spicer, & Fairchok, 2002): placing duct tape over a wart made it disappear in 85% of the cases (compared to 60% of cases receiving more traditional cryotherapy). Despite the study's experimental manipulation and statistically significant effect, people may still be doubtful that duct tape can remove warts because they cannot think of a plausible mechanism underlying the causal relationship. In fact, the researchers supplied a mechanism: the duct tape irritates the skin, which in turn stimulates an immune system response, which in turn wipes out the viral infection that had caused the wart in the first place. Given this mechanism information, people would be far likelier to believe this causal link. Thus, even statistically compelling covariation obtained through experimental manipulation may not be taken as evidence for a causal link in the absence of a plausible underlying mechanism.

However, in this example, it could be that the mechanism is supplying "covert" covariation information—for example, the mechanism implies covariation between duct tape and irritation, irritation and immune response, and immune response and wart recovery, and could have thereby conveyed stronger covariation between duct tape and wart recovery. In that case, one might argue that there is nothing special about mechanism information other than conveying covariation. To empirically demonstrate that mechanism information bolsters causal inferences above and beyond the covariation implied by the mechanism, Ahn et al. (1995, Experiment 4) asked a group of participants to rate the strength of the covariation implied by sentences like "John does not know how to drive" for "John had a traffic accident." They then asked a new group of participants to make causal attributions for the effect (e.g., the accident), given either the mechanism (e.g., John does not know how to drive) or its equivalent covariation (e.g., John is much more likely to have a traffic accident than other people are), as rated by the first group of participants. Participants were much more inclined to attribute the accident to the target cause when given the underlying mechanism, showing that mechanism information has an effect that goes beyond covariation.

More generally, the interpretation of covariation data is strongly influenced by mechanism knowledge. For example, learning about a covariation between a cause and effect has a stronger effect on the judged probability of a causal relationship when there is a plausible mechanism underlying the cause and effect (e.g., severed brake lines and a car accident) than when there is not (e.g., a flat tire and a car failing to start; Fugelsang & Thompson, 2000). Similarly, both scientists and laypeople are more likely to discount data inconsistent with an existing causal theory, relative to data consistent with the theory (Fugelsang, Stein, Green, & Dunbar, 2004). Finally, people are more likely to condition on a potential alternative cause when interpreting trial-by-trial contingency data if they are told about the mechanism by which the alternative cause operates (Spellman, Price, & Logan, 2001). These effects show that not only does mechanism information do something beyond covariation, but that it even constrains the way that covariation is used.
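As a concrete illustration of the covariation cue on its own, the standard contingency measure ΔP is simply the difference between the probability of the effect given the candidate cause and given its comparison condition (here, cryotherapy rather than no treatment). The sketch below (ours) computes it for the reported cure rates; the point of the studies above is that a reasoner may discount even a sizable ΔP like this one when no plausible mechanism comes to mind.

```python
def delta_p(p_effect_given_cause, p_effect_given_comparison):
    """Contingency between a candidate cause and an effect."""
    return p_effect_given_cause - p_effect_given_comparison

# cure rates reported by Focht et al. (2002): duct tape 85%, cryotherapy 60%
print(delta_p(0.85, 0.60))   # 0.25 in favor of duct tape
```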

Structural Constraints

Patterns of covariation between variables can be combined into larger patterns of causal dependency, represented as Bayesian networks (Pearl, 2000; Rottman & Hastie, 2014; Rottman, Chapter 6 in this volume). For example, if a (p. 130) covariation is known to exist between smoking cigarettes (A) and impairment of lung function (B), and another is known to exist between smoking cigarettes (A) and financial burden (C), this can be represented as a causal network with an arrow from A to B and an arrow from A to C (a common cause structure). But of course, all of these events also have causes and consequences—social pressure causes cigarette smoking, impairment of lung function causes less frequent exercise, financial burden causes marital stress, and so on, ad infinitum. If we had to take into account all of these variables to make predictions about any of them (say, B), then we would never be able to use causal knowledge to do anything. The world is replete with too much information for cognition without constraints.

The key computational constraint posited by Bayesian network theories of causation is the causal Markov condition (also known as "screening off"; Pearl, 2000; Spirtes, Glymour, & Scheines, 1993). This assumption allows the reasoner to ignore the vast majority of potential variables—to assume that the probability distribution of a given variable is independent of all other variables except its direct effects, conditional on its causes. For example, the Markov condition tells us, given the causal structure described previously for smoking, that if we know that Lisa smokes (A), knowing about her lung function (B) doesn't tell us anything about her potential financial burden (C), and vice versa. Because the Markov condition is what allows reasoners to ignore irrelevant variables (here, we can predict B without knowing about C or any of the causes of A), it is crucial for inference on Bayesian networks.

Alas, people often violate the Markov condition. Although there appear to be a number of factors at play in these violations, including essentialist (Rehder & Burnett, 2005) and associationist (Rehder, 2014) thinking, one critical factor is mechanism knowledge (Park & Sloman, 2013, 2014). In common cause structures such as the preceding smoking example (smoking leading to lung impairment and financial burden), where each causal link relies on a different mechanism, people do tend to obey the Markov condition. That is, when asked to judge the probability of lung impairment given that a person smokes, this judgment is the same as when asked to judge the probability of lung impairment given that a person smokes and has a financial burden. But when the links rely on the same mechanism (e.g., smoking leading to lung impairment and to blood vessel damage), people robustly violate the Markov condition. When asked to judge the probability of lung impairment given that a person smokes, this judgment is lower than when asked to judge the probability of lung impairment given that a person smokes and has blood vessel damage.

This effect is thought to occur because participants use mechanism information to elaborate on the causal structure, interpolating the underlying mechanism into the causal graph (Park & Sloman, 2013). So, when the link between A and B depends on a different mechanism than the link between A and C, the resulting structure would involve two branches emanating from A, namely A→M1→B and A→M2→C. In Lisa's case, cellular damage might be the mechanism mediating smoking and lung impairment, but cigarette expenditures would be the mechanism mediating smoking and financial burden. Thus, knowing about C (Lisa's financial burden) triggers an inference about M2 (cigarette expenditures), but this knowledge has no effect on B (lung impairment) given that A (smoking) is known—the Markov condition is respected. But when the link between A and B depends on the same mechanism as the link between A and C, the resulting structure would be a link from A to M1, and then from M1 to B and to C—so, in effect, the mechanism M1 is the common cause, rather than A. That is, cellular damage might be the mechanism mediating the relationship between smoking and lung impairment and the relationship between smoking and blood vessel damage. Thus, knowing about C (blood vessel damage) triggers an inference about M1 (cellular damage), and this knowledge has an effect on B (lung impairment) even if A (smoking) is known. Mechanism knowledge therefore not only affects the interpretation of covariation information, but also the very computational principles used to make inferences over systems of variables.
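The two structures just described can be checked numerically. In the sketch below (our construction, with invented conditional probabilities), B and C are screened off by A when they depend on A through separate paths, but conditioning on C shifts the probability of B once a shared mediating mechanism M is interpolated between A and its effects.

```python
def bernoulli(p, x):
    """P(X = x) for a binary variable with P(X = 1) = p."""
    return p if x else 1 - p

# Structure 1: A -> B and A -> C via separate mechanisms.
def joint_separate(a, b, c):
    return bernoulli(0.5, a) * bernoulli(0.8 if a else 0.1, b) * \
           bernoulli(0.7 if a else 0.2, c)

# Structure 2: A -> M, then M -> B and M -> C; the shared mechanism M is
# unobserved, so we marginalize over it.
def joint_shared(a, b, c):
    return sum(bernoulli(0.5, a) * bernoulli(0.9 if a else 0.1, m) *
               bernoulli(0.8 if m else 0.1, b) * bernoulli(0.8 if m else 0.1, c)
               for m in (0, 1))

def p_b(joint, a, c=None):
    """P(B = 1 | A = a) or, if c is given, P(B = 1 | A = a, C = c)."""
    cs = (0, 1) if c is None else (c,)
    num = sum(joint(a, 1, cc) for cc in cs)
    den = sum(joint(a, b, cc) for b in (0, 1) for cc in cs)
    return num / den

print(p_b(joint_separate, 1), p_b(joint_separate, 1, c=1))  # equal: screened off
print(p_b(joint_shared, 1), p_b(joint_shared, 1, c=1))      # unequal: C informs M, hence B
```

On this account, people's "violations" of the Markov condition are not irrational; they are correct inferences over a richer graph in which the mechanism has been made explicit.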

Temporal Cues

According to the principle of temporal contiguity, two events are more likely to be causally connected if they occur close together in time. This idea has considerable empirical support (e.g., Lagnado & Sloman, 2006; Michotte, 1946/1963), and at least in some contexts, temporal contiguity appears to be used more readily than covariation in learning causal relations (Rottman & Keil, 2012; White, 2006). The use of temporal contiguity was long taken as a triumph for associationist theories of causal inference (Shanks, Pearson, & Dickinson, 1989), because longer temporal delays are associated (p. 131) with weaker associations in associationist learning models.

Yet, people's use of temporal cues appears to be more nuanced. People are able to associate causes and effects that are very distant in time (Einhorn & Hogarth, 1986). For example, a long temporal gap intervenes between sex and birth, between smoking and cancer, between work and paycheck, and between murder and prison. Why is it that the long temporal gaps between these events do not prevent us from noticing these causal links? A series of papers by Buehner and colleagues documented top-down influences of causal knowledge on the use of temporal contiguity (see Buehner, Chapter 28 in this volume). When participants expect a delay between cause and effect, longer delays have a markedly smaller deleterious effect on causal inference (Buehner & May, 2002, 2003), suggesting some knowledge mediation. In fact, when temporal delay is de-confounded with contingency, the effect of temporal delay can be eliminated altogether by instructions that induce the expectation of delay (Buehner & May, 2004). Most dramatically, some experiments used unseen physical causal mechanisms, which participants would believe to take a relatively short time to operate (a ball rolling down a steep ramp, hidden from view) or a long time to operate (a ball rolling down a shallow ramp). Under such circumstances, causal judgments were facilitated by longer delays between cause and effect when the mechanism was one that would take a relatively long time to operate (Buehner & McGregor, 2006). Although older (9- to 10-year-old) children can integrate such mechanism cues with temporal information, younger (4- to 8-year-old) children continued to be swayed by temporal contiguity, suggesting that the relative priority of causal cues undergoes development (Schlottmann, 1999). Thus, when people can apply a mechanism to a putative causal relationship, they adjust their expectations about temporal delay so as to fit their knowledge of that mechanism.

Mechanisms and Induction

The raison d'être for high-level cognition in general, and for causal inference in particular, is to infer the unknown from the known—to make predictions that will usefully serve the organism through inductive inference (Murphy, 2002; Rehder, Chapters 20 and 21 in this volume). In this section, we give several examples of ways that mechanism knowledge is critical to inductive inference.

Categories are a prototypical cognitive structure that exists to support inductive inference. We group together entities with similar known properties, because those entities are likely to also share similar unknown properties (Murphy, 2002). Mechanism knowledge influences which categories we use. In a study by Hagmayer, Meder, von Sydow, and Waldmann (2011), participants learned the contingency between molecules and cell death. Molecules varied in size (large or small) and color (white or gray). While large white (11) molecules always led to cell death and small gray (00) molecules never did, small white (01) and large gray (10) ones led to cell death 50% of the time. That is, 01 and 10 were equally predictive of cell death. However, prior to this contingency learning, some participants learned that molecule color was caused by a genetic mutation. Participants used this prior causal history to categorize small white molecules (01) with large white (11) molecules, which always resulted in cell death. Consequently, these participants judged that small white molecules (01) were much more likely to result in cell death than large gray molecules (10), even though they observed both probabilities to be 50%. The opposite pattern was obtained when participants learned that genetic mutation caused molecules to be large.

Critically, this effect of prior categorization on subsequent causal learning depended on the type of underlying mechanism. Note that most people would agree that genetic mutations affect deeper features of molecules, which not only modify surface features such as the color of molecules, but also can affect the likelihood of cell death. Thus, the initial category learning based on the cover story involving genetic mutations provided a mechanism, which could affect later causal judgments involving cell death. In a subsequent experiment, however, the cover story used for category learning provided an incoherent mechanism. Participants learned that the variations in color (or size) were due to atmospheric pressure, which would be viewed as affecting only the surface features. Despite identical learning situations, participants provided with mechanism information that was relevant only to surface features did not distinguish between 10 and 01 in their causal judgments; their judgments stayed close to 50%. Thus, Hagmayer et al. (2011) showed that prior learning of categorization affects subsequent causal judgments only when the categorization involves mechanisms that would be relevant to the content of the causal judgments (see also Waldmann & Hagmayer, 2006, for related results).

Mechanism knowledge also influences category-based induction, or the likelihood of ex­ tending features from one category to another (see Heit, 2000, for a review). If the mech­ anism explaining why the premise category has a property is the same as the mechanism explaining why the conclusion category might have the property, then participants tend to rate the conclusion category as very likely having that property (Sloman, 1994). For ex­ ample, participants found the following argument highly convincing: Hyundais have tariffs applied to them; therefore, Porsches have tariffs applied to them. That is, the reason that Hyundais have tariffs applied to them is because they are foreign cars, which would also explain why Porsches have tariffs applied to them. So, the premise

Page 8 of 35

Causal Mechanisms in this case strongly supports the conclusion. In contrast, one may discount the likelihood of a conclusion when the premise and conclusion rely on different mechanisms, such as: Hyundais are usually purchased by people 25 years old and younger; therefore, Porsches are usually purchased by people 25 years old and younger. In this case, the reason that Hyundais are purchased by young people (that Hyundais are inexpensive and young people do not have good credit) does not apply to Porsches (which might be purchased by young people because young people like fast cars). Because the premise introduces an alternative explanation for the property, people tend to rate the probability of the conclusion about Porsches lower when the premise about Hyundais is given, compared to when it is not given—an instance of the discounting or explainingaway effect (Kelley, 1973). These results show that mechanism knowledge can moderate the likelihood of accepting an explanation in the presence of another explanation. Ahn and Bailenson (1996) further examined the role of mechanism knowledge in the dis­ counting and conjunction effects. In the discounting effect (Kelley, 1973), people rate the probability P(B) of one explanation higher than its conditional probability given another competing explanation, P(B|A). In the conjunction effect (Tversky & Kahneman, 1983), people rate the probability of a conjunctive explanation, P(A&B), higher than its individ­ ual constituents such as P(A). The two effects may appear contradictory because the dis­ counting effect seems to imply that one explanation is better than two, whereas the con­ junction effect seems to imply that two explanations are better than one. Yet, Ahn and Bailenson (1996) showed that both phenomena turn on mechanism-based reasoning, and can occur simultaneously with identical events. For example, consider the task of explain­ ing why Kim had a traffic accident. Further suppose that a reasoner learns that Kim is nearsighted. Given this explanation, a reasoner can imagine Kim having a traffic accident due to her nearsightedness. Note that to accept this explanation, one has to imagine that Kim’s nearsightedness is severe enough to cause a traffic accident even under normal cir­ cumstances. Once such a mechanism is established, another explanation, “there was a se­ vere storm,” would be seen as less likely because Kim’s nearsightedness is already a suffi­ cient cause for a traffic accident. Thus, the second cause would be discounted. However, consider a different situation where both explanations are presented as being tentative and are to be evaluated simultaneously. Thus, one is to judge the likelihood that Kim had a traffic accident because she is nearsighted and there was a severe storm. In this case, a reasoner can portray a slightly different, yet coherent mechanism where Kim’s (some­ what) poor vision, coupled with poor visibility caused by a storm, would have led to a traf­ fic accident. Due to this coherent mechanism, the reasoner would be willing to accept the conjunctive explanation as highly likely—even as more likely than either of its conjuncts individually. That is, the discounting effect occurs because a (p. 133) reasoner settles in on a mechanism that excludes a second explanation, whereas the conjunction effect occurs because a reasoner can construct a coherent mechanism that can incorporate both expla­ nations.


In addition to demonstrating simultaneous conjunction and discounting effects, Ahn and Bailenson (1996) further showed that these effects do not occur when explanations are purely covariation-based—that is, when the explanations indicate positive covariation between a potential cause and effect without suggesting any underlying mechanism mediating their relationship. For instance, the explanations "Kim is more likely to have traffic accidents than other people are" and "traffic accidents were more likely to occur last night than on other nights" resulted in neither conjunction nor discounting effects. This pattern of results indicates that both discounting and conjunction effects are species of mechanism-based reasoning.

Open Questions

These studies demonstrate a variety of ways that mechanism knowledge pervades our inductive capacities, but mechanism knowledge could affect induction in yet other ways. Beyond covariation, structural constraints, and temporal cues, might other cues to causality be affected by the nature of the underlying mechanisms? For instance, might the results of interventions be interpreted differently given different mechanisms? Might mechanism knowledge modulate the relative importance of these various cues to causality?

There are also open questions about how mechanisms are used in induction. Given the tight link between mechanisms and explanation, what role might mechanisms play in inference to the best explanation, or abductive inference (Lipton, 2004; Lombrozo, 2012)? To what extent do different sorts of inductive problems (Kemp & Jern, 2013) lend themselves more to mechanism-based versus probability-based causal reasoning (see also Lombrozo, 2010)? Are there individual differences in the use of mechanisms? For instance, given that mechanisms underlie surface events, could people who are more intolerant of ambiguity or more in need of cognitive closure be more motivated to seek them out? Could people who are high in creativity be more capable of generating them, and more affected by them as a result? Finally, although we could in principle keep on asking "why" questions perpetually, we eventually settle for a given level of detail as adequate. What determines this optimal level of mechanistic explanation?

Representing Causal Mechanisms

In the previous section, we described several of the cognitive processes that use mechanism knowledge. Here, we ask how mechanism knowledge is mentally represented (Markman, 1999). That is, what information do we store about mechanisms, and how do different mechanisms relate to one another in memory? We consider six possible representational formats—associations, forces or powers, icons, abstract placeholders, networks, and schemas.



Associations

According to associationist theories of causality, learning about causal relationships is equivalent to learning associations between causes and effects, using domain-general learning mechanisms that are evolutionarily ancient and used in other areas of cognition (Shanks, 1987; Le Pelley, Griffiths, & Beesley, Chapter 2 in this volume). Thus, causal relations (including mechanism knowledge) would be represented as an association between two classes of events, akin to the stored result of a statistical significance test, so that one event would lead to the expectation of the other. This view is theoretically economical, in that associative learning is well established and well understood in other domains and in animal models. Further, associative learning can explain many effects in trial-by-trial causal learning experiments, including effects of contingency (Shanks, 1987) and delay (Shanks, Pearson, & Dickinson, 1989).

However, hard times have fallen on purely associative theories of causation. Because these theories generally do not distinguish between the roles of cause and effect, they have difficulty accounting for asymmetries in predictive and diagnostic causal learning (Waldmann, 2000; Waldmann & Holyoak, 1992). Further, these theories predict a monotonic decline in associative strength with a delay between cause and effect, yet this decline can be eliminated or even reversed with appropriate mechanism knowledge (Buehner & May, 2004; Buehner & McGregor, 2006). Although associative processes are likely to play some role in causal reasoning and learning (e.g., Rehder, 2014), causal learning appears to go beyond mere association.

There are also problems with associations as representations of mechanism knowledge. One straightforward way of representing mechanism knowledge using associations is to represent causal relations among sub-parts or intermediate steps between cause and effect using associations. Thus, the association between cause and effect would consist of associations between the cause and the first intermediate step, the first intermediate step and the second intermediate step, and so on, while the overall association between cause and effect remains the same. This approach to mechanisms may be able to account for some effects of mechanism knowledge described earlier. For example, to account for why people believe more strongly in a causal link given a plausible mechanism for observed covariation (Fugelsang & Thompson, 2000), an advocate of associationism can argue that the mechanism conveys additional associative strength.

However, other effects of mechanism knowledge described earlier seem more challenging to the associationist approach. Ahn et al. (1995; Experiment 4) equated the covariation or association conveyed by the mechanism statements and the covariation statements, but participants nonetheless gave stronger causal attributions given the mechanism statements than the covariation statements. Likewise, it is unclear on the associationist approach why conjunction and discounting effects are not obtained given purely covariational statements (Ahn & Bailenson, 1996), or why mechanism knowledge influences which categories we induce, given identical learning data (Hagmayer et al., 2011).
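
The kind of trial-by-trial updating that associationist theories assume can be illustrated with a minimal Rescorla-Wagner-style rule. This is a generic textbook sketch rather than code from any study cited here, and the learning rate and trial sequence are invented.

```python
# Minimal Rescorla-Wagner-style sketch: v tracks the cause-effect association
# and is nudged by the prediction error on each trial where the cause occurs.
# The learning rate (alpha), asymptote (lam), and trials are all invented.
def associative_strength(trials, alpha=0.3, lam=1.0):
    v = 0.0
    history = []
    for cause_present, effect_present in trials:
        if cause_present:
            outcome = lam if effect_present else 0.0
            v += alpha * (outcome - v)  # error-driven update
        history.append(round(v, 3))
    return history

# The effect follows the cause on most trials, so v climbs toward lam.
print(associative_strength([(1, 1), (1, 1), (1, 0), (1, 1), (1, 1)]))
```

Note that nothing in the update distinguishes a cause-to-effect relation from an effect-to-cause relation, which is exactly the symmetry that the critique above targets.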



Forces and Powers

The associationist view contrasts most strongly with accounts of causal mechanisms in terms of forces (Talmy, 1988; Wolff, 2007) or powers (Harré & Madden, 1975; White, 1988, 1989). The intuition behind these approaches is that causal relations correspond to the operation of physical laws, acting on physical objects (Aristotle, 1970; Harré & Madden, 1975) or through physical processes (Dowe, 2000; Salmon, 1984; see also Danks, Chapter 12 in this volume). For example, Dowe (2000) argued that causal relations occur when a conserved quantity, such as energy, is transferred from one entity to another. This idea is broadly consistent with demonstrations that people often identify visual collision events as causal or non-causal in ways concordant with the principles of Newtonian mechanics, such as conservation of momentum (Michotte, 1946/1963). Indeed, even young children seem to be sensitive to physical factors such as transmission in their causal reasoning (Bullock, Gelman, & Baillargeon, 1982; Shultz, Fisher, Pratt, & Rulf, 1986).

The force dynamics theory (Talmy, 1988; Wolff, 2007; Wolff & Thorstad, Chapter 9 in this volume) fleshes out these intuitions by representing causal relations as combinations of physical forces, modeled as vectors. On this theory, the causal affector (the entity causing the event) and the patient (the entity operated on by the affector) are both associated with force vectors, indicating the direction of the physical or metaphorical forces in operation. For example, in a causal interaction between a fan and a toy boat, the fan would be the affector and the toy boat would be the patient, and both entities would have a vector indicating the direction of their motion. These forces, as well as any other forces in the environment, would combine to yield a resultant vector (e.g., the boat hits an orange buoy). On Wolff’s (2007) theory, the affector causes a particular end state to occur if (a) the patient initially does not have a tendency toward that end state, but (b) the affector changes the patient’s tendency, and (c) the end state is achieved. For instance, the fan caused the boat to hit the buoy because (a) the boat was not initially headed in that direction, but (b) the fan changed the boat’s course, so that (c) the boat hit the buoy. This sort of force analysis has been applied to several phenomena in causal reasoning, including semantic distinctions among causal vocabulary (cause, enable, prevent, despite; Wolff, 2007); the chaining of causal relations (e.g., A preventing B and B causing C; Barbey & Wolff, 2007); causation by omission (Wolff, Barbey, & Hausknecht, 2010); and direct versus indirect causation (Wolff, 2003).

A related physicalist approach is the causal powers theory (Harré & Madden, 1975; White, 1988, 1989). On this view, people conceptualize particulars (objects or persons) as having dispositional causal properties, which operate under the appropriate releasing conditions. These properties can be either causal powers (capacities to bring about effects) or liabilities (capacities to undergo effects). For example, a hammer might strike a glass watch face, causing it to break (Einhorn & Hogarth, 1986). In this case, the hammer has a power to bring about breaking, and the glass has the liability to be broken. (See White, 2009b, for a review of many studies consistent with the notion that causal relations involve transmission of properties among entities.) People then make causal predictions and inferences based on their knowledge of the causal powers and liabilities of familiar entities.
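
Returning to the force-dynamic account, the vector idea can be given a rough computational reading. The sketch below is our illustrative simplification of Wolff’s (2007) criteria, not the published model: a force counts as "toward" the end state when it has a positive component in the end-state direction, and criterion (b) is left implicit because in this example it is the affector that redirects the patient.

```python
# Rough sketch of force composition in the style of Wolff (2007); the vector
# encoding and the category boundaries are illustrative assumptions only.
def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def toward(force, end_dir):
    # "Toward" = positive component in the end-state direction (assumption).
    return dot(force, end_dir) > 0

def classify(patient, affector, end_dir):
    resultant = (patient[0] + affector[0], patient[1] + affector[1])
    tendency = toward(patient, end_dir)    # (a) patient's initial tendency
    achieved = toward(resultant, end_dir)  # (c) is the end state reached?
    if not tendency and achieved:
        return "CAUSE"
    if tendency and achieved:
        return "ENABLE"
    if tendency and not achieved:
        return "PREVENT"
    return "no relation"

# The boat drifts away from the buoy; the fan's stronger force redirects it.
print(classify(patient=(-1.0, 0.0), affector=(3.0, 0.0), end_dir=(1.0, 0.0)))
# -> CAUSE: the patient lacked a tendency toward the end state, yet it occurred.
```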


These physicalist theories capture a variety of intuitions and empirical results concerning causal thinking (see Waldmann & Mayrhofer, 2016), and any complete theory of causal mechanisms is responsible for accounting for these phenomena. However, these theories are compatible with many different underlying representations. In the case of force dynamics, the vector representations are highly abstract and apply to any causal situation. That is, this theory does not posit representations for specific mechanisms in semantic memory, and therefore mechanism representations could take one of many formats. In the case of causal powers theory, the reasoner must represent properties of particular objects, which in combination could lead to representations of specific mechanisms. However, these property representations could potentially take several different representational formats, including icons and schemas (see later discussion). Thus, although force and power theories certainly capture important aspects of causal reasoning, they do not provide a clear answer to the question of how mechanisms are mentally represented.

Icons

A related possibility is that people represent causal mechanisms in an iconic or image-like format. For example, when using mechanism knowledge to think about how a physical device works, the reasoner might mentally simulate the operation of the machine using mental imagery. More generally, people might store mechanism knowledge in an iconic format isomorphic to the physical system (Barsalou, 1999)—a view that sits comfortably with the physicalist theories described earlier. (Goldvarg and Johnson-Laird, 2001, propose a different, broadly iconic view of causal thinking based on mental models; see also Johnson-Laird & Khemlani, Chapter 10 in this volume.)

Forbus’s (1984) qualitative process theory is an artificial intelligence theory of this style of reasoning. Qualitative process theory is designed to solve problems such as whether a bathtub will overflow, given the rate of water flowing out the faucet, the rate of drainage, and the rate of evaporation. This theory is “qualitative” in the sense that it compares quantities and stores the direction of change, but does not reason about exact quantities. In this way, it is supposed to be similar to how humans solve these problems.

However, even if qualitative process theory accurately characterizes human problem-solving processes, it is unclear whether these processes rely on mental representations that are propositional or image-like; after all, qualitative process theory itself is implemented in a computer programming language, using propositional representations. Several experimental results have been taken to support image-like representations (see Hegarty, 2004, for a review). First, when solving problems about physical causal systems (such as diagrams of pulleys or gears), participants who think aloud are likely to make gestures preceding their verbal descriptions, suggesting that spatial reasoning underlies their verbalizations (Schwartz & Black, 1996). Second, solving problems about physical causal systems appears to rely on visual ability but not verbal ability.

Performance on such problems is predicted by individual differences in spatial ability but not in verbal ability (Hegarty & Sims, 1994), and dual-task studies reveal interference between mechanical reasoning and maintenance of a visual working memory load, but not a verbal working memory load (Sims & Hegarty, 1997).

It is an open question whether people run image-like mental simulations even when reasoning about causal processes that are less akin to physical systems, but some indirect support exists. For instance, asymmetries in cause-to-effect versus effect-to-cause reasoning suggest that people may use simulations. Tversky and Kahneman (1981) showed that people rate the conditional probability of a daughter having blue eyes given that her mother has blue eyes to be higher than the conditional probability of a mother having blue eyes given that the daughter has blue eyes. If the base rates of mothers and daughters having blue eyes are equal, these probabilities should be the same, but people appear to err because they make higher judgments when probability “flows” with the direction of causality (for similar findings, see Fernbach, Darlow, & Sloman, 2010, 2011; Medin, Coley, Storms, & Hayes, 2003; Pennington & Hastie, 1988). While these results do not necessitate image-like representations, they do speak in favor of simulation processes, as forward simulations appear to be more easily “run” than backward simulations, just as films with a conventional narrative structure are more readily understood than films like Memento, in which the plot unfolds in reverse order.

However, other arguments and evidence suggest that these results may be better understood in terms of non-iconic representations. First, a number of researchers have argued that there are fundamental problems with iconic representations. Pylyshyn (1973) argues, for example, that if we store iconic representations and use them in the same way that we use visual perception, then we need a separate representational system to interpret those icons, just as we do for vision. Rips (1984) criticizes mental simulation more generally, pointing out that the sort of mental simulation posited by AI systems is, in all but the simplest cases, likely to be beyond the cognitive capacity of human reasoners. Reasoning about turning gears is one thing, but Kahneman and Tversky (1982) claim that people use mental simulation to assess the probabilities of enormously complex causal systems, such as geopolitical conflict. Clearly, the number and variety of causal mechanisms at play in such simulations is beyond the ken of even the most sophisticated computer algorithms, much less human agents. In Rips’s view, rule-based mechanisms are far more plausible candidates for physical causal reasoning. According to both Pylyshyn and Rips, then, the phenomenology of mental simulation may be epiphenomenal.

There is also empirical evidence at odds with iconic representations of mechanisms. For example, Hegarty (1992) gave participants diagrams of systems of pulleys, and asked them questions such as “If the rope is pulled, will pulley B turn clockwise or counterclockwise?” Response times were related to the number of components between the cause (here, the rope) and the effect (pulley B).
While this result is broadly consistent with the idea of mental simulation, it suggests that people simulate the system piecemeal rather than simultaneously (as one might expect for a mental image or “movie”).

More problematically, participants seem to be self-inconsistent when all parts are considered. In a study by Rips and Gentner (reported in Rips, 1984), participants were told about a closed room containing a pan of water. They were asked about the relations between different physical variables (such as air temperature, evaporation rate, and air pressure)—precisely the sort of inferences that mental simulations (such as those proposed by qualitative process theory) are supposed to be used for. The researchers found that people not only answered these questions inconsistently with the laws of physics, but even made intransitive inferences. That is, participants very frequently claimed that a variable X causes a change in variable Y, which in turn causes a change in variable Z, but that X does not cause a change in Z—an intransitive inference. Such responses should not be possible if people are qualitatively simulating the physical mechanisms at work: even if their mechanism knowledge diverges from the laws of physics, it should at least be internally consistent. (Johnson and Ahn, 2015, review several cases where causal intransitivity can be normative, but none of these cases appears to be relevant to the stimuli used in the Rips and Gentner study.) These results are more consistent with a schema view of mechanism knowledge (see later in this chapter).

In sum, while studies of physical causal reasoning provide further evidence that causal thinking, and mechanism knowledge in particular, are used widely across tasks, they do not seem to legislate strongly in favor of iconic representations of mechanism knowledge. These results do, however, provide constraints on what representations could be used for mechanism-based reasoning.

Placeholders

A fourth representational candidate is a placeholder or reference pointer. On this view, people do not have elaborate knowledge about the causal mechanisms underlying causal relations, but instead have a placeholder for a causal mechanism. That is, people would believe that every causal relation has an (unknown) causal mechanism, yet in most cases would not explicitly represent the content. (See Keil, 1989; Kripke, 1980; Medin & Ortony, 1989; and Putnam, 1975, for the original ideas involving conceptual representations; and see Pearl, 2000, for a related, formal view.)

Page 15 of 35

Causal Mechanisms Further, this illusion goes beyond general overconfidence. Although similar effects can be found in other complex causal domains (e.g., natural phenomena such as how tides oc­ cur), people’s knowledge is comparatively well calibrated in non-causal domains, such as facts (e.g., the capital of England), procedures (e.g., how to bake chocolate chip cookies from scratch), and narratives (e.g., the plot of Good Will Hunting), although some (more modest) overconfidence can be found in (p. 137) these other domains as well (Fischhoff, Slovic, & Lichtenstein, 1977). Together, these results suggest that, at least in some cases, people do not store detailed representations of mechanisms in their heads, but rather some skeletal details, together with a meta-representational placeholder or “pointer” to some unknown mechanism as­ sumed to exist in the world. These impoverished representations, together with the ro­ bust illusions of their richness, are another reason to be suspicious of iconic representa­ tions of mechanism knowledge (see earlier subsection, “Icons”). To the extent that this is a plausible representational format because it feels introspectively right, we should be suspicious that this intuition may be a metacognitive illusion. However, in addition to these meta-representational pointers or placeholders, people clearly do have some skeletal representations of mechanisms. Many of the effects de­ scribed in earlier sections depend on people having some understanding of the content of the underlying mechanisms (e.g., Ahn & Bailenson, 1996; Ahn et al., 1995; Fugelsang & Thompson, 2000). And although people’s mechanistic knowledge might be embarrassing­ ly shallow for scientific phenomena and mechanical devices, it seems to be more com­ plete for mundane phenomena. For instance, people often drink water after they exercise. Why? Because they become thirsty. Although the physiological details may elude most people, people surely understand this mechanism at a basic, skeletal level. If not as asso­ ciations, causal powers, or icons, what format do these representations take? Next, we consider two possibilities for these skeletal representations—causal networks and schemas.

Networks The idea that causal mechanisms might be represented as networks has recently received much attention (e.g., Glymour & Cheng, 1998; Griffiths & Tenenbaum, 2009; Pearl, 2000). According to this view, causal relationships are represented as links between variables in a directed graph, encoding the probabilistic relationships among the variables and the counterfactuals entailed by potential interventions (see Rottman, Chapter 6 in this volume for more details). For example, people know that exercising (X) causes a person to be­ come thirsty (Y), which in turn causes a person to drink water (Z). The causal arrows ex­ pressed in the graph encode facts such as: (1) exercising raises the probability that a per­ son becomes thirsty (a probabilistic dependency); and (2) intervening to make a person exercise (or not exercise) will change the probability of thirst (a counterfactual dependen­ cy). The relationship between thirst (Y) and drinking water (Z) can be analyzed in a simi­ lar way. These two relationships can lead a reasoner to infer, transitively, a positive co­ variation between exercise (X) and drinking water (Z), and a counterfactual dependence Page 16 of 35

Causal Mechanisms between interventions on exercise and the probability of drinking water (but see the fol­ lowing subsection, “Schemas,” for several normative reasons why causal chains can be in­ transitive). Similarly, the effects of drinking water will also have probabilistic and coun­ terfactual relationships to exercise, as will the alternative causes of drinking water, and so on. These networks are used in artificial intelligence systems because they are eco­ nomical and efficient ways of storing and reasoning about causal relationships (Pearl, 1988, 2000; Spirtes, Glymour, & Scheines, 1993). If causal knowledge is represented in causal networks, then they could be reducible to the probabilistic dependencies and counterfactual entailments implied by the network. One proponent of this view is Pearl (1988), who argued that our knowledge is fundamen­ tally about probabilities, and that causal relationships are merely shorthand for proba­ bilistic relationships (though Pearl, 2000, argues for a different view; see “Open Ques­ tions” later in this section). If causal relations are merely abbreviations of probabilistic re­ lationships, we can define a mechanism for the causal relationship X→Z as a variable Y which, when conditioned on, makes the correlation between X and Z go to zero (Glymour & Cheng, 1998) so that the Markov condition is satisfied. That is, Y is a mechanism for X→Z if P(Z|X) > P(Z|~X), but P(Z|X,Y) = P(Z|~X,Y). The intuition here is the same as in mediation analysis in statistics—a variable Y is a full mechanism or mediator if it accounts for the entirety of the relationship between X and Z. As an example, Glymour and Cheng (1998, p. 295) cite the following case (from Baumrind, 1983): The number of never-married persons in certain British villages is highly inversely correlated with the number of field mice in the surrounding meadows. [Marriage] was considered an established cause of field mice by the village elders until the mechanisms of transmission were finally surmised: Never-married persons bring with them a disproportionate number of cats. In this case, the number of cats (Y) would be a mechanism that mediates the rela­ tionship between marriage (X) and field mice (Z) because there is no longer a relationship between marriage and field mice when marriage is held constant. In the next section, we discuss limitations of conceptualizing mechanisms this way, after describing the schema format. (p. 138)

Schemas Finally, mechanism knowledge might be represented in the form of schemas—clusters of content-laden knowledge stored in long-term memory. Schemas are critical for inductive inference because they are general knowledge that can be used to instantiate many spe­ cific patterns (Bartlett, 1932; Schank & Abelson, 1977). For example, if Megan tells you about her ski trip, you can already fill in a great amount of the detail without her explicit­ ly telling you—you can assume, for example, that there was a mountain, that the ground was snowy, that warm beverages were available in the lodge, and so on. Causal mecha­ nisms could likewise be represented as clusters of knowledge about the underlying causal relations. Page 17 of 35

Causal Mechanisms Like networks, schemas are a more skeletal representation and would not necessarily im­ plicate image-like resources. Unlike networks, however, relationships between causally adjacent variables would not necessarily be stored together. This is because two causal relationships can be “accidentally” united in a causal chain by sharing an event in com­ mon, yet not belong to the same schema. For example, we have a schema for sex causing pregnancy, and another schema for pregnancy causing nausea. But we may not have a schema for the relationship between sex and nausea. On the network view discussed ear­ lier (Glymour & Cheng, 1998), because these three events are related in a causal chain, pregnancy is a mechanism connecting sex and nausea. On the schema view, in contrast, sex and nausea might not even be seen as causally related. To distinguish between networks and schemas, Johnson and Ahn (2015) tested people’s judgments about the transitivity of causal chains—the extent to which, given that A caus­ es B and B causes C, A is seen as a cause of C. According to the network view, the A→C relationship should be judged as highly causal to the extent that A→B and B→C are seen as highly causal. In contrast, the schema view implies that A→C would be judged as highly causal only if A and C belong to the same schema, even if A→B and B→C are strong. This is exactly what was found. For chains that were found in a preliminary experiment to be highly schematized (e.g., Carl studied, learned the material, and got a perfect score on the test), participants gave high causal ratings to A→B, B→C, and A→C (agreeing that Carl studying caused him to get a perfect score on the test). But for chains that were not schematized (e.g., Brad drank a glass of wine, fell asleep, and had a dream), participants gave high causal ratings for A→B and B→C, but not for A→C (denying that Brad’s glass of wine made him dream). Johnson and Ahn (2015) also ruled out several normative explana­ tions for causal intransitivity (e.g., Hitchcock, 2001; Paul & Hall, 2013). For example, causal chains can be normatively intransitive when the Markov condition is violated, but the Markov condition held for the intransitive chains. Similarly, chains can appear intran­ sitive if one or both of the intermediate links (A→B or B→C) is probabilistically weak, be­ cause the overall relation (A→C) would then be very weak. But the transitive and intransi­ tive chains were equated for intermediate link strength, so this explanation cannot be correct. The lack of transitive inferences given unschematized causal chains is a natural conse­ quence of the schema theory, but is difficult to square with the network theory. When as­ sessing whether an event causes another, people often use a “narrative” strategy, reject­ ing a causal relationship between two events if they cannot generate a story leading from the cause to the effect using their background knowledge (e.g., Kahneman & Tversky, 1982; Taleb, 2007). Hence, if people store A→B and B→C in separate schemas, they could not easily generate a path leading from A to C, resulting in intransitive judgments. The very point of the network representation, however, is to allow people to make precisely such judgments—to represent, for example, the conditional independence between A and C given B, and the effects of potential interventions on A on downstream variables. In­ deed, if the network view defines mechanisms in terms of such conditional independence

Page 18 of 35

Causal Mechanisms relations, then it would require these variables to be linked together. Participants’ intran­ sitive judgments, then, are incompatible with network representations.

Open Questions Because the issue of how causal knowledge is represented is a young research topic, we think it is (p. 139) fertile ground for further theoretical and empirical work. The greatest challenge appears to be understanding how mechanism knowledge can have all the rep­ resentational properties that it does—it has schema-like properties (e.g., causally adja­ cent variables are not necessarily connected in a causal network; Johnson & Ahn, 2015), yet it also has association-like properties (e.g., causal reasoning sometimes violates prob­ ability theory in favor of associationist principles; Rehder, 2014), force-like properties (e.g., vector models capture aspects of causal reasoning; Wolff, 2007), icon-like properties (e.g., people have the phenomenology of visual simulation in solving mechanistic reason­ ing problems; Hegarty, 2004), placeholder-like properties (e.g., our meta-representations are far richer than our representations of mechanisms; Rozenblit & Keil, 2002), and net­ work-like properties (e.g., people are sometimes able to perform sophisticated probabilis­ tic reasoning in accord with Bayesian networks; Gopnik et al., 2004). One view is that Bayesian network theories will ultimately be able to encompass many of these representational properties (Danks, 2005). Although one version of the network theory equates mechanism knowledge with representing the causal graph (Glymour & Cheng, 1998), other network-based theories might be more flexible (e.g., Griffiths & Tenenbaum, 2009). For example, Pearl (2000, pp. xv–xvi) writes: In this tradition [of Pearl’s earlier book Probabilistic Reasoning in Intelligent Sys­ tems (1988)], probabilistic relationships constitute the foundations of human knowledge, whereas causality simply provides useful ways of abbreviating and or­ ganizing intricate patterns of probabilistic relationships. Today, my view is quite different. I now take causal relationships to be the fundamental building blocks both of physical reality and of human understanding of that reality, and I regard probabilistic relationships as but the surface phenomena of the causal machinery that underlies and propels our understanding of the world. That is, our causal knowledge might be represented on two levels—at the level of causal graphs that represent probabilities and counterfactual entailments, and at a lower level that represents the operation of physical causal mechanisms. This view does not seem to capture all of the empirical evidence, as the results of Johnson and Ahn (2015) appear to challenge any theory that posits representations of causal networks without significant qualifications. Nonetheless, theories that combine multiple representational formats and explain the relations among them are needed to account for the diverse properties of mechanism knowledge. Another largely open question is where the content of these representations comes from. For example, to the extent that mechanism knowledge is stored in a schema format, where do those schemas come from? That is, which event categories become clustered to­ Page 19 of 35

Causal Mechanisms gether in memory, and which do not? Little is known about this, perhaps because schema formation is multiply determined, likely depending on factors such as spatial and tempo­ ral contiguity, frequency of encounter, and others. This problem is similar in spirit and dif­ ficulty to the problem of why we have the particular concepts that we do. Why do we have the concept of “emerald” but not the concept of “emeruby” (an emerald before 1997 or a ruby after 1997; Goodman, 1955)? Likewise, why do we have a schema for pregnancy and a schema for nausea, but not a schema that combines the two? Although we describe pri­ or research below on how people learn causal mechanisms, this existing work does not resolve the issue of where causal schemas come from.

Learning Causal Mechanisms In this section, we address how mechanism knowledge is learned. Associationist and net­ work theories have usually emphasized learning from statistical induction (e.g., Glymour & Cheng, 1998). However, these theories can also accommodate the possibility that much or even most causal knowledge comes only indirectly from statistical induction. For exam­ ple, some mechanisms could have been induced by our ancestors and passed to us by cul­ tural evolution (and transmitted by testimony and education) or biological evolution (and transmitted by the selective advantage of our more causally enlightened ancestors). Al­ though the bulk of empirical work on the acquisition of mechanisms focused on statistical induction, we also summarize what is known about three potential indirect learning mechanisms—testimony, reasoning, and perception.

Direct Statistical Induction If mechanisms are essentially patterns of covariation, as some theorists argue (Glymour & Cheng, 1998 (p. 140) ; Pearl, 1988), then the most direct way to learn about mechanisms is by inducing these patterns through statistical evidence. In fact, people are often able to estimate the probability of a causal relationship between two variables from contingency data (e.g., Griffiths & Tenenbaum, 2005; see also Rottman, Chapter 6 in this volume). However, mechanisms involve more than two variables, and the ability to learn causal re­ lationships from contingency data largely vanishes when additional variables are intro­ duced. For instance, in Steyvers, Wagenmakers, Blum, and Tenenbaum (2003), partici­ pants were trained to distinguish between three-variable common cause (i.e., A causes both B and C) and common effect (i.e., A and B both cause C). Although performance was better than chance levels (50% accuracy), it was nonetheless quite poor—less than 70% accuracy on average even after 160 trials, with nearly half of participants performing no better than chance. (For similar results, see Hashem & Cooper, 1996, and White, 2006.) Although people are better able to learn from intervention than from mere observation (Kushnir & Gopnik, 2005; Lagnado & Sloman, 2004; Waldmann & Hagmayer, 2005; see al­ so Bramley, Lagnado, & Speekenbrink, 2015; Coenen, Rehder, & Gureckis, 2015), they are still quite poor at learning multivariable causal structures. In Steyvers et al. (2003), learners allowed to intervene achieved only 33% accuracy at distinguishing among the 18 possible configurations of three variables (compared to 5.6% chance performance and Page 20 of 35

Causal Mechanisms 100% optimal performance). For the complex causal patterns at play in the real world, it seems unlikely that people rely on observational or interventional learning of multivari­ able networks as their primary strategy for acquiring mechanism knowledge. Given that people have great difficulty learning a network of only three variables when presented simultaneously, a second potential learning strategy is piecemeal learning of causal networks. That is, instead of learning relations among multiple variables at once, people may first acquire causal relationships between two variables, and then combine them into larger networks (Ahn & Dennis, 2000; Fernbach & Sloman, 2009). For example, Baetu and Baker (2009) found that people who learned a contingency between A and B and between B and C inferred an appropriate contingency between A and C, suggesting that participants had used the principle of causal transitivity to combine inferences about these disparate links (for similar findings, see Goldvarg & Johnson-Laird, 2001; von Sy­ dow, Meder, & Hagmayer, 2009).1 Although more work will be necessary to test the boundary conditions on piecemeal construction of causal networks (e.g., Johnson & Ahn, 2015), this appears to be a more promising strategy for acquiring knowledge of complex causal mechanisms. Learning networks of causal relations from contingency data is challenging, whether from observations or from interventions, likely as a result of our computational limits. Hence, it seems unlikely that we induce all of our mechanism knowledge from statistical learning (see Ahn & Kalish, 2000), even if direct statistical induction plays some role. Where might these other beliefs about causal mechanisms come from?

Indirect Sources of Mechanism Knowledge Much of our mechanism knowledge appears to come not directly from induction over ob­ servations, but from other sources, such as testimony from other people or explicit educa­ tion, reasoning from other beliefs, and perhaps perception. Although relatively little work has addressed the roles of these sources in acquiring mechanism knowledge in particular, each has been implicated in causal learning more generally.

Testimony and Cultural Evolution Much of our mechanism knowledge seems to come from family members and peers, from experts, and from formal and informal education. Children are famously curious, and renowned for their enthusiasm for asking series of “why” questions that probe for under­ lying mechanisms. Although parents are an important resource in children’s learning (e.g., Callanan & Oakes, 1992), parents’ knowledge is necessarily limited by their exper­ tise. However, children’s (and adults’) ability to seek out and learn from experts puts them in a position to acquire mechanism knowledge when unavailable from more immedi­ ate informants (Mills, 2013; Sobel & Kushnir, 2013; Sperber et al., 2010). In particular, children have an understanding of how knowledge is distributed across experts (Lutz & Keil, 2002) and which causal systems are sufficiently rich or “causally dense” that they would have experts (Keil, 2010).

Page 21 of 35

Causal Mechanisms Further, the growth of mechanism knowledge not only over ontogeny but over history points to powerful mechanisms of cultural evolution (Boyd & Richerson, 1985; Dawkins, 1976). Successive generations generate new scientific knowledge and transmit a subset of that knowledge to the (p. 141) public and to other scientists. Most experimental and computational work in cultural evolution has focused on how messages are shaped over subsequent generations (Bartlett, 1932; Griffiths, Kalish, & Lewandowsky, 2008), how languages evolve (Nowak, Komarova, & Niyogi, 2001), or how beliefs and rituals are propagated (Boyer, 2001). Less is known from a formal or experimental perspective about how cultural evolution impacts the adoption of scientific ideas (but see Kuhn, 1962). Nonetheless, it is clear that the succession of ideas over human history are guided in large part by a combination of scientific scrutiny and cultural selection, and that these forces therefore contribute to the mechanism knowledge that individual cognizers bring to bear on the world.

Reasoning Imagine you have done the hard work of understanding the mechanisms underlying the circulatory system of elephants—perhaps by conducting observations and experiments, or through explicit education. It would be sad indeed if this hard-won mechanism knowledge were restricted to causal reasoning about elephants. What about specific kinds of ele­ phants? Mammals in general? Particular mammals like zebras? Beliefs are not informational islands. Rather, we can use reasoning to extend knowledge from one domain to another. We can use deductive reasoning to extend our general knowledge about elephant circulation “forward” to African elephant circulation (JohnsonLaird & Byrne, 1991; Rips, 1994; Stenning & van Lambalgen, 2008; see Oaksford & Chater, Chapter 19 in this volume, and Over, Chapter 18 in this volume). We can use ana­ logical reasoning to extend our knowledge of elephant circulation “sideways” to similar organisms like zebras (Gentner & Markman, 1997; Hofstadter, 2014; Holyoak & Thagard, 1997; see Holyoak & Lee, Chapter 24 in this volume). And we can use abductive reasoning to extend our knowledge “backward” to mammals (Keil, 2006; Lipton, 2004; Lombrozo, 2012; see Lombrozo & Vasilyeva, Chapter 22 in this volume, and Meder & Mayrhofer, Chapter 23 in this volume); indeed, Ahn and Kalish (2000) suggested that ab­ ductive reasoning is a particularly important process underlying mechanistic causal rea­ soning. Although these reasoning strategies do not always lead to veridical beliefs (e.g., Lipton, 2004; Stenning & van Lambalgen, 2008), they seem to do well often enough that they can be productive sources of hypotheses about causal mechanisms, and they may be accurate enough to support causal inference in many realistic circumstances without ex­ ceeding our cognitive limits.

Perception Intuitively, we sometimes seem to learn mechanisms from simply watching those mecha­ nisms operate in the world (see White, Chapter 14 in this volume). For example, you might observe a bicycle in operation, and draw conclusions about the underlying mecha­ nisms from these direct observations. Indeed, much evidence supports the possibility that Page 22 of 35

Causal Mechanisms people can visually perceive individual causal relations (Michotte, 1946/1963; Rolfs, Dambacher, & Cavanagh, 2013; see White, 2009a, for a review, and Rips, 2011, for a con­ trary view). Haptic experiences may also play a role in identifying causal relations (White, 2012, 2014; Wolff & Shepard, 2013). Just as people seem to learn about individual causal relationships from statistical information and combine them together into more detailed mechanism representations (Ahn & Dennis, 2000; Fernbach & Sloman, 2009), people may likewise be able to learn about individual causal events from visual experience, and com­ bine these into larger mechanism representations. However, we should be cautious in assuming that we rely strongly on perceptual learning for acquiring mechanism knowledge, because little work has addressed this question di­ rectly, and people are susceptible to metacognitive illusions (Rozenblit & Keil, 2002). For example, Lawson (2006) found that people have poor understanding of how bicycles work, and when asked to depict a bicycle from memory, often draw structures that would be im­ possible to operate (e.g., because the frame would prevent the wheels from turning). These errors were found even for bicycle experts and people with a physical bicycle in front of them while completing the task (see also Rozenblit & Keil, 2002). Hence, in many cases, what appears to be a mechanism understood through direct perceptual means is in fact something far more schematic and incomplete, derived from long-term memory.

Open Questions One major open question concerns the balance among these direct and indirect sources. Do we acquire many of our mechanism beliefs through statistical induction, despite our difficulty with learning networks of variables, or is the majority of our causal knowledge derived from other indirect sources? When we combine individual causal (p. 142) relations into mechanism representations, do we do so only with relations learned statistically, or are we also able to combine disparate relations learned through testimony, reasoning, or perception? To what extent can these causal maps combine relations learned through dif­ ferent strategies? Put differently, do these learning strategies all produce mechanism rep­ resentations of the same format, or do they contribute different sorts of representations that may be difficult to combine into a larger picture? Another challenge for future research will be investigating the extent to which these sources contribute not only to learning general causal knowledge (learning that A causes B) but also mechanism knowledge (learning why A causes B). The majority of the evi­ dence summarized earlier concerns only general causal knowledge, so the contribution of these indirect sources to acquiring mechanism knowledge should be addressed empirical­ ly. Finally, might some mechanism knowledge be conveyed through the generations not only through cultural evolution, but also through biological evolution? It is controversial to what extent we have innate knowledge (e.g., Carey, 2009; Elman et al., 1996), and less clear still to what extent we have innate knowledge of causal mechanisms. Nonetheless, we may be born with some highly schematic, skeletal representations of mechanisms. For Page 23 of 35

Causal Mechanisms example, 4-month-old infants appear to understand the fundamental explanatory princi­ ples of physics (e.g., Spelke, Breinlinger, Macomber, & Jacobson, 1992), including physi­ cal causality (Leslie & Keeble, 1987); belief-desire psychology emerges in a schematic form by 12 months (Gergely & Csibra, 2003); and young children use the principles of es­ sentialism (Keil, 1989), vitalism (Inagaki & Hatano, 2004), and inherence (Cimpian & Salomon, 2014) to understand the behavior of living things. These rudimentary explanato­ ry patterns may provide candidate mechanisms underlying many more specific causal re­ lationships observed in the world. To the extent that these patterns are innate, we might be born with some highly skeletal understanding of causal mechanisms that can underlie later learning.

Conclusions The chapters in this volume demonstrate the depth to which causality pervades our think­ ing. In this chapter, we have argued further that knowledge of causal mechanisms per­ vades our causal understanding. First, when deciding whether a relationship is causal, mechanism knowledge can override other cues to causality. It provides evidence over and above covariation, and a mechanism can even change the interpretation of new covaria­ tion information; it can result in violations of the causal Markov condition—a critical as­ sumption for statistical reasoning via Bayesian networks; and it can alter expectations about temporal delays, moderating the effect of temporal proximity on causal judgment. Second, mechanism knowledge is crucial to inductive inference. It affects which cate­ gories are used and induced; how strongly an exemplar’s features are projected onto oth­ er exemplars; how likely we are to extend a property from one category to another; and how we make category-based probability judgments, producing discounting and conjunc­ tion effects. Mechanism knowledge is also key to how causal relations are mentally represented. Sev­ eral representational formats have been proposed—associations, forces or powers, icons, placeholders, networks, and schemas. Although there are likely to be elements of all of these formats in our mechanism knowledge, two positive empirical conclusions are clear. First, people’s meta-representations of causal knowledge are far richer than their actual causal knowledge, suggesting that our representations include abstract placeholders or “pointers” to real-world referents that are not stored in the head. Second, however, peo­ ple do represent some mechanism content, and this content appears to often take the form of causal schemas. Future theoretical and empirical work should address how the various properties of mechanism knowledge can be understood in a single framework. Mechanisms may be acquired in part through statistical induction. However, because peo­ ple are poor at learning networks of three or more variables by induction, it is more likely that people learn causal relations individually and assemble them piecemeal into larger networks. People also seem to use other learning strategies for acquiring mechanism knowledge, such as testimony, reasoning, and perhaps perception. How these strategies interact, and whether they produce different sorts of representations, are open questions. Page 24 of 35

Causal Mechanisms Although we would not claim that all reasoning about causation is reasoning about mech­ anisms, mechanisms are central to many of our nearest and dearest inferential processes. Hence, understanding the representation and acquisition of mechanism knowledge can help to cut to the core of causal thinking, and much of the cognition that it makes possi­ ble.

References Ahn, W., & Bailenson, J. (1996). Causal attribution as a search for underlying mecha­ nisms: An explanation of the conjunction fallacy and the discounting principle. Cognitive Psychology, 31, 82–123. Ahn, W., & Dennis, M. J. (2000). Induction of causal chains. In L. R. Gleitman & A. K. Joshi (Eds.), Proceedings of the 22nd annual conference of the Cognitive Science Society (pp. 19–24). Mahwah, NJ: Lawrence Erlbaum Associates. Ahn, W., & Kalish, C. W. (2000). The role of mechanism beliefs in causal reasoning. In F. C. Keil & R. A. Wilson (Eds.), Explanation and cognition (pp. 199–226). Cambridge, MA: MIT Press. Ahn, W., Kalish, C. W., Medin, D. L., & Gelman, S. A. (1995). The role of covariation versus mechanism information in causal attribution. Cognition, 54, 299–352. Aristotle. (1970). Physics, Books I–II. (W. Charlton, Trans.) Oxford: Clarendon Press. Baetu, I., & Baker, A. G. (2009). Human judgments of positive and negative causal chains. Journal of Experimental Psychology: Animal Behavior Processes, 35, 153–168. Barbey, A. K., & Wolff, P. (2007). Learning causal structure from reasoning. In D. S. Mc­ Namara & J. G. Trafton (Eds.), Proceedings of the 29th annual conference of the Cognitive Science Society (pp. 713–718). Austin, TX: Cognitive Science Society. Barsalou, L. W. (1999). Perceptual symbol systems. Behavioral and Brain Sciences, 22, 577–660. Bartlett, F. C. (1932). Remembering: An experimental and social study. Cambridge, UK: Cambridge University Press. Baumrind, D. (1983). Specious causal attributions in the social sciences: The reformulat­ ed stepping-stone theory of heroin use as exemplar. Journal of Personality and Social Psy­ chology, 45, 1289–1298. Boyd, R., & Richerson, P. J. (2005). The origin and evolution of cultures. Oxford: Oxford University Press. Boyer, P. (2001). Religion explained: The evolutionary origins of religious thought. New York: Basic Books.

Page 25 of 35

Causal Mechanisms Bramley, N. R., Lagnado, D. A., & Speekenbrink, M. (2015). Conservative forgetful schol­ ars: How people learn causal structure through sequences of interventions. Journal of Ex­ perimental Psychology: Learning, Memory, and Cognition, 41, 708–731. Buehner, M. J., & May, J. (2002). Knowledge mediates the timeframe of covariation assess­ ment in human causal induction. Thinking & Reasoning, 8, 269–293. Buehner, M. J., & May, J. (2003). Rethinking temporal contiguity and the judgement of causality: Effects of prior knowledge, experience, and reinforcement procedure. The Quarterly Journal of Experimental Psychology, 56A, 865–890. Buehner, M. J., & May, J. (2004). Abolishing the effect of reinforcement delay on human causal learning. The Quarterly Journal of Experimental Psychology, 57B, 179–191. Buehner, M. J., & McGregor, S. (2006). Temporal delays can facilitate causal attribution: Towards a general timeframe bias in causal induction. Thinking & Reasoning, 12, 353– 378. Bullock, M., Gelman, R., & Baillargeon, R. (1982). The development of causal reasoning. In W. J. Friedman (Ed.), The developmental psychology of time (pp. 209–254). New York: Academic Press. Callanan, M. A., & Oakes, L. M. (1992). Preschoolers’ questions and parents’ explana­ tions: Causal thinking in everyday activity. Cognitive Development, 7, 213–233. Carey, S. (2009). The origin of concepts. Oxford: Oxford University Press. Cimpian, A., & Salomon, E. (2014). The inherence heuristic: An intuitive means of making sense of the world, and a potential precursor to psychological essentialism. Behavioral and Brain Sciences, 37, 461–527. Coenen, A., Rehder, B., Gureckis, T. (2015). Strategies to intervene on causal systems are adaptively selected. Cognitive Psychology, 79, 102–133. Cushman, F. (2008). Crime and punishment: Distinguishing the roles of causal and inten­ tional analyses in moral judgment. Cognition, 108, 353–380. Danks, D. (2005). The supposed competition between theories of human causal inference. Philosophical Psychology, 18, 259–272. Dawkins, R. (1976). The selfish gene. Oxford: Oxford University Press. Dowe, P. (2000). Physical causation. Cambridge, UK: Cambridge University Press. Einhorn, H. J., & Hogarth, R. M. (1986). Judging probable cause. Psychological Bulletin, 99, 3–19.

Page 26 of 35

Causal Mechanisms Elman, J. L., Bates, E. A., Johnson, M. H., Karmiloff-Smith, A., Parisi, D., & Plunkett, K. (1996). Rethinking innateness: A connectionist perspective on development. Cambridge, MA: MIT Press. Fernbach, P. M., Darlow, A., & Sloman, S. A. (2010). Neglect of alternative causes in pre­ dictive but not diagnostic reasoning. Psychological Science, 21, 329–36. Fernbach, P. M., Darlow, A., & Sloman, S. A. (2011). Asymmetries in predictive and diag­ nostic reasoning. Journal of Experimental Psychology: General, 140, 168–85. Fernbach, P. M., & Sloman, S. A. (2009). Causal learning with local computations. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35, 678–693. Fischhoff, B., Slovic, P., & Lichtenstein, S. (1977). Knowing with certainty: The appropri­ ateness of extreme confidence. Journal of Experimental Psychology: Human Perception and Performance, 3, 552–564. Focht, D. R., III, Spicer, C., & Fairchok, M. P. (2002). The efficacy of duct tape vs cryother­ apy in the treatment of verruca vulgaris (the common wart). Archives of Pediatrics & Ado­ lescent Medicine, 156, 971–974. Forbus, K. D. (1984). Qualitative process theory. Artificial Intelligence, 24, 85–168. Fugelsang, J. A., & Thompson, V. A. (2000). Strategy selection in causal reasoning: When beliefs and covariation collide. Canadian Journal of Experimental Psychology, 54, 15–32. Fugelsang, J. A., Stein, C. B., Green, A. E., & Dunbar, K. N. (2004). Theory and data inter­ actions of the scientific mind: Evidence from the molecular and the cognitive laboratory. Canadian Journal of Experimental Psychology, 58, 86–95. Gentner, D., & Markman, A. B. (1997). Structure mapping in analogy and similari­ ty. American Psychologist, 52, 45–56. (p. 144)

Gergely, G., & Csibra, G. (2003). Teleological reasoning in infancy: The naive theory of ra­ tional action. Trends in Cognitive Sciences, 7, 287–292. Glennan, S. S. (1996). Mechanisms and the nature of causation. Erkenntnis, 44, 49–71. Glymour, C., & Cheng, P. W. (1998). Causal mechanism and probability: A normative ap­ proach. In M. Oaksford & N. Chater (Eds.), Rational models of cognition (pp. 295–313). Oxford: Oxford University Press. Goldvarg, E., & Johnson-Laird, P. N. (2001). Naive causality: A mental model theory of causal meaning and reasoning. Cognitive Science, 25, 565–610. Goodman, N. (1955). Fact, fiction, and forecast. Cambridge, MA: Harvard University Press.

Page 27 of 35

Causal Mechanisms Gopnik, A., Glymour, C., Sobel, D. M., Schulz, L. E., Kushnir, T., & Danks, D. (2004). A the­ ory of causal learning in children: Causal maps and Bayes nets. Psychological Review, 111, 3–32. Griffiths, T. L., Kalish, M. L., & Lewandowsky, S. (2008). Theoretical and empirical evi­ dence for the impact of inductive biases on cultural evolution. Proceedings of the Royal Society, B, 363, 3503–3514. Griffiths, T. L., & Tenenbaum, J. B. (2005). Structure and strength in causal induction. Cognitive Psychology, 51, 334–384. Griffiths, T. L., & Tenenbaum, J. B. (2009). Theory-based causal induction. Psychological Review, 116, 661–716. Hagmayer, Y., Meder, B., von Sydow, M., & Waldmann, M. R. (2011). Category transfer in sequential causal learning: The unbroken mechanism hypothesis. Cognitive Science, 35, 842–873. Harré, R., & Madden, E. H. (1975). Causal powers: A theory of natural necessity. Lan­ ham, MD: Rowman & Littlefield. Hashem, A. I., & Cooper, G. F. (1996). Human causal discovery from observational data. Proceedings of the AMIA Annual Fall Symposium, 27–31. https://www.ncbi.nlm.nih.gov/ pmc/articles/PMC2233172/. Hegarty, M. (1992). Mental animation: Inferring motion from static displays of mechani­ cal systems. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 1084–1102. Hegarty, M. (2004). Mechanical reasoning by mental simulation. Trends in Cognitive Sciences, 8, 280–285. Hegarty, M., & Sims, V. K. (1994). Individual differences in mental animation during me­ chanical reasoning. Memory & Cognition, 22, 411–430. Heit, E. (2000). Properties of inductive reasoning. Psychonomic Bulletin & Review, 7, 569–592. Hitchcock, C. (2001). The intransitivity of causation revealed in equations and graphs. The Journal of Philosophy, 98, 273–299. Hofstadter, D. R. (2014). Surfaces and essences: Analogy as the fuel and fire of thought. New York: Basic Books. Holyoak, K. J., & Thagard, P. (1997). The analogical mind. American Psychologist, 52, 35– 44. Hume, D. (1748/1977). An enquiry concerning human understanding. Indianapolis, IN: Hackett. Page 28 of 35

Causal Mechanisms Inagaki, K., & Hatano, G. (2004). Vitalistic causality in young children’s naive biology. Trends in Cognitive Sciences, 8, 356–362. Johnson, S. G. B., & Ahn, W. (2015). Causal networks or causal islands? The representa­ tion of mechanisms and the transitivity of causal judgment. Cognitive Science, 39, 1468– 1503. Johnson, S. G. B., & Keil, F. C. (2014). Causal inference and the hierarchical structure of experience. Journal of Experimental Psychology: General, 143, 2223–2241. Johnson-Laird, P. N., & Byrne, R. M. J. (1991). Deduction: Essays in cognitive psychology. Hillsdale, NJ: Lawrence Erlbaum Associates. Kahneman, D., & Tversky, A. (1982). The simulation heuristic. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases (pp. 201–208). Cambridge, UK: Cambridge University Press. Keil, F. C. (1989). Concepts, kinds, and cognitive development. Cambridge, MA: MIT Press. Keil, F. C. (2006). Explanation and understanding. Annual Review of Psychology, 57, 227– 254. Keil, F. C. (2010). The feasibility of folk science. Cognitive Science, 34, 826–862. Kelley, H. H. (1973). The processes of causal attribution. American Psychologist, 28, 107– 128. Kemp, C., & Jern, A. (2013). A taxonomy of inductive problems. Psychonomic Bulletin & Review, 21, 23–46. Koslowski, B. (1996). Theory and evidence: The development of scientific reasoning. Cam­ bridge, MA: MIT Press. Kripke, S. (1980). Naming and necessity. Oxford: Blackwell. Kuhn, T. S. (1962). The structure of scientific revolutions. Chicago: University of Chicago Press. Kushnir, T., & Gopnik, A. (2005). Young children infer causal strength from probabilities and interventions. Psychological Science, 16, 678–683. Lagnado, D. A., & Sloman, S. A. (2004). The advantage of timely intervention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 856–876. Lagnado, D. A., & Sloman, S. A. (2006). Time as a guide to cause. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 451–460.


Lawson, R. (2006). The science of cycology: Failures to understand how everyday objects work. Memory & Cognition, 34, 1667–1675.
Leslie, A. M., & Keeble, S. (1987). Do six-month-old infants perceive causality? Cognition, 25, 265–288.
Lipton, P. (2004). Inference to the best explanation (2nd ed.). London: Routledge.
Lombrozo, T. (2010). Causal-explanatory pluralism: How intentions, functions, and mechanisms influence causal ascriptions. Cognitive Psychology, 61, 303–332.
Lombrozo, T. (2012). Explanation and abductive inference. In K. J. Holyoak & R. G. Morrison (Eds.), Oxford handbook of thinking and reasoning (pp. 260–276). Oxford: Oxford University Press.
Lombrozo, T., & Carey, S. (2006). Functional explanation and the function of explanation. Cognition, 99, 167–204.
Lutz, D. J., & Keil, F. C. (2002). Early understanding of the division of cognitive labor. Child Development, 73, 1073–1084.
Machamer, P., Darden, L., & Craver, C. F. (2000). Thinking about mechanisms. Philosophy of Science, 67, 1–25.
Markman, A. B. (1999). Knowledge representation. Mahwah, NJ: Lawrence Erlbaum Associates.
Medin, D. L., Coley, J. D., Storms, G., & Hayes, B. K. (2003). A relevance theory of induction. Psychonomic Bulletin & Review, 10, 517–532.
Medin, D. L., & Ortony, A. (1989). Psychological essentialism. In S. Vosniadou & A. Ortony (Eds.), Similarity and analogical reasoning. Cambridge, UK: Cambridge University Press.
Michotte, A. (1946/1963). The perception of causality. (T. R. Miles & E. Miles, Trans.). New York: Basic Books.

Mills, C. M. (2013). Knowing when to doubt: Developing a critical stance when learning from others. Developmental Psychology, 49, 404–418.
Murphy, G. L. (2002). The big book of concepts. Cambridge, MA: MIT Press.
Murphy, G. L., & Allopenna, P. D. (1994). The locus of knowledge effects in concept learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 904–919.
Murphy, G. L., & Medin, D. L. (1985). The role of theories in conceptual coherence. Psychological Review, 92, 289–316.


Murphy, G. L., & Wisniewski, E. J. (1989). Feature correlations in conceptual representations. In Advances in cognitive science, Vol. 2: Theory and applications (pp. 23–45). Chichester, UK: Ellis Horwood.
Nowak, M. A., Komarova, N. L., & Niyogi, P. (2001). Evolution of universal grammar. Science, 291, 114–118.
Park, J., & Sloman, S. A. (2013). Mechanistic beliefs determine adherence to the Markov property in causal reasoning. Cognitive Psychology, 67, 186–216.
Park, J., & Sloman, S. A. (2014). Causal explanation in the face of contradiction. Memory & Cognition, 42, 806–820.
Patalano, A. L., & Ross, B. H. (2007). The role of category coherence in experience-based prediction. Psychonomic Bulletin & Review, 14, 629–634.
Paul, L. A., & Hall, N. (2013). Causation: A user’s guide. Oxford: Oxford University Press.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. San Francisco: Morgan Kaufmann.
Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge, UK: Cambridge University Press.
Pennington, N., & Hastie, R. (1988). Explanation-based decision making: Effects of memory structure on judgment. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 521–533.
Putnam, H. (1975). The meaning of “meaning.” In K. Gunderson (Ed.), Language, mind, and knowledge (pp. 131–193). Minneapolis: University of Minnesota Press.
Pylyshyn, Z. W. (1973). What the mind’s eye tells the mind’s brain: A critique of mental imagery. Psychological Bulletin, 80, 1–24.
Rehder, B. (2014). Independence and dependence in human causal reasoning. Cognitive Psychology, 72, 54–107.
Rehder, B., & Burnett, R. C. (2005). Feature inference and the causal structure of categories. Cognitive Psychology, 50, 264–314.
Rehder, B., & Hastie, R. (2004). Category coherence and category-based property induction. Cognition, 91, 113–153.
Rehder, B., & Ross, B. H. (2001). Abstract coherent categories. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 1261–1275.
Rips, L. J. (1984). Mental muddles. In M. Brand & R. M. Harnish (Eds.), The representation of knowledge and belief. Tucson: University of Arizona Press.


Rips, L. J. (1994). The psychology of proof. Cambridge, MA: MIT Press.
Rips, L. J. (2011). Causation from perception. Perspectives on Psychological Science, 6, 77–97.
Rolfs, M., Dambacher, M., & Cavanagh, P. (2013). Visual adaptation of the perception of causality. Current Biology, 23, 250–254.
Rottman, B. M., & Hastie, R. (2014). Reasoning about causal relationships: Inferences on causal networks. Psychological Bulletin, 140, 109–139.
Rottman, B. M., & Keil, F. C. (2012). Causal structure learning over time: Observations and interventions. Cognitive Psychology, 64, 93–125.
Rozenblit, L., & Keil, F. C. (2002). The misunderstood limits of folk science: An illusion of explanatory depth. Cognitive Science, 26, 521–562.
Salmon, W. C. (1984). Scientific explanation and the causal structure of the world. Princeton, NJ: Princeton University Press.
Schank, R., & Abelson, R. (1977). Scripts, plans, goals, and understanding: An inquiry into human knowledge structures. New York: Psychology Press.
Schlottmann, A. (1999). Seeing it happen and knowing how it works: How children understand the relation between perceptual causality and underlying mechanism. Developmental Psychology, 35, 303–317.
Schwartz, D. L., & Black, J. B. (1996). Shuttling between depictive models and abstract rules: Induction and fallback. Cognitive Science, 20, 457–497.
Shanks, D. R. (1987). Associative accounts of causality judgment. Psychology of Learning and Motivation, 21, 229–261.
Shanks, D. R., Pearson, S. M., & Dickinson, A. (1989). Temporal contiguity and the judgement of causality by human subjects. The Quarterly Journal of Experimental Psychology, 41, 139–159.
Shultz, T. R., Fisher, G. W., Pratt, C. C., & Rulf, S. (1986). Selection of causal rules. Child Development, 57, 143–152.
Simon, H. A. (1996). The sciences of the artificial (3rd ed.). Cambridge, MA: MIT Press.
Sims, V. K., & Hegarty, M. (1997). Mental animation in the visuospatial sketchpad: Evidence from dual-task studies. Memory & Cognition, 25, 321–332.
Sloman, S. A. (1994). When explanations compete: The role of explanatory coherence on judgements of likelihood. Cognition, 52, 1–21.


Sobel, D. M., & Kushnir, T. (2013). Knowledge matters: How children evaluate the reliability of testimony as a process of rational inference. Psychological Review, 120, 779–797.
Spelke, E. S., Breinlinger, K., Macomber, J., & Jacobson, K. (1992). Origins of knowledge. Psychological Review, 99, 605–632.
Spellman, B. A., Price, C. M., & Logan, J. M. (2001). How two causes are different from one: The use of (un)conditional information in Simpson’s paradox. Memory & Cognition, 29, 193–208.
Sperber, D., Clément, F., Heintz, C., Mascaro, O., Mercier, H., Origgi, G., & Wilson, D. (2010). Epistemic vigilance. Mind & Language, 25, 359–393.
Spirtes, P., Glymour, C., & Scheines, R. (1993). Causation, prediction, and search. New York: Springer.
Stenning, K., & van Lambalgen, M. (2008). Human reasoning and cognitive science. Cambridge, MA: MIT Press.
Steyvers, M., Tenenbaum, J. B., Wagenmakers, E., & Blum, B. (2003). Inferring causal networks from observations and interventions. Cognitive Science, 27, 453–489.
Taleb, N. N. (2007). The black swan: The impact of the highly improbable. New York: Random House.
Talmy, L. (1988). Force dynamics in language and cognition. Cognitive Science, 12, 49–100.
Tversky, A., & Kahneman, D. (1981). Evidential impact of base rates. Technical Report. Palo Alto, CA: Office of Naval Research.
Tversky, A., & Kahneman, D. (1983). Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment. Psychological Review, 90, 293–315.
Von Sydow, M., Meder, B., & Hagmayer, Y. (2009). A transitivity heuristic of probabilistic causal reasoning. In N. A. Taatgen & H. van Rijn (Eds.), Proceedings of the 31st annual conference of the Cognitive Science Society (pp. 803–808). Austin, TX: Cognitive Science Society.

Waldmann, M. R. (2000). Competition among causes but not effects in predictive and diagnostic learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 53–76.
Waldmann, M. R., & Hagmayer, Y. (2005). Seeing versus doing: Two modes of accessing causal knowledge. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 216–227.
Waldmann, M. R., & Hagmayer, Y. (2006). Categories and causality: The neglected direction. Cognitive Psychology, 53, 27–58.

Waldmann, M. R., & Holyoak, K. J. (1992). Predictive and diagnostic learning within causal models: Asymmetries in cue competition. Journal of Experimental Psychology: General, 121, 222–236.
Waldmann, M. R., & Mayrhofer, R. (2016). Hybrid causal representations. In B. Ross (Ed.), The psychology of learning and motivation (Vol. 65, pp. 85–127). San Diego: Academic Press.
White, P. A. (1988). Causal processing: Origins and development. Psychological Bulletin, 104, 36–52.
White, P. A. (1989). A theory of causal processing. British Journal of Psychology, 80, 431–454.
White, P. A. (2006). How well is causal structure inferred from cooccurrence information? European Journal of Cognitive Psychology, 18, 454–480.
White, P. A. (2009a). Perception of forces exerted by objects in collision events. Psychological Review, 116, 580–601.
White, P. A. (2009b). Property transmission: An explanatory account of the role of similarity information in causal inference. Psychological Bulletin, 135, 774–793.
White, P. A. (2012). The experience of force: The role of haptic experience of forces in visual perception of object motion and interactions, mental simulation, and motion-related judgments. Psychological Bulletin, 138, 589–615.
White, P. A. (2014). Singular clues to causality and their use in human causal judgment. Cognitive Science, 38, 38–75.
Wolff, P. (2003). Direct causation in the linguistic coding and individuation of causal events. Cognition, 88, 1–48.
Wolff, P. (2007). Representing causation. Journal of Experimental Psychology: General, 136, 82–111.
Wolff, P., Barbey, A. K., & Hausknecht, M. (2010). For want of a nail: How absences cause events. Journal of Experimental Psychology: General, 139, 191–221.
Wolff, P., & Shepard, J. (2013). Causation, touch, and the perception of force. Psychology of Learning and Motivation, 58, 167–202.

Notes:

(1.) Although this result may appear to conflict with the results of Johnson and Ahn (2015), which demonstrated causal intransitivity in some causal chains, the two sets of findings can be reconciled, because Johnson and Ahn (2015) used familiar stimuli for which people could expect to have schematized knowledge, whereas Baetu and Baker (2009) used novel stimuli. In reasoning about novel stimuli, people would not use a narrative strategy (i.e., trying to think of a story connecting the causal events), but would instead use a statistical (Baetu & Baker, 2009) or rule-based strategy (Goldvarg & Johnson-Laird, 2001). The lack of schematized knowledge would not block transitive inferences under these reasoning strategies.

Samuel G. B. Johnson

Department of Psychology, Yale University, New Haven, Connecticut, USA

Woo-kyoung Ahn

Department of Psychology, Yale University, New Haven, Connecticut, USA


Force Dynamics

Force Dynamics   Phillip Wolff The Oxford Handbook of Causal Reasoning Edited by Michael R. Waldmann Print Publication Date: Jun 2017 Subject: Psychology, Cognitive Psychology Online Publication Date: May 2017 DOI: 10.1093/oxfordhb/9780199399550.013.13

Abstract and Keywords

Force dynamics is an approach to knowledge representation that aims to describe how notions of force, resistance, and tendency enter into the representation of certain kinds of words and concepts. As a theory of causation, it specifies how the concept of cause may be grounded in people’s representations of force and spatial relations. This chapter reviews theories of force dynamics that have recently emerged in the linguistic, psychological, and philosophical literatures. In discussing these theories, it reveals how a force dynamic account of causation is able to account for many of the key phenomena in causal cognition, including the representation of individual causal events, the encoding of causal relations in language, the encoding of causal chains, and causation by omission.

Keywords: force dynamics, causation, language, psychology, causal chains

Causal events typically involve a large set of factors, many of which may be necessary for the occurrence of an effect (Mill, 1872/1973). However, when people are asked to explain why a particular event occurs, they typically pick out only one or two of the factors (Hart & Honoré, 1959/1985). For example, in the case of a boat capsizing, the captain might point to the occurrence of a rogue wave or a broken rudder, but not the boat’s weight, angle to the wind, or the crew. The challenge of identifying which of these conditions constitutes the cause of the effect is what Hesslow (1988) has called the problem of causal selection.

The phenomenon of causal selection is not just a philosopher’s puzzle. It has consequences for our understanding of how people determine guilt in a court case, identify the source of a mechanical failure, or choose a medical treatment. The problem of how people select a particular cause is not easily explained by theories of causation that define causal relations in terms of statistical or counterfactual dependencies (Sloman & Lagnado, 2014; Walsh & Sloman, 2011). To address this problem arguably requires a different kind of theory of causation, one in which causal relations are defined in terms of their internal properties, rather than in terms of their external effects. One class of theories of this type is those associated with force dynamics. Force dynamics deconstructs the concept of CAUSE and related concepts into finer components. This decomposition offers an account not only of how people solve the problem of causal selection, but also of several other challenging phenomena in the causation literature. We will focus on the representation of causation from a force dynamic perspective, but we will also discuss the force dynamic approach with respect to several other prominent theories of causation. We will argue that force dynamic theories provide an account of the problem of causal selection that is not easily equaled by any other theory of causation.

Force Dynamic Theories of Causation

To explain how force dynamics approaches the problem of causal selection, it will prove useful to review the fundamental assumptions of this perspective by working through the different accounts of force dynamics.

Talmy’s (1988) Theory of Force Dynamics

The first theory of force dynamics to be formulated was proposed by Talmy (1988). The theory introduced the ideas of the imparting of force, resistance to force, overcoming resistance, and removal of a force. As such, force dynamics was hypothesized as a framework that included not only the notion of causation, but also several other notions such as “letting,” “hindering,” “helping,” and “intending.” The primary goal in Talmy’s theory was to provide an account of the meaning of a large number of verbs, prepositions, and modals. However, Talmy (1988) emphasizes that the patterns of force interactions observed in semantics appear to reflect many of the properties of people’s naïve physics. As such, Talmy viewed his theory as not only relevant to the semantics of language, but also as an account of how various causal concepts might be represented in the conceptual system outside of language.

In Talmy’s theory, the simplest type of force dynamic pattern is one in which two forces are in steady-state opposition. Such scenarios usually involve two entities. One of these entities, the agonist, is singled out for focal attention, while the other, the antagonist, plays a subordinate role. An example of a steady-state pattern would be a tumbleweed that is kept rolling over the ground by a wind. In this scenario, the entity singled out for focal attention would most likely be the tumbleweed, making it the agonist. The second entity, the wind, would be the antagonist. While there is movement in such a scenario, there is no qualitative change in the type of action of the agonist, so the interaction is classified as steady-state. Talmy argued that several kinds of steady-state patterns could be differentiated with respect to three dimensions. One of these dimensions is the tendency of the agonist. The agonist is associated with a force that gives it a proclivity either for action or for rest. In Talmy’s theory, the two forces in a force dynamic pattern are almost always in opposition; hence, if the agonist has a tendency for action, the antagonist is associated with a force that pushes it toward rest, and if the agonist has a tendency for rest, then the antagonist is associated with a force pushing it toward action. The second dimension distinguishing different force dynamic patterns is the relative strength of the forces associated with the agonist and the antagonist: the agonist is either stronger or weaker than the antagonist. The third and final main dimension is the outcome: the agonist remains either in action or rest. Table 9.1 summarizes the different kinds of steady-state force dynamic patterns in Talmy’s theory.

Table 9.1 Dimensions That Differentiate Steady-State Force Dynamic Interactions in Talmy (1988)

                                  Agonist Tendency   Agonist Strength   Agonist Result
‘Causative’ extended causation    Rest               Weaker             Action
Despite                           Rest               Stronger           Rest
Despite/hinder                    Action             Stronger           Action
Block (i.e., extended Prevent)    Action             Weaker             Rest

In steady-state interactions in which the agonist has a tendency for rest, its strength is weaker than that of the antagonist, and the result is one of action (because the stronger antagonist force is for action), the overall pattern is expressed in language with predicates implying causation (e.g., the wind kept the tumbleweed rolling). In steady-state interactions in which the agonist has a tendency for rest that is stronger than that of the antagonist and the resulting outcome is for rest, the overall pattern is one of despite (e.g., the tumbleweed did not move despite the wind). In steady-state interactions in which the agonist’s tendency is for action and is stronger than that of the antagonist and the result is one of action, the pattern is a second type of despite (e.g., the tumbleweed rolled despite the wind). Finally, in a steady-state interaction in which the agonist has a tendency for action but is weaker than the antagonist and the result is one of rest, the pattern is one of prevention (e.g., the wind prevented the tumbleweed from rolling down the hill).

A second main type of interaction in Talmy’s (1988) theory is change-of-state patterns. In change-of-state patterns, the antagonist, rather than impinging steadily on the agonist, enters or leaves this state of impingement. Table 9.2 summarizes some of the possible change-of-state patterns identified in Talmy’s theory.

In change-of-state interactions, when the agonist’s tendency is for rest and the antagonist’s motion is into impingement with the agonist, resulting in action, the interaction is one of causation (e.g., the boy knocked the bottle off the wall). Causation from change-of-state interactions differs from causation in steady-state interactions in instantiating onset causation. Another type of change-of-state interaction is one in which the agonist’s tendency is for action, the antagonist comes into impingement with the agonist, and the result is rest. Such an interaction is one of stopping or prevention (e.g., the boy prevented the bottle from falling off the wall). When the agonist has a tendency for action, the antagonist moves out of impingement with the agonist, and action ensues, the interaction is one of letting (e.g., the boy let the bottle fall off the wall). Finally, when the agonist has a tendency for rest and the antagonist moves out of impingement with it, and the result is rest, the interaction is another type of letting (e.g., the boy let the bottle sit on the wall). Whereas onset causation and preventing involve the start of or continuation of impingement of the antagonist on the agonist, letting involves the cessation of impingement of the antagonist on the agonist.


Table 9.2 Dimensions of Change-of-State Force Dynamic Interactions in Talmy (1988)

             Agonist Tendency   Agonist Strength   Antagonist Impingement   Agonist Result
Causatives   Rest               Weaker             Into                     Action
Preventing   Action             Weaker             Into                     Rest
Letting      Action             Weaker             Out of                   Action
Letting      Rest               Weaker             Out of                   Rest
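The four change-of-state patterns lend themselves to a simple lookup. The following sketch is our own illustration, not code from the chapter; the Python encoding and key names are assumptions:

```python
# Hypothetical sketch of Table 9.2 as a lookup table. Keys are
# (agonist tendency, antagonist impingement, agonist result); in all
# four patterns the agonist is the weaker force.
CHANGE_OF_STATE = {
    ("rest",   "into",   "action"): "CAUSE",    # boy knocks bottle off wall
    ("action", "into",   "rest"):   "PREVENT",  # boy stops bottle falling
    ("action", "out of", "action"): "LET",      # boy lets bottle fall
    ("rest",   "out of", "rest"):   "LET",      # boy lets bottle sit
}

print(CHANGE_OF_STATE[("rest", "into", "action")])  # CAUSE
```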


Talmy’s theory of force dynamics highlights how a relatively large number of verbs and prepositions instantiate various types of force-dynamic patterns. For example, in addition to the verbs cause, prevent, and let, there are the verbs keep, refrain, get, stop, make, overcome, push, pull, press, hold, resist, maintain, exert, try, hinder, urge, persuade, refuse, free, allow, help, permit, forbid, drag, trudge, and attract. Prepositions encoding force dynamics include against, despite, although, because, and on. Talmy’s theory explains the ways in which these verbs overlap and differ in meaning. Another key contribution of Talmy’s theory is the concept of tendency. It is only via the notion of tendency that the notions of CAUSE, PREVENT, and LET can be distinguished: with LET the agonist’s tendency is realized in the final result, whereas in CAUSE and PREVENT it is not. Talmy’s theory is conceptually grounded in physical forces and motion events. However, it is also intended to be a theory of social (e.g., peer pressure, persuade, urge, permit, forbid) and psychological interactions (e.g., moral fortitude, overcome, refuse, give in).

Talmy’s theory of force dynamics shows how the concept of CAUSE can be viewed as one member of a family of concepts. However, a closer look at his proposal also raises complications for Talmy’s account of the concepts underlying the semantics of various words. In certain ways, the theory invokes more distinctions than are strictly needed in order to differentiate several of the force-dynamic patterns. Consider, for example, the set of dimensions associated with steady-state verbs. Talmy proposes that these interactions are specified in terms of three dimensions. However, knowledge of any two of the dimensions perfectly predicts the value of the remaining third dimension. For example, if the agonist’s tendency is for rest and the agonist’s result is for action, it must be the case that the agonist’s tendency is weaker than that of the antagonist. A similar kind of redundancy emerges in the case of the change-of-state patterns. As shown in Table 9.2, in all of Talmy’s change-of-state patterns, the agonist is weaker than the antagonist. Eliminating this dimension does not change the theory’s ability to differentiate the notions of CAUSE, PREVENT, and LET.

In other ways, Talmy’s theory appears to lack distinctions that are needed for specifying certain force dynamic patterns or ruling them out. For example, in Talmy’s theory, the forces associated with the agonist and antagonist are nearly always assumed to be in opposition. If the theory were more accepting of concordance between the agonist and antagonist, it might be better able to specify the patterns underlying the notions of help, enable, and assist. Another way in which the theory lacks explicitness lies in its inability to explain why certain force-dynamic patterns do not occur. In the case of the steady-state patterns, the three binary dimensions imply eight possible force-dynamic patterns, but four of these configurations are not discussed. The reason that they are not discussed is presumably because they represent impossible patterns of forces and results. For example, if the agonist has a tendency for action and the agonist is stronger than the antagonist, then the agonist’s result cannot be one of rest. Such patterns can be recognized as impossible when we attempt to imagine them in our mind; ideally, their impossibility would be made explicit by the theory (a simple enumeration of this kind is sketched below). Finally, the various dimensions in Talmy’s theory are formulated in terms of the notions of rest and motion. As a consequence, the theory generates redundant versions of the concepts of LET, DESPITE, CAUSE, and PREVENT. Such redundancies might be acceptable if evidence could be offered for the existence of multiple versions of these concepts, but no such evidence is offered. As shown in the next theory, these redundancies disappear when the notions of tendency and result are reformulated as relations between forces and an end state.
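To illustrate the point, here is a minimal sketch (our own illustration, not from the chapter) that enumerates the eight combinations of the three binary dimensions and keeps only those consistent with simple force summation. Exactly the four patterns of Table 9.1 survive, which is why any two dimensions fix the third:

```python
# Hypothetical sketch: enumerate Talmy's three binary steady-state
# dimensions and keep those realizable when the stronger of two
# opposed forces dictates the outcome.
from itertools import product

def realizable(tendency_rest, agonist_stronger, result_rest):
    # The winning tendency is the agonist's if it is stronger,
    # otherwise the antagonist's (the opposite) tendency.
    winner_rest = tendency_rest if agonist_stronger else not tendency_rest
    return winner_rest == result_rest

for dims in product([True, False], repeat=3):
    status = "possible" if realizable(*dims) else "impossible"
    print(dims, status)
# Exactly four of the eight combinations come out as possible: the
# four patterns of Table 9.1.
```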

Wolff’s Force Theory

In prior work, we have developed a theory of force dynamics that addresses several of the limitations of Talmy’s (1988) theory, as well as extends his theory to handle several phenomena not considered by Talmy (Wolff, 2007; Wolff & Barbey, 2015; Wolff, Barbey, & Hausknecht, 2010; Wolff & Song, 2003; Wolff & Zettergren, 2002; see also Operskalski & Barbey, Chapter 13 in this volume). The force theory partitions the representation of causation into two major frameworks: (1) the representation of individual configurations of force, and (2) the representation of chains of configurations of forces. Both of these frameworks are important for explaining the main problem addressed in this chapter of how people select the cause from a large set of causal factors.

According to the force theory, individual configurations involve two main entities: a force generator and a force recipient (Wolff, Jeon, Klettke, & Yu, 2010; Wolff, Jeon, & Yu, 2009). We will refer to the force generator as an affector and the force recipient as a patient. The theory proposes that people specify causal relations in terms of configurations of forces in relationship to a vector that specifies the patient’s relationship to an end state. The end state can be a location in physical space, or a state in a state space. It is assumed that people’s representations of a force specify its source, direction, and relative magnitude. Absolute magnitudes are not represented, and as a consequence each configuration of forces contains a certain degree of uncertainty (Wolff, 2007, 2014). For individual configurations, this uncertainty does not have an impact on the categorization of a configuration of forces because relative differences in magnitude are enough to distinguish different kinds of causal relations. However, when the configurations are combined, it is expected that this indeterminacy can have an impact on how a causal chain is represented, as described later in the chapter. Lastly, it is assumed that the force in a configuration may be physical, mental (e.g., intentions), or social (e.g., peer pressure) (Copley & Harley, 2015).

Individual Relations

At the level of individual configurations of forces, the force theory predicts four main causal concepts: CAUSE, HELP, PREVENT, and DESPITE. These four concepts can be differentiated with respect to three dimensions: (1) the tendency of the patient for an end state, (2) the presence or absence of concordance between the affector and the patient, and (3) whether the resultant is directed toward the end state. Unlike other theories of causation (e.g., probabilistic theories), the force theory does not require that the result event occur before it can be said that causation has occurred. Whether the event actually occurs could be associated with a fourth dimension that is represented in terms of the length of the end-state vector, as described later. Table 9.3 summarizes how the three main dimensions differentiate the concepts of CAUSE, HELP (also ALLOW and ENABLE), PREVENT, and DESPITE (also HINDER). When we say, for example, high winds caused the man to move toward the bench, we mean that the patient (the man) had no tendency to move toward the bench (tendency = no), the affector (the wind) acted against the patient (concordance = no), and the resultant of the forces acting on the patient was directed toward the result of moving toward the bench (end state targeted = yes).

A key feature of the force theory is that it specifies how the dimensions of tendency, concordance, and end-state targeting can be represented in non-linguistic, computational terms. According to the theory, these dimensions specify possible configurations of forces. Example instantiations of these configurations are depicted in Figure 9.1, which shows scenes involving some wind, a person, and a bench. The configurations of forces are depicted in two ways. In one of these ways, the forces are placed in the scene next to the entity they are associated with: the large open arrow is associated with the affector, the small open arrow is associated with the patient’s tendency, and the small solid arrow is associated with the resultant vector of the affector and patient forces. The configuration of forces is also depicted as a free-body diagram positioned below each scene. As is customary in free-body diagrams, the forces are shown acting on only one object, the patient. They do not show the location of the affector, only the direction and magnitude of the affector’s force on the patient. In the free-body diagrams, the force associated with the affector is labeled “A,” the force associated with the patient is labeled “P,” the resultant vector is labeled “R,” and the end-state vector is labeled “E.”

Table 9.3 Representations of Several Concepts in the Force Theory

                                Patient Tendency for End State   Affector–Patient Concordance   End State Targeted
CAUSE                           No                               No                             Yes
HELP (also ALLOW and ENABLE)    Yes                              Yes                            Yes
PREVENT                         Yes                              No                             No
DESPITE/HINDER                  Yes                              No                             Yes



Figure 9.1 Configurations of forces associated with CAUSE, HELP/ENABLE/ALLOW, and PREVENT; A = the affector force; P = the patient force; R = the resultant force; E = end-state vector, which is a position vector, not a force. The images in each configuration depict a scene showing a person, wind, and a bench. In the CAUSE scene, the person’s tendency is to move away from the bench, but is pushed back to the bench by the wind. In the HELP scene, the person’s tendency is to go to the bench and the wind pushes the person in the same direction. In the PREVENT scene, the person’s tendency is to go to the bench, but the wind stops the person from reaching it.

The force associated with the patient, P, can be generated in a number of ways, including from processes that are internal to the patient (e.g., movement of muscles) or from positioning in a force field. Position in a field can give rise to tendencies such as falling due to gravity or to natural processes, such as “aging,” “ripening,” and “cooling” (Copley & Harley, 2015). The force associated with the patient can also emerge from the patient’s resistance to change due to interactions with other entities, as occurs in the case of frictional forces. In Figure 9.1, the patient’s force corresponds to the force generated by chemical potential energy in the patient that allows it to move its muscles. When the patient has a tendency for the end state, E, the patient vector, P, points in the same direction as the end-state vector, E; otherwise, P points in a different direction. When the patient and the affector are in concordance, their respective vectors point in the same direction. Finally, the patient entity will target the end state when the resultant (sum) of the A and P vectors, R, is in the same direction as the end-state vector, E. The end-state vector, E, is a position vector, not a direction vector. Hence, the length of the end-state vector specifies how close the patient is to reaching the end state. Once the patient reaches the end state, the magnitude of the end-state vector becomes zero.

The predictions of the force theory have been tested in experiments in which configurations of force associated with CAUSE, ALLOW, PREVENT, and DESPITE, among others, were instantiated in three-dimensional animations generated from a physics simulator. As reported in Wolff (2002) and Wolff and Zettergren (2002), people’s descriptions of these animations closely matched the model’s predictions of how the underlying configurations of force should be classified.
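The classification rule lends itself to a compact computational statement. The following sketch is our own illustration, not code from the chapter; it collapses each force to a signed magnitude along the end-state direction, so positive values point toward the end state:

```python
# Hypothetical sketch of the force theory's classification rule
# (Table 9.3), with forces reduced to signed magnitudes along the
# direction of the end-state vector E.

def classify(affector: float, patient: float) -> str:
    tendency = patient > 0                          # P points at E?
    concordance = (affector > 0) == (patient > 0)   # A and P aligned?
    targeted = (affector + patient) > 0             # R = A + P points at E?

    if not tendency and not concordance and targeted:
        return "CAUSE"
    if tendency and concordance and targeted:
        return "HELP"      # also ALLOW / ENABLE
    if tendency and not concordance and not targeted:
        return "PREVENT"
    if tendency and not concordance and targeted:
        return "DESPITE"   # also HINDER
    return "unclassified"

print(classify(affector=2.0, patient=-1.0))   # CAUSE
print(classify(affector=1.0, patient=1.0))    # HELP
print(classify(affector=-2.0, patient=1.0))   # PREVENT
print(classify(affector=-0.5, patient=1.0))   # DESPITE
```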


Causal Chains in Force Dynamics

In addition to explaining the representation of individual causal interactions, the force theory also explains how individual relations can be joined to form causal chains and how these chains may then be re-represented as a single overarching causal relation. In the force theory, causal chains are created in one of two ways, depending on whether the chain involves transmission or removal of an actual or possible force. In cases of transmission, the resultant force of a configuration serves as the affector force in a subsequent configuration of forces. The idea can be explained in a simple causal chain in which marble A hits marble B, which hits marble C, as depicted in Figure 9.2. In force dynamic terms, the transmission of force depicted in Figure 9.2 requires treating the resultant of the forces associated with marbles A and B as the affector force that acts on marble C.

Figure 9.2 This image depicts a causal chain of marbles in which marble A causes marble B to hit and move marble C. When A hits B, it results in a CAUSE configuration of forces. The curved arrow shows how the resultant of these forces in the first CAUSE interaction serves as the affector vector in the following CAUSE interaction between marbles B and C. Note: the curved arrow also represents temporal order.

Whereas some causal chains involve the transmission of force, other causal chains involve the removal of a force (or the non-realization of a possible force). When a chain involves the removal of a force, the manner in which the resultant force becomes an affector force reverses from the way it occurs in the ordinary transmission of forces. The removal of a force occurs in situations known as double preventions. Consider, for example, a situation in which a force (object or person) knocks out a pole that is holding up a tent so that the tent falls. The pole is preventing the tent from falling, and knocking out the pole prevents this prevention. The ultimate result—the tent falling—is due to the removal of a force, that is, the pole that is keeping the tent up (or preventing it from falling) (Wolff & Barbey, 2015; Wolff, Barbey, & Hausknecht, 2010). A chain depicting force removal is shown in Figure 9.3.

In this chain, B represents the pole holding up tent C. When object A knocks out B, B can no longer prevent C from falling, and the tent falls. Notice that when A acts on B, B is already interacting with C. Thus, A acts not only on B, but also on the resultant of the forces associated with B and C. In effect, the resultant of the B and C forces serves as the patient vector force in the interaction between objects A and B.
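A minimal numeric sketch of force removal (our own illustration; the magnitudes are invented) shows why removing the pole’s force flips the configuration:

```python
# Hypothetical sketch of a double prevention (cf. Figure 9.3), with
# signed magnitudes along C's end-state direction (positive = the
# tent falling).
tent_tendency = 2.0    # C's tendency: toward falling
pole_force = -2.5      # B opposes the fall

print(tent_tendency + pole_force > 0)  # False: B prevents C from falling

# Object A knocks out B, removing B's force from the configuration:
pole_force = 0.0
print(tent_tendency + pole_force > 0)  # True: the tent now falls
```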

Double prevention chains are also realized when an affector refrains from applying a force (Pinker, 1989; Wolff, Barbey, & Hausknecht, 2010). In such chains, the affector has the ability to prevent the patient from realizing its tendency, but refrains from doing so. In effect, the affector acts on itself to prevent a potential prevention. In Figure 9.3, refraining can be depicted by renaming the B object A', implying a situation in which object A acts on itself. Imagine, for example, a tent starting to collapse. Someone inside the tent could entertain the possibility of stopping the collapse by holding up her arm and serving as a pole. Alternatively, she might decide to let the tent fall down. In choosing the second option, she removes a potential prevention from being realized by holding back the preventive force. In Wolff et al. (2010), such unrealized forces are referred to as virtual forces. As shown in Wolff et al. (2010), situations involving virtual forces are described in the same way as situations involving actual forces. People describe situations like the tent scenario with such expressions as the person allowed the tent to fall, or the person’s inaction let the tent fall.

Figure 9.3 This image schematically depicts a pole B preventing a tent C from falling. When object A hits B, B can no longer maintain its prevention of the falling, and as a consequence, the tent falls. As implied by the curved arrow, the preventive relationship between B and C must be in place before A can prevent B. In addition, when A acts on B, A acts not just on B, but on the sum of the B and C forces, which in combination form the tendency vector in the interaction of forces that exists between objects A and B.

A third way in which a double prevention can be realized is through the application of a force that prevents a prevention. Note that this approach to double prevention does not involve the removal of a force or the holding back from applying a force. Rather, in this kind of double prevention, an affector applies a force on the patient that allows the patient to avoid a preventive force. This third type of double prevention occurs in the case of bridges. Bridges can apply a (normal) force on a patient (e.g., a car), which allows the patient to avoid a prevention (e.g., a river).

In extending the force theory to causal chains, it becomes possible to address a number of important phenomena. Central to all of these phenomena is the process by which a causal chain can be re-represented as a single causal relation (see Johnson & Ahn, 2015; Johnson & Ahn, Chapter 8 in this volume). A simple example of this re-representation process occurs in cases of transitive causal reasoning. When told, for example, that A causes B and B causes C, people can infer that A causes C, or, to use a more contentful example, when told water causes rusting and rusting causes discoloration, people can infer that water causes discoloration. Whether the chain involves the transfer or removal of a force, the manner in which summary configurations are derived remains the same: specifically, the affector in the summary configuration is the affector from the first configuration; the end state is based on the end state of the last configuration; and the patient vector in the summary configuration is the sum of the patient vectors in the component configurations (see Figure 9.4). When this procedure is applied to causal chains involving two causes, the resulting summary configuration is always a CAUSE relation (Wolff & Barbey, 2015). Interestingly, the process of deriving new causal relations can also occur in chains in which the component relations differ, that is, in chains involving different kinds of causal relations, such as causing, allowing, and preventing. For example, when told that A causes B and B prevents C, people usually infer that A prevents C (Barbey & Wolff, 2006, 2007; Goldvarg & Johnson-Laird, 2001; Khemlani et al., 2014; Sloman et al., 2009), or more concretely, when told rain causes humidity and humidity prevents evaporation, people infer that rain prevents evaporation. When causal relations are formed from different kinds of causal relations, the process is not simple transitive reasoning: instead, the reasoning involves a process known as relation composition. Wolff and Barbey (2015) show that the force theory’s approach to relation composition is able to predict people’s responses to various kinds of relation compositions at least as well as other leading theories of relation composition (Goldvarg & Johnson-Laird, 2001; Sloman et al., 2009).1
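A minimal sketch of the composition rule (our own illustration, again reducing forces to signed magnitudes along the end-state direction; the theory itself operates on full vectors):

```python
# Hypothetical sketch of relation composition in the force theory,
# with each configuration reduced to (affector, patient) signed
# magnitudes along the direction of the final end state.

def compose(configs):
    # Summary affector = affector of the first configuration;
    # summary patient = vector sum of the component patient vectors;
    # the end state comes from the last configuration.
    affector = configs[0][0]
    patient = sum(p for _, p in configs)
    return affector, patient

def looks_like_cause(affector, patient):
    # CAUSE: no patient tendency toward the end state, affector in
    # opposition, resultant directed at the end state.
    return patient < 0 < affector and (affector + patient) > 0

# Two CAUSE configurations in a row (the marble chain of Figure 9.2):
print(looks_like_cause(*compose([(3.0, -1.0), (3.0, -1.0)])))  # True
```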



Figure 9.4 This image exemplifies how two configurations of forces are combined to form a single configuration of forces. A summary configuration is created using the affector vector from the initial configuration as the affector vector, the vector sum of all of the patient vectors as the patient vector, and the end state from the last configuration as the end-state vector. Once formed, a resultant vector in the summary configuration is formed by simply summing the patient and affector vectors. The summary configuration is interpreted like any other configuration. In the chain depicted, a sequence of CAUSE configurations gives rise to another CAUSE configuration.

Causation by Omission

The force theory’s ability to predict relation compositions allows it to address the problem of how causation can occur from omissions or absences. Causation by omission is causation in which the absence of an influence results in the occurrence of an effect, as in the lack of light causes depression or the absence of water caused the plant to die. According to a number of researchers, theories like force dynamics cannot account for the phenomenon of causation by omission because such theories require that there be a transmission of energy or force, and clearly no such transfer can occur from an absence (Schaffer, 2000; Schulz, Kushnir, & Gopnik, 2007; Sloman & Lagnado, 2015; Woodward, 2006). In the force theory, however, causation can occur not only when force is transmitted but also when force is removed (or held back). Consider the everyday event of pulling a plug so that water can flow down a drain. Such a situation instantiates a double prevention. First, the plug prevents water from draining, which can be viewed as a pre-existing condition (Cheng & Novick, 1991). Next, an agent removes the plug, thereby removing (i.e., preventing) the prevention of the water from flowing down the drain. Importantly, in cases of double prevention such as this, the situation can always be described in terms of absences. In the case of water flowing down a drain, it can be said that the absence of the plug caused/allowed the water to flow down the drain. Such expressions of causation by omission are possible because double preventions are realized through the removal of a force, which creates an absence. In a series of studies in Wolff et al. (2010), we found evidence for this proposal: when shown animations of double preventions, people endorsed statements asserting that the lack of the second entity in a double prevention (e.g., the plug) allowed or caused the third entity (e.g., water) to undergo a process leading to the result.

Allow Relations

Following McGrath (2005), we propose that ALLOW relations are based on double preventions (Wolff et al., 2010; Wolff & Barbey, 2015). In the simplest case, ALLOW relations involve removing a force that was originally preventing an event from happening. In the water and plug example, we can say that the removal of the plug allowed the water to drain. The notion of ALLOW is related to the notion of HELP. As discussed earlier, a HELP configuration is one in which the affector force is concordant with the patient force, as implied in sentences such as the Boy Scout helped the grandmother cross the road. ALLOW relations also imply concordance between the affector and the patient when analyzed with respect to the resulting summary configuration. The underlying prevent relations in ALLOW entail that the affector’s influence is necessary for the occurrence of the effect. The affector’s influence is necessary because in a double prevention, the occurrence of the final result is in some way blocked, and removal of that blockage depends on the affector. This account of ALLOW was supported in a set of studies described in Wolff et al. (2010). Participants viewed animations instantiating double preventions and, as predicted, they endorsed statements asserting that the first entity in the double prevention allowed the last entity in the double prevention to undergo a certain result. In Wolff et al. (2010), it was also shown that the semantics of the verb enable is much the same as those of allow, suggesting that there may be a set of verbs based on double prevention, including the verbs allow, enable, let, and permit.

As discussed earlier, double prevention can be instantiated in multiple ways. In addition to the removal of a force, a double prevention can be instantiated by an affector refraining from applying a preventive force (e.g., standing aside to let someone pass) or by the application of a force that prevents a preventive force from being experienced by the patient (e.g., a bridge exerting a force on a traveler so that he does not experience the preventive force of, for example, the river). Regardless of how the double prevention is realized, the relationship can still be viewed as one of allowing, enabling, or letting.

Relationship Between Talmy’s Force Dynamics and Wolff’s Force Theory

The force theory differs from Talmy’s account of force dynamics in several ways. First, the force theory uses the dimension of concordance between the affector and patient, whereas Talmy’s theory generally restricts force interactions to those in which the affector and patient forces oppose one another. Whereas the force theory postulates that the concept of HELP involves concordance between the affector and the patient, Talmy’s force dynamics holds that the concept of HELP involves extended disengagement of the antagonist. As such, Talmy’s theory seems to miss the sense of agreement present in the meaning of verbs like help. Second, in Talmy’s theory, the concept of LET involves the removal of a force via the removal of the antagonist. In the force theory, the affector (or antagonist) removes some other entity from the situation (sometimes the affector itself), thereby removing a force from the patient (or agonist). According to the force theory, LET necessarily involves at least two configurations of forces, and hence involves a causal chain. In Talmy’s theory, letting does not require a causal chain, but rather involves a change in the nature of the interaction between the antagonist and agonist over time (i.e., a change-in-state). According to the force theory, CAUSE configurations may be simpler and more direct than LET configurations, whereas in Talmy’s theory, CAUSE events are no less complex or indirect than LET events. Third, the force theory postulates a set of dimensions that explains how the notions of CAUSE, HELP, and PREVENT are related to one another. In Talmy’s force dynamics, these notions are represented by different sets of dimensions, obscuring their similarities and differences. Fourth, Talmy defines the dimensions of tendency and result with respect to rest and motion, which leads to multiple versions of several causal concepts (one for rest and the other for motion). In the force theory, tendency and result are defined with respect to a location in space, and as a consequence, redundant versions of various causal concepts do not arise. Fifth, the force theory is explicit enough to be computational, and as a consequence, it is able to explain why certain parameterizations of the dimensions are never realized: certain configurations of forces are never realized because they are ruled out by vector addition. The force theory also allows for the instantiation of the various configurations of force in a physics simulator, hence showing how different causal concepts are grounded in the physical world (see Wolff, 2007). Lastly, the force theory makes explicit how individual configurations of forces can be combined to form causal chains, which allows the theory to explain relation composition, causation by omission, and the concept of ALLOW.

Copley and Harley’s (2015) Force-Theoretic Model

In linguistics, it has been hypothesized that the concept of CAUSE might be essential in the argument-structure properties of various classes of verbs (Hale & Keyser, 1993; Marantz, 1997; Van Valin, 1990). Consider, for example, the expression Peter melted the butter. The expression can be paraphrased as Peter caused the butter to melt, implying that the expression encodes the notion of CAUSE. Such Vendlerian accomplishments are usually analyzed as composed of two subevents connected by a causal relation. The subevents include a causing subevent e1, Peter’s actions, and a result subevent e2, the melting of the butter (Dowty, 1979; Pustejovsky, 1991). In formal semantics, such expressions can be represented as ∃e1∃e2: e1 CAUSE e2. Such formal representations work well in the case of English because in English accomplishments entail the occurrence of the final result (or event). However, in many other languages, accomplishments do not entail the final result, giving rise to the phenomenon of non-culminating accomplishments. For example, in Karachay-Balkar, a Turkic language spoken in Russia, it is possible to say Kerim opened the door, but he did not succeed, and in the Salish language St'át'imcets it is possible to say I made the basket, but it didn’t get finished (Copley & Harley, 2015). A related phenomenon occurs in English with the verbs enable, allow, and let. To say that John allowed Susan to mow the lawn implies that the final result probably occurred, but it is not strictly entailed, as evidenced by the acceptability of the sentence John allowed Susan to mow the lawn, but the lawn wasn’t mowed. In Jackendoff’s (1991) system of force dynamics, verbs such as allow specify that the outcome is undetermined. In languages other than English, an undetermined outcome holds not only for verbs like allow, but also for verbs like cause. Non-culminating accomplishments appear to be quite common across the world’s languages (Copley & Harley, 2015; Copley & Wolff, 2014). Assuming the notion of causation is handled in similar ways across languages, the occurrence of non-culminating accomplishments suggests that the traditional approach to how the argument structure of accomplishments is represented might need to be changed.

As discussed earlier, in the force theory, a CAUSE configuration of forces can exist without the occurrence of a result. Copley and Harley (2015) use this property of forces to develop a new theory of argument-structure representation that is able to handle the phenomenon of non-culminating accomplishments. The key ideas in their proposal are the replacement of events with situations and the use of forces as functions that connect an initial situation to a final situation, assuming the absence of external interventions. In Copley and Harley’s (2015) force-theoretic model, forces have their origins in all of the individuals and properties in a situation. As such, Copley and Harley’s (2015) theory is well suited for explaining causal relationships involving ambient causes and effects, such as low interest rates cause inflation. In the force-theoretic model, causal chains are represented through the repeated application of net forces on different situations, as depicted in Figure 9.5. One major difference between the force-theoretic model and Talmy’s model is that in the force-theoretic model, the notion of tendency is not represented. While the use of a net force allows the model to handle more abstract kinds of causation, the absence of the notion of tendency means that the model is unable to distinguish several kinds of causal notions, such as the difference between CAUSE and HELP.

Figure 9.5 The force-theoretic model represents causal chains composed of a sequence of net forces (f0, f1, f2) that bring about a sequence of situations (s0, s1, s2).

While the force-theoretic model fails to make certain distinctions, its generality allows it to explain the argument-structure properties of a number of verb categories. In addition, the model might serve well as an account of how force dynamic representations emerge developmentally in children. In particular, Göksun, George, Hirsh-Pasek, and Golinkoff (2013) found that 3.5- to 4.5-year-olds were relatively good at representing configurations of force associated with CAUSE, but they struggled with ENABLE and PREVENT. Göksun et al.’s experimental design allowed them to determine that young preschoolers have difficulty representing multiple forces. It was not until 5.5 years of age that children were able to represent the full range of two-force configurations associated with CAUSE, ENABLE, and PREVENT. Göksun et al.’s findings are consistent with the possibility that force dynamics in young children may not represent the tendency of the patient and instead might involve just one force. In support of this phenomenon, Bowerman (1982) found that children often confuse the verbs associated with causing and letting.

Force Dynamics ABLE, and PREVENT. Göksun et al.’s findings are consistent with the possibility that force dynamics in young children may not represent the tendency of the patient and in­ stead might involve just one force. In support of this phenomenon, Bowerman (1982) found that children often confuse the verbs associated with causing and letting.

Gärdenfors (2014) Two-vector model Verbs specify only certain aspects of an event or state (Jackendoff, 1990; Pinker, 1989; Wolff, 2003, 2012; Wolff & Malt, 2010), and this selectivity often forms identifiable pat­ terns. One pattern that has been observed is that verb meanings appear to specify the no­ tion of either manner or result, but not both, a phenomenon referred to as manner/result complementarity (Levin & Rappaport Hovav, 2011; Rappaport Hovav & Levin, 2010). Con­ sider, for example, the event depicted in Figure 9.6, in which a suitcase is pulled over a line by a person. It could be said that the suitcase crossed the line or the suitcase rolled. Interestingly, there is no single verb in English that codes for both crossing and rolling. Gärdenfors and colleagues (Gärdenfors, 2014; see also Warglien, Gärdenfors, & Westera, 2012) have de­ veloped a force-dynamic account of why manner and path components do not appear to­ gether in the meaning of a verb. According to their two-vector model, verb meanings are based on vectors. However, a verb can encode multiple vectors only if they are from the same domain, a restriction they refer to as the “single-domain constraint” (Gärdenfors, 2014; Warglien et al., 2012). Gärdenfors defines a domain (p. 157) as a set of dimensions that are integral rather than separable. Dimensions are integral when having a value on one dimension entails a value on the other dimension (Nelson, 1993). For example, pitch and loudness are integral because if a sound has a pitch, it must also have loudness. Pitch and hue are not integral because a value on one does not entail a value on the other. Turning to force and result vectors, such vectors come from different domains because they are based on separable dimensions. Thus, according to the two-vector model, verbs encode either a result vector or a force vector, but not both.

Figure 9.6 In this scene, a suitcase is pulled over a line.

There are several similarities between the two-vector model and the force theory (Wolff, 2012). In particular, both models propose that verb meanings are based on vectors, and both propose force vectors and result vectors. There are also several ways in which these models differ. In the two-vector model, the result vector indicates whether a result occurs, whereas in the force theory, the result vector (or resultant vector) indicates whether the patient will move toward the end state, but not whether the patient ultimately reaches the end state. In the force theory, whether the patient reaches the end state is coded in the length of the end-state vector. A second way the two models differ concerns the kind of phenomena the two models seek to explain. The two-vector model offers an account of the difference between manner and result verbs and of manner/result complementarity. The force theory, in contrast, offers an account of the meaning of a particular class of verbs and prepositions associated with the expression of causation and related notions. Given these differences in emphasis, the models can be viewed as at least partially complementary. However, there is at least one way in which the assumptions of the two-vector model conflict with those of the force theory.


Mumford and Anjum’s (2011) Dispositional Theory of Causation

One of the key concepts in force dynamics is the notion of tendency. A tendency can be thought of as a property of an object that grants it a disposition. Tendencies or dispositions factor into the realization of an effect. The basic idea has its origins in Aristotle’s causal powers approach to causation (Wolff & Shepard, 2013). Aristotle emphasized that in causal interactions, both the agent and the patient had causal powers. The agent had the ability to transmit a causal power, and the patient the capacity to receive a change (Marmodoro, 2007). A recent version of this approach to causation is reflected in Mumford and Anjum’s (2009, 2011; Anjum & Mumford, 2010; see also Harré & Madden, 1975) theory of causal dispositionalism (for review, see Waldmann & Mayrhofer, 2016).

In Mumford and Anjum’s theory, objects have causal powers by virtue of their properties. For example, putting a penny on a scale causes the needle of the scale to change. It changes because the penny possesses the property of weight or mass. It also possesses several other properties, like shape and color, but these properties have no causal impact on the scale. Adding sriracha sauce to one’s rice noodles causes them to taste hot. The chili peppers and salt in sriracha give it the property of spiciness, and it is this property that brings about a certain taste, not the sauce’s red color or thick consistency. Objects roll due to their spherical shape and stay put due to their flat sides. Some objects have the power to hold liquids due to their concave shapes. According to Mumford and Anjum (2009, 2011), objects have dispositions that manifest themselves in properties like fragility, weight, warmth, smoothness, and momentum.

In Mumford and Anjum’s (2009, 2011) theory, just as in Aristotle’s, both agents, like the penny, and patients, like the scale, are endowed with causal powers. Mumford and Anjum suggest that causal powers can be modeled as vectors. Just like vectors, causal powers have a direction and a magnitude. In addition, vectors can be added together to give rise to a resultant vector. An example of Mumford and Anjum’s approach is shown in Figure 9.7. The starting point for the situation is a vertical line drawn in what is referred to as a quality space. When the resultant vector reaches a threshold, a result occurs.


Figure 9.7 Different causal powers (e.g., powers associated with lightning, rain, oxygen, and matches) combine to form a resultant (R) causal power with a great enough magnitude to surpass a threshold (T), triggering an effect (e.g., fire).
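The threshold logic in Figure 9.7 takes only a few lines to render. This is a toy sketch: the power values, the sign convention (rain opposing fire), and the threshold are fabricated for illustration, and Mumford and Anjum’s quality spaces are richer than a one-dimensional number line.

```python
# Causal powers as signed magnitudes along one dimension of a quality space.
powers = {"lightning": 2.0, "rain": -1.5, "oxygen": 1.0, "dry matches": 1.5}

T = 2.5                   # threshold the resultant must surpass for the effect (fire)
R = sum(powers.values())  # resultant causal power from vector (here scalar) addition

print(f"R = {R}; fire breaks out: {R >= T}")  # R = 3.0; fire breaks out: True
```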

Mumford and Anjum’s (2009, 2011) causal disposition theory differs in several ways from the theories based on force dynamics. First, the vectors in Mumford and Anjum’s theory are causal powers rather than forces. Second, the theory does not differentiate between agents and patients. The cause of a ball’s rolling could just as well be due to its spherical shape as to the cue stick hitting it: all such objects in the situation have causal powers that can be added together to give rise to a resultant causal power. Another difference is that effects are triggered only when the resultant reaches a threshold. The notion of a threshold is not present in theories based on force dynamics.

Despite these differences, there are also several commonalities between Mumford and Anjum’s theory and theories based on force dynamics. The most significant commonality is the idea that causal interactions depend on tendencies, or in Mumford and Anjum’s theory, dispositions. A second commonality is the use of vectors in the modeling of causal influences. In Mumford and Anjum’s theory, the vectors represent causal powers, which cover a wider range of causal influences than forces, but clearly, the way the notion of force is used in several theories of force dynamics overlaps with the notion of causal power as used in Mumford and Anjum’s theory. Finally, theories of force dynamics and the causal disposition theory both hold that effects follow from the summation of causal influences in a situation.

While there are several commonalities between force dynamics and Mumford and Anjum’s account, force dynamics makes the ontological commitment that the causal powers involved in causal relations are forces. One line of evidence for this claim comes from perceptual priming experiments. Wolff and his colleagues (Wolff, Ritter, & Holmes, 2014; Wolff & Shepard, 2013) showed participants animations involving either causal interactions (e.g., a collision of two marbles) or non-causal interactions. While viewing the animations, participants held a haptic controller that sometimes created a small force against the participant’s hand. The authors found that participants detected physical forces against their hand more rapidly when viewing causal than when viewing non-causal animations (i.e., viewing causes primes the detection of physical forces). This finding lends support to the claim that causal powers may be represented specifically in terms of forces.

While Mumford and Anjum’s disposition theory is well suited for capturing intuitions about the complexity of causal interactions, with all of the factors that enter into them, Mumford and Anjum also argue that their theory is able to handle probabilistic causation, causal chains, and various phenomena associated with the perception of causation (Mumford & Anjum, 2011). However, as explained in Mumford and Anjum (2011), the theory has difficulty with causation by omission. In addition, since it does not differentiate causal roles, like affector and patient, it is unable to distinguish different kinds of causal relations. It arguably makes up for these limitations in its detailed account of the notion of dispositions.

Pinker’s Theory of Force Dynamic Relations (1989)

In Talmy’s 1988 theory of force dynamics, the distinction between steady-state or extended causation (e.g., the wind kept the tumbleweed rolling) and “onset causation” (e.g., the wind caused the tumbleweed to start rolling) is emphasized. In Wolff’s force theory, it is not. Indeed, in Wolff’s force theory there is no explicit device for indicating whether a change in state or location has occurred, only whether the resultant vector has targeted an end state. It certainly would be possible to augment the force theory with the ability to specify changes in state or location. In particular, a telicity dimension could be added to the model by use of the end-state vector. If the end-state vector changes length, it could be said that a change has occurred, and if the end-state vector shrinks to a length of zero, it could be said that a particular end state was reached. Such a vector would be much the same as the result vector proposed in Gärdenfors’s (2014) two-vector model. This dimension was not added to the force theory because the assumption was that various concepts of CAUSE do not entail a change in location or state. Indeed, it is because the force theory allows for causation without the reaching of an end state that it is able to handle the phenomenon of non-culminating accomplishments, as discussed in reference to Copley and Harley’s force-theoretic model.

Steady-state causation illustrates a closely related phenomenon. Consider the situations described in the sentences in (1).

(1)
a. Air particles cause a balloon to remain inflated.
b. Fear caused them to remain motionless.
c. Flooding caused the museums to stay closed.
d. Small ridges cause water to stand on the concrete.
e. Keels cause sailboats to stay upright.

In each of the situations described in (1), nothing happens. There is no regular sequence of events, overt transfer of conserved quantities, or change in state, and yet the situations can still be construed as causal. What is true of each of these situations is that they instantiate a configuration of forces. From a force dynamic point of view, it is this configuration of forces that makes them causal, even in the absence of any change.

The ability to represent steady-state situations is very nicely explained in Pinker’s (1989) force dynamic account of causal relations. Pinker’s (1989) inventory of possible causal links is shown in Table 9.4. Pinker proposes that the different kinds of causal links are decomposed with respect to four features. The feature of focus concerns whether the link emphasizes the cause or the effect in the causal relation. Causal relationships that emphasize the effect are realized in expressions using subordinating conjunctions, including because, despite, after, and when (Wolff, Klettke, Ventura, & Song, 2005). In the sentences in (2), for example, the main clause expresses an effect and the subordinated clause expresses the cause. In other expressions of causation, as referred to by the mnemonics “effect,” “but,” “let,” and “prevent” in Table 9.4, the effect is subordinated to the cause.

(2)
a. Jerry missed his flight because the taxi got lost.
b. He died soon after, despite receiving the best possible care.

A second dimension proposed by Pinker is potency. Potency is a success when the antagonist succeeds in exerting its effect over the agonist; otherwise it is a failure. Examples in which potency is a failure include sentences using the preposition despite, as in Ralph bought the TV despite the wishes of his wife. In this scenario, the affector, his wife, is not successful in blocking the patient, Ralph, from buying a TV. Another type of scenario in which potency is a failure occurs in sentences containing the conjunction but, which imply the notion of trying without success, as in Terry pushed against the truck, but was unable to make it move. In this scenario, the affector, Terry, exerts force on the patient, the truck, but the desired outcome does not occur. The key difference between “despite” and “but” scenarios is that in “despite” scenarios, the focus is on the effect, whereas in “but” scenarios, the focus is on the cause, as is reflected in whether the cause or the effect appears as the subject of the sentence.

In Pinker’s (1989) proposal, the common notion underlying the verbs enable, permit, and allow is the idea of the affector ceasing to engage with the patient and, as a consequence, something happening to the patient. Pinker refers to this category of causal relations with the mnemonic “let.” In “let” scenarios, the cause occurrence dimension is “no,” while in the remaining scenarios, the cause occurrence dimension is “yes.” Prototypical causation is reflected in situations in which the focus is on the cause, potency is successful, the affector engages with the patient, and the final effect occurs. Pinker (1989) refers to this kind of relation with the mnemonic “effect.” In all of the interactions described so far, the type of causation has been onset causation. Pinker’s model expresses steady-state causation through situations in which the focus is on the cause and the potency is successful, but successful in the sense that a final effect does not occur. Such situations are reflected in the definition of verbs like support, keep, suspend, and occupy. They describe an affector continuously exerting a force on an agonist such that a particular effect, like falling, is prevented from occurring. It is for this reason that Pinker refers to these steady-state situations with the mnemonic “prevent.”


Table 9.4 Inventory of Different Types of Causal Links (Pinker, 1989)

Mnemonics for types of links   Focus    Potency   Cause occurrence   Effect occurrence   Type of causation
effect                         Cause    Success   Yes                Yes                 Onset
because                        Effect   Success   Yes                Yes                 Onset
despite                        Effect   Failure   Yes                No                  Onset
but                            Cause    Failure   Yes                No                  Onset
let                            Cause    Success   No                 Yes                 Onset
prevent                        Cause    Success   Yes                No                  Steady-state
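Read as data, the inventory makes one gap easy to see, a gap the discussion below returns to: no feature combination encodes an onset version of PREVENT. The dictionary and lookup helper here are hypothetical encodings for illustration, not Pinker’s own formalism.

```python
# Table 9.4 as data: mnemonic -> (focus, potency, cause occurs, effect occurs, type).
LINKS = {
    "effect":  ("Cause",  "Success", "Yes", "Yes", "Onset"),
    "because": ("Effect", "Success", "Yes", "Yes", "Onset"),
    "despite": ("Effect", "Failure", "Yes", "No",  "Onset"),
    "but":     ("Cause",  "Failure", "Yes", "No",  "Onset"),
    "let":     ("Cause",  "Success", "No",  "Yes", "Onset"),
    "prevent": ("Cause",  "Success", "Yes", "No",  "Steady-state"),
}

def mnemonics_for(features):
    """Return every mnemonic whose feature tuple matches."""
    return [name for name, f in LINKS.items() if f == features]

print(mnemonics_for(("Cause", "Success", "Yes", "No", "Steady-state")))  # ['prevent']
print(mnemonics_for(("Cause", "Success", "Yes", "No", "Onset")))         # [] -- no onset PREVENT
```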

Pinker’s (1989) account of force dynamic relations overlaps with the proposals of Talmy and Wolff. In particular, all three accounts code for the occurrence of the effect, although in Wolff’s force theory, the occurrence of the effect is only implied, rather than strictly entailed. Both Talmy’s and Pinker’s accounts distinguish steady-state from onset causation, whereas the force theory does not. Talmy’s theory allows for several different kinds of steady-state scenarios, whereas Pinker’s account specifies only one kind of steady state. In the force theory, letting scenarios are necessarily more complex than the other force dynamic relations because they involve at least two stages. This difference in complexity is not explicitly captured in Pinker’s feature set. Pinker’s (1989) potency dimension is unique to his proposal. Interestingly, it appears that this dimension can be captured in Wolff’s force theory by coding for whether the affector force is concordant with the resultant force. Pinker’s theory goes further than Talmy’s and the force theory in its specification of the notion of trying. However, as described earlier, it may be possible to capture the notion in the force theory with the end-state vector. One problem with Pinker’s theory is that it does not seem to have a representation for an onset version of PREVENT. If the representation for “prevent” is used to specify onset PREVENT, Pinker’s theory is no longer able to distinguish between steady-state and onset causation. Pinker’s focus dimension is also unique to his proposal. In Talmy’s account and the force theory, the focus is always on the patient (or agonist). The focus dimension allows Pinker’s theory to account for a difference we see in the syntax of different causal expressions, namely whether the cause is subordinated to the effect or vice versa.

Commonality Between the Different Theories of Force Dynamics

The theories of force dynamics differ markedly, but they also share some deep commonalities. Most of these theories distinguish a relatively wide range of causal concepts, with the concept of CAUSE being just one member of a family of concepts. A second deep commonality is that they all have built into them the roles of force creators and force recipients (see Beavers, 2011; Rappaport Hovav & Levin, 2001). Finally, all of the theories of force dynamics offer at least an intuitive account of the notion of mechanism. Each of these commonalities appears to be essential to explaining how people single out the one factor in a situation that is identified as the cause of an effect.

Causal Selection Problem

From a force dynamic perspective, determining the cause of an event depends on at least three properties of that event.

Causes Versus Enabling Conditions

The first is the nature of the relationship between a candidate causal factor and the effect. In order for a factor to be the primary cause of an event, it must be a cause rather than an enabling condition. Consider, for example, a situation in which a person drops a match in dry grass and the result is a forest fire. One of the factors in this situation would be the match, another the oxygen in the air. Both the match and oxygen are necessary conditions for the fire, because without either, the fire would not occur. However, the match is more readily construed as the cause of the forest fire, and oxygen is more easily construed as an enabling condition. As many have noted, the difference between the match and oxygen cannot be explained in terms of necessity or sufficiency (Cheng & Novick, 1991, 1992; Einhorn & Hogarth, 1986; Goldvarg & Johnson-Laird, 2001). As discussed earlier, force dynamic theories provide an account of the difference between causes and enabling conditions. According to the force theory in particular, enabling conditions are based on double preventions. As already discussed, there are several ways in which double preventions can be realized: they can occur from the removal of a force, the refraining from application of a force, and the application of a force that prevents a prevention from being realized. In the case of oxygen, the double prevention seems to be based on the application of a force that prevents a prevention from being realized. If oxygen were not present, the match would be prevented from igniting.

There have been other accounts of the difference between causes and enabling conditions. For example, in Cheng and Novick’s (1991, 1992) probabilistic contrast model, causal relations are based on covariation observed within a “focal set” of events (rather than the universal set of events). Causal relations are associated with positive covariation, as indicated by the probability of the effect in the presence of the candidate cause, P(E|C), being noticeably greater than the probability of the effect in the absence of the cause, P(E|¬C). In contrast, an enabling condition is inferred for causal factors that are constantly present in the reasoner’s focal set (making P(E|¬C) undefined), but covary positively with the effect in another focal set. Oxygen, for example, is constantly present in most situations, but if it were varied, the effect of fire would vary with its presence. From the point of view of force dynamics, Cheng and Novick’s account of enabling conditions makes sense. As discussed earlier, many, if not most, double preventions have pre-conditions: a preventive force needs to be present before it can be removed. Force dynamic theories therefore predict (albeit weakly) that enabling conditions will tend to have pre-existing conditions. Because pre-existing conditions can last, they may be perceived as constancies in a situation, and therefore may accord with the main claim of Cheng and Novick’s account of enabling conditions.
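Cheng and Novick’s contrast is simple enough to compute directly. The sketch below is an illustration rather than their published model: the focal sets, the encoding of events as (candidate present, effect present) pairs, and the convention that the contrast is undefined when the candidate never varies are all choices made for this example.

```python
def delta_p(events):
    """P(E|C) - P(E|not-C) over a focal set of (candidate, effect) pairs.
    Returns None when the candidate is constant, so no contrast is computable."""
    with_c    = [e for c, e in events if c]
    without_c = [e for c, e in events if not c]
    if not with_c or not without_c:
        return None
    return sum(with_c) / len(with_c) - sum(without_c) / len(without_c)

# Focal set 1: the match varies, oxygen is constantly present.
match_events  = [(True, True), (True, True), (False, False), (False, False)]
oxygen_events = [(True, True), (True, True), (True, False), (True, False)]
print(delta_p(match_events))   # 1.0 -> the match covaries with fire: a cause
print(delta_p(oxygen_events))  # None -> constant factor: candidate enabling condition

# Focal set 2: oxygen varies (say, in a vacuum chamber) and covaries with fire.
print(delta_p([(True, True), (False, False)]))  # 1.0 -> oxygen enables fire
```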

Force Creators

One property of causal situations that is uniquely predicted by all force dynamic theories of causation is that the primary cause of an event must be a force creator (Wolff, Jeon, Klettke, & Li, 2010; Wolff, Jeon, & Li, 2009). Consider, for example, the sentences in (3).

(3)
a. The chef smashed the hot potato.
b. The knife cut the hot potato.
c. The fork lifted the hot potato.


In English, both (3a) and (3b) are acceptable descriptions of causal events, but (3c) is not. From the point of view of many current theories of causation, this difference in acceptability is hard to explain. For example, as discussed in reference to Cheng and Novick’s (1991, 1992) probabilistic contrast model, a cause is a factor that increases the likelihood of an effect, that is, that makes P(E|C) greater than P(E|¬C). Arguably all of the affectors used in (3) increase the likelihood of the effects referred to in the sentences. The unacceptability of (3c) cannot be explained as due to the fact that the named affector is inanimate. Sentence (3b) shows that inanimate entities can sound perfectly fine as causers, at least in languages like English (Wolff, Jeon, Klettke, & Li, 2010; Wolff, Jeon, & Li, 2009). In addition, the unacceptability of (3c) cannot be explained as due to the affector being causally unrelated to the effect. As shown in the sentences in (4), both knives and forks can serve as instruments in sentences describing causal chains.

(4)
a. The cook cut the hot potato with a knife.
b. The cook lifted the hot potato with a fork.

According to Wolff et al. (2010), the sentences in (3a) and (3b) sound acceptable while the sentence in (3c) does not because the affectors in (3a) and (3b) can be construed as force creators, whereas the affector in (3c) cannot. In the sentence in (3b), the knife creates a force by pushing the parts of the potato apart, while in (3c) the force is really created not by the fork but by the person controlling the fork. The sentences exemplify the hypothesis that in order for a causal factor to be the primary cause of a situation, it must be a force creator.

There are several ways in which forces can be created (Wolff, Jeon, Klettke, & Li, 2010). First, forces can be created through energy conversion, that is, when energy is transformed from one form to another (Young & Freedman, 1999). For example, the forces involved in lifting one’s hand into the air or pushing it down to smash a potato begin with a transformation of potential energy, in the form of chemical potential energy, into motion, kinetic energy. Energy transformation can also occur in inanimate entities. In internal combustion engines, energy conversion occurs when chemical potential energy in gasoline is transformed into kinetic energy. Entities that generate forces from energy transformation—intentional agents, natural forces, power devices—seem to make good causers, as indicated by the finding that they are almost always construable as causers across a wide range of languages (Wolff et al., 2010). Mayrhofer and Waldmann (2014) found empirical support for the hypothesis that the entities that make good causers are those that can generate their own force. In their experiment, participants saw collision events involving two balls. However, prior to the collision, the balls moved in ways intended to imply different degrees of agentivity (e.g., self-propelled motion). Mayrhofer and Waldmann found that increasing the first object’s agentivity resulted in participants being more willing to describe it as causing the motion of the second object. White (2006) obtained a similar finding in discovering that participants were more willing to say that an object X caused an object Y to move when object X moved before object Y (but see Hubbard & Ruppel, 2013).

The importance of the affector being a force generator is implied in studies examining the role of intention and causality (see Wolff, 2003). The prototype of a force generator is most likely an intentional entity capable of initiating its own forces. In support of this hypothesis, Muentener and Lakusta (2011) found that children described more events as causal when they were (a) intended versus (b) unintended or (c) caused by an inanimate object. Muentener and Lakusta (2011) propose that in producing and comprehending language, children may have an intention-to-CAUSE bias.² In sum, evidence from multiple sources converges on the conclusion that entities that are able to create forces through energy conversion are readily viewed as potential causes of an event. However, energy conversion is not the only way in which forces can be generated.

A second way in which a force can be created is through physical contact. When an object hits another object, it imparts a force. Crucially, the imparted force does not exist until the moment of a collision. We know that the force does not exist prior to the impact because the properties of the force depend on the properties of the object that is hit. For example, a car that hits a balloon will impart less force on that entity than a car that hits a bowling ball. Forces are quantities that are created at the moment of interaction. This property of forces overlaps with findings concerning the perception of causation. Since the early work of Michotte (1963), it has been noted that physical contact can have an impact on people’s impressions of causation. The main overall finding is that the impression of causation is weakened in the absence of physical contact (Hubbard, 2013; Scholl & Tremoulet, 2000; White, 2014). The significance of physical contact on the impression of causation has been observed not only in adults (e.g., White, 2011; Yela, 1952), but also in very young infants (Cohen & Amsel, 1998; Leslie & Keeble, 1987; Newman et al., 2008). Related phenomena include the findings that putting a “tool” in a spatial gap makes the event seem more causal (Hubbard & Favretto, 2003; Young & Falmier, 2008) and the finding that the impression of causation is weakened as the spatial overlap between two objects grows (Rolfs, Dambacher, & Cavanagh, 2013; Scholl & Nakayama, 2002, 2004). This last finding implies that the impression of causation is exquisitely sensitive to the edges of the interacting objects, not just to their overall physical distance. From a force dynamic perspective, physical contact is not necessary for causation because not all forces depend on physical contact (e.g., social forces), but physical contact is a common way in which forces can be created, and hence is predicted to serve as a valuable cue to causation (Wolff, 2008; Wolff & Shepard, 2013).

Force Redirection

A third and final way in which forces can be created is through force redirection (Wolff, Jeon, Klettke, & Li, 2010). Force redirection occurs in the use of simple machines, such as levers, pulleys, inclined planes, wedges, screws, and wheels and axles (Cotterell & Kamminga, 1990). The notion of force redirection offers an account of why certain instruments such as knives can be viewed as causers, as exemplified in the sentence in (3b). A single force vector has but one direction and magnitude. Any change to its direction or magnitude constitutes a new force. To see how forces might be created from a simple machine, consider the case of a knife. A knife is a wedge. As such, it operates by converting a force applied in the direction of one edge into forces that are perpendicular to the applied force, as depicted in Figure 9.8. Thus, when someone cuts a cake or loaf of bread, the knife, in effect, creates two new forces perpendicular to the direction of the force from the agent.

Figure 9.8 A downward force acting on a wedge (e.g., a knife) is redirected to create two new forces that are perpendicular to the original force.
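The physics behind the wedge in Figure 9.8 can be sketched in idealized form. The function below assumes a frictionless, symmetric wedge, which is a textbook simplification rather than anything claimed in the chapter; the numbers are illustrative.

```python
import math

def wedge_forces(applied, half_angle_deg):
    """Redirect a force driven along the blade of a frictionless wedge.
    Balancing the applied force against the blade-direction components of the
    two face forces gives: applied = 2 * N * sin(half_angle)."""
    a = math.radians(half_angle_deg)
    n = applied / (2 * math.sin(a))  # magnitude of the new force on each face
    sideways = n * math.cos(a)       # component perpendicular to the applied force
    return n, sideways

n, s = wedge_forces(applied=10.0, half_angle_deg=10.0)
print(f"each face: {n:.1f} N, of which {s:.1f} N acts perpendicular to the cut")
# each face: 28.8 N, of which 28.4 N acts perpendicular to the cut
```

The sketch also shows why redirection is naturally described as force creation: a modest applied force yields two new, larger forces in directions the original force did not have.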

When determining which factors in a situation are causes, people may focus on the elements of the scene that are creating new forces. Force creation through redirection may explain why the sentences in (5) are acceptable.

(5)
a. The key opened the door.
b. The knife cut the bread.
c. The axe split the log.
d. The diamond scratched the glass.

Crucially, not all instruments make acceptable causers in English, as shown in (6). The reason may be that they are not construed as creating a force through redirection (Wolff, Jeon, Klettke, & Li, 2010).

(6)
a. The snow shovel moved the snow.
b. The fork lifted the potato.
c. The spatula flipped the pancake.
d. The broom cleaned the room.

In Wolff, Jeon, and Li (2009), we provided an initial test of this hypothesis by having participants rate sentences like those listed in (5) and (6) with respect to the affector’s ability to generate its own energy. As predicted, causal sentences with high energy creation affectors were rated as more acceptable than causal sentences with low energy creation affectors. The results provide further evidence for the view that people choose the cause of a situation by identifying the causal factors that can not only be construed as causes (as opposed to enabling conditions), but can also be construed as energy or force creators.

Mechanism

A number of studies have shown that in attempting to identify the cause of an event, people try to find the means or mechanism by which a candidate cause is able to have its effect (Ahn & Bailenson, 1996; Ahn & Kalish, 2000; Ahn, Kalish, Medin, & Gelman, 1995; Johnson & Ahn, Chapter 8 in this volume). The question of whether people attend to mechanism is important because it raises problems for many major theories of causation. According to dependency theories, a cause is a factor that makes a difference to the effect. In single events, dependency theories are expressed in the form of counterfactuals: A causes B if and only if both A and B occur, and if A had not occurred, B would not have occurred (Sloman & Lagnado, 2014). According to process theories, of which force dynamics is an example, A causes B only if there is a mechanism by which A can have an influence on B.

Walsh and Sloman (2011) provided a further test of the importance of mechanism. In their experiments, participants read descriptions of events in which in some cases there was an “interrupted mechanism”—for example, Frank kicks a ball, Sam moves out of its path, and the ball smashes a window. Dependency theories treat interrupted mechanisms as causes, since Sam’s moving out of the path of the ball makes a difference as to whether the window breaks. Process theories, on the other hand, do not treat interrupted mechanisms as causes because Sam did not transmit any force to the ball. Consistent with the predictions of process theories, participants were more likely to rate generative factors as causes (e.g., Frank kicking the ball) than they were to rate interrupted mechanisms as causes (e.g., Sam stepping out of the ball’s path). Further evidence for the importance of mechanism comes from the many studies showing that temporal cues, and in particular temporal contiguity, have a major impact on people’s judgments of causation (Greville & Buehner, 2010; Lagnado & Sloman, 2006; Lagnado, Waldmann, Hagmayer, & Sloman, 2007; McCormack, Frosch, Patrick, & Lagnado, 2015; Rottman, Kominsky, & Keil, 2014; Shanks, Pearson, & Dickinson, 1989; White & Milne, 1997).

For process theories, the importance of mechanism can be both motivated by and explained in terms of forces (Wolff, 2007). With respect to motivation, a force dynamic perspective encourages a local level of granularity in the analysis of causal relationships. The local nature of causal connections implies that when there is a causal connection between non-contiguous events, a reasoner assumes (in the case of physical causation) that there must be a causal chain of intermediate links to explain how forces might be transmitted or removed to bring about an effect. While a reasoner may make this assumption, in practice they usually will not know exactly how the progression of forces actually occurs. As argued by Keil and colleagues (Rozenblit & Keil, 2002), people often feel as if they understand how everyday objects operate, but when they are asked to specify these operations, it becomes clear that they have little knowledge of the underlying mechanisms. Keil and his colleagues refer to this phenomenon as the “illusion of explanatory depth.”

In terms of specification, the force theory in particular provides an account of how causal chains might be formed to create a link joining a candidate cause to a particular effect. From a force dynamic perspective, the process of representing a mechanism involves the establishment of spatial connections between objects that allow for the transmission and removal of forces. A force dynamic approach to mechanism differs from a transmission view of mechanism. For example, according to Kistler’s (2006) transmission theory of causation, “Two events c and e are related as cause and effect if and only if there is at least one conserved quantity P, subject to a conservation law and exemplified in c and e, a determinate amount of which is transferred between c and e.” Kistler’s (2006) proposal builds on Dowe’s (2000) conserved quantity theory. Force theories are highly related to transmission theories of causation, but they are not the same. Most notably, transmission theories are restricted to relationships in which there is a transfer of conserved quantities. But as noted by Dowe (2001), such a restriction means that transmission theories are unable to represent the notion of PREVENT and causation by omission (Woodward, 2007). Force theories, on the other hand, are able to address these phenomena because (1) they do not require that an effect occur in order for a force relationship to be present, and (2) they allow for interactions in which forces are not only transmitted, but also removed. As shown in Walsh and Sloman (2011), people seem to use this mechanistic information to select a particular factor in a situation as the cause of an effect. Mechanism, then, constitutes one last part of the answer to the problem of causal selection.
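The contrast between dependency and process verdicts on Walsh and Sloman’s interrupted-mechanism case can be made explicit with a toy model. Everything here is invented for illustration: the variable names, the structural rule for the window, and the crude operationalization of the process view as counterfactual relevance plus force transmission.

```python
def window_breaks(frank_kicks, sam_steps_aside):
    # The ball reaches the window only if it is kicked and not blocked.
    return frank_kicks and sam_steps_aside

def dependency_cause(factor, world):
    """Counterfactual test: flip one factor; does the outcome change?"""
    actual = window_breaks(**world)
    flipped = window_breaks(**{**world, factor: not world[factor]})
    return actual != flipped

world = {"frank_kicks": True, "sam_steps_aside": True}
transmits_force = {"frank_kicks": True, "sam_steps_aside": False}  # Sam imparts no force

for factor in world:
    dep = dependency_cause(factor, world)
    proc = dep and transmits_force[factor]
    print(f"{factor}: dependency says cause={dep}, process says cause={proc}")
# frank_kicks: dependency says cause=True, process says cause=True
# sam_steps_aside: dependency says cause=True, process says cause=False
```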

Challenges and Conclusions

Force theories address many of the phenomena associated with causal cognition. They do so by decomposing the concept of causation into factors that can be tied both to properties of the physical world and to people’s sensory experience (Wolff & Shepard, 2013). One challenge for force theories is how such an approach might be extended to more abstract domains. Early work on this problem suggests that physical and abstract causation are understood in much the same way (Wolff, 2014), but some of the crucial tests of this hypothesis have yet to be conducted. It may be that for abstract causal relations, the mind shifts to a different kind of representational format with its own kind of combinatorial logic. On the other hand, it may be that the underlying representational substrate for abstract causation is largely the same as for concrete causation, implying that the representation of abstract causation largely preserves the properties of concrete causation (Wolff & Barbey, 2015).

As mentioned earlier, people sometimes feel that they understand how objects operate, despite having little knowledge of the underlying mechanism—the “illusion of explanatory depth” (Rozenblit & Keil, 2002). Future research on force dynamics may examine a similar illusion. In so-called causal illusions (Thorstad & Wolff, 2016), people can have an initial and illusory impression of force despite the absence of physical contact, as when a magician seems to cause an object to levitate from a distance. One possible explanation for such illusions is the conflict between an initial impression of force based on perceptual cues, and a second impression based on more deliberate analysis of the underlying mechanism. Future research may examine whether dissociable cognitive processes underlie these two impressions of force. Examination of these issues will not only further our understanding of causation, but also give us further insight into the nature of mental representation in general.

References

Ahn, W., & Bailenson, J. (1996). Causal attribution as a search for underlying mechanisms: An explanation of the conjunction fallacy and the discounting principle. Cognitive Psychology, 31, 82–123.

Ahn, W., & Kalish, C. W. (2000). The role of mechanism beliefs in causal reasoning. In F. C. Keil & R. A. Wilson (Eds.), Explanation and cognition (pp. 199–225). Cambridge, MA: MIT Press.

Ahn, W., Kalish, C. W., Medin, D. L., & Gelman, S. A. (1995). The role of covariation versus mechanism information in causal attribution. Cognition, 54, 299–352.

Anjum, R. L., & Mumford, S. (2010). A powerful theory of causation. In A. Marmodoro (Ed.), The metaphysics of powers (pp. 143–159). London: Routledge.

Barbey, A. K., & Wolff, P. (2006). Causal reasoning from forces. In Proceedings of the 28th annual conference of the Cognitive Science Society (p. 2439). Mahwah, NJ: Lawrence Erlbaum Associates.

Barbey, A. K., & Wolff, P. (2007). Learning causal structure from reasoning. In Proceedings of the 29th annual conference of the Cognitive Science Society (pp. 713–718). Mahwah, NJ: Lawrence Erlbaum Associates.

Beavers, J. (2011). On affectedness. Natural Language & Linguistic Theory, 29, 1–36.

Bowerman, M. (1982). Evaluating competing linguistic models with language acquisition data: Implications of developmental errors with causative verbs. Quaderni di semantica, 3, 5–66.

Cheng, P. W., & Novick, L. R. (1991). Causes versus enabling conditions. Cognition, 40, 83–120.


Cheng, P. W., & Novick, L. R. (1992). Covariation in natural causal induction. Psychological Review, 99, 365–382.

Cohen, L. B., & Amsel, G. (1998). Precursors to infants’ perception of the causality of a simple event. Infant Behavior and Development, 21, 713–731.

Copley, B., & Harley, H. (2015). A force-theoretic framework for event structure. Linguistics and Philosophy, 38, 103–158.

Copley, B., & Wolff, P. (2014). Theories of causation should inform linguistic theory and vice versa. In B. Copley & F. Martin (Eds.), Causation in grammatical structures (pp. 11–57). Oxford: Oxford University Press.

Cotterell, B., & Kamminga, J. (1990). Mechanics of pre-industrial technology. Cambridge, UK: Cambridge University Press.

Dowe, P. (2000). Physical causation. Cambridge, UK: Cambridge University Press.

Dowe, P. (2001). A counterfactual theory of prevention and “causation” by omission. Australasian Journal of Philosophy, 79, 216–226.

Dowty, D. R. (1979). Word meaning and Montague grammar. Dordrecht: Reidel.

Einhorn, H. J., & Hogarth, R. M. (1986). Judging probable cause. Psychological Bulletin, 99, 3–19.

Gärdenfors, P. (2014). Geometry of meaning: Semantics based on conceptual spaces. Cambridge, MA: MIT Press.

Göksun, T., George, N. R., Hirsh-Pasek, K., & Golinkoff, R. M. (2013). Forces and motion: How young children understand causal events. Child Development, 84, 1285–1295.

Goldvarg, E., & Johnson-Laird, P. (2001). Naive causality: A mental model theory of causal meaning and reasoning. Cognitive Science, 25, 565–610.

Greville, W. J., & Buehner, M. J. (2010). Temporal predictability facilitates causal learning. Journal of Experimental Psychology: General, 139, 756.

Hale, K., & Keyser, S. (1993). On argument structure and the lexical expression of syntactic relations. In K. Hale & S. Keyser (Eds.), The view from Building 20: Essays in linguistics in honor of Sylvain Bromberger (pp. 53–109). Cambridge, MA: MIT Press.

Hall, N. (2000). Causation and the price of transitivity. Journal of Philosophy, 97, 198–222.

Hall, N. (2004). Two concepts of causation. In J. Collins, N. Hall, & L. Paul (Eds.), Causation and counterfactuals (pp. 225–276). Cambridge, MA: MIT Press.

Hall, N. (2006). Philosophy of causation: Blind alleys exposed; promising directions highlighted. Philosophy Compass, 1, 1–9.

Harré, R., & Madden, E. H. (1975). Causal powers: A theory of natural necessity. Oxford: Blackwell.

Hart, H. L., & Honoré, A. (1959/1985). Causation in the law. Oxford: Clarendon Press.

Hesslow, G. (1988). The problem of causal selection. In D. J. Hilton (Ed.), Contemporary science and natural explanation: Commonsense conceptions of causality (pp. 11–32). Brighton, Sussex: Harvester Press.

Hubbard, T. L. (2013). Phenomenal causality I: Varieties and variables. Axiomathes, 23, 1–42.

Hubbard, T. L., & Favretto, A. (2003). Naïve impetus and Michotte’s “tool effect”: Evidence from representational momentum. Psychological Research, 67, 134–152.

Hubbard, T. L., & Ruppel, S. E. (2013). Ratings of causality and force in launching and shattering. Visual Cognition, 21, 987–1009.

Jackendoff, R. (1990). Semantic structures. Cambridge, MA: MIT Press.

Johnson, S. G. B., & Ahn, W. (2015). Causal networks or causal islands? The representation of mechanisms and the transitivity of causal judgment. Cognitive Science, 39, 1468–1503.

Khemlani, S. S., Barbey, A. K., & Johnson-Laird, P. N. (2014). Causal reasoning with mental models. Frontiers in Human Neuroscience, 8, 849.

Kistler, M. (2006). Causation and laws of nature. New York: Routledge.

Lagnado, D. A., & Sloman, S. A. (2006). Time as a guide to cause. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 451.

Lagnado, D. A., Waldmann, M. R., Hagmayer, Y., & Sloman, S. A. (2007). Beyond covariation. In A. Gopnik & L. Schulz (Eds.), Causal learning: Psychology, philosophy, and computation (pp. 154–172). Oxford: Oxford University Press.

Leslie, A. M., & Keeble, S. (1987). Do six-month-old infants perceive causality? Cognition, 25, 265–288.

Lombrozo, T. (2010). Causal-explanatory pluralism: How intentions, functions, and mechanisms influence causal ascriptions. Cognitive Psychology, 61, 303–332.

Marantz, A. (1997). No escape from syntax: Don’t try morphological analysis in the privacy of your own lexicon. In A. Dimitriadis et al. (Eds.), Proceedings of the 21st annual Penn linguistics colloquium. Penn Working Papers in Linguistics, 4(2), 201–225.

Marmodoro, A. (2007). The union of cause and effect in Aristotle: Physics III 3. Oxford Studies in Ancient Philosophy, 32, 205–232.


Mayrhofer, R., & Waldmann, M. R. (2014). Indicators of causal agency in physical interactions: The role of prior context. Cognition, 132, 485–490.

McCormack, T., Frosch, C., Patrick, F., & Lagnado, D. (2015). Temporal and statistical information in causal structure learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 41, 395.

McGrath, S. (2005). Causation by omission: A dilemma. Philosophical Studies, 123, 125–148.

Michotte, A. E. (1946/1963). The perception of causality. New York: Basic Books.

Mill, J. S. (1872/1973). System of logic (8th ed.). In J. M. Robson (Ed.), Collected works of John Stuart Mill (Vols. VII and VIII). Toronto: University of Toronto Press.

Mumford, S., & Anjum, R. L. (2009). Double prevention and powers. Journal of Critical Realism, 8, 277–293.

Mumford, S., & Anjum, R. L. (2011). Getting causes from powers. New York: Oxford University Press.

Nelson, D. G. (1993). Processing integral dimensions: The whole view. Journal of Experimental Psychology: Human Perception and Performance, 19, 1114–1120.

Newman, G. E., Choi, H., Wynn, K., & Scholl, B. J. (2008). The origins of causal perception: Evidence from postdictive processing in infancy. Cognitive Psychology, 57, 262–291.

Pinker, S. (1989). Learnability and cognition: The acquisition of argument structure. Cambridge, MA: MIT Press.

Pustejovsky, J. (1991). The syntax of event structure. Cognition, 41, 47–81.

Rappaport Hovav, M., & Levin, B. (2001). An event structure account of English resultatives. Language, 77, 766–797.

Rolfs, M., Dambacher, M., & Cavanagh, P. (2013). Visual adaptation of the perception of causality. Current Biology, 23, 250–254.

Rottman, B. M., Kominsky, J. F., & Keil, F. C. (2014). Children use temporal cues to learn causal directionality. Cognitive Science, 38, 489–513.

Rozenblit, L., & Keil, F. (2002). The misunderstood limits of folk science: An illusion of explanatory depth. Cognitive Science, 26, 521–562.

Schaffer, J. (2000). Causation by disconnection. Philosophy of Science, 67, 285–300.

Scholl, B. J., & Nakayama, K. (2004). Illusory causal crescents: Misperceived spatial relations due to perceived causality. Perception, 33, 455–469.


Scholl, B. J., & Tremoulet, P. D. (2000). Perceptual causality and animacy. Trends in Cognitive Sciences, 4, 299–309.

Schulz, L., Kushnir, T., & Gopnik, A. (2007). Learning from doing: Intervention and causal inference. In A. Gopnik & L. Schulz (Eds.), Causal learning: Psychology, philosophy and computation (pp. 67–85). Oxford: Oxford University Press.

Shanks, D. R., Pearson, S. M., & Dickinson, A. (1989). Temporal contiguity and the judgement of causality by human subjects. The Quarterly Journal of Experimental Psychology, 41, 139–159.

Sloman, S. A., Barbey, A., & Hotalling, J. (2009). A causal model theory of the meaning of “cause,” “enable,” and “prevent.” Cognitive Science, 33, 21–50.

Sloman, S., & Lagnado, D. (2014). Causality in thought. Annual Review of Psychology, 66, 3.1–3.25.

Talmy, L. (1988). Force dynamics in language and cognition. Cognitive Science, 12, 49–100.

Thorstad, R., & Wolff, P. (2016). What causal illusions might tell us about the identification of causes. In Proceedings of the 38th annual conference of the Cognitive Science Society (pp. 1991–1996). Austin, TX: Cognitive Science Society.

Van Valin, R. D., Jr. (1990). Semantic parameters of split intransitivity. Language, 66, 221–260.

Waldmann, M. R., & Mayrhofer, R. (2016). Hybrid causal representations. Psychology of Learning and Motivation, 65, 1–43.

Walsh, C. R., & Sloman, S. A. (2011). The meaning of cause and prevent: The role of causal mechanism. Mind and Language, 26, 21–52.

Warglien, M., Gärdenfors, P., & Westera, M. (2012). Event structure, conceptual spaces and the semantics of verbs. Theoretical Linguistics, 38, 159–193.

White, P. A. (2006). The role of activity in visual impressions of causality. Acta Psychologica, 123, 166–185.

White, P. A. (2011). Visual impressions of force exerted by one object on another when the objects do not come into contact. Visual Cognition, 19, 340–366.

White, P. A. (2014). Singular clues to causality and their use in human causal judgment. Cognitive Science, 38, 38–75.

White, P. A., & Milne, A. (1997). Phenomenal causality: Impressions of pulling in the visual perception of objects in motion. The American Journal of Psychology, 110, 573.


Wolff, P. (2003). Direct causation in the linguistic coding and individuation of causal events. Cognition, 88, 1–48.

Wolff, P. (2007). Representing causation. Journal of Experimental Psychology: General, 136, 82–111.

Wolff, P. (2008). Dynamics and the perception of causal events. In T. Shipley & J. Zacks (Eds.), Understanding events: How humans see, represent, and act on events (pp. 555–587). Oxford: Oxford University Press.

Wolff, P. (2012). Representing verbs with force vectors. Theoretical Linguistics, 38, 237–248.

Wolff, P. (2014). Causal pluralism and force dynamics. In B. Copley, F. Martin, & N. Duffield (Eds.), Forces in grammatical structures: Causation between linguistics and philosophy (pp. 100–118). Oxford: Oxford University Press.

Wolff, P., & Barbey, A. K. (2015). Causal reasoning with forces. Frontiers in Human Neuroscience, 9.

Wolff, P., Barbey, A. K., & Hausknecht, M. (2010). For want of a nail: How absences cause events. Journal of Experimental Psychology: General, 139, 191–221.

Wolff, P., Jeon, G., Klettke, B., & Li, Y. (2010). Force creation and possible causers across languages. In B. Malt & P. Wolff (Eds.), Words and the world: How words capture human experience (pp. 93–110). Oxford: Oxford University Press.

Wolff, P., Jeon, G., & Li, Y. (2009). Causers in English, Korean, and Chinese and the individuation of events. Language and Cognition, 2, 165–194.

Wolff, P., Klettke, B., Ventura, T., & Song, G. (2005). Expressing causation in English and other languages. In W. Ahn, R. L. Goldstone, B. C. Love, A. B. Markman, & P. Wolff (Eds.), Categorization inside and outside the laboratory: Essays in honor of Douglas L. Medin (pp. 29–48). Washington, DC: American Psychological Association.

Wolff, P., & Malt, B. (2010). The language-thought interface: An introduction. In B. Malt & P. Wolff (Eds.), Words and the world: How words capture human experience (pp. 93–110). Oxford: Oxford University Press.

Wolff, P., Ritter, S., & Holmes, K. (2014). Causation, forces, and the sense of touch. In Proceedings of the 34th annual conference of the Cognitive Science Society (pp. 1784–1789). Austin, TX: Cognitive Science Society.

Wolff, P., & Shepard, J. (2013). Causation, touch, and the perception of force. Psychology of Learning and Motivation, 58, 167–202.

Wolff, P., & Song, G. (2003). Models of causation and the semantics of causal verbs. Cognitive Psychology, 47, 276–332.

Wolff, P., & Zettergren, M. (2002). A vector model of causal meaning. In W. D. Gray & C. D. Schunn (Eds.), Proceedings of the 24th annual conference of the Cognitive Science Society (pp. 944–949). Mahwah, NJ: Lawrence Erlbaum Associates.

Woodward, J. (2006). Sensitive and insensitive causation. Philosophical Review, 115, 1–50.

Woodward, J. (2007). Interventionist theories of causation in psychological perspective. In A. Gopnik & L. Schulz (Eds.), Causal learning: Psychology, philosophy, and computation (pp. 19–36). Oxford: Oxford University Press.

Yela, M. (1952). Phenomenal causation at a distance. Quarterly Journal of Experimental Psychology, 4, 139–154.

Young, H. D., & Freedman, R. A. (1999). University physics (10th ed.). Reading, MA: Addison-Wesley.

Young, M. E., & Falmier, O. (2008). Launching at a distance: The effect of spatial markers. The Quarterly Journal of Experimental Psychology, 61, 1356–1370.

Zwarts, J. (2010). Forceful prepositions. In V. Evans & P. Chilton (Eds.), Language, cognition and space: The state of the art and new directions (pp. 193–214). London: Equinox Publishing.

Notes:

(1.) As documented in Johnson and Ahn (2015), not all causal chains are transitive. For example, buying an air conditioner can cause an increase in the electric bill, and an increase in the electric bill can cause anger, but few would want to conclude that buying an air conditioner causes anger. Johnson and Ahn (2015) provide evidence that the transitivity of a chain depends on the ability to fit the relations into a single schema, which may exist by virtue of a common underlying mechanism, temporal contiguity, and homogeneity of the time scale of the different relations.

(2.) Muentener and Lakusta’s (2011) Experiment 3 suggests that children are able to represent causation when it is unintentional. However, children’s tendency to use causal language more often for intended than unintended events might mean that they are more likely to infer causation when it is intended than unintended.

Phillip Wolff

Department of Psychology, Emory University, Atlanta, Georgia, USA


Mental Models and Causation   P. N. Johnson-Laird and Sangeet S. Khemlani The Oxford Handbook of Causal Reasoning Edited by Michael R. Waldmann Print Publication Date: Jun 2017 Subject: Psychology, Cognitive Psychology Online Publication Date: May 2017 DOI: 10.1093/oxfordhb/9780199399550.013.4

Abstract and Keywords

The theory of mental models accounts for the meanings of causal relations in daily life. They refer to seven temporally ordered deterministic relations between possibilities, which include causes, prevents, and enables. Various factors—forces, mechanisms, interventions—can enter into the interpretation of causal assertions, but they are not part of their core meanings. Mental models represent only salient possibilities, and so they are identical for causes and enables, which may explain failures to distinguish between their meanings. Yet, reasoners deduce different conclusions from them, and distinguish between them in scenarios, such as those in which one event enables a cause to have its effect. Neither causation itself nor the distinction between causes and enables can be captured in the pure probability calculus. Statistical regularities, however, often underlie the induction of causal relations. The chapter shows how models help to resolve inconsistent causal scenarios and to reverse engineer electrical circuits.

Keywords: abduction, causes, deduction, determinism, explanation, mental models, nonmonotonic reasoning

Hume (1748/1988) remarked that most reasoning about matters of fact depends on causal relations. We accordingly invite readers to make two inferences:

1. Eating protein will cause Evelyn to gain weight.
Evelyn will eat protein.
Will Evelyn gain weight?

2. Marrying Viv on Monday will enable Pat to be happy.
Pat will marry Viv on Monday.
Will Pat be happy?

In a study of several hundred highly intelligent applicants to a selective Italian university, almost all the 132 participants making the first sort of inference responded “yes” (98%), whereas over two-thirds of a separate group of 129 participants making the second sort of inference responded “perhaps yes, perhaps no” (68%), and the remainder in this group responded “yes” (Goldvarg & Johnson-Laird, 2001). These inferences stand in need of an
explanation, and one aim of the present chapter is to show that the theory of mental models explains them and causal reasoning in general.

Causation has created controversy for centuries. Some have argued that the notion is irrelevant (Russell, 1912–1913), ill defined (Lindley & Novick, 1981), and inconsistent (Salsburg, 2001, pp. 185–186). Scholars also disagree about its foundations: over whether, for instance, causal relations are objective or subjective, and whether they hold between actions, events, or states of affairs. Certainly, actions can be causes, such as throwing a switch to cause a light to come on. But, as the proverb says, “for want of a nail the kingdom was lost,” and so causes can be negative states of affairs as well. We say no more about these philosophical matters, but we do need to consider the consistency of causal concepts.

Causation is built into so much of language that the concept is hardly inconsistent unless languages themselves are inconsistent (see Solstad & Bott, Chapter 31 in this volume). A causal assertion, such as eating protein causes Evelyn to lose weight, can be paraphrased using verbs such as makes, gets, and forces. The sentence can also be paraphrased in a conditional assertion: If Evelyn eats protein, then Evelyn will lose weight. Many verbs embody causal relations in their meanings. For example, an assertion of the form x lifts y can be paraphrased as x does something that causes y to move upward; an assertion of the form x offers y to z can be paraphrased as x does something that enables y to possess z; and an assertion of the form x hides y from z can be paraphrased as x does something that prevents z from seeing y. These paraphrases reveal that certain concepts, expressed here in move, possess, and see, underlie “semantic fields” of verbs, whereas other concepts, expressed in causes, enables, and prevents, occur in many different semantic fields (Miller & Johnson-Laird, 1976). Indeed, it is easier to frame informative definitions for verbs with a causal meaning than for verbs lacking such a meaning (Johnson-Laird & Quinn, 1976).

Inconsistency arises among beliefs about causation. Some people believe that every event has a cause; some believe that an action or intervention can initiate a causal chain, as when a trial begins in an experiment. And some believe both these propositions (e.g., Mill, 1874). But they are inconsistent with one another. If every event has a cause, then an action cannot initiate a causal chain, because the action has an earlier cause, and so on, back to the ultimate cause or causes of all chains of events. Beliefs, however, are not part of the meanings of terms. Both every event has a cause and every intervention initiates a causal chain make sense, and the meaning of cause should not rule out either assertion as false. The problem, of course, is to separate beliefs from meanings, and the only guide is usage.

We now introduce the theory of mental models. The origins of the theory go back to Peirce’s (1931–1958, Vol. 4) idea that diagrams can present moving pictures of thoughts. The psychologist and physiologist Kenneth Craik (1943) first introduced mental models into psychology. His inspiration was machines such as Kelvin’s tidal predictor, and he wrote that if humans build small-scale models of external reality in their heads, they can make sensible predictions from them. Craik’s ideas were programmatic and untested, and he supposed that reasoning depends on verbal rules. In contrast, the modern theory began with the idea that reasoning itself is a process of simulation based on mental models (Johnson-Laird, 1983).

The basic principles of the theory of mental models—the “model theory,” for short—apply to any domain of reasoning, given an account of the meaning of the essential concepts in the domain (e.g., Khemlani & Johnson-Laird, 2013). The present chapter therefore begins with a theory of the meaning of causal relations, including those exemplified in the preceding inferences, taking pains to distinguish meanings from beliefs. It also distinguishes meanings from their interpretation, which yields mental models of the situations to which meanings refer. Models can be static, or they can unfold in time kinematically in a mental simulation of a sequence of events in a causal chain. Both sorts of models yield inferences, and the chapter considers the three main sorts of reasoning: deduction, induction, and abduction. On occasion, it contrasts the model theory with alternative accounts of causation. The aim is not polemical, but to use the contrast to clarify the theory. The chapter concludes with a summary of outstanding problems.

The Meanings of Causal Relations

Basic Causal Relations

This section presents the model theory of the meanings of everyday causal assertions. It also determines how many different sorts of causal relations exist. Theorists tend not to address this question, and often assume that there is just one—the relation of cause and effect, which may bring about an event or prevent it (Mill, 1874). They argue that causes and enables do not differ in meaning: causes are abnormal, whereas enablers are normal (Hart & Honoré, 1985), causes are inconstant whereas enablers are constant in the situation (Cheng & Novick, 1991), causes violate a norm whereas enablers do not (Einhorn & Hogarth, 1986), or causes are conversationally relevant whereas enablers are not (Hilton & Erb, 1996; Mackie, 1980). In contrast, the model theory distinguishes the meanings of the two (see also Wolff and Thorstad, Chapter 9 in this volume), and fixes the number of causal relations. A clue is in the mappings in Table 10.1 from quantifiers ranging over possibilities to causation and thence to obligation. A necessary proposition holds in all relevant possibilities, a possible proposition holds in at least one of them, an impossible proposition holds in none of them, and a proposition that is possibly not the case fails to hold in at least one of them. Likewise, if a cause occurs, then its effect is necessary; if an enabling condition occurs, then its effect is possible; if a preventive condition occurs, then its effect is impossible; and if a condition allows an effect not to occur, then it is possible for it not to occur.


Table 10.1 The Set of Mappings over Six Domains from Quantifiers Through Causal Concepts to Modal Notions and on to Deontic Concepts

Quantified Assertions                      | Causal Verbs           | Modal Concepts | Modal Auxiliary Verbs | Deontic Concepts | Deontic Verbs
In all possibilities, it occurs.           | causes                 | necessary      | will occur            | compulsory       | obligates
In some possibilities, it occurs.          | allows/enables         | possible       | may occur             | permissible      | allows/permits
In no possibilities, it occurs.            | prevents               | impossible     | cannot occur          | impermissible    | prohibits
In some possibilities, it does not occur.  | allows not/enables not | possible not   | may not occur         | permissible not  | allows not/permits not
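The quantifier column fully determines the rest of each row, so the mappings are simple to state programmatically. The following sketch is our illustration, not part of the chapter; the dictionary and function names are ours. It encodes a slice of Table 10.1 and checks the interdefinability noted in the next paragraph, that A causes B is equivalent to A does not allow B not to occur:

```python
# A list of truth values stands for the effect B across the possibilities
# in which A occurs. Each causal verb is fixed by a quantifier over those
# possibilities (Table 10.1). This encoding is an illustrative sketch.

TABLE_10_1 = {
    # quantified assertion      (causal verb, modal concept, deontic verb)
    "in all, it occurs":      ("causes", "necessary", "obligates"),
    "in some, it occurs":     ("allows/enables", "possible", "allows/permits"),
    "in no, it occurs":       ("prevents", "impossible", "prohibits"),
    "in some, it does not":   ("allows not/enables not", "possible not",
                               "allows not/permits not"),
}

def causes(effects):
    """A causes B: B holds in all possibilities in which A occurs."""
    return all(effects)

def allows_not(effects):
    """A allows B not to occur: B fails in at least one such possibility."""
    return any(not b for b in effects)

# 'A causes B' is equivalent to 'A does not allow B not to occur':
for effects in ([True, True], [True, False], [False, False]):
    assert causes(effects) == (not allows_not(effects))
```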

Table 10.1 treats allows and enables as synonyms, but they have subtle differences in usage. Of course, a single quantifier, such as some, together with negation, allows all four cases to be defined (e.g., all is equivalent to it is not the case that some are not); and the same applies mutatis mutandis for each column in Table 10.1 (e.g., A causes B to occur is equivalent to A does not allow B not to occur). The mappings to deontic concepts are analogous, where deontics embraces what is permissible within morality or within a framework of conventions, such as those governing games or good manners (Bucciarelli & Johnson-Laird, 2005; Bucciarelli, Khemlani, & Johnson-Laird, 2008). Modal auxiliary verbs, such as must and may in English and analogs in other Indo-European languages, are likewise ambiguous between factual and deontic interpretations. But, the two domains of possibilities and permissibilities cannot be amalgamated into one, because of a radical distinction between them. The failure of a factual necessity to occur renders its description false, whereas the violation of a deontic necessity does not render its description false—people do, alas, violate what is permissible.

Many theories propose a probabilistic meaning for causal relations and for conditionals (see Over, Chapter 18; and Oaksford & Chater, Chapter 19, both in this volume). One difficulty with this proposal is that the difference in meaning between A causes B and A enables B, which we outline presently, cannot be drawn within the probability calculus: both can yield high conditional probabilities of B given A (Johnson-Laird, 1999). Indeed, as the founder of Bayesian networks wrote, "Any causal premise that is cast in standard probabilistic expressions … can safely be discarded as inadequate" (Pearl, 2009, p. 334). He points out that cause has a deterministic meaning and that the probability calculus cannot distinguish between correlation and causation. It cannot express the sentence, "Mud does not cause rain" (Pearl, 2009, p. 412). One general point on which all parties should agree is that even granted a deterministic meaning, probabilities can enter into causal assertions in various ways (Suppes, 1970). We can be uncertain about a causal relation: Eating protein will probably cause Evelyn to gain weight. And our degree of belief in a deterministic proposition is best thought of as a probability (e.g., Ramsey, 1929/1990). Subtle aspects of experimental procedure can elicit judgments best modeled probabilistically—from references to "most" in the contents of problems to the use of response scales ranging from 0 to 100 (Rehder & Burnett, 2005; and Rehder, Chapters 20 and 21 in this volume). Evidence for a causal relation can itself be statistical. But, Pearl's ultimate wisdom remains: keep causation and statistical considerations separate; introduce special additional apparatus to represent causation within probabilistic systems. This division is now recognized in many different accounts of causation (see, e.g., Waldmann, 1996; and in this volume, Cheng & Lu, Chapter 5; Griffiths, Chapter 7; Lagnado & Gerstenberg, Chapter 29; and Meder & Mayrhofer, Chapter 23).

The consilience of the evidence about reasoning with quantifiers (e.g., Bucciarelli & Johnson-Laird, 1999), modal reasoning (e.g., Bell & Johnson-Laird, 1998), and deontic reasoning (e.g., Bucciarelli & Johnson-Laird, 2005) (p. 172) implies that each of these domains has deterministic meanings (for a case against probabilistic causal meanings, see Khemlani, Barbey, & Johnson-Laird, 2014). Indeed, if necessary were probabilistic, it would allow for exceptions and be indistinguishable from possible.

The consensus about causation is that, by default, causes precede their effects. But, a billiard ball causes a simultaneous dent in the cushion on which it rests (Kant, 1781/1934), and the moon causes tides, which in Newtonian mechanics calls for instantaneous action at a distance. Philosophers have even speculated about causes following their effects in time. In daily life, however, the norm is that causes precede their effects, or at least do not occur after them. The sensible option is therefore that a cause does not follow its effect, and the same constraint applies to the prevention or enabling of events.

The correspondences in Table 10.1 imply that possibilities underlie the meanings of causal relations. It follows that there are several sorts of causal relation, and we will now enumerate them and explain why there cannot be any other sorts. An assertion about a specific cause and effect, such as

Eating protein will cause Evelyn to lose weight

refers to a key possibility in which Evelyn eats protein and then loses weight. But, what happens if Evelyn does not eat protein? A weak interpretation of cause is that the assertion leaves open whether or not Evelyn will lose weight. It could result from some other cause, such as a regimen of rigorous exercise. The meaning of the assertion accordingly refers to a conjunction of three possibilities that are each in the required temporal order:

Evelyn eats protein and Evelyn loses weight.
Evelyn doesn't eat protein and Evelyn doesn't lose weight.
Evelyn doesn't eat protein and Evelyn loses weight.

An assertion of prevention, such as eating protein prevents Pat from losing weight, is analogous but refers to the non-occurrence of the effect. A cause can be the unique way of bringing about an effect. As far as we know, drinking alcohol is the only way to get drunk. Likewise, prevention can be unique. As far as we know, a diet including vitamin C is the only way to prevent scurvy. These stronger senses rule out alternative ways to cause or to prevent effects, and so both sorts of assertion refer only to two possibilities. A unique cause refers to a conjunction of just two possibilities: cause and effect; and no cause and no effect.

An enabling relation such as eating protein enables Evelyn to lose weight doesn't mean that protein necessarily leads Evelyn to lose weight: it may or may not happen, depending on the occurrence or non-occurrence of a cause, such as eating less of other foods. What happens if Evelyn does not eat protein? A weak interpretation is again that Evelyn may or may not lose weight. All four temporally ordered possibilities can therefore occur. They are equivalent to the weak enabling condition for the non-occurrence of an effect, eating protein enables Evelyn not to lose weight.

A stronger and more frequent interpretation of the affirmative assertion is that eating protein is a unique and necessary condition for Evelyn to lose weight. The assertion's meaning is therefore a conjunction of these three temporally ordered possibilities:

Evelyn eats protein and Evelyn loses weight.
Evelyn eats protein and Evelyn doesn't lose weight.
Evelyn doesn't eat protein and Evelyn doesn't lose weight.

The negative assertion, eating protein enables Evelyn not to lose weight, is analogous but refers instead to the non-occurrence of the effect.

We have now outlined seven distinct causal relations. Their meanings, according to the model theory, refer to different conjunctions of possibilities, each of which embodies a temporal order. These meanings are summarized in Table 10.2, and they exhaust all possible causal relations. If A or not-A can occur, and B or not-B can occur, there are four possible contingencies of A and B and their respective negations, and there are 16 possible subsets of these contingencies. Four of these subsets consist in a single possibility, such as A & not-B, which is a categorical assertion of a conjunction. Four of these subsets consist in a conjunction of two possibilities, such as A & B and not-A & B, which corresponds to a categorical assertion, as in this case in which B holds whether or not A holds. And one subset is the empty set corresponding to a self-contradiction. The remaining seven of the 16 subsets of possibilities yield the distinct causal relations in Table 10.2: the strong and weak senses of causes, the strong and weak senses of prevents, the strong senses of enables and enables_not, and their identical weak senses. Granted that the meanings of causal relations refer to conjunctions of possibilities, no other causal relations can exist.

The conjunctions of possibilities in Table 10.2 are analogous to truth tables in sentential logic. Some critics have therefore argued that the model theory's (p. 173) account of A causes B in its weak sense is equivalent to material implication in logic (e.g., Ali, Chater, & Oaksford, 2011; Kuhnmünch & Beller, 2005; Sloman, Barbey, & Hotaling, 2009). Other critics have made the same claim about the model theory's treatment of conditionals (see, e.g., Evans & Over, 2004). For readers unfamiliar with material implication, it is equivalent to not-A or B, or both, and so on this account A causes B is true provided that A is false or B is true. That's clearly wrong, and so, according to these critics, the model theory is therefore wrong, too. In fact, their argument is flawed, because it fails to distinguish between possibilities and truth values. Two truth values such as true and false are inconsistent with one another. In contrast, the possibility of A is entirely consistent with the possibility of not-A. Likewise, the conjunction of possibly (A & B) and possibly (not-A & not-B) is consistent. Indeed, the model theory postulates that the meaning of A causes B in its strong sense refers to this conjunction of possibilities. Its weak sense adds a third possibility to the conjunction: not-A & B (see Table 10.2). The mere falsity of A does not establish that this conjunction of three possibilities is true, nor does the mere truth of B. The same argument applies to conditionals, if A then B (cf. Johnson-Laird & Byrne, 2002).

Hence, according to the model theory, the meaning of A causes B in its weak sense, and the meaning of a basic conditional, if A then B, both differ from the meaning of material implication in logic.
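The distinction is easy to check mechanically. In the sketch below (our illustration, not the authors' program), the weak causal assertion is identified with a conjunction of three possibilities from Table 10.2, and material implication with a truth function. The set of rows on which the material implication is true coincides with the causal assertion's possibilities, which is what tempts the critics, but the two evaluate differently:

```python
from itertools import product

CONTINGENCIES = list(product([False, True], repeat=2))  # (a, b) pairs

def material_implication(a, b):
    """'If A then B' as material implication: true unless A and not-B."""
    return (not a) or b

# Rows on which the material implication is true:
mi_rows = {(a, b) for (a, b) in CONTINGENCIES if material_implication(a, b)}

# Possibilities to which weak 'A causes B' refers (Table 10.2):
weak_cause = {(True, True), (False, False), (False, True)}

assert mi_rows == weak_cause  # the coincidence behind the critics' argument

# But material implication is true of a single observed case, whereas the
# causal assertion is a claim about which cases are possible: it is true
# only if each of the three cases can occur.
def mi_true(case):
    return case in mi_rows

def weak_cause_true(possible_cases):
    return set(possible_cases) == weak_cause

print(mi_true((False, True)))             # True: the falsity of A suffices
print(weak_cause_true([(False, True)]))   # False: one case isn't three
print(weak_cause_true(weak_cause))        # True
```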


Table 10.2 The Core Meanings of the Seven Possible Causal Relations in Terms of the Conjunctions of Temporally Ordered Possibilities to Which They Refer

The seven conjunctions of possibilities yielding distinct causal relations:

A causes B
  Weak:    a b;  not-a not-b;  not-a b
  Strong:  a b;  not-a not-b

A allows B
  Weak:    a b;  a not-b;  not-a not-b;  not-a b
  Strong:  a b;  a not-b;  not-a not-b

A prevents B
  Weak:    a not-b;  not-a not-b;  not-a b
  Strong:  a not-b;  not-a b

A allows not-B
  Weak:    the same as the weak sense of A allows B
  Strong:  a b;  a not-b;  not-a b

Note: Strong interpretations correspond to unique causes, preventers, and enablers; weak interpretations allow for others.
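Because each relation is just a subset of the four contingencies, the classification can be run in reverse: given a conjunction of possibilities, recover the relation it expresses. The sketch below (ours; the function and dictionary names are illustrative) enumerates all 16 subsets and confirms the count argued for above, that exactly seven of them name causal relations:

```python
from itertools import combinations

AB = [(a, b) for a in (True, False) for b in (True, False)]

# The seven causal relations of Table 10.2, as sets of possible (a, b) cases.
RELATIONS = {
    "A causes B (weak)":       {(True, True), (False, False), (False, True)},
    "A causes B (strong)":     {(True, True), (False, False)},
    "A allows B (weak)":       set(AB),  # also weak 'A allows not-B'
    "A allows B (strong)":     {(True, True), (True, False), (False, False)},
    "A prevents B (weak)":     {(True, False), (False, False), (False, True)},
    "A prevents B (strong)":   {(True, False), (False, True)},
    "A allows not-B (strong)": {(True, True), (True, False), (False, True)},
}

def classify(possibilities):
    """Name the causal relation that a conjunction of possibilities
    expresses, or None if the subset is categorical, contradictory,
    or otherwise non-causal."""
    for name, poss in RELATIONS.items():
        if set(possibilities) == poss:
            return name
    return None

# All 16 subsets of the four contingencies:
subsets = [set(c) for r in range(5) for c in combinations(AB, r)]
causal = [s for s in subsets if classify(s)]
print(len(subsets), len(causal))   # 16 subsets, 7 of them causal
```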

Counterfactual Conditionals and Causation

When the facts are known, it is appropriate to say, for instance, eating protein caused Evelyn to lose weight. But, as Hume (1748/1988, p. 115) recognized, it is equally appropriate to assert a "counterfactual" conditional: If Evelyn hadn't eaten protein, then Evelyn might not have lost weight. The appropriate counterfactual depends both on the nature of the causal relation and on the nature of the facts. Suppose, in contrast, that the causal relation still holds but the facts are that Evelyn didn't eat protein and didn't lose weight. A different counterfactual captures the causal relation: If Evelyn had eaten protein, then Evelyn would have lost weight.

The meanings of counterfactuals are straightforward. The assertion A causes B in its weak sense refers to a conjunction of three possibilities (Table 10.2), but it also means that A and not-B is impossible. Hence, given this sense and that the facts are A and B, the two other possibilities are counterfactual, where we define a counterfactual possibility as a contingency that was once possible but that didn't in fact happen (Byrne, 2005; Johnson-Laird & Byrne, 2002).

The meaning of a counterfactual conditional therefore depends on three principles. First, the negations of its two clauses refer to the facts, but with a caveat: when "still" occurs in the then-clause, the clause itself describes the facts. Second, its two clauses refer to the main counterfactual possibility. Third, any other counterfactual possibility depends on the modal auxiliary that occurs in its then-clause, which has the same force as its present tense meanings: would means the event necessarily occurs, and might and could mean the event possibly occurs. But, their negations differ in scope: couldn't means that the event is not possible, and mightn't means the event (p. 174) possibly does not occur (see Table 10.1). Other cognate modals express the same meanings. As an example, consider again the counterfactual conditional: If Evelyn had eaten protein, Evelyn would have lost weight. The first principle determines the facts about Evelyn:

Evelyn didn't eat protein and Evelyn didn't lose weight. (the facts)

The second principle determines the main counterfactual possibility about her:

Evelyn ate protein and Evelyn lost weight. (the main counterfactual possibility)

The third principle tells us that eating protein necessarily leads Evelyn to lose weight—there is no alternative in that case. The only counterfactual possibility that remains is therefore that Evelyn didn't eat protein but for some unknown reason nevertheless lost weight. A corollary for interpretation is that, as the preceding examples illustrate, models have to keep track of the status of a contingency—as a fact, a possibility, a counterfactual possibility, or an impossibility (Byrne, 2005; Johnson-Laird & Byrne, 2002).
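The three principles amount to partitioning a relation's possibilities around the facts, which is simple to mechanize. In this sketch (our illustration, under the assumption that possibilities are encoded as (a, b) pairs), the facts pick out one possibility and the rest of the relation's conjunction becomes counterfactual; a "would" reading is licensed only when a single counterfactual possibility remains for the if-clause:

```python
WEAK_CAUSE = {(True, True), (False, False), (False, True)}  # Table 10.2

def counterfactuals(relation, facts):
    """Split a causal relation's possibilities into the facts and the
    counterfactual possibilities: those once possible but not realized."""
    if facts not in relation:
        raise ValueError("the causal relation rules out these facts")
    return {p for p in relation if p != facts}

def modal_for(relation, facts):
    """Choose 'would' or 'might' for 'If A had (not) happened ...' by
    checking how many counterfactual possibilities share the negated
    antecedent."""
    a_fact, _ = facts
    alternatives = {b for (a, b) in counterfactuals(relation, facts)
                    if a != a_fact}
    return "would" if len(alternatives) == 1 else "might"

# Facts: Evelyn didn't eat protein and didn't lose weight (not-a, not-b).
facts = (False, False)
print(counterfactuals(WEAK_CAUSE, facts))   # {(True, True), (False, True)}
print(modal_for(WEAK_CAUSE, facts))         # 'would', as in Table 10.3
print(modal_for(WEAK_CAUSE, (True, True)))  # 'might': B mightn't have happened
```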


Table 10.3 puts the three principles into practice. It presents the set of counterfactuals expressing the seven causal relations, in their strong and weak senses, depending on the facts of the matter. In general, the meaning of causal counterfactuals resists analysis unless one considers the possibilities to which they refer. The experimental evidence suggests that counterfactuals tend to elicit mental models of the facts and only the main counterfactual alternative to them (Byrne, 2005). It is tempting to identify causation and counterfactuals. But, like regular conditionals, counterfactuals need not express causal relations (e.g., If the number hadn't been divisible by 2 without remainder, then it wouldn't have been even).

Are There Other Components in the Meanings of Causal Relations?

Experiments have shown that most participants list as possible the deterministic cases corresponding to the strong meanings of causes, prevents, enables, and enables_not (i.e., the participants tend to minimize the number of possibilities to which assertions refer; Goldvarg & Johnson-Laird, 2001; Johnson-Laird & Goldvarg-Steingold, 2007). Is anything more at stake in the meanings of causal relations, that is, anything more than a conjunction of temporally ordered possibilities? The question concerns the meaning of causal assertions in everyday life as opposed to one's degree of belief in them or to how one might establish their truth (or falsity). For the latter, observational evidence can help, but experimentation is the final arbiter, because one has to determine what happens given the putative cause, and what happens without it. If a conditional expresses the correct temporal relations between physical states, then it has a potential causal meaning, for example:

If the rooster crows, then the sun rises.

But, an adroit experiment will show that the preceding conditional is false, and hence that the rooster's crowing is not the cause of sunrise.

As skeptics sometimes claim, to say that A causes B at a minimum refers to an additional relation that goes beyond a universal succession of A followed by B (pace Hume, 1748/1988, p. 115). Table 10.2 presents such a relation. When A occurs, the only possibility is that B occurs, and so the relation is a necessary one (see Kant, 1781/1934). Other theorists have proposed various additional elements of meaning. But, before we consider them, we need to clarify the nature of our argument. In the model theory, the interpretation of any assertion is open to a process of modulation in which knowledge of meanings, referents, and context can eliminate models or add information to them over and above those conveyed by literal meanings (see, e.g., Johnson-Laird & Byrne, 2002). It can add, for instance, a temporal relation between the clauses of a conditional, such as if he passed the exam, then he did study hard (see, e.g., Juhos, Quelhas, & Johnson-Laird, 2012). The case against additional elements of meaning that we are going to make in what follows is against their addition to the core meanings of temporally ordered possibilities (see Table 10.2). But, modulation can incorporate these additional elements in the process of interpretation. For instance, Pearl's (2009) central assumption about causation is encapsulated in this principle: Y is the cause of Z if we can change Z by manipulating Y.

But, we can manipulate a number so that it is or isn't divisible by 2 without remainder, and the manipulation changes whether or not the number is even. This condition, however, doesn't cause the number to be even. It necessitates its evenness. So, the criterion of manipulation lumps together mathematical necessity and causal necessity. It is also equivalent to a recursive definition in which "cause" is referred to in its own definition:

Y causes Z =def A manipulation of Y causes Z to change.

And recursive definitions need a condition that allows the recursion to bottom out; otherwise they lead to infinite loops (cf. Woodward, 2003). The model theory provides such a condition for any (p. 175) causal relation (see Table 10.2), and it allows modulation to incorporate manipulation into the interpretation of causation.


Table 10.3 The Set of Counterfactual Conditionals Expressing Causal Relations Depending on the Facts of the Matter

A causes B (weak): a b; not-a b; not-a not-b
  Fact not-a not-b: If A had happened, then B would have happened.
  Fact not-a b: If A had happened, then B still would have happened.
  Fact a b: If A hadn't happened, then B mightn't have happened.
  Fact a not-b: The causal relation rules out this fact.

A causes B (strong): a b; not-a not-b
  Fact not-a not-b: If A had happened, then B would have happened.
  Fact not-a b: The causal relation rules out this fact.
  Fact a b: If A hadn't happened, then B couldn't have happened.
  Fact a not-b: The causal relation rules out this fact.

A prevents B (weak): a not-b; not-a b; not-a not-b
  Fact not-a not-b: If A had happened, then B still couldn't have happened.
  Fact not-a b: If A had happened, then B couldn't have happened.
  Fact a b: The causal relation rules out this fact.
  Fact a not-b: If A hadn't happened, then B still mightn't have happened.

A prevents B (strong): a not-b; not-a b
  Fact not-a not-b: The causal relation rules out this fact.
  Fact not-a b: If A had happened, then B couldn't have happened.
  Fact a b: The causal relation rules out this fact.
  Fact a not-b: If A hadn't happened, then B would have happened.

A allows B (weak): a b; a not-b; not-a b; not-a not-b
  Fact not-a not-b: If A had happened, then B might have happened.
  Fact not-a b: If A had happened, then B still might have happened.
  Fact a b: If A hadn't happened, then B still might have happened.
  Fact a not-b: If A hadn't happened, then B still mightn't have happened.

A allows B (strong): a b; a not-b; not-a not-b
  Fact not-a not-b: If A had happened, then B might have happened.
  Fact not-a b: The causal relation rules out this fact.
  Fact a b: If A hadn't happened, then B couldn't have happened.
  Fact a not-b: If A hadn't happened, then B couldn't have happened.

A allows not-B (strong): a b; a not-b; not-a b
  Fact not-a not-b: The causal relation rules out this fact.
  Fact not-a b: If A had happened, then B still might have happened.
  Fact a b: If A hadn't happened, then B still would have happened.
  Fact a not-b: If A hadn't happened, then B would have happened.

Note: For each relation, the heading gives its weak or strong interpretation and the possibilities to which it refers; each entry presents an appropriate counterfactual conditional for the causal relation and the facts, and the other possibilities in the causal relation become counterfactual given the facts. The weak interpretation of A allows not-B is the same as for A allows B.

Some theorists have argued that part of the meaning of A enables B to happen is the existence of another factor that, when it holds, causes the effect (see Sloman et al., 2009). However, the truth of an enabling assertion doesn't establish the necessary existence of a cause, for example:

The vapor enabled an explosion to occur, but luckily no cause occurred, and so there wasn't an explosion.

If part of the meaning of "enabled" were the existence of a cause, then the previous assertion would be self-contradictory. Modulation, however, can certainly add the existence of a cause.

When a rolling billiard ball collides with another stationary one, observers see the physical contact and perceive that one ball caused the other to move (Michotte, 1946/1963; White, Chapter 14 in this volume). Some theorists have accordingly argued that physical contact or contiguity is part of the meaning of causal assertions (e.g., Geminiani, Carassa, & Bara, 1996). But, consider these claims:

Lax monetary policy enabled the explosion in credit to occur in the early 2000s.
The explosion in credit caused the 2008 financial crash.

It would be supererogatory even to try to establish a chain of physical contacts here. Conversely, the meaning of causation hardly legislates for the falsity of action at a distance: the meaning of cause doesn't show that Newton's physics is false. Hence, the meanings of elements interrelated by causal claims can modulate interpretation to add physical contact. (p. 176)

Reasoners know that the wind exerts a force that can blow trees down, that electricity has the power to turn an engine, that workers (or robots) on a production belt are a means for making automobiles, that the mechanism in a radio converts electromagnetic waves into sounds, that explanatory principles account for inflation, and that scientific laws underlie the claim that the moon causes tides. Theorists have accordingly invoked as part of causal meanings: force (Wolff and Thorstad, Chapter 9 in this volume), power (Cheng & Lu, Chapter 5 in this volume), means of production (Harré & Madden, 1975), mechanisms and manipulations (Pearl, 2009), explanatory principles (Hart & Honoré, 1985), and scientific laws. Some of these factors, as Hume (1739/1978) argued, are impossible to define without referring to causation itself. Yet, a potent reason to infer a causal relation is relevant knowledge of any of these factors. If you are explaining the mechanism of a sewing machine to a child who persists in asking how?, there will come a point—perhaps after you've explained that a catch on the rotating bobbin pulls a loop of the upper thread around it—when you can no longer provide a mechanism. A mechanism is a hierarchy of causal relations (Miyake, 1986), and each relation may have its own underlying mechanism, but the recursion has to bottom out, or otherwise there is an infinite regress. There must be at least one causal relation for which there is no mechanism.

The meaning of your final causal claim about the bobbin therefore need not refer to any mechanism. It follows that the core meanings of causal assertions need not refer to mechanisms.

In sum, knowledge of any of the factors that theorists invoke—force, power, means of production, interventions, explanatory principles, and mechanisms—can modulate the models based on core causal meanings, which do not embody them (see Khemlani, Barbey, & Johnson-Laird, 2014, for an integration of force and the model theory). Hence, the various putative elements of meaning beyond those of temporally ordered possibilities can play a role in the interpretation of causal claims, but they are not part of the core meanings of such claims, on pain of circularity or infinite regress.

Mental Models of Causal Assertions

The model theory distinguishes between the meanings of assertions and the mental models of the possibilities to which these meanings refer. In the theory's computational implementation, the parsing of a sentence yields a representation of its meaning, and this representation is used to build or to update the mental models of the situation under description (Khemlani & Johnson-Laird, 2012). Each mental model represents a distinct possibility (i.e., it represents what is common to the different ways in which it may occur). The more models that individuals have to represent, the greater the load on working memory, and the more difficult reasoning becomes—a result that is highly robust and that, as far as we know, has no counterexamples in the experimental literature (see, e.g., Johnson-Laird, 2006). Indeed, once reasoners have to deal with more than one or two mental models, their task becomes very difficult, as experiments have shown (e.g., Bauer & Johnson-Laird, 1993; García-Madruga et al., 2001).

Table 10.2, which we presented earlier, shows the full set of possibilities to which the seven causal relations refer. In contrast, mental models are based on a principle of truth. They represent what is true in a possibility rather than what is false. The assertion

A will cause B to occur

has only two mental models:

a   b
. . .

The first mental model represents the possibility in which A occurs no later than B, and the second mental model has no explicit content, as denoted by the ellipsis, but stands in for other possibilities in which A does not occur. Hence, mental models represent the salient possibility in which the antecedent event and its causal consequence both hold. An enabling assertion A will enable B to occur has exactly the same mental models as the preceding ones.

And a preventive assertion:

A will prevent B from occurring

has the mental models:

a   not-b
. . .

where not-b denotes B not occurring. The assertion A will enable B not to occur has these mental models (p. 177) as well. Of course, mental models are not letters or words, which we use here for convenience. They can be static spatiotemporal representations of the world, or kinematic simulations in which events follow one after the other (Khemlani, Mackiewicz, Bucciarelli, & Johnson-Laird, 2013). Yet, just two sorts of sets of mental models represent all seven causal relations in Table 10.2, because mental models, which embody the principle of truth, do not distinguish between causes and enables, or between prevents and enables not to occur, in either their strong or weak senses. For easy tasks, such as listing the possibilities to which an assertion refers, individuals can use the meaning of an assertion to flesh out mental models into fully explicit models of all the possibilities to which the assertion refers (as in Table 10.2). Even so, individuals begin their lists with the possibilities corresponding to mental models (Johnson-Laird & Goldvarg-Steingold, 2007). Only if they construct fully explicit models can they distinguish between causal and enabling assertions. A common misconception of the theory is that fully explicit models are used in all inferential tasks (pace Kuhnmünch & Beller, 2005). In fact, mental models are the foundation of intuitions and most inferences.
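The principle of truth can be expressed directly: a mental model keeps only the possibility in which the assertion's events hold, plus an implicit placeholder, whereas fleshing out recovers the full conjunction of possibilities. A minimal sketch follows (ours; representing the implicit model as the string "..." is an assumption of the sketch):

```python
ELLIPSIS = "..."   # the implicit model: possibilities in which A doesn't occur

def mental_models(relation):
    """Mental models embody the principle of truth: one explicit possibility
    plus an implicit model, so causes/enables (and prevents/enables-not)
    share the same mental models."""
    if relation in ("causes", "enables"):
        return [("a", "b"), ELLIPSIS]
    if relation in ("prevents", "enables_not"):
        return [("a", "not-b"), ELLIPSIS]
    raise ValueError(relation)

def fully_explicit_models(relation, strength="strong"):
    """Fleshing out mental models recovers the conjunctions of Table 10.2."""
    table = {
        ("causes", "strong"):  [("a", "b"), ("not-a", "not-b")],
        ("causes", "weak"):    [("a", "b"), ("not-a", "not-b"), ("not-a", "b")],
        ("enables", "strong"): [("a", "b"), ("a", "not-b"), ("not-a", "not-b")],
    }
    return table[(relation, strength)]

# Causes and enables are indistinguishable at the level of mental models:
assert mental_models("causes") == mental_models("enables")
# But their fully explicit models differ:
assert fully_explicit_models("causes") != fully_explicit_models("enables")
```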

Deductions from Causal Relations

Reasoning starts with perceptions, descriptions, or memories. We refer to "the premises" in order to include any of these sources, and we distinguish among three principal sorts of reasoning: the deduction of valid conclusions; the induction of conclusions, such as generalizations, that go beyond the given information; and a special sort of induction, known as abduction, which yields explanations. In what follows, we outline the model theory for each of these sorts of causal inference, starting with deduction.

Naïve individuals tend to reason based on mental models, and to draw conclusions that hold in the set of mental models of the premises. In logic, an inference is valid if the truth of a conclusion follows from the truth of the premises (Jeffrey, 1981). But, in the model theory, for a premise to imply that a conclusion is true, the premise has to imply each of the possibilities to which the conclusion refers. In logic, inferences of this sort are valid:

A.
Therefore, A or B, or both.


because the disjunction is true if one or both of its clauses are true. But, the inference is unacceptable according to the model theory, because the premise doesn't imply the truth of one of the possibilities to which the conclusion refers: not-A and B. Analogous principles hold for inferring probabilities (see, e.g., Khemlani, Lotstein, & Johnson-Laird, 2015).

Individuals are able to deduce the consequences of causal chains. In one experiment (Goldvarg & Johnson-Laird, 2001), the first premise interrelated two events, A and B, using a causal relation, and the second premise likewise interrelated B and C. The participants' task was to say what, if anything, followed from each pair of premises. The experiment examined all 16 possible pairs of relations based on causes, prevents, allows, and allows_not. The contents of the problems were abstract entities familiar to the participants (e.g., obedience causes motivation to increase), but which could plausibly occur in any sort of problem. One sort was of the form:

A causes B.
B prevents C.
What, if anything, follows?

The premises yield the mental models, as shown in a computer program implementing the theory:

a   b   not-c
. . .

and, as the mental models predict, all participants in the experiment concluded that A prevents C. The same conclusion follows from fully explicit models representing all the possibilities to which the premises refer, though reasoners are most unlikely to consider all of them. In contrast, these premises:

A prevents B.
B causes C.

yield the mental models:

a   not-b
        b   c
. . .

All but one of the participants drew the conclusion that these mental models predict: A prevents C. But, the six fully explicit models of these premises show that all four contingencies between a and c, and their respective negations, are possible. Hence, all that follows is that A allows C and A allows_not C.
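The predicted error falls out of a simple conjunction of mental models: the explicit model of each premise is merged where the shared event agrees, and the implicit models are left unexamined. A toy version (our sketch, not the chapter's program; events are signed strings such as "a" and "not-b"):

```python
def negate(event):
    return event[4:] if event.startswith("not-") else "not-" + event

def explicit_model(premise):
    """Mental model of 'X causes Y' is (x, y); of 'X prevents Y', (x, not-y)."""
    subj, verb, obj = premise.split()
    return (subj, obj) if verb == "causes" else (subj, negate(obj))

def conjoin(model1, model2):
    """Merge two models when the shared middle term agrees; None on a clash."""
    if model1[1] == negate(model2[0]):
        return None                      # e.g., not-b clashes with b
    if model1[1] == model2[0]:
        return (model1[0], model1[1], model2[1])
    return None

# A causes B; B prevents C -> a b not-c, predicting 'A prevents C':
print(conjoin(explicit_model("a causes b"), explicit_model("b prevents c")))
# ('a', 'b', 'not-c')

# A prevents B; B causes C -> the explicit models clash, so naive reasoners
# keep them separate and still (fallaciously) conclude 'A prevents C':
print(conjoin(explicit_model("a prevents b"), explicit_model("b causes c")))
# None
```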

In general, the results bore out the predictions based on mental models of the premises, rather than fully explicit models (see Barbey & Wolff, 2007, for a replication). To what extent performance also reflects an "atmosphere" effect, in which participants draw conclusions biased by the verbs in the premises, calls for further research (see Sloman et al., 2009).

(p. 178) A crucial test for mental models is the occurrence of so-called "illusory" inferences. These are fallacious inferences that occur because mental models embody the principle of truth, and so they do not represent what is false. Illusions occur in all the domains of reasoning for which they have been tested, including reasoning based on disjunctions and conditionals, probabilistic reasoning, modal reasoning, reasoning about consistency, and quantified reasoning (for a review, see Johnson-Laird & Khemlani, 2013). They are a crucial test because no other theory of reasoning predicts them. Here is a typical instance of an illusory inference in causal reasoning:

One of these assertions is true and one of them is false:

Marrying Pat will cause Viv to relax.
Not marrying Pat will cause Viv to relax.

The following assertion is definitely true:

Viv will marry Pat.

Will Viv relax? The rubric to the problem is equivalent to an exclusive disjunction of the first two premises, and so, as the program shows, they yield the following mental models of Viv's state:

Viv marries Pat         Viv relaxes
Viv doesn't marry Pat   Viv relaxes

The third assertion eliminates the second model, and so it seems that Viv will relax. But, when one premise is true, the other premise is false. If the first premise is false, then Viv won't relax even though Viv marries Pat. If the second premise is false, then Viv won't relax even though Viv doesn't marry Pat. Either way, on a weak interpretation of cause, Viv won't relax. On a strong interpretation of cause, the premises imply nothing whatsoever about whether Viv will relax. It is therefore an illusion that Viv will relax. Nearly everyone in an experiment made illusory inferences, but they made correct inferences from control premises (Goldvarg & Johnson-Laird, 2001).
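The illusion can be checked by brute force over the two ways the exclusive disjunction can hold. The sketch below (ours) fixes the weak interpretation, on which a causal premise's falsity amounts to its antecedent occurring without its effect, and mirrors the chapter's case analysis:

```python
# Weak interpretation: 'X will cause Viv to relax' is falsified by X
# occurring while Viv fails to relax, so a false premise means no relaxing.
def outcome(first_premise_is_true):
    """Viv's state, given which of the two causal premises is the false one
    (the case analysis in the text)."""
    if first_premise_is_true:
        # 'Not marrying Pat will cause Viv to relax' is false:
        # Viv doesn't relax even though she doesn't marry Pat.
        return "doesn't relax"
    # 'Marrying Pat will cause Viv to relax' is false:
    # Viv doesn't relax even though she marries Pat.
    return "doesn't relax"

# Mental models of the exclusive disjunction represent only what is true:
mental_models = [("marries Pat", "relaxes"),
                 ("doesn't marry Pat", "relaxes")]

# 'Viv will marry Pat' eliminates the second model, so the models
# (illusorily) suggest that Viv relaxes ...
print([m for m in mental_models if m[0] == "marries Pat"])
# ... but the full case analysis shows otherwise:
print({outcome(True), outcome(False)})   # {"doesn't relax"}
```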

Causes Versus Enabling Conditions

Consider the first inference at the start of the chapter:

Eating protein will cause Evelyn to gain weight.
Evelyn will eat protein.
Will Evelyn gain weight?

The mental models of Evelyn's state from the causal premise are as follows:

eats protein   gains weight
. . .

The categorical premise that Evelyn will eat protein eliminates the second, implicit model, and so it follows that Evelyn will gain weight. Only 2% of participants failed to draw this conclusion. The same conclusion follows from mental models when the first premise states an enabling condition: Eating protein will allow Evelyn to gain weight. Only those individuals who flesh out their models of the enabling assertion to represent the alternative possibility:

eats protein   doesn't gain weight

will infer that Evelyn may or may not gain weight. Many people (32%), but not all, are able to envisage this alternative possibility in which Evelyn eats protein but does not gain weight (Goldvarg & Johnson-Laird, 2001). A similar study used "when" instead of "if" (e.g., when magnetism occurs, magnetism causes ionization), and yielded similar results (Sloman et al., 2009). Readers should try to identify the cause and the enabler in the following scenario:

If you take the drug Coldgon, then, given that you stay in bed, you will recover from the common cold in one day. However, if you don't stay in bed, then you won't recover from the common cold in one day, even if you take this drug.

Reasoners are most unlikely to envisage all the possibilities to which this description refers, but they should be able to think of the most salient ones, which are represented in mental models:

takes Coldgon   stays in bed          recovers in a day
takes Coldgon   doesn't stay in bed   doesn't recover
. . .

Reasoners should therefore realize that staying in bed is the catalyst that enables the drug to cause the one-day cure. An experiment compared scenarios such as the preceding with those in which the causal roles were swapped around, for example:

If you stay in bed, then, given that you take the drug Coldgon, you will recover from the common cold in one day. However, if you don't take this drug, then you won't recover from the common cold in one day, even if you stay in bed.

Eight scenarios ranged over various domains—physical, physiological, mechanical, socioeconomic, and psychological—and counterbalanced the order of mention of cause and enabler. The participants (p. 179) read just one version of each scenario. They identified the predicted causes and enablers on 85% of trials, each of them did so more often than not, and each scenario bore out the difference (Goldvarg & Johnson-Laird, 2001). Cheng and Novick (1991) showed that their participants could distinguish between causes and enablers in similar sorts of everyday scenarios, but, for reasons pertaining to their probabilistic theory, their scenarios described enabling conditions that were constant throughout the events in the scenario, such as the presence of gravity, whereas causes were not constant, such as a boy pushing a girl. But, the present study swapped the roles of causes and enablers from one scenario to another, and neither was constant. In the preceding example, a person might or might not stay in bed, and might or might not take Coldgon. Hence, constancy is not crucial for individuals to identify an enabler, and inconstancy is not crucial for them to identify a cause.

Linguistic cues, such as "if" versus "given that," might have signaled the distinction between causes and enablers (Kuhnmünch & Beller, 2005). But, when these cues were rigorously counterbalanced or eliminated altogether, individuals still reliably distinguished between causes and enablers (see Frosch & Byrne, 2006). Likewise, when scenarios contained only a cause, or only an enabler, and used the same linguistic cue to introduce both, individuals still reliably identified them (Frosch, Johnson-Laird, & Cowley, 2007). This follow-up study contrasted causes and enablers within six scenarios about wrongdoing, such as:

Mary threw a lighted cigarette into a bush. Just as the cigarette was going out, Laura deliberately threw petrol on it. The resulting fire burnt down her neighbor's house.

The participants again distinguished between those individuals whose actions caused criminal events, such as Laura, and those who enabled them to occur, such as Mary. Moreover, they judged causers to be more responsible than enablers, liable for longer prison sentences, and liable to pay greater damages. It is regrettable that neither English nor American law makes the distinction between causers and enablers (Johnson-Laird, 1999)—a legacy of Mill's (1874) views, as embodied in judicial theory (see Hart & Honoré, 1985; Lagnado & Gerstenberg, Chapter 29 in this volume).

According to the model theory, a single instance of A and not-B refutes A causes B in either its strong or its weak sense (see Table 10.1). The refutation of an enabling relation is more problematic. In its strong sense, it is necessary to show that the effect can occur in the absence of the enabler; in its weak sense, only temporal order is at issue. A further difficulty is that both causes and enablers have the same mental models. Frosch and Johnson-Laird (2011) invited their participants to select which sort of evidence, A and not-B or not-A and B, provides more decisive evidence against each of eight causal and eight enabling assertions, such as:

Regular exercise of this sort causes a person to build muscle.

and:

Regular exercise of this sort enables a person to build muscle.

Every single participant chose A and not-B more often than not-A and B, but, as the theory predicts, they chose not-A and B reliably more often as a refutation for enables (25% of occasions) than for causes (10%), even though it refutes the strong meaning of causes too. They had an analogous bias in judging whether a single observation sufficed to refute a claim.

The general conclusion from these studies is that individuals distinguish between causes and enabling conditions in deductions, in inferring the roles of actors in scenarios, and in assessing refutations of causal claims. In each of these cases, the distinction follows from the model theory's deterministic account of the meanings of causes and enables (see Table 10.2). It is not at all clear that theories that do not base the distinction between these relations on different sets of possibilities can explain these results (cf. Ali et al., 2011; Sloman et al., 2009). The model theory makes further predictions about ternary causal relations, such as:

Staying in bed enables Coldgon to cause you to recover in a day.

Ternary relations of the sort A enables B to cause C are distinct from a conjunction of A enables C and B causes C, and so they challenge the representational power of probabilistic networks, whose binary links have no natural way to represent them.
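Which observation refutes which relation can be read straight off the conjunctions in Table 10.2: a single observed case refutes a relation if and only if the relation's meaning excludes it. A small sketch (ours) reproduces the asymmetry the experiment exploits:

```python
# Possibilities for each relation (Table 10.2), as sets of (a, b) pairs.
POSSIBILITIES = {
    "causes (strong)":  {(True, True), (False, False)},
    "causes (weak)":    {(True, True), (False, False), (False, True)},
    "enables (strong)": {(True, True), (True, False), (False, False)},
    "enables (weak)":   {(True, True), (True, False),
                         (False, True), (False, False)},
}

def refutes(observation, relation):
    """A single observed case refutes a relation iff the relation's
    meaning excludes it."""
    return observation not in POSSIBILITIES[relation]

A_NOT_B, NOT_A_B = (True, False), (False, True)

for relation in POSSIBILITIES:
    print(relation,
          "| A & not-B refutes:", refutes(A_NOT_B, relation),
          "| not-A & B refutes:", refutes(NOT_A_B, relation))
# A & not-B refutes causes in both senses; not-A & B refutes only the
# strong senses of causes and enables.
```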

Inductions of Causal Relations

Learning is often a matter of inducing causal relations from observations of the relative frequencies in the covariations of contingencies (see, e.g., Perales & Shanks, 2007; Lu et al., 2008; and see Perales, Catena, Cándido, & Maldonado, Chapter 3 in this volume). Conditioning and reinforcement learning (p. 180) also concern causation (see Le Pelley, Griffiths, & Beesley, Chapter 2 in this volume). Probabilistic inductions at one level can feed into those at a higher or more abstract level in a hierarchical Bayesian network (e.g., Tenenbaum, Griffiths, & Kemp, 2006). Once the network is established, it can assign values to conditional probabilities that interrelate variables at one level or another (see Griffiths, Chapter 7 in this volume). Yet, causal relations are deterministic, and it is our ignorance and uncertainty that force us to treat them as probabilistic (Pearl, 2009, Ch. 1).

Inductive reasoning can yield deterministic causal relations. For instance, Robert Boyle carried out experiments in which he varied the pressure of a gas, such as air, and discovered that the pressure of a given quantity of gas at a fixed temperature is inversely proportional to the volume it occupies. This well-known law is deterministic, and so it is ironic that its ultimate explanation is the statistical kinetic theory of gases. Inductions of causal relations are also the intellectual backbone of medicine (see Lombrozo & Vasilyeva, Chapter 22 in this volume). A typical example is the discovery of the pathology and communication of cholera. When it first arrived in Britain in the nineteenth century,

doctors induced that they were dealing with a single disease with a single pathology, not a set of alternative diseases, because of its common symptoms and prognosis. The induction reflected the heuristic that similar causes have similar effects (Hume, 1748/1988, p. 80). How the disease was communicated from one person to another was more mysterious. The arrival of an infected person in a particular place often led to an outbreak there. Doctors induced that the illness was either infectious or contagious. Sometimes, however, the disease could leap distances of several miles. Doctors induced that it could be conveyed through the air. The prevalence of cholera in slums with their stinking air seemed to corroborate this "miasmal" hypothesis. The doctor who discovered the true mode of the disease's communication, John Snow, was an expert on anesthesia, and his familiarity with Boyle's law and the other gas laws enabled him to infer the impossibility of the miasmal account (Johnson-Laird, 2006, Ch. 27). His bias toward parsimony led him to induce a common cause. Infected individuals could transmit some sort of particle of the disease, even perhaps an animalcule, to others who were in contact with their fecal matter. If these particles got into the water supply, they could then be transmitted over larger distances. Snow constructed a causal chain that explained both the pathology of the disease and its communication. And he made many observations that corroborated the idea. He then turned to a series of brilliant natural experiments. He found streets in London supplied with water from two companies, one that drew its water from the Thames downstream from the main sewer outflows and one that drew it upstream from them. As he predicted, 20 times more deaths from the disease occurred in those households supplied by the downstream company than in those supplied by the upstream company. Frequencies accordingly entered into his tests of the theory, but not into its mechanism.

As the preceding account suggests, inductions are easy. There was no shortage of hypotheses about what caused cholera to spread from person to person: infection, contagion, miasma. Knowledge can lead to an induction from a single observation—a claim supported by considerable evidence (see, e.g., Johnson-Laird, 2006, Ch. 13; White, 2014). One source of such inferences is knowledge of a potential mechanism (see Johnson & Ahn, Chapter 8 in this volume), which itself may take the form of a model—a point that we elucidate later. Likewise, "magical" thinking, which underlies common beliefs in all societies, is a result of induction and the Humean heuristic that similar causes have similar effects (Johnson-Laird, 2006, Ch. 5). The hard task is to use observation and experiment to eliminate erroneous inductions. It is simple to refute the strong claim:

The rooster's crowing causes the sun to rise.

The observation that the sun also rises when the rooster does not crow suffices. The weaker claim, that the rooster's crowing suffices for the sun to rise but other putative causes exist too, calls for an experiment in which the rooster is made to crow, say, at midnight. General causal claims, however, are notoriously difficult to refute. That is the business of the experimental sciences.

At the center of the model theory is the idea that the process of interpretation builds models. In induction, modulation increases information. One way in which it does so is to

add knowledge to a model. For instance, it sets up causal relations between events in the model in so-called bridging inferences (Clark, 1975), that is, inferences that build a bridge from an assertion to its appropriate antecedent. An experiment showed the potency of such inferences (Khemlani & Johnson-Laird, 2015). In one condition, the experiment presented (p. 181) sets of assertions for which the participants could induce a causal chain, for example:

David put a book on the shelf. The shelf collapsed. The vase broke.

In a control condition, the experiment presented sets of assertions for which the participants could not readily infer a causal chain, for example:

Robert heard a creak in the hall closet. The faucet dripped. The lawn sprinklers started.

When a subsequent assertion contradicted the first assertion in a set, the consequences were quite different between the two conditions. In the causal condition, the contradictory assertion:

David didn't put a book on the shelf

led to a decline in the participants' ratings of the strength of their beliefs in each of the subsequent assertions: only 30% of them now believed that the vase broke. In the control condition, the contradictory assertion:

Robert did not hear a creak in the hall closet

had no reliable effect on the participants' strength of belief in the subsequent assertions. All of them continued to believe that the lawn sprinklers started. This difference in the propagation of doubt is attributable to the causal interpretation of the first sort of scenario, and the near impossibility of a causal interpretation for the second scenario.

The model theory assumes that knowledge and beliefs can themselves be represented in models, and so the essence of modulation, which occurs in bridging inferences, is to make a conjunction of two sets of models: one set represents the possibilities to which assertions refer, and the other set represents possibilities in knowledge. A simple example of the process occurs when knowledge modulates the core interpretation of conditionals by blocking the construction of models (see Johnson-Laird & Byrne, 2002). A slightly different case is likely to have occurred in Snow's thinking about cholera. The received view was that cholera was transmitted in various ways—by infection or contagion when there was physical contact with a victim or by a miasma in other cases:

physical contact      infection   transmission
physical contact      contagion   transmission
no physical contact   miasma      transmission


Snow's knowledge of the gas laws yielded two negative cases:

not infection
not miasma

In deductive reasoning, the conjunction of two inconsistent models, such as the models in these sets concerning infection and miasma, results in the empty model (akin to the empty set), which represents contradictions. But, when one model is based on knowledge, it takes precedence over a model based on premises (Johnson-Laird, Girotto, & Legrenzi, 2004). Precedence in the conjunction of the two sets of models above yields models in which no transmission occurs by infection or miasma, and only one mechanism transmits the disease:

physical contact   contagion   transmission

Snow knew, however, that the disease could also be transmitted over distances. Induction could not yield its mode of transmission. An explanation called for a more powerful sort of inference, abduction, to which we now turn. In Snow's case, it led to an explanation based on the transmission of "particles" of the disease through the water supply. This idea was never accepted in his lifetime, but he had inferred the disease's mode of transmission without any knowledge of germs, and his "particles" were later identified as the bacterium Vibrio cholerae.
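The precedence rule is a small change to ordinary model conjunction: where a model from knowledge conflicts with a model from the premises, the knowledge model wins instead of yielding the empty model. A sketch follows (ours; encoding models as dictionaries from propositions to truth values is an assumption of the sketch):

```python
def conjoin(premise_model, knowledge_model, knowledge_precedence=True):
    """Conjoin two models. Ordinarily a clash yields the empty model
    (a contradiction); with precedence, knowledge overrides the premises."""
    merged = dict(premise_model)
    for prop, value in knowledge_model.items():
        if prop in merged and merged[prop] != value:
            if not knowledge_precedence:
                return None          # the empty model: a contradiction
        merged[prop] = value         # knowledge takes precedence
    return merged

received_view = {"infection": True, "miasma": True, "contagion": True}
snows_knowledge = {"infection": False, "miasma": False}  # from the gas laws

print(conjoin(received_view, snows_knowledge, knowledge_precedence=False))
# None: a flat contradiction under ordinary conjunction
print(conjoin(received_view, snows_knowledge))
# {'infection': False, 'miasma': False, 'contagion': True}: only one
# mechanism of transmission survives
```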

Abductions of Causal Explanations

A fundamental aspect of human reasoning is abduction: the creation of explanations. Like inductions, they increase information, but unlike inductions, they also introduce new concepts that are not part of their premises. Abduction, in turn, depends on understanding, and according to the model theory, if you understand, say, inflation, the way a computer works, DNA, or a divorce, then you have a mental model of them. It may be rich in detail or simple—much as a clock functions as a model of the earth's rotation (Johnson-Laird, 1983, p. 2). Abductions usually concern causation. Investigators have studied them in applied domains, such as medical diagnosis (see Meder & Mayrhofer, Chapter 23 in this volume). To illustrate the role of models in abductions, we consider two cases: the resolution of causal inconsistencies and the reverse engineering of electrical circuits.



Explanations of Inconsistencies

When you are surprised in daily life, something has usually happened contrary to your beliefs or their (p. 182) consequences. You believe that a friend has gone to fetch the car to pick you up, and that, if so, your friend should be back in no more than five minutes. When your friend fails to return within 20 minutes, this fact refutes the consequences of your beliefs. A large literature exists in philosophy and artificial intelligence on how you then ought to modify or withdraw your conclusion and revise your beliefs—a process that is known as "non-monotonic" or "defeasible" reasoning (see, e.g., Brewka, Dix, & Konolige, 1997). What is more important in daily life, however, is to explain the origins of the inconsistency—why your friend hasn't returned—because such an explanation is vital to your decision about what to do.

But, where do explanations come from? The answer has to be from knowledge (see Lombrozo and Vasilyeva, Chapter 22 in this volume). Some explanations are recalled, but many are novel: they are created from knowledge of causal relations, that is, models in long-term memory of what causes, enables, and prevents various events. This knowledge can be used to construct a simulation of a causal chain. A computer program implements the process (see Johnson-Laird et al., 2004). To understand it, readers should try to answer the following question:

If someone pulled the trigger, then the pistol fired.
Someone pulled the trigger.
But the pistol did not fire.
Why not?

The program constructs a model of the possibility described in the first two assertions:

pulled trigger   pistol fired

But, as it detects, the third assertion is inconsistent with this model. The conditional expresses a useful idealization, and the program builds a model of the facts, and its counterfactual possibilities (cf. Pearl, 2009, Ch. 7):

Facts:
pulled trigger   not(pistol fired)

Counterfactual possibilities:
pulled trigger   pistol fired
not(pulled trigger)   not(pistol fired)

The program has a knowledge base consisting of fully explicit models of several ways in which a pistol may fail to fire (i.e., preventive conditions such as something jammed the pistol, there were no bullets in the pistol, or its safety catch was on). The model of the preceding facts triggers one such model, which the program chooses arbitrarily if the evidence leaves open more than one option, and the model takes precedence over the facts to create a possibility, for example:

pulled trigger   not(bullets in pistol)   not(pistol fired)


The new proposition, not(bullets in pistol), elicits a cause from another set of models in the knowledge base, for example, if a person empties the bullets from the pistol, then there are no bullets in the pistol. In this way, the program constructs a novel causal chain. The resulting possibility explains the inconsistency: a person intervened to empty the pistol of bullets. And the counterfactual possibilities yield the claim: If the person hadn't emptied the pistol, then it would have had bullets, and it would have fired. The fact that the pistol did not fire has been used to create an explanation from knowledge, which in turn transforms the generalization into a counterfactual claim. Intervention is sometimes said to demand its own logic (Sloman, 2005, p. 82; see also Glymour, Spirtes, & Scheines, 2000; Pearl, 2000), but the standard machinery of modulation copes with precedence given to models based on knowledge in case of inconsistencies (Johnson-Laird, 2006, p. 313). This same machinery handles the "non-monotonic" withdrawal of conclusions and modification of beliefs.

The theory predicts that explanations consisting of a causal chain, such as a cause and effect, should be highly plausible. They should be rated as more probable than explanations consisting of the cause alone, or the effect alone. An experiment corroborated this prediction in a study of 20 different inconsistent scenarios (see Johnson-Laird et al., 2004). The participants rated the probability of various putative explanations, and they tended to rank the cause-and-effect explanations as the most probable. Hence, individuals do not always accommodate a new fact with a minimal change to their existing beliefs (see also Walsh & Johnson-Laird, 2009). The acceptance of a conjunction of a cause and effect calls for a greater change than the acceptance of just the cause or the effect. Another study showed that individuals also rate explanations as more probable than minimal revisions to either the conditional or the categorical premise to restore consistency (Khemlani & Johnson-Laird, 2011). Contrary to a common view, which William James (1907, p. 59) first propounded, the most plausible explanation is not always minimal. (p. 183)
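The program's control structure is a short loop: detect the clash, retrieve a preventive condition from causal knowledge, let it take precedence over the idealized conditional, and then chain backward to a cause of that condition. A toy rendition (ours, not the published program; the two knowledge bases are illustrative stand-ins):

```python
# Preventive conditions under which a pistol fails to fire, and causes
# that can bring those conditions about (illustrative knowledge bases).
PREVENTERS = ["pistol jammed", "not(bullets in pistol)", "safety catch on"]
CAUSES = {"not(bullets in pistol)":
          "a person emptied the bullets from the pistol"}

def abduce(conditional, fact, outcome):
    """Explain why the conditional's consequent failed despite its
    antecedent holding."""
    antecedent, consequent = conditional
    if fact == antecedent and outcome == "not(" + consequent + ")":
        preventer = PREVENTERS[1]   # the program's choice is arbitrary
        cause = CAUSES.get(preventer)
        return [cause, preventer, outcome]   # a novel causal chain
    return None

chain = abduce(("pulled trigger", "pistol fired"),
               "pulled trigger", "not(pistol fired)")
print(" -> ".join(chain))
# a person emptied the bullets from the pistol -> not(bullets in pistol)
#   -> not(pistol fired)
```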

Causal Abduction and Reverse Engineering

The lighting in the halls of many houses has an ingenious causal control, using a switch on the ground floor and a switch on the upper floor. If the lights are on, either switch can turn them off; if the lights are off, either switch can turn them on. The reader should jot down a diagram of the wiring required for this happy arrangement. The problem is an instance of "reverse engineering": to abduce a causal mechanism underlying a system of known functionality. A study of the reverse engineering of such circuits revealed a useful distinction between two levels of knowledge—global and local (Lee & Johnson-Laird, 2013).

A simple switch closes to make a circuit and opens to break it, but a more complicated switch is needed for the lighting problem. It has two positions: in one position it closes one circuit, and in the other position it both breaks this circuit and closes a separate circuit. It can also be used merely to make or break a single circuit. Figure 10.1 is a diagram showing the two positions of such a switch.

An experimental study examined naïve individuals' ability to reverse-engineer three sorts of circuits containing two switches: a circuit in which the light comes on only when both switches are on (a conjunction), a circuit in which the light comes on when one or both switches are on (an inclusive disjunction), and the hall circuit in which the light comes on when one switch or else the other is on (an exclusive disjunction). Each problem was presented in a table showing the four possible joint positions of the two switches and whether the light was on or off in each case. The participants knew nothing about wiring circuits in series or in parallel, but the experimenter described how the switch in Figure 10.1 worked, and explained that electricity "flows" when a circuit is completed from one terminal on a battery (or power source) to the other. The task was to design correct circuits for the three sorts of problem in which, as the participants knew, there was already one direct connection from the power to the light.

Figure 10.1 A diagram of the two different positions of a switch making or breaking two alternative circuits.

Figure 10.2 shows simple solutions for the three circuits. The experimenter video-recorded how people wired up actual switches, and in other experiments how they drew a succession of circuit diagrams to try to solve a problem, or else diagrams of pipes and faucets for three isomorphic problems about the flow of water to turn a turbine. Most participants focused on getting the circuit to deliver one correct output at a time (i.e., a single causal possibility), taking into account the positions of both switches, but a few tried to get one switch at a time to work correctly. The difficulty of reverse engineering should depend on the number of possible configurations, determined by the number of variable components (the switches), the number of their settings that yield positive outputs (the light comes on), and the interdependence of the components in controlling the outputs. Only the exclusive disjunction depends on the joint positions of the two switches both to turn the light on and to turn it off. The results showed that both the number of settings with positive outcomes and interdependence increased the difficulty of the task, and so conjunctions were easier than disjunctions, which in turn were easier than exclusive disjunctions.

Figure 10.2 Minimal circuits for a and b, a or b, and a or else b. The rectangle and the circle represent the battery and the bulb, respectively, and each black dot represents a terminal of a switch of the sort shown in Figure 10.1. The light is on in the circuits shown for and and or, but off in the circuit for or else.

Table 10.4 Mean Number of Times in 20,000 Trials in Which the Program Reverse Engineered and, or, and or else Switch Circuits, Depending on the Constraints on Its Generative Process

Type of Problem              Local Constraints   Global Constraints   Local and Global Constraints
a and b                             41                 3316                     4881
a or b, or both                     95                  619                     1359
a or else b, but not both            0                    1                        6

Source: Based on Lee & Johnson-Laird (2013).

A computer program implementing abduction solves the problems. It explores models of the circuits by making arbitrary wirings under the control of local or global knowledge, or both (see Lee & Johnson-Laird, 2013). The program has access to five local constraints, which govern individual components in the model: a single wire should not connect a terminal on a switch or light to itself or to any other terminal on the same component, it should not be duplicated or be the converse of an existing wire, and it should not connect the power directly to the light (because of the pre-existing connection between them). The program also has access to six global constraints, which govern the model as a whole: the circuit should yield the given output for each switch position, it should contain at least six wires, it should connect the battery to at least one switch, it should connect the light to at least one switch, and each switch should have a wire on its single terminal and another wire on at least one of its double terminals. Table 10.4 shows the results of 20,000 computer simulations in each of several conditions, depending on the constraints governing the program’s performance. As the table shows, global constraints are more efficient than local constraints, but the two combined increase performance to a level comparable to that of the human participants. Like them, the program almost always fails with an exclusive disjunction. Yet, in a rare instance, it did discover a novel circuit for the exclusive disjunction, which Figure 10.3 presents: a two-dimensional solution, unlike the one in Figure 10.2, in which one wire crosses over another.

The program uses abduction to produce circuits, and deduction to test their causal consequences. This procedure is common in the creation of explanations. Reasoners also used both abduction and deduction to create informal algorithms for rearranging the order of cars in a train on a track that included a siding (Khemlani et al., 2013).
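The generate-and-test procedure lends itself to a compact sketch. The version below is our own simplified reconstruction, not Lee and Johnson-Laird’s program: it tests only connectivity (ignoring short circuits), presupposes the pre-existing power-to-light return connection, checks just two of the local constraints, and all terminal names are invented for the illustration:

import random
from itertools import combinations

# Terminals: battery (B+, B-), bulb (L1, L2), and two switches, each with a
# common terminal (c) and two alternatives (u, d); position "up" closes c-u
# internally, position "down" closes c-d (cf. Figure 10.1).
COMPONENT = {"B+": "battery", "B-": "battery", "L1": "bulb", "L2": "bulb",
             "S1c": "S1", "S1u": "S1", "S1d": "S1",
             "S2c": "S2", "S2u": "S2", "S2d": "S2"}

def local_ok(wires):
    """Local constraints: no wire within a single component, and no second
    direct power-to-light wire (frozensets already rule out duplicates)."""
    return all(COMPONENT[a] != COMPONENT[b] and
               {COMPONENT[a], COMPONENT[b]} != {"battery", "bulb"}
               for a, b in wires)

def light_on(wires, pos1, pos2):
    """Global test by union-find connectivity: the light counts as on when
    B+ reaches L1 (current returns via the pre-existing L2-to-B- link)."""
    parent = {t: t for t in COMPONENT}
    def find(t):
        while parent[t] != t:
            t = parent[t]
        return t
    internal = [("S1c", "S1u" if pos1 == "up" else "S1d"),
                ("S2c", "S2u" if pos2 == "up" else "S2d")]
    for a, b in list(wires) + internal:
        parent[find(a)] = find(b)
    return find("B+") == find("L1")

AND = {("up", "up"): True, ("up", "down"): False,
       ("down", "up"): False, ("down", "down"): False}

possible = [frozenset(w) for w in combinations(COMPONENT, 2)]
random.seed(1)
hits = 0
for _ in range(20000):
    wires = random.sample(possible, 6)   # abduce an arbitrary six-wire circuit
    if local_ok(wires) and all(light_on(wires, p1, p2) == want
                               for (p1, p2), want in AND.items()):
        hits += 1                        # deduction confirms the wiring
print(hits, "working 'a and b' circuits found in 20,000 arbitrary attempts")

Run blindly in this way, the generator succeeds only rarely, which is the point of Table 10.4: global knowledge of what the circuit as a whole must achieve prunes the search space far more effectively than local wiring rules do.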

Figure 10.3 A novel two-dimensional circuit for a or else b that the computer program discovered. Start corresponds to the battery, and End to the bulb, with their other two terminals connected directly. Either position in which one switch is up and the other is down causes the current to flow.

Conclusions

The theory of mental models accounts for the meaning of causal relations, their mental representation, and reasoning from them. It proposes meanings that are deterministic. A causes B means that given A the occurrence of B is necessary; A allows B means that given A the occurrence of B is possible; and A prevents B means that given A the occurrence of B is impossible. If these relations were probabilistic, then necessity would tolerate exceptions and be equivalent to possibility, and causes would be equivalent to enables. The consilience of evidence corroborates deterministic meanings. For instance, the inference at the head of this chapter:

Eating protein will cause Evelyn to gain weight.
Evelyn will eat protein.
Will Evelyn gain weight?

elicited an almost unanimous response of “yes,” which is incompatible with a probabilistic interpretation of causation. Likewise, other studies, which we have described in this chapter, bore out deterministic predictions (e.g., individuals treat a single counterexample as refuting causation). Of course, causal claims can be explicitly probabilistic, as in this example from Suppes (1970, p. 7):

Due to your own laziness, you will in all likelihood fail this course.

Likewise, generic assertions, whether they are about causes:

Asbestos causes mesothelioma

or not:

Asbestos is in ceiling tiles

can tolerate exceptions (Khemlani, Barbey, & Johnson-Laird, 2014). But, just as inferences differ between conditionals with and without probabilistic qualifications (Goodwin, 2014), they are likely to do so for causal relations. Skeptics may say that fully explicit models of real causal relations should contain myriad hidden variables (preventers, enablers, alternative causes) in complex structures, and so they ought to be much more complex than the models in this chapter. We agree. And we refer readers to the models of causal relations in real legal cases (Johnson-Laird, 1999). They soon overtake the reasoning ability of naïve individuals—just as comparable estimates of probabilities do (Khemlani, Lotstein, & Johnson-Laird, 2015). Other theories of causation postulate other elements in its meaning, such as forces, mechanisms, and interventions. The model theory accommodates these elements, but not in the meanings of causal relations. They are incorporated into the model as a result of modulation—the process that integrates models of knowledge and models of discourse, with the former taking precedence over the latter in case of conflicts. Modulation in the process of interpretation incorporates knowledge of these other elements into models.

The model theory is sometimes wrongly classified as concerned solely with binary truth values. In fact, as this chapter has aimed to show, it is rooted in possibilities. They readily extend to yield extensional probabilities based on proportions of possibilities or their frequencies (Johnson-Laird et al., 1999), and non-extensional probabilities based on evidence (Khemlani, Lotstein, & Johnson-Laird, 2015). And possibilities yield seven, and only seven, distinct causal relations: strong and weak meanings of causes, prevents, allows, and allows_not, with the weak meanings of the latter two relations being identical. The only proviso in their meanings is that their antecedents cannot occur after their effects. The mental models of these relations represent the situations they refer to, and they are identical for causes and allows unless individuals flesh out their models with explicit models of other possibilities. This identity is reflected in experimental results—individuals often infer an effect from the statement of either a cause or an enabling condition. It is also reflected in a long tradition that the difference between the two relations lies, not in their meanings, but in other factors such as normality, constancy, and relevance—a tradition that still lives in common law. Mental models suffice for many inferences. The principle that they represent only what is possible given the truth of the premises yields systematic illusory inferences. But, only fully explicit models elucidate ternary relations of the sort:

Staying in bed enables Coldgon to cause your recovery from a cold in one day.

Such relations cannot be reduced to a conjunction of causing and enabling.

Inductions of causal relations rely on knowledge, especially those inductions—known as abductions—that yield explanations. In daily life, abductions rely on knowledge of causes and their effects. The model theory explains the process in terms of modulation, which also explains how individuals cope with inconsistencies: models of knowledge take precedence over other sorts of model. Hence, abduction leads to explanations that resolve inconsistencies, to the non-monotonic withdrawal of conclusions, and to the revision of beliefs. We have illustrated this role of models and their role in reverse engineering. The latter sort of abduction depends on both knowledge of local constraints governing the components in a model, and knowledge of global constraints on models as a whole.

In sum, causal relations refer to conjunctions of temporally ordered possibilities. Human reasoners envisage these possibilities in mental models, which highlight only the salient cases. They use their knowledge to modulate these representations, and they infer the consequences of the resulting models.

Future Directions

Psychological research into causation is burgeoning, and so we describe here three directions of research most pertinent to the model theory.

1. Reasoning in certain domains depends on the use of a kinematic model that unfolds in time to represent a succession of events (Khemlani et al., 2013). Such mental simulations should also underlie causal reasoning, but the hypothesis has yet to be tested in experiments.

2. “The reason for Viv divorcing Pat was infidelity.” Are reasons merely causes of another sort? Many philosophers have supposed so (see, e.g., Dretske, 1989), but to the best of our knowledge no empirical research has examined this idea. Perhaps some reasons are causes of intentions rather than direct causes of actions (Miller & Johnson-Laird, 1976).

3. The treatment of causal relations as probabilistic has been very fruitful. But the evidence that we have considered supports deterministic meanings for causation, and the use of probabilities as a way to treat human ignorance—a Bayesian approach that we have defended for the probabilities of unique events (Khemlani, Lotstein, & Johnson-Laird, 2015). A major task for the field is to reach a consensus about how to incorporate probabilities into causal reasoning in a way that distinguishes between causes and enabling conditions.


References

Ali, N., Chater, N., & Oaksford, M. (2011). The mental representation of causal conditional reasoning: Mental models or causal models. Cognition, 119, 403–418.

Barbey, A. K., & Wolff, P. (2007). Learning causal structure from reasoning. In D. S. McNamara & J. G. Trafton (Eds.), Proceedings of the 29th annual conference of the Cognitive Science Society (pp. 713–718). Mahwah, NJ: Lawrence Erlbaum Associates.

Bauer, M. I., & Johnson-Laird, P. N. (1993). How diagrams can improve reasoning. Psychological Science, 4, 372–378.

Bell, V., & Johnson-Laird, P. N. (1998). A model theory of modal reasoning. Cognitive Science, 22, 25–51.

Brewka, G., Dix, J., & Konolige, K. (1997). Nonmonotonic reasoning. Stanford, CA: CSLI.

Bucciarelli, M., & Johnson-Laird, P. N. (1999). Strategies in syllogistic reasoning. Cognitive Science, 23, 247–303.

Bucciarelli, M., & Johnson-Laird, P. N. (2005). Naïve deontics: A theory of meaning, representation, and reasoning. Cognitive Psychology, 50, 159–193.

Bucciarelli, M., Khemlani, S., & Johnson-Laird, P. N. (2008). The psychology of moral reasoning. Judgment and Decision Making, 3, 121–139.

Byrne, R. M. J. (2005). The rational imagination. Cambridge, MA: MIT Press.

Cheng, P. W., & Novick, L. (1991). Causes versus enabling conditions. Cognition, 40, 83–120.

Clark, H. H. (1975). Bridging. In R. C. Schank & B. L. Nash-Webber (Eds.), Theoretical issues in natural language processing (pp. 169–174). New York: Association for Computing Machinery.

Craik, K. (1943). The nature of explanation. Cambridge, UK: Cambridge University Press.

Dretske, F. (1989). Reasons and causes. Philosophical Perspectives, 3, 1–15.

Einhorn, H. J., & Hogarth, R. M. (1986). Judging probable cause. Psychological Bulletin, 99, 3–19.

Evans, J. St. B. T., & Over, D. E. (2004). If. New York: Oxford University Press.

Frosch, C. A., & Byrne, R. M. J. (2006). Priming causal conditionals. In R. Sun (Ed.), Proceedings of the 28th annual conference of the Cognitive Science Society (p. 2485). Mahwah, NJ: Lawrence Erlbaum Associates.


Frosch, C. A., & Johnson-Laird, P. N. (2011). Is everyday causation deterministic or probabilistic? Acta Psychologica, 137, 280–291.

Frosch, C. A., Johnson-Laird, P. N., & Cowley, M. (2007). It’s not my fault, your Honor, I’m only the enabler. In D. S. McNamara & J. G. Trafton (Eds.), Proceedings of the 29th annual meeting of the Cognitive Science Society (p. 1755). Mahwah, NJ: Lawrence Erlbaum Associates.

García-Madruga, J. A., Moreno, S., Carriedo, N., Gutiérrez, F., & Johnson-Laird, P. N. (2001). Are conjunctive inferences easier than disjunctive inferences? A comparison of rules and models. Quarterly Journal of Experimental Psychology, 54A, 613–632.

Geminiani, G. C., Carassa, A., & Bara, B. G. (1996). Causality by contact. In J. Oakhill & A. Garnham (Eds.), Mental models in cognitive science (pp. 275–303). Hove, East Sussex: Psychology Press.

Goldvarg, Y., & Johnson-Laird, P. N. (2001). Naïve causality: A mental model theory of causal meaning and reasoning. Cognitive Science, 25, 565–610.

Goodwin, G. P. (2014). Is the basic conditional probabilistic? Journal of Experimental Psychology: General, 143, 1214–1241.

Harré, R., & Madden, E. H. (1975). Causal powers. Oxford: Blackwell.

Hart, H. L. A., & Honoré, A. M. (1959/1985). Causation in the law (2nd ed.). Oxford: Clarendon Press.

Hilton, D. J., & Erb, H.-P. (1996). Mental models and causal explanation: Judgements of probable cause and explanatory relevance. Thinking & Reasoning, 2, 273–308.

Hume, D. (1739/1978). A treatise on human nature (L. A. Selby-Bigge, Ed.) (2nd ed.). Oxford: Oxford University Press.

Hume, D. (1748/1988). An enquiry concerning human understanding (A. Flew, Ed.). La Salle, IL: Open Court.

James, W. (1907). Pragmatism. New York: Longmans, Green.

Jeffrey, R. (1981). Formal logic: Its scope and limits (2nd ed.). New York: McGraw-Hill.

Johnson-Laird, P. N. (1983). Mental models. Cambridge, MA: Harvard University Press.

Johnson-Laird, P. N. (1999). Causation, mental models, and the law. Brooklyn Law Review, 65, 67–103.

Johnson-Laird, P. N. (2006). How we reason. Oxford: Oxford University Press.

Johnson-Laird, P. N., & Byrne, R. M. J. (2002). Conditionals: A theory of meaning, pragmatics, and inference. Psychological Review, 109, 646–678.

Johnson-Laird, P. N., Girotto, V., & Legrenzi, P. (2004). Reasoning from inconsistency to consistency. Psychological Review, 111, 640–661.

Johnson-Laird, P. N., & Goldvarg-Steingold, E. (2007). Models of cause and effect. In W. Schaeken et al. (Eds.), The mental models theory of reasoning (pp. 167–189). Mahwah, NJ: Lawrence Erlbaum Associates.

Johnson-Laird, P. N., & Khemlani, S. S. (2013). Toward a unified theory of reasoning. In B. Ross (Ed.), The psychology of learning and motivation (Vol. 59, pp. 1–42). New York: Elsevier.

Johnson-Laird, P. N., Legrenzi, P., Girotto, V., Legrenzi, M., & Caverni, J.-P. (1999). Naive probability: A mental model theory of extensional reasoning. Psychological Review, 106, 62–88.

Johnson-Laird, P. N., & Quinn, J. G. (1976). To define true meaning. Nature, 264, 635–636.

Juhos, C., Quelhas, C., & Johnson-Laird, P. N. (2012). Temporal and spatial relations in sentential reasoning. Cognition, 122, 393–404.

Kant, I. (1781/1934). Critique of pure reason. New York: Dutton.

Khemlani, S., Barbey, A. K., & Johnson-Laird, P. N. (2014). Causal reasoning: Mental computations and brain mechanisms. Frontiers in Human Neuroscience, 8, 1–15.

Khemlani, S., & Johnson-Laird, P. N. (2011). The need to explain. Quarterly Journal of Experimental Psychology, 64, 2276–2288.

Khemlani, S., & Johnson-Laird, P. N. (2013). The processes of inference. Argument and Computation, 4, 4–20.

Khemlani, S., & Johnson-Laird, P. N. (2015). Domino effects in causal contradictions. In R. Dale, C. Jennings, P. Maglio, T. Matlock, D. Noelle, A. Warlaumont, & J. Yoshimi (Eds.), Proceedings of the 37th annual conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society.

Khemlani, S., Lotstein, M., & Johnson-Laird, P. N. (2015). Naive probability: Model-based estimates of unique events. Cognitive Science, 39, 1216–1258.

Khemlani, S. S., Mackiewicz, R., Bucciarelli, M., & Johnson-Laird, P. N. (2013). Kinematic mental simulations in abduction and deduction. Proceedings of the National Academy of Sciences, 110, 16766–16771.

Kuhnmünch, G., & Beller, S. (2005). Causes and enabling conditions: Through mental models of linguistic cues? Cognitive Science, 29, 1077–1090.

Lee, N. Y. L., & Johnson-Laird, P. N. (2013). A theory of reverse engineering and its application to Boolean systems. Journal of Cognitive Psychology, 25, 365–389.

Lindley, D. V., & Novick, M. R. (1981). The role of exchangeability in inference. Annals of Statistics, 9, 45–58.

Lu, H., Yuille, A., Liljeholm, M., Cheng, P. W., & Holyoak, K. J. (2008). Bayesian generic priors for causal learning. Psychological Review, 115, 955–984.

Mackie, J. L. (1980). The cement of the universe: A study in causation (2nd ed.). Oxford: Oxford University Press.

Michotte, A. (1946/1963). The perception of causality. London: Methuen.

Mill, J. S. (1874). A system of logic, ratiocinative and inductive (8th ed.). New York: Harper.

Miller, G. A., & Johnson-Laird, P. N. (1976). Language and perception. Cambridge, MA: Harvard University Press.

Miyake, N. (1986). Constructive interaction and the iterative process of understanding. Cognitive Science, 10, 151–177.

Pearl, J. (2009). Causality (2nd ed.). New York: Cambridge University Press.

Peirce, C. S. (1931–1958). Collected papers of Charles Sanders Peirce (C. Hartshorne, P. Weiss, & A. Burks, Eds.). Cambridge, MA: Harvard University Press.

Perales, J. C., & Shanks, D. R. (2007). Models of covariation-based causal judgment: A review and synthesis. Psychonomic Bulletin & Review, 14, 577–596.

Ramsey, F. P. (1929/1990). Probability and partial belief. In D. H. Mellor (Ed.), F. P. Ramsey: Philosophical papers. Cambridge: Cambridge University Press.

Rehder, B., & Burnett, R. C. (2005). Feature inference and the causal structure of categories. Cognitive Psychology, 50, 264–314.

Russell, B. A. W. (1912–1913). On the notion of cause. Proceedings of the Aristotelian Society, 13, 1–26.

Salsburg, D. (2001). The lady tasting tea. New York: W. H. Freeman.

Sloman, S. (2005). Causal models. New York: Oxford University Press.

Sloman, S., Barbey, A. K., & Hotaling, J. M. (2009). A causal model theory of the meaning of cause, enable, and prevent. Cognitive Science, 33, 21–50.

Suppes, P. (1970). A probabilistic theory of causality. Amsterdam: North-Holland.

Tenenbaum, J. B., Griffiths, T. L., & Kemp, C. (2006). Theory-based Bayesian models of inductive learning and reasoning. Trends in Cognitive Sciences, 10, 309–318.


Waldmann, M. R. (1996). Knowledge-based causal induction. Psychology of Learning and Motivation, 34, 47–88.

Walsh, C. R., & Johnson-Laird, P. N. (2009). Changing your mind. Memory & Cognition, 37, 624–631.

White, P. A. (2014). Singular cues to causality and their use in human causal judgment. Cognitive Science, 38, 38–75.

Woodward, J. (2003). Making things happen: A theory of causal explanation. New York: Oxford University Press.

P. N. Johnson-Laird

Department of Psychology, Princeton University, Princeton, New Jersey, USA; Department of Psychology, New York University, New York, New York, USA

Sangeet S. Khemlani

Navy Center for Applied Research in Artificial Intelligence, Naval Research Laboratory, Washington, DC, USA


Pseudocontingencies

Pseudocontingencies   Klaus Fiedler and Florian Kutzner The Oxford Handbook of Causal Reasoning Edited by Michael R. Waldmann Print Publication Date: Jun 2017 Subject: Psychology, Cognitive Psychology Online Publication Date: May 2017 DOI: 10.1093/oxfordhb/9780199399550.013.14

Abstract and Keywords

In research on causal inference and in related paradigms (conditioning, cue learning, attribution), it has been traditionally taken for granted that the statistical contingency between cause and effect drives the cognitive inference process. However, while a contingency model implies a cognitive algorithm based on joint frequencies (i.e., the cell frequencies of a 2 × 2 contingency table), recent research on pseudocontingencies (PCs) suggests a different mental algorithm that is driven by base rates (i.e., the marginal frequencies of a 2 × 2 table). When the base rates of two variables are skewed in the same way, a positive contingency is inferred. In contrast, a negative contingency is inferred when base rates are skewed in opposite directions. The chapter describes PCs as a resilient cognitive illusion, as a proxy for inferring contingencies in the absence of solid information, and as a smart heuristic that affords valid inferences most of the time.

Keywords: causal inference, contingency, pseudocontingency, base rate, joint frequency

Introduction

Contingency Assessment: A Basic Module for Inductive Cognition

Virtually all research and theorizing on inductive reasoning in general and on causal inference in particular relies on the assumption of a basic cognitive module that enables organisms to extract statistical contingencies from the stimulus environment. In classical conditioning, a contingency is learned between the occurrence (vs. non-occurrence) of a conditional stimulus and an unconditional stimulus (Rescorla, 1967). In operant conditioning, the occurrence rate of reward or punishment is contingent on the prior initiation of a behavioral response (Morse & Skinner, 1958). Causal chains are evident from inter-event contingencies (e.g., Baetu & Baker, 2009); concept learning is informed by contingencies between concepts and attributes (Bourne, Ekstrand, & Montgomery, 1969); stereotypes are defined as (often illusory) contingencies between groups and attributes (McCauley & Stitt, 1978); and superstition arises when good versus bad luck seem to be contingent on irrelevant but eye-catching stimuli (Lieberman & Thomas, 1986).

In the realm of causal reasoning, in particular, it is commonly assumed that causal inferences C → E (i.e., that C causes E) are sensitive to statistical contingencies of the kind depicted in Figure 11.1 (see Hattori & Oaksford, 2007; Perales, Catena, Cándido, & Maldonado, Chapter 3 in this volume; Perales & Shanks, 2007). The observation that the occurrence rate p(E | C) of an effect E in the presence of a causal condition C is higher (lower) than p(E | ~C), the rate of E in the absence of C, lends support to the inference that C exerts a facilitative (inhibitory) influence on E.1 In other words, if the occurrence rate of E differs between conditions C and ~C, the effect E can be attributed at least partially to the causal condition C.

To be sure, psychological models differ in the specific constraints imposed on a causal contingency. Whereas Mill’s (1872/1973) basic method of difference applies to any difference in the observation rate of E (cf. Mackie, 1974), Cheng and Novick (1990, 1992) provided a more formal definition of causality as a statistical contrast ∆P = p(E | C) – p(E | ~C). Various algorithms that can be used to estimate contingencies were reviewed and compared by Perales and Shanks (2007). Still, the common denominator of all these theoretical and experimental approaches is that perceived causality is sensitive to the statistical contingency, broadly defined as a difference in the conditional probabilities of E given different causal conditions (C vs. ~C; or else, C1 vs. C2). Even those approaches that emphasize further constraints beyond covariation—such as spatial-temporal contiguity (Buehner, Chapter 28 in this volume; Buehner & May, 2003), causal models (Oaksford & Chater, Chapter 19 in this volume; Rottman, Chapter 6 in this volume; Waldmann, 1996), or the transmission of qualitative properties (White, 2009; Chapter 14 in this volume)—would not contest the essential contribution of statistical contingencies to causal inferences.

Figure 11.1 Statistical contingency models assume that causal inferences are a function of the joint frequencies a, b, c, d of the presence versus absence of an effect (E) given the presence versus absence of a cause. As explained later, PCs afford an alternative inference algorithm that relies on the alignment of the base-rate distributions (i.e., the marginal row sums and column sums).


Joint Frequencies as a Source of Contingency Estimation

Much in line with Gigerenzer’s (1991) notion that methodological tools shape theories of the mind, it has been widely taken for granted that the cognitive process of causal inference relies on the same basic algorithms that statisticians consider normatively appropriate for contingency estimation. Normative statistical models assume, with reference to Figure 11.1, that contingency estimates have to rely on some combination of the joint frequencies a, b, c, d given within the cells of a contingency table. That is, applying the most common contingency measure ∆P to causal inferences amounts to estimating the difference between the conditional probability p(E | C) of the effect E in the presence of cause C minus the conditional probability p(E | ~C) of E in the absence of C. Cognitive estimates of both conditionals must use the observed joint frequencies: a/(a+b) (i.e., the joint frequency a of E and C divided by the overall frequency (a+b) of C regardless of E) provides an estimate of p(E | C). By the same rationale, c/(c+d) serves to estimate p(E | ~C). In other words, a normative statistical model of contingency as a difference of two conditional probabilities estimated from joint frequencies, ∆P = a/(a+b) – c/(c+d), is presupposed to determine the cognitive algorithm used to detect and assess contingencies.

To be sure, some alternative statistical models have suggested slightly different algorithms from a, b, c, and d,2 but virtually all models assume that empirically observed joint frequencies afford the input to the cognitive process, as explained by Hattori and Oaksford (2007) or Perales and Shanks (2007; see also Perales et al., Chapter 3 in this volume). This basic assumption about the type of information utilized in cognitive contingency assessment is evident from a huge literature on causal inference. Assessments of the contingency between shelling and tanks exploding (Shanks, 1986), between the irradiation of food and its quality (Waldmann & Hagmayer, 2001), or between the pressing of a button and the illumination of a light (Alloy & Abramson, 1979) are generally assumed to reflect the joint frequencies of the outcomes and the causal conditions.
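Stated as code, this normative algorithm is a one-line function of the four cell frequencies. The following sketch uses the cell labels of Figure 11.1; the illustrative counts are our own:

def delta_p(a, b, c, d):
    """Delta P = p(E | C) - p(E | ~C), estimated from the joint frequencies
    of a 2 x 2 table: a = C&E, b = C&~E, c = ~C&E, d = ~C&~E."""
    return a / (a + b) - c / (c + d)

# Illustrative counts: 20 of 25 cases show C, and the rate of E is 80%
# both in the presence and in the absence of C, hence no contingency.
print(delta_p(a=16, b=4, c=4, d=1))  # 0.0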

Pseudocontingencies: A Seemingly Counter-Normative Alternative

Despite this consensus among statisticians and cognitive psychologists, there is growing evidence from a new research program on pseudocontingencies (PCs; Fiedler, Freytag, & Meiser, 2009; Fiedler, Kutzner, & Vogel, 2013), suggesting that under a wide range of natural conditions, contingency estimates may actually not rely on the joint frequencies given within the cells of a contingency table.3 Rather, a much simpler, seemingly unwarranted algorithm utilizes the base rates of events, here p(E) and p(C). These base rates correspond to the marginal distributions of the attributes defining the rows and columns of the contingency table (cf. Figure 11.1). Specifically, the algorithm makes contingency inferences when two skewed base-rate distributions are aligned. If both base-rate distributions are skewed in the same direction, that is, when in a given environment both E and C occur more frequently than ~E and ~C, or else when both E and C occur more infrequently than ~E and ~C, a positive contingency is inferred. In contrast, when the base-rate distributions are misaligned in that E is frequent but C infrequent, or else E is infrequent but C is frequent, the inferred contingency is negative.
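By contrast, the PC algorithm never consults the cells. Stated explicitly, in our own notation, it reduces to comparing the directions of the two marginal skews:

def pc_sign(p_e, p_c):
    """Sign of the contingency inferred by the PC algorithm from the two
    base rates alone: +1 when both base rates are skewed in the same
    direction, -1 for opposite skews, 0 if either base rate is unskewed."""
    skew = lambda p: (p > 0.5) - (p < 0.5)
    return skew(p_e) * skew(p_c)

print(pc_sign(p_e=0.8, p_c=0.8))  # +1: aligned skews
print(pc_sign(p_e=0.2, p_c=0.8))  # -1: misaligned skews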


Let us illustrate this seemingly unwarranted algorithm with a meaningful example. Suppose we are facing the basic operant learning task of a young child who must learn to discriminate what behaviors are allowed and what is forbidden, as evidenced by her parents’ facial expressions. Let us assume two independent causal influences. On one day, the parents are engaged in paperwork (e.g., for filing taxes) and are therefore likely to react with angry facial expressions. At the same time and independently, the child has slept well and therefore is in the mood for exploration (i.e., touching everything the parents are concerned with). As a result, the child’s high rate of exploration coincides with a high rate of angry parental facial expressions, but assume that angry facial expressions are no less likely when the child does not explore (i.e., there is no positive contingency). Even though there is no causal relation at the level of specific actions, the child will probably infer a contingency between exploration behavior and angry reactions.

Importantly, this inference relies solely on base rates in the marginal distribution (cf. Figure 11.1), rather than on an assessment of joint frequencies within the cells of the 2 × 2 table. The child may have never compared the angry-face rate given exploration behavior versus given no exploration behavior. She may have only noted—maybe from hindsight in the evening, when she goes to bed—that today a lot of exploration was met with a lot of anger, and this summary of daily base rates may lead the child to infer a contingency. This is especially likely when the rate of exploration and angry faces is low on another day, because such a conspicuous alignment of skewed base rates suggests a common cause. The child may thus infer a contingency, even though she never engaged in a proper contingency test (i.e., trying out whether not exploring reduces the anger rate). Had she done so, she might have found that not exploring had made angry faces even more likely than exploring. After all, the alignment of unequal base rates does not logically warrant a contingency inference: when p(angry) > p(not angry) coincides with p(explore) > p(not explore), it is nevertheless possible that p(angry | explore) < p(angry | not explore). That is, at the specific level of elementary behavioral observations, the correlation may indeed be negative!

Interestingly, however, the example also shows that at a higher level of aggregation, the PC inference is of course consistent with an “ecological correlation” (Robinson, 1950) that actually exists, in the example across days. Days characterized by high exploration base rates tend to be those with a high rate of angry expressions. In other words, the illogical or illusory nature of PC inferences merely consists in the unwarranted application of a (higher-order) contingency of base rates to the assessment of contingencies between individuating items at a lower level of aggregation (cf. Fiedler et al., 2009).

Empirical Review of Research on Pseudocontingencies

Pseudocontingency as a Resilient Cognitive Illusion


Although it should be obvious that PC inferences are logically unwarranted, some illustrative examples may be in order. If there are many boys (and few girls) in a high-performing physics class with the same high rate of high performance among boys and girls, it is of course unfair and inappropriate to infer a positive contingency between the frequent gender (male) and the frequent outcome (high performance). If the same high rate of positive (rather than negative) behavior holds in a minority as in a majority, it is hardly justified to infer a contingency that implies less positive evaluation of the minority than of the majority.

Still, there is a good deal of evidence for PC illusions of a severe type. This evidence shows that even intelligent people (typically drawn from student populations) base their contingency assessment on the alignment of two skewed base-rate distributions. The severity of these PC illusions is particularly impressive when the stimulus information either exhibits an actual contingency that is zero or even opposite to the PC inference, or when the absence of joint occurrence rates from the stimulus input should prohibit contingency inferences.

Setting Pseudocontingencies Apart from Actual Contingencies

Let us first consider the case in which both the joint frequencies of two attributes and the base rates are given, creating a competition between genuine contingencies and PCs. For instance, in a simulated classroom setting, Fiedler, Walther, Freytag, and Plessner (2002; see also Fiedler, Freytag, & Unkelbach, 2007) orthogonally varied the probability with which students raised their hands and gave correct responses. In a high-performing class, in which the correctness rate was generally high, students with a high participation rate, who frequently raised their hands to provide a response, were perceived to be superior in performance to less motivated students who responded infrequently. This is because the coincidence of two high base rates, for p(correct) and p(participate), leads to a PC inference that high correctness is (maybe causally) contingent on high participation. In contrast, in a low-performing class, a high participation rate produces a negative PC; that is, the co-occurrence of low p(correct) and high p(participate) led to the illusion that participation correlates with low correctness rates. Thus, the sign of the inferred correlation is positive when the base rates for participation and correctness are skewed in the same direction, but negative when the base-rate distributions are misaligned (i.e., skewed in opposite directions).

Pseudocontingencies high rate of correct responses in smart students. This PC illusion was independent of stereotypical expectancies. A reversal of the task focus—that is, to finding our whether boys are better in language and girls are better in science—resulted in an opposite, counter-stereotypical illusion (cf. Fiedler et al., 2002, 2007). In another series of studies (Fiedler & Freytag, 2004), participants were presented with a series of target persons described in terms of either high versus low scores on two per­ sonality tests, X and Y. Across a group of targets, high test scores were twice as frequent as low scores on both tests, and the rate of high to low Y scores was the same when X was high and low—the contingency was zero. Yet, the alignment of predominantly high X and predominantly high Y scores led judges to infer a strong positive correlation between X and Y scores, both in a recall test of the target persons’ test scores and in a prediction task calling for a new group of targets’ test scores. The PC illusion was particularly strong when in another group of targets base rates were jointly skewed in the opposite di­ rection (i.e., both X and Y scores being mostly low), thus resulting in a perfect ecological correlation between X and Y base rates across both groups. The strength of the PC illusion was further enhanced when the reported test scores could take on three values (high, medium, low) rather than only two (high, low), suggesting an increasing impact of PC effects with increasing task complexity. Thus, when the respec­ tive frequencies of high, medium, and low Y scores were 12, 6, 6, for targets with high X scores, 6, 0, 0 at medium X, and 6, 0, 0 at low X, the actual contingency was negative (–. 43), because the conditional probability of high Y scores was 1.00 (6 out of 6) for low and medium X but only .50 (12 out of 24) for high X. However, an inferred contingency as strongly positive as +.71 was built into the predictions of a new set of targets’ X and Y scores, respectively. Note also that given such an alignment of mostly high X and mostly high Y scores, the resulting positive PC effect led participants to predict low Y from low X scores as readily as the predicted high Y from high X scores, even though low Y and low X were never jointly observed. Further exploiting the facilitative impact of increasing complexity, it was even possible to demonstrate PC effects that override an actually existing opposite contingency (Fiedler, 2010). On every trial of an observation task, a target person was described in terms of gender (male vs. female), subject major (psychology vs. medicine), affiliation (University of Heidelberg vs. Mannheim), and major interest (sports vs. arts). The base rates of all four dichotomous attributes were skewed with one level occurring three times as often as the other, such that the same PC illusion could be expected to affect subsequent judg­ ments of all six pairwise contingencies. The actual contingencies were manipulated inde­ pendently of the base rates, with one contingency pointing in the same direction as the PC, one contingency pointing in the other direction, and the remaining four contingencies being set to zero. Subjective contingency estimates (i.e., differential frequency estimates of one focal level on

(p. 193)

one attribute given two different levels on the other at­

tribute) were generally biased in the direction predicted by the PC notion. Whether the actual contingency was zero, consistent, or opposite to the PC did not have the slightest effect. These and similar findings obtained in other experiments (as reviewed in Fiedler et Page 6 of 20

Pseudocontingencies al., 2009), suggesting that when both base rates and joint frequencies (i.e., genuine con­ tingency information) are given within the same task setting, subjectively inferred contin­ gencies reflect the former and often neglect the latter.
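The conditional probabilities in the three-value example are easy to recompute from the reported frequencies (a small verification of ours):

# Frequencies of high, medium, and low Y scores at each X level in the
# three-value condition of Fiedler and Freytag (2004), as reported above.
freq = {"high X": (12, 6, 6), "medium X": (6, 0, 0), "low X": (6, 0, 0)}

for x, counts in freq.items():
    p_high_y = counts[0] / sum(counts)
    print(f"p(high Y | {x}) = {p_high_y:.2f}")
# 0.50 at high X, but 1.00 at medium and low X: the actual relation runs
# against the positive PC suggested by the aligned base rates.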

Pseudocontingencies When Contingency Assessment Is Impossible

The very fact that PC effects override real contingencies highlights the robustness and resilience of the illusion. The same conclusion can be drawn from another set of studies drawing on task settings that obviously do not allow for contingency inferences, because no information is provided about the conditional dependency of two attributes at all. For example, with reference to two arbitrarily labeled groups, A and B, participants who had been told that Group A occurs more frequently than Group B and who observed a high rate of positive and negative behaviors—without being told what behavior came from what group—later provided more positive evaluations of Group A than of Group B (McGarty, Haslam, Turner, & Oakes, 1993). Likewise, when participants observed that in a ward of a hospital more patients received a vegetarian than a probiotic diet and then, in a separate run, that most patients showed strong rather than weak symptoms, they inferred a clear-cut positive contingency between vegetarian diet and symptoms, even though they had no opportunity to see, at the level of individual patients, whether symptoms more often co-occurred with vegetarian or with probiotic diet (Fiedler & Freytag, 2004). Note that the absence of bivariate information (i.e., dieting and symptom information observed in the same individuals) should have prevented them from making any contingency judgment. That they were nevertheless misled by the alignment of the more frequent sort of diet with the more frequent symptom level strongly implies that joint frequencies are not really necessary for contingency inferences (for similar evidence, see Perales et al., Chapter 3 in this volume). Again, subsequent predictions of the less frequent symptom level or dieting type were made as readily as predictions for frequent combinations, reflecting the independence of PCs from joint frequencies.

Evidence for Pseudocontingencies from Simpson’s Paradox

PC inferences have direct implications for a related phenomenon that occurs when the contingency for an entire population is different from the contingencies within subcategories of the same population. This constellation, in which different or even reverse contingencies exist at different levels of aggregation, is known as Simpson’s paradox (Simpson, 1951). Studies typically find that—unless a feasible causal model helps participants to deal with Simpson’s paradox (Spellman, 1996; Waldmann & Hagmayer, 2001)—they do not “partial out” the influence of the ecological variable and act as if they were relying on the contingency represented in the entire population (e.g., Schaller, 1994). Intriguingly, given the structure of a typical Simpson’s paradox, this might actually reflect PC inferences.

Research by Meiser and Hewstone (2004) illustrates the role of PCs in Simpson’s paradox. Their participants were presented with a series of positive (+) and negative (–) behavior descriptions of persons belonging to two groups, A and B, distributed over two towns, X and Y. Across both towns, Group A members exhibited more positive (16+) and less negative (12–) behaviors than Group B members (12+ vs. 16–). However, this apparent difference reflected the ecological impact of towns. The base rate of positive behavior was generally higher in Town X (22+ vs. 6–) than in Town Y (6+ vs. 22–), and the base rate of Group A (vs. B) members was also higher for Town X (22 A vs. 6 B) than for Town Y (6 A vs. 22 B).

When the influence of the ecological town variable was partialed out, Group B members showed more positive behaviors within both towns. In Town X, positive behaviors were shown by all 6 B members (100%) but by only 16 of all 22 A members (72%); in Town Y, the positivity proportion was 6 out of 22 for B members (27%) but 0 out of 6 (0%) for A members. Contrary to the actually observed contingency, when asked to rate Group A and Group B members within every town, participants rated Group A members more favorably. These evaluations might have been based on the higher proportion of positive behaviors among Group A members when neglecting the ecological variable. Or, the evaluations might have been based on a PC inference reflecting the alignment of base rates such that many (few) Group A members co-occurred with many (few) positive behaviors across towns. Supporting an explanation in terms of the PC algorithm, Meiser and Hewstone found a stronger preference for Group A among participants who had correctly learned the town differences in base rates, that is, participants who had paid attention to and not neglected the ecological variable.

In sum, the failure to understand the structure of a rather complex Simpson’s paradox might not reflect an attempt to simplify the task by neglecting a confounding ecological variable. It might instead reflect the impact of ecological base rates (of towns) on the assessment of contingencies between groups (within towns) and the desirability of behavior.
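The reversal can be recomputed directly from these frequencies. A minimal sketch (the data layout and names are ours, not Meiser and Hewstone’s materials):

# Behavior counts (positive, negative) by town and group, as described above.
data = {"Town X": {"A": (16, 6), "B": (6, 0)},
        "Town Y": {"A": (0, 6), "B": (6, 16)}}

def pos_rate(pos, neg):
    return pos / (pos + neg)

# Within each town, Group B shows the higher rate of positive behavior...
for town, groups in data.items():
    print(town, {g: round(pos_rate(*pn), 2) for g, pn in groups.items()})

# ...yet aggregated across towns, Group A comes out ahead (Simpson's paradox).
for g in ("A", "B"):
    pos = sum(data[t][g][0] for t in data)
    neg = sum(data[t][g][1] for t in data)
    print(g, round(pos_rate(pos, neg), 2))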

Implicit Reliance on Pseudocontingency Inferences

The evidence reviewed so far stems from experiments in which participants were explicitly instructed to make judgments or predictions that depend on the contingency between stimulus attributes. Beyond these explicit inference tasks, though, the domain of PC inferences also includes implicit learning tasks. In evaluative priming, for instance, participants are asked to classify target stimuli under speed instructions as either positive or negative, and the canonical finding is a congruity effect: stimulus targets preceded by a prime of the same valence are evaluated faster than targets preceded by an incongruent prime of the opposite valence (see Klauer & Musch, 2003, for a review). In most priming experiments, all combinations of prime valence and target valence occur with the same frequency. However, when there is a positive contingency between prime and target valence (i.e., mostly congruent trials), the priming effect increases. For a negative contingency (i.e., mostly incongruent trials), the priming effect is reduced or even inverted, leading to faster responses on incongruent trials. In other words, priming effects are sensitive to the existing contingencies observed in the stimulus context (Fiedler, Bluemke, & Unkelbach, 2011; Klauer, Rossnagel, & Musch, 1997; Spruyt, De Houwer, Vandromme, & Eelen, 2007).


Taking up this nice demonstration of adaptive cognition, Freytag, Bluemke, and Fiedler (2011) were interested in the role of PC effects in evaluative priming. They manipulated the base rates of positive versus negative primes and of positive versus negative targets separately, while the actual contingency was held constant at zero, as in the standard priming procedure. For instance, in the positive-PC condition, 96 positive primes were followed by 72 positive and 24 negative targets, and 32 negative primes were followed by 24 positive and 8 negative targets, yielding no contingency but the same skewed (3:1) base-rate distribution for primes and targets. In contrast, in the negative-PC condition, a high rate of positive to negative primes came along with a low rate of positive to negative targets. Despite the fact that the contingency was always set to zero, a strong and regular PC effect on evaluative priming was obtained. When prime valence and target valence were skewed in the same direction, a positive PC inference led to clearly faster evaluations on congruent than on incongruent trials. When, however, prime valence and target valence were skewed in opposite directions, a negative PC caused a strongly reversed priming effect, manifested in significantly faster responses on incongruent than on congruent trials.

In a similar vein, Kutzner, Freytag, Vogel, and Fiedler (2008) demonstrated strong and persistent PC effects in an operant feedback learning task of the two-armed bandit type. There were two types of trials initiated by different acoustical signals. One trial type was three times more frequent than the other. Within both trial types, participants had to predict one of two binary events, one of which was three times more frequent than the other. Correct predictions were motivated by performance-contingent monetary reward. Consistent with the PC effects obtained in other paradigms, the trials initiated by the more frequent acoustical signal led to more predictions of the more frequently presented event, whereas the more infrequent signal led to a relatively higher rate of infrequent-event predictions (causing a significant reduction of the money that might be gained).
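That the priming manipulation leaves the contingency at exactly zero can be checked from the frequencies of the positive-PC condition reported above (a small verification of ours):

# Trial frequencies in the positive-PC condition: prime valence x target
# valence, with a 3:1 skew on both margins and no contingency.
pos_prime = {"pos_target": 72, "neg_target": 24}
neg_prime = {"pos_target": 24, "neg_target": 8}

p_pos_given_pos = pos_prime["pos_target"] / sum(pos_prime.values())  # 0.75
p_pos_given_neg = neg_prime["pos_target"] / sum(neg_prime.values())  # 0.75
print(p_pos_given_pos - p_pos_given_neg)  # Delta P = 0.0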

Pseudocontingency as a Proxy for Dealing with the Paucity of Environmental Feedback

Across a variety of paradigms, then, judgments of the relation between variables often seem to be driven by PC inferences. PCs become more pronounced as the number of variables and their attribute levels increase, and PCs become more prevalent when working memory capacity is restricted (Eder, Fiedler, & Eder-Hamm, 2011). This tendency toward more PCs under more demanding task conditions points to a main advantage of PC inferences: parsimony. To make inferences using PCs, only the base rates of the effect and of all potential causes have to be assessed. Base-rate information can stem from experience gathered on different occasions. It is not necessary to wait for, or to remember, joint observations of an effect along with multivariate information on all relevant causal dimensions. Such a complete data matrix, as ideally assumed in statistics books, is hardly ever available in real life.

Page 9 of 20

Pseudocontingencies To illustrate, consider the task of evaluating four dichotomous variables as potential causal candidates of high achievement at school. The four zero-order contingencies al­ ready amount to coordinating, recording, and storing 16 types of observations (four for each contingency). Further, trying to avoid a Simpson’s paradox, taking into account each of these variables as moderators of all others’ influence, creates 24 contingencies (96 types of observations) to be handled (four candidates related on each level of the other three candidates). Beyond complexity that might prevent observations necessary for genuine contingency assessment to be categorized and stored accurately, the data might just not be available at the time of judgment. Trying to assess the genuine contingency for a novel causal can­ didate, for example, whether spending time outside before a class affects performance in class, might simply be impossible because relevant pupil-wise observations have not (yet) been gathered. However, the prevalence of pupils spending time outside, which might be inferred from a proxy such as weather, should be recorded and recalled quite automati­ cally (Hasher & Zacks, 1984). Using these readily available base rates, PCs seem de­ signed to enable inferences about novel relations. On a side note, the lack of individual-level data characterizes a recent research area in decisions sciences and marketing, Big Data analytics. The term usually refers to “digital traces” people leave when using smartphones or computers to share or search for infor­ mation or to buy products. The availability of big data has tempted many to draw infer­ ences relevant for decision-making. For example, one study addressed search behavior on Google (Preis, Moat, Stanley, & Bishop, 2012). Using Google Trends, it first assessed to what extent searches were likely to be “future oriented” on a national level (i.e., assess­ ing the ratio of nationwide search volumes for the terms including “2011” versus “2009” for searches in the year 2010). It turned out that the proportion of future searches corre­ lated .64 with national domestic product across 45 countries. The authors argue that this is in line with the idea of “differences in attention devoted to the future and the past, where a focus on the future supports economic success” (p. 2). This study illustrates how the aggregate nature of Big Data invites PC inferences. Big Da­ ta from services such as Google Trends or Twitter API comes aggregated into regional or national ecologies. This is also true for most government-based statistical services like Eurostat or the European Social Survey. At the same time, some available variables invite individual-level interpretation. The concept of “attention” when interpreting Google searches is a good illustration. However, adopting an individual-level interpretation for relations found in Big Data is nothing else than making (unwarranted) PC inferences.

Pseudocontingency as a Smart Heuristic Affording Valid Inferences Granting the resilience of PC inferences and the fact that opportunities for assessing gen­ uine contingencies are limited, it is still possible that PCs may often inform valid infer­ ences and therefore may be of adaptive value. Even though the validity of PC inferences

Page 10 of 20

Pseudocontingencies will always depend on the specific situation, there are (at least) two strong reasons for ex­ pecting valid PC inferences under many conditions. First, skewed base rates restrict the range of possible contingencies. If 80% of patients in a ward are on a vegetarian diet, and 80% are well, there cannot be a perfectly negative correlation between a vegetarian diet and being well. It is not possible that all patients with a vegetarian diet are not well. In this example, in fact, the strongest contingency contradicting the PC inference that vegetarian diet and being well coincide is –.25, even though a perfectly positive contingency of +1.00 is possible. All patients on a vegetarian diet can be well and all those on a vegan diet not so. Thus, skewed base rates rule out strong contingencies that go against PCs. PC inferences can thus be valid by capitalizing on a shifted mean of the distribution of all possible contingencies. By the same token, PCs prevent the most dramatic mistake of inferring a strong or close to deterministic relation in the wrong direction. Second, PC inferences might derive validity from the fact that strong contingencies be­ tween variables in the population give rise to aligned sample base rates. To illustrate, let us assume that samples of n = 10 are drawn from a population in which diet preference (vegetarian vs. vegan) and being well (high vs. low) are evenly distributed and perfectly correlated. Because there are no disconfirming instances, every sample of observations will necessarily reflect the perfect correlation (cf. Table 11.1). (p. 196)

Page 11 of 20

Pseudocontingencies Table 11.1 Frequency Tables for Samples of n = 10 Observations Drawn from a Population with a Perfect Contingency Between Diet and Being Well

Diet

Page 12 of 20

Sample 1

Sample 2

Sample 3

Being well

Being well

Being well

High

Low

High

Low

Vegetar­ ian

5

0

Vegan

0 5

High

Low

5

6

0

6

2

0

2

5

5

0

4

4

0

8

8

5

10

6

4

10

2

8

10

Yet, due to sampling error, sample base rates will deviate more or less from the even 50:50 base rates in the population (see Samples 1, 2, and 3 in Table 11.1). In the case of a perfect correlation, these accidentally skewed sample base rates must be perfectly aligned, such that the base rate of vegetarians increases and decreases to the same extent as the base rate of high well-being. A PC inference informed by these aligned base rates must therefore reflect the correct sign of the population correlation. However, crucially, this nice feature of the probabilistic world also holds for existing imperfect contingencies. Monte-Carlo simulations demonstrate that restricted samples drawn from a world with evenly distributed base rates and existing correlations stronger than r = .50 tend to be skewed in such a way that most PC inferences match the correct sign of the existing correlation (Kutzner, Vogel, Freytag, & Fiedler, 2011). Thus, sampling constraints ensure that PC inferences tend to be valid even in the absence of skewed base rates.

Future research should thus explore the validity of PC inferences for given task settings. Given the double fact that aligned base rates restrict the range of possible contingencies and that strong contingencies will produce aligned sample base rates, the validity of PC inferences may turn out to be surprisingly high in many areas of reality.
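The logic of such simulations is easy to reproduce. The sketch below is our own minimal version, not the published code: it assumes even population base rates with a built-in positive contingency, and it scores a sample as a positive PC only when both sample base rates are strictly skewed in the same direction:

import random

def pc_matches_population(cell_probs, n=10, trials=10000):
    """Share of size-n samples in which the PC inference (sign of the
    product of the two sample skews) is positive. cell_probs gives the
    joint probabilities of the four cells (CE, C~E, ~CE, ~C~E)."""
    cells = [(1, 1), (1, 0), (0, 1), (0, 0)]
    match = 0
    for _ in range(trials):
        sample = random.choices(cells, weights=cell_probs, k=n)
        skew_c = sum(c for c, e in sample) - n / 2
        skew_e = sum(e for c, e in sample) - n / 2
        match += skew_c * skew_e > 0     # aligned skews -> positive PC
    return match / trials

# Even 50:50 margins with a strong positive population contingency:
# cell probabilities .4, .1, .1, .4 correspond to phi = .6.
print(pc_matches_population([0.4, 0.1, 0.1, 0.4]))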

Conclusions

Ultimately, then, PC inferences constitute an ideal example of what Gigerenzer and colleagues (Gigerenzer & Goldstein, 1996; Gigerenzer & Todd, 1999) have called a fast and frugal heuristic that makes people smart. Like all heuristics, PCs produce erroneous results under certain conditions (i.e., when category base rates point in a direction opposite to the trends that exist within categories). On average, however, the PC heuristic often provides valid judgments. Moreover, PCs make do with very little information, and the conditions for estimating base rates are often met even when many joint frequencies in a complex contingency table are lacking, due to missing data or the impossibility of coordinating observations gathered on separate occasions.

PC inferences might be rationalized with reference to the accuracy and the adaptive value of judgments and decisions that rely on category base rates (Huttenlocher, Hedges, & Vevea, 2000; Kareev, Fiedler, & Avrahami, 2009; Olivola & Todorov, 2010). For instance, when judging the risk of a disease, an accident, or a fatal event for a particular person (e.g., oneself), a good default strategy is to rely on the relevant base rate in the category of all comparable people. For individual risks to deviate markedly from the risk base rates, it would be necessary to have strong evidence that the individual deviates from the category of other people on critical attribute dimensions. Very often, such strong diagnostic information is not available, or cannot be assessed with sufficient validity. It is then advisable to base estimations on category base rates.

PC inferences, then, can be conceived as a higher-order, relational use of base-rate estimates. The wisdom of base rates informs inferences from category-level probabilities P_Category to probabilities P_Individual(Category) of individuals nested within the category, for instance, from P(car accident among all people) to P(car accident involving myself). By analogy, a PC might be understood simply as a special case of a base-rate-driven inference in which P refers to a dyadic statistical relation rather than a monadic property.

However, the analogy is complicated by the following distinction. We are usually on safe ground in assuming unbiased estimates when inferring an individual person's attribute from an estimate of the same individual attribute among all individuals in a category. For instance, we can use the median individual income in a nation as a (p. 197) proxy for estimating the income of one individual in this nation. In contrast, one must be careful not to confuse category-level attributes with individual attributes. Although an individual's income can be inferred from the median of individual incomes in the entire nation, it cannot be inferred from the nation's income. Nations can be rich when most individual people are poor and vice versa, for good reasons. Therefore, going beyond monadic attributes to infer relations between variables based on category base rates, for example, between subjective well-being and a nation's wealth (Diener, Diener, & Diener, 1995), is possibly erroneous. More generally, whenever category-level attributes and individual attributes are subject to separate causal influences (e.g., tax law determining the national income vs. workload determining individual income), inferring contingencies between individual attributes from the contingencies between category base rates constitutes a category mistake. The reviewed PC research suggests, however, that human participants hardly distinguish between these fundamentally different cases; instead, they apply the PC tool uncritically in either case.

Fortunately, we have seen that base rates constrain the distribution of individual measures in such a way that most of the time PCs point in the same direction as individual-level contingencies. When all 10 students in a physics class are female and all are smart, the co-occurrence of female and smart must be perfect; counterevidence is simply impossible. But even when skewed base rates are imperfect, say, when 9 of 10 students are female and 9 of 10 are smart, there can be, at most, one smart male student contradicting the perfect relation. Even in this least favorable arrangement (8 smart females, 1 not-smart female, 1 smart male), the result is an only slightly negative correlation of –.11 contradicting the PC inference. Only when base rates approach equality does the PC heuristic lose its predictive value, but at the same time it becomes inapplicable, because the PC rule only applies to skewed distributions. Thus, the applicability of the PC heuristic increases with its diagnostic validity: a nice feature of adaptive cognition.

This suggests that the heuristic's domain is restricted to skewed worlds, as PC inferences cannot be applied to equal base rates. However, even when base rates are evenly distributed in the population, we have seen that small stimulus samples drawn from such a population are often skewed. It is intriguing to note that such skewness due to sampling error is likely to support accurate inferences. If the population correlation is positive (negative), the sampled distributions for the two attributes tend to be skewed in the same (opposite) direction. Thus, to the extent that sampling error invites PCs, these inferences tend to be "valid by sampling" (Kutzner, Vogel, Freytag, & Fiedler, 2011).


Still, there is a domain of equal base rates in which PCs are not applicable, and there is indeed no reason to believe that human beings should be unable to assess genuine contingencies (i.e., differences between conditional probabilities) in these situations. For instance, in operant conditioning, organisms can find out that p(reward | CS+) > p(reward | CS–), that is, that the conditional probability of receiving a reward is higher in the presence of one conditioned stimulus (CS+) than another (CS–). An intriguing finding by Kareev and Fiedler (2008) says that operant learning is only sensitive to the contingency (i.e., the conditional difference in the reward rate for CS+ and CS–) when the reward base rate is close to equality, p(reward) = p(no reward) = .50. To the extent that participants notice that one outcome is generally more likely (e.g., p(reward) = .75), they no longer wait for predictor information (CS+ or CS–) but go for the more likely outcome in an unconditional inference process.4 This evidence suggests that organisms may be quite well calibrated for the decision to rely on genuine contingencies. Further evidence by Kareev, Fiedler, and Avrahami (2009) indeed suggests that when people can influence the process of information search, they selectively sample equally distributed dichotomies to create optimal conditions for the assessment of causal contingencies.

There are, however, natural limits to the assessment of genuine contingencies, as the present chapter has shown. When it comes to assessing (causal) contingencies in a multivariate world, with many possible causal cues and many different effects to be explained, the data filling the full contingency table are too impoverished to enable genuine contingency estimation. Too many cells in such a complex observation design will be empty, and too many disjoint data that have been observed on separate occasions cannot be used, because it is unclear which observations of attribute X and which other observations of attributes Y, A, B, C, and so on, belong to the same individual. Moreover, assessing the full contingency matrix will exceed one's memory capacity. Under many reasonable task conditions, then, people may rely on the PC algorithm as a fast and frugal default, which is easily applicable and not at all demanding: all one needs to capture is the relative prevalence of high versus low values on each attribute (a schematic sketch of this contrast follows at the end of this section).

Demarcating the precise limits of PC inferences and their validity is a task for future research, as is the identification of the residual domain of genuine contingency assessment. In any case, however, the evidence presented here should be strong and robust enough to motivate a thorough revision of theories of causal inference that have traditionally relied on conditional probabilities assessed from joint frequencies. To the extent that future research confirms that this basic assumption has to be corrected because of a substantial (p. 198) role played by PCs, theories of causal inference would have to shift from a focus on inductive inferences from individual observations to a new focus on deductive inferences from superordinate category base rates. Such a revised perspective implies, for instance, that a teacher's causal explanation of student performance will depend on the prevalence of causal conditions in the class or reference group (Marsh, 1987). And when there is a competition between different causal conditions of student performance (e.g., gender, ethnic identity, socioeconomic status (SES), personality traits, time of day, subject matter), the revised perspective implies that causal judgments follow outstanding base rates at the category or class level, rather than isolated contingencies of specific causal factors with all other factors partialed out. If there is a strikingly high rate of low-SES children in a low-performing class, teachers will explain performance deficits in terms of SES. And if generally low achievement in physics comes along with a conspicuous majority of girls, the deficit will be attributed to gender, even when the boys in the class perform even worse than the girls. In hindsight, indeed, these implications of the PC perspective appear plausible and worth pursuing in future research on causal judgment.

More systematic research on the relationship between PCs and causal inference may reveal that PC effects do not merely influence and constrain causal inferences. Causality may conversely determine whether and when cognitive processes are not restricted to using base-rate-driven PC inferences, but instead construe a problem in terms of a genuine (causal) contingency. When an available causal model explains that (and why) positive or negative outcomes are more likely under some conditions than others (Oaksford & Chater, Chapter 19 in this volume; Rottman, Chapter 6 in this volume; Waldmann, 1996), then the human (or even the animal's) mind should have a better chance to form conditional representations such that genuine contingencies dominate PCs.
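As announced above, the contrast between the two inference modes can be made concrete with a small schematic sketch. This is our own illustration, not a model taken from the reviewed studies: the PC rule consumes only two marginal distributions, which may even stem from disjoint observations, whereas a genuine contingency estimate such as ΔP requires jointly observed cases.

```python
def pc_inference(xs, ys):
    """Pseudocontingency rule: infer the sign of the contingency from
    the two base rates alone. xs and ys are streams of 0/1 observations
    and need not come from the same individuals or occasions."""
    skew_x = sum(xs) / len(xs) - 0.5
    skew_y = sum(ys) / len(ys) - 0.5
    if skew_x == 0 or skew_y == 0:
        return None                      # rule not applicable
    return +1 if (skew_x > 0) == (skew_y > 0) else -1

def delta_p(pairs):
    """Genuine contingency: Delta-P = p(y=1 | x=1) - p(y=1 | x=0),
    computable only from jointly observed (x, y) pairs."""
    x1 = [y for x, y in pairs if x == 1]
    x0 = [y for x, y in pairs if x == 0]
    if not x1 or not x0:
        return None                      # a conditional is undefined
    return sum(x1) / len(x1) - sum(x0) / len(x0)

# A skewed world: mostly x = 1 and mostly y = 1, yet y is actually LESS
# likely when x = 1 -- the error case for the PC rule.
pairs = [(1, 1)] * 5 + [(1, 0)] * 3 + [(0, 1)] * 2
print(pc_inference([x for x, _ in pairs], [y for _, y in pairs]))  # +1
print(delta_p(pairs))                                              # -0.375
```

The example is deliberately constructed as the error case discussed above: the two base rates are aligned, so the PC rule infers a positive relation, while the jointly observed data yield a negative ΔP.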

Author's Note

The research underlying the present chapter was supported by a Koselleck grant from the Deutsche Forschungsgemeinschaft (Fi 294/23-1). Correspondence regarding this article should be addressed to [email protected].

References

Alloy, L., & Abramson, L. (1979). Judgment of contingency in depressed and nondepressed students: Sadder but wiser? Journal of Experimental Psychology: General, 108(4), 441–485. doi:10.1037/0096-3445.108.4.441

Baetu, I., & Baker, A. G. (2009). Human judgments of positive and negative causal chains. Journal of Experimental Psychology: Animal Behavior Processes, 35(2), 153–168. doi:10.1037/a0013764

Bourne, L. R., Ekstrand, B. R., & Montgomery, B. (1969). Concept learning as a function of the conceptual rule and the availability of positive and negative instances. Journal of Experimental Psychology, 82(3), 538–544. doi:10.1037/h0028358

Buehner, M. J., & May, J. (2003). Rethinking temporal contiguity and the judgement of causality: Effects of prior knowledge, experience, and reinforcement procedure. The Quarterly Journal of Experimental Psychology A: Human Experimental Psychology, 56A(5), 865–890.

Cheng, P. W. (1997). From covariation to causation: A causal power theory. Psychological Review, 104, 367–405.

Cheng, P. W., & Novick, L. R. (1990). A probabilistic contrast model of causal induction. Journal of Personality and Social Psychology, 58, 545–567.

Cheng, P. W., & Novick, L. R. (1992). Covariation in natural causal induction. Psychological Review, 99, 365–382.

Diener, E., Diener, M., & Diener, C. (1995). Factors predicting the subjective well-being of nations. Journal of Personality and Social Psychology, 69(5), 851. doi:10.1037/0022-3514.69.5.851

Eder, A. B., Fiedler, K., & Hamm-Eder, S. (2011). Illusory correlations revisited: The role of pseudocontingencies and working-memory capacity. The Quarterly Journal of Experimental Psychology, 64(3), 517–532. doi:10.1080/17470218.2010.509917 (p. 199)

Fiedler, K. (2010). Pseudocontingencies can override genuine contingencies between multiple cues. Psychonomic Bulletin & Review, 17(4), 504–509. doi:10.3758/PBR.17.4.504

Fiedler, K., Bluemke, M., & Unkelbach, C. (2011). On the adaptive flexibility of evaluative priming. Memory & Cognition, 39(4), 557–572. doi:10.3758/s13421-010-0056-x

Fiedler, K., & Freytag, P. (2004). Pseudocontingencies. Journal of Personality and Social Psychology, 87(4), 453–467. doi:10.1037/0022-3514.87.4.453

Fiedler, K., Freytag, P., & Meiser, T. (2009). Pseudocontingencies: An integrative account of an intriguing cognitive illusion. Psychological Review, 116(1), 187–206. doi:10.1037/a0014480

Fiedler, K., Freytag, P., & Unkelbach, C. (2007). Pseudocontingencies in a simulated classroom. Journal of Personality and Social Psychology, 92(4), 665–677. doi:10.1037/0022-3514.92.4.665

Fiedler, K., Kutzner, F., & Vogel, T. (2013). Pseudocontingencies: Logically unwarranted but smart inferences. Current Directions in Psychological Science, 22(4), 324–329. doi:10.1177/0963721413480171

Fiedler, K., Walther, E., Freytag, P., & Plessner, H. (2002). Judgment biases in a simulated classroom: A cognitive-environmental approach. Organizational Behavior and Human Decision Processes, 88(1), 527–561. doi:10.1006/obhd.2001.2981

Freytag, P., Bluemke, M., & Fiedler, K. (2011). An adaptive-learning approach to affect regulation: Strategic influences on evaluative priming. Cognition and Emotion, 25(3), 426–439. doi:10.1080/02699931.2010.537081

Gigerenzer, G. (1991). From tools to theories: A heuristic of discovery in cognitive psychology. Psychological Review, 98(2), 254–267.

Gigerenzer, G., & Goldstein, D. (1996). Reasoning the fast and frugal way: Models of bounded rationality. Psychological Review, 103(4), 650. doi:10.1037/0033-295X.103.4.650

Gigerenzer, G., & Todd, P. (1999). Evolution and cognition: Simple heuristics that make us smart. New York: Oxford University Press.

Hasher, L., & Zacks, R. (1984). Automatic processing of fundamental information: The case of frequency of occurrence. The American Psychologist, 39(12), 1372–1388. doi:10.1037/0003-066X.39.12.1372

Hattori, M., & Oaksford, M. (2007). Adaptive non-interventional heuristics for covariation detection in causal induction: Model comparison and rational analysis. Cognitive Science: A Multidisciplinary Journal, 31(5), 765–814. doi:10.1080/03640210701530755

Huttenlocher, J., Hedges, L., & Vevea, J. (2000). Why do categories affect stimulus judgment? Journal of Experimental Psychology: General, 129(2), 220–241. doi:10.1037/0096-3445.129.2.220

Kareev, Y., Fiedler, K., & Avrahami, J. (2009). Base rates, contingencies, and prediction behavior. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35(2), 371–380. doi:10.1037/a0014545

Klauer, K. C., & Musch, J. (2003). Affective priming: Findings and theories. In J. Musch & K. C. Klauer (Eds.), The psychology of evaluation: Affective processes in cognition and emotion (pp. 7–49). Mahwah, NJ: Lawrence Erlbaum Associates.

Klauer, K. C., Rossnagel, C., & Musch, J. (1997). List-context effects in evaluative priming. Journal of Experimental Psychology: Learning, Memory, and Cognition, 23(1), 246. doi:10.1037/0278-7393.23.1.246

Kutzner, F., Freytag, P., Vogel, T., & Fiedler, K. (2008). Base-rate neglect as a function of base rates in probabilistic contingency learning. Journal of the Experimental Analysis of Behavior, 90(1), 23–32. doi:10.1901/jeab.2008.90-23

Kutzner, F., Vogel, T., Freytag, P., & Fiedler, K. (2011). Contingency inferences driven by base rates: Valid by sampling. Judgment and Decision Making, 6(3), 211–221.

Lieberman, D. A., & Thomas, G. V. (1986). Marking, memory and superstition in the pigeon. The Quarterly Journal of Experimental Psychology B: Comparative and Physiological Psychology, 38(4-B), 449–459.

Mackie, J. L. (1974). The cement of the universe. Oxford: Oxford University Press.

Marsh, H. W. (1987). The big-fish-little-pond effect on academic self-concept. Journal of Educational Psychology, 79(3), 280–295. doi:10.1037/0022-0663.79.3.280

McCauley, C., & Stitt, C. L. (1978). An individual and quantitative measure of stereotypes. Journal of Personality and Social Psychology, 36(9), 929–940. doi:10.1037/0022-3514.36.9.929

McGarty, C., Haslam, S., Turner, J., & Oakes, P. (1993). Illusory correlation as accentuation of actual intercategory difference: Evidence for the effect with minimal stimulus information. European Journal of Social Psychology, 23(4), 391–410. doi:10.1002/ejsp.2420230406

Meiser, T., & Hewstone, M. (2004). Cognitive processes in stereotype formation: The role of correct contingency learning for biased group judgments. Journal of Personality and Social Psychology, 87(5), 599–614. doi:10.1037/0022-3514.87.5.599

Mill, J. S. (1872/1973). System of logic (8th ed.). In J. M. Robson (Ed.), Collected works of John Stuart Mill (Vols. 7 & 8). Toronto: University of Toronto Press.

Morse, W. H., & Skinner, B. F. (1958). Some factors involved in the stimulus control of operant behavior. Journal of the Experimental Analysis of Behavior, 1, 103–107. doi:10.1901/jeab.1958.1-103

Olivola, C. Y., & Todorov, A. (2010). Fooled by first impressions? Reexamining the diagnostic value of appearance-based inferences. Journal of Experimental Social Psychology, 46(2), 315–324. doi:10.1016/j.jesp.2009.12.002

Perales, J. C., & Shanks, D. R. (2007). Models of covariation-based causal judgment: A review and synthesis. Psychonomic Bulletin & Review, 14(4), 577–596. doi:10.3758/BF03196807

Preis, T., Moat, H. S., Stanley, H. E., & Bishop, S. R. (2012). Quantifying the advantage of looking forward. Scientific Reports, 2, 350. doi:10.1038/srep00350

Rescorla, R. A. (1967). Pavlovian conditioning and its proper control procedures. Psychological Review, 74(1), 71–80. doi:10.1037/h0024109

Robinson, W. (1950). Ecological correlations and the behavior of individuals. American Sociological Review, 15, 351–357. doi:10.1093/ije/dyn357

Schaller, M. (1994). The role of statistical reasoning in the formation, preservation and prevention of group stereotypes. British Journal of Social Psychology, 33(1), 47–61. doi:10.1111/j.2044-8309.1994.tb01010.x (p. 200)

Shanks, D. R. (1986). Selective attribution and the judgment of causality. Learning and Motivation, 17(4), 311–334. doi:10.1016/0023-9690(86)90001-9

Simpson, E. H. (1951). The interpretation of interaction in contingency tables. Journal of the Royal Statistical Society. Series B (Methodological), 13(2), 238–241.

Spellman, B. A. (1996). Acting as intuitive scientists: Contingency judgments are made while controlling for alternative potential causes. Psychological Science, 7, 337–342.

Spruyt, A., Hermans, D., De Houwer, J., Vandromme, H., & Eelen, P. (2007). On the nature of the affective priming effect: Effects of stimulus onset asynchrony and congruency proportion in naming and evaluative categorization. Memory & Cognition, 35(1), 95–106. doi:10.3758/BF03195946

Waldmann, M. R. (1996). Knowledge-based causal induction. In D. R. Shanks, K. Holyoak, & D. L. Medin (Eds.), Causal learning (pp. 47–88). San Diego: Academic Press.

Waldmann, M. R., & Hagmayer, Y. (2001). Estimating causal strength: The role of structural knowledge and processing effort. Cognition, 82(1), 27–58. doi:10.1016/S0010-0277(01)00141-X

White, P. A. (2001). Causal judgment from contingency information: Relation between subjective reports and individual tendencies in judgment. Memory & Cognition, 28, 415–426.

White, P. (2009). Property transmission: An explanatory account of the role of similarity information in causal inference. Psychological Bulletin, 135(5), 774–793. doi:10.1037/a0016970

Notes:

(1.) Note in passing that p(E | C) > p(E | ~C) implies that p(E | C) > p(E). Whenever the rate of E in the presence of C exceeds its rate in the absence of C, it also exceeds the base rate p(E) of E regardless of C conditions. This corollary may be relevant to understanding pseudocontingencies, the topic of the present chapter.

(2.) For instance, in a model suggested by White (2001), causal inferences are assumed to reflect the average match value of two dichotomous variables, estimated as (a+d)/(a+b+c+d). Or, in a model by Hattori and Oaksford (2007), the occurrences in cell d are completely neglected; contingencies are estimated as a/[(a+b)(a+c)]^.5.

(3.) The acronym PC, which we use here as in previous publications to denote pseudocontingencies, should be easy to distinguish from Cheng's (1997) "power PC" theory.

(4.) According to the ExpPA rule (expected prediction accuracy), contingencies are only utilized if there are more cases in the more prevalent diagonal than in the prevalent outcome column (cf. Kareev et al., 2009).

Klaus Fiedler

Department of Psychology, University of Heidelberg, Heidelberg, Germany

Florian Kutzner

Department of Psychology, University of Heidelberg, Heidelberg, Germany


Singular Causation

Singular Causation   David Danks The Oxford Handbook of Causal Reasoning Edited by Michael R. Waldmann Print Publication Date: Jun 2017 Subject: Psychology, Cognitive Psychology Online Publication Date: May 2017 DOI: 10.1093/oxfordhb/9780199399550.013.15

Abstract and Keywords

Causal relations between specific events are often critically important for learning, understanding, and reasoning about the world. This chapter examines both philosophical accounts of the nature of singular causation, and psychological theories of people's judgments and reasoning about singular causation. It explores the content of different classes of theories, many of which are based on either some type of physical process connecting cause and effect, or else some kind of difference-making (or counterfactual) impact of the cause on the effect. In addition, this chapter examines various theoretical similarities and differences, particularly between philosophical and psychological theories that appear superficially similar. One consistent theme that emerges in almost every account is the role of general causal relations in shaping human judgments and understandings about singular causation.

Keywords: singular causation, physical process, difference-making, reasoning, counterfactual, effect

Singular Versus General Causation

In many people, caffeine causes slight muscle tremors, particularly in their hands. In general, the Caffeine → Muscle Tremors causal connection is a noisy one: someone can drink coffee and experience no hand tremors, and there are many other factors that can lead to muscle tremors. Now suppose that Jane drinks several cups of coffee and then notices that her hands are trembling. An obvious question is, did this instance of coffee drinking cause this instance of hand trembling? Structurally similar questions arise throughout everyday life: Did this pressing of the "k" key cause this change in the pixels on the computer monitor? Did these episodes of smoking cause this lung cancer? Did this studying cause this test score? And so on. These questions all ask about singular causation in a particular situation, in contrast with general causation across multiple cases. They are thus particularly salient in situations in which we care about that specific case, as in many legal contexts, social interactions, physical explanations of anomalous events, and more.

Singular causation is cognitively challenging, as it is quite unclear how we could come to know that, say, this coffee drinking caused these muscle tremors. On the one hand, we cannot directly observe singular causation, as famously noted by David Hume (1748/2001), but instead observe only sequences of events. One might hope that so-called causal perception (see also White, Chapter 14 in this volume) could provide a way to directly observe singular causation, but there are reasons to suspect that causal perception is not just the straightforward observation of singular causation. In particular, the particular percepts must instead be understood or interpreted in light of background causal knowledge. On the other hand, we cannot use statistical or other inductive inference methods to learn directly about which singular causation relations obtain because we have (by definition) only a single case. There are many different methods for inferring (p. 202) causal structure from observations, but those all require multiple cases, so are not directly applicable for singular causation. Moreover, even if they were usable, the causal relations across multiple situations need not perfectly track the causal relations in any particular case. For example, it is perfectly consistent for caffeine to cause shaky hands, but for these particular muscle tremors to be caused by this dose of medication, rather than the coffee that one just consumed.

Judgments about singular causation require more than just knowledge of the general causal relations that obtain in some domain; at the least, we also need to know what events actually occurred in this specific case. That is, singular causation requires knowledge of (at least) both the general causation that applies in cases of this type, as well as details about this particular case (Hitchcock, 2012). In fact, as we see in the following section, "Singular Causation, Normatively," knowledge of singular causation may well require even more information (e.g., about particular physical relations, or default values). The key observation here is simply that we cannot divorce discussions of singular causation from general causation: the latter is critical for the former, even if only because those are the relations that can possibly combine in a singular fashion.

Determining singular causal relations is a kind of causal reasoning, though one with a very specific goal. More generally, we must recognize that pragmatic or contextual factors can often make a difference in how we think about singular causal relations. For example, if a lit match is dropped into a basket of paper, then it seems quite clear that the match is a singular cause of the paper burning. It is less clear how to think about the oxygen in the room; it is certainly causally relevant, but it also seems less central than the match, at least in a pragmatic sense. Those intuitions might well change in a different context: the presence of oxygen is more plausibly a singular cause of the fire if we are on the International Space Station. The normative and descriptive theories of singular causation discussed in this chapter all address this distinction, though they do so in quite different ways. The key here is simply that we do draw a distinction between causes and enabling or background factors, whether that distinction is based on our pragmatic interests or more objective features of the situation or context.


There is a long history of philosophical and normative inquiry into the nature of singular causation, and so this chapter begins there (the section "Singular Causation, Normatively"). The theories that have been proposed can be roughly divided into two different types, though both face significant questions and concerns. Those two types of philosophical theories appear to correspond nicely to two types of descriptive, psychological research on singular causation, though as we see in the section "Singular Causation, Descriptively," neither of the normative, philosophical theories has clear, unequivocal empirical support. Moreover, there has been relatively less experimental study of singular causation judgments, and so there are many aspects of singular causation judgments that we do not yet understand. The normative, philosophical accounts thus suggest avenues for potentially fruitful future experiments. I conclude the chapter, in the section "Open Problems and Challenges," with discussion of some key open problems and challenges.

Before beginning, a terminological note: the contrast between causation in a specific instance versus across multiple instances is also sometimes described as "token versus type causation" (the particular cases involve "tokens" of the relevant "types" across situations) or "actual versus potential causation" (in a particular situation, only some of the "potential" causal relations "actually" obtain). I will use the language of "singular versus general causation" in this chapter, but readers should be aware that these ideas and theories are also discussed under other names.

Singular Causation, Normatively

Normative theories of singular causation aim to provide an account of which factors were actually the causes of some event, rather than which factors are typically picked out by experimental participants. This focus presents a verification challenge, however: How can we test our theories of singular causation, if not against human responses? The basic strategy in this area is to compare the judgments of a proposed theory against solid, stable intuitions in critical test cases. We test our normative theories against our considered, reflective judgments about key situations, and reject those theories that fail to match on those key points. For example, if I smash a glass with a hammer (and there are no unusual factors at play), then any theory of singular causation should conclude that my hammer strike caused the breaking of the glass. In addition, we might argue that our theories should exhibit particular higher-level features; for example, we might require that only distinct, separate events can singularly cause one another (Menzies, 1996). In practice, however, testing against key examples is the dominant way of (p. 203) arguing for a particular normative theory of singular causation. Importantly, not just any intuitions count, but only those that are widely held, based on careful thought, and so forth. This strategy is analogous to one natural way (though not the only way) to develop a normative theory of formal logic. One would not want to build a theory of formal logic by considering everyday judgments about whether some argument is logically valid, particularly given what we know about the factors that influence people's judgments on these matters. Instead, our theory must conform to those inferences that we agree, upon careful reflection and consideration, are truth-preserving (e.g., modus ponens: the combination of the two premises "If A, then B" and "A" necessarily implies "B").

There are two dominant types of theories of singular causation: one based on physical processes that connect the cause with the effect (discussed in the subsection "Physical Process Approaches") and one based on the cause making a difference to the effect (the subsection "Difference-Making Approaches"). Roughly, the first type says that C is a singular cause of E just when there is an appropriate physical process connecting C and E, while the second type holds that singular causation depends on whether E would have been different if C had been different (in the right way, in the right possible world). Both types of theories naturally explain some, but not all, stable judgments, and much of the normative, philosophical research on singular causation has aimed to find better formalizations of the underlying intuitive ideas so that the theories can capture more judgments. Some of these elaborations are described in the next two subsections, but the details are arguably less important than the overall intuitions: either singular causes are those factors that physically produce the effect, or they are the events that made an actual difference in the effect.

It will be helpful to have a running example for the remainder of this chapter; I borrow one from Hall (2004), though many others would do. Suppose we have two children, Suzie and Billy, who are throwing rocks at bottles in an abandoned lot. Both Suzie and Billy are excellent throwers who are, for all practical purposes, perfectly accurate: if one of them throws a rock at a bottle, then the rock hits and shatters that bottle. That is, the general causal relations are "Suzie's throws cause bottles to break" and "Billy's throws cause bottles to break." Now suppose that, on one particular occasion, both Suzie and Billy throw rocks at the very same bottle, but that Suzie throws slightly earlier, so that her rock is the one that hits the bottle (and the bottle subsequently breaks). A strong, obvious intuition is that Suzie's throw is a singular cause of the bottle breaking, and Billy's throw is not. Any normative theory of singular causation should presumably capture this clear intuition, and we now see two different ways of doing so.

Physical Process Approaches

One natural view about singular causes is that they are just the events that actually physically influence the outcome. The lit match is a singular cause of the fire because the match actually physically transmits the necessary energy to the paper. Suzie's throw was a singular cause of the bottle breaking because it imparted more force than could be absorbed by the bottle structure. This intuition finds expression in physical process (normative) theories of singular causation. The general idea is that C is a singular cause of E just when C changes E through a physical/causal process connecting the two (see also Johnson & Ahn, Chapter 8 in this volume). Of course, we must provide some independent characterization of a "physical/causal process," or the account of singular causation will be viciously circular.

One of the first precise theories was Wesley Salmon's (1984) mark transmission theory. In this theory, the world is composed of processes that exhibit reasonably stable structure over some period of time, and so "marks" (i.e., structural modifications) made on those processes can persist over time. A particular causal interaction occurs when a mark from one causal process is transmitted to a different process. Singular causation more generally, then, consists of propagation and mark transmission in these processes. For example, Suzie's rock caused the bottle to break because the causal process corresponding to her thrown rock interacted with the causal process corresponding to the intact bottle to yield a "mark" on the bottle, as the bottle's structure was modified (to the "broken" state). In contrast, Billy's rock had no such interaction with the causal process of the bottle, and so it was not a singular cause of the bottle breaking, even though it would have broken the bottle if Suzie's rock had not hit first.

The mark transmission theory can explain many standard intuitions, but suffers from significant ambiguities in key notions (Kitcher, 1989). For example, processes are supposed to be those spatiotemporal regions ("world lines") that exhibit a degree of structural uniformity or consistency when not (p. 204) interacting with other processes. This restriction is necessary to ensure that processes are appropriately coherent; random spatiotemporal regions ought not be considered causal processes. The problem is that this requirement is arguably too strong: for example, there are many seeming causal processes that require constant interaction with a background of other processes (e.g., sound waves moving through a medium), and so fail to meet the condition that they continue in the absence of interactions with other processes. Mark transmission theory thus rules out factors that are clearly (intuitively) singular causes. At the same time, mark transmission theory is overly permissive, as standard ways of defining a "mark" imply that many processes that are not intuitively causal can nonetheless transmit a "mark." In particular, a "mark" cannot simply be a change in a property, since there are many property changes that are not true structure modifications. We need some way of precisely saying which changes in which properties count as "marks," and no satisfactory account has been offered.

These problems with the mark transmission theory suggest that we should instead ground our physical process theory of singular causation even more in our best theories of physics. Conserved quantity theories (Dowe, 1992, 2007; Salmon, 1997) do exactly that. At a high level, the intuition is that causal processes are those that have a conserved quantity (e.g., mass-energy, momentum) and causal interactions are exchanges of that conserved quantity. Suzie's rock, for example, has a certain mass-energy while in flight,1 and then transfers some of that mass-energy to the bottle, which leads to it breaking; Billy's rock is also a causal process with conserved mass-energy, but it does not interact similarly with the bottle, so is not a singular cause of the bottle breaking. Crucially, the conserved quantity theories all hold that which quantities are actually conserved is something that our best scientific theories aim to discover. The (singular) causal relations in the world are an objective matter; our best sciences help us learn which quantities actually are the conserved ones. And of course, it may be a quite difficult task to actually discover the physical processes that underlie some complex causal relation, particularly in sciences other than physics.
Singular Causation The obvious concern about physical process theories is that we seem to sometimes have a singular causal relation without any relevant process or exchange of a conserved quantity connecting the putative cause and effect. For example, if I fail to water my plants, then it certainly seems that the lack of water causes their death, even though there is no (rele­ vant) causal process between me and the plants. In fact, we think that the absence of such a process is exactly what causes the problem! More generally, many canonical cases of singular causation seem to involve the absence or removal of a causal factor; in these cases, there is no physical process between cause and effect, and thus no actual causa­ tion according to physical process theories (Schaffer, 2000). The standard reply is to ar­ gue that these cases correspond to “quasi-causation,” a relation that looks and behaves much like singular causation, but is not actually causation. More precisely, Dowe (2001) argues that quasi-causation is grounded in the truth of key counterfactuals about what would have happened if, for example, the missing causal factor had actually been present. For example, lack of water (quasi-)causes my plants’ death because of the truth of the counterfactual “if I had watered them, then the watering would have caused them to live,” where “caused” in this counterfactual is understood in the standard physical process sense. Quasi-causation relations are thus “causal” relations, but depend on coun­ terfactual causation rather than actual causation. Appeals to quasi-causation can help to explain many of our strong intuitions about singu­ lar causal relations in the world. At the same time, the use of quasi-causation comes at a cost for physical process theories. Part of the intuitive appeal of such theories is that they enable us to understand singular causation entirely in terms of the actual world. The only things that matter for these theories are the processes and interactions that actually oc­ cur (and are thereby observable, testable, and so on). In contrast with the difference-mak­ ing theories discussed in the next subsection, we do not need to consider what would have happened if the world had been different in certain ways. Quasi-causation does not have this appealing feature, however, as it depends critically on one or more counterfac­ tuals. Thus, the physical process theorist is in the difficult position of either (a) conclud­ ing that cases of prevention or causation by omission are not on par with other causal re­ lations; or (b) embracing counterfactuals and so losing an appealing feature of the theo­ ries (see also McGrath, 2005). Option (a) appears to rest largely on an intuition that omis­ sions and preventions are not “real” causes, but there is little (p. 205) theoretical justifica­ tion or experimental support for that intuition. Option (b) faces the dual challenges of ex­ plaining both which counterfactuals are relevant, and also their truth-conditions. Physical process theories are arguably ill-equipped to handle either challenge, in large part pre­ cisely because they are grounded in the actual world (Schaffer, 2001). If we are going to use counterfactuals in our analysis of singular causation, then we should perhaps instead start with ones about how the effect would be different if various factors had varied—that is, we should consider a difference-making approach.

Page 6 of 23

Singular Causation

Difference-Making Approaches The idea that the singular causes are those that made a difference to the effect dates back at least to Hume (1748/2001), though he is better known for his associationist un­ derstanding of causation. The fundamental challenge with grounding singular causation in difference-making is that any such analysis must necessarily be counterfactual in na­ ture. A factor makes a difference only if the world would have been different (in the right way) if the factor had been different, but we do not have direct access to the relevant counterfactual scenarios. For example, the claim “this hammer strike made a difference in this glass shattering” implies that the glass would not have shattered if the hammer had not struck it, but of course the hammer did strike it. Thus, any difference-making the­ ory of singular causation must depend on counterfactuals, which can be difficult to as­ sess. Moreover, the difference-making theory must also say which counterfactuals are rel­ evant. In the hammer/glass example, the relevant counterfactual is obvious, but other scenarios are much less clear. Consider again the case of the perfectly accurate Billy and Suzie throwing rocks at a bottle, where Suzie’s rock is the one that actually makes con­ tact. In this case, the “obvious” counterfactual—if Suzie had not thrown her rock, then the bottle would not have broken—turns out to be false (since Billy’s rock would have bro­ ken the bottle instead), even though (by assumption) Suzie’s throw is the singular cause of the bottle breaking. The relevant counterfactual needs to involve Billy, as Suzie’s differ­ ence-making can only be seen when he refrains from throwing his rock. The key chal­ lenge for a difference-making theory of singular causation is thus to explain which coun­ terfactuals ground the singular causal claim. David Lewis (1973, 2000) based his answer to this question on the notion of possible worlds. More specifically, the relevant counterfactual is determined by the closest possi­ ble world in which the potential singular cause C did not occur. C is an actual singular cause just when, in this closest possible world where C does not occur, the effect E does not occur, or occurs in a substantively different way (Lewis, 2000). In the Billy and Suzie case, for example, the closest possible world in which Suzie does not throw her rock is one in which Billy does, but his rock throw will lead to a substantively different bottle shattering than the one that actually happened. At the same time, the closest possible world in which Billy does not throw is one in which Suzie throws in the same way as the actual world, resulting in the same shattering. Thus, this analysis concludes (correctly) that Suzie’s throw is a singular cause of the bottle breaking, but Billy’s throw is not. At the same time, however, there are significant concerns about this way of understanding singular causation (e.g., Kvart, 2001). For example, this theory counterintuitively implies that any factor that changes the manner of the effect’s occurrence is a singular cause of the effect’s occurrence at all. For example, putting a bandage on someone’s wound can be a cause of her death if that action delays, and so changes the manner of, her death. More significantly, this theory requires some type of distance measure over possible worlds, in order to identify the appropriate grounding for the key counterfactual. 
No sat­ isfactory, substantive theory has been offered for such a measure, partly because Lewis aspired to provide a reductive account of singular causation, and so required that the dis­ tance measure not refer to any causal relations. One could instead base the distance mea­ Page 7 of 23

Singular Causation sure partly on general causal relations, but the distance measure then becomes dispos­ able. The more recent difference-making theories have pursued exactly the strategy of ground­ ing singular causation in general causal relations, informed by the actual events and (per­ haps) additional information (Hall, 2007; Halpern, 2016; Halpern & Hitchcock, 2015; Halpern & Pearl, 2005b; Hitchcock, 2007a; Weslake, in press; Woodward, 2003; Wood­ ward & Hitchcock, 2003). The shared intuition in all of these theories is that the general causal relations help to determine the relevant counterfactuals, though in a complicated manner. All of these theories have been expressed in the language of causal graphical models, so we need to have a brief detour to explain that formalism (see also Rottman, Chapter 6 in this volume). There are many introductions to graphical models (Pearl, 2000; Spirtes, Glymour, & Scheines, 1993; also, many of the previous references in this para­ graph), (p. 206) and so I focus here on the high-level picture. It is important to bear in mind that the causal graphical model represents only the general causal relations; both the actual events and something more (that varies between theories) are required to get singular causation. Causal graphical models are composed of two distinct, but related, components. The first is a directed acyclic graph that captures the qualitative causal relations (i.e., what causes what?). More specifically, we have a graph composed of nodes for the variables or events (e.g., nodes can take on different values depending on the actual state of the world), and an A → B connection just when A (the variable, or the occurrence of an event) is a general cause of B. For example, both Suzie’s throw and Billy’s throw are, in terms of general causal relations, causes of the target bottle breaking; they both have perfect accuracy, and so they always hit their intended targets. We can represent this qualitative causal structure as S → T ← B, where the nodes correspond to Suzie’s throw (S), Billy’s throw (B), and hitting the target (T). Absences of arrows are informative in the causal graphical model framework: for example, the lack of an S → B edge means that whether Suzie throws does not cause whether Billy throws. The second component of a causal graphical model captures the quantitative or functional (general) causal relations. This component can take many different forms, including linear or non-linear equations, or potentially complex conditional probabilities. For example, we can use T = S ∨ B (where “∨” denotes logical OR) to capture the idea that the bottle is broken if either Suzie or Billy (or both) throws. Most difference-making theories of singular causation use deterministic, quasilogical structural equations, but deterministic systems are used principally so that the rel­ evant counterfactuals are well-defined.2 The overall framework of causal graphical mod­ els (as representations of general causal relations) is perfectly well-defined for probabilis­ tic causal relations, whether those probabilities are due to ignorance or features of the physical situation. The two components of a causal graphical model must be connected together in a coher­ ent manner, typically through two assumptions. The causal Markov assumption says that every node is quantitatively independent of its non-effects (direct or indirect), conditional on its direct causes. In other words, once we know the value(s) of the direct cause(s) of Page 8 of 23

Singular Causation node X, learning the values of nodes that are not “downstream” of X does not give us any more information about X. More generally, the causal Markov assumption uses the quali­ tative graph to constrain the quantitative component. The causal faithfulness assumption (alternative, related assumptions are called Stability and Minimality) is essentially the converse of the causal Markov assumption: the only quantitative independences are those required by the causal Markov assumption. This assumption thus uses the quantitative component to constrain the qualitative graph. For example, causal faithfulness implies that any two nodes that are quantitatively independent, perhaps given knowledge of oth­ er nodes, must not be directly adjacent to one another. A key (in this context) implication of these two assumptions is that the quantitative component can be fully specified by giv­ ing the appropriate functional relation for each node in terms of its direct causes. For ex­ ample, if A → B → C, then the quantitative component can be expressed as A = f();3 B = g(A); and C = h(B).4 There are numerous philosophical debates about the status of these as­ sumptions (e.g., Cartwright, 2002, 2007; Glymour, 1999; Hausman & Woodward, 1999), but we leave those aside here. In general, the motivation for these assumptions is that they encode one important way that causation can manifest in observations and data in the world. Causal graphical models were introduced to capture general causal relations, and are particularly useful for modeling the population-level impact of manipulations, including the asymmetry of manipulation (Hausman, 1998): changing the state of a cause exoge­ nously (i.e., from outside of the causal system) leads probabilistically to changes in its ef­ fects, but an exogenous change in an effect does not lead to changes in its causes. For ex­ ample, if I change the state of a light switch, then that probabilistically leads to changes in the state of the lights; exogenously changing the state of the lights (e.g., by smashing the bulbs to guarantee that they are off) does not change the state of the switch. For con­ venience, I focus here on “hard” interventions that completely determine the value of the target of the intervention, though the formal framework can equally be used, with addi­ tional complications, to represent “soft” interventions that influence the target without completely controlling it (e.g., Eberhardt, 2014). Hard interventions are easily modeled in the causal graphical model framework. To see how it works, consider the simple Switch → Lights causal structure just mentioned. We represent a hard intervention on a target variable T by introducing a new cause I of T. For example, we (p. 207) might augment our causal structure with two specific interventions to yield Flip → Switch → Lights ← Smash. An intervention I has the special property that the value I = yes (i.e., the intervention being active) completely determines the value of T and so breaks or eliminates all other edges into T. If instead I = no, then all of the causal relations are left intact.5 In our example, if Flip = yes, then the Switch → Lights connection is preserved. In contrast, if Smash = yes, then the other incoming edge to Lights (i.e., Switch → Lights) is broken, since the state of the lights no longer causally de­ pends on the switch state. The asymmetry of manipulation thus emerges immediately from the (graphical) impact of interventions.


We can now return to singular causation. Recall the core counterfactual for difference-making accounts: C singularly caused E just if E would have been different if C (and perhaps other factors F, G, …) had been different. The causal graphical model framework can provide us with the resources to state the relevant counterfactuals more clearly. One intuitive idea is (roughly) that the key test counterfactuals arise from (1) changing C to different values by manipulation, while (2) possibly changing other variables that are not on a causal path from C to E in the underlying causal graph, but (3) preferring to leave these "off-path" variables at their actually occurring values if possible (for different ways of making this formally precise, see Halpern, 2016; Halpern & Pearl, 2005a, 2005b; Weslake, in press; Woodward, 2003; Woodward & Hitchcock, 2003). The "not on a path" restriction in (2) is important because we want to allow for A to be a singular cause of B even if its influence passes through the intermediate cause M; if M is held fixed, then B's value will not depend on A. We thus only allow ourselves to change off-path variables. Using this overall idea, we find that Suzie is a singular cause of the bottle breaking because T would have a different value (= 0, or unbroken) if (1) S is set to a different value (= 0, or no throw) by intervention, while (2) a variable not on the S → T path, namely B, is also set to a different value (= 0, or no throw). Of course, the same analysis also shows that Billy is a singular cause, which is (by assumption) simply false. We thus see the importance of representational choices in this framework: if we want to capture the idea that Suzie's rock arrived first, then we need to explicitly represent that possibility in the causal graph, perhaps by introducing SH and BH nodes to represent Suzie's or Billy's rock hitting. When we do this, these causal graphical model-based approaches give the intuitively correct judgments.

Nonetheless, these analyses fail to capture some intuitive judgments, precisely because they focus on the actual world rather than the "normal" or "regular" world. For example, suppose that I get the influenza vaccine, but then am never subsequently exposed to the influenza virus. Intuitively, it seems incorrect to say that the vaccine is a singular cause of my not being infected, as I was never exposed in the first place. But the above-referenced theories all say that it is a singular cause because of the truth of the test counterfactual: if Vaccine were different, then Infection would be different (in the world in which Exposure occurs). Moreover, this case is formally isomorphic to ones in which these theories give the correct answer, so we have to add additional information to distinguish them. One response would be to focus on variation and covariation within a pragmatically determined, focal set of cases (Cheng & Novick, 1991), which could yield different general causal relations for this particular context. One would need a rich theory of pragmatics to fully specify this account, however.

The more common response for normative theories has been to focus only on counterfactual possibilities that are more "normal" (or closer to the "defaults," or more "typical," or …) than the actual world (different ways of capturing this idea can be found in Hall, 2007; Halpern & Hitchcock, 2015; Hitchcock & Knobe, 2009; Livengood, 2013).
That is, the relevant difference-making counterfactuals for singular causation must involve changing atypical aspects to more typical ones. In the influenza case, this change means that we should not consider worlds in which Exposure occurs (assuming non-exposure is normal), and so the problematic counterfactual never arises. Instead, we get the intuitively correct judgment that the non-exposure is the singular cause, not the vaccine (Hall, 2007; Halpern & Hitchcock, 2015). Of course, we have to be very careful about exactly how we understand the notion of "normal" or "default" in these cases; in particular, there might be complicated multivariate patterns of normality (Livengood, 2013). Nonetheless, this adjustment can both capture our intuitions and incorporate a measure of pragmatics into our theory. On these theories, for example, oxygen is a singular cause of a fire on the International Space Station but not in my office, precisely because oxygen is abnormal in space but not in my office.
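To see how such a graph-based test runs on the enriched representation with hitting nodes, consider the following schematic sketch. It implements one simplified variant of the off-path test described earlier, loosely in the spirit of the Halpern and Pearl (2005b) style of definitions but not a faithful implementation of any single published account; the SH/BH equations encode the assumption that Suzie's rock arrives first.

```python
from itertools import chain, combinations

# Enriched model: SH/BH represent the rocks actually hitting; Suzie
# throws slightly earlier, so her hit preempts Billy's (BH depends on SH).
order = ["S", "B", "SH", "BH", "T"]
equations = {
    "SH": lambda v: v["S"],
    "BH": lambda v: v["B"] and not v["SH"],
    "T":  lambda v: v["SH"] or v["BH"],
}
off_path = {"S": ["B", "BH"], "B": ["S", "SH"]}  # not on the X -> T route

def evaluate(exogenous, interventions):
    values = dict(exogenous)
    for node in order:
        if node in interventions:
            values[node] = interventions[node]
        elif node in equations:
            values[node] = equations[node](values)
    return values

def singular_cause(x, exogenous):
    """Is x, at its actual value, a singular cause of the actual T?
    Simplified test: flip x by intervention while holding some subset
    of off-path variables at their actually occurring values."""
    actual = evaluate(exogenous, {})
    others = off_path[x]
    subsets = chain.from_iterable(
        combinations(others, k) for k in range(len(others) + 1))
    for w in subsets:
        held = {node: actual[node] for node in w}
        counterfactual = evaluate(exogenous, {x: not actual[x], **held})
        if counterfactual["T"] != actual["T"]:
            return True
    return False

both_throw = {"S": True, "B": True}
print("Suzie:", singular_cause("S", both_throw))  # True
print("Billy:", singular_cause("B", both_throw))  # False
```

Holding the off-path variable BH at its actually occurring value (Billy's rock did not hit) exposes Suzie's difference-making, while no admissible setting of the off-path variables does the same for Billy.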

Singular Causation should not consider worlds in which Exposure occurs (assuming non-exposure is normal), and so the problematic counterfactual never arises. Instead, we get the intuitively correct judgment that the non-exposure is the singular cause, not the vaccine (Hall, 2007; Halpern & Hitchcock, 2015). Of course, we have to be very careful about exactly how we understand the notion of “normal” or “default” in these cases; in particular, there might be complicated multivariate patterns of normality (Livengood, 2013). Nonetheless, this adjustment can both capture our intuitions and incorporate a measure of pragmatics into our theory. On these theories, for example, oxygen is a singular cause of a fire on the In­ ternational Space Station but not in my office precisely because oxygen is abnormal in space, but not in my office. These difference-making accounts of singular causation can readily explain the cases that are difficult for the physical process theories, precisely because these accounts have no requirement that there be any consistent process, or even any process at all, con­ necting the singular cause with the singular effect. In particular, causation by omission is completely straightforward, since absences can clearly make a difference; absence of oxy­ gen, for example, certainly makes a difference to one’s survival. More generally, we sim­ (p. 208)

ply need to ask, on these accounts, whether the causal factor being absent made a differ­ ence in the effect occurring. Difference-making accounts struggle, however, with overde­ termination cases since those situations involve multiple factors that could have made a difference, but only one that actually did make a difference. These accounts, for example, require specific representations or default states to capture Billy not being a cause of the bottle breaking. Nonetheless, these approaches have inspired a number of experimental studies, as we will see in section “Singular Causation, Descriptively.” Given the comple­ mentary strengths and weaknesses of the two types of approaches, there have been some preliminary investigations into reconciling process and difference-making theories (e.g., Woodward, 2011), though there are still many open questions about the viability of such unifications.
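To make the intervention-plus-freezing recipe concrete, the sketch below encodes the rock-throwing story as structural equations with the SH and BH ("rock hits") nodes just discussed. It is an illustrative simplification of the Halpern and Pearl (2005a) style of definition, not a full implementation: the function names are our own, and a proper definition would restrict freezing to variables off the candidate cause's path, a detail the brute-force search here glosses over.

```python
from itertools import product

def bottle(s, b, freeze_sh=None, freeze_bh=None):
    """Structural equations for the rock-throwing case. Passing freeze_sh or
    freeze_bh overrides the corresponding equation (i.e., intervenes on it)."""
    sh = s if freeze_sh is None else freeze_sh               # Suzie's rock hits iff she throws
    bh = (b and not sh) if freeze_bh is None else freeze_bh  # Billy's hits only if hers did not (hers arrives first)
    return 1 if (sh or bh) else 0                            # the bottle shatters if either rock hits

def is_singular_cause(thrower):
    """But-for test that may freeze the hit variables at their values in the
    actual world (S = B = 1, so SH = 1 and BH = 0)."""
    for freeze_sh, freeze_bh in product((None, 1), (None, 0)):
        s = 0 if thrower == "Suzie" else 1    # intervene: the candidate does not throw
        b = 0 if thrower == "Billy" else 1
        unchanged = bottle(1, 1, freeze_sh, freeze_bh)
        counterfactual = bottle(s, b, freeze_sh, freeze_bh)
        if unchanged == 1 and counterfactual == 0:
            return True
    return False

print(is_singular_cause("Suzie"))  # True: freeze BH at 0, un-throw Suzie, the bottle stays whole
print(is_singular_cause("Billy"))  # False: with Suzie still throwing, the bottle breaks anyway
```

Without the SH and BH nodes the same test would (wrongly) certify Billy as well, which is exactly the representational point made above.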

Worries About Both Approaches

Both types of normative theories of singular causation struggle to capture some of our intuitions, but there are more general concerns that arise for any of the currently proposed normative theories. I focus in this section on just three issues, one methodological and two substantive. The methodological worry derives from the practice of justifying a normative theory partly by demonstrating consistency with our intuitions about "important" test cases (e.g., Suzie and Billy). It is rarely explicitly stated, however, which cases should count as "important." If there are too few cases, then too many theories will be prima facie justifiable; every account gets the Suzie and Billy case right, for example. If we cast the net too broadly, though, then we end up with far too many cases to survey, even if we use various formal symmetries to reduce the number (Glymour et al., 2010). Instead, we need somehow to determine which test cases are truly important, and hope that there is the right number of them. No systematic position about the test cases has yet been provided, however.

The first substantive worry arises in the context of voting cases. The normative theories all yield the correct prediction for simple two-option elections, but this correctness is very fragile. As just one example (but see Livengood, 2013, for many more), suppose there are three options on the ballot and the option with the plurality (not necessarily majority) of votes wins. Suppose we have ten votes, where seven vote for option #1, two for option #2, and one for option #3. Intuitively, and on all of the normative theories (or at least, the difference-making theories; the physical process theories are less clear on these cases), the seven votes for option #1 are singular causes of that option winning. Much less intuitively, these theories say that the votes for options #2 and #3 are also singular causes of option #1 winning! The basic idea is that those votes being distributed as they are (in conjunction with some of the other votes) led to option #1 having the most votes (see Livengood, 2013, for proofs). Thus, they are held to be singular causes, which seems quite strange. It is unclear whether voting scenarios count as "important" test cases, or even what our exact intuitions are for complex voting cases (Glymour et al., 2010). Nonetheless, these cases reveal a significant shortcoming of the different normative theories.

The other substantive concern is arguably partially responsible for this shortcoming: namely, these normative theories do not understand singular causation as being truly contrastive. That is, they all ask "is C a singular cause of E?" rather than asking "is C (rather than C*) a singular cause of E (rather than E*)?" There are, however, several different lines of argument that all suggest that singular causation is fundamentally contrastive in nature (Hitchcock, 2007b; Livengood, 2013; Northcott, 2008; Schaffer, 2005). For the most part, these arguments all depend on demonstrations that whether C is a singular cause of E sometimes depends on the possible alternatives, either for C or E. For example, suppose I drink five cups of coffee and so have some muscle tremors. It seems natural to say that the coffee is a cause of the tremors, but that response is (according to proponents of contrastive accounts) based partly on our understanding that "drinking no coffee" is the natural contrast alternative. If we instead consider the contrast of drinking eight cups of coffee, then it is much less clear what to say. A natural response is that it was simply drinking "too much" coffee that is the singular cause, rather than any particular number of cups, but the normative theories can yield this response only if they are very careful about exactly how they represent the situation. Relatedly, these theories include little information about the dynamics of the situation, but it seems that the (singular) cause is often thought to be the factor or factors that changed, where those changes naturally suggest exactly the contrast information that is required on these contrastive accounts (Glymour et al., 2010). Unfortunately, these normative theories typically fail to give clear answers or guidance in these types of situations.
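Returning to the voting example, the brute-force sketch below shows the flavor of the result: under a difference-making test that permits contingencies (re-settings of the other ballots under which option #1 still wins), every ballot, including the two votes for option #2 and the one vote for option #3, qualifies as a singular cause of option #1 winning. The tie-breaking rule (ties go to the lowest-numbered option) is our own assumption, and the test omits some clauses of the full definitions Livengood (2013) analyzes, so this is a demonstration of the phenomenon rather than a reconstruction of the proofs.

```python
from itertools import product

OPTIONS = (1, 2, 3)
ACTUAL = [1] * 7 + [2, 2] + [3]   # ten ballots: seven for #1, two for #2, one for #3

def winner(ballots):
    tallies = {o: ballots.count(o) for o in OPTIONS}
    best = max(tallies.values())
    return min(o for o in OPTIONS if tallies[o] == best)  # assumed tie-break: lowest option wins

def is_singular_cause_of_win(i):
    """Does some re-setting of the other ballots leave option #1 winning,
    while changing ballot i alone then defeats option #1?"""
    others = [j for j in range(len(ACTUAL)) if j != i]
    for contingency in product(OPTIONS, repeat=len(others)):
        world = list(ACTUAL)
        for j, v in zip(others, contingency):
            world[j] = v
        if winner(world) != 1:
            continue                  # option #1 must still win under the contingency
        for alt in OPTIONS:
            world[i] = alt
            if winner(world) != 1:
                return True           # flipping ballot i alone defeats option #1
        world[i] = ACTUAL[i]
    return False

print([is_singular_cause_of_win(i) for i in range(len(ACTUAL))])
# [True, True, True, True, True, True, True, True, True, True]
```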

Singular Causation, Descriptively

The different theories outlined in the previous section all aim to characterize the actual singular causation relations in the world. Our singular causation judgments presumably track this relation to some extent, but those judgments could easily diverge in particular cases, or in the content of the judgment. In general, empirical research on human singular causation judgments has been relatively independent from the normative theories; tighter integration of the two lines of research is an important challenge moving forward, as each would arguably benefit from the insights of the other. In particular, much of the empirical research blurs together judgments about multiple notions—for example, singular causation, moral responsibility, legal responsibility, and emotional reactions such as blaming—and does not carefully distinguish physical process and difference-making considerations in the experimental stimuli (with notable exceptions, of course). We thus need to be careful about exactly what conclusions we draw from particular findings of significant effects.

There are two broad types of empirical research that are informative about singular causation judgments. First, there are many experiments in which people make judgments about particular causal relations after watching a video or other perceptual sequence. Judgments such as "this collision caused that ball to move" are clearly singular in nature, and seem to emerge relatively automatically from our perceptual inputs. Many of these experiments are in the Michottean tradition, but some experiments requiring inferences about forces also involve these types of singular causation judgments. These two empirical literatures are extensively covered in White (Chapter 14 in this volume) and Wolff and Thorstad (Chapter 9 in this volume), and so I simply refer the reader to those chapters for descriptions and citations of the experiments. The key conclusion for our present purposes is that these experiments provide significant, but not unequivocal, support for physical process-style theories of singular causation. In particular, people's causal perceptions seem to depend on whether a continuous physical process connects the different components. At the same time, such a process seems to be sometimes inferred or imputed, arguably on the basis of difference-making features. Causal perception and force dynamics experiments have largely not systematically pitted difference-making judgments against physical process judgments to see which (if either) is driving singular causation judgments in this domain (though see Schlottmann & Shanks, 1992). At the current time, we can only conclude that some singular causation judgments seem to involve physical process considerations, though those processes need not be the sole basis of those judgments.

The second line of descriptive research is largely vignette-driven: experimental participants are provided with a story or description and are asked to judge the singular causal relations, either by identifying "what caused what" or with numeric judgments of "how much" one factor caused the target effect (see also Hilton, Chapter 32 in this volume; Lagnado & Gerstenberg, Chapter 29 in this volume). These experiments (almost) all ask for explicit conscious judgments about linguistically described situations, and the primary experimental design involves between-participant manipulations of various features of the described situation. Walsh and Sloman (2011) directly asked participants for such singular causation judgments in a number of standard cases from the philosophical literature, including ones that are structurally identical to Suzie and Billy throwing rocks. They were particularly interested in judgments of both singular causation and singular prevention of some outcome. Their results suggest that people typically use the word "cause" only when they have knowledge of some underlying physical process or mechanism, while "prevent" is often grounded in knowledge of difference-making. This suggests that "cause" and "prevent" might not be antonyms in everyday usage. At the same time, one striking feature of Walsh and Sloman (2011) is the high degree of variation in the results. Even in the seemingly straightforward Suzie/Billy case when Suzie's rock hits the bottle first (but Billy's would have hit it, if she had missed), 16% of participants do not agree that Suzie caused the bottle to break (see note 6). More generally, there is non-trivial variation in the data for almost all of the experiments discussed throughout this section, so one must be careful not to over-interpret the results.

Lombrozo (2010) found a somewhat different pattern of singular causation judgments about typical philosophical cases, particularly those involving so-called double prevention. In a double prevention case, some factor M would normally prevent E from occurring, but a different factor C prevents M from occurring. Thus, if C occurs, then E also does. The key question is whether some particular C is a singular cause of some particular E, when this C prevents any Ms from occurring. Double prevention cases clearly separate physical process and difference-making theories: there is no connection between C and E, so the former will judge C to not be a singular cause; in contrast, C made a difference for E, and so the latter will imply that C is a singular cause. Interestingly, people do not conform neatly to either theory, but rather attend to additional features of the situation. In particular, if the occurrence of C has the function of preventing Ms so that E can occur (e.g., if someone intentionally does C to bring about E, or a machine is designed so that C leads indirectly to E), then people typically judge C to be a singular cause of E. If the C–E connection is instead accidental, then people typically judge C to not be a singular cause of E (Lombrozo, 2010). People's singular causation judgments are influenced by the reason why C occurs, and not simply whether C occurs at all.
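The double prevention structure driving these cases can be written as a two-equation model. The following minimal sketch (with invented variable names) makes the theoretical split explicit: E counterfactually depends on C even though no physical process connects them.

```python
def double_prevention(C):
    M = 0 if C else 1   # M occurs unless C prevents it
    E = 0 if M else 1   # E occurs unless M prevents it
    return E

print(double_prevention(C=1))  # 1: with C present, M is blocked and E occurs
print(double_prevention(C=0))  # 0: without C, M occurs and prevents E
```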

Other vignette-based experiments have focused less on the underlying causal structures, and more on the relationship between singular causation judgments and other types of judgments, particularly those of moral responsibility or norm violations more generally. As a concrete example, consider the widely studied "Pen Case" from Knobe and Fraser (2008). In this vignette, an administrative assistant is unable to write down an important message because the last two pens have been taken from the storage location by a professor and a staff member. The between-participant manipulation is which individuals are allowed to take a pen—the professor, the staff member, both, or neither—and participants are asked to indicate the extent to which each individual caused (or is a cause of) the inability to write down the message. Importantly, the non-social norm facts are balanced so that there is no purely physical reason to think one individual is more of a cause. Nonetheless, the standard finding is that the individual(s) who is not supposed to take a pen is judged to be more of a (singular) cause of the problem, as well as being more blameworthy. That is, singular causal judgments seem to be partially driven by moral responsibility or norm violation judgments (e.g., Alicke, 1992; Alicke, Rose, & Bloom, 2011; Hitchcock & Knobe, 2009; Kominsky, Phillips, Gerstenberg, Lagnado, & Knobe, 2015), though general causal judgments are interestingly less influenced by such considerations (Danks, Rose, & Machery, 2014). At the same time, the influence also seems to go in the other direction: for example, causal judgments can influence moral culpability judgments (Cushman, Knobe, & Sinnott-Armstrong, 2008).

At a high level, the results of these types of vignette-based experiments are largely consonant with the more sophisticated difference-making accounts of singular causation. That is, people's singular causal judgments seem to be sensitive to the truth of particular focal counterfactuals that can be derived from (a) causal graphical model representations of the general causal relations, and (b) facts about the specific situation, including defaults or "normal" values. We have independent grounds for thinking that causal graphical models provide a good model of human causal knowledge (e.g., Danks, 2014; Holyoak & Cheng, 2011; Rottman, Chapter 6 in this volume), and so much of the focus of this research has been on the details of (b), and particularly on the role of the norms—statistical, social, conventional, or moral—present in the actual situation.

Early work in this area focused on the influence of prescriptive norms that say what one ought to do, or how something ought to function. For example, someone acting illegally is judged to be more of a (singular) cause of some bad outcome than an individual acting legally (Alicke, 1992). The relevant norms need not be legal ones, though, as shown by the Pen Case: the singular causation judgments in that case track violations of the prescriptive norm, even though that norm is grounded in social and institutional facts, rather than legal ones. More generally, a number of different experiments have shown that a violation of a prescriptive norm is consistently judged to be more of a singular cause than the exact same action when no prescriptive norm is being violated (e.g., Alicke, 1992; Alicke et al., 2011; Cushman et al., 2008; Hitchcock & Knobe, 2009; Knobe & Fraser, 2008). There is even suggestive evidence that singular causation judgments are influenced by violations of typicality norms that indicate what is statistically normal in a population (Hitchcock & Knobe, 2009; though see Sytsma, Livengood, & Rose, 2012). Moreover, these singular causal judgments are sensitive not just to whether the action violated a norm, but also to whether other relevant actions or events violated a norm. In particular, the causal responsibility of a norm-violating event can be mitigated when another norm-violating event "supersedes" the former (Kominsky et al., 2015).

One challenge in interpreting these experiments is that there is often no clear understanding of what additional information is carried by some prescriptive or typicality norm, or by a violation of that norm. Suppose, for example, that the Pen Case norm is that professors are not supposed to take pens (and so the professor is judged to be more of a singular cause). Given only knowledge of the existence of this norm, one could potentially, but not necessarily, also infer that probably (a) in the past, there have been problems when professors took pens; (b) in the past, there have not been problems when staff members took pens; (c) professors only take pens when they have a reason that overrides the norm; (d) professors usually do not take pens; (e) staff members take pens more often than professors; (f) there is common knowledge of any of (a)–(e) within the department; and perhaps many other implications. That is, if we learn that some action is a norm violation, we often learn more than just that there is a norm; we also potentially learn many additional facts about the relevant causal structures, statistics, and possible actions. It is quite clear that norm violation information influences singular causation judgments, but the how and why of that influence is largely unknown. Further research carefully disentangling these different pathways is a significant open research problem.

Social psychology research on attribution theory (e.g., Heider, 1958; Hilton, Chapter 32 in this volume; Kelley, 1973; Nisbett & Ross, 1991) provides an additional set of empirical results that have arguably been underutilized in the study of singular causation judgments. Attribution theory examines the (causal) explanations that people provide to explain their own or others' behaviors, particularly focusing on whether those explanations appeal to factors that are internal or external to the agent. For example, if I am late to a meeting, is that explained in terms of some internal disposition ("David always loses track of the time") or external circumstances ("The bus that David takes to campus arrived late")? The most famous result—the so-called fundamental attribution error (Jones & Harris, 1967)—was that people (or at least, Western-educated undergraduate students) tend to emphasize internal factors when explaining others' actions, but external factors when explaining their own. As with almost all "classic" findings, the story is significantly more complicated than the usual presentations of the fundamental attribution error (Malle, Knobe, & Nelson, 2007; Norenzayan, Choi, & Nisbett, 2002). Nonetheless, this overall area of social psychological research is clearly relevant, as people are making singular causation judgments, perhaps implicitly, in the course of constructing particular causal explanations. There has been some crossover between these literatures, but not yet a systematic integration of the two (see also Hilton, Chapter 32 in this volume).

For all of these experimental results, it is also important to bear in mind the potential limits of vignette-based research. There is a long history of experiments in judgment and decision-making demonstrating that people behave differently if information is presented as a (textual) story—"learning from description"—rather than as cases or other less-linguistic stimuli—"learning from experience" (Barbey & Sloman, 2007; Erev et al., 2010; Gigerenzer & Hoffrage, 1995; Hau, Pleskac, Kiefer, & Hertwig, 2008; Hertwig & Erev, 2009). The general pattern of findings about these two modes of learning is that estimation and reasoning are more accurate given learning from experience (rather than learning from description), particularly in domains such as contingency learning and choice under uncertainty. In particular, people's judgments seem to be less subject to factors that are seemingly irrelevant to those tasks, such as salience or representativeness of particular stimuli. The exact mechanisms underlying these differences are currently an open research question, so it is unclear whether similar behavior should arise in singular causal judgment. At the same time, people's singular causation judgments given more naturalistic stimuli could plausibly be quite different from the results reported here. For example, normative considerations could conceivably play less of a role if people learn from experience rather than from a vignette (see also Danks et al., 2014).

Finally, there is an interesting feature that emerges from all of these lines of empirical research, and that calls into question whether the modifier "singular" is appropriate. The use of that modifier suggests that the judgment is specific to this particular, unique situation, rather than applying more generally. However, we consistently find that people use "singular" causation judgments, whether based in perception, vignettes, or prior knowledge, to make inferences about other cases. That is, people seem to regard judgments of singular causation as "portable," in the sense that they are informative about, and can carry over to, novel situations (Danks, 2013; Hitchcock, 2012; Lombrozo, 2010). As just one example, perceptual judgments about collisions between balls—that is, paradigmatic cases of causal perception—provide generalizable information about future collisions, such as the relative weights of the balls (White, 2009). In some ways, the exportability of singular causation judgments is unsurprising: there would be little reason to make such judgments if they were completely uninformative about future situations. Nonetheless, this feature has been largely ignored in normative theories of singular causation (though see Hitchcock, 2012). Despite the name, singular causation judgments are not fully unique and particularized, but rather seem to be connected closely with our abilities to act, predict, and explain in future, novel situations.

Open Problems and Challenges

There has been substantial research on singular causation—both psychologically and philosophically—but many open questions remain. Perhaps the most obvious open challenge is that, despite the experimental results discussed in the section "Singular Causation, Descriptively," the relevant empirical phenomena and underlying cognitive processes have been only partially characterized. Some situational and psychological influences on singular causation judgments have been identified, but it is unlikely that these form a complete set. Moreover, very little is currently known about the cognitive processes by which these factors influence those judgments. For example, it seems quite likely that blame judgments (or other morally negative appraisals) influence causal judgments, but there are many possible routes by which they could come to have such impact. Consider three possible, not mutually exclusive, mechanisms inspired by different theoretical proposals (for singular causation) currently in the literature: (1) morally negative appraisals (or other judgments of norm violation) trigger counterfactual thinking, which prompts particular singular causation judgments (Knobe, 2009; Kominsky et al., 2015); (2) people want (perhaps unconsciously) to justify their negative reactions, and so they judge the targets of those reactions as more causal, since one can only blame or criticize something that was a cause (Alicke, 1992); and (3) morally wrong behavior or other norm-violating factors are those one most wants to change, and singular causation judgments carry information about which factors are "good" candidates for intervention (Hitchcock & Knobe, 2009). These three possible mechanisms presumably can be experimentally distinguished if we expand our methods beyond vignette studies to include, for example, eye-tracking and reaction-time studies. The relevant experiments have not yet been performed, however, and so the underlying cognitive mechanisms remain an open question. On the normative, philosophical side, theories of singular causation remain an active topic of research interest, with a particular emphasis on developing accounts that are better grounded in both formal theories and established intuitions.

Another significant set of open problems centers on the relationships between singular causation judgments and causal explanations (see also Lombrozo & Vasilyeva, Chapter 22 in this volume). In many cases, singular causal judgments are made partly to provide key premises in a causal explanation of some particular event. For example, if I am trying to explain why there is broken glass on the ground, I might appeal to the fact that "Suzie's rock caused the bottle to break." That is, the singular causation judgments do not solely describe the world, but also play an important role in our causal explanations. Thus, to the extent that our causal explanations are not simply lists of possibly relevant facts, we should expect our singular causation judgments to potentially bear some hallmarks of that function. Lombrozo (2010) explored this connection between explanation and singular causation judgments, and found that those latter judgments did seem to be sensitive to their subsequent use in causal explanations. In particular, just as causal explanations generalize to similar situations, singular causation judgments were shaped by their generalizability to future cases (see also Ahn & Bailenson, 1996). At the same time, a singular causation judgment is obviously not the same as a causal explanation, and there appear to be some systematic divergences between judgments and explanations (Livengood & Machery, 2007). The exact nature of those differences is a substantial open problem.

A final significant challenge is the lurking possibility of pluralism about singular causation, either psychologically or philosophically. An implicit assumption of this whole chapter has been that there is some single relation—whether in the world or in our minds—that corresponds to singular causation. The concern is that this assumption might be false: perhaps singular causation corresponds to many different relations depending on domain, background knowledge, and so forth. That is, perhaps there is no single description that picks out the singular causes in either the world or our judgments. A number of philosophers (e.g., Hall, 2004) have suggested that there are two different types of singular causation (either relations or judgments): one type based on physical processes, and one based on difference-making. Of course, in the actual world, these two types of singular causation typically proceed in unison: if there is a physical process connecting C with E, then C will make a difference in E, and vice versa. Nonetheless, these are distinct relations, and the pluralists argue that there is no single relation that corresponds with singular causation, again either in the world or in our judgments. These arguments leave open the possibility that some future account will successfully unify these different approaches; Wolff (2014), for example, argues that force dynamics models provide such a unification (see also Wolff and Thorstad, Chapter 9 in this volume). Despite the implicit assumption in most of this chapter of monism about singular causation, the precise taxonomy of singular causation—one versus many—remains a significant open challenge.

It is clear that many different cognitive operations depend critically on judgments of, and reasoning about, singular causation. An understanding of general causal structure does not suffice for causal explanations of particular events, or rich counterfactual reasoning about a particular case, or assignment of blame for particular outcomes. Instead, we need to use additional information, whether about physical processes connecting parts of the causal structure, or about which factors (counterfactually) made a difference to the target event. This area of causal cognition is somewhat unusual, as our normative, philosophical understanding is arguably more advanced than our empirical, psychological understanding. Increased interaction between the two approaches can thus only help to advance our knowledge of this area.

References

Ahn, W.-K., & Bailenson, J. (1996). Causal attribution as a search for underlying mechanisms: An explanation of the conjunction fallacy and the discounting principle. Cognitive Psychology, 31, 82–123.

Alicke, M. (1992). Culpable causation. Journal of Personality and Social Psychology, 36, 368–378.

Alicke, M., Rose, D., & Bloom, D. (2011). Causation, norm violation, and culpable control. Journal of Philosophy, 108(12), 670–696.

Barbey, A. K., & Sloman, S. A. (2007). Base-rate respect: From ecological rationality to dual processes. Behavioral and Brain Sciences, 30, 241–297.

Cartwright, N. (2002). Against modularity, the causal Markov condition, and any link between the two: Comments on Hausman and Woodward. The British Journal for the Philosophy of Science, 53, 411–453.

Cartwright, N. (2007). Hunting causes and using them: Approaches in philosophy and economics. Cambridge: Cambridge University Press.

Cheng, P. W., & Novick, L. R. (1991). Causes versus enabling conditions. Cognition, 40, 83–120.

Cushman, F., Knobe, J., & Sinnott-Armstrong, W. (2008). Moral appraisals affect doing/allowing judgments. Cognition, 108, 281–289.

Danks, D. (2013). Functions and cognitive bases for the concept of actual causation. Erkenntnis, 78, 111–128.

Danks, D. (2014). Unifying the mind: Cognitive representations as graphical models. Cambridge, MA: MIT Press.

Danks, D., Rose, D., & Machery, E. (2014). Demoralizing causation. Philosophical Studies, 171(2), 251–277.

Dowe, P. (1992). Wesley Salmon's process theory of causality and the conserved quantity theory. Philosophy of Science, 59, 195–216.

Dowe, P. (2001). A counterfactual theory of prevention and "causation" by omission. Australasian Journal of Philosophy, 79, 216–226.

Dowe, P. (2007). Physical causation. Cambridge: Cambridge University Press.

Eberhardt, F. (2014). Direct causes and the trouble with soft interventions. Erkenntnis, 79(4), 755–777.

Erev, I., Ert, E., Roth, A. E., Haruvy, E., Herzog, S. M., Hau, R., et al. (2010). A choice prediction competition: Choices from experience and from description. Journal of Behavioral Decision Making, 23(1), 15–47.

Gigerenzer, G., & Hoffrage, U. (1995). How to improve Bayesian reasoning without instruction: Frequency formats. Psychological Review, 102, 684–704.

Glymour, C. (1999). Rabbit hunting. Synthese, 121, 55–78.

Glymour, C., Danks, D., Glymour, B., Eberhardt, F., Ramsey, J., Scheines, R., et al. (2010). Actual causation: A stone soup essay. Synthese, 175(2), 169–192.

Hall, N. (2004). Two concepts of causation. In J. Collins, N. Hall, & L. A. Paul (Eds.), Causation and counterfactuals (pp. 225–276). Cambridge, MA: MIT Press.

Hall, N. (2007). Structural equations and causation. Philosophical Studies, 132(1), 109–136.

Halpern, J. Y. (2016). Actual causality. Cambridge, MA: MIT Press.

Halpern, J. Y., & Hitchcock, C. (2015). Graded causation and defaults. British Journal for the Philosophy of Science, 66(2), 413–457. doi:10.1093/bjps/axt050

Halpern, J. Y., & Pearl, J. (2005a). Causes and explanations: A structural-model approach, Part I: Causes. The British Journal for the Philosophy of Science, 56, 853–887.

Halpern, J. Y., & Pearl, J. (2005b). Causes and explanations: A structural-model approach, Part II: Explanations. The British Journal for the Philosophy of Science, 56, 889–911.

Hau, R., Pleskac, T. J., Kiefer, J., & Hertwig, R. (2008). The description-experience gap in risky choice: The role of sample size and experienced probabilities. Journal of Behavioral Decision Making, 21(5), 493–518.

Hausman, D. M. (1998). Causal asymmetries. Cambridge: Cambridge University Press.

Hausman, D. M., & Woodward, J. (1999). Independence, invariance and the causal Markov condition. The British Journal for the Philosophy of Science, 50, 521–583.

Heider, F. (1958). The psychology of interpersonal relations. New York: Wiley and Sons.

Hertwig, R., & Erev, I. (2009). The description-experience gap in risky choice. Trends in Cognitive Sciences, 13, 517–523.

Hitchcock, C. (2007a). Prevention, preemption, and the principle of sufficient reason. The Philosophical Review, 116(4), 495–532.

Hitchcock, C. (2007b). Three concepts of causation. Philosophy Compass, 2–3, 508–516.

Hitchcock, C. (2012). Portable causal dependence: A tale of consilience. Philosophy of Science, 79(5), 942–951.

Hitchcock, C., & Knobe, J. (2009). Cause and norm. Journal of Philosophy, 106(11), 587–612.

Holyoak, K. J., & Cheng, P. W. (2011). Causal learning and inference as a rational process: The new synthesis. Annual Review of Psychology, 62, 135–163.

Hume, D. (1748/2001). An enquiry concerning human understanding. Oxford: Clarendon.

Jones, E. E., & Harris, V. A. (1967). The attribution of attitudes. Journal of Experimental Social Psychology, 3(1), 1–24.

Kelley, H. H. (1973). The processes of causal attribution. American Psychologist, 28, 107–128.

Kitcher, P. (1989). Explanatory unification and the causal structure of the world. In P. Kitcher & W. Salmon (Eds.), Scientific explanation (pp. 410–505). Minneapolis: University of Minnesota Press.

Knobe, J. (2009). Folk judgments of causation. Studies in the History and Philosophy of Science, 40, 238–242.

Knobe, J., & Fraser, B. (2008). Causal judgment and moral judgment: Two experiments. In W. Sinnott-Armstrong (Ed.), Moral psychology (pp. 441–448). Cambridge, MA: MIT Press.

Kominsky, J. F., Phillips, J., Gerstenberg, T., Lagnado, D., & Knobe, J. (2015). Causal superseding. Cognition, 137, 196–209.

Kvart, I. (2001). Counterexamples to Lewis' "Causation as influence." Australasian Journal of Philosophy, 79, 411–423.

Lewis, D. (1973). Causation. Journal of Philosophy, 70, 556–567.

Lewis, D. (2000). Causation as influence. Journal of Philosophy, 97, 182–197.

Livengood, J. (2013). Actual causation and simple voting scenarios. Noûs, 47(2), 316–345.

Livengood, J., & Machery, E. (2007). The folk probably don't think what you think they think: Experiments on causation by absence. Midwest Studies in Philosophy, XXXI, 107–127.

Lombrozo, T. (2010). Causal-explanatory pluralism: How intentions, functions, and mechanisms influence causal ascriptions. Cognitive Psychology, 61, 303–332.

Malle, B. F., Knobe, J. M., & Nelson, S. E. (2007). Actor-observer asymmetries in explanations of behavior: New answers to an old question. Journal of Personality and Social Psychology, 93(4), 491–514.

McGrath, S. (2005). Causation by omission: A dilemma. Philosophical Studies, 123, 125–148.

Menzies, P. (1996). Probabilistic causation and the pre-emption problem. Mind, 105, 85–117.

Nisbett, R. E., & Ross, L. (1991). The person and the situation: Perspectives of social psychology. New York: McGraw-Hill.

Norenzayan, A., Choi, I., & Nisbett, R. E. (2002). Cultural similarities and differences in social inference: Evidence from behavioral predictions and lay theories of behavior. Personality and Social Psychology Bulletin, 28, 109–120.

Northcott, R. (2008). Causation and contrast classes. Philosophical Studies, 139, 111–123.

Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge, UK: Cambridge University Press.

Salmon, W. C. (1984). Scientific explanation and the causal structure of the world. Princeton, NJ: Princeton University Press.

Salmon, W. (1997). Causality and explanation: A reply to two critiques. Philosophy of Science, 64, 461–477.

Schaffer, J. (2000). Causation by disconnection. Philosophy of Science, 67, 285–300.

Schaffer, J. (2001). Physical causation. British Journal for the Philosophy of Science, 52, 809–813.

Schaffer, J. (2005). Contrastive causation. Philosophical Review, 114, 297–328.

Schlottmann, A., & Shanks, D. R. (1992). Evidence for a distinction between judged and perceived causality. Quarterly Journal of Experimental Psychology, 44A, 321–342.

Spirtes, P., Glymour, C., & Scheines, R. (1993). Causation, prediction, and search (1st ed.). Berlin: Springer.

Sytsma, J., Livengood, J., & Rose, D. (2012). Two types of typicality: Rethinking the role of statistical typicality in ordinary causal attributions. Studies in History and Philosophy of Biological and Biomedical Sciences, 43(4), 814–820.

Walsh, C. R., & Sloman, S. A. (2011). The meaning of cause and prevent: The role of causal mechanism. Mind and Language, 26(1), 21–52.

Weslake, B. (in press). A partial theory of actual causation. The British Journal for the Philosophy of Science.

White, P. A. (2009). Perception of forces exerted by objects in collision events. Psychological Review, 116(3), 580–601.

Wolff, P. (2014). Causal pluralism and force dynamics. In B. Copley & F. Martin (Eds.), Causation in grammatical structures (pp. 100–119). Oxford: Oxford University Press.

Woodward, J. (2003). Making things happen: A theory of causal explanation. Oxford: Oxford University Press.

Woodward, J. (2011). Mechanisms revisited. Synthese, 183, 409–427.

Woodward, J., & Hitchcock, C. (2003). Explanatory generalizations, Part I: A counterfactual account. Noûs, 37(1), 1–24.

Notes:

(1.) Technically, Suzie's rock is interacting with other causal processes (e.g., air molecules) throughout its flight, as a tiny amount of its mass-energy is being transferred to those molecules. I ignore this complication.

(2.) There are theories of counterfactuals for inherently probabilistic situations (e.g., "if I had worn a blue shirt, then that radioactive atom would not have decayed"), but little agreement about even what intuitions or phenomena should be captured by such theories.

(3.) That is, A's value is set by some exogenous factors outside of the causal system.

(4.) Note that "=" in these equations includes information about causal order; B = h⁻¹(C) might hold mathematically (when h is invertible), but it fails to represent the causal relations.

(5.) In terms of the corresponding structural equations, I = yes means that the equation for T changes from a function of T's graphical parents to simply T = t_I, where t_I is the T-outcome of the intervention.

(6.) See the Mechanism Complete condition of Experiment 5 in Walsh and Sloman (2011). They use a slightly different cover story and swap names, but it is structurally identical to the Suzie/Billy case that has been a running example in this chapter.

David Danks

Departments of Philosophy & Psychology, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA


Cognitive Neuroscience of Causal Reasoning

Joachim T. Operskalski and Aron K. Barbey
The Oxford Handbook of Causal Reasoning
Edited by Michael R. Waldmann
Print Publication Date: Jun 2017. Subject: Psychology, Cognitive Psychology. Online Publication Date: May 2017. DOI: 10.1093/oxfordhb/9780199399550.013.16

Abstract and Keywords

The era of functional neuroimaging promised to shed light on dark corners of the brain's inner workings, breathing new life into subfields of psychology beset by controversy. Although revelations from neuroscience provide the foundation for current views on many aspects of human cognition, there continue to be areas of study in which a mismatch between the questions asked by psychologists and neuroscientists renders the implications of neuroscience research unclear. Causal reasoning is one such topic, for which decades of cognitive neuroscience findings have revealed a heterogeneity of participating brain regions and networks across different experimental paradigms. This chapter discusses (i) three cognitive and computational models of causal reasoning (mental models, causal models, and force composition theory), (ii) experimental findings on causal judgment and reasoning using cognitive neuroscience methods, and (iii) the need for a multidisciplinary approach to understanding the nature and mechanisms of causal reasoning.

Keywords: fMRI, causal reasoning, causal models, mental models, force composition

[Alice] was quite surprised to find that she remained the same size: to be sure, this is generally what happens when one eats cake, but Alice had got so much into the way of expecting nothing but out-of-the-way things to happen, that it seemed quite dull and stupid for life to go on in the common way.

—Alice's Adventures in Wonderland (Carroll, 1865)

Early in Lewis Carroll's first novella following Alice's journey through Wonderland, the title character found herself musing about the nature of causal relations, trying to explain and predict events based on the curious ties that connect events in a world seemingly unbound by the usual rules of possibility. She correctly attributed her mysterious shrinking to the fact that she drank a potion marked "drink me," and then correctly predicted that eating cake marked "eat me" might change her size yet again. Even in fiction, when the lines between possible and impossible can be flexed to resemble nothing like those in reality, human thought is still constrained by characteristic patterns of induction and reasoning; anything else ceases to be human thought as it is usually framed in cognitive psychology.

Similarly, the beliefs of real people can be just as far from ground truth as Alice's, but they are still constrained by a set of processes in the mind that are implemented in the brain to support beliefs and goal-directed behavior. Just like Alice—who was surprised by her shrinkage before attributing it to the one novel event immediately preceding it—people typically attribute their physical maladies to novel events and behaviors that break from usual habits. Anecdotes abound of people who can no longer tolerate a specific brand or type of alcohol after a particularly painful hangover; operant conditioning processes may explain the physical aversion, but causal reasoning is required to make sense of it, explaining previous hangovers and taking action to prevent them in the future.

However, causal judgment often requires more than identifying novel events that occur together, especially in the complicated world we inhabit, where multiple variables interact with one another to cause or enable some events while preventing others. Statistical patterns of co-occurrence can be probed to correctly infer causality in many cases. Alice's dramatic growth in Wonderland would have been neither remarkable nor attributable to the magic potion if she had only grown slowly over the course of the next 10 years; the base rate of normal growth in children is on the order of several centimeters per year, and would have been expected even without the potion. This is the basis of the probabilistic contrast model of causal judgment: a given relation's causal power is the difference between the probability of seeing an effect in the presence of its purported cause and the probability of seeing that effect without the cause (in the simplest form, ΔP = P(e|c) − P(e|¬c)), separately accounting for the base rates of the events in question (Cheng & Novick, 1990; see Cheng, 1997, for further discussion and criticism of probabilistic contrast in the "power PC model").

The aim of this chapter is to discuss causal reasoning from a perspective grounded in neuroscience. Causal reasoning can be studied in the abstract, as a guideline for how to rationally form beliefs and update them, but it can also be studied "in the wild," as it is practiced in the sciences and in daily life, and with attention to the brain and its causality-perceiving mechanisms that result from millions of years of natural selection for more successfully surviving and reproducing systems. Far from only an abstraction of statistics and mathematics, causal reasoning is all around us; it is in essence what field epidemiologists do to explain outbreaks of illness and predict their trajectories in a population: they look at patterns of dependency between events. To explain an individual person's cough, for example, there may be multiple possible causes under consideration: a common cold virus, lung cancer, or heartburn (see Tenenbaum et al., 2011, for this example). The probability of having a cough when experiencing each of those conditions is called its likelihood, and the likelihood of a cough is much higher with a virus or lung cancer than it is for heartburn. The probability of each hypothesis being true before seeing any evidence of it is called its prior probability, and the prior for cold viruses and heartburn is much higher than that for lung cancer. Considering all three hypotheses using Bayes' Theorem (incorporating both prior probability and likelihood) then favors the cold virus as the most likely causal explanation for having a cough.
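The arithmetic behind that comparison is worth spelling out. The sketch below assigns priors and likelihoods to the three hypotheses and normalizes their products; the specific numbers are invented for illustration, and only their qualitative ordering follows the text (colds and heartburn are common, lung cancer is rare; colds and cancer make coughs likely).

```python
priors      = {"cold": 0.40, "heartburn": 0.30, "lung cancer": 0.001}  # P(h), hypothetical
likelihoods = {"cold": 0.80, "heartburn": 0.10, "lung cancer": 0.90}   # P(cough | h), hypothetical

unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
total = sum(unnormalized.values())
posterior = {h: round(p / total, 3) for h, p in unnormalized.items()}

print(posterior)
# {'cold': 0.912, 'heartburn': 0.085, 'lung cancer': 0.003}
# The cold dominates: a high prior and a high likelihood beat lung cancer's
# slightly higher likelihood, because the cancer prior is so small.
```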
Even physicians well trained in medical diagnostics, however, are prone to ignoring the base rates of rare events when judging whether a positive test result is more likely to be due to disease or an error in testing (Krynski & Tenenbaum, 2007). Although people may possess the raw information processing power to calculate prior and posterior probabilities when instructed how to do so, there is clearly another, more intuitive, set of mental processes available to those who are untrained in statistics and logic. Although they can also lead to errors in belief, intuitive causal judgment processes are far from a flawed way of thinking about the world, especially when patterns of co-occurrence are inadequate to mentally separate causes from their effects. Without complete knowledge of all possible causal factors that could be operating in the background, the probabilistic contrast and other statistical theories of causal judgment are unable to differentiate causes from events that simply co-occur due to a third causal factor. They are also unable to account for the fact that people make causal judgments about events never seen before, and then use those judgments in subsequent reasoning without any possible knowledge of base rates or patterns of co-dependency.

To understand our collective places in the world as both objects and effectors of change, it is necessary to recognize the generative mechanisms linking events that occur in a particular sequence. To then behave in a goal-directed manner (pursuing some ends and avoiding others), it is also necessary to use and manipulate such knowledge and beliefs about the generative mechanisms that have already been inferred. By mentally representing the world as it is, while also imagining the world as it is not, we are able to integrate new and surprising information with the entirety of our prior experiences to explain the past and predict the future. In other words, we create new knowledge by combining and manipulating prior knowledge. This is the basis of reasoning and judgment.

A remarkable body of work in the cognitive sciences has been devoted to modeling the reasoning process at the behavioral level. Competing models of "rational" or "normative" reasoning describe different ways to make judgments and combine beliefs to generate new ones, and they are typically evaluated for their ability to converge with the solutions generated by the norms of probability theory. Cognitive psychologists have also developed "descriptive" models of reasoning that purport to characterize how lay people actually reason, irrespective of whether that involves converging with theoretical norms.

Strong programs of research are devoted to the study of both sorts of reasoning models in cognitive science and psychology without appealing to the knowledge or assumptions of neuroscience. Rationality and human thought are conceptually self-contained; that is, rationality can be studied using our assumptions on the nature of truth and the rules of formal logic, and human thought can be probed by asking people to solve reasoning problems and asking them what they believe, without ever trying to separate the brain from the behavior or asking how the brain is operating in the background.
Beyond demonstrating that the human mind is a manifestation of the structure and function of the brain, then, what could neuroscience possibly contribute to the study of reasoning that isn't equally or more thoroughly addressed by the experiments and proofs from psychology and philosophy? More generally, this question (or criticism) could be leveled at the entire field of cognitive neuroscience. To answer it, we also consider three more direct questions:

• What is the goal of cognitive neuroscience?
• How is cognitive neuroscience conducted?
• What can cognitive neuroscience contribute to programs focused purely on cognition or neuroscience alone?

One view of cognitive neuroscience is that it is a merging of already mature disciplines. By combining principles of behavioral science and neurobiology with norms from probability theory and concepts of truth from philosophy of science, we aim to gain a fuller picture of the nature of truth and the simultaneously powerful and limited way that the human mind understands its environment. The driving goal of cognitive neuroscience is thus to describe how the properties of the brain support the intricate inner workings of the human mind. What makes a human brain different from the simple neural networks of lobsters or sea snails? What makes a modern human brain different from those of modern gorillas and chimpanzees, or the now-extinct Homo neanderthalensis or Homo erectus? We look at loss-of-function studies and neuroimaging experiments to answer very basic questions about brain–behavior relationships as a whole, and in so doing we gain insight about the component parts as well. Therein lies the possibility of learning how individual neurons represent information in a way that supports representational thought, and why skin cells or muscle cells signaling to one another do not have the same capability. Therein also lies the possibility of learning about the nature of thought itself; by probing the brain's limits of processing power, we learn about the most likely calculation being used, in addition to the nature of the cognitive task being engaged: in this case, causal reasoning.

The methods favored by cognitive neuroscientists involve using brain imaging to measure the structural and functional correlates of specific psychological events like memory retrieval, and how they explain interindividual differences in competencies like the number of items that can be remembered or the ability to inhibit attention to distracting information. Simple statistical tests can be used to show a correspondence between focal brain damage and categorical deficits on very specific information processing abilities. Machine learning algorithms can be used to extract complex patterns of network activity in the brain that correspond to subtle differences in the same abilities. Reviewing the cognitive neuroscience literature on any given topic typically yields a map of brain regions where changes in blood flow, electrical field, or structural integrity correspond to some psychological function of interest. It is tempting, then, to survey the cognitive neuroscience literature on reasoning, combine the results onto a template brain image, and declare that we have uncovered the "reasoning network." Doing so is certainly a promising beginning to our foray into the neuroscience of reasoning, if for no other reason than to make a list of other psychological functions supported by such a network, to then be tested for their possible involvement in the reasoning process as well. However, relying too heavily on a map of task activations from univariate neuroimaging studies will only take us so far in trying to understand the neural mechanisms of reasoning; doing so would ignore the fact that maps of brain activation can be engaged by nuisance variables or "demand characteristics" just as easily as the task of interest, even when the underlying experiments were conducted with rigorous control conditions. The "reverse inference" problem inherent in trying to explain exploratory cognitive neuroscience findings is that a particular brain region or network's ability to support a given cognitive function does not imply that there is only one function served by that region, or that the cognitive function in question is also involved any time its supporting regions are implicated in some other task (Poldrack, 2006). Such an assumption ignores the facts that brain regions support multiple psychological functions, and that functionally different brain networks frequently share some nodes in common. Finally, a fundamental functional network supporting some cognitive function of interest will often appear to have moved or changed in its temporal characteristics based on contextual factors other than the function of interest; this insight has led to a theory of intelligence based on a single "multiple demands" network that is the core driver of all facets of goal-directed or intelligent behavior in humans, with differences in brain activations being attributable to demand characteristics, or low-level task features like sensory modality or the extent of attention allocation required (Duncan, 2010). With the context and limitations of early cognitive neuroscience methods in mind, we will propose programmatic research on the neural correlates of causal reasoning, with particular attention paid to how we might move beyond univariate task-activation neuroimaging studies.

One motivation for studying reasoning from a cognitive neuroscience perspective could be to engage in the debate between competing models, testing their predictions to offer evidence as to which models are more plausibly being implemented. However, there is a fundamental mismatch between the methods and the conceptual canons of the respective fields; the interdisciplinary intersection between the two fields is simply too immature to pursue this end. Cognitive psychology and neuroscience operate at different levels of conceptual resolution, in that subtle distinctions in symbolic representations of causality have yet to be characterized at a level that can be described in terms of broader patterns of activity in neurons or networks of neurons. Even if the conceptual resolution were made equal, the most basic units of representing information in the study of causality (e.g., truth statements, negation operators) and neuroscience (e.g., action potentials, post-synaptic potentials, blood oxygen level–dependent response curves) cannot be readily translated into one another. These problems have been referred to as "granularity mismatch" and "ontological incommensurability," as discussed in the context of the mismatch between neuroscience and linguistics (Poeppel & Embick, 2004). Whereas Poeppel and colleagues suggest using behavior (language, in their case) as a model system to understand computation in the brain, we apply their sentiment to causal reasoning: understanding the form of reasoning sheds light on how the brain represents information.
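Poldrack's (2006) worry can be restated in Bayesian terms: the probability that a cognitive process is engaged, given that a region activates, depends on how selective the region is and on how often the process occurs across tasks. The numbers below are invented purely to illustrate that dependence.

```python
def p_process_given_activation(p_act_given_proc, p_act_otherwise, prior):
    """Posterior probability that a cognitive process is engaged, given activation."""
    evidence = p_act_given_proc * prior + p_act_otherwise * (1 - prior)
    return p_act_given_proc * prior / evidence

# A non-selective region: active in 80% of causal reasoning studies, but also
# in 60% of other studies; assume half of candidate tasks engage the process.
print(round(p_process_given_activation(0.80, 0.60, 0.50), 2))  # 0.57: barely above the prior

# A selective region (rarely active otherwise) licenses a much stronger inference.
print(round(p_process_given_activation(0.80, 0.10, 0.50), 2))  # 0.89
```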
We also believe, however, that understanding the brain's mechanisms can offer insight into the fundamental nature of reasoning, as long as members of the separate disciplines are made adequately aware of their respective assumptions in trying to map findings from one field onto another.

To more directly answer our final question concerning this interdisciplinary field of study, why is neuroscience evidence important to non-neuroscientists? Information processing systems can be described at three levels that were first proposed for the study of visual perception (Marr, 1982). To fully understand the system, it is advantageous to account for its properties at each level of the hierarchy, and neuroscience offers a complementary perspective to those made available by other disciplines. For any information processing system, the calculation at hand or the goal to be achieved is the computational level; recognizing objects or categorizing items to support hunting or gathering behaviors is one type of computation. The set of rules for translating input into output is the algorithmic level, in that it functions as a set of instructions that could be carried out by different people, or with some degree of freedom. Using checklists of necessary and sufficient features to categorize objects is one algorithm; a more effective algorithm is to appeal to underlying reasons for the features to define a category (Murphy & Medin, 1985). Finally, the way a physical system carries out the calculation using a particular algorithm is the implementation level. Different neurons in visual cortex use changes in the rates of their spiking patterns to signal the receipt of an image corresponding to particular colors or shapes. Object recognition computer programs, on the other hand, can implement similar calculations and algorithms using a physical system involving wires and silicon wafers.

What neuroscience has to contribute to the study of causal reasoning is that it is one of few disciplines poised to support a discussion on the implementation level of human reasoning. Those in cognitive psychology who are interested in the descriptive validity of reasoning models clearly need to understand the properties and limitations of the system that is being modeled. Even those among us who are only interested in rational models, however, would benefit from comparing them alongside the descriptive models; this is because it could be of interest to know whether (and if so, when) the solutions to reasoning problems that are naturally generated by the brain perform more accurately or efficiently than those using the steps prescribed by rational models. Brains were shaped by the forces of evolution that simply rewarded the solutions for problems of survival and reproduction. This process involved simple adaptations, like cellular mechanisms for resisting disease, and the behavioral propensity to band together in social groups for support. It also involved the ability to not only learn which parts of the environment were safe or dangerous, but also predict whether that might change in the future on the basis of new information. This is at the heart of causal reasoning, and we aim to understand how human biology generated a solution to the need for explanation and prediction.
The goals of this chapter are twofold: to survey the current state of the cognitive and neural literatures on causality while acknowledging the mismatch in methods and observations between the disciplines, and to make the case for further development of the cognitive neuroscience approach to studying reasoning, in pursuit of a program from which scholars residing in pure cognitive science and pure neuroscience will benefit as much as those who operate at the intersection of the two fields.


Toward this end, we will use the following structure:

• Examine the descriptive instantiations of several rational models of reasoning, considering the predictions they might make if they were implemented by a neural system as currently understood.

• Review the results of neuroscience experiments aimed at probing how the brain supports the concept of causality, considering whether they have any implications for the differences between purely cognitive models of causal reasoning.

The first model we will consider is the mental models (MM) theory, which suggests that abstract representations of states of affairs in the world are constructed on the basis of the possible co-occurrences of events that are licensed under a particular relation like "cause" or "prevent" (see Johnson-Laird and Khemlani, Chapter 10 in this volume). Mental models feature deductive reasoning over fundamentally deterministic relations as the primary method of combining knowledge about separate relations to draw conclusions or generate new knowledge. The second model is the causal models (CM) theory, which suggests that causal reasoning is supported by abstract representations linking events to one another as a probabilistic network that can be depicted visually with directed graphs and structural equations (see Rottman, Chapter 6 in this volume). CM theory features the representation of probabilities and inductive reasoning as central elements of causal reasoning. The third model considered in this chapter is the force composition (FC) theory, in which causal relations are represented in terms of forces interacting with one another to account for the movement of a system toward or away from a particular end state (see Wolff and Thorstad, Chapter 9 in this volume). FC theory emphasizes perceptual representations of forces that preserve the structure of the relations being symbolized. Diagrams depicting force vectors are thus used to describe the way individual force representations can be combined to draw conclusions from previously unconnected relations. Causal representations in FC theory depend on an understanding of the way physical forces interact with one another, but they are flexible enough to be applied analogously to more abstract forces like emotion and interpersonal communication.

Many of the behavioral predictions of the three theories are identical; they converge on the same inference being drawn in a particular context, which is part of the reason for the granularity mismatch between the psychological and neuroscientific approaches to modeling causal reasoning. Cognitive scientists and psychologists draw very fine distinctions between modes of thought, while cognitive neuroscientists are still building theories that map coarse concepts like causal attribution onto large-scale brain networks. The cognitive theories depart from one another in their predictions of how people draw inferences in complicated or ambiguous scenarios, so we will focus our discussion on how people combine multiple causal relations to draw inferences about transitivity (or the lack thereof). Causal reasoning itself is complex, and it has presumably evolved so that representations roughly resembling the truth can be drawn from the complex and often inconsistent information about dependencies between the events around us.
The ability to draw conclusions that only approximate ground truth may be adequate for learning enough about our environment to survive and reproduce, which may account for why the descriptive computational theories of reasoning at times depart from the rational solution to a reasoning problem. A great deal of research in cognitive neuroscience is undertaken with the short-term goal of localizing behaviors to modules or networks in the brain, identifying the common and distinct neural correlates of dissociable psychological functions. A loftier goal of much of the same research (and perhaps a more nebulous one) is identifying the underlying organizational principles that dictate how the physical properties of the brain support the cognitive architecture of the mind. From this perspective, understanding the physical implementation of causal reasoning in the brain can help constrain psychological theories to reflect the properties of the biological system supporting it (Goel, 2005). We acknowledge from the outset that the cognitive neuroscience evidence on causal reasoning, rather like the evidence from the behavioral and cognitive realms, is not free of ambiguity. Further programmatic research, making use of recent developments in neuroimaging technology, holds promise for resolving some of the ambiguity concerning mental constructs that may fundamentally comprise a number of alternative modes of thought available under different circumstances.

At the most general level, cognitive neuroscience evidence supports a distributed information processing system engaged by goal-directed behavior, including such networks as a frontoparietal tract supporting attention and executive control, and frontohippocampal tracts supporting memory encoding and retrieval (Barbey, Colom, et al., 2012; Duncan & Owen, 2000; Vincent et al., 2008). Causal reasoning is likely to engage a subset of those networks in the service of goal-directed behavior, and it can be subdivided into such processes as the judgment or recognition of causality, prediction, and explanation. The different psychological theories of causal reasoning each make predictions about the information processing steps that people use when reasoning over complex sets of relations. Many of those predictions are invisible to neuroscience methods at their current conceptual (and temporal/spatial) resolution, but some of them have implications for the likely neural correlates of reasoning behavior, selectively highlighting elements of the attention, memory, and control networks mentioned earlier. Furthermore, the nature of the causal representations themselves will also be reflected in the neural correlates of causal reasoning. Currently, it remains unclear whether causal beliefs are supported by simulation mechanisms in the brain that are specific to sensory modalities, by abstract semantic knowledge networks, or by some combination of the two.

Psychological Theories of Causal Reasoning

Statistical patterns can be used to induce a causal relation between events not previously thought to be linked, but this does not account for all instances of causal judgment. It is simple enough to agree, in the context of several decades' worth of medical research, that "smoking causes cancer." Describing the precise nature of the relation as it plays out in specific cases is complicated, however, when such preventing and aggravating factors as poverty, education, diet, stress, and genetic inheritance also appear to coincide with both smoking and cancer. In moving from judgment to reasoning (especially under conditions of uncertainty about the original beliefs produced by judgment processes), the complexity of calculating statistical dependencies rapidly increases with the number of relations being linked as people consider a chain of possibly connected events. Statistical codependency calculations thus account for even fewer instances of reasoning than they do for cases of pure judgment. Far from trying to describe the ideal way to reason about causal relations, descriptive theories of naïve causality emerged from a desire to describe actual causal reasoning in daily life.

For drawing conclusions on the basis of some accepted set of premises (the premises being previously accepted causal relations, or a background of prior knowledge), we thus have a family of theories of reasoning, each of which proposes a special framework for creating causal representations that can be used to map evidence (in the form of events and their co-occurrences) onto novel conclusions.

Before delving into the features of each psychological theory, note the absence of several influential accounts of causal judgment and reasoning from our discussion (see Ahn et al., 1995; Cheng, 1997; Tenenbaum & Griffiths, 2003, for dependency-based accounts of causal induction and belief updating). Here, we focus on the computational theories that make behavioral predictions about how people represent and combine multiple causal relations to draw conclusions, especially those for which a plausible neural implementation is available based on current theory of brain function in cognitive neuroscience.

Mental Models Theory

Mental models (MM) theory as a representational account of causal reasoning was inspired by the fact that the rules and representational scheme of formal logic result in a combinatorial explosion of additional clauses and statements that must be represented when trying to draw conclusions from some set of events or state of affairs in the world. The key example offered by its original proponents is that claims of the sort "one of these statements is true and the other is false" are much more complex when represented as a series of Boolean algebraic statements than they appear to be when non-logicians think about them (see Johnson-Laird, 2010a, for the full line of reasoning). The alternative to Boolean algebra and formal logic is a more intuitive solution, and one that resonates with common experience: people are able to compare current states of affairs with possible states of affairs that do not currently exist, while also deciding what sorts of affairs are not possible in the context of what is already believed to exist.

MM theory suggests that people reason by constructing abstract mental representations that license possible co-occurrences of events or states of affairs under a given relation, like "cause," "enable," or "prevent" (Goldvarg & Johnson-Laird, 2001; see Cheng & Novick, 1991, for another model of the difference between cause and enable). The models are modal in the linguistic or psychological sense; the conditions specified in a particular model represent necessary (obligatory) or possible (permitted) states of affairs. Table 13.1 demonstrates how "cause," "enable," and "prevent" relations between two events or states A and B can be depicted using models. The capital letters represent a variable, and lowercase letters represent the presence of the variable or event in question. A negation operator represents the absence of the event in question.

If we consider the relation between smoking and cancer, we can generate mental models that represent the possible states of affairs under each type of relation. "Smoking causes cancer" is a general causal statement that licenses three specific possibilities: that smoking and cancer both occurred, that no smoking occurred but cancer occurred anyway due to another cause, and that neither smoking nor cancer occurred. And what about the possibility that a person smoked but did not suffer from cancer? It is easy to imagine long-term smokers who never succumbed to the graver consequences typically attributed to smoking. When people are asked to explain why not, they cite preventing factors like protective genetic mutations, rather than claiming that causality is inherently probabilistic. This distinction, that the meanings of causal concepts are deterministic, is a fundamental principle of MM theory. Some prior accounts of causal reasoning do not make strong claims discriminating between the meanings of verbs such as "cause" and "enable" (Cheng & Novick, 1990); both increase the probability of an effect, and they are only used differently based on notions of agency versus circumstance, or background conditions versus a manipulated factor. MM theory suggests that "enable" has a fundamentally different meaning, such that "smoking enables cancer" includes the possibility that smoking occurred but cancer did not, instead of the possibility that cancer occurred independently of smoking. The other two individual models in the set are identical between the two concepts. "Cause" thus refers to conditions sufficient to bring about an effect, whereas "enable" refers to a necessary condition that is not sufficient to bring about the effect on its own. "Prevent," on the other hand, refers to mutually exclusive conditions. "Antioxidants prevent cancer," for example, allows the possibility of antioxidants being present and cancer being absent, antioxidants being absent and cancer being present, and neither antioxidants nor cancer being present. Multiple causal relations are combined with one another by listing all of the models (possible co-occurring states of affairs) under each relation in a single set, omitting redundant models, and then removing the middle event or state to link the first and last events.

For each relation in MM theory, the entire set of models allowed is collectively referred to as the fully explicit model for that term. In special cases, concepts like "cause" and "prevent" can take on the strong form of both necessity and sufficiency, such that only the first and last models listed in each column of Table 13.1 are considered part of the meaning. If, for example, smoking were the only possible mechanism of developing cancer, then "smoking causes cancer" would take on the strong form of "cause." Consider substance abuse for a more realistic example: "substance use causes intoxication." There are no other possible causes of intoxication, and substance use (voluntary or involuntary) must occur before intoxication (example taken from Goldvarg & Johnson-Laird, 2001).


According to MM theory, people typically construct implicit mental models that capture only a subset of the fully explicit representations. Causal terms are usually used with one implied meaning, so conventions in communication and notions of shared knowledge lead people to assume that the first model of each set in Table 13.1 (the implicit model) is the one intended when talking about causal relations. The selection of implicit models is further supported by the principle of truth, according to which people naturally represent what is true about a given state of affairs, rather than considering the world otherwise. This bias toward representing true states of affairs is the basis for the prediction that combining multiple causal relations will be easier for relations that can be accurately combined using only the implicit models; those requiring the fully explicit models, and the selective removal of prohibited models, will be more difficult and prone to error.


Table 13.1 Models Representing "Cause," "Enable," and "Prevent" Relations in Mental Models Theory

A Causes B        A Enables B       A Prevents B
 a    b            a    b            a   ¬b
¬a    b            a   ¬b           ¬a    b
¬a   ¬b           ¬a   ¬b           ¬a   ¬b

Letters represent the presence of an event or state characteristic of the category being represented. The negation operator ¬ represents the absence of the specified event or state, and should be read as "not-a" or "not-b." Each column lists the fully explicit models for one relation, and the first row of each column is the corresponding implicit mental model.


For example, consider the combination of two "cause" relations, as opposed to two "prevent" relations. Smoking causes lung cancer, and lung cancer causes respiratory problems. The implicit model for both relations contains all of the information needed to combine them, leading to the transitive inference that smoking causes respiratory problems. Double prevention is more difficult. Healthy habits prevent lung cancer, and lung cancer prevents good health. If we hastily limit ourselves to the implicit mental models, it is tempting to simply drop the middle term (lung cancer) and draw a similar transitive inference: that healthy habits prevent good health. Healthy habits clearly do not prevent good health, and not only because the conclusion is inconsistent with experience. "Prevent" relations are not transitive because the second instance of prevention requires the presence of an event that is absent once the first prevention is realized. This bias toward inferring transitivity has been observed in behavioral experiments with undergraduate students, confirming the prediction of MM theory that double prevention will result in an erroneously transitive conclusion (Goldvarg & Johnson-Laird, 2001). See Table 13.2 for an example of causal relations (A causes B, and B prevents C) that can be combined using implicit mental models alone, and Table 13.3 for causal relations requiring fully explicit models to arrive at the correct solution (A prevents B, and B causes C).

In summary, MM theory proposes that causal reasoning depends on multiple deterministic relations (implicit mental models) that can interact to assist or prevent one another from having a particular effect. Conclusions are therefore drawn in MM theory by deducing the possible states of affairs that are entailed by combinations of such deterministic relations in the context of background knowledge.


Table 13.2 Fully Explicit Models Representing the Combination of Two Relations ("Cause" and "Prevent") to Yield a "Prevent" Relation Between the First and Last Terms

A Causes B        B Prevents C      A Prevents C
 a    b            b   ¬c            a    b   ¬c
¬a    b           ¬b    c           ¬a    b   ¬c
¬a   ¬b           ¬b   ¬c           ¬a   ¬b    c
                                    ¬a   ¬b   ¬c

The implicit mental models in the first row are adequate to draw the rational conclusion that A prevents C. Note that dropping the middle term from the fully explicit models in the third column yields a set of models linking A and C identical to the fully explicit models of any other "prevent" relation.


Table 13.3 Fully Explicit Models Representing the Combination of Two Relations ("Prevent" and "Cause") to Infer That A and C Are Not Causally Related

A Prevents B      B Causes C        A Does Not Prevent C
 a   ¬b            b    c            a   ¬b   ¬c
¬a    b           ¬b    c            a   ¬b    c
¬a   ¬b           ¬b   ¬c           ¬a    b    c
                                    ¬a   ¬b    c
                                    ¬a   ¬b   ¬c

Note: The absence of B does not preclude the possibility of other causes of C being sufficient. This is counterintuitive if one assumes the strong version of "B causes C," in which B is the only possible cause of C and its prevention also prevents any of its later effects. Using only the implicit mental models in the first row of the first two columns yields this incorrect inference, which is consistent with the principle of truth and with the prediction that reasoning with fully explicit models places a heavier burden on working memory than most people will bear without being instructed to.
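The combination procedure that generates Tables 13.2 and 13.3 is mechanical enough to sketch in a few lines of code. The following is a minimal illustration of our own devising (the function names and data layout are expository assumptions, not part of MM theory's formal apparatus): each fully explicit model is a truth assignment over two terms, two premises are joined on their shared middle term, inconsistent pairings are discarded, and the middle term is then dropped.

```python
# Fully explicit models for each relation, mirroring Table 13.1.
# Each pair assigns a truth value to the first and second term.
ROWS = {
    "causes":   [(True, True), (False, True), (False, False)],
    "enables":  [(True, True), (True, False), (False, False)],
    "prevents": [(True, False), (False, True), (False, False)],
}

def fully_explicit(relation, x, y):
    """Return the fully explicit models for 'x RELATION y' as dicts."""
    return [{x: vx, y: vy} for vx, vy in ROWS[relation]]

def combine(rel1, rel2, a="A", b="B", c="C"):
    """Join the models for 'A rel1 B' and 'B rel2 C' on the middle term,
    keep only the consistent triples, then drop B and de-duplicate."""
    conclusions = set()
    for m1 in fully_explicit(rel1, a, b):
        for m2 in fully_explicit(rel2, b, c):
            if m1[b] == m2[b]:  # the two premises agree on B
                conclusions.add((m1[a], m2[c]))
    return conclusions

# A causes B, B prevents C: the surviving A/C models are exactly the
# fully explicit models of "prevents" (cf. Table 13.2).
assert combine("causes", "prevents") == {(True, False), (False, True),
                                         (False, False)}

# A prevents B, B causes C: (True, True) survives, so it does not
# follow that A prevents C (cf. Table 13.3).
assert (True, True) in combine("prevents", "causes")
```

Restricting each premise to its first (implicit) model also shows where double prevention goes wrong: the implicit models a ¬b and b ¬c disagree on the middle term, so no consistent triple exists, and hastily dropping B anyway yields the spurious transitive conclusion that A prevents C, exactly the error the theory predicts for unreflective reasoners.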


Causal Models Theory

Causal models (CM) theory proposes that mental representations of causal knowledge reflect probabilistic relationships. Fundamentally deterministic relationships can be represented using causal models as well, albeit in a probabilistic mental representation, owing to uncertainty concerning hidden variables. CM theory is based on the construction of abstract mental representations and makes use of the Bayesian belief network (hereafter: Bayes Net) as a normative approach to causal induction and causal reasoning (Pearl, 2000).

A Bayes Net is a directed, acyclic graph representing events and the relations between them. Events are represented as circles or nodes, and causal links are represented as arrows between the nodes. "Directed" refers to the asymmetry of a causal relation; changing the status of a cause influences the status of an effect, but changing the status of an effect does not influence the status of its antecedents. "Acyclic" refers only to the fact that the networks are not used to describe closed systems or feedback loops. Each Bayes Net is also accompanied by a series of structural equations describing the relations therein. "A causes B" is thus represented with a cause operator ":=" as "B := A"; the positions on either side of the cause operator are not interchangeable. "Prevent" relations can be represented using a tilde "~" instead of the negation operator "¬," such that "A prevents B" is equivalent to "A causes ~B" or "A causes not-B," represented by "~B := A." "Enable" relations in CM theory imply that the first event is an enabling factor that allows another causal factor to have its effect. "A causes B when X enables it" is represented by "B := A, X." See Figure 13.1 for a Bayes Net representing possible causes, enablers, and preventing factors influencing influenza infection.

Figure 13.1 Causal model for influenza. Flu exposure causes flu infection, unless it is prevented by vaccination. Vaccination can fail, however, if a mutation in the virus enables the exposure to still cause an infection despite vaccination. Note that "enable" relations imply an accessory variable, and do not necessarily override "prevent" relations as depicted in this example. "Cause" relations are represented by grey edges: B := A. "Enable" relations are represented by blue edges: B := A, X. "Prevent" relations are represented by red edges: ~B := A.

CM theory supports predictive and explanatory reasoning by featuring mental intervention as the primary mode of reasoning; variables in the model can be manipulated to take on values that vary from reality or whose status is unknown. The structural equations representing the statistical dependency between the states of connected nodes are then used to imagine how intervening on the causal model will have effects that propagate through the network. The construction of complex causal relations is thus supported by intervening on a mental causal model while also combining the structural equations corresponding to each relation in the network.

One respect in which CM theory differs from MM theory is that it predicts that double preventions will result in the correct inference of "cause" or "allow/enable," instead of the incorrect inference of "prevent." Although previous experiments confirmed the predictions of MM theory concerning double prevention, subsequent studies by other experimenters have found evidence supporting this prediction of CM theory (Sloman et al., 2009). The authors acknowledge, however, that subjects' reasoning in both sets of studies may have been subject to the unintended influence of "atmosphere effects": a high rate of a particular answer type being correct in an experiment may lead to perseverative answering when subjects encounter more difficult trials. See Sloman et al. (2009) for a full discussion of more nuanced differences in the predictions distinguishing combinations of "cause" and "allow" or "allow" and "prevent"; these distinctions are valuable for distinguishing between the theories at the behavioral level, but they are beyond the scope of the current cognitive neuroscience literature on causal reasoning because noninvasive brain imaging methods have not yet been used to resolve the difference in representing such similar terms as "prevent" and "cause-not" or "allow-not."
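To make the structural-equation notation concrete, here is a minimal sketch, written by us for exposition, of the influenza model in Figure 13.1. The variable names and the deterministic Boolean reading of the equations are assumptions made for illustration (CM theory proper treats the dependencies probabilistically); intervention corresponds to fixing a variable's value directly and propagating the consequences downstream through the equations.

```python
def infection(exposure, vaccinated, mutation):
    """Structural equation for the influenza model: exposure causes
    infection unless vaccination prevents it, unless a viral mutation
    enables the exposure to act despite vaccination."""
    return exposure and (not vaccinated or mutation)

def do_vaccinate(exposure, mutation, vaccinated):
    """Intervene by fixing 'vaccinated' directly (Pearl's do-operator)
    and propagating downstream. Because the edges are directed, an
    intervention on the effect (infection) would not license any
    backward inference about exposure."""
    return infection(exposure, vaccinated, mutation)

# Mentally intervening to vaccinate blocks infection after exposure...
print(do_vaccinate(exposure=True, mutation=False, vaccinated=True))  # False
# ...unless the mutation enables exposure to cause infection anyway.
print(do_vaccinate(exposure=True, mutation=True, vaccinated=True))   # True
```

In this toy graph, vaccination has no parents, so observing it and intervening on it happen to coincide; the distinction becomes substantive in larger networks, where fixing a variable by intervention severs the influence of its ordinary causes, whereas mere observation would carry evidence backward through them.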

Force Composition Theory

The third model of causal reasoning we consider is motivated by an effort to ground causal representations in the physical structure of forces and events in the world (Barbey & Wolff, 2002). According to force dynamics (or force composition, FC) theory, causal relations are mechanistic and represent the transference of a conserved quantity from cause to effect (Ahn & Kalish, 2000). Imagine, for example, a golfer who slices a ball into a tree that knocks it into the hole. Based on an understanding of the physical mechanisms at work, people would correctly infer that the tree, rather than the golfer's poor shot, caused the hole-in-one, even though nobody would claim that trees are generally causes of holes-in-one (Ahn & Kalish, 2000). Whereas prior mechanistic theories did not explain causal relations that take place over a distance (e.g., gravity), over large time intervals (e.g., cancer), or with abstract influences that only resemble forces (e.g., social communication), FC theory is flexible enough to account for all such features of causal reasoning because abstract forces can be represented as vectors with magnitude and direction. FC theory is also unique in that its mechanistic representations are concrete: they are grounded in tangible features of the world being simulated (see Johnson & Ahn, Chapter 8 in this volume, for current mechanistic theories).



Specifically, causal reasoning in FC theory is supported by the construction of force vectors that represent causal mechanisms between events in the context of tendencies toward or away from an end state. As with free-body diagrams in Newtonian physics, force vectors are simply iconic arrows with both direction and magnitude, which can also be reconceptualized as the transfer of energy (or causal influence, if you will). Force vector addition along a single axis can thus be used to characterize the overarching structure linking a series of causal relations in a chain of events, which can then be used to predict the future and explain the past by mentally changing the vectors' directions and magnitudes.

The first vector in a force composition diagram is that of the patient: the thing being acted upon. An affector vector then represents the force imparted by the thing acting on the patient. An end-state vector is a positional vector only, representing the state of affairs being caused or prevented. Predicting the future state of the system is achieved by combining the patient vector with the affector vector; if the resultant points in the direction of the patient's end state, then the affector is said to have caused the end state. Negative relations like "prevent" can be represented by the removal of a causal force vector, or by the addition of a force vector pointing in the direction opposite the end state; similarly, the removal of a preventing force vector can result in a "cause" relation (Wolff et al., 2010).

Reasoning about complex relations involving multiple patients and affectors is achieved by simply adding the vectors from the individual relations. A "cause" relation is typically represented by a diagram with the patient's vector pointing away from the end state, the affector's vector pointing toward it, and the difference between the two such that the resultant points toward the end state. An "enable" or "allow" relation is represented by the patient's vector pointing toward the end state and the affector's vector pointing in the same direction, with the resultant simply being the superposition of the two. A "prevent" relation involves the patient's vector pointing in a particular end state's direction and the affector's vector pointing away from it with a magnitude great enough for the resultant to point away from the end state. Figure 13.2 depicts the free-body diagrams representing "cause," "enable," and "prevent" relations in FC theory.

When reasoning about two "cause" or "enable" relations, the resultant of the first relation becomes the affector acting on the patient in the second relation. The end state of the conclusion is taken from the end state of the second relation. Importantly, the magnitudes of the force vectors are relative, rather than explicitly representing probabilities. Some combinations of relations are able to support multiple conclusions. In those cases, probabilistic conclusions can be supported by incrementally changing the magnitudes of the force vectors in each premise and calculating the relative frequencies with which each conclusion type results (see Wolff & Barbey, 2015, supplementary materials, for details on this procedure). For example, combining two prevent relations (A prevents B; B prevents C) will result in a conclusion of either "A allows C" or "A causes C," based entirely on the relative magnitudes of the affector and patient vectors in each premise (see Figure 13.3). Mentally, the second premise in a double prevention must be represented before the first relation can act on it. Imagine pulling a plug out of a drain in a basin full of water. Pulling the plug out prevents it from being in the drain, and the plug being in the drain prevents the water from leaking out of the basin. Pulling the plug thus allows (or causes) the water to leak out of the basin, but mentally representing this set of causal relations requires that there be some concept of water in the basin whose leaking must be allowed in the first place (see Wolff & Barbey, 2015, for this example and a full discussion of representing double prevention with force vectors).

Figure 13.2 Free-body diagrams representing different causal concepts in force composition theory. Each vector originates from the black circle. A = the affector force; P = the patient force; R = the resultant force, depicted using a stippled line; E = end-state vector, which is a position vector, not a force. Reprinted with permission from Wolff & Barbey (2015).

Note that abstract causes like those involving interpersonal communication (e.g., a compliment causes a student to feel good, and criticism causes a student to feel bad) can also be represented visually using vector arrows that point toward or away from the end state. The underlying mental representations need not be arrow-based graphs, however, and can still involve the drawing of causal inferences by superimposing such abstract "forces" onto one another.
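A minimal sketch can likewise make the geometry of force composition concrete. The code below is our illustration, not part of FC theory's formal apparatus: the sign convention (positive magnitudes point toward the end state), the particular magnitudes, and the function name are assumptions adopted for exposition, following the free-body diagrams of Figure 13.2 and the plug-and-basin example above.

```python
def classify(patient, affector):
    """Classify a causal relation from two signed magnitudes along a
    single axis; positive values point toward the end state."""
    resultant = patient + affector
    if patient < 0 < affector and resultant > 0:
        return "CAUSE"    # the affector overcame the patient's opposing tendency
    if patient > 0 and affector >= 0 and resultant > 0:
        return "ALLOW"    # the patient already tended toward the end state
    if resultant < 0:
        return "PREVENT"  # the net force points away from the end state
    return "OTHER"

# Second premise of the double prevention: the plug (affector) holds back
# the water (patient), whose own tendency is toward leaking out.
print(classify(patient=2, affector=-5))  # PREVENT

# The first prevention removes the plug's force from the second premise,
# leaving only the water's tendency: the composition yields ALLOW.
print(classify(patient=2, affector=0))   # ALLOW
```

Incrementally varying the magnitudes, as in the procedure Wolff and Barbey describe, moves borderline compositions between categories, which is how graded, probability-like conclusions can emerge from deterministic vector diagrams.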

Neural Implications of Causal Reasoning Models

The reviewed computational models of causal reasoning make alternative claims about the cognitive representations and processes that support causal inference. Each theory further motivates alternative predictions about the neurobiological bases of causal reasoning and can be evaluated in light of the emerging neuroscience literature on causal perception, judgment, and reasoning.

Before turning to a discussion of the different neural implications of each theory, we note commonalities across the reviewed frameworks. Indeed, any model of causal reasoning requires that some information about events and their relations be represented in the mind as a conscious thought. This will require attention mechanisms in the brain to direct one's internal focus toward the piece of information being considered or manipulated; the parietal lobe is typically thought of as a major supporter of attention mechanisms, but modality-specific attention mechanisms are also known to engage perceptual processing substrates in the occipital, temporal, and frontal lobes (Smith et al., 2013). The frontal eye field in the posterior frontal lobes, for example, supports eye movement and is involved in directing visual attention; the dorsolateral prefrontal cortex (anterior to the frontal eye fields) assists with selective attention, the ability to attend to some features while suppressing attention to irrelevant ones, along with the many other functions it supports. Causal reasoning will also require the engagement of memory systems, both to retrieve semantic and episodic memories from previous experience and to support the ongoing availability of multiple pieces of information during the reasoning process. Memory encoding and retrieval processes are known to rely heavily on subcortical features of the medial temporal lobe, but memory storage and the simulation of previous experiences engage the frontal lobes and posterior primary sensory-processing networks as well (Buckner et al., 2008). Lastly, causal reasoning will require the so-called executive control mechanisms in the brain that allow the manipulation of information and the selective activation and inhibition of the attention and memory processes involved (Miyake et al., 2000). Executive control processes reliably engage a frontoparietal network in the brain (Banich, 2009; Vincent et al., 2008).

Taken together, the three broad constructs supporting attention, memory, and executive control would suggest that the entire brain is involved in causal reasoning. Where we can separate the component processes from one another, and perhaps separate necessary and sufficient brain networks from those that are simply correlated, is in the predictions made by each of the theories of causal reasoning discussed here, and in how they map onto discrete components of the constructs outlined earlier. The greatest distinction between the descriptive theories of causal reasoning is that they emphasize different underlying modes of information processing. These processing modes can be used to predict the large-scale networks that should be engaged in the brain beyond particular subregions of the prefrontal or medial temporal cortex, which we outline here.


Figure 13.3 Force composition allows the combination of force vectors representing separate causal relations to draw conclusions that are not explicitly represented in the set of causal premises. Combining two CAUSE relations results in the system moving toward the end state of the second relation. Combining two PREVENT relations similarly results in the system moving toward the end state of the second relation, with the major difference between the two being the system's original tendency toward or away from the end state. Reprinted with permission from Wolff & Barbey (2015).

Mental Models Theory

The key to predicting the neural substrates of causal reasoning on the basis of MM theory is the fact that the theory is based on an abstract, modal code representing the possible states of affairs entailed by a causal relation, and uses deductive inference to draw conclusions from multiple relations.

The neural correlates of deductive reasoning have been characterized in prior work as a system of modular brain networks (Goel, 2005). Deductive inference from syllogisms (reasoning from accepted premises to a conclusion guaranteed on the basis of those premises) typically engages a network of frontal and temporal regions. Hemispheric lateralization of deductive reasoning has been observed, but different experimental approaches have resulted in a diverse pattern of findings. For example, studies comparing syllogistic and spatial reasoning against a comprehension control condition revealed a left-sided frontal and temporal reasoning network (Goel et al., 1998). Evidence further indicates that deductive reasoning is localized to the left hemisphere when contrasted against an inductive reasoning task (Goel & Dolan, 2004). Other studies report different cross-hemispheric dissociations of deductive and inductive reasoning when directly contrasting the two against each other (Osherson et al., 1998). It initially appeared that deductive reasoning engaged a right-sided pattern of parietal activation, which would be consistent with visual and spatial representations being the primary mode of deduction (Osherson et al., 1998). However, when using stimuli that are more easily represented using propositional logic than spatial models (or at least an abstract code), another division of labor emerged: deductive reasoning engaged a right-sided frontal and temporal network (specifically, the middle temporal lobe and ventrolateral prefrontal cortex), while inductive reasoning engaged a left-sided frontotemporal network (specifically, ventrolateral PFC, dorsomedial PFC, insula, posterior cingulate, and the medial temporal lobe) (Parsons & Osherson, 2001).

Some of the differences in results across studies can be attributed to task-specific engagement of brain networks beyond the core reasoning components of a particular task; syllogistic reasoning about unfamiliar information engages perceptual processing regions in the parietal lobe as well, while reasoning about transitive relations concerning familiar semantic content additionally engages medial temporal structures such as the hippocampus and parahippocampal gyrus (Goel, 2007; Goel et al., 2000). Further research is necessary to characterize the different contributions of stimulus-specific activation and the true correlates of a core reasoning network, but it is most likely that deduction engages the frontal and parietal cortex, with some right-hemispheric specialization for both spatial and deterministic reasoning.

Proponents of MM theory have described it as an iconic, visuospatial theory of reasoning (Johnson-Laird, 2010b). It employs representations that are iconic in the sense that they preserve the structure or order of events in the world without taking the form of a sensory representation that is isomorphic with the events or objects in the world; events preceding one another in the world are represented by mental models that precede one another in the same order, for example. MM theory is a visuospatial theory of reasoning in that its models can take the form of spatial diagrams. Just as categorical syllogisms can be represented visually with Venn diagrams or Euler circles, the possible events entailed by a particular causal concept can be represented in the mind using spatial models and set-theory notation.

MM theory does not claim, however, to feature mental simulations grounded in the sensory modalities (here we use "modality" to refer to a particular type of sensory input, rather than in the modal-logic sense of possibility; any further reference to this second meaning will be called "logical modality" rather than "sensory modality"). Whereas deductive inference appears to rely on the left frontal lobes, the manipulation of spatial objects using action representations is considered to rely on the parietal lobes (O'Reilly, 2010; Ungerleider & Mishkin, 1982). Human lesion evidence indicates that the right hemisphere is selectively engaged by spatial reasoning, with right posterior cortical lesions (to the parietal, occipital, or posterior temporal lobes) conferring a marked deficit in the mental manipulation of spatial representations, particularly the mental rotation of visual objects (Ratcliff, 1979). Neuropsychological evidence from split-brain patients (whose hemispheres have severely limited communication with one another after surgical resection of the corpus callosum), in addition to neuroimaging evidence measuring activity in healthy subjects' brains, suggests that the right hemisphere is the seat of a visuospatial "interpreter" of sorts; both hemispheres are able to perceive visual and spatial information for simple tasks like object identification, but the right hemisphere appears privileged in complex spatial reasoning (Corballis, 2003). Right-hemisphere dominance has thus been suggested as a possible organizing scheme in the brain's implementation of causal reasoning (Johnson-Laird, 1995).

Recent evidence confirms that posterior cortex plays a role in spatial intelligence, but the picture is now more complicated than the right-hemisphere hypothesis suggested. In one neuroimaging study, bilateral parietal cortex was engaged by a task requiring spatial discrimination, while the mental rotation of spatial objects engaged only the left inferior parietal cortex and right subcortical nuclei (Alivisatos & Petrides, 1997). Another study found bilateral parietal activation during the mental rotation of a variety of visual objects (Jordan et al., 2001). The original right-hemisphere reasoning hypothesis (Johnson-Laird, 1995) has thus been updated to suggest instead that the visuospatial processing system in the brain, including primary visual cortex in the occipital lobe and the correlates of higher-order cognitive manipulation in parietal cortex, forms the core of a reasoning network (Goel, 2005). Evaluating this prediction in light of the neuroscience evidence on reasoning about syllogisms (i.e., explicitly focusing on deductive reasoning) confirms that some types of reasoning engage a visuospatial system, but a number of experimental manipulations also yield engagement of other systems in the brain, suggesting that a dual-process framework of belief activation and evaluation of evidence might be the most accurate way to describe the neural correlates of deduction (Fugelsang & Thompson, 2003). An intuitive system involving the neural correlates of emotion, language, and memory is engaged by the use of heuristics and biases from prior beliefs when reasoning about familiar premises and evidence consistent with previous experience, whereas a slower, reflective system involving visuospatial manipulation is engaged by reasoning about unfamiliar premises or those involving a conflict between evidence and belief (Goel, 2005). The intuitive system includes ventromedial prefrontal cortex (vmPFC), medial temporal lobe (MTL) memory-processing structures, and distributed temporal lobe structures supporting conceptual coherence and the implementation of language rules. The reflective reasoning system includes bilateral parietal cortex.

Whereas other theories of reasoning predict counterfactual inference and the manipulation of mental representations as a major component, the principle of truth in MM theory (that it is easier to represent what exists using mental models than what does not exist in fully explicit models) suggests that people will engage in counterfactual reasoning only when necessary, instead giving a central role to the maintenance component of working memory in support of tracking multiple possibilities. This process engages the vmPFC, and it could explain why people intuitively preferred calling a double-prevention relation a transitive prevention in some previous experiments (Barbey et al., 2011). This would be consistent with the dual-process hypothesis that causal reasoning includes both intuitive judgments (with simple mental models that are consistent with prior beliefs) relying on the vmPFC and the combination of mental models to draw more complex conclusions using the parietal lobes for visuospatial manipulation (Goel, 2005). Such a pattern has been observed in studies focusing on deductive reasoning. Seeing a similar pattern in reasoning about causal relations that are not explicitly deductive would be consistent with deductive reasoning and mental models being descriptively valid in the context of complex causality, but it would not rule out the possibility that deductive reasoning is primarily causal, instead of causal reasoning being fundamentally deductive.
Recent developments in MM theory go beyond visuospatial manipulation, however, suggesting that the core features of the theory can also be mapped onto other brain structures (Khemlani et al., 2014). Specifically, the reflective system in the dual-processing framework mentioned earlier should be expanded beyond the parietal lobes to include the prediction that reasoning over mental models will engage the lateral prefrontal cortex (lPFC). Recall that mental models are modal in nature; they license possible states of affairs. The encoding of stimulus–response rules has been mapped to populations of neurons in dorsolateral PFC (dlPFC) (Mian et al., 2014); these stimulus–response mappings could be interpreted as an action-based instantiation of a more general mechanism in the lPFC for supporting mental models that link events in space and time. Although mental models are not iconic in perceptual form, they have been called iconic in that they preserve the relative structures of objects in space, events in time, and members of abstract sets to each other (Johnson-Laird et al., 2004). Subdivisions of lPFC are recognized as signaling object and concept representations that could plausibly support the maintenance of abstract sets of objects and events in the mind. Finally, the principle of truth suggests a natural preference for representing true states of affairs, rather than the alternative possibilities that can be imagined from a fully explicit set of models. The ability of prefrontal neurons to maintain stimulus-related activity in the absence of a stimulus has long been a central principle of understanding the PFC (Fuster, 1989). This feature is also the basis of the guided activation model of the PFC, a process model in which the PFC primarily serves a control function, selectively activating or inhibiting different stimulus–response mappings according to the contexts that dictate when they are appropriate (Miller & Cohen, 2001). The fact that the PFC (lPFC in particular) must assign task relevance to some representations over others, and sustain attention to them in the service of goal-directed behavior, has been suggested as also supporting belief-oriented processes like reasoning (Khemlani et al., 2014). Under this view, sustaining attention to a task-relevant set of representations while discarding distractors is analogous to sustaining attention to a model of what is true while disregarding alternative possibilities.

Note that a role for the lPFC in reasoning is not unique to MM theory. The neural correlates of cognitive flexibility feature prominently in this hypothesis, but alternative hypotheses emphasizing causal models or force representations would presumably appeal to cognitive flexibility as well (Barbey et al., 2013). What is specific to MM theory among the models discussed here is its suggestion that a combination of abstract models and symbolic manipulations (e.g., truth statements and negations) is part of causal reasoning. Lateral PFC supports abstract symbolic manipulation of information held in working memory, including information that is not primarily a sensory mapping or sensory reconstruction of the world (Khemlani et al., 2014; Ramnani & Owen, 2004; Tettamanti et al., 2008).

Together, the neural implications of MM theory can be summarized as the engagement of a reflective reasoning system including the right lateral prefrontal cortex and right parietal cortex, with the interaction between prior beliefs and evidence supported by bilateral ventromedial prefrontal cortex and frontotemporal memory systems.


Causal Models Theory

One advantage of CM theory is that its implementation would place less of a demand on the number of slots available to working memory processes (or bits of information that can be represented). The combinatorial explosion characteristic of purely statistical accounts of causal judgment and reasoning is not entirely escaped by MM theory when multiple relations require the expansion of implicit mental models into their fully explicit counterparts. When people reason correctly about the combination of two prevent relations into a single allow or cause relation, a graphical Bayes Net would place less of a demand on the limits of short-term memory, while also supporting the manipulation of variables' values and edges' directions to recognize the lack of transitivity. This process of intervention, central to the CM approach, requires the manipulation of information in working memory and selectively relies on the dlPFC (Barbey, Koenigs, et al., 2012).

Furthermore, CM theory flexibly supports the representation of either a probabilistic or a deterministic worldview, such that probabilistic data about a relation can be interpreted just as easily in terms of alternative causal nodes as in terms of a fundamentally stochastic relation. The neural correlates of inductive and deductive reasoning, however, remain to be well characterized (see the section "Mental Models Theory"). As mentioned previously, some studies find that induction is specific to the left PFC, with deduction being localized to the right hemisphere (Parsons & Osherson, 2001), whereas others find that induction and deduction are both left-lateralized, with ventral selectivity for deduction and dorsal selectivity for induction (Goel & Dolan, 2004). Other neuroimaging studies, from the decision-making literature rather than those bearing directly on causal reasoning, have identified an uncertainty monitoring network engaging the PFC and the parietal and insular lobes (Huettel et al., 2005). The neural correlates even appear to change according to the type of probability being represented; ventromedial PFC, insula, amygdala, and putamen are increasingly activated when judging on the basis of uncertain prior probabilities, and activation in posterior occipital cortex scales up with increasingly uncertain conditional probabilities (Vilares et al., 2012).

A key feature of CM theory is that it supports intervention; that is, causal models can be manipulated to reflect some alternative, or "counterfactual," state of affairs. This is known as counterfactual reasoning. Neuroscience theories of counterfactual reasoning suggest that the medial PFC plays a key role in imagining alternative states of affairs, with different regions supporting different types of manipulation (Barbey et al., 2009). The vmPFC, often associated with value assignment and motivational representations, supports counterfactuals that differ from reality in value (better or worse). The dorsal medial PFC (dmPFC) supports the distinction between action and inaction. According to this view, the dmPFC supports a general mechanism for representing states of affairs that do not currently match reality. This view is consistent with other accounts of the dmPFC (including the cingulate gyrus, a fold in the frontal cortex beneath the outermost layer) that assign to it a central role in representing prediction error, or conflict between expectancies and reality (Botvinick et al., 2001).
Counterfactual reasoning will also be supported by the working memory mechanisms described earlier, which allow multiple alternative states of affairs to coexist while a particular state is being modeled and manipulated. The concurrent maintenance of multiple goals or action outcomes has been mapped to the frontopolar cortex (the most anterior region of the PFC), with the two hemispheres dividing the labor in a task-switching paradigm (Charron & Koechlin, 2010).

On the basis of cognitive neuroscience theory concerning the functions that comprise reasoning through the use of Bayes Nets, the neural implementation of reasoning according to CM theory would engage the neural correlates of counterfactual reasoning, working memory manipulation (rather than pure maintenance), probability judgments, and explanatory reasoning to resolve uncertainty. We would thus expect to see a primarily left-hemispheric frontotemporal network supporting causal reasoning.

Force Composition Theory

The force representations in force composition theory are based on iconic perceptual codes, in that their organization reflects the structure of the relations being represented (Wolff & Barbey, 2015). By analogy, a subway tunnel map is an iconic representation of the true physical structure that preserves topology (while sacrificing topographical accuracy). If the iconic character of perceptual codes in causal reasoning is limited to their organization in relation to one another, as represented in visual free-body diagrams, then the process of reasoning from causal premises to a conclusion would primarily engage neural mechanisms for constructing and manipulating symbolic visuospatial models, rather than isomorphic representations that reflect the details of actual causal relations. The superior parietal lobe serves as the node within a larger frontoparietal working memory network that supports the manipulation of visuospatial models (Koenigs et al., 2009).

On the other hand, if the iconic nature of force representations goes as far as imagining visual re-creations of the way forces interact with one another in real life, then we would expect to see involvement of more ventral, modality-specific neural correlates in the construction of simulations that are then "watched" in the mind's eye to predict how an imagined causal system would behave (Patterson & Barbey, 2012). Specifically, the occipital and parietal cortex support the construction of visual simulations; premotor cortex, temporal cortex, and occipital cortex support representations of action and biological motion (Patterson & Barbey, 2012; Paus, 2005; Schacter et al., 2007). Biological motion may be particularly important to the construction of causal force representations involving agency, because the features that distinguish biological motion, as detected by more recently evolved anterior structures in the brain, from the signals of more primitive motion detectors in early visual processing are not purely structural. The movement of articulated joints and facial features is processed differently than pure motion because it signifies the coherent, animated activity of an organism moving toward some goal or away from some consequence. This could be considered a very rudimentary form of causal judgment, and it could conceivably support causal reasoning through the use of mental simulations. If so, we would expect to see the neural correlates of agency and intention featuring prominently in causal reasoning neuroscience experiments: predominantly the middle temporal and medial superior temporal regions (typically designated by landmarks at the posterior ends of the superior and inferior temporal sulci) (Grossman et al., 2000).

In summary, the neural correlates of force composition would primarily involve modality-specific engagement of sensory processing networks in occipital, parietal, and posterior temporal cortex to support the creation of an iconic mental simulation, with right-sided parietal lobe engagement to support the manipulation or "running" of the simulation.

Review of the Cognitive Neuroscience Evidence on Causal Reasoning

This section reviews the results of cognitive neuroscience research using experiments that involve thinking about causal relations (see Figure 13.4 and Table 13.4 for a summary of the main findings from the fMRI studies on causal judgment and reasoning discussed here). The vast majority of cognitive neuroscience research on causal relations is focused on causal judgment, or inductively concluding that a causal relation exists. Here we distinguish causal judgment from the type of causal reasoning that the psychological models describe in greater depth: the combination of previously induced causal relations to infer a larger causal relation that has not been directly observed. Although judgment and reasoning can be thought of separately in this manner, the major claims of each theory can be applied to causal judgment as well. Causal judgment is a form of causal reasoning, in that it involves transforming one form of knowledge (the perception of events occurring together) into another (the belief that the events are linked by some generative mechanism). Further neuroimaging research should focus on causal reasoning over multiple complex relations, so that the behavioral models of causal reasoning can be directly mapped onto neuroscience models of network activity, rather than indirectly inferred, as in this chapter.

Causal Judgments Concerning Physical Events

To study the neural mechanisms supporting the perception of causal relations among physical objects interacting with one another, study participants are asked to discriminate between causal and non-causal event chains in a series of videos or vignettes of objects colliding. Michotte's "launching events" are the original and most frequently used version of this testing paradigm, involving billiard balls colliding with one another (Michotte, 1963; White, Chapter 14 in this volume).


Figure 13.4 Brain activation foci from fMRI studies on causal judgment and reasoning. The patterned spheres each represent the peak voxel in an activation cluster resulting from a linear contrast or regression model. Each pattern (or shade of grey) represents a different study. Spheres with dot patterns represent studies using Michotte-style collision stimuli. Spheres in shades of grey represent studies using social or interpersonal causal attribution stimuli. Spheres with lined patterns represent studies using abstract or verbal causal task stimuli. Individual clusters with fewer than 10 voxels are excluded from this figure, as are cerebellar clusters and most subcortical clusters.

Table 13.4 fMRI Studies on Causal Judgment and Reasoning

Study Name | Stimuli | Contrasts of Interest
Fonlupt, 2003 (white sphere) | Physical collisions | Causality > movement
Fugelsang et al., 2005 (light grey sphere) | Physical collisions | Causality > causal violations
Straube and Chatterjee, 2010 (dark grey sphere) | Physical collisions | Activation increasing with sensitivity to violations of causality
Woods et al., 2014 (black sphere) | Physical collisions | Causal judgment (causal and non-causal events) > resting baseline
Fugelsang and Dunbar, 2005 (vertical ruled lines sphere) | Simulated medical treatment data | Plausible > implausible medical explanation; data inconsistent with plausible explanation > data consistent with plausible explanation; data consistent with plausible explanation > data inconsistent with plausible explanation
Satpute et al., 2005 (horizontal ruled lines sphere) | Verbal pairs | Causal relationship > non-causal association
Blackwood et al., 2003 (dots sphere) | Vignettes | Internal > external attribution; self-serving bias > self-critical bias; self-critical bias > self-serving bias
Harris et al., 2005 (dashed sphere) | Vignettes | Individual attribution > general attribution; social attribution > general attribution

One of the earliest neuroscience studies of causal judgment compared two conditions: in the first, people were instructed to judge whether an event was causal or not; in the second, they were instructed to judge which direction a particular ball moved (Fonlupt, 2003). Within each condition were two event types: a causal event in which one ball strikes another to launch it, and a non-causal event in which one ball simply passes a stationary ball without contacting it. The reason for this 2 x 2 study design was to settle a controversial question concerning the nature of causal judgments: Is there an automatic "cause-perceiving" module in the visual-processing regions of the brain, similar to feature detectors or ensembles of neurons that respond to specific shapes in specific orientations? Or is causality something that must be inferred by putting together the components of an image or vignette? By showing people videos of causal and non-causal events while asking them to alternate between attending to causal status and attending to lower-level physical features, it is possible to test whether the "neural signature" of causal perception is activated any time a causal event is viewed (regardless of attention and intent), or only when trying to make a causal judgment. This study found no difference in brain activation between viewing causal and non-causal events within a given instruction block, but the process of trying to make a causal judgment, regardless of the stimulus's features, activated the medial prefrontal cortex when compared with the lower-level perceptual process of judging motion. There are a number of ways to interpret this finding, but the most commonly accepted one is that causal judgment is not an automatic result of low-level perceptual features activating a causality-specific module; even simple causal judgments are the result of a PFC-mediated conscious process in the brain.
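The logic of the 2 x 2 design can be made explicit with a toy computation. The numbers below are invented; they simply mimic the reported pattern of a main effect of judgment task, with no effect of event type and no interaction.

```python
import numpy as np

# Rows: instruction (causal judgment, motion judgment);
# columns: event type (causal launch, non-causal pass).
bold = np.array([[1.9, 1.9],    # causal-judgment blocks
                 [1.2, 1.2]])   # motion-judgment blocks

task_effect = bold[0].mean() - bold[1].mean()            # 0.7: task matters
stimulus_effect = bold[:, 0].mean() - bold[:, 1].mean()  # 0.0: event type does not
interaction = (bold[0, 0] - bold[0, 1]) - (bold[1, 0] - bold[1, 1])  # 0.0
print(task_effect, stimulus_effect, interaction)
```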


Another study using Michotte's launching events focused only on the process of making causal judgments about events in which a moving ball approaches another ball before stopping and launching the stationary second ball (Fugelsang et al., 2005). The goal of this study was to investigate the neural mechanisms that enable people to use features like spatial and temporal contiguity between events to infer causality. The authors manipulated the stimuli such that the non-causal events still involved collisions, but included some violation of the normal rules of physics. The spatial gap condition involved the first ball coming to rest before making contact with the other ball, and the other ball beginning to move without having been touched; if this happened in real life, we would assume that some force other than the collision accounted for the second ball's movement, even if that force were somehow related to the first ball (e.g., electric repulsion between subatomic particles with the same charge). The temporal gap condition involved the first ball colliding with the second, but the second ball beginning to move only after a delay of several seconds. When comparing the neural activations for judging causal events with those for judging events featuring causal violations, the authors found activation in the right middle frontal gyrus (in the prefrontal lobe) and the right inferior parietal lobule (in the parietal lobe). Comparing the causal condition to the temporal delay condition alone selectively activated the right inferior parietal lobule. Comparing the causal condition to the spatial gap condition alone activated the right middle temporal gyrus. Together, these results indicate that there is a predominantly right-hemisphere network for perceiving causality from physical events, with the parietal lobe being particularly sensitive to detecting spatial contiguity (or inactivation by temporal discontiguity) and the temporal lobe being sensitive to detecting temporal contiguity between events (or inactivation by spatial discontiguity).

The intriguing nature of the causal violation studies motivated another neuroimaging experiment using launching events. This study used materials in which causal violations did not form discrete categories of their own, but were instead introduced gradually, barely noticeable at first and only slowly becoming more extreme, in two domains (Straube & Chatterjee, 2010). The temporal domain involved increasing the interval between the collision and the launching of the second ball, starting at zero. The spatial domain involved increasing the incident angle of the second ball's trajectory, starting at zero degrees from horizontal, until the balls began moving away at 90 degrees from the direction in which they were hit. Participants were asked to judge whether the collisions and launchings were causally linked while the BOLD response in their brains was imaged in an MRI (magnetic resonance imaging) scanner. Treating the conditions as categorical in the first analysis (causal vs. non-causal) revealed no difference in neural activation between judging causal and non-causal stimuli. Instead, a general causal judgment network was engaged by either condition when comparing neural activity against a resting baseline with no visual stimulation; it included large areas of the occipital, parietal, and frontal lobes. This is consistent with the earlier findings on causal judgment and perception of motion. The absence of a clear distinction between conditions could, however, have been an artifact of there being no clear boundary between causal and non-causal events in the study stimuli. Crucially, not all participants responded equally to the violations of causality. People who were more sensitive to temporal delays had greater activity in the left putamen, a subcortical structure associated with controlling movement and with implicit memory, including motor and procedural memory. Individuals who were more sensitive to spatial violations showed greater activation in the right post-central gyrus and the right parietal lobule. The post-central gyrus is also known as the location of the sensory homunculus, supporting localization of touch, pain, proprioception, and other aspects of bodily states. The parietal lobule is known to support spatial mapping and the manipulation of objects in space. These findings suggest the emergence of a domain-general causal network supporting perceptual causal reasoning about physical events, with some specific nodes selectively active in the processing of particular features of causal relations, namely spatial and temporal violations of expectation.

A subsequent study used the same stimuli: launching events with increasingly extreme violations based on temporal delays and angles of impact. The key manipulation in this study was that participants were alternately instructed to focus on only one domain or the other, spatial or temporal contiguity, while making causal judgments (Woods et al., 2014). As with the previous neuroimaging studies, no difference was seen between the neural activations for causal and non-causal events. Comparing the general mental state of causal judgment against a resting baseline revealed activation in a largely right-sided network involving the right inferior and middle temporal gyri, right lingual gyrus, right caudate, bilateral putamen, bilateral insula, right parietal cortex, middle frontal gyrus, and bilateral cerebellum. When participants were asked to focus on the spatial properties of the events, those who were more sensitive to violations had increased activity in the bilateral inferior frontal gyrus, bilateral inferior parietal cortex, and right superior parietal cortex. When asked to focus on time, participants who were more sensitive to violations had greater activation in the right hippocampus and the bilateral vermis of the cerebellum. A confirmatory follow-up study used transcranial direct-current stimulation (tDCS) to test whether responses to causal violations can be made more sensitive by selectively stimulating single brain regions from among those revealed in the fMRI analysis. tDCS is believed to exert its effect by lowering the threshold for neuronal firing in the stimulated regions, rather than actively causing neuronal firing (see Nitsche et al., 2008, for a review of hypotheses concerning the effects of tDCS). Specifically, the authors used anodal stimulation of the right hemisphere, comparing three conditions: frontal lobe stimulation, parietal stimulation, and sham stimulation as a control. Frontal stimulation increased sensitivity to violations of either spatial or temporal contiguity. Parietal stimulation increased sensitivity to violations of spatial contiguity only. Together, the fMRI and tDCS findings provide evidence for a right-sided network in which frontal regions support general perceptual causality and parietal regions are sensitive to the spatial properties of events and the relations between them.
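The spatial and temporal manipulations used across these launching studies are easy to state algorithmically. The sketch below generates mover and target positions for a simplified launching animation; the parameter names and units are illustrative, not taken from any published stimulus set (angle-of-trajectory violations are omitted for brevity).

```python
def launching_frames(n_frames=60, contact_x=0.5, speed=0.01,
                     temporal_delay=0, spatial_gap=0.0):
    """Generate (mover_x, target_x) positions for a schematic launching
    animation on a 0..1 screen. temporal_delay is in frames; spatial_gap
    is in screen units. Both zero gives a canonical Michotte launch;
    nonzero values produce the causal violations described above."""
    frames = []
    stop_x = contact_x - spatial_gap       # mover halts short of the target
    mover_x, target_x = 0.0, contact_x
    frames_since_stop = None               # None until the mover stops
    for _ in range(n_frames):
        if frames_since_stop is None:
            mover_x = min(mover_x + speed, stop_x)
            if mover_x >= stop_x:
                frames_since_stop = 0
        else:
            frames_since_stop += 1
            if frames_since_stop > temporal_delay:
                target_x += speed          # target departs after the delay
        frames.append((mover_x, target_x))
    return frames

# Both violations at once: the mover stops 0.05 short of the target,
# which departs 12 frames later.
frames = launching_frames(temporal_delay=12, spatial_gap=0.05)
```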


Complex Judgments of Abstract Relations

We can imagine any number of complex events that cannot be adequately explained or predicted in terms of visuospatial representations of physical collisions. Judgments of agency and intentionality in a conversation between friends or enemies, for example, require more nuanced explanations involving current evidence (statements), prior knowledge (personality traits and the likelihood of their causing particular statements), and an understanding of the fact that tone and context influence meaning as much as the semantic content of a conversational exchange. By using vignettes of people interacting with one another, or descriptions of events that need to be explained or predicted in the context of prior experience, the neural correlates of complex causal judgment and reasoning can be explored and compared with the neural framework supporting physical causality.

As our earlier examples of reasoning about the relations between smoking and lung cancer suggest, the field of medical diagnostics and treatment planning is rich with opportunity for studying how people understand, represent, and manipulate complex causal networks. One fMRI study was designed to explore the neural correlates of interactions between evidence and prior beliefs, especially as they pertain to the plausibility of a causal mechanism producing the effect being predicted or explained (Fugelsang & Dunbar, 2005). Specifically, it measured the neural correlates associated with judging the efficacy of two treatments for depression (either a plausible pharmacological treatment or placebo) in the context of two statistical patterns (a low or a high rate of treatment success). Participants were asked to decide how effective each treatment was in predicting happiness after seeing 20 individual trials of each condition (each trial being a hypothetical patient who was administered the drug and either responded or failed to respond to the treatment). The authors correctly predicted that causal attributions would be highest in the condition featuring a plausible mechanism (not placebo) and high treatment response rates. Correlational data alone, as simulated by the placebo condition without a plausible mechanism, appear inadequate to support the inference of a causal mechanism.
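The covariation component of such judgments is commonly formalized with deltaP, and with Cheng's (1997) causal power, which rescales deltaP by how much room the effect has to increase. The sketch below uses hypothetical response counts standing in for the 20-trial blocks; it is a worked example of the standard formulas, not the study's analysis.

```python
def delta_p_and_power(k_cause, n_cause, k_nocause, n_nocause):
    """deltaP = P(e|c) - P(e|~c); causal power (Cheng, 1997) is
    deltaP / (1 - P(e|~c)). Counts are hypothetical patients who
    improved, out of those observed in each condition."""
    p_e_c = k_cause / n_cause
    p_e_nc = k_nocause / n_nocause
    delta_p = p_e_c - p_e_nc
    power = delta_p / (1 - p_e_nc) if p_e_nc < 1 else float("nan")
    return delta_p, power

# e.g., 16 of 20 hypothetical patients improve with the drug, 8 of 20 without:
print(delta_p_and_power(16, 20, 8, 20))  # deltaP = 0.4, power ~ 0.67
```

The point of the study design is that identical covariation statistics can be produced by the drug and by the placebo; what differs is whether a plausible mechanism licenses treating that covariation as causal.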

Activity in several brain regions was observed when comparing the BOLD response across the study conditions. First, the authors compared the consideration of plausible theories with implausible theories, regardless of treatment response rates. The left inferior frontal gyrus, right superior frontal gyrus, and primary visual cortex were all activated when contrasting participants' consideration of plausible theories against their consideration of implausible ones. The authors suggest that this provides evidence for the involvement of working memory, executive control, and visual attention mechanisms previously attributed to these regions. Within each plausibility condition (medicine or placebo), different activations were also seen when directly contrasting the blocks with treatment response rates that were consistent or inconsistent with prior knowledge. Treatment response rates consistent with prior beliefs about the causal mechanism (treatment success after medicine) selectively engaged the left parahippocampal gyrus (PHG) and right caudate nucleus, whereas data conflicting with the plausible mechanism (taking medicine, but no response) selectively engaged the right cingulate gyrus, left dorsolateral prefrontal cortex (dlPFC), and left superior parietal lobe. Surprising treatment rates were generally disregarded in the implausible (placebo) condition. The fact that medial temporal lobe structures (of which the PHG is one) are overwhelmingly implicated in the processing of episodic memory and semantic knowledge suggests that memory representations are retrieved when considering the plausibility of a theory; the error- and conflict-monitoring role often attributed to the cingulate cortex and dlPFC suggests that inconsistent data in the context of a plausible mechanism manifest as a prediction error of sorts. Greater activation of the left hemisphere during conflicts between theory and evidence provides support for the counterfactual or causal models approach to causal reasoning, and right-sided frontal activations while participants evaluated evidence consistent with an implausible theory could support an inhibitory or conflict-monitoring function under any of the theories of causal reasoning.

It is worth noting here that the multiplicity of functions supported by a given structure or network in the brain renders the reverse inference approach to understanding brain function (inferring which functions or computations are being used to complete a task on the basis of a particular pattern of neural activation) less than exhaustive (Poldrack, 2006). The fact that a particular brain region has been associated with some known function in the past does not entail that it fulfills the same function in all subsequent tasks that engage it; many brain regions can support more than one function. This criticism is not unique to the methods reported here; it is a limitation inherent to exploratory analyses in cognitive neuroscience. Still, the reverse inference approach is a reasonable starting point for hypothesis development in a field as young as the cognitive neuroscience of causal reasoning, especially when dealing with such reliably observed function mappings as medial temporal lobe memory processing and executive control in dlPFC.
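Poldrack's point is often cast in Bayesian terms: the evidential value of an activation depends on the region's selectivity, not merely on how reliably the process of interest activates it. The sketch below is a generic Bayes-rule calculation with invented numbers, not an analysis from any of the studies discussed.

```python
def p_process_given_activation(p_act_given_proc, p_proc, p_act_given_other):
    """Bayes' rule: how diagnostic is a region's activation for a process?
    The last argument captures how often the region activates when the
    process is absent, i.e., its (lack of) selectivity. Numbers invented."""
    p_act = p_act_given_proc * p_proc + p_act_given_other * (1 - p_proc)
    return p_act_given_proc * p_proc / p_act

print(p_process_given_activation(0.9, 0.5, 0.8))  # ~0.53: unselective region
print(p_process_given_activation(0.9, 0.5, 0.1))  # 0.90: selective region
```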

Social Causal Reasoning

Social and emotional intelligence are burgeoning areas of study in the cognitive and neural sciences (Hilton, Chapter 32 in this volume). An open question concerning the structure of social intelligence and its relation to other constructs of human brain function is whether it is a unique set of faculties specific to processing social and emotional information, or simply an aspect of general intelligence that emerges when the content of the representations being supported happens to feature social and emotional information. Identifying the neural basis of social cognition will not end the debate, but the fact that some studies reveal an independent set of competencies specialized for social cognition suggests that there should be unique neural contributions to such competencies.

Explaining and predicting the behaviors of others (and our own behaviors) are two processes requiring the representation and manipulation of causal relations involving stable attributes, intentional states, and actions of people as they interact with one another. By asking people to make judgments about the actions of others while being imaged in an fMRI study, several studies have begun to separate the layers of social causal cognition into the types of information that are central to this type of reasoning, beyond physical collisions or impersonal probabilistic events.

One study examined the types of information that influence how we generate causal explanations for human behavior. People are presented with vignettes and asked to draw one of four conclusions: the behavior was due to the main actor's characteristics; the behavior was due to the characteristics of another person in the exchange; the behavior was due to impersonal contextual factors; or it was due to some combination of the three (Harris et al., 2005). According to the authors, people tend to make attribution judgments concerning an actor's behavior on the basis of information about consensus (whether other people act similarly), distinctiveness (whether the behavior is specific to a particular object, or extends to all members of a target category), and consistency (whether the behavior is reliably seen from this actor). People are most likely to attribute an action to a person's individual characteristics when consistency is high but consensus and distinctiveness are low. For example, a kind gesture will be attributed to the personality of the actor if that person is routinely kind, especially in situations in which others might not be, and if the person acts that way indiscriminately, regardless of who is receiving the kindness.

In this experiment, brain activity was measured using functional magnetic resonance imaging (fMRI) while participants engaged in the attribution task. Activity in the superior temporal sulcus (STS) was elevated in the combination of conditions evoking person attribution (low consensus and distinctiveness, high consistency), when compared with the other combinations of conditions. Activity in the mPFC, but not the STS, was also associated with social judgments not specific to a single person (high consensus, low distinctiveness, high consistency). The person-attribution condition also activated other regions in the brain (right middle temporal gyrus, right middle occipital gyrus, right precentral gyrus, right precuneus, left insula, and left cingulate gyrus), but not uniquely when compared with other stimulus conditions (e.g., low consensus, distinctiveness, and consistency). The fact that the right STS and the left mPFC are preferentially engaged by social cognition and Theory of Mind paradigms suggests that social causal reasoning converges with the ability to infer the mental states of others. The left-sided prefrontal activation is consistent with CM theory, but could conceivably be an artifact of the social component of reasoning rather than causal attribution per se. The right-sided temporal activation is consistent with the biological motion (or agency) and spatial reasoning aspects of FC theory, but could also hypothetically be part of a larger right-sided activation pattern characteristic of deductive reasoning, as seen in MM theory. Note that the areas activated together by multiple study conditions are primarily right-hemispheric, which would provide support for MM theory.
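The covariation rule described above (classically associated with Kelley's attribution theory) can be written as a small lookup. Only the two patterns tied to the imaging results are encoded; the function is a toy for illustration, not the authors' stimulus-coding scheme.

```python
def attribute(consensus, distinctiveness, consistency):
    """Toy lookup over the covariation patterns described above;
    inputs are 'high'/'low'. Patterns not covered by the study's
    two key conditions are left undetermined."""
    pattern = (consensus, distinctiveness, consistency)
    if pattern == ("low", "low", "high"):
        return "person (dispositional) attribution"       # STS-linked condition
    if pattern == ("high", "low", "high"):
        return "social attribution, not person-specific"  # mPFC-linked condition
    return "undetermined / situational"

print(attribute("low", "low", "high"))
```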
Biased attribution of behavior in the service of some personal motivation also has a rich literature demonstrating exactly how error-prone and flexible human social judgment can be (Kunda, 1990). In Western cultures that assign a high value to individualistic notions of self, we tend to erroneously attribute the actions of others to their dispositions while underweighting the influence of context; this was famously termed the "fundamental attribution error," and it features prominently in the social psychology literature (Mason & Morris, 2010; Ross, 1977). Similarly, people often make overly forgiving judgments of their own actions, taking credit for successes and blaming circumstance for failures and indiscretions, presumably to reduce dissonance and preserve a positive self-image (Greenberg et al., 1982).

As with other tendencies to construct causal models as dictated by our goals and contextual factors, the presence of attribution biases can be mapped to a network in the brain supporting its component parts: in this case, general causal reasoning and mechanisms for representing the assignment of value to particular explanations (Blackwood et al., 2003). By instructing participants to imagine themselves as the central actor in a series of social vignettes requiring an explanation for their own behaviors, it was possible to directly compare the neural correlates of self-attribution and other-attribution. Participants could choose from self-attribution, other-attribution, or situational factors. Attributing actions to the self without a bias (that is, including both negative and positive actions) engaged the left lateral cerebellum, the bilateral dorsal premotor cortex, and the right lingual gyrus. External attribution (collapsing other-person and impersonal contextual influences into a single category) engaged the left posterior superior temporal sulcus (STS). Comparing the activations associated with a self-serving bias against those associated with a self-deprecating bias revealed that favorable attributions activated the bilateral caudate nuclei, while a bias against self-serving attributions activated the left lateral orbitofrontal cortex (OFC, which overlaps with vmPFC), the right angular gyrus, and the right middle temporal gyrus. The role of the STS in general external attribution is attributed to its role in inferring the mental or intentional states of others, and the role of premotor cortex, the cerebellum, and the lingual gyrus in general self-attribution is linked to their role in simulating one's own actions and intention states in decision-making. The neural basis of these biases is of particular interest, because the very presence of a bias suggests some error in reasoning, a departure from rational thought that might help explain what makes humans unique. Activation of the caudate nucleus when making self-serving attributions suggests that representations of reward and motivation are involved. It is conceivable that multiple causal representations are concurrently constructed in this context: first, a plausible causal model linking self and situational factors to the behavior under consideration, and second, an implicit causal model linking the very inference being drawn to a particularly desirable emotional state, accounting for why the self-serving bias is observed in the first place. The engagement of bilateral frontal, temporal, and parietal lobes in any internal attribution is consistent with all three theories of causal reasoning. Engagement of the lingual gyrus and parietal lobes in particular, especially in contrasts not involving a major difference in visual processing, supports an iconic, sensory modality-specific representation as suggested by FC theory.
Bilateral caudate activation associated with a self-serving bias, and a left frontal, right parietal activation in the self-deprecating bias, are not clearly supportive of any one theory of reasoning. The left-sided temporal engagement of external attribution, when contrasted with all other attribution types, appears consistent with CM theory, but only when considered on its own, without the context of the other conditions.

Evidence from causal reasoning experiments enrolling participants with brain damage confirms several general trends in the neuroscience of social causal attribution. Between two otherwise equivalent explanations for an event of interest, healthy adults tend to favor the explanation featuring agency or intention on the part of some involved person. Accumulating evidence suggests that some patterns of brain damage are more likely than others to impair the discrimination between intentional and unintentional acts in their causal power (Channon et al., 2010). The study stimuli asked participants to read chains of events in which two causes preceded some effect, with each cause varying from intentional or unintentional human acts to physical events not involving humans at all. Participants then rated the causal power of each cause in the chain on a four-point Likert scale, before using a similar scale to decide which cause was central to the effect. When comparing neurological patients with frontal damage (especially in the right hemisphere) to those with posterior damage and to healthy control subjects, the frontally damaged group was still able to discriminate between the two kinds of acts, but to a lesser extent than participants in the other groups. Specifically, damage to the right middle frontal gyrus, right inferior frontal gyrus, right ventrolateral PFC (vlPFC), and right insular cortex predicted a lesser degree of discrimination between intentional and unintentional human acts in their causal power. The findings provide evidence for an anterior right-hemisphere network that is critical to discriminating between acts made intentionally and unintentionally. This pattern of lesion-symptom mapping is consistent with the MM view of deductive inference as the driving force behind causal attribution.

Causal Judgment Versus Associative Learning

The final element of causal judgment that has been studied using neuroscience methods is the distinction between associative learning mechanisms and causal reasoning. In integrating cognitive and behavioral psychology with the rich philosophical tradition on models of reasoning, connections can be drawn between constructs such as goal-directed behavior and causal representations; goal-directed behavior would be incoherent without some understanding of causality to predict the consequences of actions and adapt behavior accordingly. One hurdle to explaining brain function and higher cognition in terms of causal representations is that associative learning mechanisms along the lines of classical conditioning can explain animal behavior that resembles an understanding of causal relations. Associative learning is based on tracking patterns of coincidence, and although it might engage semantic knowledge representations for the objects being associated, there should be no need to engage semantic knowledge describing the nature of the relations if causality truly exists as a privileged class of representation beyond association (see LePelley, Griffiths, & Beesley, Chapter 2 in this volume, for associative theories of causality; Boddez, De Houwer, & Beckers, Chapter 4 in this volume, for reasoning theory). To study the neural correlates engaged by causal judgment beyond the semantic knowledge network in the brain supporting associative judgment, one fMRI study instructed participants to judge the nature of a series of paired words while being scanned (Fenker et al., 2005; Satpute et al., 2005). The pairs of words varied as to whether they were causally linked (e.g., wind and erosion), non-causally associated (e.g., ring and emerald), or unrelated (e.g., eggs and liar). The study was divided into two block types that manipulated whether participants were instructed to judge each pair as causal versus unrelated, or as associated versus unrelated. In the causal judgment condition, then, associated pairs would warrant a "no" response, while causal pairs in the associative condition would still warrant a "yes," because causal links are simply one type of association in this context. Note that semantic knowledge is still required to make an associative judgment, but not what the authors call a "role-binding" process in which each word in the pair is assigned to either the cause or the effect role. The neural correlates of this role-binding process emerged when the features common to causal and associative judgment were subtracted from those associated with causal judgment only. A mostly left-hemisphere semantic-processing network emerged when combining the two conditions: left dorsolateral prefrontal cortex (dlPFC), left middle frontal gyrus, inferior frontal gyrus, superior parietal lobule, anterior cingulate gyrus, fusiform gyrus, and bilateral cerebellum. The neural correlates of causal reasoning contrasted against associative reasoning were much more focal, including a more anterior cluster in dlPFC and the right precuneus. Associative reasoning selectively engaged the right superior temporal gyrus (STG) when contrasted with causal-only judgment.
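The block logic of the word-pair task can be summarized in a few lines. The pair tags below follow the examples given in the text; the response rule is our reconstruction of the instructions, intended only to show why causal pairs receive "yes" in both block types while merely associated pairs do not.

```python
# Relation tags follow the examples in the text; the rest is illustrative.
pairs = {("wind", "erosion"): "causal",
         ("ring", "emerald"): "associated",
         ("eggs", "liar"): "unrelated"}

def expected_response(pair, block):
    """block is 'causal' (judge causal vs. unrelated) or 'associative'
    (judge associated vs. unrelated). Causal pairs earn 'yes' in both
    block types because a causal link is one kind of association; this
    asymmetry is what lets the contrast isolate role-binding."""
    relation = pairs[pair]
    if block == "causal":
        return relation == "causal"
    return relation in ("causal", "associated")

print(expected_response(("wind", "erosion"), "associative"))  # True
print(expected_response(("ring", "emerald"), "causal"))       # False
```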

Discussion

At first glance, it is tempting to divide the psychological theories of reasoning according to the dissociable brain networks that they would predict as those supporting causal reasoning. Mental models theory emphasizes deduction over an abstract code of possible states of affairs, which generally engages a right-hemisphere frontoparietal network. Causal models theory emphasizes inductive reasoning over an abstract network representing both statistical dependencies and generative mechanisms linking the variables in a set of counterfactual manipulations, which should engage a left-hemisphere frontal-temporal-parietal network. Force composition theory emphasizes iconic force vector representation, which could take the form of symbolic free-body diagrams or lifelike representational simulations; this would suggest a network involving the superior parietal lobe, early perceptual processing streams in the parietal and occipital lobes, and the medial prefrontal cortex.

On the basis of the causal judgment neuroscience literature alone, there appears to be broad evidence for a frontal-temporal-parietal network, in support of a plurality of cognitive or psychological models. Particularly in the context of necessary and sufficient relations that are considered transitive according to formal logic (e.g., cause and enable), joining multiple relations occurs using a deductive process by definition, which would suggest engagement of the right hemisphere and the use of mental models. Uncertainty and probability could plausibly engage a separate, inductive mechanism of reasoning using Bayes nets in the left hemisphere, or reasoning could still proceed via a fundamentally deductive mechanism, as suggested by Johnson-Laird and colleagues (Johnson-Laird, 1994). The divided tracking of multiple outcomes in a decision-making task is also known to require bilateral representation in the brain (Charron & Koechlin, 2010). Furthermore, reasoning about abstract causal relations, like intention on the part of agents interacting with one another, seems to support the face validity of a simulation approach to causal reasoning when modal and statistical network-based representations do not quite capture the nature of the causal mechanism. Simplifying a system of abstract "forces" as moving toward or away from an end state is a plausible mechanism that may be engaged when necessary. In summary, there is most likely a plurality of modes of causal judgment and reasoning available. They may not all function at once, and may not even describe the same processes in the brain, but could rather be selectively recruited when a situation requires them. Reasoning about deterministic concepts, whether truly deterministic in nature or not, will likely involve deduction over mental models. Reasoning over probabilistic codependencies will likely involve induction over causal models. Reasoning about scenarios involving agency, and about complex instances of multiple co-occurring preventions, will likely involve the use of iconic force representations.

Questions for Future Study

It is important to consider the questions that remain unanswered by the early research findings to date. In the carefully controlled environment of a psychology lab, a number of features of causal reasoning can be readily manipulated. Briefly, the difference between diagnostic and predictive reasoning has been the subject of some inquiry from a purely psychological standpoint (see Meder & Mayrhofer, Chapter 23 in this volume). Causal action simulations plausibly support predictive reasoning, for example. Diagnostic reasoning, however, appears to rely on similar information about causal dependencies and mechanisms, and it has been suggested that causal simulations cannot be run in reverse to generate explanations (Fernbach et al., 2011; Meder et al., 2014; Sloman & Lagnado, 2015); it remains undecided whether a series of alternative causal simulations can be run in the forward direction to decide on an explanatory inference (see Lombrozo and Vasilyeva, Chapter 22 in this volume, for a discussion of explanation), along the lines of the multiple models featured prominently in MM theory (Goldvarg & Johnson-Laird, 2001) or the structural equations in CM theory (Sloman & Lagnado, 2015). Mapping this distinction to key events or networks in the brain will help answer an old question in cognitive neuroscience: Is the reasoning process best described by a single, domain-general neural mechanism that is distinct from lower-level information processing steps, or instead by separate domain-specific neural mechanisms that include the neural correlates of the perceptual processing dictated by the content of the relevant information (e.g., emotion processing, or visuospatial information)?

Another open question is to what extent other cognitive functions can be re-explained in terms of causality. Action representations supporting goal-directed behavior embody causal knowledge of the consequences of possible actions. The ability to construct coherent categories with meaningful boundaries relies on a basis of theoretical knowledge over and above any sort of feature-combination or exemplar model; this "theory theory" is fundamentally causal in nature as well (Rehder, 2003; Rehder, Chapters 20 and 21 in this volume). More fundamental cognitive functions like attention and memory may not be intrinsically causal, but they clearly support causal reasoning. The exact nature of these relationships in the brain remains to be seen.

Conclusions

We return to the broader, theory-relevant questions left unanswered by prior research. Definitively confirming or rejecting the neuroscience predictions of descriptive models of causal reasoning will require neuroimaging studies that use materials identical to those used previously in behavioral experiments. Let us not forget that all models are, by definition, incomplete and therefore inaccurate: they make simplifying assumptions in some domains so that perturbations in other domains of interest can be studied. None of the descriptive models of causal reasoning is likely to encapsulate all of human causal reasoning, even if a unified descriptive theory should be developed. For this reason, the short-term goal of neuroimaging studies on causal reasoning should be to use experimental materials that resonate with the personal experiences most people have in trying to predict the future and explain the past. Once we have more thoroughly mapped causal judgment and complex causal reasoning to networks in the brain, we can begin to tackle the more daunting task of uniting all brain function in a single code. Some have proposed that a common thread of information processing or representation types unites the different elements of intelligence, reasoning, and perception in the human brain (Christoff & Gabrieli, 2000; Duncan, 2001, 2010; Koechlin et al., 2003; Miller & Cohen, 2001; O'Reilly, 2010). We suggest that the common thread is the representation of causal relations, and we eagerly await the next findings to confirm our hypothesis or situate it within a more all-encompassing framework.

References

Ahn, W., & Kalish, C. W. (2000). The role of mechanism beliefs in causal reasoning. In F. Keil & R. Wilson (Eds.), Explanation and cognition (pp. 199–225). Cambridge, MA: MIT Press.
Ahn, W., Kalish, C. W., Medin, D. L., & Gelman, S. A. (1995). The role of covariation versus mechanism information in causal attribution. Cognition, 54, 299–352. doi:10.1016/0010-0277(94)00640-7
Alivisatos, B., & Petrides, M. (1997). Functional activation of the human brain during mental rotation. Neuropsychologia, 35, 111–118. doi:10.1016/S0028-3932(96)00083-8
Banich, M. T. (2009). Executive function: The search for an integrated account. Current Directions in Psychological Science, 18(2), 89–94. doi:10.1111/j.1467-8721.2009.01615.x
Barbey, A. K., Colom, R., & Grafman, J. (2013). Architecture of cognitive flexibility revealed by lesion mapping. NeuroImage, 82, 547–554. doi:10.1016/j.neuroimage.2013.05.087
Barbey, A. K., Colom, R., Solomon, J., Krueger, F., Forbes, C., & Grafman, J. H. (2012). An integrative architecture for general intelligence and executive function revealed by lesion mapping. Brain, 135(Pt 4), 1154–1164. doi:10.1093/brain/aws021
Barbey, A. K., Koenigs, M., & Grafman, J. H. (2011). Orbitofrontal contributions to human working memory. Cerebral Cortex, 21(4), 789–795. doi:10.1093/cercor/bhq153
Barbey, A. K., Koenigs, M., & Grafman, J. H. (2012). Dorsolateral prefrontal contributions to human working memory. Cortex, 49(5), 1195–1205. doi:10.1016/j.cortex.2012.05.022
Barbey, A. K., Krueger, F., & Grafman, J. H. (2009). Structured event complexes in the medial prefrontal cortex support counterfactual representations for future planning. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 364(1521), 1291–1300. doi:10.1098/rstb.2008.0315
Barbey, A., & Wolff, P. (2002). Causal reasoning from forces. In Proceedings of the 28th annual conference of the Cognitive Science Society (p. 2439). Mahwah, NJ: Lawrence Erlbaum Associates.
Blackwood, N. J., Bentall, R. P., Ffytche, D. H., Simmons, A., Murray, R. M., & Howard, R. J. (2003). Self-responsibility and the self-serving bias: An fMRI investigation of causal attributions. NeuroImage, 20(2), 1076–1085. doi:10.1016/S1053-8119(03)00331-8
Botvinick, M. M., Braver, T. S., Barch, D. M., Carter, C. S., & Cohen, J. D. (2001). Conflict monitoring and cognitive control. Psychological Review, 108(3), 624–652. doi:10.1037//0033-295X.108.3.624
Buckner, R. L., Andrews-Hanna, J. R., & Schacter, D. L. (2008). The brain's default network: Anatomy, function, and relevance to disease. Annals of the New York Academy of Sciences, 1124, 1–38. doi:10.1196/annals.1440.011
Channon, S., Lagnado, D., Drury, H., Matheson, E., Fitzpatrick, S., Shieff, C., et al. (2010). Causal reasoning and intentionality judgments after frontal brain lesions. Social Cognition, 28(4), 509–522. doi:10.1521/soco.2010.28.4.509
Charron, S., & Koechlin, E. (2010). Divided representation of concurrent goals in the human frontal lobes. Science, 328(5976), 360–363. doi:10.1126/science.1183614
Cheng, P. W. (1997). From covariation to causation: A causal power theory. Psychological Review, 104(2), 367–405. doi:10.1037//0033-295X.104.2.367
Cheng, P. W., & Novick, L. R. (1990). A probabilistic contrast model of causal induction. Journal of Personality and Social Psychology, 58(4), 545. doi:10.1037/0022-3514.58.4.545
Cheng, P. W., & Novick, L. R. (1991). Causes versus enabling conditions. Cognition, 40(1–2), 83–120. doi:10.1016/0010-0277(91)90047-8
Christoff, K., & Gabrieli, J. D. E. (2000). The frontopolar cortex and human cognition: Evidence for a rostrocaudal hierarchical organization within the human prefrontal cortex. Psychobiology, 28(2), 168–186. doi:10.3758/BF03331976
Corballis, P. M. (2003). Visuospatial processing and the right-hemisphere interpreter. Brain and Cognition, 53, 171–176. doi:10.1016/S0278-2626(03)00103-9
Duncan, J. (2001). An adaptive coding model of neural function in prefrontal cortex. Nature Reviews Neuroscience, 2, 820–829. doi:10.1038/35097557
Duncan, J. (2010). The multiple-demand (MD) system of the primate brain: Mental programs for intelligent behaviour. Trends in Cognitive Sciences, 14(4), 172–179. doi:10.1016/j.tics.2010.01.004
Duncan, J., & Owen, A. M. (2000). Common regions of the human frontal lobe recruited by diverse cognitive demands. Trends in Neurosciences, 23, 475–483. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/11006464
Fenker, D. B., Waldmann, M. R., & Holyoak, K. J. (2005). Accessing causal relations in semantic memory. Memory & Cognition, 33(6), 1036–1046. doi:10.3758/BF03193211
Fernbach, P. M., Darlow, A., & Sloman, S. A. (2011). Asymmetries in predictive and diagnostic reasoning. Journal of Experimental Psychology: General, 140(2), 168–185. doi:10.1037/a0022100
Fonlupt, P. (2003). Perception and judgement of physical causality involve different brain structures. Cognitive Brain Research, 17, 248–254. doi:10.1016/S0926-6410(03)00112-5
Fugelsang, J. A., & Dunbar, K. N. (2005). Brain-based mechanisms underlying complex causal thinking. Neuropsychologia, 43(8), 1204–1213. doi:10.1016/j.neuropsychologia.2004.10.012
Fugelsang, J. A., Roser, M. E., Corballis, P. M., Gazzaniga, M. S., & Dunbar, K. N. (2005). Brain mechanisms underlying perceptual causality. Brain Research. Cognitive Brain Research, 24(1), 41–47. doi:10.1016/j.cogbrainres.2004.12.001
Fugelsang, J. A., & Thompson, V. A. (2003). A dual-process model of belief and evidence interactions in causal reasoning. Memory & Cognition, 31(5), 800–815. doi:10.3758/BF03196118
Fuster, J. (1989). The prefrontal cortex. New York: Raven.
Goel, V. (2005). Cognitive neuroscience of deductive reasoning. In K. Holyoak & R. Morrison (Eds.), Cambridge handbook of thinking and reasoning (pp. 475–492). New York: Cambridge University Press.
Goel, V. (2007). Anatomy of deductive reasoning. Trends in Cognitive Sciences, 11(10), 435–441. doi:10.1016/j.tics.2007.09.003
Goel, V., Buchel, C., Frith, C., & Dolan, R. J. (2000). Dissociation of mechanisms underlying syllogistic reasoning. NeuroImage, 12, 504–514. doi:10.1006/nimg.2000.0636
Goel, V., & Dolan, R. J. (2004). Differential involvement of left prefrontal cortex in inductive and deductive reasoning. Cognition, 93(3), B109–B121. doi:10.1016/j.cognition.2004.03.001
Goel, V., Gold, B., Kapur, S., & Houle, S. (1998). Neuroanatomical correlates of human reasoning. Journal of Cognitive Neuroscience, 10, 293–302. doi:10.1162/089892998562744
Goldvarg, E., & Johnson-Laird, P. N. (2001). Naive causality: A mental model theory of causal meaning and reasoning. Cognitive Science, 25(4), 565–610. doi:10.1207/s15516709cog2504_3
Greenberg, J., Pyszczynski, T., & Solomon, S. (1982). The self-serving attributional bias: Beyond self-presentation. Journal of Experimental Social Psychology, 18, 56–67.
Grossman, E., Donnelly, M., Price, R., Pickens, D., Morgan, V., Neighbor, G., et al. (2000). Brain areas involved in perception of biological motion. Journal of Cognitive Neuroscience, 12, 711–720. doi:10.1162/089892900562417
Harris, L. T., Todorov, A., & Fiske, S. T. (2005). Attributions on the brain: Neuro-imaging dispositional inferences, beyond theory of mind. NeuroImage, 28(4), 763–769. doi:10.1016/j.neuroimage.2005.05.021
Huettel, S. A., Song, A. W., & McCarthy, G. (2005). Decisions under uncertainty: Probabilistic context influences activation of prefrontal and parietal cortices. The Journal of Neuroscience, 25(13), 3304–3311. doi:10.1523/JNEUROSCI.5070-04.2005
Johnson-Laird, P. N. (1994). Mental models and probabilistic thinking. Cognition, 50, 189–209. doi:10.1016/0010-0277(94)90028-0
Johnson-Laird, P. N. (1995). Mental models, deductive reasoning, and the brain. In M. S. Gazzaniga (Ed.), The cognitive neurosciences (pp. 999–1008). Cambridge, MA: MIT Press.
Johnson-Laird, P. N. (2010a). Against logical form. Psychologica Belgica, 50(3–4), 193–221.
Johnson-Laird, P. N. (2010b). Mental models and human reasoning. Proceedings of the National Academy of Sciences of the United States of America, 107, 18243–18250. doi:10.1073/pnas.1012933107
Johnson-Laird, P. N., Girotto, V., & Legrenzi, P. (2004). Reasoning from inconsistency to consistency. Psychological Review, 111(3), 640–661. doi:10.1037/0033-295X.111.3.640
Jordan, K., Heinze, H. J., Lutz, K., Kanowski, M., & Jäncke, L. (2001). Cortical activations during the mental rotation of different visual objects. NeuroImage, 13, 143–152. doi:10.1006/nimg.2000.0677
Khemlani, S. S., Barbey, A. K., & Johnson-Laird, P. N. (2014). Causal reasoning with mental models. Frontiers in Human Neuroscience, 8, 1–15. doi:10.3389/fnhum.2014.00849
Koechlin, E., Ody, C., & Kouneiher, F. (2003). The architecture of cognitive control in the human prefrontal cortex. Science, 302(5648), 1181–1185. doi:10.1126/science.1088545
Koenigs, M., Barbey, A. K., Postle, B. R., & Grafman, J. H. (2009). Superior parietal cortex is critical for the manipulation of information in working memory. The Journal of Neuroscience, 29(47), 14980–14986. doi:10.1523/JNEUROSCI.3706-09.2009
Krynski, T. R., & Tenenbaum, J. B. (2007). The role of causality in judgment under uncertainty. Journal of Experimental Psychology: General, 136(3), 430–450. doi:10.1037/0096-3445.136.3.430
Kunda, Z. (1990). The case for motivated reasoning. Psychological Bulletin, 108(3), 480–498.
Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. San Francisco, CA: W. H. Freeman.
Mason, M. F., & Morris, M. W. (2010). Culture, attribution and automaticity: A social cognitive neuroscience view. Social Cognitive and Affective Neuroscience, 5(2–3), 292–306. doi:10.1093/scan/nsq034
Meder, B., Mayrhofer, R., & Waldmann, M. (2014). Structure induction in diagnostic causal reasoning. Psychological Review, 121(3), 277–301. doi:10.1037/a0035944
Mian, M. K., Sheth, S. A., Patel, S. R., Spiliopoulos, K., Eskandar, E. N., & Williams, Z. M. (2014). Encoding of rules by neurons in the human dorsolateral prefrontal cortex. Cerebral Cortex, 24(3), 807–816. doi:10.1093/cercor/bhs361
Michotte, A. (1963). The perception of causality. London: Methuen.
Miller, E. K., & Cohen, J. D. (2001). An integrative theory of prefrontal cortex function. Annual Review of Neuroscience, 24, 167–202. doi:10.1146/annurev.neuro.24.1.167
Miyake, A., Friedman, N. P., Emerson, M. J., Witzki, A. H., Howerter, A., & Wager, T. D. (2000). The unity and diversity of executive functions and their contributions to complex "frontal lobe" tasks: A latent variable analysis. Cognitive Psychology, 41(1), 49–100. doi:10.1006/cogp.1999.0734
Murphy, G. L., & Medin, D. L. (1985). The role of theories in conceptual coherence. Psychological Review, 92(3), 289–316.
Nitsche, M. A., Cohen, L. G., Wassermann, E. M., Priori, A., Lang, N., Antal, A., et al. (2008). Transcranial direct current stimulation: State of the art 2008. Brain Stimulation, 1(3), 206–223. doi:10.1016/j.brs.2008.06.004
O'Reilly, R. C. (2010). The what and how of prefrontal cortical organization. Trends in Neurosciences, 33(8), 355–361. doi:10.1016/j.tins.2010.05.002
Osherson, D., Perani, D., Cappa, S., Schnur, T., Grassi, F., & Fazio, F. (1998). Distinct brain loci in deductive versus probabilistic reasoning. Neuropsychologia, 36(4), 369–376. doi:10.1016/S0028-3932(97)00099-7
Parsons, L. M., & Osherson, D. (2001). New evidence for distinct right and left brain systems for deductive versus probabilistic reasoning. Cerebral Cortex, 11(10), 954–965. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/11549618
Patterson, R., & Barbey, A. K. (2012). A cognitive neuroscience framework for causal reasoning. In J. H. Grafman & F. Krueger (Eds.), The neural representation of belief systems (pp. 76–120). New York: Psychology Press.
Paus, T. (2005). Mapping brain maturation and cognitive development during adolescence. Trends in Cognitive Sciences, 9(2), 60–68. doi:10.1016/j.tics.2004.12.008
Pearl, J. (2000). Causality: Models, reasoning and inference. Cambridge: Cambridge University Press.
Poeppel, D., & Embick, D. (2005). The relation between linguistics and neuroscience. In A. Cutler (Ed.), Twenty-first century psycholinguistics: Four cornerstones.
Poldrack, R. A. (2006). Can cognitive processes be inferred from neuroimaging data? Trends in Cognitive Sciences, 10(2), 59–63. doi:10.1016/j.tics.2005.12.004
Ramnani, N., & Owen, A. M. (2004). Anterior prefrontal cortex: Insights into function from anatomy and neuroimaging. Nature Reviews Neuroscience, 5, 184–194. doi:10.1038/nrn1343
Ratcliff, G. (1979). Spatial thought, mental rotation and the right cerebral hemisphere. Neuropsychologia, 17, 49–54. doi:10.1016/0028-3932(79)90021-6
Rehder, B. (2003). A causal-model theory of conceptual representation and categorization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29(6), 1141–1159. doi:10.1037/0278-7393.29.6.1141
Ross, L. (1977). The intuitive psychologist and his shortcomings: Distortions in the attribution process. Advances in Experimental Social Psychology, 10, 173–220.
Satpute, A. B., Fenker, D. B., Waldmann, M. R., Tabibnia, G., Holyoak, K. J., & Lieberman, M. D. (2005). An fMRI study of causal judgments. The European Journal of Neuroscience, 22(5), 1233–1238. doi:10.1111/j.1460-9568.2005.04292.x
Schacter, D. L., Addis, D. R., & Buckner, R. L. (2007). Remembering the past to imagine the future: The prospective brain. Nature Reviews Neuroscience, 8, 657–661. doi:10.1080/08995600802554748
Sloman, S. A., Barbey, A. K., & Hotaling, J. M. (2009). A causal model theory of the meaning of cause, enable, and prevent. Cognitive Science, 33(1), 21–50. doi:10.1111/j.1551-6709.2008.01002.x
Sloman, S., & Lagnado, D. (2015). Causality in thought. Annual Review of Psychology, 66, 3.1–3.25. doi:10.1146/annurev-psych-010814-015135
Smith, D. V., Clithero, J. A., Rorden, C., & Karnath, H.-O. (2013). Decoding the anatomical network of spatial attention. Proceedings of the National Academy of Sciences of the United States of America, 110(4), 1518–1523. doi:10.1073/pnas.1210126110
Straube, B., & Chatterjee, A. (2010). Space and time in perceptual causality. Frontiers in Human Neuroscience, 4, 28. doi:10.3389/fnhum.2010.00028
Tenenbaum, J. B., & Griffiths, T. L. (2003). Theory-based causal inference. In S. Becker, S. Thrun, & K. Obermayer (Eds.), Advances in neural information processing systems 15: Proceedings of the 2002 conference (pp. 43–50).
Tenenbaum, J. B., Kemp, C., Griffiths, T. L., & Goodman, N. D. (2011). How to grow a mind: Statistics, structure, and abstraction. Science, 331, 1279–1285. doi:10.1126/science.1192788
Tettamanti, M., Manenti, R., Della Rosa, P. A., Falini, A., Perani, D., Cappa, S. F., et al. (2008). Negation in the brain: Modulating action representations. NeuroImage, 43, 358–367. doi:10.1016/j.neuroimage.2008.08.004
Ungerleider, L. G., & Mishkin, M. (1982). Two cortical visual systems. In D. J. Ingle, M. A. Goodale, & R. J. W. Mansfield (Eds.), Analysis of visual behavior (pp. 549–586). Cambridge, MA: MIT Press.
Vilares, I., Howard, J. D., Fernandes, H. L., Gottfried, J. A., & Kording, K. P. (2012). Differential representations of prior and likelihood uncertainty in the human brain. Current Biology, 22(18), 1641–1648. doi:10.1016/j.cub.2012.07.010
Vincent, J. L., Kahn, I., Snyder, A. Z., Raichle, M. E., & Buckner, R. L. (2008). Evidence for a frontoparietal control system revealed by intrinsic functional connectivity. Journal of Neurophysiology, 100(6), 3328–3342. doi:10.1152/jn.90355.2008
Wolff, P., & Barbey, A. K. (2015). Causal reasoning with forces. Frontiers in Human Neuroscience, 9, 1–21. doi:10.3389/fnhum.2015.00001
Wolff, P., Barbey, A. K., & Hausknecht, M. (2010). For want of a nail: How absences cause events. Journal of Experimental Psychology: General, 139(2), 191–221. doi:10.1037/a0018129
Woods, A. J., Hamilton, R. H., Kranjec, A., Minhaus, P., Bikson, M., Yu, J., et al. (2014). Space, time, and causality in the human brain. NeuroImage, 92, 285–297. doi:10.1016/j.neuroimage.2014.02.015

Joachim T. Operskalski
Decision Neuroscience Laboratory, Beckman Institute, University of Illinois Urbana-Champaign, Urbana, Illinois, USA

Aron K. Barbey
Decision Neuroscience Laboratory, Beckman Institute, University of Illinois Urbana-Champaign, Urbana, Illinois, USA


Visual Impressions of Causality

Visual Impressions of Causality   Peter A. White The Oxford Handbook of Causal Reasoning Edited by Michael R. Waldmann Print Publication Date: Jun 2017 Subject: Psychology, Cognitive Psychology Online Publication Date: May 2017 DOI: 10.1093/oxfordhb/9780199399550.013.17

Abstract and Keywords

A simple animation in which a moving object contacts a stationary one and the latter then moves off gives rise to a visual impression in which the first object makes the second one move. There are several other kinds of visual causal impressions; they are perceptual interpretations that go beyond, and disambiguate, the information in the stimulus. It has been argued that visual causal impressions result from the activation of innate perceptual structures or mechanisms. It has also been argued that visual causal impressions result from the activation of memorial information based on past experience. These past experiences may be of actions on objects or of other things acting on a passive actor. Whichever of these hypotheses turns out to be correct, the visual world as experienced is already interpreted in terms of causality before causal reasoning processes begin to operate.

Keywords: phenomenal causality, perception, visual causal impression, innate perceptual mechanisms, causal reasoning

Visual Impressions of Causality

Imagine that a small black disc (hereafter the “target”) is visible, stationary, in the middle of a computer screen, surrounded by a uniform white field. Another small black disc (the “mover,” meaning just the object that moves first) enters from the left of the screen and moves horizontally toward the target at constant speed. When the mover contacts the target, the mover stops moving and the target starts to move in the same direction and at the same speed. This stimulus is deliberately highly abstracted. The objects are simple two-dimensional geometrical forms with little clue to their identity, and there is no context apart from the uniform white field. The only kinds of information available to the eye are the shapes of the objects and their kinematics—motion information. It would be easy to hypothesize that only shapes and kinematics would be perceived. In fact, observers reliably report perceiving something more than that. They perceive the mover make the target move by bumping into it. This is a visual impression of causality, and it is the main topic of this chapter. There is no causality in the stimulus, which is a computer-generated animation comprising successive presentations of static frames. If it is going too far to call the impression an illusion, it is certainly no more than a construct of visual information processing.

I shall use the term “causal impression” to refer to the visual impression of causality. The causal impression is a product of automatic perceptual processes: we do not choose whether to have it or not, and no amount of reasoning or deliberation can alter it. Why should we perceive something that goes beyond the available visual information in such a particular way? How are these impressions affected by stimulus conditions? How do they originate? If we experience a world that has already been automatically perceptually interpreted in terms of physical causality, what does that imply for theories of causal learning? I shall try to address these questions, and in the process to show (p. 246) how important visual impressions of causality are to causal learning and causal reasoning in general.

Figure 14.1 Typical stimulus for the launching effect. In panel (a), the white disc (the target) is stationary in the middle of the frame and the black disc (the mover) is moving toward it. Panel (b) shows the mover approaching the target. Panel (c) shows the frame in which the mover contacts the target. At this point the mover stops moving and the target moves away at the same speed as, or a slightly lower speed than, the mover’s pre-contact speed. Panel (d) shows the target moving away from the mover after contact. Motion continues until the target exits the frame.
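To make the kinematics of these displays concrete, here is a minimal sketch of how a launching stimulus can be generated as a sequence of static frames. It is a toy illustration: the frame rate, speeds, and positions are assumptions of mine, not values from Michotte (1963) or from any particular later study.

```python
# Minimal sketch of a launching-effect animation: per-frame positions of a
# mover and a target along one horizontal axis. Illustrative values only.

FPS = 60                 # assumed display rate (frames per second)
SPEED = 19.0             # assumed speed of both objects, in cm/s
CONTACT_X = 10.0         # target's initial position; the mover starts at 0
END_X = 20.0             # position at which the target exits the frame

def launching_frames(fps=FPS, speed=SPEED):
    """Yield (time_s, mover_x, target_x) for a canonical launching stimulus:
    the mover approaches, stops at contact, and the target moves off at the
    same speed, with no delay and no gap."""
    dt = 1.0 / fps
    step = speed * dt
    mover_x, target_x, t = 0.0, CONTACT_X, 0.0
    while target_x < END_X:
        yield t, mover_x, target_x
        if mover_x < CONTACT_X:          # approach phase: only the mover moves
            mover_x = min(mover_x + step, CONTACT_X)
        else:                            # post-contact phase: only the target moves
            target_x += step
        t += dt

for t, m, tg in launching_frames():
    print(f"t={t:6.3f}s  mover={m:6.2f}  target={tg:6.2f}")
```

Treating the discs as points on a single horizontal axis keeps the sketch short; a real experiment would also specify disc sizes, colors, and the exact contact geometry.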

What Kinds of Causal Impression Are There?

Launching

The stimulus described in the first paragraph was first used in research by Michotte (1963). Michotte devised an ingenious apparatus that could present stimuli that appeared to show objects moving and coming into contact. More recent research (e.g., Schlottmann, Ray, Mitchell, & Demetriou, 2006) has used animations presented on a computer screen. Figure 14.1 depicts the kind of stimulus usually used. Michotte asked observers to describe what they saw, and most observers reported a causal impression, an impression that the mover made the target move by bumping into it. This has become known as the launching effect. It has been claimed that not all participants experience the launching effect when observing launching stimuli (Boyle, 1960; Beasley, 1968; Joynson, 1971; Schlottmann, 2000). However, more recent research using computer-generated stimuli has confirmed that a large majority of observers, if not all, do report the launching effect (Gordon, Day, & Stecher, 1990; Schlottmann et al., 2006; Scholl & Tremoulet, 2000). There is also evidence from brain scanning studies, and from a study using a visual adaptation method, that the visual system is involved in the processing of causally relevant features of visual stimuli (Blakemore, Fonlupt, Pachot-Clouard, Darmon, Boyer, Meltzoff, Segebarth, & Decety, 2003; Fonlupt, 2003; Fugelsang, Roser, Corballis, Gazzaniga, & Dunbar, 2005; Rolfs, Dambacher, & Cavanagh, 2013; Roser, Fugelsang, Dunbar, Corballis, & Gazzaniga, 2005).

The launching effect is a remarkable phenomenon for two main reasons. First, the stimulus is deliberately highly abstracted, as I have already described. Second, it is quite ambiguous in relation to real-world events. It could represent something like one billiard ball contacting another one and setting it in motion, or it could represent an event in which an object contacts an animate object, which then retreats of its own accord. Despite this ambiguity, the visual system does not treat the stimulus as ambiguous, but adopts a definite interpretation of it. This can be contrasted with the results of some experiments in which Michotte (1963) investigated qualitative causality. Qualitative causality refers to a sudden change in a property of an object such as shape or color, but without motion. In some experiments, the target changed color or disappeared; in others, a change in color or other property of an object was temporally associated with a sound; in others, both objects were motionless but changed color in rapid succession. A small proportion of (p. 247) observers reported a causal impression when the mover contacted the target and the target changed color while remaining motionless, but all other variations gave rise to no apparent causal impression. Observers notice and report the contingency between one event and another, but they do not experience the contingency as a causal relation.


Figure 14.2 Typical stimulus for the entraining effect. In panel (a), the white disc (the target) is stationary in the middle of the frame and the black disc (the mover) is moving toward it. Panel (b) shows the frame in which the mover contacts the target. At this point, both objects move with the same velocity toward the right side of the frame. Panel (c) shows a frame partway through this phase, with the objects remaining in contact. Motion continues until the objects exit the frame.

Entraining

Michotte (1963) also investigated an effect he called “entraining.” Figure 14.2 depicts a typical stimulus for entraining. It is similar to the stimulus for launching except that, after coming into contact, mover and target remain in contact and move together toward the right side of the screen. Observers usually report that they see the mover push or carry the target. This is qualitatively different from launching in that it is a temporally extended interaction: instead of a momentary collision that seems to launch the target, in entraining the mover is perceived as continuing to push the target for as long as their conjoined motion continues. This stimulus is also ambiguous: the target could be perceived as pulling the mover. But, again, the visual system adopts a definite interpretation of the stimulus as entraining.

Triggering

Michotte (1963) defined triggering as “the impression that one movement, which is otherwise clearly autonomous, depends on the appearance of a separate event which is its antecedent” (p. 57). Triggering occurs with a launching stimulus when the target moves faster than the mover (Michotte, 1963; Natsoulas, 1961). It is the impression that the target’s movement was initiated by contact from the mover without being actually produced by it. Michotte (1963) used a stimulus similar to the typical launching stimulus except that, instead of stopping at the point of contact, the mover returned to its starting point. Michotte reported that the launching effect did not occur with this stimulus, but the impression was that the contact triggered the subsequent movements of both objects. Michotte regarded triggering as “a weakened form of launching” (1963, p. 217).

Pulling

Michotte (1963) ran some experiments that gave rise to an impression that one object was pulling another. In one (experiment 56), the stimulus was similar to that for entraining, except that the mover passes the target and, as soon as it has done so, “the two objects move side by side at the same speed” (p. 160). Michotte reported an impression that the mover towed the target. He argued that this impression, which he called the “traction” impression, was a variety of entraining, differing only in that the causal object took up a position in front of the target instead of behind it. (p. 248)

Figure 14.3 Stimulus used in the study of the pulling impression by White and Milne (1997; adapted from Figure 1 in that paper). Panel (a) shows the first frame of the stimulus, with all objects stationary. The top object begins to move, followed shortly after by the second object. Panel (b) shows the two objects in motion, with arrows indicating direction of motion. The succeeding panels show each successive object starting to move in the same way. The final panel (f) shows a stage from later in the stimulus, which continues until all objects have exited the frame.

White and Milne (1997) presented stimuli in which five initially stationary objects began moving one after another. The setup is shown in Figure 14.3: panel (a) shows the first frame of the stimulus. After a short pause, the top object begins to move horizontally at constant speed, then the second object does the same, and so on until all five objects are moving horizontally at the same speed. The figure shows how events unfold. White and Milne (1997) found that all participants reported an impression of pulling with the Figure 14.3 stimulus. A version of the stimulus with just two objects does also yield an impression of pulling, but it is not as strong (White, 2010).

The typical stimulus for the pulling impression has some noteworthy features. The causal object never approaches any of the other objects, but instead moves away from them. There is a gap between the objects that initially gets larger: the objects never come into contact. There is no visible connection between the objects. Considering this, the stimulus seems to represent objects moving independently. Despite all these features, the visual system interprets the stimulus as presenting a causal relation. How can we perceive a causal relation when the objects perceived never come into contact? It seems to defy the laws of physics. I shall return to this question when I consider possible explanations for causal impressions later in this chapter.

Enforced Disintegration

White and Milne (1999) presented stimuli in which the mover contacted a target that was a square array of nine squares, whereupon the target fragmented into its nine component squares, which moved off in various directions at the same speed as, or a slightly lower speed than, the mover. Figure 14.4 depicts an example stimulus. It resembles a launching stimulus, except for the fragmentation of the target. The stimulus yields an impression that the authors called “enforced disintegration,” an impression that the mover smashes the target to pieces. (p. 249)

Figure 14.4 Stimulus used in the study of the enforced disintegration impression by White and Milne (1999). In panel (a), a square composed of nine smaller squares (the target) is stationary in the middle of the frame and the black disc (the mover) is moving toward it. Panel (b) shows the frame in which the mover contacts the target. At contact, the nine smaller squares move off in different directions while the mover continues to move more slowly toward the right side of the frame. Panels (c) and (d) show frames partway through this phase. Motion continues until all objects have exited the frame.


Figure 14.5 Typical stimulus for the penetration impression (White & Milne, 2003). Toward the right side of the frame is a black rectangle that remains stationary throughout. Panel (a) shows the first frame of the stimulus with a long, thin ellipse in motion toward the rectangle. Panel (b) shows the frame in which the ellipse contacts the rectangle. At this point the ellipse rapidly decelerates, coming to a halt in the position shown in panel (c).

Penetration

White and Milne (2003) presented stimuli in which a long, thin object moved toward a large black rectangle. On contacting the rectangle, the long, thin object decelerated to a halt. Figure 14.5 shows an example stimulus in which the long, thin object is almost completely occluded by the rectangle: Figure 14.5c shows the final frame, when the long, thin object had come to a halt. In other stimuli, the long, thin object came to a halt much more quickly, so that very little of it was occluded, or much more slowly, so that part of it appeared on the far side of the rectangle, or not at all, so that it continued to move out of the frame at the same speed. With rapid deceleration and little occlusion, participants reported an impression that the long, thin object pierced or penetrated the rectangle. As the rate of deceleration decreased, this impression (p. 250) was reported less and less, and was replaced by an impression that the long, thin object passed behind the rectangle. In this case, too, the stimulus is ambiguous. Because the stimuli are two-dimensional, there is no objective cue as to whether the long, thin object moves into or behind the rectangle, as Figure 14.5 illustrates. Yet the visual system adopts a definite interpretation of it, which changes depending on the observed motion of the long, thin object.

Comment

This is not an exhaustive catalog (see, e.g., Hubbard, 2013a; Michotte, 1963; White, 2005), and indeed other kinds of causal impressions may well be awaiting discovery. In addition, however, it is not clear exactly how a “kind” of causal impression can be defined. Michotte (1963) investigated a large number of effects and impressions that occurred when viewing stimuli presented in his long series of experiments, but he regarded all of those that could be called causal impressions as being variants on just two basic kinds, launching and entraining. This is unlikely to be the case, because he overlooked the possibility that objects can be deformed by contact (as in the enforced disintegration impression) as well as moved by it. But it is worth bearing in mind that there is no systematic or objectively justified classification of kinds of causal impression. Most research on phenomenal causality has been concerned with the launching effect (Hubbard, 2013a, 2013b; Scholl & Tremoulet, 2000), but there is no reason to think that the launching effect is in any way a more fundamental kind of visual causal impression than any other.

Stimulus Conditions Affecting the Occurrence of Visual Causal Impressions

Most research on the conditions affecting the occurrence of visual causal impressions has been carried out on the launching effect. What follows is a brief summary to provide some context for the later sections. More detailed accounts can be found in Hubbard (2013a, 2013b) and Scholl and Tremoulet (2000).

Delay

In the typical stimulus for the launching effect, the target starts to move as soon as the mover contacts it. This can be manipulated so that there is a delay between the mover contacting the target and the target starting to move. The launching effect is extremely sensitive to effects of delay: even with delays as short as 58 milliseconds, reports of launching do not occur on all trials, and with delays of 182 milliseconds or longer, the launching effect is never reported (Michotte, 1963; White, 2014a; see also Buehner, Chapter 28 in this volume).
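Because such stimuli are implemented as frame-by-frame animations, delays can only be realized as whole numbers of display frames. A small sketch, assuming a 60 Hz display (an assumption of mine, not a rate reported in the studies cited):

```python
# Convert a contact-to-motion delay in milliseconds into whole display frames.
# The 60 Hz rate and the example delays are illustrative assumptions only.

def delay_in_frames(delay_ms: float, refresh_hz: float = 60.0) -> int:
    """Number of whole frames that best approximates the requested delay."""
    return round(delay_ms * refresh_hz / 1000.0)

for ms in (0, 58, 182):
    frames = delay_in_frames(ms)
    print(f"{ms:3d} ms -> {frames} frames (~{frames * 1000.0 / 60.0:.1f} ms shown)")
```

On this assumed display a nominal 58 ms delay has to be rounded to three or four frames, so the delays actually shown are always multiples of the frame duration.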

Gap

In the typical launching stimulus, the mover contacts the target, but it is possible to make the mover stop before reaching the target, whereupon the target starts moving without delay. Michotte (1963) and Yela (1952) found that the launching effect persisted even with quite large gaps, so long as the objects moved reasonably quickly, but did tend to weaken as gap size increased. The launching effect can be strengthened by the presence of a stationary object filling the gap (Young & Falmier, 2008) and by the presence of lines resembling a tunnel or channel within which the objects move (Michotte, 1963).

Relative Direction

In the typical launching stimulus, the target moves in the same direction as the mover. Michotte (1963) investigated the effect of varying the direction of the target’s motion relative to that of the mover, and reported that the launching effect was considerably weakened with a difference in direction of 25°. However, the objects were rectangles, and the axis of motion of the mover passed through the geometric center of the target. White (2012a) used discs and manipulated the axis of motion of the mover as well as the direction of the target’s motion. An example stimulus is shown schematically in Figure 14.6.

The mover is the black disc and the target is the white disc. Figure 14.6a is the first frame of the stimulus, with the target stationary in the center of the frame and the mover in its starting position at the left edge of the frame. The mover moves horizontally at constant speed (18.9 cm/s) toward the target. Figure 14.6b shows the frame in which the mover contacts the target. In this case, the point of contact is at 60° around the circumference of the target from the usual point of contact. At that point, the mover stops moving and the target starts moving. In this case, the angle of motion for the target is 60° to the horizontal, and Figure 14.6c shows a later point in the target’s motion at that angle. Motion continues until the target has exited the frame. With on-center contact, the mean reported causal impression in White (2012a) was strongest when the angle of the target’s motion was 0°, and decreased as the angle increased. With off-center (p. 251) contact, the strongest reported causal impression occurred for the angle that corresponded to the degree of off-center contact: at a 20° angle with 20° off-center contact, at a 40° angle with 40° off-center contact, and so on. So the effect of direction of the target’s motion is tied to that of the axis of the mover’s motion.
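The close fit between the strongest impression and the off-center contact geometry is what elementary contact mechanics would lead one to expect. As a gloss of my own (not an analysis from White, 2012a), assuming smooth, rigid, frictionless discs:

```latex
% For frictionless rigid discs, the collision impulse J on the target acts
% along the unit normal n, the line joining the two centers at the moment of
% contact. A target of mass m_T that was at rest therefore leaves along n:
\[
  \mathbf{v}_T' \;=\; \frac{J}{m_T}\,\hat{\mathbf{n}},
  \qquad
  \hat{\mathbf{n}} \;=\; (\cos\theta,\; \sin\theta),
\]
% where theta is the angular offset of the contact point from head-on contact.
% The physically expected exit direction of the target is thus theta itself,
% the same angle at which the reported causal impression peaked.
```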

Figure 14.6 Variation on the launching stimulus used in research on effects of relative direction of motion (White, 2012a). Panel (a) is the first frame of the stimulus, with the target stationary in the center of the frame and the mover in its starting position at the left edge of the frame. Panel (b) shows the frame in which the mover contacts the target. In this case, the point of contact is at 60° around the circumference of the target from the usual point of contact. At that point, the mover stops moving and the target starts moving. In this case, the angle of motion for the target is 60° to the horizontal, and panel (c) shows a later point in the target’s motion at that angle. Motion continues until the target has exited the frame.

Relative Speed

Michotte (1963) claimed that the launching effect appeared most distinct when the speed of the mover was about four times that of the target. Natsoulas (1961) found that the launching effect was clearest when mover and target moved at the same speed or when the mover was just a little faster than the target. If the target moved perceptibly faster than the mover, both Michotte (1963) and Natsoulas (1961) found that the launching effect was replaced by the triggering effect. If the target moved much more slowly than the mover, Natsoulas (1961) found that the launching effect was replaced by an impression he called braking, in which the mover is perceived as producing the motion of the target, but the target’s motion seems to be held up or at least slower than expected.

Attention and Motion Context

Michotte (1963) found that the launching effect did not occur in presentation of a typical launching stimulus under a variety of conditions. If the objects were very small (1 mm in width), if the observer was far away, or if a sheet of ground glass was placed between observer and stimulus, the launching effect did not occur and was replaced by an impression of a single object in motion. In addition, if the observer fixated a point away from the point of contact of the objects, the impression again occurred that a single object passed over a stationary object, an impression that Scholl and Nakayama (2002) called “non-causal passing.” Scholl and Nakayama (2002) also found that the occurrence of the launching effect could be affected by the manipulation of a motion context. They presented stimuli similar to typical launching effect stimuli, except that the mover completely overlapped the target before the target started moving (full overlap stimulus). With this stimulus, some observers reported launching, and others reported non-causal passing. But if a typical launching stimulus was presented adjacent to and contemporaneous with the full overlap stimulus, the launching effect reliably occurred. Perception was affected by the motion context.

Choi and Scholl (2004) presented stimuli in which the target was the lowest object in a column of four similar objects (context objects). The stimuli were full overlap stimuli, but the launching effect was more likely to be reported if the three context objects moved in the same way as the target than if they remained stationary or moved in the opposite direction. This effect also occurred with a (p. 252) single context object, but was more likely to occur if the context object was close to the target than if it was more distant. Like Michotte (1963), Choi and Scholl found that the occurrence of the launching effect was affected by manipulations of fixation point. Using a partial overlap test stimulus, they found that the launching effect was less likely to be reported if observers fixated either a nearby non-causal full overlap stimulus or the space between the two stimuli. Choi and Scholl argued that similar motion properties led to perceptual grouping effects, and that the objects in a perceptual group were perceived as doing the same thing. Thus, if the motion context was single objects executing unbroken motion, the ambiguous stimulus was perceptually grouped with the context objects and thereby was perceived as a single object executing unbroken motion passing over another stationary object, and not as the launching effect. They argued further that this was an effect of visual attention. Grouping cues influence automatic attentional processes, so that attention is allocated to the perceptual group, and thereby becomes focused on the collective motion of the group. These perceptual grouping effects can also be influenced by voluntary allocation of attention.


Explanations for Visual Impressions of Causality

Why should any visual impression of causality occur at all? Why not just perceive the object properties and kinematics that are specified by the visual information entering the eye? There have been several attempts to account for causal impressions, but it has to be said that much more research will be necessary before we can have confidence in any of them. In this section I review the main contenders.

Ampliation of the Movement

Michotte (1963) distinguished between movement and mere displacement or change in position. In any case where a visual causal impression occurs, the motion of the target is perceived as a continuation of the movement of the mover, which is perceptually independent of the spatial displacement of the target. This transfer of the mover’s movement to the target is the ampliation of the movement, which Michotte (1963, p. 217) defined as follows: “ampliation of the movement is a process which consists in the dominant movement, that of the active object, appearing to extend itself onto the passive object, while remaining distinct from the change in position which the latter undergoes in its own right.” In the case of launching, ampliation does not continue indefinitely. Michotte found evidence for what he called a radius of action, a point beyond which the target’s motion was perceived as its own and no longer that of the mover transferred onto it. Within the radius of action there is phenomenal duplication, which refers to the fact that the target’s motion is seen as its own and not its own: it is its own insofar as it is displaced, but not its own insofar as the motion is seen as the mover’s motion displaced onto the target. Beyond the radius of action, phenomenal duplication no longer occurs, and the target is perceived as engaging in its own movement.

The key to ampliation is kinematic integration, which refers to the fact that the motion sequence has a unified nature despite the distinct identities of the objects involved. Kinematic integration occurs when the stimulus has Gestalt properties. In the case of the launching effect, kinematic integration is explained by the Gestalt principle of good continuation. Good continuation refers to the perpetuation of the motion properties of the mover in the target, which means that motion continues without a break in space or time, and without change in its properties. Thus, the launching effect is predicted to occur when the mover contacts the target and, without delay, the target starts moving with the same speed and direction as the mover. Violation of any of these conditions should result in weakening or absence of the launching effect.

Some of the findings on the conditions affecting the occurrence of the launching effect are consistent with the ampliation explanation. Most notably, good continuation requires no delay between the mover contacting the target and the target starting to move, and Michotte’s experiment described earlier shows that the launching effect is indeed very sensitive to temporal disruption. Michotte noted that good continuation should require no gap between the objects as well as no delay, and that his own experiments (as well as those of Yela, 1952) showed that the launching effect could occur even in the presence of substantial gaps. Michotte argued that good continuation could occur despite the presence of a gap, and supported this argument by reference to other research showing Gestalt phenomena in which spatially separated objects are perceptually grouped and perceived as a single whole.

Other findings are more problematic, however. The occurrence of good continuation should not depend on the absolute speed of the objects (except (p. 253) where motion is so slow as to be imperceptible, or so fast that the object crosses the screen in too few frames for motion to be detected). Michotte (1963) varied the absolute speeds of both objects from 0.4 centimeters/second (cm/s) to 110 cm/s. He found that “[t]he most perfect impression of launching is given with speeds between 20 and 40 cm. per second” (p. 107). At the highest speed of 100 cm/s, he reported that the launching effect did not occur. At speeds below 20 cm/s, he reported that the impression was “slight and lacking in vigour” (p. 107), and at even lower speeds the launching effect essentially disappeared. The conditions for good continuation are met at all of these speeds, so the ampliation hypothesis does not account for the observed variations in occurrence of the launching effect.

The occurrence of good continuation depends on the target’s motion being in the same direction as that of the mover. Michotte (1963) reported findings consistent with this. However, as discussed earlier, White (2012a) found that the effect of direction depended on the axis of motion of the mover relative to the geometric center of the target. With off-center contacts, reported causal impressions were stronger when the direction of the target’s motion was objectively commensurate with the point at which it was contacted than when the target moved in the same direction as the mover. This disconfirms the prediction based on the ampliation hypothesis.

In the case of entraining, kinematic integration is explained by the Gestalt principle of common fate. This just means that the impression depends on the two objects sharing the same motion properties after coming into contact. Given Michotte’s argument that the presence of a gap between the two objects (i.e., between the point at which the mover stops moving and the initial location of the target) does not preclude good continuation, it seems likely that the presence of a gap would not preclude common fate. Thus, the pulling impression could be a product of kinematic integration because of common fate, even though the objects involved never come into contact, because they share the same motion properties once the target starts moving. The gap between the objects in the pulling impression, therefore, is not fatal to this application of the ampliation hypothesis.

A critical feature of the ampliation hypothesis is that the object that is perceived as causal is the one that moves first. Michotte (1963) emphasized this point repeatedly. For example, he stated that ampliation meant “the creating or establishing of the extension on to a second object of the already existing movement of a first object, in such a way that this movement brings about the displacement of the second object” (p. 143). This proposition holds for both launching and entraining. In fact, Michotte (1963) wrote, “There are two sorts of extension—extension by prolongation and extension by fusion. There are no other alternatives. It follows that it is impossible in principle that there should be any types of causal impression other than launching and entraining” (p. 218).

Given the forcefulness with which Michotte put the matter, any demonstration of a causal impression in which the object perceived as causal was not the one that moved first would have strong disconfirmatory import for the ampliation hypothesis. Such a demonstration has now been reported. White (2012b) presented a stimulus in which a black square moved toward a stationary white square, as in a typical launching stimulus. When the black square contacted the white square, both objects moved back in the direction from which the black square had come. If the two objects remained in contact during this second stage of motion, the predominant impression was that the black square was pulling the white square, though this depended on the speed with which the objects moved. If the black square moved at constant speed but the white square rapidly came to a halt, the impression reported by almost all participants was that the white square pushed or shoved the black square. In other words, the white square was perceived as causing the return motion of the black square. A schematic representation of this stimulus is shown in Figure 14.7, which gives no real idea of the dramatic and strong impression that occurs when the stimulus is viewed.

The study by White (2012b) showed that the kind of causal impression that occurred, and the identity of the causal object, depended not on which object moved first, but on whether the objects remained in contact or not after the black square contacted the white square. This shows that, in some stimuli at least, the kind of causal impression that occurs is not fixed until it is clear whether the two objects are remaining in contact or not after contact has occurred. The causal impression is not fixed when a collision occurs; it is fixed only when the aftermath of the collision can be seen (Choi & Scholl, 2006). The main point for this section is that this finding disconfirms the ampliation hypothesis. At present, therefore, it appears unlikely that the ampliation hypothesis can be correct. (p. 254)


Figure 14.7 Stimulus used in a study by White (2012b). In panel (a), the white square is stationary toward the right side of the frame and the black square is moving toward it as in a launching stimulus. Panel (b) shows the frame in which the black square contacts the white square. At this point, both objects move toward the left side of the frame, initially at the same speed. The black square moves at constant speed but the white square rapidly slows to a halt. Panel (c) shows a frame shortly after the commencement of leftward motion, with the white square decelerating. Panel (d) shows a frame from a later point, where the white square has stopped moving. The black square continues to move until it has exited the frame.
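To make the motion profile in Figure 14.7 concrete, here is a sketch of the post-contact phase in the same style as the earlier launching sketch. The common speed and the deceleration constant are illustrative assumptions, not values reported by White (2012b).

```python
# Sketch of the post-contact phase in the White (2012b) stimulus: both squares
# move left, the black square at constant speed while the white square
# decelerates rapidly to a halt. All numerical values are illustrative.

FPS = 60
SPEED = 19.0        # common leftward speed at the moment of reversal (cm/s)
DECEL = 150.0       # assumed deceleration of the white square (cm/s^2)

def reversal_frames(duration_s=1.0, fps=FPS):
    """Yield (time_s, black_x, white_x) after contact at x = 0."""
    dt = 1.0 / fps
    black_x = white_x = 0.0
    white_v = SPEED
    t = 0.0
    while t < duration_s:
        yield t, black_x, white_x
        black_x -= SPEED * dt                    # constant leftward motion
        white_v = max(0.0, white_v - DECEL * dt) # decelerate, never below zero
        white_x -= white_v * dt
        t += dt

for t, bx, wx in reversal_frames():
    print(f"t={t:5.3f}s  black={bx:7.2f}  white={wx:7.2f}")
```

With these assumed values the white square halts in roughly an eighth of a second while the black square keeps moving, which is the pattern that almost all participants described as the white square pushing or shoving the black square away.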

Innate Structures or Modules

Michotte (1963) claimed that the perceptual structures for the launching and entraining effects were innate. The main argument offered in support of this claim was that the two effects had some properties that Michotte called paradoxical, meaning that they were inconsistent with the possibilities afforded by the laws of physics. I shall discuss three examples of supposedly paradoxical impressions.

One supposedly paradoxical case concerns the relative speeds of the objects. Michotte found that the launching effect disappeared and was replaced by the triggering impression when the target’s speed was greater than that of the mover, and he noted that the causal impression was better when the efficacy of the cause, in terms of the outcome for the target, was less: that is, there was a relatively strong causal impression when the target moved more slowly than the mover and a relatively weak one when it moved faster, despite the fact that the latter implied greater force exerted on the target. As I discussed earlier, the launching effect also weakens when the speeds of both objects become very fast, yet, other things being equal, greater speed implies greater force. Michotte regarded this dissociation between the causal impression and the implied objective force as paradoxical and as supporting the case that “the causal impression is independent of acquired knowledge of the movement of bodies” (p. 109).

The main reply to this has been that Michotte’s knowledge of the movement of bodies was not entirely correct. In the typical launching stimulus, the mover stops at the point of impact. Runeson (1983) pointed out that this would only happen in reality if the target was at least as massive as the mover. If the target was less massive than the mover, then the mover would continue to move with reduced speed after contact. If the target is at least as massive as the mover, then it cannot possibly move faster than the mover, because that would violate the law of conservation of momentum. In that case, absence of a causal impression with stimuli in which the target moves faster than the mover is therefore consistent with the physical possibilities, because this would only happen in reality if some additional force (such as an internal motor in the target) was operating on the target.

The second supposedly paradoxical case concerns a stimulus in which both objects were moving before contact. The mover moves faster, so it catches up with the target and contacts it, whereupon the target slows down. The mover either stops (launching) or continues at the same speed as the target (entraining). In both cases, Michotte reported that a causal impression occurred. He argued that this was paradoxical on the grounds that a moving object (p. 255) could not be slowed down by a forceful impact in the direction in which it was moving, so the occurrence of a causal impression with this physically impossible stimulus is evidence that the causal impression is not learned. The problem with this case is that, in fact, moving objects can be slowed down by collisions. For example, one car crashes into the back of another moving car and the latter slows down because of the damage done in the impact (Runeson, 1983). Any impact that either deforms an object or damages its internal motor is likely to slow it down, so there are certainly opportunities for the visual system to acquire an impression of causality from such experiences.

The third case concerns stimuli in which there is a gap between the target and the location at which the mover stops moving. Michotte (1963) and Yela (1952) both found that the causal impression could occur, even with substantial and easily perceptible gaps. Yela (1952) argued that this finding was contradictory to everyday experience and therefore supported the hypothesis that the launching effect was not learned: “Everybody knows perfectly well that it is not sufficient to stretch one’s arm in the direction of a ball to put it in motion, or to hammer in the air in order to drive a nail” (p. 152). To this one could add that the pulling impression (White, 2010; White & Milne, 1997) could not be acquired from experience, because it is impossible for one object to pull another if there is no physical connection between them. The main reply to these arguments is that the gap between the objects is perceptually interpreted as a medium through which causal influence can be transmitted. I shall return to that issue in the section on the actions on objects hypothesis.

Others have argued that causal impressions occur because of the operation of an innate visual mechanism, but on different grounds.
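Before turning to those other grounds, the momentum argument in Runeson’s reply can be spelled out. What follows is my own worked sketch of the textbook one-dimensional collision algebra, offered as a gloss on the reasoning above rather than as anything taken from Runeson (1983) or from the studies discussed here.

```latex
% One-dimensional collision: a mover of mass m_M arrives at speed v; the
% target of mass m_T is at rest; e (0 <= e <= 1) is the coefficient of
% restitution. Momentum conservation and the restitution relation
% v_T' - v_M' = e v give the post-contact speeds:
\[
  v_M' \;=\; \frac{m_M - e\,m_T}{m_M + m_T}\,v,
  \qquad
  v_T' \;=\; \frac{(1+e)\,m_M}{m_M + m_T}\,v .
\]
% The mover halts (v_M' = 0) only if m_T = m_M / e, so the target must be at
% least as massive as the mover; substituting back gives v_T' = e v, which
% can never exceed v. A target that departs faster than the mover arrived
% therefore requires an extra force, such as an internal motor.
```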
Leslie and Keeble (1987) and Scholl and

Tremoulet (2000) argued that there is an innate module for the launching effect: the module is brought into operation by definable stimulus conditions, and the causal impression occurs when it operates. Modules have three distinctive characteristics. They are encapsulated, which means that external processes and knowledge do not influence their operation: for example, explicit knowledge that one object cannot cause the motion of another object if they do not come into contact does not have any effect on the occurrence of the launching effect with gap stimuli. Second, modules are opaque, meaning that the products of modular processing may be available to other processes, but the operation of the module is not accessible. Third, modules operate automatically, which means that they are rapid and not bound by the limited capacity of working memory (Moors & De Houwer, 2006). All three characteristics can be discerned in the launching effect. Acquired knowledge, for example of the laws of physics, does not affect the occurrence of the launching effect. Some authors have claimed that the occurrence of the launching effect can be modified by experience, such as prior experience with similar stimuli, or just overexposure to typical launching stimuli (Falmier & Young, 2008; Powesland, 1959; Schlottmann, 2000; Schlottmann & Anderson, 1993; Schlottmann et al., 2006) but, as Scholl and Tremoulet (2000) argued, these findings can be interpreted as response biases, in other words, as effects on how people make overt responses about what they perceive, rather than effects on the perceptual impressions themselves. Effects of fatigue and attention may also be involved (Choi & Scholl, 2004).

Scholl and Tremoulet (2000) pointed out that a module need not be innate. Indeed, acquired automatic perceptual processes can have all the characteristics that Scholl and Tremoulet attributed to modules. Nevertheless, if there is an innate module for the launching effect, one would expect it to be operative early in infancy, and a good deal of developmental research has been carried out to test this prediction.

Young infants cannot report their perceptual impressions. Infant researchers have used methods such as dishabituation methods, in which infants are first habituated to one stimulus and then are presented with another to see if they respond with surprise, usually measured by recording how long the infant looks at the stimulus. Surprise would indicate that they perceive the new stimulus as different from the one to which they were habituated. Several studies found evidence consistent with the hypothesis that infants as young as 6 months have visual causal impressions (Leslie, 1982; Leslie & Keeble, 1987; Newman, Choi, Wynn, & Scholl, 2008; Muentener & Bonawitz, Chapter 33 in this volume). The key to a valid test lies in separating the causal impression from all other properties of the stimuli to which infants might respond, such as spatiotemporal and kinematic properties. Leslie and Keeble (1987) achieved this by a simple reversal of a launching stimulus. They argued that spatiotemporal and kinematic properties (except for the reversal of direction) were all preserved in both the forward and reverse presentations, but the causal impression (p. 256) would be reversed because the object perceived as causal in one version would be perceived as the target in the reverse version, and the object perceived as the target in the former would be perceived as causal in the latter. They found dishabituation to the reversal stimulus, and argued that this result supported the module hypothesis.

Desrochers (1999) found no evidence for causal impressions in infants aged between 3 and 4 months. Other studies have found evidence for developmental trends (Cohen & Amsel, 1998; Cohen, Amsel, Redford, & Casasola, 1998; Oakes, 1994; Oakes & Cohen, 1990). Cohen and Amsel (1998) found evidence consistent with the occurrence of a causal impression in infants aged 6.25 months but not in younger infants. Infants at 5.5 months seemed to be sensitive to spatial and temporal contiguity but not to causality, and infants at 4 months seemed to be sensitive to whether motion was continuous or discontinuous, but again not to causality.

These developmental trends are consistent with the hypothesis that causal impressions owe something to experience (Saxe & Carey, 2006). However, as Newman et al. (2008) pointed out, they are also consistent with the hypothesis of an innate module that undergoes maturation during development. The only way to be sure of the existence of an innate module is to find evidence for it operating at birth, before significant learning could have occurred. Mascalzoni, Regolin, Vallortigara, and Simion (2013) reported evidence for visual causal impressions in infants aged only 2 days. If this is valid, then the case for some form of innate visual causal perception would be confirmed beyond reasonable doubt, because there is virtually no chance of suitable and adequate learning occurring in the first 2 days of life. The evidence is not yet conclusive, however: to cut a long critical analysis short, there are additional controls that need to be run to rule out alternative possible interpretations. If there is innate visual perception of causality, that does not rule out the possibility of further development, through either maturation or learning, after birth (Newman et al., 2008; Muentener & Bonawitz, Chapter 33 in this volume).

There is one more conundrum for the hypothesis of an innate module. Why should it operate in just the way that it does? If we do not have direct perception of something that is out there in reality, how can a module that gives us causality as a perceptual construct be acquired? We should just perceive static object properties such as shape, and kinematic properties. How can our visual system go beyond that at all, and how does it end up just where it does? Bear in mind that both the launching and pulling impressions can occur with moving objects that do not come into contact, which contradicts the laws of mechanics. And causal impressions occur when no causality is there, as in the animated stimulus presentations used in research. So, whatever we have and however it is acquired, it does not seem to be very accurate. If it were, physics would probably be a lot easier to learn (diSessa, 1982, 1993). There are many possible modules that could generate something that goes beyond mere kinematics in a way that is not very accurate, so why do we have just the ones that we do have? We do not yet have the answers to these questions, but in the remainder of the chapter I hope to go a little further down the road toward them.

Visual Impressions of Causality target in the former would be perceived as causal in the latter. They found dishabituation to the reversal stimulus, and argued that this result supported the module hypothesis. Desrochers (1999) found no evidence for causal impressions in infants aged between 3 and 4 months. Other studies have found evidence for developmental trends (Cohen & Am­ sel, 1998; Cohen, Amsel, Redford, & Casasola, 1998; Oakes, 1994; Oakes & Cohen, 1990). Cohen and Amsel (1998) found evidence consistent with the occurrence of a causal im­ pression in infants aged 6.25 months but not in younger infants. Infants at 5.5 months seemed to be sensitive to spatial and temporal contiguity but not to causality, and infants at 4 months seemed to be sensitive to whether motion was continuous or discontinuous, but again not to causality. These developmental trends are consistent with the hypothesis that causal impressions owe something to experience (Saxe & Carey, 2006). However, as Newman et al. (2008) pointed out, they are also consistent with the hypothesis of an innate module that under­ goes maturation during development. The only way to be sure of the existence of an in­ nate module is to find evidence for it operating at birth, before significant learning could have occurred. Mascalzoni, Regolin, Vallortigara, and Simion (2013) reported evidence for visual causal impressions in infants aged only 2 days. If this is valid, then the case for some form of innate visual causal perception would be confirmed beyond reasonable doubt, because there is virtually no chance of suitable and adequate learning occurring in the first 2 days of life. The evidence is not yet conclusive, however: to cut a long critical analysis short, there are additional controls that need to be run to rule out alternative possible interpretations. If there is innate visual perception of causality, that does not rule out the possibility of further development, through either maturation or learning, after birth (Newman et al., 2008; Muentener & Bonawitz, Chapter 33 in this volume). There is one more conundrum for the hypothesis of an innate module. Why should it oper­ ate in just the way that it does? If we do not have direct perception of something that is out there in reality, how can a module that gives us causality as a perceptual construct be acquired? We should just perceive static object properties such as shape, and kinematic properties. How can our visual system go beyond that at all, and how does it end up just where it does? Bear in mind that both the launching and pulling impressions can occur with moving objects that do not come into contact, which contradicts the laws of mechan­ ics. And causal impressions occur when no causality is there, as in the animated stimulus presentations used in research. So, whatever we have and however it is acquired, it does not seem to be very accurate. If it were, physics would probably be a lot easier to learn (diSessa, 1982, 1993). There are many possible modules that could generate something that goes beyond mere kinematics in a way that is not very accurate, so why do we have just the ones that we do have? We do not yet have the answers to these questions, but in the remainder of the chapter I hope to go a little further down the road toward them.

Page 17 of 32

Visual Impressions of Causality

The Actions on Objects Hypothesis There is one kind of causation with which all of us are very familiar, and that is acting on objects to move them, alter their properties, or manipulate them. I have proposed that ex­ periences of acting on objects yield a kind of knowledge that can be used to interpret vi­ sually perceived events (White, 2009, 2012c, 2014b). When someone acts on an object, let us say pushing it away from her, two important kinds of information about that action are available. One is an internal model of the action itself and its anticipated sensory consequences (Blakemore, 2003; Blakemore, Frith, & Wolpert, 1999; Blakemore, Wolpert, & Frith, 2002; Desmurget & Sirigu, 2009; Frith, 2002; Hag­ gard, 2005; Wolpert & Flanagan, 2001; Wolpert, Ghahramani, & Jordan, 1995). The model of the action is a copy of the planned motor commands, constructed slightly in advance of the actual execution of the action. Since any action involves exchange of forces with something in the environment, the model specifies forces exerted in the action and the anticipated forces that would be exerted by the object being acted on. To illustrate, if we go to pick up an object resting on a table, we have to exert sufficient grip force that the object will not slip out of our hands, and this means coping with changes in the load as the object is manipulated. Flanagan and Wing (1997) showed that grip force is modified in anticipation of the changing inertial load of the object. We do receive sensory feedback about the object, but processing time for this feedback is too long to account for the re­ sults of their study. This is evidence that grip force is determined by an internal model that predicts the inertial loads that will be encountered in lifting. The other kind of information that is available is sensory information about the ac­ tual execution of the action. This can be compared with an internal model of the anticipat­ (p. 257)

ed sensory consequences of the action, which is useful for error correction (e.g., Fourneret & Jeannerod, 1998). Although visual information is used for this purpose, the main sensory channel connected with actions on objects is the mechanoreceptor system (Gibson, 1966; White, 2009, 2012c). There are two main components to the mechanore­ ceptor system. One is skin pressure sensors, which register deformation of the skin due to contact forces. The other is that part of the proprioceptive system that registers limb position and movement, generally termed kinaesthesis. In a recent review of research on proprioception, Proske and Gandevia (2012) concluded that kinaesthesis is mediated by sensors in joints and muscle spindles, with the muscle spindle receptors playing the prin­ cipal role. Through the mechanoreceptor system, the brain receives sensory input about exertion, changes in limb position, and resistance exerted by objects acted on. The combination of the motor control model and the mechanoreceptor system means that we have a great deal of information about our own actions and their consequences. Of particular relevance in the present context, we have abundant information about the forces we exert on objects, and how those change from moment to moment, and about the forces that objects exert on us. We experience ourselves as doing things to objects—in other words, as causing outcomes for objects acted on—because of the availability of in­ formation about both output (motor commands) and feedback (information from the Page 18 of 32

Visual Impressions of Causality mechanoreceptor system). Experiences of actions on objects that incorporate that infor­ mation are stored in long-term memory, subject to forgetting. The hypothesis is that those stored experiences are available to automatic perceptual processing, and are the source of visual impressions of causality (White, 2009, 2012c). When we observe an interaction between objects, such as a moving billiard ball contact­ ing a stationary one and the stationary one then moving away, the visual system receives and processes information about the shapes and other visible properties of the objects, and kinematic information. The kind of information that the motor control model and the mechanoreceptor system provide about exertion and forces is missing. That kind of infor­ mation relates primarily to kinematics: there are law-like relations between kinematics and forces, and incidental properties of objects such as color do not partake in those rela­ tions. So the kinematic information in the visual input is matched to stored representa­ tions of actions on objects that have similar kinematic properties. The information about forces in the matched representations is activated and thereby forms part of the percep­ tual interpretation of the visually perceived event. That is, in part, the causal impression. Visual impressions of forces are also generated by that process (White, 2009), but the causal impression is specifically the impression of efficacious action, incorporating infor­ mation about motor output and mechanoreceptor feedback. The idea that perceptual interpretation of interactions between objects involves filling in information that is not available in visual input can account for some of the conundrums of causal impressions discussed earlier. Yela (1952), commenting on the occurrence of causal impressions with gaps between the location of the target and the stopping point of the mover, argued that it was impossible for an object to produce an outcome in an object with which it did not come into contact. But it is possible for force to be transmitted through an intervening medium. In the executive toy known as Newton’s cradle, one ob­ ject makes another one move despite never contacting it because force is transmitted through a series of intermediate objects. The kinematics of the gap stimulus match ac­ tions in which objects are moved by acting on physical intermediaries; therefore the per­ ceptual interpretation involves filling in the gap with an unseen intermediary through which force is transmitted. The pulling impression (White, 2010; White & Milne, 1997) oc­ curs despite the lack of visible connection between the objects. The kinematics of the stimulus match those of real pulling events in which there are physical connections be­ tween the objects (such as string or chain), and so the perceptual interpretation involves filling in the gap with an unseen physical connection through which pulling is effected. Thus, the causal impression in the launching effect occurs because the kinematic proper­ ties of the stimulus are matched to a stored representation of an action on an object. If no such match can be made, then no causal impression occurs. In principle, this enables pre­ dictions to be generated concerning when causal impressions should and should not oc­ cur. In practice, it is difficult to do this because the range of actions on objects that can give rise to stored experiences has to be ascertained empirically. 
Although actions on ob­ jects conform to (p. 258) the laws of mechanics, they are not classical momentary colli­ sions. Usually, in an action on an object, contact occurs for an extended period of time, Page 19 of 32

Visual Impressions of Causality and this enables actions that are in effect manipulative, producing a wide range of possi­ ble outcomes. What can be predicted, however, is that causal impressions should have ac­ tion-like qualities that can be detected. I shall discuss three examples. The first concerns the stimuli used in the study by White (2012b), shown in Figure 14.7. The black object contacts the initially stationary white object; both then move back in the direction from which the black object came, but the white object rapidly decelerates to a standstill. If one imagines the objects instantiated as inanimate billiard balls, it is immedi­ ately obvious that the motion of the objects after contact cannot occur with such objects. The kinematic features of the stimulus do, however, match a particular kind of action, which might be more properly described as a reaction: for example, something contacts one’s skin and one moves to dislodge it or push it away. So the causality and forces speci­ fied by the stored representations of such reaction events become part of the perceptual interpretation of the stimulus. The second example concerns how the motion of the mover in launching and entraining stimuli is perceived. Michotte (1963) noted that the mover is perceived as approaching the target, as if making for it deliberately. This can be contrasted with a stimulus in which the mover did not move but lengthened horizontally in both directions until its leading edge contacted the target. Observers tended to report that the contact of the mover with the target was accidental, and lacked the phenomenal quality of approaching (Michotte, 1963). Considered objectively, the mover in launching and entraining stimuli could move in any direction along any plane relative to the target. The fact that it moves toward the target in a plane of motion that brings it into contact with the target is a considerable co­ incidence. The visual system does not like coincidences and tends to interpret them as connected in some way (Choi & Scholl, 2004). In this case, the interpretation is that the mover’s motion toward the target is not accidental, in other words, it is making for the target. This is an indication that the mover is perceived as engaged in an action involving approach and contact. There is no indication that the mover is identified as an animate being with mental states, like a human being, only that it has the capability of directing its movement toward a target. This is very limited animacy, but it is also very different from inanimate motion. The behavior of the mover is therefore perceived as action, as the actions on objects hypothesis would predict. The third example concerns entraining. If the objects involved in an entraining stimulus are inanimate, the stimulus represents a physical impossibility. In the typical stimulus, the two objects move after contact with the same speed that the mover had prior to con­ tact. This could happen only if the target had negligible mass relative to the mover, none of the energy in the collision was dissipated in either deformation of the objects or heat, and no frictional forces were operating. The closest approach to an entraining stimulus involving objects of comparable size would be a billiard ball rolling down an incline (so that gravitational acceleration would counter frictional slowing) into a feather. In that case, it is likely that the momentary deceleration of the ball at contact would be too small to be perceived. 
That is, however, a very rare and unusual occurrence, and in the typical stimulus for entraining the objects move horizontally, in which case reduction in velocity Page 20 of 32

Visual Impressions of Causality due to frictional forces should be perceptible (Rosenbaum, 1975). Because of this, the mover in the entraining stimulus can only be perceived as an object with an internal mo­ tor that maintains its velocity at a constant value. An animate being is a plausible candi­ date for such an object. It is very likely, therefore, that the stimuli for launching and entraining, and for any kind of causal impression in which the mover initially moves toward the target, are perceived as actions on objects. This does not mean that the mover is ascribed all sorts of mental properties that humans are believed to have, only that its perceived behavior is perceptu­ ally interpreted as action-like and directed at the mover. This is consistent with the ac­ tions on objects hypothesis. One important implication of the actions on objects hypothesis is that visual causal im­ pressions are perceptual interpretations constructed by the use of existing information. As such, they are akin to many phenomena of perception where incomplete or ambiguous stimuli are perceptually completed by the activation of pre-existing information. Such processes of perceptual interpretation have some of the characteristics of modules: they are encapsulated, opaque, and automatic. But they are subject to learning, and they are not divorced from other processes. If this is the case for visual causal impressions, it should be possible to find evidence that visual causal impressions influence other auto­ matic perceptual interpretation. A study by (p. 259) Kim, Feldman, and Singh (2013) has found exactly that. In brief, they set up a stimulus with two spatially separated objects flashing on and off out of phase in a manner that would give rise to an interpretation in terms of apparent motion—that is, a single object moving back and forth between the two visible objects. Kim et al. found that the perceived trajectory of apparent motion depend­ ed on the direction of motion of two targets, one adjacent to each of the flashing objects. These targets would move either horizontally or vertically. The apparent motion between the flashing objects was perceived in a trajectory suitable for producing whichever direc­ tion of motion of the targets was observed. This result is hard to explain under the innate module hypothesis, because the trajectory of the mover does not form part of the percep­ tual interpretation of the stimulus at all until after the visual impression of causality oc­ curs. This means that the motion cues thought to be necessary to trigger the causality module are absent. Instead, it fits with the hypothesis that perceptual interpretation in terms of causality is integrated with other perceptual interpretation—just part of what goes on in perception, influenced by and influencing other parts of it.

Forces Applied to the Skin

Wolff and Shepard (2013) have proposed that humans pick up an understanding of forces in interactions between objects through the skin, specifically through perception of other objects acting on the skin (see also Johnson & Ahn, Chapter 8 in this volume). They argued, first, that both the strength and direction of force applied to the skin can be accurately judged (Jones & Piateski, 2006; Panarese & Edin, 2011; Wheat, Salo, & Goodwin, 2004) and that information about forces is stored and used in subsequent actions (e.g., Davidson & Wolpert, 2004). They discussed evidence that forces are automatically judged not only when applied directly to the judge's body, but also when the judge observes objects being applied to the bodies of other people (e.g., Keysers, Wicker, Gazzola, Anton, Fogassi, & Gallese, 2004).

Having made a case for experience, storage, and use of force information, Wolff and Shepard then argued that visual impressions of causality are mediated by perception of forces; that is, observation of a launching stimulus gives rise to perceptual interpretation in terms of forces, and that leads to the causal impression. Given the accuracy with which contact forces are detected, this argument requires two things. One is that the causal impressions that occur in viewing stimuli such as launching stimuli should be reasonably accurate representations of the objective forces. The other is that causal impressions should closely resemble perceived forces. For example, both should be influenced in similar ways by the manipulation of stimulus properties.

Wolff and Shepard analyzed the results of experiments on causal impressions and found good correspondence with objective forces. For example, commenting on the study by White (2012b), in which direction and point of contact were manipulated, they noted that the causal impression was strongest for combinations of direction and point of contact that corresponded most closely to what would actually occur with spherical objects. They also pointed out that the strong effect of short delays on the causal impression (Michotte, 1963) is consistent with the general truth about mechanical interactions that forces operate instantaneously at contact. And, as I discussed earlier, the cases that Michotte (1963) regarded as paradoxical are not in fact inconsistent with the physical possibilities allowed by the laws of mechanics. At present, therefore, there is no evidence of inconsistency between variation in occurrence of the causal impression and objective analyses of forces.

The relation between perceived forces and causal impressions is more problematic. Forces are perceived, and occur, in many interactions where no causal impression occurs. This issue will be addressed more extensively in the next section, but let me illustrate the problem with one example. Michotte (1963) studied stimuli in which an object moved to contact another object and then reversed direction, while the stationary object remained stationary. With this kind of stimulus, people do perceive forces operating (White, 2009), especially if the moving object fragments when it reverses (White & Milne, 1999). However, no causal impression occurs (Michotte, 1963; White, 2014b). In addition, there is now evidence that visual impressions of force and visual impressions of causality are influenced in different ways by manipulations of stimulus properties (White, 2014a). For example, White (2014a) replicated the study manipulating point of contact and direction in the launching effect, but asked participants to report impressions of force. The results were completely different. The direction of the target's motion had an effect on the force impression that was opposite to that found for the causal impression. Most important, while the causal impression is roughly attuned to objective (p. 260) relations between point of contact and direction (White, 2012b), the force impression was not.

At present, therefore, the hypothesis that causal impressions depend on perceived forces is not strongly supported. Causal impressions do not occur in some interactions where forces are both perceived and believed to be operating, and effects of manipulations of stimulus properties differ considerably between force impressions and causal impressions.

Wolff and Shepard (2013) ran a series of experiments in which they repeatedly presented launching stimuli and then applied very weak forces to the participant's skin. They showed that the threshold for detection of these contact forces was reduced by exposure to launching stimuli. The effect was not found with various control stimuli, so the results support the contention that participants have perceptual impressions of forces when observing launching stimuli, and that these impressions have a kind of priming effect on detection of contact forces. This demonstrates an important link between visual perception and proprioceptive detection of forces. Whether proprioceptive detection of contact forces can account for visual impressions of causality, as opposed to visual impressions of forces, depends on further clarification of the relationship between visual impressions of causality and of forces.

Causality and the Laws of Physics

Suppes (1970) claimed that "discussions of causality are now very much a part of contemporary physics" (p. 6). Although he did not carry out a statistical survey, he reported that "[t]here is scarcely an issue of Physical Review that does not contain at least one article using either 'cause' or 'causality' in its title" (p. 5), and he provided two examples. Forty years later, things have changed. I sampled all 219 titles of journal articles in the January 2010 issue of Physical Review and I found no uses of "cause" or "causality." There were, on the other hand, 11 references to dynamics/dynamical, 10 to interactions/interacting, 6 to collisions, 3 to forces, 2 uses of the term "deterministic," and 1 of momentum, along with frequent uses of what might be called causal verbs—transitive verbs describing what one object or system does to another (e.g., compressing, splitting, generating, impact-induced ionizing, and stimulating).

What has happened to causality in physics? Much of the content of articles in Physical Review is at a low level of description, such as the subatomic level. Macroscopic processes, which are of most relevance to causal impressions, include interactions between objects that are dealt with by the laws of mechanics (diSessa, 1993; Todhunter, 1887). These apply to the kinds of interactions on which stimuli used in experiments on causal impressions are modeled. I use the term "interaction" because that is how contact events are represented in the laws of mechanics.

The law that is most relevant for present purposes is Newton's third law of motion (Todhunter, 1887), which states that, in a contact event involving two objects, A and B, the force exerted by A on B is equal and opposite to the force exerted by B on A. Thus, in a real-world version of a typical launching stimulus (such as a moving billiard ball contacting a stationary one), the force exerted by the target on the mover is equal and opposite to the force exerted by the mover on the target. It is as true to say that the target makes the mover stop moving as it is to say that the mover makes the target start moving. Yet, in studies that have asked participants to give free verbal descriptions of their perceptions (Michotte, 1963; Schlottmann et al., 2006; White, 2012b), there is no record of a participant reporting an impression that the target makes the mover stop moving. The impressions reported are all in one and the same direction: the mover is perceived as producing an outcome for the target, but not vice versa. This tendency is not immutable and has been found to vary to some extent, depending on features such as the temporal order in which the two objects become visible and the interpretation of prior motion as either self-propelled or not self-propelled (Mayrhofer & Waldmann, 2014), but it seems to hold strongly for standard launching stimuli.

This means that the perceptual impressions of causality that occur do not match the physical reality of contact events as described by the laws of mechanics. Equally worryingly, it implies that causality itself is not a feature of physical reality. Whatever else may be true of them, it is certainly the case that the causal impressions discussed in this chapter are all impressions in which one object is perceived as acting on another, and little or nothing is perceived in the latter as acting back on the former. This is not what happens in real contact events.

As I said at the start of the chapter, it might be going too far to call causal impressions illusions, but they are no more than constructs of the visual system that misrepresent the physics of interactions between objects. The specific misrepresentation, as I have said, is to see an interaction between objects as one-way, with object A producing some outcome (p. 261) for object B, but B not producing any outcome for A. This raises two important questions. The first is whether we perceive all interactions between objects in this way or not, and the second is whether this has implications for the understanding of causality in general.

The answer to the first question is no, because causal impressions simply do not occur in many cases where interactions between objects are going on. In many static situations, objects are exerting forces on each other. When a book is sitting on a bookshelf, for example, the book is exerting force on the bookshelf, because of its mass and the Earth's gravity, and the bookshelf is exerting an equal and opposite force on the book. Obviously we do not perceive forces or causality in such cases, and it is likely that we have little understanding of the extent to which forces are operating in static situations in real life. There are also many interactions between moving objects where causal impressions do not occur. An example would be a rebound event, such as a ball bouncing off a floor. Michotte (1963) reported that no causal impression occurred with such events and that there was, at most, an impression of activity. But rebound events are interactions: Newton's third law applies, and the ball exerts as much force on the floor as the floor exerts on the ball. Another example would be a collision between two objects, both of which are in motion prior to the collision.
If one object stops at contact and the other reverses direction, then Michotte (1963) reported that the launching effect occurs, but if both objects reverse direction at contact, then no causal impression occurs. Yet both are interactions involving the operation of forces. More research needs to be done on contact events that do not give rise to causal impressions, but it is clear that they are not rare.

The answer to the second question is yes. We have a tendency to understand physical systems as collections of causal relations, each of which involves influence going just one way, from cause to effect. This is often mistaken. In some cases a one-way causal understanding can be justified. The level of description that is often used of physical systems can encompass many individual events, some of which are connected in causal chains. Take a simple causal chain in which there is an interaction between A and B, and then an interaction between B and C. Under some circumstances, it is legitimate to talk of A affecting C. In a fairly literal example, a pool player uses a cue (A) to strike a ball (B), which then moves off and contacts another ball (C), which then moves off in turn. For each individual component of the chain, Newton's third law applies. A and B exert equal and opposite forces on each other, and so do B and C. But C exerts no force or influence on A. So in that case it may be legitimate to regard the causal connection between A and C as going just one way, even though that is not true of any of the individual interactions of which the chain is constituted. Thus, in the case of smoking causing cancer, the actual process involves multitudes of individual interactions, some of which are organized in chains. There is no opportunity for lung tissue to affect the cigarette that started the physiological chain connecting them. In that case also, it is legitimate to regard causality as going one way.

For this and other reasons, it is going too far to say that causality is just an illusion. Even so, if the world as we experience it has already been interpreted in terms of causal impressions, and those causal impressions are constructs that do not accurately reflect the laws of mechanics, that must have profound implications for the way we think about causality. There is already evidence for this. A bias in the direction of treating interactions as causal relations in which influence passes just one way can be found in research on naïve physics (diSessa, 1982, 1983, 1993) and in the expression of causal concepts in language (Talmy, 1988; see also Johnson & Ahn, Chapter 8 in this volume). The clearest example is a study by diSessa (1982) in which participants interacted with a computer-generated object that behaved in accordance with Newton's laws of motion. In the absence of force applied to it, the object moved in a straight line at constant speed. When asked to control the motion of the object by applying simulated pushes to it, participants consistently erred by expecting that the object would move in the direction in which it was pushed, thereby failing to take into account the object's momentum. In effect, they expected the cause (the push) to be the sole determinant of the outcome, and neglected interactionally relevant properties of the object.

In White (2006) I argued that the same bias can be found in research on causal judgment from contingency information. Consider a typical scenario used in a causal judgment study by Dennis and Ahn (2001). Participants were asked to judge the extent to which a kind of plant ingested by a patient caused a kind of physical reaction. This question implies a one-way conceptualization of causality: the plant does something to the patient (or not, as the case may be), but nothing else is involved. This is clearly incorrect: the anatomy and physiology of the patient (p. 262) must be involved as well. If the plant was ingested by a caterpillar or by a koala, there is no reason to expect that the same outcome would occur. In fact, I would argue that at least three things are involved in every interaction: the (supposed) source of causality (the plant in the scenario used by Dennis and Ahn); the (supposed) effect entity (the patient in the plant scenario); and the contact or interaction between the source of causality and the effect entity (ingestion in the plant scenario). (This relates to the causal powers theory [White, 1989], discussed by Johnson & Ahn, Chapter 8 in this volume.) This is the case in billiard ball collisions, where the moving ball, the stationary ball, and the occurrence of contact between them all interact to determine the outcome. As Runeson (1983) pointed out, the interaction can be broken down even further without leaving the macroscopic world: force, mass, momentum, deformability, frictional forces, gravity, and more are all involved. The tripartite role differentiation between one object, the other object, and contact between them is still a convenient simplification, but it serves to show how much is neglected by a conceptualization in which a cause just produces an outcome.

Other chapters in this book outline models and hypotheses in which empirical information, usually contingency information, is the basic form of information for causal learning (e.g., in this volume, Le Pelley, Griffiths, & Beesley, Chapter 2; Perales, Catena, Cándido, & Maldonado, Chapter 3; Boddez, De Houwer, & Beckers, Chapter 4; Cheng & Lu, Chapter 5). The problem with that hypothesis is that, before contingency can be detected, the relevant events must be perceived and perceptually interpreted. That means, first, that different events must be recognized as being of the same kind: in the example from Dennis and Ahn (2001), different instances must be recognized as involving the same kind of plant and not other kinds of plants, for example. Second, events in the world are perceived as involving causality before contingency detection can get to work on them. How can contingency detection be the foundation of causal learning if the very events that are input to contingency detection processes have already been perceptually interpreted in terms of causality? Our whole understanding of causality as involving one-way influence, which is implicit in and essential to contingency detection models, is given to us by visual perception, and it is incorrect. It may, of course, be the case that some forms of causal reasoning are best analyzed with covariation-based accounts; some of these, such as abstract reasoning about relations between genes and diseases, are far removed from perceptual data. Nevertheless, the raw data for such analyses are the stored products of perceptual processing, with all that entails about implicit categorization and causal interpretation. So it can be argued that perceptual processing is the basic form of causal learning.
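For concreteness, the covariation-based accounts referred to here standardly quantify contingency with the ΔP rule; the following is a textbook formulation added for reference, not notation from this chapter:

```latex
\Delta P \;=\; P(e \mid c) \;-\; P(e \mid \lnot c)
% Both conditional probabilities are defined over observations that have
% already been categorized as instances of the candidate cause c and the
% effect e -- that is, over perceptually interpreted input, which is
% exactly the pre-processing at issue in the argument above.
```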

Conclusions

How easy is it to think of cases of inanimate physical causality where neither object is an artifact made by humans? Billiard ball collisions are excluded because billiard balls are made by humans. I asked a colleague and he suggested rocks rolling down a mountain. I pointed out that bouncing and rebounding were not good examples of causality (as Michotte, 1963, argued) because they did not involve one thing doing something to another thing, with perceptible outcome for that other thing. He then suggested that rocks could knock into each other on their way down the hill. There are physical processes such as earthquakes and volcanoes where solid objects are moved by forces exerted on them by inanimate objects. There are atmospheric processes that could be regarded as involving causality, such as raindrops bending leaves and tornadoes damaging or moving objects. Rivers carry floating debris to the sea. But all of these either involve liquids or gases or are very rare. If we did have an innate perceptual mechanism for perceiving the effects of liquids such as rivers, it is not likely that it would generate a causal impression in perception of a launching effect stimulus.

On the other hand, we do frequently experience causality (with suitable caveats about what causality really is) when acting on objects. On an evolutionary time scale, it is important that we have fairly accurate perceptions of efficacious and productive interactions with objects under many circumstances. We are almost always in contact with either surfaces (when sitting in a chair or walking down a street, for example) or manipulable objects. The most basic competence at moving around in the physical world requires us to have not just motor abilities but also sensory equipment that orients us in reality and that tells us whether we are getting the desired outcome of our actions. As a tool-using species, we also have specialized requirements for precision interactions with objects, such as are used in flint-knapping, for example. It would make sense if we had innate perceptual mechanisms that were adapted to create (p. 263) causal impressions in such actions, because it is important for us to see exactly what we are doing to the object in question. It would also make sense if we had an innate predisposition to acquire such perceptual mechanisms through experience, and indeed that would make more sense because it would make us more adaptable, provide the capability to learn new motor skills in interactions with objects, and so on. Whether innate, acquired from experience, or some combination of the two, it is very likely that visual causal impressions owe much to our interactions with objects. We do not perceive causality as it is, but as it appears in the experience of efficacious action.

References

Beasley, N. (1968). The extent of individual differences in the perception of causality. Canadian Journal of Psychology, 22, 399–407.
Blakemore, S.-J. (2003). Deluding the motor system. Consciousness and Cognition, 12, 647–655.
Blakemore, S.-J., Boyer, P., Pachot-Clouard, M., Meltzoff, A., Segebarth, C., & Decety, J. (2003). The detection of contingency and animacy from simple animations in the human brain. Cerebral Cortex, 13, 837–844.
Blakemore, S.-J., Frith, C. D., & Wolpert, D. M. (1999). Spatio-temporal prediction modulates the perception of self-produced stimuli. Journal of Cognitive Neuroscience, 11, 551–559.
Blakemore, S.-J., Wolpert, D. M., & Frith, C. D. (2002). Abnormalities in the awareness of action. Trends in Cognitive Sciences, 6, 237–242.
Boyle, D. G. (1960). A contribution to the study of phenomenal causation. Quarterly Journal of Experimental Psychology, 12, 171–179.
Choi, H., & Scholl, B. J. (2004). Effects of grouping and attention on perception of causality. Perception and Psychophysics, 66, 926–942.
Choi, H., & Scholl, B. J. (2006). Perceiving causality after the fact: Postdiction in the temporal dynamics of causal perception. Perception, 35, 385–399.
Cohen, L. B., & Amsel, G. (1998). Precursors to infants' perception of the causality of a simple event. Infant Behavior and Development, 21, 713–732.
Cohen, L. B., Amsel, G., Redford, M. A., & Casasola, M. (1998). The development of infant causal perception. In A. Slater (Ed.), Perceptual development: Visual, auditory, and speech perception in infancy (pp. 167–209). Hove, East Sussex: Psychology Press.
Davidson, P. R., & Wolpert, D. M. (2004). Internal models underlying grasp can be additively combined. Experimental Brain Research, 155, 334–340.
Dennis, M. J., & Ahn, W.-K. (2001). Primacy in causal strength judgments: The effect of initial evidence for generative versus inhibitory relationships. Memory and Cognition, 29, 152–164.
Desmurget, M., & Sirigu, A. (2009). A parietal-premotor network for movement intention and motor awareness. Trends in Cognitive Sciences, 13, 411–419.
Desrochers, S. (1999). The infant processing of causal and noncausal events at 3.5 months of age. Journal of Genetic Psychology, 160, 294–302.
diSessa, A. A. (1982). Unlearning Aristotelian physics: A study of knowledge-based learning. Cognitive Science, 6, 37–75.
diSessa, A. A. (1983). Phenomenology and the evolution of intuition. In D. Gentner & A. L. Stevens (Eds.), Mental models (pp. 15–33). Hillsdale, NJ: Lawrence Erlbaum Associates.
diSessa, A. A. (1993). Toward an epistemology of physics. Cognition and Instruction, 10, 105–225.
Falmier, O., & Young, M. E. (2008). The impact of object animacy on the appraisal of causality. American Journal of Psychology, 121, 473–500.
Flanagan, J. R., & Wing, A. M. (1997). The role of internal models in planning and control: Evidence from grip force adjustments during movements of hand-held loads. Journal of Neuroscience, 15, 1519–1528.
Fonlupt, P. (2003). Perception and judgment of physical causality involve different brain structures. Cognitive Brain Research, 17, 248–254.
Fourneret, P., & Jeannerod, M. (1998). Limited conscious monitoring of motor performance in normal subjects. Neuropsychologia, 36, 1133–1140.
Frith, C. (2002). Attention to action and awareness of other minds. Consciousness and Cognition, 11, 481–487.
Fugelsang, J. A., Roser, M. E., Corballis, P. M., Gazzaniga, M. S., & Dunbar, K. N. (2005). Brain mechanisms underlying perceptual causality. Cognitive Brain Research, 24, 41–47.
Gibson, J. J. (1966). The senses considered as perceptual systems. Boston: Houghton Mifflin.
Gordon, I. E., Day, R. H., & Stecher, E. J. (1990). Perceived causality occurs with stroboscopic movement of one or both stimulus elements. Perception, 19, 17–20.
Haggard, P. (2005). Conscious intention and motor cognition. Trends in Cognitive Sciences, 9, 290–295.
Hubbard, T. L. (2013a). Phenomenal causality I: Varieties and variables. Axiomathes, 23, 1–42.
Hubbard, T. L. (2013b). Phenomenal causality II: Integration and implication. Axiomathes, 23, 485–524.
Jones, L. A., & Piateski, E. (2006). Contribution of tactile feedback from the hand to the perception of force. Experimental Brain Research, 168, 298–302.
Joynson, R. B. (1971). Michotte's experimental methods. British Journal of Psychology, 62, 293–302.
Keysers, C., Wicker, B., Gazzola, V., Anton, J., Fogassi, L., & Gallese, V. (2004). A touching sight: SII/PV activation during the observation and experience of touch. Neuron, 42, 335–346.
Kim, S.-H., Feldman, J., & Singh, M. (2013). Perceived causality can alter the perceived trajectory of apparent motion. Psychological Science, 24, 575–582.
Leslie, A. M. (1982). The perception of causality in infants. Perception, 11, 173–186.
Leslie, A. M., & Keeble, S. (1987). Do six-month-old infants perceive causality? Cognition, 25, 265–288.
Mascalzoni, E., Regolin, L., Vallortigara, G., & Simion, F. (2013). The cradle of causal reasoning: Newborns' preference for physical causality. Developmental Science, 16, 327–335.
Michotte, A. (1963). The perception of causality. New York: Basic Books.
Moors, A., & De Houwer, J. (2006). Automaticity: A theoretical and conceptual analysis. Psychological Bulletin, 132, 297–326.
Natsoulas, T. (1961). Principles of momentum and kinetic energy in the perception of causality. American Journal of Psychology, 74, 394–402.
Newman, G. E., Choi, H., Wynn, K., & Scholl, B. J. (2008). The origins of causal perception: Evidence from postdictive processing in infancy. Cognitive Psychology, 57, 262–291.
(p. 264)
Oakes, L. M. (1994). Development of infants' use of continuity cues in their perception of causality. Developmental Psychology, 30, 869–879.
Oakes, L. M., & Cohen, L. B. (1990). Infant perception of a causal event. Cognitive Development, 5, 193–207.
Panarese, A., & Edin, B. B. (2011). Human ability to discriminate direction of three-dimensional force applied to the finger pad. Journal of Neurophysiology, 105, 541–547.
Powesland, P. F. (1959). The effect of practice upon the perception of causality. Canadian Journal of Psychology, 13, 155–168.
Proske, U., & Gandevia, S. C. (2012). The proprioceptive senses: Their roles in signaling body shape, body position and movement, and muscle force. Physiological Reviews, 92, 1651–1697.
Rolfs, M., Dambacher, M., & Cavanagh, P. (2013). Visual adaptation of the perception of causality. Current Biology, 23, 250–254.
Rosenbaum, D. A. (1975). Perception and extrapolation of velocity and acceleration. Journal of Experimental Psychology: Human Perception and Performance, 1, 395–403.
Roser, M. E., Fugelsang, J. A., Dunbar, K. N., Corballis, P. M., & Gazzaniga, M. S. (2005). Dissociating processes supporting causal perception and causal inference in the brain. Neuropsychology, 19, 591–602.
Runeson, S. (1983). On visual perception of dynamic events. Acta Universitatis Upsaliensis: Studia Psychologica Upsaliensia (pp. 1–56). Uppsala, Sweden: University of Uppsala.
Saxe, R., & Carey, S. (2006). The perception of causality in infancy. Acta Psychologica, 123, 144–165.
Schlottmann, A. (2000). Is perception of causality modular? Trends in Cognitive Sciences, 4, 441–442.
Schlottmann, A., & Anderson, N. H. (1993). An information integration approach to phenomenal causality. Memory and Cognition, 21, 785–801.
Schlottmann, A., Ray, E., Mitchell, A., & Demetriou, N. (2006). Perceived social and physical causality in animated motions: Spontaneous reports and ratings. Acta Psychologica, 123, 112–143.
Scholl, B. J., & Nakayama, K. (2002). Causal capture: Contextual effects on the perception of collision events. Psychological Science, 13, 493–498.
Scholl, B. J., & Tremoulet, P. D. (2000). Perceptual causality and animacy. Trends in Cognitive Sciences, 4, 299–309.
Suppes, P. (1970). A probabilistic theory of causality. Acta Philosophica Fennica (Fasc. XXIV). Amsterdam: North-Holland.
Talmy, L. (1988). Force dynamics in language. Cognitive Science, 12, 49–100.
Todhunter, I. (1887). Mechanics for beginners. London: Macmillan.
Wheat, H. E., Salo, L. M., & Goodwin, A. W. (2004). Human ability to scale and discriminate forces typical of those occurring during grasp and manipulation. Journal of Neuroscience, 24, 3394–3401.
White, P. A. (2005). Visual causal impressions in the perception of several moving objects. Visual Cognition, 12, 395–404.
White, P. A. (2006). The causal asymmetry. Psychological Review, 113, 132–147.
White, P. A. (2009). Perception of forces exerted by objects in collision events. Psychological Review, 116, 580–601.
White, P. A. (2010). The property transmission hypothesis: A possible explanation for visual impressions of pulling and other kinds of phenomenal causality. Perception, 39, 1240–1253.
White, P. A. (2012a). Visual impressions of causality: Effects of manipulating the direction of the target object's motion in a collision event. Visual Cognition, 20, 121–142.
White, P. A. (2012b). Visual impressions of pushing and pulling: The object perceived as causal is not always the one that moves first. Perception, 41, 1193–1217.
White, P. A. (2012c). The experience of force: The role of haptic experience of forces in visual perception of object motion and interactions, mental simulation, and motion-related judgments. Psychological Bulletin, 138, 589–615.
White, P. A. (2014a). Perceived causality and perceived force: Same or different? Visual Cognition, 22, 672–703.
White, P. A. (2014b). Singular clues to causality and their use in human causal judgment. Cognitive Science, 38, 38–75.
White, P. A., & Milne, A. (1997). Phenomenal causality: Impressions of pulling in the visual perception of objects in motion. American Journal of Psychology, 110, 573–602.
White, P. A., & Milne, A. (1999). Impressions of enforced disintegration and bursting in the visual perception of collision events. Journal of Experimental Psychology: General, 128, 499–516.
White, P. A., & Milne, A. (2003). Visual impressions of penetration in the perception of objects in motion. Visual Cognition, 10, 605–619.
Wolff, P., & Shepard, J. (2013). Causation, touch, and the perception of force. Psychology of Learning and Motivation, 58, 167–202.
Wolpert, D. M., & Flanagan, J. R. (2001). Motor prediction. Current Biology, 11, R729–R732.
Wolpert, D. M., Ghahramani, Z., & Jordan, M. I. (1995). An internal model for sensorimotor integration. Science, 269, 1880–1882.
Yela, M. (1952). Phenomenal causation at a distance. Quarterly Journal of Experimental Psychology, 4, 139–154.
Young, M. E., & Falmier, O. (2008). Launching at a distance: The effect of spatial markers. Quarterly Journal of Experimental Psychology, 61, 1356–1370.

Peter A. White

School of Psychology, Cardiff University, Cardiff, Wales, UK


Goal-Directed Actions

Goal-Directed Actions   Bernhard Hommel The Oxford Handbook of Causal Reasoning Edited by Michael R. Waldmann Print Publication Date: Jun 2017 Subject: Psychology, Cognitive Psychology Online Publication Date: May 2017 DOI: 10.1093/oxfordhb/9780199399550.013.18

Abstract and Keywords

Personal causation relies on translating goals into goal-directed behavior. This chapter addresses how humans generate goal-directed behavior, that is, how they initiate and control intentional, goal-directed actions. In particular, it discusses how anticipated action effects are integrated with motor patterns, so as to guide future effect-driven actions, and how action intentions struggle with overlearned habits. It argues that intentional and conscious processes typically precede, rather than accompany, intentional actions, and that the experience of personal agency and the identification of action errors are based on a comparison between expected and actual action effects. A final outlook addresses the implications of increasing insight into cognitive embodiment and of increasing interdisciplinarity for the study of human action control and personal causation.

Keywords: intentional action, human action control, habit, action, agency, consciousness

As various contributions to this Handbook discuss, humans can reason about cause–effect relationships in rather abstract ways, but they can also extract information about such relationships from visual events they perceive (White, Chapter 14 in this volume). And yet, arguably the most direct experience of causality emerges from one's own action. It is through actions that we can change our environment and express ourselves from the first months of our lives (Rochat, 2001). Actions also have a social function that provides the attentive observer with information about mental causation, that is, about the fact that human actions are caused by, and thus express, internal goals. Given that we cannot directly access the goals of other individuals, their actions are often the only clues to their goals we have. Some authors have even argued that the same applies to ourselves: we may often learn about our own goals only by observing the actions they generate (Wegner, 2002). In any case, in order to understand how personal causation works, we need to understand how goals translate into goal-directed behavior. Accordingly, this chapter addresses how humans initiate and control intentional, goal-directed actions. In some sense, the term "goal-directed action" is a pleonasm, because actions are considered to differ from mere movements in their goal-directedness—a characteristic that mere movements do not possess. Hence, at least overt actions can be considered goal-directed movements, which means that the term "action" is sufficient to express goal-directedness. Covert, purely mental actions (e.g., O'Brien & Soteriou, 2009) are more difficult to define. As mental actions are arguably derived from, and thus are simulations of, overt actions (e.g., Grafton, 2009; Vygotsky, 1934/1986), I will restrict the discussion in this chapter to overt actions and the way they are controlled.

In any case, in order to meet the criterion of being directed toward a goal, a given process would need to rely and be conditional on some sort of knowledge about the relationship between this process and some future event, and it would need to be activated as the consequence or side effect of activating (p. 266) a representation of this future event (Dickinson & Balleine, 1994). Among other things, this means that, as William James (1890, p. 487) put it, "… if, in voluntary action properly so-called, the act must be foreseen, it follows that no creature not endowed with divinatory power can perform an act voluntarily for the first time." That is, before someone can carry out a goal-directed process, he or she must have acquired some knowledge about the fact that a particular event—the representation of which must be included in the goal representation—can in fact be created by carrying out that process. Unless we assume that humans are born with a particular set of concrete goals—a possibility for which no empirical support has been provided so far—learning about which processes can lead to which effects in the internal or external world must precede the ability to perform goal-directed activities.

From Movement to Action

The importance of many basic cognitive abilities often becomes clear only if they are impaired, declining, or absent—be it through degeneration, accidents, or the lack of sufficient development. Particularly instructive for a deeper understanding of goal-directed action is the behavior of a newborn child. This behavior shows frequent and dramatic changes of activity levels, rapid successions of phases of highest degrees of liveliness and sleepiness. But even the most active phases show very little, if any, expression of what we call goal-directed action. Instead, we can see numerous examples of relatively rigid reflexes, including the rooting and sucking reflexes that help the child to engage in breastfeeding, the stepping reflex that facilitates the acquisition of walking, and the grasping reflex that supports the exploration of the object world. As the frontal lobe of the human brain—which is required to generate more complex action plans—develops, almost all basic reflexes disappear, but only after having left many important traces of the child's sensorimotor experience with its environment.

According to ideomotor theory (Harless, 1861; Hommel, 2009; James, 1890; Lotze, 1852; for an overview, see Stock & Stock, 2004) and Piaget's (1946) approach to cognitive development, sensorimotor interactions allow the child to acquire information about the contingencies between its movements and their impact on the environment. Storing these contingencies is considered to be the first step in generating a cognitive representation of one's world, with an emphasis on one's own opportunities to actively change it. Accordingly, these contingencies provide the database necessary to generate goal-directed action (i.e., movements that are driven by some anticipation about their outcome).

Turning movements into actions thus requires some sort of anticipation of the likely outcomes of a given movement and a selection of the movement based on this anticipation (two aspects that I will discuss in more detail under "Action Selection and Prediction"). The ability to select a movement based on its likely outcomes requires the integration of movement codes (i.e., cognitive/neural codes that generate movements) and action-effect codes (i.e., cognitive/neural codes that represent to-be-expected movement outcomes). According to ideomotor theory, this integration emerges through Hebbian learning (what fires together wires together): the agent starts with some motor babbling (the execution of more or less random movements or reflexes), registers the sensory feedback that these movements generate, and associates the motor patterns underlying the movements with representations of the feedback. The resulting associations are bidirectional, so that from now on the activation of the movement leads to an anticipation of its outcome (i.e., to the priming of the outcome representations) and the activation of the outcome representations can activate the movement pattern resulting in such outcomes. Hence, the agent can now intentionally activate particular movements by simply activating the representations of wanted outcomes (e.g., by actively imagining them). In other words, ideas can now lead to motor behavior, and movements become goal-directed actions.

Recent years have provided considerable evidence supporting this scenario (for overviews, see Hommel, 2009; Shin, Proctor, & Capaldi, 2010). Infants, children, and adults were shown to pick up action-effect contingencies on the fly, irrespective of the current action goal, and to create bidirectional associations between the underlying movement patterns and the representations of the effects (e.g., Elsner & Hommel, 2001; Kray, Eenshuistra, Kerstner, Weidema, & Hommel, 2006; Verschoor, Weidema, Biro, & Hommel, 2010; for an overview, see Hommel, 2009; Hommel & Elsner, 2009). Representations of action effects are not just acquired; they can also be shown to be involved in action selection. For instance, presenting people with action effects before or during action selection interferes with selecting an action producing another effect (Hommel, 1996), and selecting actions with mutually incompatible action effects (such as pressing the left key to produce an event on the right (p. 267) side) is less efficient than selecting actions with compatible effects (Kunde, 2001). This suggests that the sensory consequences of actions are considered when and in the process of selecting them. This is consistent with findings from brain-imaging studies, which show that presenting people with possible action effects tends to activate the action producing them (Elsner et al., 2002; Melcher et al., 2008) and that preparing for particular actions leads to the activation of brain areas that are involved in perceiving the sensory effects of these actions (Kühn, Keizer, Rombouts, & Hommel, 2011).
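The acquisition-then-selection logic just described can be captured in a few lines of toy code. This is only an illustrative sketch of the ideomotor scenario, not an implementation from the chapter; the movement repertoire, the effect mapping, and the simple counting rule are my assumptions:

```python
import random
from collections import defaultdict

# Hypothetical movement repertoire and environment (assumed for illustration):
MOVEMENTS = ["press_left", "press_right"]
EFFECTS = {"press_left": "low_tone", "press_right": "high_tone"}

# Bidirectional movement <-> effect associations, strengthened by co-occurrence.
weights = defaultdict(float)

# Phase 1: motor babbling with Hebbian learning ("what fires together wires together").
for _ in range(100):
    movement = random.choice(MOVEMENTS)   # more or less random movement
    effect = EFFECTS[movement]            # registered sensory feedback
    weights[(movement, effect)] += 1.0    # strengthen the association

# Phase 2: goal-directed selection. Activating a wanted outcome retrieves
# the movement pattern that has produced that outcome in the past.
def select_action(wanted_effect):
    return max(MOVEMENTS, key=lambda m: weights[(m, wanted_effect)])

print(select_action("high_tone"))  # -> "press_right" after sufficient babbling
```

After the babbling phase, activating a wanted effect retrieves the movement most strongly associated with it, which is the sense in which the bidirectional association lets ideas lead to motor behavior.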


Goal-Directed Actions

Action and Habit Heraclitus reminded us that we cannot step into the same river twice, which means that events that look the same at the surface often keep changing in the underlying structure. This certainly holds for actions, which keep changing in character and the way they are controlled with every execution. What will become a goal-directed action often begins with rather uncoordinated, sometimes explorative movements. This is true for the motor babbling of infants, which is often only constrained by the available reflexes and the ac­ tivity level of the agent, but it is also true for the adult learner of a particular skill. The first moves of a beginning dancer or skier are only weakly hinting at the intended direc­ tion, and the many degrees of freedom of the limbs are often strategically “frozen” to re­ duce the demands on control processes (Bernstein, 1967). As the motor adjustments be­ come more efficient, predicted and actual action effects become more similar which, as described earlier, turns movement into actual action. But the changes in controlling actions do not end here. Exercising an action is known to reduce its control demands and to free up cognitive resources: the action or skill be­ comes automatic. “Automaticity” is a rather problematic term that has defeated all at­ tempts to define it properly (Bargh, 1997; Hommel, 2000). Originally, it was intended to indicate the lack of endogenous control over the behavior that highly overlearned stimuli are able to trigger (Shiffrin & Schneider, 1977). A famous example is the Stroop effect (Stroop, 1935): the fact that people find it difficult to ignore incongruent meanings of col­ or words (the color of which they are intending to name) has been taken to suggest the automaticity of word reading. Another example is the Simon effect (Simon & Rudell, 1967): the finding that people are slower and less accurate in responding to stimuli the location of which does not correspond to the location of the correct response has been taken to imply automaticity of location processing. However, there are reasons to doubt that action-in-using tendencies are truly automatic in the original sense (Hommel, 2000). For instance, participants in a Stroop task are be­ ing asked to respond to words and to articulate color names, which implies that they have established a cognitive set that is likely to draw attention to color names. Accordingly, what seems to be an automatic process is likely to reflect the consequences of intentional task preparation. Indeed, it has been suggested that action control rarely operates online (Bargh, 1989; Exner, 1879). Rather, people intentionally prepare possible actions and del­ egate their control to external stimuli—which can produce response conflict under very artificial conditions like the Stroop task, but frees up precious cognitive resources to “look ahead” even further. Nevertheless, it is true that overlearned actions become habits, which goes hand in hand with a change in the way they are controlled (Gershman, Chapter 17 in this volume). For instance, overlearned actions tend to be efficient, to occur outside of awareness, and to be stimulus-driven and ballistic (i.e., difficult to stop once they are triggered) (Bargh, 1994). Stimulus-driven actions seem to be mediated by different neural structures than those that mediate more goal-driven actions (Passingham, 1997) and they are less sensi­ Page 4 of 21

Goal-Directed Actions tive to sensory action effects (Herwig, Prinz, & Waszak, 2007) and reward (e.g., Watson, Wiers, Hommel, & de Wit, 2014). The lack of reward-sensitivity of overlearned actions has been taken to imply that they no longer should be considered true goal-directed actions (Dickinson & Balleine, 2009). Ac­ cording to this perspective, the desire to realize a particular outcome is a defining criteri­ on of goal-directed actions, and the lack of sensitivity to action-contingent reward must be considered a lack of desire. While this account is consistent with some philosophical definitions of human action, it is rather problematic if applied to systems-level accounts of action control. For instance, there is evidence that even overlearned actions are based on outcome anticipations: Band et al. (2009) had participants engage in a complicated stimulus–response mapping task, in which each of the four responses produced a particu­ lar auditory outcome. Each response produced one specific outcome in the majority of the trials, but another outcome, or a number of outcomes, in the remaining trials. Even though the outcomes (p. 268) were entirely irrelevant to the task, actions that produced unexpected outcomes triggered electrophysiological components similar to the feedbackrelated negativity that is observed if agents are told that their action was incorrect. This suggests that even stimulus-driven actions are accompanied by anticipations about their consequences, which makes them goal-directed. Moreover, recent findings provide evi­ dence that, consistent with White (1959), the mere production of particular action out­ comes is rewarding (Eitam, Kennedy, & Higgins, 2013), which suggests that the elimina­ tion of external reward does not necessarily imply the absence of any reward. In particu­ lar, the available evidence seems to suggest that the main reward for acquiring an action comes from the match between anticipation and actual outcome (Wolpert & Ghahramani, 2000), rather than receiving some added external reinforcers. If so, goal-directed actions are producing their own reward to the degree that the intended, or at least anticipated effect is actually produced.

Preparation and Execution The previous section may be taken to imply that a given action can only be either inten­ tional/effortful or automatic, but not both. However, while this may be true for one given act, process, or procedure, it does not seem to capture the essence of everyday actions, which rather seem to rely on complex interactions between intentional and automatic processes. This has been clearly fleshed out by Exner (1879), who reported his introspec­ tion during a simple reaction-time experiment. Preparing for a task, so he argues, in­ volves some sort of self-automatization: one establishes a mental state that allows the lat­ er arriving stimulus to trigger the assigned reaction automatically. On the one hand, the entire action would be intentional, as it was the intentional preparation that allowed the stimulus to trigger the action. On the other hand, however, the action itself can be consid­ ered automatic, as it does not require some additional process to translate the stimulus into overt movement. Hence, goals and intentions allow us to prepare our cognitive sys­ tem in such a way that further processing can be more or less entirely stimulus- or envi­


Goal-Directed Actions ronmentally driven (Bargh, 1989, 1997)—a process that creates what Woodworth (1938) has called “prepared reflexes.” As pointed out earlier, this logic applies to many psychological phenomena, such as the Stroop effect, which involves automatic word-reading that nevertheless was enabled by the intention to utter color words in response to presented color words. A particularly convincing demonstration for the Simon effect stems from Valle-Inclán and Redondo (1998). The Simon effect consists in the observation that responding to non-spatial stim­ uli is easier and more efficient if the location of the stimulus corresponds to the location of the response (Simon & Rudell, 1967). This has been attributed to the automatic prim­ ing of responses by spatially corresponding stimuli, and there is indeed electrophysiologi­ cal evidence that processing a lateralized or otherwise spatial stimulus activates cortical areas involved in planning movements with the corresponding hand (Eimer, 1995; Som­ mer, Leuthold, & Hermanutz, 1993). Valle-Inclán and Redondo (1998) were able to repli­ cate this observation in a condition in which they presented the relevant stimulus-re­ sponse mapping before the stimulus—a finding that would commonly be interpreted as demonstrating automaticity. And yet, the authors did not find automatic action activation in a condition where the stimulus-response mapping appeared after the stimulus. This means that implementing the stimulus-response mapping is a precondition of automatici­ ty, suggesting that automatic action activation is actually a “prepared reflex.” How do people implement prepared reflexes? The implementation is commonly attributed to executive-control processes, but how the preparation works is still under investigation. At least three kinds of preparation processes seem to exist. First, preparing the cognitive system for goal-directed movements seems to include the establishment of stimulus–re­ sponse links. Task-switching studies have revealed that implementing such links takes considerable time (Allport, Styles, & Hsieh, 1994; Monsell, 2003) and disabling them is difficult, as they are rather inert and affect subsequent processing (Allport et al., 1994; Hommel & Eglau, 2002). There is also evidence that maintaining stimulus–response links requires substantial cognitive effort (de Jong, Berendsen, & Cools, 1999), which in longer periods of task execution can induce goal forgetting (Altmann & Gray, 2002). Second, preparing for a task often involves the preactivation of possible actions. This is particular­ ly likely if only a few action alternatives are relevant, while larger numbers of action al­ ternatives are likely to prevent preactivation and to and use more cognitive response–se­ lection strategies. And third, preparing for a task has been shown to include the atten­ tional focusing on relevant stimuli (Bekkering & Neggers, 2002 (p. 269) ) and the priming of task-relevant stimulus dimensions (Hommel, 2010; Memelink & Hommel, 2013). For in­ stance, while preparing for a grasping action increases attention to the shape of visual stimuli, preparing for a pointing action attracts attention to location information (Fagioli, Hommel, & Schubotz, 2007). These last two observations are likely to relate to another aspect of the interaction be­ tween offline preparation and online execution. 
There is considerable neuroscientific evi­ dence that visually guided actions emerge from the interaction of two separate and disso­ ciable stimulus–response routes. While earlier approaches have characterized these path­ Page 6 of 21

Goal-Directed Actions ways in terms the particular stimulus information they provide (what vs. where), Milner and Goodale (1995) have emphasized the offline versus online character of these path­ ways. According to their reasoning, humans and other primates have a ventral offline channel for processing information that allows the identification of objects and other events and a dorsal online channel for providing real-time environmental information about location, intensity, and other rather low-level aspects of objects and events. Later approaches have criticized this approach for underestimating the interaction between the two channels (Glover, 2004) and for relating them to perception and action (rather than to action and sensorimotor processing), respectively (Hommel, Müsseler, Aschersleben, & Prinz, 2001a, 2001b). However, the basic idea that human action emerges from the inter­ action of preparatory offline processing and sensorimotor online processing has been widely embraced. One problem with the original idea of two entirely independent channels was that the sensorimotor online channel was assumed to have no access to memory and higher-level processes, which raises the question of how it can be used to control flexible goal-direct­ ed actions. However, this problem can be tackled by assuming that offline preparation not only preactivates the relevant action systems into which sensorimotor processing would need to feed, but also selects the input provided to online sensorimotor processing by in­ creasing the output gains, and thus the contribution of, features on task-relevant stimulus dimensions (Hommel, 2010). In other words, the sensorimotor online channel might in­ deed have some autonomy, but its contribution is tailored to the goal at hand by selecting its input and channeling its output to the relevant action systems.

Action Selection and Prediction The main purpose of ideomotor models of action control and of the theory of event coding (TEC) is to explain how people acquire the ability to select goal-directed actions. Figure 15.1 captures the main idea sketched earlier (see Hommel, 2009): random firing of motor cells produce overt movements, which are registered by the cognitive system and coded by sensory cells, which then become integrated with the corresponding motor cells through Hebbian learning. This integrated unit can then become internally activated through goal representations, that is, through representations that are coding for the in­ tended action effects. Activating such representations will tend to activate sensory repre­ sentations of similar events, and this activation primes the associated motor cells. To se­ lect an action, the agent simply needs to create a representation of the wanted action ef­ fect, which then leads to the selection of those motor patterns that have been produced such effects in the past. Note that this approach is only concerned with setting up the cognitive system for pro­ ducing an action, but not with testing whether this action has been carried out and whether it came out as expected. This testing aspect has been emphasized by comparator models of action control. Comparator models use cybernetic principles to compare in­ tended output (actions) and the associated expected reafferent input (the sensory conse­ Page 7 of 21

Goal-Directed Actions quences of the action) against the actual reafferent input. Figure 15.2 sketches the basic principle. The representation of the desired state informs a perception-movement transla­ tion system to produce motor commands, which produce overt action. The perceived reaf­ ferent information is compared with the expected reafferent information to find out whether the intended action effect has actually been produced, that is, (p. 270) whether the action was as expected. A comparison between the actual outcome of the action and the wanted outcome serves to determine whether the action was successful in reaching the intended goal.

Figure 15.1 The ideomotor principle, simplified after James (1890).

Figure 15.2 The basic structure of the comparator model, simplified after Frith, Blakemore, & Wolpert (2000).
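To make the comparator logic of Figure 15.2 concrete, here is a minimal sketch under stated assumptions: the vector coding of reafference, the distance measure, and the tolerance threshold are illustrative choices of mine, since the model itself is stated at a functional level:

```python
# Two comparisons, following the logic of the comparator model:
# (1) predicted vs. actual reafference -- did the action come out as expected?
# (2) desired vs. actual state -- did the action reach the goal?

def euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

def evaluate_action(desired, predicted_reafference, actual_reafference, tol=0.1):
    prediction_error = euclidean(predicted_reafference, actual_reafference)
    goal_error = euclidean(desired, actual_reafference)
    return {
        "as_expected": prediction_error <= tol,  # basis of error detection (and agency)
        "goal_reached": goal_error <= tol,       # basis of success evaluation
    }

# Example: a reach that lands where predicted, but not where desired.
print(evaluate_action(desired=[1.0, 0.0],
                      predicted_reafference=[0.8, 0.1],
                      actual_reafference=[0.82, 0.1]))
# -> {'as_expected': True, 'goal_reached': False}
```

The example illustrates why the two comparisons can dissociate: an action may unfold exactly as predicted and still fail to produce the wanted outcome.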

It is easy to see that the ideomotor model and the comparator model of action control are complementary (see Chambon & Haggard, 2013; Hommel, 2015): the ideomotor model is rather articulated regarding the acquisition of action-effect associations and the selection of actions, but silent with respect to the evaluation of the outcome, while the comparator model is not overly specific with regard to the selection aspect, but very articulated regarding the outcome evaluation. One might consider the possibility that both models simply use different language to refer to the same process, so that the process of selecting an action by anticipating an action effect might be the same process that also specifies the outcome expectations against which reafferent information can be evaluated. In other words, intending an action may simply consist in specifying the intended action effect, and this representation may be responsible for both selection and evaluation. However, there are reasons to assume that the scenario is more complex than that.

First, Elsner and Hommel (2004) investigated the conditions under which adults acquire novel action-effect contingencies. Participants were presented with sounds that were contingent on pressing the left or right key before being presented with the same sounds as stimuli. As reported by Elsner and Hommel (2001), participants had more difficulty pressing a key in response to a sound if that sound had previously been produced by the alternative key than if that sound had previously been produced by the same key. This suggests that the participants spontaneously acquired bidirectional action-effect associations in the first phase of the experiment. Interestingly, the size of this effect was systematically modulated by the contingency and the temporal contiguity between action and effect. The effect increased with the strength of (p. 271) correlation between action and effect and the frequency of the effect, and it was largest with zero delay between action and effect. Interestingly, Elsner and Hommel (2004) also assessed the perceived agency of the participants, that is, the degree to which participants perceived themselves to be the cause of the sounds. While agency judgments were also sensitive to contingency and temporal contiguity, the size of the action-effect learning effect and the degree of perceived agency were uncorrelated. If we assume that agency reflects the match between the expected and the actual action effect, the representation of the expectation (as assessed by agency judgments) does not seem to be related to the representation that is responsible for response selection (as assessed by the action-effect learning effect).

Second, Verschoor, Spapé, Biro, and Hommel (2013) investigated action-effect acquisition in 7- and 12-month-old infants (cf. Muentener & Bonawitz, Chapter 33 in this volume) and in adults. Participants were presented with sounds that were contingent on the horizontal direction of their saccades. In a later test phase, saccades were evoked by peripheral visual stimuli that appeared together with a tone that had previously been produced by left- or rightward saccades. Saccade initiation was slower if that direction did not correspond with the direction associated with the tone, suggesting that saccades were selected based on representations of the resulting auditory effects. However, this effect was only observed in the 12-month-old infants and in the adults, but not in the youngest group. Consistent with earlier findings of Verschoor, Weidema, Biro, and Hommel (2010), this suggests that infants below one year of age have difficulty selecting actions based on expected outcomes, presumably reflecting a not yet sufficiently developed frontal cortex. Verschoor et al. (2013) also measured pupil dilation, a measure of surprise. Participants exhibited more strongly dilated pupils if the actually carried out saccade went in a different direction than the direction indicated by the tone—that is, participants were surprised by what they were currently doing because of the mismatch between the action-related expectation and the tone-induced expectation. Importantly, even the youngest group showed this effect, suggesting that representing expectations of action-contingent outcomes precedes, and thus does not require, the ability to use outcome expectations to select voluntary actions.

These findings suggest that action selection and action evaluation are separable processes that may develop at different paces and that seem to rely on different mechanisms. As argued elsewhere (Hommel, 2015), this suggests the integration of ideomotor action selection and comparator-based action evaluation, as indicated in Figure 15.3.
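To fix ideas, here is a deliberately minimal sketch of the division of labor the integrated architecture assumes: ideomotor selection picks an action by its anticipated effect, and a comparator evaluates the reafference against that expectation. All names, the dictionary of learned associations, and the equality-based matching are simplifying assumptions of mine, not parts of the published model.

```python
# Minimal sketch (hypothetical names, toy matching) of ideomotor selection
# combined with comparator-based evaluation.

# Bidirectional action-effect associations acquired through experience.
action_effects = {"press_left": "low_tone", "press_right": "high_tone"}

def select_action(intended_effect):
    """Ideomotor selection: activate the action whose learned effect
    matches the intended (anticipated) effect."""
    for action, effect in action_effects.items():
        if effect == intended_effect:
            return action
    return None

def evaluate_outcome(action, observed_effect):
    """Comparator evaluation: match the expected reafference against what
    actually happened; a match supports success and a sense of agency."""
    return action_effects[action] == observed_effect

action = select_action("high_tone")             # -> "press_right"
agency = evaluate_outcome(action, "high_tone")  # -> True (expectation met)
print(action, agency)
```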

Agency and Ownership

Actions not only serve to reach particular goals; they also have a particular personal and social meaning. Accordingly, the ability to carry out goal-directed actions has often been associated with issues regarding agency and ownership of actions—issues of particular relevance for the juridical evaluation of deviant behavior. While it is debatable whether the experience of agency and ownership plays a decisive role in carrying out voluntary actions (an issue that I will discuss in the next section), recent (p. 272) research has looked into the factors determining such experiences.

Figure 15.3 An integrated model combining ideomotor action selection with comparator-based action evaluation, redrawn after Hommel (2015).
Of particular interest for the experience of agency—that is, the impression that it is me who is carrying out (i.e., causally producing) a particular action—is the relationship between expected and actual action effects. More specifically, humans experience greater causal impact on events the stronger the temporal and spatial proximity between their actions and these events, and the more their actions and the events covary (for overviews, see LePelley, Griffiths, & Beesley, Chapter 2 in this volume; Shanks & Dickinson, 1987; Wasserman, 1990). While the ideomotor approach does not really speak to the issue of agency, the comparator model provides a crucial comparison, namely, that between expected and actual reafferent stimulation (Frith et al., 2000). The assumption is that performing an action is accompanied by expectations regarding the sensory changes this action is assumed to evoke. These expectations are then matched with the actually produced sensory changes, and the better the match, the more pronounced the experience of agency might be (Chambon & Haggard, 2013).

This assumption is consistent with observations that expectation-inconsistent sensory action effects cause measurable surprise (Verschoor et al., 2013), a decreased sense of agency (Spengler, von Cramon, & Brass, 2009), and electrophysiological signs of internal conflict (a so-called feedback-related negativity, NFB, which is commonly observed if agents are informed that they have committed an error; Band et al., 2009). However, while the relationship between expected and actual outcomes is likely to be a major determinant of experienced agency, there are likely to be more sources for agency judgments, such as contextual plausibility, past experience, and the agent-specific typicality of the action. Hence, action control provides some input to agency judgments, but certainly not all, and sometimes perhaps not even the most important one (Synofzik, Vosgerau, & Newen, 2008)—as Figure 15.3 indicates.

While agency refers to the attribution of action outcomes to oneself, perceived ownership refers to the attribution of effectors to oneself. Originally, interest in the mechanisms underlying body ownership was fueled by observations in patients, such as individuals suffering from the alien hand syndrome (Scepkowski & Cronin-Golomb, 2003). Some of these observations suggested that perceiving one's own body is a non-trivial cognitive task that underlies all sorts of possible illusions and misinterpretations. Indeed, even healthy participants can have difficulty telling whether it is their own hand or that of a confederate that is drawing a picture (Nielsen, 1963; van den Bos & Jeannerod, 2002). More recently, the so-called rubber-hand illusion has attracted a lot of attention. It shows that healthy individuals can be led to perceive a rubber hand lying in front of them as part of their own body if their own (invisible) hand and the rubber hand are stroked in synchrony (Botvinick & Cohen, 1998). Synchronous stroking even induces some sort of primitive empathy: for instance, watching a synchronously stroked rubber hand about to be pricked by a pin activates the same pain areas in the brain that are activated when being approached by a pin oneself (Morrison, Lloyd, di Pellegrino, & Roberts, 2004). These observations suggest that multimodal synchronicity of perceptual input is one of the criteria that determine perceived body ownership.

Related investigations using virtual-reality manipulations have shown that active control or controllability is another, perhaps even more potent, criterion. If moving one's own hand leads to synchronous movements of a virtual hand on a screen or in a virtual-reality scenario, people perceive the virtual hand as part of their own body—the virtual-hand illusion (Sanchez-Vives, Spanlang, Frisoli, Bergamasco, & Slater, 2010). One possible implication of these kinds of effects is that humans might have an internal model of their body that mediates ownership perception (Tsakiris, 2010). Under suitable conditions, an artificial and/or novel candidate effector would thus be perceived as belonging to one's body to the degree that it is sufficiently similar to one of the effectors defined in this model. However, recent findings have shown that people can control, and perceive ownership for, virtual balloons that vary in size, and virtual squares that vary in color, with their own hand movements (Ma & Hommel, 2015). This suggests that controllability is more important, and can overrule similarity, which does not fit with the idea of a stable internal body model. It also suggests that agency, that is, the perception of contingencies between one's own actions and their effects, might be the crucial criterion for representing one's own body (Tsakiris, Schütz-Bosbach, & Gallagher, 2007).
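The covariation component of agency and causal impact mentioned earlier (Shanks & Dickinson, 1987; Wasserman, 1990) is commonly quantified in this literature as the contingency ΔP, the difference between the probability of the effect given an action and given no action. A minimal sketch, with hypothetical counts:

```python
def delta_p(effect_with_action, n_action, effect_without_action, n_no_action):
    """Contingency: delta_P = P(effect | action) - P(effect | no action)."""
    return effect_with_action / n_action - effect_without_action / n_no_action

# Hypothetical counts: a tone followed 45 of 50 key presses but occurred in
# only 5 of 50 intervals without a press; a strong positive contingency.
print(delta_p(45, 50, 5, 50))  # 0.8 -> high experienced causal impact
```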

Consciousness and Action Control

Folk-psychological and philosophical traditions are based on the assumption that actions emerge from conscious decision-making and are controlled by conscious goal representations (Hommel, 2007). (p. 273) Indeed, early ideomotor theories were motivated by the question of how consciously represented action effects can recruit and drive consciously inaccessible motor processes (Baars, 1988; James, 1890), and information-processing approaches since Donders (1868) have often assumed that conscious representations shield the rather automatic perceptual processes from unwanted impact on action-related decision-making. Indeed, many introductory textbooks still contrast conscious decision-making with automatic processes, implying that unconscious decision-making cannot exist.

A highly influential milestone marking a transition in this thinking was the study of Libet, Wright, and Gleason (1982). In this study, participants were to carry out simple key-pressing actions at their own pace while their electrophysiological responses were continuously recorded. As expected, each key-pressing response was preceded by a readiness potential, a standard electrophysiological component that can be observed about one second or more before a voluntary action is performed. Participants were also asked to report when they felt the urge to act. For that purpose, they saw a quickly rotating clock, and at the end of each trial they reported the position of the clock face at the point in time when they had felt the urge. Researchers were thus able to calculate the temporal relation between the electrophysiological indicator of the action (the readiness potential) and the conscious indicator (the perceived urge). While both indicators preceded the overt action, the theoretically significant observation was that the electrophysiological indicator preceded the conscious representation by several hundred milliseconds. This observation triggered numerous philosophical and psychological debates about the functional role, if any, of conscious goal representations (e.g., Klemm, 2010), and it motivated Wegner (2003) to distinguish between the true cause of voluntary actions (which would produce the readiness potential) and its conscious representation—which he considers an only apparent cause.

Unfortunately, almost all of these discussions neglect basic aspects of action control, which actually render the findings of Libet and colleagues rather undiagnostic for assessing the role of conscious representations. As discussed under "Action and Habit," actions are rarely controlled online. Rather, goals are translated into a task set, which then regulates information processing in a more or less automatic fashion (Bargh, 1989; Hommel, 2000). Indeed, given that the implementation of a task set is a rather time-consuming process taking several hundred milliseconds (Allport et al., 1994), reaction times in the order of a few hundred milliseconds, as in typical reaction time experiments, would not be possible if people translated their goals into actions online in every trial. The same is likely to hold for tasks in the tradition of Libet et al. (1982), which require participants to carry out the same action hundreds of times in a row. The most interesting time point for assessing conscious decision-making and goal representation in such a task would thus not be within a given trial, but at the very beginning, when the participant translates the experimenter's instruction into a particular task set. In psychological experiments, negotiations between experimenters and participants are commonly verbal in nature, and goals are commonly explicitly defined. This renders it highly likely that goals are consciously represented, at least at the beginning of a given task. Whether and to what degree this extends to daily life is not yet understood.

At this point, the available evidence allows for four interpretations. First, it is possible that goals need to be consciously represented while agents implement a particular goal and action plan, but not necessarily after the implementation is completed. Second, it is possible that conscious representations often accompany but do not serve any purpose in action control proper. As pointed out by Wegner (2002), the idea that actions are controlled by conscious representations may thus be an illusion. Third, it is possible that conscious representations of action goals are unnecessary for immediate action control but serve the social communication about actions (Hommel, 2013; Masicampo & Baumeister, 2013). Finally, recent observations suggest that conscious representation may be systematically related to response conflict (Morsella, 2005), which might suggest a specific role for action monitoring.

Action Monitoring

As pointed out earlier, the comparison between expected and actual action outcomes provides information about an action's success, that is, about whether the intended action effect has been realized. This can be considered a kind of action monitoring, as possible failures in achieving one's goals are signaled by discrepancies between expected and actual outcome. This allows for adjustments of actions in the future, so as to make them more likely to achieve one's goals. However, recent research has (p. 274) provided strong evidence for the existence of action monitoring at a less general level.

Most evidence comes from conflict tasks, such as the Stroop task or the Simon task, in which different aspects of stimuli indicate conflicting responses. These tasks indicate conflict within the trial by showing that stimulus-feature combinations that imply the same response (such as the word "red" written in red ink in a Stroop task, or a left-response cue appearing on the left side in a Simon task) allow for faster and more accurate responses than combinations that imply different responses (such as the word "green" written in red ink, or a left-response cue appearing on the right side). Interestingly, however, trial-to-trial analyses of performance in conflict tasks have shown that conflict has less of an impact on performance after a conflict trial than after a non-conflict trial (Gratton, Coles, & Donchin, 1992). Even though various factors might contribute to this observation (Hommel, Proctor, & Vu, 2004; Spapé & Hommel, 2014), this outcome pattern suggests that the experience of conflict leads to a readjustment of cognitive-control settings to reduce the impact of future conflict (Botvinick, 2007).

Neuroscientific evidence suggests that the presence of conflict in a given trial is communicated to brain areas involved in conflict monitoring (the anterior cingulate in particular; Botvinick, Nystrom, Fissell, Carter, & Cohen, 1999), which then signal the demand for stronger top-down control to (frontal dorsolateral) systems involved in goal representation. As a consequence, the goal representation is strengthened and more top-down control is exerted, thus reducing the probability and strength of future conflict. One interesting question is how conflict is signaled to conflict-monitoring systems. It is possible that conflict is picked up directly, but it may also be the case that conflict-induced reductions in mood represent the relevant information (Botvinick, 2007). Indeed, receiving unexpected reward and positive-mood inductions reduce the probability of conflict-related adjustments (van Steenbergen, Band, & Hommel, 2009), suggesting that affective valence is the currency used to signal the presence of response conflict.
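The logic of this monitoring loop is easy to state computationally. The sketch below is a toy rendition in the spirit of conflict-monitoring models (Botvinick, 2007): conflict is measured as the coactivation of competing responses, and conflict registered on one trial raises a control gain that suppresses the irrelevant route on the next trial, reproducing the sequential pattern of Gratton et al. (1992). All quantities and the specific update rule are illustrative assumptions, not parameters from any published model.

```python
def conflict(relevant_activation, irrelevant_activation):
    """Toy conflict signal: coactivation of the two competing responses."""
    return relevant_activation * irrelevant_activation

def adjust_gain(gain, conflict_signal, rate=0.5):
    """Monitoring loop: registered conflict raises top-down control."""
    return gain + rate * conflict_signal

gain = 1.0
for congruent in (False, True, False):               # hypothetical trial sequence
    irrelevant = (0.0 if congruent else 0.8) / gain  # control damps the irrelevant route
    c = conflict(1.0, irrelevant)
    print(f"congruent={congruent}, conflict={c:.2f}, gain={gain:.2f}")
    gain = adjust_gain(gain, c)
# The third (incongruent) trial yields less conflict than the first, because
# the gain raised by earlier conflict now suppresses the irrelevant response.
```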

Outlook

After decades of neglect (which is still obvious from almost all introductory textbooks in cognitive psychology), the question of how humans control goal-directed actions has received ample and well-deserved attention in the last 20 years or so. The field is blooming, and researchers have started to elaborate the cross-links between several areas, such as between consciousness and action control. And yet quite some work remains to be done, so I will conclude by discussing three challenges that would be particularly valuable to tackle in the near future.

First, there is increasing insight into the embodiment of human cognition, that is, into the role of the body and of active interactions with our environment for the emergence of cognitive representations, including the construction of our self (Hommel, 2016; Wilson, 2002). Up to this point, however, the role of action is often referred to rather metaphorically and/or taken as a given, while systematic ontogenetic investigations or training studies actually demonstrating and tracking this emergence are lacking. Also lacking are mechanistic models that explain exactly how actions generate cognition and whether the role of action is restricted to the construction of cognitive models, or whether action remains important for the maintenance of cognitive representations.

Second, the cognitive sciences still tend to suffer from the stage approach to information processing and the idea that information processing mainly occurs in dedicated modules (e.g., Fodor, 1983; Sternberg, 2011). This approach has led to a rather drastic specialization of research on human cognition and action, and has prevented systematic communication between researchers working on particular modules, say, perception, attention, memory, thinking, language, and (non-verbal) action. Unfortunately, however, these concepts are taken from everyday language, which is not specialized for separating underlying mechanisms. In fact, it is very likely that the mechanisms underlying these and other functions highly overlap. The neglect of this possibility has led to parallel developments and reinventions of various wheels, with modeling the control of complex, sequential action in the vocal and the non-vocal/non-verbal domain being just one of many examples. With respect to action, it is very likely that action control has a strong impact on almost all other cognitive processes, not least because evolution, a major driving force in the phylogenetic development of our cognitive system, selects actions but not attention, memory, or other internal processes. And yet there is very little attempt to create systematic places for action in research on these internal processes. Overcoming these kinds of splendid isolations requires the systematic development of more integrative and more ambitious approaches than those currently being discussed in the literature.

Third, psychology is an interface discipline connecting the humanities with science. With respect to action, this means that we are facing two very different approaches to human action: the humanities approach that emphasizes reasons and considerations leading to action, and the biological approach that emphasizes causes and mechanisms. The dividing line between these two meta-theoretical approaches even runs through psychology, separating cognitive/neurocognitive psychology from social psychology. Recent years have seen very interesting attempts to overcome these kinds of (p. 275) divisions, for instance by developing mechanistic models of social phenomena and by considering socially relevant concepts in cognitive models. The systematic continuation and further development of these attempts provides interesting opportunities for psychology in bridging between different scientific languages and styles of thinking.

References

Allport, D. A., Styles, E. A., & Hsieh, S. (1994). Shifting intentional set: Exploring the dynamic control of tasks. In C. Umilta & M. Moscovitch (Eds.), Attention and performance XV (pp. 421–452). Cambridge, MA: MIT Press.
Altmann, E. M., & Gray, W. D. (2002). Forgetting to remember: The functional relationship of decay and interference. Psychological Science, 13, 27–33.
Baars, B. J. (1988). A cognitive theory of consciousness. Cambridge, UK: Cambridge University Press.
Band, G. P. H., van Steenbergen, H., Ridderinkhof, K. R., Falkenstein, M., & Hommel, B. (2009). Action-effect negativity: Irrelevant action effects are monitored like relevant feedback. Biological Psychology, 82, 211–218.
Bargh, J. A. (1989). Conditional automaticity: Varieties of automatic influences in social perception and cognition. In J. S. Uleman & J. A. Bargh (Eds.), Unintended thought (pp. 3–51). New York: Guilford Press.
Bargh, J. A. (1994). The four horsemen of automaticity: Intention, awareness, efficiency, and control as separate issues. In R. Wyer & T. Srull (Eds.), Handbook of social cognition (pp. 1–40). Hillsdale, NJ: Lawrence Erlbaum Associates.
Bargh, J. A. (1997). The automaticity of everyday life. In R. S. Wyer (Ed.), Advances in social cognition (pp. 1–61). Mahwah, NJ: Lawrence Erlbaum Associates.
Bekkering, H., & Neggers, S. F. W. (2002). Visual search is modulated by action intentions. Psychological Science, 13, 370–374.
Bernstein, N. (1967). The coordination and regulation of movements. Oxford: Pergamon Press.
Botvinick, M. M. (2007). Conflict monitoring and decision making: Reconciling two perspectives on anterior cingulate function. Cognitive, Affective, & Behavioral Neuroscience, 7, 356–366.
Botvinick, M., & Cohen, J. (1998). Rubber hands "feel" touch that eyes see. Nature, 391, 756.
Botvinick, M., Nystrom, L. E., Fissell, K., Carter, C. S., & Cohen, J. D. (1999). Conflict monitoring versus selection-for-action in anterior cingulate cortex. Nature, 402, 179–181.
Chambon, V., & Haggard, P. (2013). Premotor or ideomotor: How does the experience of action come about? In W. Prinz, M. Beisert, & A. Herwig (Eds.), Action science: Foundations of an emerging discipline (pp. 359–380). Cambridge, MA: MIT Press.
de Jong, R., Berendsen, E., & Cools, R. (1999). Goal neglect and inhibitory limitations: Dissociable causes of interference effects in conflict situations. Acta Psychologica, 101, 379–394.
Dickinson, A., & Balleine, B. W. (1994). Motivational control of goal-directed action. Animal Learning & Behavior, 22, 1–18.
Dickinson, A., & Balleine, B. (2009). Hedonics: The cognitive-motivational interface. In M. L. Kringelbach & K. C. Berridge (Eds.), Pleasures of the brain (pp. 74–84). New York: Oxford University Press.
Donders, F. C. (1868). Over de snelheid van psychische processen. Onderzoekingen, gedaan in het physiologisch laboratorium der Utrechtsche hoogeschool, 2. reeks, 2, 92–120.
Eimer, M. (1995). Stimulus-response compatibility and automatic response activation: Evidence from psychophysiological studies. Journal of Experimental Psychology: Human Perception and Performance, 21, 837–854.
Eitam, B., Kennedy, P. M., & Higgins, T. E. (2013). Motivation from control. Experimental Brain Research, 229, 475–484.
Elsner, B., & Hommel, B. (2001). Effect anticipation and action control. Journal of Experimental Psychology: Human Perception and Performance, 27, 229–240.
Elsner, B., & Hommel, B. (2004). Contiguity and contingency in the acquisition of action effects. Psychological Research, 68, 138–154.
Elsner, B., Hommel, B., Mentschel, C., Drzezga, A., Prinz, W., Conrad, B., & Siebner, H. R. (2002). Linking actions and their perceivable consequences in the human brain. NeuroImage, 17, 364–372.
Exner, S. (1879). Physiologie der Grosshirnrinde. In L. Hermann (Ed.), Handbuch der Physiologie, 2. Band, 2. Theil (pp. 189–350). Leipzig: Vogel.
Fagioli, S., Hommel, B., & Schubotz, R. I. (2007). Intentional control of attention: Action planning primes action-related stimulus dimensions. Psychological Research, 71, 22–29.
Fodor, J. A. (1983). Modularity of mind: An essay on faculty psychology. Cambridge, MA: MIT Press.
Frith, C. D., Blakemore, S. J., & Wolpert, D. M. (2000). Abnormalities in the awareness and control of action. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 355, 1771–1788.
Glover, S. (2004). Separate visual representations in the planning and control of action. Behavioral and Brain Sciences, 27, 3–24.
Grafton, S. T. (2009). Embodied cognition and the simulation of action to understand others. Annals of the New York Academy of Sciences, 1156, 97–117.
Gratton, G., Coles, M. G. H., & Donchin, E. (1992). Optimizing the use of information: Strategic control of activation of responses. Journal of Experimental Psychology: General, 121, 480–506.
Harless, E. (1861). Der Apparat des Willens. Zeitschrift für Philosophie und philosophische Kritik, 38, 50–73.
Herwig, A., Prinz, W., & Waszak, F. (2007). Two modes of sensorimotor integration in intention-based and stimulus-based actions. Quarterly Journal of Experimental Psychology, 60, 1540–1554.
Hommel, B. (1996). The cognitive representation of action: Automatic integration of perceived action effects. Psychological Research, 59, 176–186.
(p. 276) Hommel, B. (2000). The prepared reflex: Automaticity and control in stimulus-response translation. In S. Monsell & J. Driver (Eds.), Control of cognitive processes: Attention and performance XVIII (pp. 247–273). Cambridge, MA: MIT Press.
Hommel, B. (2007). Consciousness and control: Not identical twins. Journal of Consciousness Studies, 14, 155–176.
Hommel, B. (2009). Action control according to TEC (theory of event coding). Psychological Research, 73, 512–526.
Hommel, B. (2010). Grounding attention in action control: The intentional control of selection. In B. J. Bruya (Ed.), Effortless attention: A new perspective in the cognitive science of attention and action (pp. 121–140). Cambridge, MA: MIT Press.
Hommel, B. (2013). Dancing in the dark: No role for consciousness in action control. Frontiers in Psychology, 4, 380.
Hommel, B. (2015). Action control and the sense of agency. In P. Haggard & B. Eitam (Eds.), The sense of agency (pp. 307–326). New York: Oxford University Press.
Hommel, B. (2016). Embodied cognition according to TEC. In Y. Coello & M. Fischer (Eds.), Foundations of embodied cognition, Vol. 1: Perceptual and emotional embodiment (pp. 75–92). New York: Taylor & Francis.
Hommel, B., & Eglau, B. (2002). Control of stimulus-response translation in dual-task performance. Psychological Research, 66, 260–273.
Hommel, B., & Elsner, B. (2009). Acquisition, representation, and control of action. In E. Morsella, J. A. Bargh, & P. M. Gollwitzer (Eds.), Oxford handbook of human action (pp. 371–398). New York: Oxford University Press.
Hommel, B., Müsseler, J., Aschersleben, G., & Prinz, W. (2001a). The Theory of Event Coding (TEC): A framework for perception and action planning. Behavioral and Brain Sciences, 24, 849–937.
Hommel, B., Müsseler, J., Aschersleben, G., & Prinz, W. (2001b). Codes and their vicissitudes. Behavioral and Brain Sciences, 24, 910–937.
Hommel, B., Proctor, R. W., & Vu, K.-P. L. (2004). A feature-integration account of sequential effects in the Simon task. Psychological Research, 68, 1–17.
James, W. (1890). The principles of psychology, Vol. 2. New York: Dover Publications.
Klemm, W. R. (2010). Free will debates: Simple experiments are not so simple. Advances in Cognitive Psychology, 6, 47–65.
Kray, J., Eenshuistra, R., Kerstner, H., Weidema, M., & Hommel, B. (2006). Language and action control: The acquisition of action goals in early childhood. Psychological Science, 17, 737–741.
Kühn, S., Keizer, A., Rombouts, S. A. R. B., & Hommel, B. (2011). The functional and neural mechanism of action preparation: Roles of EBA and FFA in voluntary action control. Journal of Cognitive Neuroscience, 23, 214–220.
Kunde, W. (2001). Response-effect compatibility in manual choice reaction tasks. Journal of Experimental Psychology: Human Perception and Performance, 27, 387–394.
Libet, B., Wright, E., & Gleason, C. (1982). Readiness-potentials preceding unrestricted "spontaneous" vs. pre-planned voluntary acts. Electroencephalography and Clinical Neurophysiology, 54, 322–325.
Lotze, R. H. (1852). Medicinische Psychologie oder die Physiologie der Seele. Leipzig: Weidmann'sche Buchhandlung.
Ma, K., & Hommel, B. (2015). Body-ownership for actively operated non-corporeal objects. Consciousness and Cognition, 36, 75–86.
Masicampo, E. J., & Baumeister, R. F. (2013). Conscious thought does not guide moment-to-moment actions—it serves social and cultural functions. Frontiers in Psychology, 4, 478.
Melcher, T., Weidema, M., Eenshuistra, R. M., Hommel, B., & Gruber, O. (2008). The neural substrate of the ideomotor principle: An event-related fMRI analysis. NeuroImage, 39, 1274–1288.
Memelink, J., & Hommel, B. (2013). Intentional weighting: A basic principle in cognitive control. Psychological Research, 77, 249–259.
Milner, A. D., & Goodale, M. A. (1995). The visual brain in action. Oxford: Oxford University Press.
Monsell, S. (2003). Task switching. Trends in Cognitive Sciences, 7, 134–140.
Morrison, I., Lloyd, D., di Pellegrino, G., & Roberts, N. (2004). Vicarious responses to pain in anterior cingulate cortex: Is empathy a multisensory issue? Cognitive, Affective, and Behavioral Neuroscience, 4, 270–278.
Morsella, E. (2005). The function of phenomenal states: Supramodular interaction theory. Psychological Review, 112, 1000–1021.
Nielsen, T. I. (1963). Volition: A new experimental approach. Scandinavian Journal of Psychology, 4, 225–230.
O'Brien, L., & Soteriou, M. (Eds.) (2009). Mental actions. Oxford: Oxford University Press.
Passingham, R. (1997). Functional organisation of the motor system. In R. S. J. Frackowiak, K. J. Friston, C. D. Frith, R. J. Dolan, & J. C. Mazziota (Eds.), Human brain function (pp. 243–274). San Diego, CA: Academic Press.
Piaget, J. (1946). La formation du symbole chez l'enfant. Paris: Delachaux & Niestlé.
Rochat, P. (2001). The infant's world. Cambridge, MA: Harvard University Press.
Sanchez-Vives, M. V., Spanlang, B., Frisoli, A., Bergamasco, M., & Slater, M. (2010). Virtual hand illusion induced by visuomotor correlations. PLoS ONE, 5, e10381.
Scepkowski, L. A., & Cronin-Golomb, A. (2003). The alien hand: Cases, categorizations, and anatomical correlates. Behavioral and Cognitive Neuroscience Reviews, 2, 261–277.
Shanks, D. R., & Dickinson, A. (1987). Associative accounts of causality judgment. Psychology of Learning and Motivation, 21, 229–261.
Shiffrin, R. M., & Schneider, W. (1977). Controlled and automatic human information processing: II. Perceptual learning, automatic attending and a general theory. Psychological Review, 84, 127–190.
Shin, Y. K., Proctor, R. W., & Capaldi, E. J. (2010). A review of contemporary ideomotor theory. Psychological Bulletin, 136, 943–974.
Simon, J. R., & Rudell, A. P. (1967). Auditory S-R compatibility: The effect of an irrelevant cue on information processing. Journal of Applied Psychology, 51, 300–304.
Sommer, W., Leuthold, H., & Hermanutz, M. (1993). Covert effects of alcohol revealed by event-related potentials. Perception & Psychophysics, 54, 127–135.
Spapé, M. M., & Hommel, B. (2014). Sequential modulations of the Simon effect depend on episodic retrieval. Frontiers in Psychology, 5, 855.
Spengler, S., von Cramon, D. Y., & Brass, M. (2009). Was it me or was it you? How the sense of agency originates from ideomotor learning revealed by fMRI. NeuroImage, 46, 290–298.
(p. 277) Sternberg, S. (2011). Modular processes in mind and brain. Cognitive Neuropsychology, 28, 156–208.
Stock, A., & Stock, C. (2004). A short history of ideo-motor action. Psychological Research, 68, 176–188.
Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18, 643–662.
Synofzik, M., Vosgerau, G., & Newen, A. (2008). Beyond the comparator model: A multifactorial two-step account of agency. Consciousness and Cognition, 17, 219–239.
Tsakiris, M. (2010). My body in the brain: A neurocognitive model of body-ownership. Neuropsychologia, 48, 703–712.
Tsakiris, M., Schütz-Bosbach, S., & Gallagher, S. (2007). On agency and body-ownership: Phenomenological and neurocognitive reflections. Consciousness and Cognition, 16, 645–660.
Valle-Inclán, F., & Redondo, M. (1998). On the automaticity of ipsilateral response activation in the Simon effect. Psychophysiology, 35, 366–371.
Van den Bos, E., & Jeannerod, M. (2002). Sense of body and sense of action both contribute to self-recognition. Cognition, 85, 177–187.
van Steenbergen, H., Band, G. P. H., & Hommel, B. (2009). Reward counteracts conflict adaptation: Evidence for a role of affect in executive control. Psychological Science, 20, 1473–1477.
Verschoor, S. A., Spapé, M., Biro, S., & Hommel, B. (2013). From outcome prediction to action selection: Developmental change in the role of action-effect bindings. Developmental Science, 16, 801–814.
Verschoor, S. A., Weidema, M., Biro, S., & Hommel, B. (2010). Where do action goals come from? Evidence for spontaneous action-effect binding in infants. Frontiers in Psychology, 1, 201.
Vygotsky, L. (1934/1986). Thought and language. Cambridge, MA: MIT Press.
Wasserman, E. A. (1990). Detecting response-outcome relations: Toward an understanding of the causal texture of the environment. Psychology of Learning and Motivation, 26, 27–82.
Watson, P., Wiers, R. W., Hommel, B., & de Wit, S. (2014). Working for food you don't desire: Cues interfere with goal-directed food-seeking. Appetite, 79, 139–148.
Wegner, D. M. (2002). The illusion of conscious will. Cambridge, MA: MIT Press.
White, R. W. (1959). Motivation reconsidered: The concept of competence. Psychological Review, 66, 297–333.
Wilson, M. (2002). Six views of embodied cognition. Psychonomic Bulletin & Review, 9, 625–636.
Wolpert, D. M., & Ghahramani, Z. (2000). Computational principles of movement neuroscience. Nature Neuroscience, 3, 1212–1217.
Woodworth, R. S. (1938). Experimental psychology. New York: Holt, Rinehart & Winston. (p. 278)

Bernhard Hommel

Cognitive Psychology Unit, Leiden Institute for Brain and Cognition, Leiden University, Leiden, The Netherlands

Planning and Control

Planning and Control   Magda Osman The Oxford Handbook of Causal Reasoning Edited by Michael R. Waldmann Print Publication Date: Jun 2017 Subject: Psychology, Cognitive Psychology Online Publication Date: May 2017 DOI: 10.1093/oxfordhb/9780199399550.013.19

Abstract and Keywords

For the best part of 30 years, the most influential theoretical and empirical work examining control-based decision-making and planning behaviors has largely neglected the importance of causality. Causal relations are essential for capturing the structural relationship between events in the world and individuals as they coordinate their actions toward anticipating (planning) and then managing (control) those events. Causal representations are the mental form by which individuals are able to simulate future events resulting from actions aimed at reaching a goal (planning) and maintaining that goal (control). The aim of this chapter is to examine the unsung work in planning and control that brings the role of causal relations and causal representations to the fore, and to speculate on what the future research horizons for both might look like.

Keywords: planning, control-based decision-making, causal relations, causality, causal representation

An Example: Public Policy Program

Imagine that the government has decided that reducing levels of obesity in the population is one of its key priorities, and it has proposed that a public policy program is needed to deal with this issue. Developing a set of interventions to help promote healthy living and reduce levels of obesity needs to be decided soon because the annual spending review is imminent; this means that the new public policy program must be budgeted before being sure of the rate of success of any intervention. A panel of psychologists, behavioral economists, physicians, and dieticians has been commissioned to survey the latest scientific research. Their report suggests that the key factors associated with obesity are poor diet, lack of exercise, poor lifestyle choices, and negative self-image. Their recommendation for tackling obesity is to intervene at two levels, which will work at different time scales. The first intervention, which is designed to reap rewards in the long term, involves educating parents and children about the benefits of healthy eating and warning them about the consequences of poor diet and lack of exercise. The second intervention, which is designed as a quick fix, involves introducing a "fat tax" on foods that have low nutritional value and high fat content.

Using the preceding example, this chapter begins by setting out the features of situations in which planning and control-based decision-making behaviors are found, and the core characteristics underlying both processes. Then the chapter will detail what role causality plays in the mental representations that individuals form when planning and controlling future outcomes, and how causality helps to capture the essential structural properties of the problems that people face when planning and controlling the future. The remainder of the chapter then focuses specifically on empirical work on dynamically uncertain situations, for which there is growing interest in the connection between control and causal representations/causal relations. By drawing the core insights together at the end of the (p. 280) chapter, the aim is to conclude with a short comment on what we currently know, and what we should focus on as future directions of research on planning and control-based decision-making behaviors.

Defining Planning and Control-Based Decision-Making Problems

The preceding example is a real-world problem that many governments face; obesity is a global epidemic (WHO, 2000), but a preventable one. In this example we find that the actions taken by the government broadly involve developing a course of interventions (i.e., planning) in order to change behavior from its current state to a future state in which more people are healthier and fewer are overweight/obese (i.e., exerting control by bringing about a desirable outcome). The problem the government faces is ill-defined in the sense that there is no single solution to the problem, but rather multiple options or strategies that could be taken to achieve the same desirable outcome (Schraw, Dunkle, & Bendixen, 1995; Simon, 1977). Also, as with many other real-world problems, it cannot be solved in one step; rather, the problem requires that the target goal (i.e., the solution to the overarching problem) be decomposed into sub-goals, each of which requires that a number of steps be met (Sweller, 1988); the number of sub-goals depends on the complexity and speed with which the target goal will (needs to) be reached. In addition, the problem illustrated here is transformational, which is another key feature of situations in which control and planning behaviors are found—that is, when solving a problem, the current state needs to be transformed in order to get close to a solution (Greeno, 1978).

Another way of thinking about the example is that the problem can also be characterized as a situation of dynamic uncertainty (Brehmer, 1992; Osman, 2010a, 2014a). It is difficult to predict what is likely to happen as a result of the government's decisions because there are so many unknown factors involved in most real-world situations; this means that what we plan for and how successful we are at controlling events involve estimates of uncertainty (i.e., forecasting the likelihood of certain relevant events in the problem space). Often, the most challenging aspect of dynamic uncertainty is trying to isolate the effect of one's actions from the effects generated as a result of internal factors in the structural composition of the problem. Or, in some cases, it can be harder still, because changes to the states of the problem can even result from a combination of external and internal factors.

For instance, what we have in the example is a choice of interventions (e.g., educating parents, introducing a fat tax) that are being made without knowing, or being able to accurately estimate at any point in the future, the likely rate of success, because circumstances (foreseen and unforeseen) are likely to change the structure of the problem in the short and long term. This will in turn impact the success of the interventions taken, and what new possible interventions might be needed to tackle the future changes that will result from trying to solve the problem. For instance, the outcome of a recent spending review and a downturn in the economic climate could lead to higher unemployment and greater poverty, and therefore resistance to the fat tax bills and cuts to facilities for educating parents and children. These would be difficult to plan for and control, because the combination of factors could work in an additive or multiplicative way such that the interventions taken by the government would have limited or reverse effects.

To sum up, we have various ways in which to characterize real-world problems, which essentially reduce to factors that make it hard to identify what course of action to take, what the likely outcomes are, and how to adapt one's actions in light of changes to the state of the problem. And yet we are able to plan and control—so what are the processes behind these faculties?

Association Between Planning Behaviors and Control-Based Decision-Making

To "plan" is to coordinate a set of actions (i.e., develop a procedure) that are directed toward achieving a specified goal (Hayes-Roth & Hayes-Roth, 1979; Ward & Morris, 2005). Thus forming a plan (i.e., planning) is essentially a goal-directed activity that involves organizing one's thoughts, choices, and actions toward achieving a specific outcome (also see Hommel, Chapter 15 in this volume). The description of control-based decision-making follows a similar line, in which the process of controlling an outcome is characterized as a goal-directed process that involves choosing, from a range of possible actions, those that are judged to reliably achieve and maintain a desired goal (Brehmer, 1992); hence it is referred to as a form of decision-making. In addition, control in general is seen as a combination of cognitive processes (attention, memory, reasoning, (p. 281) monitoring, and prediction) that include planning and that are needed to coordinate actions in order to achieve a goal on a reliable basis over time (Osman, 2014a). Planning behaviors can be subject to the same constraints as control-based decision-making because both are cognitively expensive, given that both engage similar executive processes (Güss, 2000; Strohschneider, 2001; Sundberg, Poole, & Tyler, 1983); that is, they both involve activities that include remembering and manipulating information over a delayed period, switching between different domains of knowledge/information types, initiating plans of action independent of external prompts, employing metacognition (tracking behavior, updating estimates of success in achieving outcomes), multitasking, and engaging prospective memory (Gilbert & Burgess, 2008).

Another obvious similarity between planning and control-based decision-making is that they have a long-established history in theories of problem-solving. Planning has enjoyed this association dating back to the research program developed by Simon and his colleagues (Newell, Shaw, & Simon, 1958; Newell & Simon, 1972; Simon, 1978). Work on control also has a long heritage in the problem-solving domain, to the extent that it has been, and continues to be, referred to as complex problem-solving (Dörner, 1987; Dörner, Kreuzig, Reither, & Stäudel, 1983; Dörner & Wearing, 1995; Funke, 2001). Again, the reason for the same etiology is that the defining conditions under which planning and control-based decision-making behaviors are found are similar: fundamentally, the situations are ill-defined and transformational, with multiple steps and multiple sub-goals. Precisely what the critical differences are between planning and control within both research communities remains vague, but this is because, in the case of planning, the community has yet to settle on a good definition (McCormack & Atance, 2011). However, one clue as to the essential differences between the two types of processes lies in the paradigms in which they are studied. Typically, the measurement tools that are specifically developed to study planning involve transformations of states of problems and finding the most efficient way to get to an end state (e.g., Balls and Boxes [Kotovsky & Simon, 1990]; Missionaries and Cannibals [Reed, Ernst, & Banerji, 1974]; Tower of Hanoi [Hayes & Simon, 1974]; see Figure 16.1).

Figure 16.1 Depiction of a typical Tower of Hanoi task in various stages of being solved. Source: Figure provided by http://www.ritambhara.in/tower-of-hanoi-code.
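Because the Tower of Hanoi recurs throughout this literature, it may help to see how directly it embodies the decomposition of a target goal into sub-goals described earlier: moving an n-disk tower reduces to two smaller tower moves around one primitive move. A minimal sketch (the peg labels are arbitrary):

```python
def hanoi(n, source, target, spare, moves=None):
    """Return the minimal move sequence for an n-disk Tower of Hanoi."""
    if moves is None:
        moves = []
    if n == 0:
        return moves
    hanoi(n - 1, source, spare, target, moves)  # sub-goal: clear the way
    moves.append((source, target))              # move the largest free disk
    hanoi(n - 1, spare, target, source, moves)  # sub-goal: rebuild on top
    return moves

# A 3-disk problem decomposes into 2**3 - 1 = 7 single-disk moves:
print(hanoi(3, "A", "C", "B"))
```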

(p. 282) These are not tasks of uncertainty, or dynamic in the way that tasks devised to examine control-based decision-making are (Berry & Broadbent, 1984; Dörner, 1975; Toda, 1962). In the latter, the situation the individual faces involves transforming states in which there can be epistemic uncertainty regarding the source of the state change (i.e., the change can result from the individual's actions, the internal operations of the problem situation, or both; see Figure 16.2).

Even here, some have argued that planning behaviors are often required in dynamically uncertain contexts as well, as in the example at the start of this chapter, in which plans are set in place to determine a particular outcome, and in which the circumstances are liable to change as the plans are executed. Thus some have commented that planning behaviors are also adaptive and flexible in order to deal with unforeseen events (McCormack & Atance, 2011).

Another critical factor that might differentiate planning from control is that control behaviors involve the maintenance or regulation of a goal state, again because the situations in which control-based decision-making is required are dynamic, whereas planning often involves predetermining a succession of actions leading toward a single final outcome (Hayes-Roth & Hayes-Roth, 1979; Ward & Morris, 2005).

Generally, though, planning and control engage executive functions, can be seen as goal-directed processes (Funke, 2010), and are distinctly future-minded (Atance & O'Neill, 2001; Brandimonte, Einstein, & McDaniel, 1996; Burgess, Simons, Coates, & Channon, 2005; Hayes-Roth & Hayes-Roth, 1979; Jeannerod, 1994; Osman, 2014a; Seligman, Railton, Baumeister, & Sripada, 2013); that is, they are goal-directed processes that are ways of psychologically pinning down our future world by developing expectations for outcomes we think will occur in the future, so that we can act in ways that can bring about those future desirable outcomes. Thus planning and control are psychological processes that help reduce the uncertainty of the future through achieving and maintaining goals (Osman, 2014a).


Figure 16.2 Illustration of a dynamic control task in which there are input variables that the participant can intervene on, and output variables that indicate the effects of the interventions that the participant makes. The various stages that are involved in a single trial are shown. What the participant is unaware of, but must learn, is the dynamic properties of the system he or she is attempting to control, and the causal structure that captures the relationship between input and output variables. Source: Osman & Ananiadis-Basias (2014).
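To illustrate what such a task looks like computationally, here is a toy generative sketch of one trial cycle: each output changes as a function of the participant's inputs, the system's own endogenous dynamics, and noise, which is exactly what creates the epistemic uncertainty about the source of a state change. The weights, decay rate, and noise level are illustrative assumptions, not parameters from any published task.

```python
import random

def trial(inputs, state, weights, decay=0.9, noise_sd=2.0):
    """One trial of a toy dynamic control task."""
    new_state = []
    for i, y in enumerate(state):
        caused = sum(w * x for w, x in zip(weights[i], inputs))  # effect of interventions
        endogenous = decay * y                                   # system's own dynamics
        new_state.append(endogenous + caused + random.gauss(0.0, noise_sd))
    return new_state

# Hidden causal structure: input 0 drives output 0, input 1 drives output 1,
# and input 2 is a non-cause that the learner should come to ignore.
weights = [[1.0, 0.0, 0.0],
           [0.0, 0.5, 0.0]]
state = [50.0, 50.0]
state = trial([10, 0, 0], state, weights)  # intervene on the first input only
print(state)
```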

Causality: Causal Representations and Causal Relations (p. 283)

We have not yet mentioned the concept of causation in association with either planning or control-based decision-making, but it is actually implicit in both processes. Causality is critical by way of the psychological (mental) representations people form about a problem space, and is the conceptual apparatus that connects the various elements of the problem space together. At this juncture, some discussion is required of the difference between causal relations in the world (causal texture, according to Tolman & Brunswik, 1935) and causal representations that people have (Burstein, 1988; Dretske, 1982; Gopnik, 1982).

Causal relations concern the environment, in which there are contingencies between causes and effects, such that one can be used to predict the other. At the most fundamental level, we perform actions on the basis of learning something about the combination of contingencies. This can be difficult because the world is made up of situations in which there are numerous combinations; for instance, a cause can have multiple effects, but there can also be multiple causes of a single effect (Tolman & Brunswik, 1935). Nevertheless, for the concept of causal relations, we have a belief that there are existing structures that connect events in our environment that we are attuned to and respond to. When it comes to causal representations, what is meant here is that there are mental constructs that contain information about the properties regarding combinations of events (i.e., the statistical, structural, mechanistic; Holyoak & Cheng, 2011; also see Rottman, Chapter 6 in this volume). Critically, for the purposes of the discussion in this chapter, the concept of causal representations adopted here is that our causal understanding of the world is based on mental representations of cause–effect relations that we believe exist in the external world with which we interact (Gallistel, 1990; Holyoak & Cheng, 2011; Waldmann & Holyoak, 1992).

Unfortunately, the general approach in the literature on planning and control, particularly work on control, is that the problem the planner/controller faces (i.e., the variables that need to be manipulated in order to reach a desirable state/maintain that state) is described in non-causal terms (i.e., unspecified). That is, the most typical structural form used to represent the problem space is input–output connections, and functions are used to describe the dynamics of events experienced in the problem without any reference to causal relations. Correspondingly, in both literatures, the representations that the planner/controller has are typically also assumed to be non-causal. However, what this chapter aims to show is that, in actual fact, I and others assume that dynamic events follow causal laws, particularly the dynamically uncertain events experienced in real-world problem scenarios (which in turn require planning and control processes). More to the point, in recent experimental work, again particularly on control-based decision-making in dynamically uncertain situations, the ability to successfully control a dynamic environment is in fact dependent on the accuracy of the causal representations that are acquired either through experimental instructions or spontaneously.

More specifically, the kinds of causal representations that planners/controllers are likely to develop, and the structures of the problems themselves, will contain properties that include the following: causal arrows (the direction of association between cause–effect variables; Hagmayer, Sloman, Lagnado, & Waldmann, 2007); causal models (e.g., common effect [multiple causes that generate the same effect; Waldmann & Holyoak, 1992] and common cause [a single cause that has multiple effects; Reichenbach, 1956]); the asymmetry of observing and intervening (there are differences between observing statistical regularities and making interventions, in other words "doing" something either hypothetically (i.e., we imagine what we could do) or actually (i.e., we perform a particular act), from which the impact of the action on the causal system is inferred; Meder, Gerstenberg, Hagmayer, & Waldmann, 2010; see the sketch below); explaining away (one cause explains the observed effect, and therefore reduces the need to invoke other causes; Wellman & Henrion, 1993); temporal and causal order (the sequence by which events occur influences representations of cause–effect relations; Segura, Fernandez-Berrocal, & Byrne, 2002); and spurious versus causal relations (Lewis, 1973).

In summary, the crucial point is that, while this is not typical of theoretical and empirical approaches in planning and control, the rest of this chapter aims to show that there are core causal concepts that feed into the way in which we plan sequences of actions designed to reach a goal, and the way we maintain an outcome(s) in the face of a constantly changing environment.
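The observing/intervening asymmetry in that list can be made concrete with a common-cause model, C -> E1 and C -> E2. Observing E1 licenses a diagnostic inference to C (and hence to E2), whereas setting E1 by intervention cuts the C -> E1 link and tells us nothing about C. A minimal sketch with purely hypothetical probabilities:

```python
# Common-cause model: C -> E1, C -> E2 (all probabilities hypothetical).
P_C = 0.3
P_E1 = {True: 0.9, False: 0.1}  # P(E1 | C), P(E1 | not C)
P_E2 = {True: 0.8, False: 0.2}  # P(E2 | C), P(E2 | not C)

def p_e2_given_observed_e1():
    """Observing E1 is evidence about C, so it changes the expectation for E2."""
    joint_c = P_C * P_E1[True]
    joint_not_c = (1 - P_C) * P_E1[False]
    p_c = joint_c / (joint_c + joint_not_c)  # P(C | E1) by Bayes' rule
    return p_c * P_E2[True] + (1 - p_c) * P_E2[False]

def p_e2_given_do_e1():
    """Intervening on E1 severs the C -> E1 link: no evidence about C."""
    return P_C * P_E2[True] + (1 - P_C) * P_E2[False]

print(round(p_e2_given_observed_e1(), 2))  # ~0.68: observation licenses inference
print(round(p_e2_given_do_e1(), 2))        # 0.38: E2 stays at its baseline rate
```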
Armed with a distinction between causal representations and causal relations, and qualifications of the kinds of causal concepts that influence the way in which the world might be structured and how we might represent it, the next (p. 284) step is to examine where causal representations and causal relations emerge in the theoretical landscape of planning and control research.

Planning Behaviors: Theoretical Connections to Causal Representations and Causal Relations

With regard to work on planning, there are many avenues that theoretical work has explored, but not until recently has there been a dedicated development of the connection between planning and the role of causal representations or causal relations. There has been a dedicated literature on action planning (Gollwitzer, 1999), which has been a way of specifying the contexts under which there is a correspondence between our intentions and our actions, and when there is a breakdown. Much of this work has been extended to the applied domain, particularly the health domain, as has work on coping planning (Sniehotta, Schwarzer, Scholz, & Schuz, 2005), which is essentially concerned with dealing with the obstacles between intended and planned actions. However, the most common theoretical interest is focused around the paradigmatic work on the Tower of Hanoi and Tower of London problem-solving tasks (Burgess et al., 2005; Dehaene & Changeux, 1997; Fincham, Carter, van Veen, Stenger, & Anderson, 2002; Gilbert & Burgess, 2008; Phillips, Wynn, McPherson, & Gilhooly, 2001; Shallice, 1982; Ward & Allport, 1997). Most of this work is concerned with the connections that planning has to executive functions, or alternatively, with what constitutes planning as an executive function. Connected to this is a separate debate regarding whether planning involves special processes that are distinct from other core cognitive processes, or whether there is a generic set of processes that contribute to moving an individual from his or her current state into a new desirable future state (Burgess et al., 2005).

Nevertheless, aside from the aforementioned concerns of the planning community, there were early hints as to the role of causality in planning. Schank and Abelson (1977) were among the earliest theorists to identify causal representations, as distinct from other types of representations, which we use to organize and structure our behavior. They proposed that when we face problem scenarios in which we need to plan our behavior, we have at our disposal representations of typical sequences of previously performed actions, or causal chains of component events, which they referred to as "scripts." The assumption here is that planning involves a goal that can be reached through the hierarchical organization of actions that have consequences, and that the combinations of actions and consequences form causal chains that can be used to infer what to do in a context that is unfamiliar. This very simple idea has formed the basis of much current work, particularly formal modeling of planning behaviors. This focus has been concerned with representational systems that enable domain-independent generative planning (in other words, a generic representational system that can be applied to any situation which requires planned actions; Kambhampati, 1993). The view is that causal representations are flexibly applied and reused in multiple situations to anticipate likely future consequences that follow from sequences of actions (Kambhampati, 1993; Minton, 2014). Moreover, the complement to this is the view that the planning scenario itself can be formally described in terms of causal relations, with dependencies between the various components of the problem situation.

Returning to the psychological domain, in spite of the various focuses taken by theoretical work, it is fair to say that there is more agreement than disagreement about the fact that, while planning may depend on stored previous experiences to anticipate/simulate future outcomes in familiar and unfamiliar situations, critical to planning is "looking ahead," or thinking prospectively (Atance & O'Neill, 2001; Brandimonte et al., 1996; Burgess et al., 2005; Hayes-Roth & Hayes-Roth, 1979; Jeannerod, 1994; Osman, 2014a, 2014b; Seligman et al., 2013). An area of work that has been gaining momentum and has been forging connections with work on causal learning and causal reasoning is prospective thinking (Osman, 2014a, 2014b; Seligman et al., 2013). For instance, Osman (2014c) defined prospection as "the process of representing and planning for possible future states of the world," and Seligman et al. (2013) define it as "guidance not by the future itself but by present, evaluative representations of possible future states."

One theoretical claim that is shared by both Osman (2014a, 2014b, 2014c) and Seligman et al. (2013) is that people are able to reach intended goals by using causal representations to associate planned actions with intended outcomes (Karniol & Ross, 1996; Strathman, Gleicher, Boninger, & Edwards, 1994). In other words, both view goals as prospects; they are representations of future possible outcomes, and they guide planned behavior. But, as highlighted by work in the machine-learning domain, our ability to construct representations (p. 285) of the future and adapt to future outcomes is built on causal representations that help generate expectations (i.e., form predictions of likely outcomes), which help organize the sequences of actions that need to be carried out to reach an intended goal (a view also endorsed by Gollwitzer, 1999).

One strong theoretical claim that Osman (2014b) makes regarding the link between causal representation and planning is that accurate representations of the future states that lead to reaching an intended goal will depend on the accuracy of the causal representations that support the planned actions needed to achieve the desired future outcome. The prediction that follows from Osman's (2014a) future-minded framework is that the more people unpack their causal representations of the consequences of their actions directed toward achieving a future outcome (i.e., a goal), (a) the higher their judgments of the likelihood of achieving the future outcome will be, and (b) the more efficient (fewer, or more effective) the sequence of planned actions will be in reaching the future outcome. The concept of unpacking is based on Tversky and Koehler's (1994) support theory, in which they propose that there is a focal hypothesis. In this case, the focal hypothesis can be considered the goal that we intend to reach. It will have a higher judged probability of being reached if we decompose (unpack) it into its constituents (i.e., the sub-goals that contribute to it). This means that when we do this we are expanding our causal representation so that we consider more factors that need to be taken into account when planning our behaviors.
Thus the likelihood of us achieving our goal is increased through unpacking, because we are forming more comprehensive causal representations on which to organize our planned behaviors.

In summary, the theoretical work on planning has made some early connections with causal representations, but it is only recently that there has been a dedicated effort to integrate causal representations into planning behaviors, particularly with the most recent developments in the nascent research field of prospective thinking. However, while the theoretical grounding exists, there is limited empirical work to support the claims reviewed here.

Control-Based Decision-Making: Theoretical Connections to Causal Representations and Causal Relations

The research on control-based decision-making has been dominated for over 30 years by theoretical work that has largely been concerned with two issues: (1) the extent to which intuitive (unconscious) and hypothesis-testing (conscious) processes guide learning and decision-making processes in unfamiliar (novice) and familiar (expert) dynamically uncertain control situations; and (2) the extent to which control behaviors can be predicted by individual difference measures of cognitive ability.

There is considerable support for the view that there is a dissociation between intuitive and hypothesis-testing processes (Berry & Broadbent, 1984, 1987; Dienes & Fahey, 1995; Eitam, Hassin, & Schul, 2008; Gibson, Fichman, & Plaut, 1997; Lee, 1995; Stanley, Mathews, Buss, & Kotler-Cope, 1989), and that, in large part, both in familiar and in unfamiliar situations, intuitive processes predominantly guide control-based processes. However, there is also work by Osman (2008a, 2008b, 2008c, 2010a, 2010b, 2012a, 2012b) and colleagues (Osman & Ananiadis-Basias, 2014; Osman, Ryterska, Karimi, Tu, Obeso, Speekenbrink, & Jahanshahi, 2013; Osman & Speekenbrink, 2011, 2012; Osman, Wilkinson, Beigi, Parvez, & Jahanshahi, 2008) suggesting that dissociationist theories are conceptually problematic, and that the empirical support for them, while reliable, is questionable on methodological grounds.

The ongoing work examining the extent to which components of IQ measures and other measures of cognitive aptitude influence control-based decision-making has suggested a complex picture. Some have failed to show an association between control-based behaviors and general intelligence (Gebauer & Mackintosh, 2007; Rigas & Brehmer, 1999; Rigas, Carling, & Brehmer, 2002). However, in line with many individual differences researchers, there is also work supporting the general view that control-based decision-making is separate from, but associated with, general intelligence measures (Danner, Hagemann, Schankin, Hager, & Funke, 2011; Greiff, Wüstenberg, Molnár, Fischer, Funke, & Csapó, 2013; Wüstenberg, Greiff, & Funke, 2012). The association is mostly driven by strategic behavior in control tasks and by problem-solving skills in general (Greiff et al., 2013). Strategic behavior in control tasks is usually indicated by the way in which individuals

develop strategies to initially figure out the characteristics of the dynamic control situation (e.g., the number of variables there are, how they might be connected, in what order the variables should be manipulated, and to what degree) and by how they implement those strategies and adjust them in light of the changing circumstances of the dynamic control scenario (Dörner, 1986, 1987, 1990). The latter of these two issues has some bearing on the connection between control-based decision-making behaviors and causal representations and causal relations.

The dependent measures of a typical dynamic control task involve indexing the type of knowledge acquired regarding the system (i.e., typically reporting the actual relations between the different variables that are manipulated [input variables] and the effects of those manipulations [output variables]; see Figure 16.2). There is also a measure of knowledge application; this refers to how well people are able to implement what they have learned about the structure of the dynamic control task (i.e., the relations between inputs and outputs) in order to control it to specific criteria (i.e., maintaining a specific outcome over time). The design of the input and output relations in a dynamic control task can in effect be thought of in terms of causal relations, as defined earlier in this chapter (p. 286)

(Tolman & Brunswik, 1935). The measure of knowledge acquisition indexes the accuracy of causal representations of the causal relations of the dynamic control task.

The work examining the association between control behaviors and general intelligence tells us that knowledge acquisition and knowledge application are in fact distinct measures of control behavior (Danner et al., 2011; Greiff et al., 2013; Wüstenberg et al., 2012). Nevertheless, the conclusions of the work by Greiff et al. (2013) and a later study (Öllinger, Hammon, von Grundherr, & Funke, 2015) are that knowledge acquisition is a necessary but not a sufficient condition for knowledge application. In other words, accurate causal representations of the dynamic situation do not by themselves guarantee the ability to control it. This is also broadly in line with dissociationist theorists' claims that our ability to control dynamic situations is based on a mechanism that does not require structural (including causal) representations or rule-based knowledge (i.e., it is procedurally based). It might well be for this reason that the role of causal representations and causal relations has been of such limited interest in the control-based decision-making community—that is, there has been a long theoretical and empirical precedent showing that structural knowledge and the ability to control a dynamic environment are dissociated, rather than associated, so there is little research incentive to focus on the role of causal representations in control-based decision-making. This possibly also explains why explicit measures of causal representations, in which people are required to draw the causal relations that they think are present in a dynamic control task, are rarely recorded (Funke, 2014).

While the dominating issues have only lightly touched on the connection between causal representations and causal relations and control-based decision-making, there has been some notable theoretical work arguing that causal knowledge is central to the control of behaviors (Dörner, 1990; Dörner & Wearing, 1995; Eseryel, Ifenthaler, & Ge, 2013; Funke, 2014; Huff, 2002; Kersting & Süß, 1995; Osman, 2008a, 2008b, 2010b, 2014a; Plate, 2010; Sterman, 2000).

In addition, some have argued that the causal representation that an individual generates while interacting with a dynamic control task reflects which dependencies between variables they believe are present in the problem (i.e., the causal relations in the world), and that this should be an important index for assessing performance in dynamic control tasks (Dörner & Wearing, 1995; Eseryel et al., 2013; Funke, 2014; Osman, 2008a, 2008c, 2010a, 2014a). More to the point, unlike the work discussed previously, the claim made by these theorists is that causal representations are necessary (not just sufficient) for understanding the specific types of actions that people perform in a dynamic control task. So, in precisely what way do causal representations play a role in control-based decision-making?

Some have drawn the connection between control-based decision-making and causal representations on the basis of mental simulation (Eseryel et al., 2013). Because causal representations allow the individual to connect variables in the problem space (e.g., workforce, profits, and productivity), the consequences of prospective interventions can be considered without having to implement the action (e.g., if I want to increase my profits, then I could increase the number of workers in my sugar factory, in which case I know productivity should go up because more workers means greater work produced, but this will reduce my overall profit). Thus causal representations are also associated with hypothesis testing (Osman, 2008a, 2008c, 2010a), again for the reason that causal representations of the relations between variables in the dynamic environment enable people to mentally test out the various interventions they plan to take to control the dynamic situation. By extension, this also implies that causal representations help people anticipate and predict what is likely to happen as a result of their actions, because if they (p. 287) have represented the cause–effect relations (though not the same, these are often referred to in the control literature as input–output relations), then they can predict the outcome of their interventions (Osman, 2008a, 2008b; Osman & Speekenbrink, 2012).

Also, a connection has been made between the types of strategies that people develop and the acquisition of accurate causal representations (Funke, 2014). Because dynamic control tasks can involve anywhere from three to several hundred variables that need to be tracked and potentially intervened on, the choice of strategy for manipulating variables in order to exert control is extremely important. Choosing to make singular interventions, that is, to vary one variable at a time (VOTAT; Tschirgi, 1980), is seen as the most effective strategy for learning to control a dynamic control task; the idea is that it is easier to uncover existing causal relations this way than by varying as many variables as possible all at once. By choosing which interventions to make, one can understand the causal system better; equally, one cannot choose the best interventions to make without having some rudimentary causal representation of the causal system. The general approach to developing systematic interventions (i.e., one's preferred strategy) has a direct impact on the accuracy of the causal representation one develops, which is in turn used to help develop strategies to control the dynamic causal system.
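To make the VOTAT idea concrete, the following is a minimal sketch, not drawn from any of the studies reviewed here: a toy linear control system with three input variables whose weights are unknown to the learner (the variable names and weights are invented for illustration). Varying one input at a time lets the learner read each causal weight directly off the observed change in the outcome.

```python
import random

# Hypothetical linear control system: the learner does not know these weights.
TRUE_WEIGHTS = {"hormone_a": 3.0, "hormone_b": -2.0, "hormone_c": 0.0}

def outcome(inputs, noise_sd=1.0):
    """Outcome produced by the (unknown) causal system, plus noise."""
    effect = sum(TRUE_WEIGHTS[name] * value for name, value in inputs.items())
    return effect + random.gauss(0.0, noise_sd)

def votat(n_repeats=50, probe=10.0):
    """Vary One Thing At a Time: probe each input alone, average the change."""
    zeros = {name: 0.0 for name in TRUE_WEIGHTS}
    baseline = sum(outcome(zeros) for _ in range(n_repeats)) / n_repeats
    estimates = {}
    for name in TRUE_WEIGHTS:
        total = 0.0
        for _ in range(n_repeats):
            inputs = dict(zeros)
            inputs[name] = probe          # intervene on one variable only
            total += outcome(inputs)
        estimates[name] = (total / n_repeats - baseline) / probe
    return estimates

print(votat())  # estimates approach the true weights (3.0, -2.0, 0.0)
```

Because only one variable moves per test, each estimated weight is an unconfounded estimate of that variable's causal effect; varying all inputs at once would instead require solving for every weight simultaneously from noisy, confounded observations.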
To sum up, the main focus of theoretical work in control-based decision-making has not directly considered the relevance of causal representations and causal relations. However, considerable headway has been made, particularly in the domain of cognitive assessment, in advancing the view that both causal representations and causal relations are necessary for control. In addition, the view is that causal representations in particular are implicated in simulation, hypothesis testing, strategy development, and prediction, all of which are essential features of control-based decision-making. In planning research there is theoretical speculation about the relevance of causal representations and causal relations to planning behaviors, and limited empirical work examining this connection. However, this is not true of work in control-based decision-making, where there is a theoretical and empirical basis for the claim that causal representations are central to control-based decision-making. Before the insights of the chapter are drawn together in the concluding section, the next section considers the empirical work that has examined the link between causal representations and control.

Control-Based Decision-Making: Empirical Examinations of the Link Between Causal Representations and Control

The popular view to come out of Berry and Broadbent's (1984, 1988) pivotal research program was that many of the complex processes involved in planning actions and exerting control over multiple variables that changed over time were not based on representations in the form of rules (or structural representations of the dynamic control task), and were not based on representations that were consciously accessible. Intuition was a main driver in enabling people to handle the complex day-to-day situations that required planning and control. In response to this, many researchers have examined the extent to which knowledge implementation is indeed immune to rules, and is instead based on the general application of past experiences of successful actions that led to better control of the dynamic control task. The most recent program, developed by Osman (2008a, 2008b, 2008c, 2012a, 2012b) and colleagues (Osman & Ananiadis-Basias, 2014; Hagmayer, Meder, Osman, Mangold, & Lagnado, 2010; Osman, Ryterska, Karimi, Tu, Obeso, Speekenbrink, & Jahanshahi, 2013; Osman & Speekenbrink, 2011, 2012; Osman, Wilkinson, Beigi, Parvez, & Jahanshahi, 2008), has explored dynamic control in a number of contexts by manipulating a number of factors: the underlying relationship between the variables (i.e., linear vs. non-linear, stable vs. unstable); the role of different framings of the context in dynamic control; comparisons between predictive-based learning and control-based learning in dynamic control tasks; and comparisons between healthy participants and clinical populations (i.e., patients with Parkinson's disease) on dynamic control tasks—in particular, transfer learning and the impact of these manipulations on causal knowledge gained during dynamic control. The key message from this research program has been that, contrary to dissociationist claims, knowledge of the underlying structure of the problem is critical in determining the accuracy of dynamic control.

For instance, studies by Osman and colleagues (Osman, 2012b; Osman & Speekenbrink, 2012) using a medical dynamic control task examined dynamic control performance in two control tests in which people had to control the system to a variety of different criteria following a stage of training. One group was trained by predicting what happened next to the outcome value, and the other group was trained by controlling what happened next to the outcome by trying to make it match the target value. In general, the dynamic control problem still involved the same properties as all the other tasks, in that the problem was ill-defined, transformational, and dynamically uncertain. In this case, the system consisted of three variables that participants could manipulate and one outcome whose value they had to control. One of the variables increased the outcome value, and a second variable decreased the outcome value. The third variable had no effect on the outcome and was included to add further difficulty to learning the relationships between the manipulated variables and the outcome variable. More formally, the task environment can be described by an equation of the form

$$y(t) = y(t-1) + w_1 x_1(t) - w_2 x_2(t) + e(t)$$

in which $y(t)$ is the outcome on trial $t$, $x_1$ is the positive cue, $x_2$ is the negative cue, $w_1$ and $w_2$ are positive weights, and $e$ is a random noise component, normally distributed with a zero mean and a standard deviation of 8 (which makes the system moderately unstable and therefore hard to control).

In the training phase, the control-based learners were informed that, as part of a medical research team, they would be conducting tests in which they could inject a patient with any combination of three hormones (labeled as hormones A, B, and C), with the aim of maintaining a specific safe level of neurotransmitter release. Prediction-based learners were assigned the same role, but were told that they would have to predict the level of neurotransmitter release by observing the levels of hormones injected. The general findings from the studies revealed that control performance in later tests of control was no different when people had learned about the dynamically uncertain environment via prediction-based learning or via control-based learning (Osman, 2012b; Osman & Speekenbrink, 2012). This finding has also been replicated in work that has compared clinical with non-clinical populations (Osman et al., 2008; Osman et al., 2013). More to the point, individuals are able to successfully control a dynamically uncertain environment because they have developed causal knowledge of the scenario (Osman, 2014a; Osman & Ananiadis-Basias, 2014).

This programmatic work has been successful at demonstrating that causal representations of the dynamic environment are implicated in control-based decision-making. One reason for this might be that the cover stories provide a useful frame of reference that can be brought to bear when interacting with the dynamic control task, consistent with some prior work (Blech & Funke, 2006; Burns & Vollmeyer, 2002; Kluge, 2004; Preußler, 1998; Sanderson, 1989). This may also explain the differences between the work reviewed here and prior work that has failed to show a necessary relationship between causal representations and the ability to accurately control a dynamic system (Berry & Broadbent, 1984, 1987; Greiff et al., 2013; Öllinger et al., 2015; Stanley et al., 1989).
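As a concrete illustration, here is a minimal simulation of this kind of task environment, assuming illustrative weights and target value; only the noise standard deviation of 8 comes from the description above.

```python
import random

W1, W2 = 0.65, 0.65   # illustrative cue weights (assumed, not the published values)
NOISE_SD = 8.0        # noise standard deviation, as described in the text
TARGET = 100.0        # illustrative safe level of neurotransmitter release

def step(y_prev, a, b, c):
    """One trial: hormones A and B are causal, C is the null cue."""
    e = random.gauss(0.0, NOISE_SD)
    return y_prev + W1 * a - W2 * b + 0.0 * c + e

# A naive controller: inject hormone A when below target, hormone B when above.
y = 50.0
for trial in range(40):
    error = TARGET - y
    a = max(error, 0.0) / W1       # push the outcome up toward the target
    b = max(-error, 0.0) / W2      # push the outcome down toward the target
    c = random.uniform(0.0, 10.0)  # the irrelevant hormone can take any value
    y = step(y, a, b, c)
    print(f"trial {trial:2d}: outcome = {y:6.1f}")
```

Even this simple proportional rule presupposes causal knowledge: the controller must know which cue raises the outcome, which lowers it, and which is inert, which is exactly the structural knowledge at issue in the studies above.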

However, some prior work has shown that, even with control tasks into which little prior knowledge could be imported because the cover story involved alien worlds, causal representations of the causal relations in the system predicted control performance (Preußler, 1998).

Another, more prosaic, potential explanation is that some dynamic systems, as mentioned earlier, contain (in extreme cases) hundreds of variables, and the time that participants have to explore the system and develop accurate causal representations is actually rather limited. For instance, in a recent study Öllinger et al. (2015) examined the relationship between participants' causal maps of a dynamic control task and their control performance. Theirs was a managerial task, the "Tailorshop" task (Putz-Osterloh, 1981), which involves 25 variables, 10 of which can be directly intervened on by the participant. In many cases, the participant has very limited exposure to the dynamic control task, ranging from 12 to 36 trials. In Öllinger et al.'s (2015) study, participants had approximately 8 trials to explore the environment (i.e., to figure out which variables to intervene on before implementing that knowledge to increase the company stock) (see Figure 16.3). Thus another factor is that participants in many control tasks are not given sufficient time to learn about and explore the environment, which explains why dissociations between structural knowledge of the system and control performance are found (Osman, 2010a).

However, Öllinger et al. (2015) also speculated that becoming familiar with the process of generating causal maps can be an important route to emphasizing the causal relations in the system and focusing people's attention on discovering them. More to the point, techniques such as drawing causal maps work like a training tool, or aid, that can help bring the causal representations held in the mind of the control-based decision-maker into closer correspondence with the causal relations that exist in the dynamic control (p. 289) task. By implication, however, the idea is that people do not necessarily rely on causal representations to guide their control-based decisions, but rather need to be shown that these are relevant—which is still contrary to the position that I and others propose, namely that causal representations are formed spontaneously and guide control-based decisions, and that we simply cannot do without them.

Figure 16.3 Depiction of the structure of the dynamic control task, including all the variables and the links between them. The black boxes highlight the variables that can be controlled by the control-based decision-makers. Reproduced from the original study by Danner et al. (2011).

Finally, one other factor that might contribute to the limited evidence of associations between causal representations and control-based decision-making performance is, as mentioned earlier, that few studies have looked at this association, because there has been limited theoretical precedent for doing so: for a long time, the prevailing theoretical position in control-based decision-making has been that causal representations are simply not relevant. As a result, few studies have developed measures that actually examine causal representations, rather than input–output associations per se, which can be non-causal.

In general, then, across a variety of dynamic control tasks, such as the ones described, general heuristics have been uncovered, which people appear to rely on as a first pass when approaching dynamic control tasks. These heuristics tend to reveal the core basic assumptions that people make, possibly about how causal relations are structured in dynamic settings in general. Typically, people assume that the dynamic control task is causally uncomplicated—that is, they assume the simplest associations between inputs and outputs (Berry & Broadbent, 1988; Chmiel & Wall, 1994). Often this means that they anticipate a one-to-one mapping of causes and effects that is configured in the way that the inputs and outputs appear in the actual display of the task (Schoppek, 2002). In addition, people also assume that the relationships between variables are linear; however, many dynamic control tasks include non-linear relationships, which people find hard to detect given their basic assumptions (Brehmer, 1992; Strohschneider & Güss, 1999).

People also tend to assume as a default that there is a positive association between our actions and their (p. 290) effects on the environment (Diehl & Sterman, 1995; Sterman, 1989).

More to the point, when it comes to planning the course of actions that we take in a dynamic control environment, we tend to infer that the changes we experience in the outcome are typically a direct result of our interventions (Kerstholt & Raaijmakers, 1997). In many of the dynamic control tasks with which participants interact, changes to the outcome result from endogenous properties of the environment, as well as from the direct interventions of the individual interacting with the environment. Rates of learning are slowed, however, because initial, inaccurate causal models are being used to guide control-based decision-making. Finally, when engaged in dynamic control, we also tend to form expectancies that our actions will lead to immediate effects on the outcome; hence the temporal relationship between an intervention and its observed effects can also impede successful control, because the wrong assumption is being made (Bredereke & Lankenau, 2005; Brehmer, 1992; Degani, 2004; Diehl & Sterman, 1995; Kerstholt & Raaijmakers, 1997). This insight has come from several key studies that examined the way in which we control systems when the structure of the task is designed to include temporal delays.

Conclusions

The aim of this chapter has been to showcase the work of the many who have sought to specify how and where causal representations and causal relations are in fact necessary to understanding planning and control-based behaviors. The core connections between planning and causal relations have come from machine learning, which has attempted to formalize the way in which we develop generic representational systems that flexibly allow us to adapt our plans to changing circumstances. Moreover, theoretical developments in work on prospective psychology have sought to describe how causal representations may be foundational to a variety of prospective behaviors, including planning.

Control-based decision-making research has approached the role of causality in control at both the theoretical and the empirical level. Causal representations have been described as critical for a variety of processes that underpin control-based decision-making (e.g., simulation, hypothetical thinking, prediction). Moreover, in the empirical domain, work looking at the connection between causal representations and the ability to control a dynamic environment shows that people come to the lab with basic priors about the causal relations that exist in the dynamic world that they experience, and that these priors are often mismatched with the actual causal relations that are inherent in the dynamic control task. In fact, the empirical work has shown that if the causal representations are inaccurate, this interferes with the ability to accurately control a dynamic environment, thus suggesting that causal representations are necessary for control behaviors.

Future Directions

What is clear from the review of work in the planning domain is that there is limited empirical work directly looking at the link between causal representations and prospection/planning. One potential avenue is to explore planning behaviors in dynamic control tasks, which would have the advantage of bringing together both research communities, and of linking planning to task situations not often explored.

As mentioned at the start, control-based decision-making and planning are closely related processes, and so investigating strategy development in dynamic control tasks is one way of bridging control and planning. Also, as discussed, the connection between strategy development and causal representations has a theoretical and empirical basis, and so this too is a potential basis on which causal representation and planning can be more directly investigated.

One area in which both planning and control research suffer is that there is no coherent framework that connects representational forms in the mind of the individual(s) with the environment—in other words, the mental model of the individual and the model of the world. Creating such a framework would not only allow us to provide formal descriptions of the dynamically uncertain situations in which control and planning are often found, but would also offer a prescriptive approach to suggesting optimal ways of planning for and controlling dynamically uncertain situations. One way to do this is to develop a framework in which the world is represented as causal structures, providing a benchmark against which causal models in the mind of the agent can be compared. A good candidate is the causal Bayes networks framework, which formalizes our representation of probabilistic causal relations in the world, from which we reason and make decisions, and which assumes that causes change the probabilities of their effects. Thus far, there has been no dedicated attempt to bridge psychological work on planning and control-based decision-making with causal Bayes networks or dynamic Bayesian networks. In turn, more work needs to (p. 291) be focused on examining the relationship between causal representations of the dynamic environment and control performance, as well as planning behaviors (e.g., strategy development). In order to advance our understanding of planning and control, future research should look to advances in causal learning and reasoning; without a better understanding of these processes, we cannot understand how planning and control-based decision-making function.
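As a minimal sketch of what such a benchmark might look like, consider a three-variable causal Bayes network written out directly in Python. The structure and all probabilities are invented for illustration; the point is only that the same object supports both observational inference and the evaluation of interventions, which is what planning and control would require.

```python
import itertools

# Invented probabilities for a three-variable causal Bayes network:
# demand -> price, and (demand, price) -> sales.
P_DEMAND = {True: 0.3, False: 0.7}
P_PRICE = {True: {True: 0.8, False: 0.2},     # P(price_high | demand)
           False: {True: 0.1, False: 0.9}}
P_SALES = {(True, True): 0.6, (True, False): 0.9,   # P(sales_high | demand, price)
           (False, True): 0.1, (False, False): 0.3}

def joint(demand, price, sales, do_price=None):
    """Joint probability; an intervention cuts the demand -> price link."""
    p = P_DEMAND[demand]
    if do_price is None:
        p *= P_PRICE[demand][price]
    elif price != do_price:
        return 0.0                      # do(price) fixes price exogenously
    p *= P_SALES[(demand, price)] if sales else 1 - P_SALES[(demand, price)]
    return p

def prob_sales_high(price_high, do=False):
    kwargs = {"do_price": price_high} if do else {}
    num = sum(joint(d, price_high, True, **kwargs) for d in (True, False))
    den = sum(joint(d, price_high, s, **kwargs)
              for d, s in itertools.product((True, False), repeat=2))
    return num / den

print("P(sales high | price high)     =", round(prob_sales_high(True), 3))
print("P(sales high | do(price high)) =", round(prob_sales_high(True, do=True), 3))
```

The two printed probabilities differ, because observing a high price is evidence about demand, whereas setting the price is not. An agent's learned causal model could then be scored against such a benchmark by comparing the interventional probabilities it predicts with those implied by the true structure.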

References

Atance, C. M., & O'Neill, D. K. (2001). Episodic future thinking. Trends in Cognitive Sciences, 5(12), 533–539.
Berry, D., & Broadbent, D. E. (1984). On the relationship between task performance and associated verbalizable knowledge. Quarterly Journal of Experimental Psychology: Human Experimental Psychology, 36(A), 209–231.
Berry, D., & Broadbent, D. E. (1987). The combination of implicit and explicit knowledge in task control. Psychological Research, 49, 7–15.
Berry, D. C., & Broadbent, D. E. (1988). Interactive tasks and the implicit–explicit distinction. British Journal of Psychology, 79(2), 251–272.
Blech, C., & Funke, J. (2006). Zur Reaktivität von Kausaldiagramm-Analysen beim komplexen Problemlösen [On the reactivity of causal diagram analysis in complex problem solving]. Zeitschrift für Psychologie, 214(4), 185–195.
Brandimonte, M., Einstein, G. O., & McDaniel, M. A. (Eds.). (1996). Prospective memory: Theory and applications. Hillsdale, NJ: Erlbaum.
Bredereke, J., & Lankenau, A. (2005). Safety-relevant mode confusions: Modelling and reducing them. Reliability Engineering and System Safety, 88, 229–245.
Brehmer, B. (1992). Dynamic decision making: Human control of complex systems. Acta Psychologica, 81, 211–241.
Burgess, P., Simons, J. S., Coates, L. M. A., & Channon, S. (2005). The search for specific planning processes. In R. Morris & G. Ward (Eds.), The cognitive psychology of planning (pp. 199–227). Hove, UK; New York: Psychology Press.
Burstein, M. H. (1988). Combining analogies in mental models. In D. H. Helman (Ed.), Analogical reasoning: Perspectives of artificial intelligence, cognitive science, and philosophy. Dordrecht: Kluwer.
Chmiel, N., & Wall, T. (1994). Fault prevention, job design, and the adaptive control of advanced manufacturing technology. Applied Psychology: An International Review, 43, 455–473.
Danner, D., Hagemann, D., Schankin, A., Hager, M., & Funke, J. (2011). Beyond IQ: A latent state-trait analysis of general intelligence, dynamic decision making, and implicit learning. Intelligence, 39(5), 323–334.
Degani, A. (2004). Taming HAL: Designing interfaces beyond 2001. New York: Palgrave Macmillan.
Dehaene, S., & Changeux, J. P. (1997). A hierarchical neuronal network for planning behaviour. Proceedings of the National Academy of Sciences, 94(24), 13293–13298.
Diehl, E., & Sterman, J. D. (1995). Effects of feedback complexity on dynamic decision making. Organizational Behavior and Human Decision Processes, 62, 198–215.
Dienes, Z., & Fahey, R. (1995). Role of specific instances in controlling a dynamic system. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 848–862.
Dienes, Z., & Fahey, R. (1998). The role of implicit memory in controlling a dynamic system. Quarterly Journal of Experimental Psychology: Human Experimental Psychology, 51(A), 593–614.
Dörner, D. (1986). Diagnostik der operativen Intelligenz [Assessment of operative intelligence]. Diagnostica, 32(4), 290–308.
Dörner, D. (1987). On the difficulties people have in dealing with complexity. In J. Rasmussen, K. Duncan, & J. Leplat (Eds.), New technology and human error (pp. 97–109). Chichester, UK: Wiley.
Dörner, D. (1990). The logic of failure. In D. E. Broadbent, J. T. Reason, & A. D. Baddeley (Eds.), Human factors in hazardous situations (pp. 15–36). New York: Oxford University Press.
Dörner, D., Kreuzig, H. W., Reither, F., & Stäudel, T. (1983). Lohhausen. Vom Umgang mit Unbestimmtheit und Komplexität [Lohhausen. On dealing with uncertainty and complexity]. Bern: Huber.
Dörner, D., & Wearing, A. (1995). Complex problem solving: Toward a (computer-simulated) theory. In P. A. Frensch & J. Funke (Eds.), Complex problem solving: The European perspective (pp. 65–99). Hillsdale, NJ: Lawrence Erlbaum Associates.
Dretske, F. (1982). The informational character of representations. Behavioral and Brain Sciences, 5(3), 376–377.
Eitam, B., Hassin, R. R., & Schul, Y. (2008). Nonconscious goal pursuit in novel environments: The case of implicit learning. Psychological Science, 19(3), 261–267.
Eseryel, D., Ifenthaler, D., & Ge, X. (2013). Validation study of a method for assessing complex ill-structured problem solving by using causal representations. Educational Technology Research and Development, 61(3), 443–463.
Fincham, J. M., Carter, C. S., van Veen, V., Stenger, V. A., & Anderson, J. R. (2002). Neural mechanisms of planning: A computational analysis using event-related fMRI. Proceedings of the National Academy of Sciences, 99(5), 3346–3351.
Funke, J. (2010). Complex problem solving: A case for complex cognition? Cognitive Processing, 11, 133–142.
Funke, J. (2014). Analysis of minimal complex systems and complex problem solving require different forms of causal cognition. Frontiers in Psychology, 5.
Gallistel, C. R. (1990). The organization of learning. Cambridge, MA: MIT Press.
Gebauer, G. F., & Mackintosh, N. J. (2007). Psychometric intelligence dissociates implicit and explicit learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33(1), 34.
Gibson, F. P., Fichman, M., & Plaut, D. C. (1997). Learning in dynamic decision tasks: Computational model and empirical evidence. Organizational Behavior and Human Decision Processes, 71, 1–35.
Gilbert, S. J., & Burgess, P. W. (2008). Executive function. Current Biology, 18(3), R110–R114.
Gollwitzer, P. M. (1999). Implementation intentions: Strong effects of simple plans. American Psychologist, 54(7), 493–503.
Gopnik, M. (1982). Some distinctions among representations. Behavioral and Brain Sciences, 5(3), 378–379.
Greiff, S., Wüstenberg, S., Molnár, G., Fischer, A., Funke, J., & Csapó, B. (2013). Complex problem solving in educational contexts: Something beyond g: Concept, assessment, measurement invariance, and construct validity. Journal of Educational Psychology, 105(2), 364–379.
Güss, C. D. (2000). Planen und Kultur? [Planning and culture?]. Lengerich, Germany: Pabst Science.
Hagmayer, Y., Sloman, S. A., Lagnado, D. A., & Waldmann, M. R. (2007). Causal reasoning through intervention. In A. Gopnik & L. Schulz (Eds.), Causal learning: Psychology, philosophy, and computation (pp. 86–100). Oxford, England: Oxford University Press. (p. 292)
Hayes-Roth, B., & Hayes-Roth, F. (1979). A cognitive model of planning. Cognitive Science, 3(4), 275–310.
Holyoak, K. J., & Cheng, P. W. (2011). Causal learning and inference as a rational process: The new synthesis. Annual Review of Psychology, 62, 135–163.
Huff, A. S. (2002). Mapping strategic knowledge. London: Sage.
Jeannerod, M. (1994). Motor representations and reality. Behavioral and Brain Sciences, 17(2), 229–245.
Kambhampati, S. (1993). Supporting flexible plan reuse. In S. Minton (Ed.), Machine learning methods for planning (pp. 397–434). San Mateo, CA: Morgan Kaufmann.
Karniol, R., & Ross, M. (1996). The motivational impact of temporal focus: Thinking about the future and the past. Annual Review of Psychology, 47, 593–620.
Kerstholt, J. H., & Raaijmakers, J. G. W. (1997). Decision making in dynamic task environments. In R. Ranyard, R. W. Crozier, & O. Svenson (Eds.), Decision making: Cognitive models and explanations (pp. 205–217). New York: Routledge.
Kersting, M., & Süß, H. M. (1995). Kontentvalide Wissensdiagnostik und Problemlösen: Zur Entwicklung, testtheoretischen Begründung und empirischen Bewährung eines problemspezifischen Diagnoseverfahrens [Content-valid knowledge assessment and problem solving: On the development, psychometric justification, and empirical validation of a problem-specific assessment procedure]. Zeitschrift für Pädagogische Psychologie, 9, 83–93.
Kluge, A. (2004). Wissenserwerb für das Steuern komplexer Systeme [Knowledge acquisition for the control of complex systems]. Lengerich: Pabst Science Publishers.
Lee, Y. (1995). Effects of learning contexts on implicit and explicit learning. Memory and Cognition, 23, 723–734.
Lewis, D. (1973). Causation. The Journal of Philosophy, 70, 556–567.
McCormack, T., & Atance, C. M. (2011). Planning in young children: A review and synthesis. Developmental Review, 31(1), 1–31.
Meder, B., Gerstenberg, T., Hagmayer, Y., & Waldmann, M. R. (2010). Observing and intervening: Rational and heuristic models of causal decision making. The Open Psychology Journal, 3, 119–135.
Minton, S. (Ed.). (2014). Machine learning methods for planning. San Mateo, CA: Morgan Kaufmann.
Öllinger, M., Hammon, S., von Grundherr, M., & Funke, J. (2015). Does visualization enhance complex problem solving? The effect of causal mapping on performance in the computer-based microworld Tailorshop. Educational Technology Research and Development, 63(4), 621–637.
Osman, M. (2008a). Observation can be as effective as action in problem solving. Cognitive Science, 32, 162–183.
Osman, M. (2008b). Positive transfer and negative transfer/anti-learning of problem-solving skills. Journal of Experimental Psychology: General, 137, 97–115.
Osman, M. (2008c). Seeing is as good as doing. Journal of Problem Solving, 2(1).
Osman, M. (2010a). Controlling uncertainty: A review of human behaviour in complex dynamic environments. Psychological Bulletin, 136, 65–86.
Osman, M. (2010b). Controlling uncertainty: Learning and decision making in complex worlds. Malden, MA: Wiley Blackwell Publishers.
Osman, M. (2012a). How powerful is the effect of external goals on learning in an uncertain environment? Learning and Individual Differences, 22, 575–584.
Osman, M. (2012b). The role of feedback in dynamic decision making. Frontiers in Decision Neuroscience: Human Choice and Adaptive Decision Making, 6, 56.
Osman, M. (2014a). Future-minded: The psychology of agency and control. New York: Palgrave Macmillan.
Osman, M. (2014b). What are the essential cognitive requirements for prospection (thinking about the future)? Frontiers in Psychology, 5, 626.
Osman, M., & Ananiadis-Basias, A. (2014). Context and animacy play a role in dynamic decision-making. Journal of Entrepreneurship, Management and Innovation, 2, 61–78.
Osman, M., Ryterska, A., Karimi, K., Tu, L., Obeso, I., Speekenbrink, M., & Jahanshahi, M. (2013). The effects of dopaminergic medication on dynamic decision making in Parkinson's disease. Neuropsychologia, 53, 157–164.
Osman, M., & Speekenbrink, M. (2011). Cue utilization and strategy application in stable and unstable dynamic environments. Cognitive Systems Research, 12, 355–364.
Osman, M., & Speekenbrink, M. (2012). Predicting vs. controlling a dynamic environment. Frontiers in Decision Neuroscience: Dynamics of Decision Making, 3, 68.
Osman, M., Wilkinson, L., Beigi, M., Parvez, C., & Jahanshahi, M. (2008). The striatum and learning to control a complex system? Neuropsychologia, 46, 2355–2363.
Phillips, L. H., Wynn, V. E., McPherson, S., & Gilhooly, K. J. (2001). Mental planning and the Tower of London task. The Quarterly Journal of Experimental Psychology: Section A, 54(2), 579–597.
Plate, R. (2010). Assessing individuals' understanding of nonlinear causal structures in complex systems. System Dynamics Review, 26(1), 19–33.
Preußler, W. (1998). Strukturwissen als Voraussetzung für die Steuerung komplexer dynamischer Systeme [Structural knowledge as a prerequisite for controlling complex dynamic systems]. Zeitschrift für Experimentelle Psychologie, 45(3), 218–240.
Putz-Osterloh, W. (1981). Über die Beziehung zwischen Testintelligenz und Problemlöseerfolg [On the relationship between test intelligence and problem-solving success]. Zeitschrift für Psychologie, 189, 79–100.
Reichenbach, H. (1956). The direction of time. Berkeley; Los Angeles: University of California Press.
Rigas, G., & Brehmer, B. (1999). Mental processes in intelligence tests and dynamic decision making tasks. London: Erlbaum.
Rigas, G., Carling, E., & Brehmer, B. (2002). Reliability and validity of performance measures in microworlds. Intelligence, 30, 463–480.
Sanderson, P. M. (1989). Verbalizable knowledge and skilled task performance: Association, dissociation, and mental models. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 729–747.
Schank, R. C., & Abelson, R. P. (1977). Scripts, plans, goals, and understanding: An inquiry into human knowledge structures. Hillsdale, NJ: Lawrence Erlbaum Associates.
Schoppek, W. (2002). Examples, rules and strategies in the control of dynamic systems. Cognitive Science Quarterly, 2, 63–92.
Schraw, G., Dunkle, M. E., & Bendixen, L. D. (1995). Cognitive processes in well-defined and ill-defined problem solving. Applied Cognitive Psychology, 9(6), 523–538.
Segura, S., Fernandez-Berrocal, P., & Byrne, R. M. (2002). Temporal and causal order effects in thinking about what (p. 293) might have been. The Quarterly Journal of Experimental Psychology: Section A, 55(4), 1295–1305.
Seligman, M. E., Railton, P., Baumeister, R. F., & Sripada, C. (2013). Navigating into the future or driven by the past. Perspectives on Psychological Science, 8(2), 119–141.
Shallice, T. (1982). Specific impairments of planning. Philosophical Transactions of the Royal Society B: Biological Sciences, 298(1089), 199–209.
Simon, H. A. (1977). The structure of ill-structured problems. In H. A. Simon, Models of discovery (pp. 304–325). Dordrecht: Springer Netherlands.
Sniehotta, F. F., Schwarzer, R., Scholz, U., & Schüz, B. (2005). Action planning and coping planning for long-term lifestyle change: Theory and assessment. European Journal of Social Psychology, 35(4), 565–576.
Stanley, W. B., Mathews, R. C., Buss, R. R., & Kotler-Cope, S. (1989). Insight without awareness: On the interaction of verbalization, instruction, and practice in a simulated process control task. Quarterly Journal of Experimental Psychology, 41, 553–577.
Sterman, J. D. (2000). Business dynamics: Systems thinking and modeling for a complex world. Boston: McGraw-Hill.
Strathman, A., Gleicher, F., Boninger, D. S., & Edwards, C. S. (1994). The consideration of future consequences: Weighing immediate and distant outcomes of behaviour. Journal of Personality and Social Psychology, 66, 742–752.
Strohschneider, S. (2001). Kultur—Denken—Strategie: Eine indische Suite [Culture—thinking—strategy: An Indian suite]. Bern: Hans Huber.
Strohschneider, S., & Güss, D. (1999). The fate of the MOROS: A cross-cultural exploration of strategies in complex and dynamic decision making. International Journal of Psychology, 34, 235–252.
Sundberg, N. D., Poole, M. E., & Tyler, L. E. (1983). Adolescents' expectations of future events: A cross-cultural study of Australians, Americans, and Indians. International Journal of Psychology, 18, 415–427.
Tolman, E. C., & Brunswik, E. (1935). The organism and the causal texture of the environment. Psychological Review, 42(1), 43–77.
Tschirgi, J. E. (1980). Sensible reasoning: A hypothesis about hypotheses. Child Development, 51, 1–10.
Tversky, A., & Koehler, D. (1994). Support theory: A nonextensional representation of subjective probability. Psychological Review, 101, 547–567.
Waldmann, M. R., & Holyoak, K. J. (1992). Predictive and diagnostic learning within causal models: Asymmetries in cue competition. Journal of Experimental Psychology: General, 121, 22–36.
Ward, G., & Allport, D. A. (1997). Planning and problem-solving using the 5-disk Tower of London task. Quarterly Journal of Experimental Psychology, 50, 49–78.
Wellman, M. P., & Henrion, M. (1993). Explaining "explaining away." IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(3), 287–292.
World Health Organization. (2000). Obesity: Preventing and managing the global epidemic (No. 894). Geneva: World Health Organization.
Wüstenberg, S., Greiff, S., & Funke, J. (2012). Complex problem solving: More than reasoning? Intelligence, 40, 1–14. (p. 294)

Magda Osman

School of Biological and Chemical Sciences, Queen Mary University of London, London, England, UK


Reinforcement Learning and Causal Models

Reinforcement Learning and Causal Models   Samuel J. Gershman The Oxford Handbook of Causal Reasoning Edited by Michael R. Waldmann Print Publication Date: Jun 2017 Subject: Psychology, Cognitive Psychology Online Publication Date: May 2017 DOI: 10.1093/oxfordhb/9780199399550.013.20

Abstract and Keywords

This chapter reviews the diverse roles that causal knowledge plays in reinforcement learning. The first half of the chapter contrasts a "model-free" system that learns to repeat actions that lead to reward with a "model-based" system that learns a probabilistic causal model of the environment, which it then uses to plan action sequences. Evidence suggests that these two systems coexist in the brain, both competing and cooperating with each other. The interplay of the two systems allows the brain to negotiate a balance between cognitively cheap but inaccurate model-free algorithms and accurate but expensive model-based algorithms. The second half of the chapter reviews research on hidden state inference in reinforcement learning. The problem of inferring hidden states can be construed in terms of inferring the latent causes that give rise to sensory data and rewards. Because hidden state inference affects both model-based and model-free reinforcement learning, causal knowledge impinges upon both systems.

Keywords: causal knowledge, learning, reinforcement learning, environment, brain

Reinforcement learning (RL) is the study of how an agent (human, animal, or machine) can learn to choose actions that maximize its future rewards (Sutton & Barto, 1998). Two strong constraints have shaped the evolution of RL in the brain. On the one hand, the world is complex, favoring the development of rich causal models that can be used to accurately predict future reward. On the other hand, building and using causal models are computationally costly. If an agent needs to act quickly and energy-efficiently, cheaper but less accurate predictions may be required. Algorithms that directly estimate future reward without building an explicit causal model are known as model-free, in contrast to model-based algorithms that employ a causal model.

To make this distinction concrete, imagine how you might navigate from home to your office. If you just moved to a new house, the route may be unfamiliar, and so you rely on a map to figure out a step-by-step plan. The map is a causal model: it tells you that taking an action (i.e., moving in a particular direction) causes a change in your state (i.e., location). Constructing a plan amounts to designing a causal chain that terminates at your intended goal.

For this reason, the map-based strategy is a form of model-based control. As you become more familiar with the route, you may find yourself relying less on maps—you simply "know" what direction to go in a particular location. One way this might happen is that you have learned to cache the values of actions in different states, so that you can determine where to go simply by inspecting these cached values. A causal model is not required for this navigation strategy. This is the essence of model-free control.

Experimental work has revealed that humans and animals use a combination of model-based and model-free algorithms, implicating the coexistence of two "systems" in the brain that are at least partially dissociable (Balleine & Dickinson, 1998; Daw, Niv, & Dayan, 2005; Dolan & Dayan, 2013). These systems compete for control of behavior, but may also cooperate with each other, as I will discuss later. The aim of this chapter is to highlight the diverse roles that causal knowledge plays in model-based and model-free RL. I begin with a brief summary of the historical background, and then review the modern computational synthesis of model-based and model-free RL. While it is tempting to view causal knowledge as falling strictly within the purview of model-based RL, this is not the case. Agents must perpetually contend with (p. 296)

partial observability: sensory data provide imperfect information about the underlying "state" of the environment. It is the hidden state, rather than sensory data, that is causally related to reward. For example, if one smells food cooking at a restaurant and sits down to eat, it is not the smell (sensory data) that caused the food (reward) to appear, but rather the cook who made the food (the hidden state). Both model-based and model-free learning systems employ causal knowledge to form a belief about the hidden state. The second half of this chapter is devoted to a review of research on the role of causal models in dealing with partial observability.

The concept of causality appears in various forms throughout this chapter. Table 17.1 provides a summary of the three forms of causality that play key roles in RL: taking an action in a state causes both a reward and a transition to a new state, and in partially observable environments the state generates perceptual signatures (observations). In later sections, I will formalize these ideas and discuss how they have been studied experimentally.
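To give the flavor of such a belief computation, here is a one-step Bayesian inference over a hidden state, with made-up numbers for the restaurant example (the chapter formalizes hidden state inference in its later sections):

```python
# Prior belief that a cook is present, and likelihoods of smelling food.
p_cook = 0.5
p_smell_given_cook = 0.9      # made-up likelihoods for illustration
p_smell_given_no_cook = 0.1

# Bayes' rule: posterior belief about the hidden state given the observation.
p_smell = p_smell_given_cook * p_cook + p_smell_given_no_cook * (1 - p_cook)
p_cook_given_smell = p_smell_given_cook * p_cook / p_smell

print(f"P(cook | smell) = {p_cook_given_smell:.2f}")  # 0.90
```

The smell does not cause the reward; it merely updates the belief about the hidden cause (the cook) that does.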

Historical Background

The early study of RL was dominated by behaviorism, which explicitly rejected any notion of an internal model. The behaviorist view of learning is succinctly summarized by Thorndike's law of effect: if an action leads to reward, it will become more likely to be repeated in the future (Thorndike, 1911). While later computational models posited more complex rules governing behavior, virtually all of them embodied the law of effect (e.g., Mackintosh, 1975; Pearce, 1980; Rescorla & Wagner, 1972). As reviewed in the next section, this characterization also applies to contemporary theories of model-free RL.

Nonetheless, a variety of behavioral phenomena suggest that there exist powerful determinants of responding that cannot be reduced to simple reinforcement.

Tolman (1948) described a number of ingenious experiments whose results are perplexing from a behaviorist perspective. For example, Tolman showed that rats could take shortcuts or plan detours around obstacles without ever being reinforced for these actions. Another example described by Tolman is latent learning: a rat allowed to explore a maze without reinforcement was subsequently faster at learning to navigate to a reward. Since the rat was not reinforced for its actions during the exploratory phase, this behavior cannot be explained by the law of effect. Later research on contextual fear conditioning revealed a similar phenomenon: brief pre-exposure to a context enhanced the acquisition of contextual fear (Fanselow, 1990; Kiernan & Westbrook, 1993).

Table 17.1 Summary of Causal Relationships in Reinforcement Learning

Reward: state, action → reward
Transition: state, action → state
Hidden state: hidden state → observation

Tolman interpreted latent learning and other findings as evidence for a "cognitive map"—an internal model of the environment that encodes information about spatial layout, object attributes, and relations between objects. Several decades after Tolman's pioneering work, the idea of a cognitive map received direct support from recordings in the hippocampus that revealed neurons tuned to an animal's location in space (O'Keefe & Nadel, 1978). Subsequent research showed that the hippocampal cognitive map is replete with representations of landmarks, boundaries, sequences, and relations (Eichenbaum, 2004; Hasselmo, 2012).

Another line of assault on the law of effect was pursued by Dickinson and his colleagues in the early 1980s (Dickinson, 1985). These studies mapped out the conditions under which instrumental behavior is controlled by goals, overriding the actions prescribed by an animal's reinforcement history. For example, rats trained to press a lever for sucrose would subsequently cease lever pressing in an extinction test after the sucrose was separately paired with illness (thereby devaluing the sucrose reinforcer), demonstrating outcome sensitivity consistent with a cognitive map or goal-directed view of instrumental behavior (Adams, 1982). Since the instrumental action (lever pressing) was never directly paired with illness, the law of effect predicts no reduction of responding under these circumstances. Importantly, goal-directed control of behavior could be superseded by stimulus–response habits, given enough training. In particular, rats overtrained with the sucrose reinforcer continued (p. 297) to press the lever after the devaluation treatment, demonstrating outcome insensitivity more consistent with a habit learning system governed purely by the law of effect. These observations led to the idea of multiple competing learning systems, which will be discussed further in the next section.


The important point to take away from these studies is that causality is central to a complete understanding of RL in the brain. The cognitive map encodes information about how actions cause changes in state, and the goal-directed nature of instrumental behavior suggests that animals understand the causal effects of their actions on subsequent rewards. These correspond to the first two causal relationships listed in Table 17.1.

Reinforcement Learning Theory

Contemporary RL theory has its origins in a family of engineering techniques developed to deal with complex planning and control problems (Bellman, 1957). This section introduces these techniques formally (see Sutton & Barto, 1998, for a thorough introduction), and describes how they have been used to explain experimental findings from psychology and neuroscience. I will first review some basic notation and concepts, and then discuss several important algorithms for solving the RL problem.

Formalization of the Problem

The basic RL problem is summarized in Figure 17.1. An agent at time $t$ occupies state $s_t$, takes action $a_t$ from a policy $\pi(a_t \mid s_t)$, receives a reward $r_t$ with expected value $R(s_t, a_t)$, and transitions to a new state $s_{t+1}$ according to a transition distribution $T(s_{t+1} \mid s_t, a_t)$. The agent continues interacting with the environment ad infinitum (or until it reaches a terminal state), accumulating rewards. For example, consider the simple foraging environment shown in Figure 17.1. At each point in time, the forager collects resources at one of two patches and continually chooses whether to stay at one patch or transit to the other patch. In this example, the patches correspond to states, the resources correspond to rewards, and the actions are stay/switch decisions. One criterion of optimality is the maximization of cumulative reward, or return, $R_t = \sum_{k=0}^{\infty} r_{t+k}$.

However, this does not take into account the fact that most biological agents prefer rewards sooner rather than later (Frederick, Loewenstein, & O'Donoghue, 2002). This can be captured by assuming that future rewards are discounted exponentially, leading to the following definition of discounted return:

$$R_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k}$$ (1)

The discount factor $\gamma \in [0, 1]$ represents the agent's preference for immediate rewards: lower values of $\gamma$ indicate a steeper discounting of future rewards. Because rewards and transitions may be stochastic, and hence $R_t$ is a random variable, we take the goal of the agent to be maximizing expected discounted return, or value, defined as:

Reinforcement Learning and Causal Models

(2)

Figure 17.1 Reinforcement learning. (a) The agentenvironment interface. (b) Example of a Markov deci­ sion process. Circles denote states, and arrows de­ note deterministic state transitions caused by partic­ ular actions.

where E[·] is the expectation operator, returning the average of its arguments (in this case, averaging over randomness in states, actions, and rewards under a particular poli­ cy). To understand this equation, imagine an agent who takes action a in state s and then pursues policy π over an infinitely long trajectory through the state space, meanwhile recording the discounted return. We can imagine the agent restarting this trajectory many times, and then averaging the discounted return recorded on each trajectory (this is known as a “Monte Carlo” approximation). The resulting value Q(s, a) is equivalent to the average of discounted returns over all possible trajectories, weighted by the probabil­ ity of each trajectory under policy π. The optimal action in state s maximizes Q(s, a):

(3)

We say that a policy π∗ is optimal if it maximizes Q(s, a) for all states. While poli­ cies may in general be probabilistic, the optimal policy is always deterministic, with π∗(s, a∗) = 1 and 0 for all other actions. In the foraging example given in the preceding, sup­ pose that the expected reward in Patch A is larger than in Patch B; provided γ is suffi­ ciently large, the optimal policy is to always take the “stay” action in Patch A and the (p. 298)

Page 5 of 22

Reinforcement Learning and Causal Models “switch” action in Patch B. If γ gets small enough, however, the optimal policy is to stay in Patch B, since switching will result in delayed reward. The environment described in the preceding is known as a Markov decision process (MDP) because it obeys the Markov property: state transitions and rewards are indepen­ dent of the agent’s history, conditional on the current state and action. In the patch forag­ ing example used here, the Markov property says that the probability of transit to anoth­ er patch depends only on the current patch and the agent’s stay/switch decision (likewise for the resource collection at the current patch). The Markov property enables the value function to be expressed recursively:

$$Q^{\pi}(s_t, a_t) = R(s_t, a_t) + \gamma \sum_{s_{t+1}} T(s_{t+1} \mid s_t, a_t) \sum_{a_{t+1}} \pi(a_{t+1} \mid s_{t+1}) \, Q^{\pi}(s_{t+1}, a_{t+1}) \qquad (4)$$

This expression is known as the Bellman equation (Bellman, 1957). Intuitively, the Bellman equation shows that the value function can be broken down into the immediate reward (first term) and the expected future reward E[Q(s_{t+1}, a_{t+1})] (second term). The sum over future states and actions in the second term reflects the agent's uncertainty; in probability theory, this is known as marginalization. The optimal Q-value (i.e., the Q-value under the optimal policy π∗) can correspondingly be written as:

$$Q^{*}(s_t, a_t) = R(s_t, a_t) + \gamma \sum_{s_{t+1}} T(s_{t+1} \mid s_t, a_t) \max_{a_{t+1}} Q^{*}(s_{t+1}, a_{t+1}) \qquad (5)$$

Here we have simply substituted the optimal policy, which assigns probability 1 to the action a_{t+1} = argmax_a Q∗(s_{t+1}, a), into Equation 4. The Bellman equation serves as the basis of efficient learning and planning algorithms, which we discuss next.

Algorithmic Solutions

Model-based and model-free algorithms can, loosely speaking, be seen as working on different sides of the Bellman equation. Model-based algorithms operate on the right-hand side of the Bellman equation, in the sense that they compute Q(s, a) by directly applying the Bellman equation to the learned reward and transition functions. For example, the value iteration algorithm (Sutton & Barto, 1998) initializes the Q-values randomly and then repeatedly applies Equation 5 to compute new Q-values for each state-action pair. Value iteration is guaranteed to converge to the optimal Q-values. However, it is intractable for large state and action spaces. For this reason, the most successful modern techniques use some form of local tree search (Browne et al., 2012). These algorithms employ the model as a means of simulating trajectories through the state space around the current state, and estimate Q-values on the basis of these trajectories. While there is evidence that humans carry out something resembling tree search (e.g., De Groot, 1978; Holding & Pfau, 1985; Huys et al., 2012, 2015), our current knowledge about model-based planning in the brain is very limited (see Daw & Dayan, 2014, for further discussion).
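As a concrete illustration, the following sketch runs value iteration on the two-patch foraging MDP from earlier, repeatedly applying Equation 5. The reward values and deterministic transitions are the same illustrative assumptions used above.

```python
GAMMA = 0.9
STATES, ACTIONS = ["A", "B"], ["stay", "switch"]
REWARD = {("A", "stay"): 1.0, ("A", "switch"): 0.5,   # reward collected at
          ("B", "stay"): 0.5, ("B", "switch"): 1.0}   # the resulting patch

def next_state(s, a):
    return s if a == "stay" else ("B" if s == "A" else "A")

# Initialize the Q-values arbitrarily, then sweep Equation 5 repeatedly;
# for this tiny MDP a fixed number of sweeps is enough to converge.
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
for _ in range(200):
    Q = {(s, a): REWARD[(s, a)]
                 + GAMMA * max(Q[(next_state(s, a), b)] for b in ACTIONS)
         for s in STATES for a in ACTIONS}

for s in STATES:
    best = max(ACTIONS, key=lambda a: Q[(s, a)])
    print(s, best, round(Q[(s, best)], 2))
```

With γ this large, the recovered policy is the one described earlier: stay in Patch A, switch out of Patch B.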

Model-free algorithms operate on the left-hand side of the Bellman equation: Instead of learning a model, they directly estimate Q(s, a) from experience and cache these estimates in a look-up table.1 The most influential class of model-free algorithms is known as temporal difference (TD) learning (Sutton, 1988). All TD algorithms have in common the idea that learning is driven by the discrepancy between observed and predicted reward (the prediction error). To understand how TD learning is connected to the Bellman equation, notice that Equation 5 can be written as an expectation:

$$Q^{*}(s_t, a_t) = \mathbb{E}\left[ r_t + \gamma \max_{a_{t+1}} Q^{*}(s_{t+1}, a_{t+1}) \right] \qquad (6)$$

where we have replaced the reward and transition functions with sampled rewards (r_t) and states (s_{t+1}) inside the expectation. The expectation can always be approximated by averaging many such samples (cf. the Monte Carlo approximation described in the previous section). This equation implies a consistency condition: If we have appropriately estimated the optimal Q-values, then the difference between the sampled target r_t + γ max_{a_{t+1}} Q(s_{t+1}, a_{t+1}) and the current estimate Q(s_t, a_t) should, on average, be zero:

$$\mathbb{E}\left[ r_t + \gamma \max_{a_{t+1}} \hat{Q}(s_{t+1}, a_{t+1}) - \hat{Q}(s_t, a_t) \right] = 0 \qquad (7)$$

$$\delta_t = r_t + \gamma \max_{a_{t+1}} \hat{Q}(s_{t+1}, a_{t+1}) - \hat{Q}(s_t, a_t) \qquad (8)$$

The variable δ_t is precisely the prediction error mentioned earlier, because it reflects the difference between observed and predicted rewards. What happens if we do not have an accurate estimate of the optimal Q-values (or we are following a suboptimal policy)? Then the prediction error will, on average, be non-zero. In fact, the direction of the prediction error tells you something important about how to update the Q-values. When the prediction error is positive, the value function has underestimated the expected future reward and therefore the Q-value should be increased; likewise, when the prediction error is negative, the value function has overestimated the expected future reward and therefore the Q-value should be decreased.

This is the essential idea underlying one of the most important TD algorithms, Q-learning (Watkins & Dayan, 1992), which updates an estimate of the optimal value function according to:


$$\hat{Q}(s_t, a_t) \leftarrow \hat{Q}(s_t, a_t) + \alpha \, \delta_t \qquad (9)$$

$$\delta_t = r_t + \gamma \max_{a_{t+1}} \hat{Q}(s_{t+1}, a_{t+1}) - \hat{Q}(s_t, a_t) \qquad (10)$$

where α ∈ [0, 1] is a learning rate parameter. Although it is still a matter of debate what particular form of TD learning is used by the brain (Niv, 2009), all TD algorithms embody the basic prediction error logic laid out earlier.
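The following is a minimal Python sketch of Equations 9 and 10, again on the illustrative foraging MDP. The ε-greedy exploration rule and all parameter values are assumptions made for the example, not prescriptions from the chapter.

```python
import random

GAMMA, ALPHA, EPSILON = 0.9, 0.1, 0.1
STATES, ACTIONS = ["A", "B"], ["stay", "switch"]
REWARD = {"A": 1.0, "B": 0.5}

def step(s, a):
    nxt = s if a == "stay" else ("B" if s == "A" else "A")
    return nxt, REWARD[nxt]

Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}   # the look-up table
s = "A"
for _ in range(10000):
    if random.random() < EPSILON:                    # occasional exploration
        a = random.choice(ACTIONS)
    else:                                            # otherwise act greedily
        a = max(ACTIONS, key=lambda b: Q[(s, b)])
    nxt, r = step(s, a)
    # Prediction error (Equation 10), then the update (Equation 9).
    delta = r + GAMMA * max(Q[(nxt, b)] for b in ACTIONS) - Q[(s, a)]
    Q[(s, a)] += ALPHA * delta
    s = nxt
```

Note that the agent never consults a model of T or R when computing values; everything it knows is cached in the table Q.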

The main reason that TD learning has figured so prominently in neuroscience is that the phasic firing of midbrain dopamine neurons appears to correspond closely with the theoretical prediction error (Bayer & Glimcher, 2005; Glimcher, 2011; Niv & Schoenbaum, 2008; Schultz, Dayan, & Montague, 1997; Schultz & Dickinson, 2000). Some of the key evidence comes from Pavlovian conditioning tasks (Schultz et al., 1997), where dopamine neurons fire in response to unexpected reward (e.g., early in learning) but not to expected reward (e.g., late in learning). Furthermore, dopamine neurons fire below baseline when an expected reward is omitted. The prediction error interpretation of dopamine has received support from a wide range of studies, too numerous to review here (see Glimcher, 2011).

The TD model has also played an important role in the development of animal learning theory (Ludvig, Sutton, & Kehoe, 2012; Sutton & Barto, 1990). It can be seen as a "real-time" generalization of the Rescorla–Wagner model (which does not make predictions about intra-trial events), allowing the TD model to explain various phenomena outside the scope of the Rescorla–Wagner model (Rescorla & Wagner, 1972). For example, in trace conditioning, reward is delivered following an unfilled delay after the offset of cue A. Acquisition of a conditioned response is facilitated if another cue (B) is presented during the delay interval (Kehoe, 1982). According to the TD model, this facilitation occurs because cue B acquires positive value, which generates a large positive prediction error at the offset of cue A, thereby providing an amplified learning signal. TD learning provides a similar account of second-order conditioning: when cue A is paired with reward, and subsequently cue B is paired with cue A, cue B acquires the ability to elicit a conditioned response. According to the TD model, the prediction error is positive when cue B is paired with cue A (since cue A has a positive value), and this error signal drives learning of a positive value for cue B (Sutton & Barto, 1990).2

Despite these successes, the TD model is still essentially an implementation of Thorndike's law of effect, and hence fails to explain the phenomena discussed in the previous section, such as latent learning and goal-directed control. What is needed, as Tolman pointed out, is a "cognitive map." Model-based RL provides one possible formalization of how a cognitive map can be used to support goal-directed control (see also Reid & Staddon, 1998). Because model-based RL computes values on the fly, rather than retrieving cached estimates, it can immediately and flexibly respond to changes in rewards or transition probabilities, without having to back-propagate the TD error along an unbroken sequence of states.

It is worth noting here that some authors have proposed mechanisms for goal-directed control that are associative rather than model-based (de Wit & Dickinson, 2009; Elsner & Hommel, 2001). According to these theories, goal-directed control arises from associative links between stimuli, actions, and outcomes. Supporting evidence comes from studies showing that outcomes can activate the representations of actions that have caused the outcomes in the past (e.g., Elsner & Hommel, 2001). While these associative theories are not grounded in the formalism of RL, more recent ideas have begun to bridge the gap. In particular, Stachenfeld, Botvinick, and Gershman (2014) showed that one way to construct a cognitive map is to learn a predictive representation (Dayan, 1993), which is, in essence, an association between current and future states. This predictive representation can then be combined with a reward function to efficiently compute action values. In addition to reproducing some of the behavior typically attributed to a model-based system, the predictive representation can capture many aspects of the hippocampal cognitive map. Importantly, the predictive representation is not a causal model of the environment, in the sense that it cannot be given a causal Bayes net interpretation—it does not encode the transition function that governs the causal effect of actions on the environment. Rather, it can be understood as a kind of summary representation of the underlying causal system. Thus, it remains an open question whether goal-directed control requires a system that learns a causal model of the environment and uses it to formulate plans.

Transitions and Interactions Between the Systems

The transition from goal-directed to habitual behavior has been rationalized in terms of uncertainty-based arbitration between model-free and model-based RL (Daw et al., 2005). The idea is that each learning system keeps track of its uncertainty via Bayesian estimation of its values, and the system with lower uncertainty is given control of behavior. In the case of the model-free system, the uncertainty is dominated by the stochasticity of transitions, rewards, and actions (all sources of "statistical noise"). In the case of the model-based system, the uncertainty is dominated by "computational noise" induced by finite cognitive resources (e.g., truncation of tree search). Generally speaking, the model-free system requires considerably more experience to suppress its uncertainty to the level of the model-based system. On the other hand, the model-free system is much more computationally efficient, since values can be computed merely by inspecting the look-up table. Thus, the model-based system controls behavior early in learning, when the model-free values are mostly useless; later in learning, the model-free system takes control, when its values become more accurate (statistical noise is reduced through averaging).3
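The arbitration principle itself can be caricatured in a few lines. The numbers below are placeholders standing in for the Bayesian uncertainty estimates that Daw et al. (2005) actually compute, so this is an illustration of the selection rule only, not of their model.

```python
def arbitrate(value_mb, uncertainty_mb, value_mf, uncertainty_mf):
    """Give behavioral control to the system with lower uncertainty."""
    return value_mb if uncertainty_mb < uncertainty_mf else value_mf

# Early in learning: model-free values are still dominated by noise.
print(arbitrate(0.8, 0.2, 0.1, 0.9))    # model-based wins -> 0.8
# After extensive training: model-free noise has been averaged away.
print(arbitrate(0.8, 0.2, 0.82, 0.05))  # model-free wins -> 0.82
```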


The devaluation experiments described earlier (Adams, 1982; Dickinson, 1985) provide examples of this transition. For an animal that has been moderately trained on an instrumental learning task, the model-based system retains control of behavior (because its values are more accurate than those of the model-free system), and hence instrumental responding is sensitive to reinforcer devaluation. For an extensively trained animal, the model-free system (whose value estimates are now sufficiently accurate) assumes control of behavior, rendering instrumental control insensitive to devaluation.

Various factors can shift the balance between the two learning systems. For example, environments in which the reward and transition probabilities change quickly favor the model-free system (Simon & Daw, 2011). Placing people under working memory load also shifts control to the model-free system, presumably by diverting some of the cognitive resources upon which the model-based system depends (Otto, Gershman, Markman, & Daw, 2013). Concomitantly, working memory capacity predicts the degree to which behavior appears model-based (Otto, Raio, Chiang, Phelps, & Daw, 2013).

So far, the two learning systems have been treated as largely independent, interacting only in their competition for control of behavior. However, competition may not be their only form of interaction. Sutton (1990) proposed that the systems could also interact cooperatively; in this architecture, called Dyna (Figure 17.2), the model-based system was used to produce simulated experience from which the model-free system could then learn. Recently, behavioral evidence for this form of interaction has begun to emerge (Gershman, Markman, & Otto, 2014). For example, Gershman et al. (2014) showed that human subjects can make choices on the basis of model-based knowledge under conditions where the model-free system is ostensibly in control of behavior. The utilization of model-based knowledge by the model-free system can be enhanced by a brief period of quiescence (listening to a piece of classical music), consistent with the idea that the model-based system simulates experience "offline" in the service of model-free learning.

Figure 17.2 The Dyna architecture.
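The following is a minimal Dyna-style sketch in the same illustrative foraging environment: after every real transition, the agent records it in a learned model and then trains its model-free values on a handful of transitions replayed from that model. The number of simulated updates per real step is an arbitrary choice made for the example.

```python
import random

GAMMA, ALPHA, N_SIM = 0.9, 0.1, 10
STATES, ACTIONS = ["A", "B"], ["stay", "switch"]
REWARD = {"A": 1.0, "B": 0.5}

def env_step(s, a):
    nxt = s if a == "stay" else ("B" if s == "A" else "A")
    return nxt, REWARD[nxt]

def q_update(Q, s, a, r, nxt):
    delta = r + GAMMA * max(Q[(nxt, b)] for b in ACTIONS) - Q[(s, a)]
    Q[(s, a)] += ALPHA * delta

Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
model = {}                                  # learned (s, a) -> (r, s') map
s = "A"
for _ in range(1000):
    a = random.choice(ACTIONS)              # exploratory policy
    nxt, r = env_step(s, a)
    q_update(Q, s, a, r, nxt)               # learn from real experience
    model[(s, a)] = (r, nxt)                # update the learned model
    for _ in range(N_SIM):                  # learn from simulated experience
        (ps, pa), (pr, pn) = random.choice(list(model.items()))
        q_update(Q, ps, pa, pr, pn)
    s = nxt
```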

The Dyna architecture also sits well with the observation that the hippocampus appears to simulate spatial trajectories, leading to a corresponding simulation in the striatum, the putative seat of model-free learning (Lansink, Goltstein, Lankelma, McNaughton, & Pennartz, 2009).

If the model-free and model-based systems interact in this way, it may explain why model-based knowledge infiltrates reward prediction errors measured in the striatum (Daw, Gershman, Seymour, Dayan, & Dolan, 2011), a finding that is perplexing from the perspective of a competitive architecture. Various other possibilities for interactions between the two systems are discussed further in Daw and Dayan (2014).

Causal Knowledge and Partial Observability

Both model-free and model-based learning rely on a representation of state. However, the state representation that is relevant for obtaining rewards is often not the representation furnished by early sensory processing. Rather, the state must be inferred from sensory data. Formally speaking, this is a case of partial observability (Kaelbling, Littman, & Cassandra, 1998), where an agent only has access to the hidden state via noisy sensory data. If the hidden state obeys the Markov property, then we can call this environment a partially observable Markov decision process (POMDP). Bayes's rule can be employed to infer the posterior distribution over hidden states given sensory data, and this posterior distribution functions as a "belief state" in a fully observable MDP over which learning can operate (albeit in a higher-dimensional space). The belief state MDP has the appealing property that all the machinery of the previous section can be applied to this representation.
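The observation step of such a belief-state update fits in a few lines. In this sketch the hidden state is the forager's patch, and each patch emits an observable prey type with different probabilities; the emission probabilities are illustrative assumptions, and for brevity the sketch applies Bayes's rule to each observation while omitting the transition step of a full Bayes filter.

```python
# P(observation | hidden patch); the numbers are illustrative assumptions.
EMISSION = {"A": {"hare": 0.8, "mouse": 0.2},
            "B": {"hare": 0.3, "mouse": 0.7}}

def update_belief(belief, obs):
    """One application of Bayes's rule: prior times likelihood,
    renormalized to sum to one."""
    posterior = {s: belief[s] * EMISSION[s][obs] for s in belief}
    z = sum(posterior.values())            # normalization constant
    return {s: p / z for s, p in posterior.items()}

belief = {"A": 0.5, "B": 0.5}              # maximally uncertain prior
for obs in ["hare", "hare", "mouse"]:
    belief = update_belief(belief, obs)
    print(obs, {s: round(p, 3) for s, p in belief.items()})
```

The resulting belief vector is what would be handed to the RL machinery in place of the raw, unobservable state.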

From Hidden States to Latent Causes

One way to think about hidden state inference is in terms of latent causes: an MDP corresponds to a probabilistic causal model in which states and actions jointly cause rewards, transitions, and sensory data. In the foraging example presented earlier, choosing the "switch" action in Patch A causes a transition to Patch B and the receipt of reward; in a partially observable setting, the action would also cause the observation of sensory information (e.g., entering the patch causes the appearance of a prey type that is informative about which patch has just been entered). Hidden state inference is a form of causal reasoning in this model, and thus shares much in common with causal reasoning in other domains. For present purposes, the important point is that even the ostensibly "model-free" system utilizes these inferential computations, thus further blurring the sense in which such a system is truly model-free. One plausible possibility, suggested by several authors (Daw, Courville, & Touretzky, 2006; Rao, 2010), is that the belief state is computed by cortical circuitry late in the sensory processing stream, and then fed into subcortical circuits responsible for RL. Both model-based and model-free systems, in this scheme, rely on the same belief-state representation.

Rao (2010) has offered one neurobiologically detailed proposal for how this might work in the case of simple perceptual decisions about random dot motion. In the reaction-time version of the random dots task (Roitman & Shadlen, 2002), a subject must make a rapid binary decision (left/right) about the motion direction of randomly moving dots, where some fraction (the coherence) of the dots are moving in the same direction.

By changing the coherence of the dot motion, the experimenter can parametrically adjust the perceived motion strength, and this produces corresponding changes in discrimination accuracy (lower accuracy for low coherence) and response time (longer response times for low coherence). While on the surface the random dots task may not appear like a problem of latent causal inference, it resembles an ecologically valid problem faced by many animals. Imagine, for example, a lion moving through the savannah brush; its camouflage induces a noisy, fluctuating percept, with different points along the surface of the lion bound together by their common motion. The visual system must integrate the noisy motion information to discern the lion's direction of movement, the latent cause generating the sensory information.

According to Rao (2010), motion selective neurons in area MT report the momentary likelihood of sensory data (transmitted from early visual cortex) under different motion directions. The likelihoods are integrated over time in area LIP to compute the belief state (i.e., the posterior over motion directions), producing a ramping of activity as evidence accumulates (Gold & Shadlen, 2002). The striatum (a part of the basal ganglia) receives inputs from cortical regions (including LIP) and computes the Q-value, which then gets fed into midbrain circuits that compute the prediction error, reported in the form of dopamine release. The dopamine signal drives updating of the value function by modulating plasticity at cortico-striatal synapses (Reynolds & Wickens, 2002). Here the value function is defined over belief states and actions (motion direction judgments, typically registered by a saccadic response).

In addition to explaining how animals could learn to solve the random dots task, Rao's POMDP model offers a functional explanation of dopaminergic responses in the task. Nomoto, Schultz, Watanabe, and Sakagami (2010) found that when dot coherence is 60%, dopamine neurons ramped up their activity, peaking at the time of response. In the POMDP model, this occurs because the value is lowest at the highest entropy belief state (i.e., when the animal is completely uncertain), and increases rapidly as perceptual information reduces the entropy; because the prediction error tracks temporal differences in the value function, this results in the observed ramping pattern.4

Structure Learning

Any RL system operating in a real-world environment must not only perform hidden state inference, but must also discover the hidden states underlying its observations. This is a form of latent structure learning (Courville, Daw, & Touretzky, 2006; Gershman & Niv, 2010). In the rest of this section, I will describe several case studies illustrating how structure learning can explain various empirical lacunae that have troubled RL theories.

Consider a Pavlovian fear conditioning experiment, in which a cue is repeatedly paired with an aversive outcome (e.g., a shock). Over the course of training, the cue will come to elicit an innate fear response (freezing, in the case of rat subjects). If the cue is subsequently extinguished, by presenting it repeatedly without a shock, the fear response will subside. If states have a one-to-one mapping with cues, then standard RL theory predicts that extinction produces unlearning of the (negative) value acquired during training.

However, this prediction is problematic, because a variety of assays demonstrate that the fear memory persists despite extinction, and will re-emerge under certain circumstances (Bouton, 2004). Pavlov (1927) demonstrated that simply presenting the cue again after a retention interval was sufficient to elicit conditioned responding, a phenomenon known as spontaneous recovery (Rescorla, 2004). In another procedure, known as reinstatement, exposing the subject to an isolated shock before testing can lead to conditioned responding to the subsequently presented cue (Rescorla & Heth, 1975). These phenomena indicate that the states are not identical with cues—rather, states are latent and must be inferred. The problem is made difficult by the fact that nothing tells the observer how many states exist or what their properties are; hence these must be inferred as well.

A principled approach to this problem can be derived by appealing to ideas from Bayesian non-parametrics, a field of statistics that deals with inference over latent structures with unbounded complexity (Gershman & Blei, 2012). Recent work has developed models of Pavlovian conditioning that use Bayesian non-parametric priors over latent causes, allowing the model to simultaneously infer the number and properties of the latent causes (Gershman et al., 2010; Gershman & Niv, 2012; Soto, Gershman, & Niv, 2014). Interested readers are referred to these papers for more details; here I will simply convey a few examples of how these models are applied (see also Redish, Jensen, Johnson, & Kurth-Nelson, 2007, for a related, non-probabilistic approach).

To a first approximation, a latent cause model (Figure 17.3) is a good representation of the true causal structure underlying Pavlovian conditioning experiments. Cues do not cause outcomes—the experimenter causes both cues and outcomes. That is, the experimenter is a latent cause. This shows why it is useful to think about hidden states in terms of latent causes rather than simply as expedient mental constructs. As in other domains of cognition, rational analysis leads us to hypothesize that the mind has evolved the capacity to learn about and represent the underlying causal structure of the environment (Anderson, 1990).
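To give a feel for what a Bayesian non-parametric prior over latent causes looks like, here is a sketch of the Chinese restaurant process, a standard such prior: each trial joins an existing latent cause with probability proportional to how often that cause has been used, or creates a new cause with probability proportional to a concentration parameter. The parameter value below is an illustrative assumption, and the sketch shows only the prior, not the full posterior inference these models perform.

```python
import random

ALPHA = 1.0   # concentration: larger values favor inferring new causes

def crp_sample(n_trials, alpha=ALPHA):
    """Sample latent cause assignments for a sequence of trials."""
    counts, assignments = [], []
    for _ in range(n_trials):
        weights = counts + [alpha]        # existing causes, plus a new one
        r, k = random.uniform(0, sum(weights)), 0
        while r > weights[k]:
            r -= weights[k]
            k += 1
        if k == len(counts):
            counts.append(0)              # a brand-new latent cause is created
        counts[k] += 1
        assignments.append(k)
    return assignments

print(crp_sample(20))   # e.g., [0, 0, 1, 0, 2, 1, ...]
```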



Figure 17.3 The latent cause theory. Each box represents the animal's observations on a single trial. The circles represent latent causes, labeled to distinguish different causes. The upward arrows denote probabilistic dependencies: observations are assumed to be generated by latent causes. The animal does not get to observe the latent causes; it must infer these by inverting the generative model using Bayes's rule, as indicated by the downward arrow. As shown at the top of the schematic, Bayes's rule gives the probability of latent causes conditional on observations, which is obtained (up to a normalization constant) by multiplying the probability of observations given hypothetical causes (the likelihood) and the probability of the hypothetical latent causes (the prior).

Gershman et al. (2010) argued that memory recovery following extinction occurs because training and extinction trials are assigned to separate latent causes. This partition of trials into latent causes prevents unlearning of the fear memory during extinction, allowing it to return later. The theory predicts that performing training and extinction in different contexts will increase the probability of assigning them to separate latent causes. Bouton and Bolles (1979) confirmed this prediction, showing that returning the subject to the training context increases conditioned responding (an effect known as renewal).

One can also reverse the order of training and extinction, so that the extinction phase becomes a "pre-exposure" phase, causing a retardation of learning during training known as latent inhibition (Lubow, 1973). This phenomenon is interesting because there is no reward prediction error during the pre-exposure phase (assuming that values are initialized to 0), and hence no learning signal according to the TD model. The latent cause model, on the other hand, naturally explains latent inhibition in terms of changes in the joint probability of cues and outcomes (Gershman et al., 2010). Latent inhibition is also context sensitive: performing pre-exposure and training in different contexts attenuates the latent inhibition effect (Hall & Honey, 1989). Differential context, according to the latent cause model, increases the posterior probability that the two phases were generated by separate latent causes (Gershman et al., 2010).

One might object that positing latent causes is superfluous when the different contexts are distinguished by observable stimuli. Context-dependency could therefore be captured by assuming that context acts as another cue, so that context effects are a form of compound conditioning. However, this assumption runs into the problem that contexts do not act like punctate cues such as tones and lights. Contexts do not summate with other cues: pairing a previously conditioned context with a cue does not enhance responding compared to a condition in which the cue is presented alone (Bouton & Swartzentruber, 1986), and pairing an extinguished context with a cue does not suppress conditioning to the cue (Bouton & Bolles, 1979). In a similar vein, contexts do not excite conditioned responding on their own (Bouton & Swartzentruber, 1986). These findings support the proposal that contexts are modulatory in nature (Swartzentruber, 1995). At present, it is not clear that existing latent cause theories can adequately account for the modulatory role of context, but the findings at least cast doubt on a simple compound conditioning account.

The context-dependency of renewal and latent inhibition both rely on an intact hippocampus (Honey & Good, 1993; Ji & Maren, 2005), leading Gershman et al. (2010) to suggest that the ability to flexibly infer new latent causes depends crucially on the hippocampus. This suggestion fits with the work (reviewed earlier) characterizing the hippocampus as the seat of the "cognitive map," but in this case the inferred latent causes might feed into both model-based and model-free RL. Young rats also appear to lack context-dependent renewal and latent inhibition (Yap & Richardson, 2005, 2007), possibly due to immature hippocampal development.

Another factor that influences the assignment of trials to latent causes is reinforcement rate. A classic finding in Pavlovian conditioning is the partial reinforcement extinction effect: partially reinforcing the cue during training results in slower extinction (Capaldi, 1957; Wagner, Siegel, Thomas, & Ellison, 1964). This is surprising because standard RL models predict that partial reinforcement will produce a weaker value estimate that can be extinguished more easily. The latent cause model, in contrast, offers an intuitive explanation: slower extinction occurs because similar reinforcement rates during training and extinction provide evidence that the two phases were generated by the same latent cause (Courville et al., 2006; Gershman & Niv, 2012).

Gershman, Jones, Norman, Monfils, and Niv (2013) took this idea one step further and examined the effects of manipulating the reinforcement sequence. The logic of these studies was that large prediction errors during extinction induce the inference of a new latent cause. Thus, extinguishing gradually (by incrementally reducing the frequency with which a cue was paired with shock) should prevent the prediction errors from being large enough to induce the inference of a new latent cause, while being small enough to drive unlearning of the fear memory. The gradual extinction procedure was compared to a standard extinction procedure and a "gradual reverse" control, in which the cue and shock were paired with the same probability as in the gradual extinction condition but in reverse order (i.e., gradually increasing). All the conditions had a buffer of eight unreinforced trials at the end of extinction to ensure that conditioned responding fell to the same level across groups. Despite similar responding at the end of extinction, the groups differed strikingly in their recovery: while both the standard and gradual reverse groups showed spontaneous recovery and reinstatement, the gradual extinction group showed no evidence of recovery. This finding is consistent with the interpretation that gradual extinction led to a single latent cause assignment for both training and extinction.


These are a few examples of how latent cause models can address the problem of latent structure learning in partially observable domains. Undoubtedly, the models reviewed here are simplistic in a number of ways, and other versions attempt to address these shortcomings. For example, both Courville et al. (2006) and Soto et al. (2014) explored versions allowing multiple latent causes to be simultaneously active. Lloyd and Leslie (2013) have developed a version of the latent cause model that deals with a variety of complex instrumental learning phenomena. An important open question is how these approaches can be more tightly integrated into the RL formalism reviewed earlier, and ideally furnished with detailed neurobiological correlates.

Conclusions

In this chapter, I have argued that causal knowledge plays several roles in RL. First, model-based RL involves building a causal model of the environment and using this model to compute values. Second, both model-based and model-free RL rely upon inferences about latent causes in partially observable domains.

For many cognitive psychologists, RL has the inescapable odor of behaviorist ideology, and indeed traditional model-free RL enshrines this ideology by embracing Thorndike's law of effect. However, my hope is that this chapter conveys some of the ways in which theoretical ideas about RL have evolved beyond the law of effect. Moreover, some of the same formalisms invoked here appear throughout cognitive psychology. In particular, the probabilistic approach to causal learning and structure discovery has played a prominent role in the "rational analysis" of cognition (Anderson, 1990; Tenenbaum, Kemp, Griffiths, & Goodman, 2011). Modern theories of RL are now firmly ensconced in the cognitive fold.

Acknowledgments

I am grateful to the many collaborators who have influenced my thinking about these topics, in particular Nathaniel Daw, Yael Niv, Peter Dayan, Fabian Soto, and Ross Otto. This research was supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via Air Force Research Laboratory (AFRL), under contract FA8650-14-C-7358. The views and conclusions contained herein are those of the author and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of ODNI, IARPA, AFRL, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon.

References

Adams, C. D. (1982). Variations in the sensitivity of instrumental responding to reinforcer devaluation. The Quarterly Journal of Experimental Psychology, 34, 77–98.


Anderson, J. R. (1990). The adaptive character of thought. Hillsdale, NJ: Lawrence Erlbaum Associates.

Balleine, B. W., & Dickinson, A. (1998). Goal-directed instrumental action: contingency and incentive learning and their cortical substrates. Neuropharmacology, 37, 407–419.

Bayer, H. M., & Glimcher, P. W. (2005). Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47, 129–141.

Bellman, R. (1957). Dynamic programming. Princeton, NJ: Princeton University Press.

Bouton, M. (2004). Context and behavioral processes in extinction. Learning & Memory, 11, 485–494.

Bouton, M., & Bolles, R. (1979). Contextual control of the extinction of conditioned fear. Learning and Motivation, 10, 445–466.

Bouton, M., & Swartzentruber, D. (1986). Analysis of the associative and occasion-setting properties of contexts participating in a Pavlovian discrimination. Journal of Experimental Psychology: Animal Behavior Processes, 12, 333–350.

Browne, C. B., Powley, E., Whitehouse, D., Lucas, S. M., Cowling, P. I., Rohlfshagen, P., … Colton, S. (2012). A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4, 1–43.

Capaldi, E. (1957). The effect of different amounts of alternating partial reinforcement on resistance to extinction. The American Journal of Psychology, 70, 451–452.

Courville, A. C., Daw, N. D., & Touretzky, D. S. (2006). Bayesian theories of conditioning in a changing world. Trends in Cognitive Sciences, 10, 294–300.

Daw, N. D., Courville, A. C., & Touretzky, D. S. (2006). Representation and timing in theories of the dopamine system. Neural Computation, 18, 1637–1677.

Daw, N. D., & Dayan, P. (2014). The algorithmic anatomy of model-based evaluation. Philosophical Transactions of the Royal Society B: Biological Sciences, 369, 20130478.

Daw, N. D., Gershman, S. J., Seymour, B., Dayan, P., & Dolan, R. J. (2011). Model-based influences on humans' choices and striatal prediction errors. Neuron, 69, 1204–1215.

Daw, N. D., Niv, Y., & Dayan, P. (2005). Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nature Neuroscience, 8, 1704–1711.

Dayan, P. (1993). Improving generalization for temporal difference learning: The successor representation. Neural Computation, 5, 613–624.

De Groot, A. D. (1978). Thought and choice in chess. The Hague: Mouton Publishers.


de Wit, S., & Dickinson, A. (2009). Associative theories of goal-directed behaviour: a case for animal–human translational models. Psychological Research, 73, 463–476.

Dickinson, A. (1985). Actions and habits: the development of behavioural autonomy. Philosophical Transactions of the Royal Society of London. B, Biological Sciences, 308, 67–78.

Dolan, R. J., & Dayan, P. (2013). Goals and habits in the brain. Neuron, 80, 312–325.

Eichenbaum, H. (2004). Hippocampus: Cognitive processes and neural representations that underlie declarative memory. Neuron, 44, 109–120.

Elsner, B., & Hommel, B. (2001). Effect anticipation and action control. Journal of Experimental Psychology: Human Perception and Performance, 27, 229–240.

Fanselow, M. S. (1990). Factors governing one-trial contextual conditioning. Animal Learning & Behavior.

Frederick, S., Loewenstein, G., & O'Donoghue, T. (2002). Time discounting and time preference: A critical review. Journal of Economic Literature, 40, 351–401.

Gershman, S. J. (2014). Dopamine ramps are a consequence of reward prediction errors. Neural Computation, 26, 467–471.

Gershman, S. J., & Blei, D. M. (2012). A tutorial on Bayesian nonparametric models. Journal of Mathematical Psychology, 56, 1–12.

Gershman, S. J., Blei, D. M., & Niv, Y. (2010). Context, learning, and extinction. Psychological Review, 117, 197–209.

Gershman, S. J., Horvitz, E. J., & Tenenbaum, J. B. (2015). Computational rationality: A converging paradigm for intelligence in brains, minds, and machines. Science, 349, 273–278.

Gershman, S. J., Jones, C. E., Norman, K. A., Monfils, M.-H., & Niv, Y. (2013). Gradual extinction prevents the return of fear: Implications for the discovery of state. Frontiers in Behavioral Neuroscience, 7, 164.

Gershman, S. J., Markman, A. B., & Otto, A. R. (2014). Retrospective revaluation in sequential decision making: A tale of two systems. Journal of Experimental Psychology: General, 143, 182–194.

Gershman, S. J., & Niv, Y. (2010). Learning latent structure: Carving nature at its joints. Current Opinion in Neurobiology, 20, 251–256.

Gershman, S. J., & Niv, Y. (2012). Exploring a latent cause theory of classical conditioning. Learning & Behavior, 40, 255–268.


Glimcher, P. W. (2011). Understanding dopamine and reinforcement learning: The dopamine reward prediction error hypothesis. Proceedings of the National Academy of Sciences, 108, 15647–15654.

Gold, J. I., & Shadlen, M. N. (2002). Banburismus and the brain: Decoding the relationship between sensory stimuli, decisions, and reward. Neuron, 36, 299–308.

Hall, G., & Honey, R. C. (1989). Contextual effects in conditioning, latent inhibition, and habituation: Associative and retrieval functions of contextual cues. Journal of Experimental Psychology: Animal Behavior Processes, 15, 232–241.

Hasselmo, M. E. (2012). How we remember: Brain mechanisms of episodic memory. Cambridge, MA: MIT Press.

Holding, D. H., & Pfau, H. D. (1985). Thinking ahead in chess. The American Journal of Psychology, 98, 271–282.

Honey, R. C., & Good, M. (1993). Selective hippocampal lesions abolish the contextual specificity of latent inhibition and conditioning. Behavioral Neuroscience, 107, 23–33.

Huys, Q. J., Eshel, N., O'Nions, E., Sheridan, L., Dayan, P., & Roiser, J. P. (2012). Bonsai trees in your head: How the Pavlovian system sculpts goal-directed choices by pruning decision trees. PLoS Computational Biology, 8, e1002410.

Huys, Q. J., Lally, N., Faulkner, P., Eshel, N., Seifritz, E., Gershman, S. J., … Roiser, J. P. (2015). Interplay of approximate planning strategies. Proceedings of the National Academy of Sciences, 112, 3098–3103.

Ji, J., & Maren, S. (2005). Electrolytic lesions of the dorsal hippocampus disrupt renewal of conditional fear after extinction. Learning and Memory, 12, 270–276.

Kaelbling, L. P., Littman, M. L., & Cassandra, A. R. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101, 99–134.

Kehoe, E. J. (1982). Conditioning with serial compound stimuli: Theoretical and empirical issues. Experimental Behavior, 1, 30–65.

Keramati, M., Dezfouli, A., & Piray, P. (2011). Speed/accuracy trade-off between the habitual and the goal-directed processes. PLoS Computational Biology, 7, e1002055.

Kiernan, M., & Westbrook, R. (1993). Effects of exposure to a to-be-shocked environment upon the rat's freezing response: Evidence for facilitation, latent inhibition, and perceptual learning. The Quarterly Journal of Experimental Psychology, 46, 271–288.

Lansink, C. S., Goltstein, P. M., Lankelma, J. V., McNaughton, B. L., & Pennartz, C. M. (2009). Hippocampus leads ventral striatum in replay of place-reward information. PLoS Biology, 7, e1000173.


Lloyd, K., & Leslie, D. S. (2013). Context-dependent decision-making: A simple Bayesian model. Journal of The Royal Society Interface, 10, 20130069.

Lubow, R. E. (1973). Latent inhibition. Psychological Bulletin, 79, 398–407.

Ludvig, E. A., Sutton, R. S., & Kehoe, E. J. (2012). Evaluating the TD model of classical conditioning. Learning & Behavior, 40, 305–319.

Mackintosh, N. (1975). A theory of attention: Variations in the associability of stimuli with reinforcement. Psychological Review, 82, 276–298.

Niv, Y. (2009). Reinforcement learning in the brain. Journal of Mathematical Psychology, 53, 139–154.

Niv, Y., & Schoenbaum, G. (2008). Dialogues on prediction errors. Trends in Cognitive Sciences, 12, 265–272.

Nomoto, K., Schultz, W., Watanabe, T., & Sakagami, M. (2010). Temporally extended dopamine responses to perceptually demanding reward-predictive stimuli. The Journal of Neuroscience, 30, 10692–10702.

O'Keefe, J., & Nadel, L. (1978). The hippocampus as a cognitive map. Oxford: Clarendon Press.

Otto, A. R., Gershman, S. J., Markman, A. B., & Daw, N. D. (2013). The curse of planning: Dissecting multiple reinforcement-learning systems by taxing the central executive. Psychological Science, 24, 751–761.

Otto, A. R., Raio, C. M., Chiang, A., Phelps, E. A., & Daw, N. D. (2013). Working-memory capacity protects model-based learning from stress. Proceedings of the National Academy of Sciences, 110, 20941–20946.

Pavlov, I. (1927). Conditioned reflexes. Oxford: Oxford University Press.

Pearce, J. M. (1980). A model for Pavlovian learning: Variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychological Review, 87, 532–552.

Rao, R. P. (2010). Decision making under uncertainty: A neural model based on partially observable Markov decision processes. Frontiers in Computational Neuroscience, 4, 146.

Redish, A. D., Jensen, S., Johnson, A., & Kurth-Nelson, Z. (2007). Reconciling reinforcement learning models with behavioral extinction and renewal: Implications for addiction, relapse, and problem gambling. Psychological Review, 114, 784–805.

Reid, A. K., & Staddon, J. (1998). A dynamic route finder for the cognitive map. Psychological Review, 105, 585–601.

Rescorla, R. A. (2004). Spontaneous recovery. Learning & Memory, 11, 501–509.


Rescorla, R. A., & Heth, C. D. (1975). Reinstatement of fear to an extinguished conditioned stimulus. Journal of Experimental Psychology: Animal Behavior Processes, 1, 88–96.

Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. Black & W. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64–99). New York: Appleton-Century-Crofts.

Reynolds, J. N., & Wickens, J. R. (2002). Dopamine-dependent plasticity of corticostriatal synapses. Neural Networks, 15, 507–521.

Roitman, J. D., & Shadlen, M. N. (2002). Response of neurons in the lateral intraparietal area during a combined visual discrimination reaction time task. The Journal of Neuroscience, 22, 9475–9489.

Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275, 1593–1599.

Schultz, W., & Dickinson, A. (2000). Neuronal coding of prediction errors. Annual Review of Neuroscience, 23, 473–500.

Simon, D. A., & Daw, N. D. (2011). Environmental statistics and the trade-off between model-based and TD learning in humans. In Advances in neural information processing systems (pp. 127–135).

Soto, F. A., Gershman, S. J., & Niv, Y. (2014). Explaining compound generalization in associative and causal learning through rational principles of dimensional generalization. Psychological Review, 121, 526–558.

Stachenfeld, K. L., Botvinick, M., & Gershman, S. J. (2014). Design principles of the hippocampal cognitive map. In Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, & K. Weinberger (Eds.), Advances in neural information processing systems 27 (pp. 2528–2536). Curran Associates.

Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9–44.

Sutton, R. S. (1990). Integrated architecture for learning, planning, and reacting based on approximating dynamic programming. In Proceedings of the seventh international conference (1990) on machine learning (pp. 216–224).

Sutton, R. S., & Barto, A. G. (1990). Time-derivative models of Pavlovian reinforcement. In M. Gabriel & J. Moore (Eds.), Learning and computational neuroscience: Foundations of adaptive networks (pp. 497–537). Cambridge, MA: MIT Press.

Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.

Swartzentruber, D. (1995). Modulatory mechanisms in Pavlovian conditioning. Animal Learning & Behavior, 23, 123–143.

Tenenbaum, J. B., Kemp, C., Griffiths, T. L., & Goodman, N. D. (2011). How to grow a mind: Statistics, structure, and abstraction. Science, 331, 1279–1285.

Thorndike, E. L. (1911). Animal intelligence: Experimental studies. New York: Macmillan.

Tolman, E. C. (1948). Cognitive maps in rats and men. Psychological Review, 55, 189–208.

Wagner, A., Siegel, S., Thomas, E., & Ellison, G. (1964). Reinforcement history and the extinction of conditioned salivary response. Journal of Comparative and Physiological Psychology, 58, 354–358.

Watkins, C. J., & Dayan, P. (1992). Q-learning. Machine Learning, 8, 279–292.

Yap, C. S., & Richardson, R. (2005). Latent inhibition in the developing rat: An examination of context-specific effects. Developmental Psychobiology, 47, 55–65.

Yap, C. S., & Richardson, R. (2007). Extinction in the developing rat: An examination of renewal effects. Developmental Psychobiology, 49, 565–575.

Notes:

(1.) In practice, storing values in a look-up table for MDPs with many states is inefficient. For this reason, most algorithms use some form of function approximation (Sutton & Barto, 1998).

(2.) Note that this analysis assumes that the association between B and the absence of reward is not encoded (see Gershman, Blei, & Niv, 2010, for more discussion of this point).

(3.) According to a related account, the transition from model-based to model-free control can be understood in terms of experience.

(4.) More precisely, ramps will occur when the value function is a convex function of the state representation (Gershman, 2014).

Samuel J. Gershman

Department of Psychology and Center for Brain Science, Harvard University, Cambridge, Massachusetts, USA



Causation and the Probability of Causal Conditionals   David E. Over The Oxford Handbook of Causal Reasoning Edited by Michael R. Waldmann Print Publication Date: Jun 2017 Subject: Psychology, Cognitive Psychology Online Publication Date: May 2017 DOI: 10.1093/oxfordhb/9780199399550.013.3

Abstract and Keywords

Indicative and counterfactual conditionals are central to reasoning in general and causal reasoning in particular. Normative theorists and psychologists have held a range of views on how natural language indicative and counterfactual conditionals, and probability judgments about them, are related to causation. There is the question of whether "causal" conditionals, referring to possible causes and effects, can be used to explain causation, or whether causation can be used to explain the conditionals. There are questions about how causation, conditionals, Bayesian inferences, conditional probability, and imaging are related to each other. Psychological results are relevant to these questions, including findings on how people make conditional inferences and judgments about possibilities, conditionals, and conditional probability. Deeper understanding of the relation between causation and conditionals will come in further research on people's reasoning from counterfactuals as premises, and to counterfactuals as conclusions.

Keywords: causation, causal conditionals, indicative conditionals, counterfactuals, conditional probability, Bayesian inference, imaging

There is a close relation in natural language between conditionals and statements about causation. Philosophers and logicians, and more recently psychologists, have tried to establish the exact nature of this relationship. It is clear that people often use certain types of conditionals, in ordinary and scientific communication, to refer to possible causes and effects. For example, doctors might use an indicative conditional to make a prediction about a patient:

(1) If he continues to gain weight, he will get hypertension.

When the doctors are less confident that the patient will continue to gain weight, they might use a subjunctive conditional or counterfactual:

(2) If he were to continue to gain weight, he would get hypertension.


The doctors might also assert this subjunctive conditional or counterfactual:

(3) If he were not to gain weight, he would not get hypertension.

Conditionals like (1)–(3) refer to possible events in their antecedents that could be the cause of the events referred to in the consequents: they are of the form if cause then effect. We can label these causal conditionals, but by doing so, we must be careful not to beg the question of exactly how they are related to causation. It may be that conditionals like (1), or (2), together with (3), should be used to explain causation, or perhaps it is the other way around, and causation should be used in an account of these conditionals. Studying the psychology of causal conditionals must at least reveal a great deal about people's understanding of causation.

Many psychologists use the term "counterfactual" for a subjunctive conditional the antecedent and consequent of which are definitely known, or presupposed, to be false. Assume that the patient continues to overeat, gains weight, and gets hypertension. The doctors might then use the past tense subjunctive conditional about the patient:

(4) If he had not gained weight, he would not have gotten hypertension.

By using (4), the doctors strongly suggest that the patient has gained weight and has gotten hypertension. It is (4), and not (2) and (3), and still less (1), that many psychologists would classify as a "counterfactual" (Thompson & Byrne, 2002). This narrow use of "counterfactual" is suited to research on certain problems in judgment and decision-making, developmental psychology, and social psychology. Counterfactuals in this narrow sense can be used to express emotions, especially of regret, but also other emotions (Kahneman & Miller, 1986; Teigen, 2005; Zeelenberg & van Dijk, 2005). The patient's doctors might use (4) when expressing their regret that they did not do more to prevent the patient's weight gain. There is also interest in developmental psychology in how children learn these "counterfactuals," with antecedents and consequents known to be false, and their associated emotions (Beck, Riggs, & Burn, 2011; Perner & Rafetseder, 2011).

Most linguists, logicians, and philosophers do not use "counterfactual" in this narrow way (Woodward, 2011). They would classify (2), (3), and (4) together as "counterfactuals" in a wide sense, and some of them would also classify indicative conditionals like (1) with these counterfactuals, logically and semantically, if not pragmatically (Bennett, 2003; Kaufmann, 2013; Kratzer, 2012). We will use "subjunctive conditional" and "counterfactual" interchangeably for conditionals like (2)–(4) in this chapter. This use is better suited to an investigation of the relation between conditionals and causation. This relation can hardly depend on whether the antecedent and consequent of the conditional are known, or presupposed, to be false by someone who asserts the conditional. A statement about weight gain in the patient causing hypertension can be made in a context in which it is not definitely known that the patient will, or will not, continue to gain weight and get hypertension. Even so, we will sometimes refer to "narrow" counterfactuals, like (4), which have a secondary role in the study of conditionals and causation.


When event p has come before event q, the question can arise whether p caused q, and then it is relevant to ask whether, if p had not occurred, q would not have occurred.

Philosophers have long examined the special relation between causation and counterfactuals (Menzies, 2014). More recently, cognitive scientists and psychologists have joined in (Mandel, 2005; Sloman, 2013; Spellman, Kincannon, & Stose, 2005). Some philosophers have also long argued for a counterfactual analysis of causation, which states that conditionals like (2) and (3) are more fundamental than causation and can be used to analyze its meaning (Lewis, 1973a). In these theories, what it means to state that gaining weight will cause hypertension, or probably cause hypertension, in the patient is that (2) and (3) are both true, or both probably true. When the patient has gained weight and has gotten hypertension, what it means to state that the weight gain caused the hypertension, or probably caused the hypertension, is that (4) is true, or probably true (Menzies, 2014).

Bayesian accounts of thinking and reasoning are having a profound effect on the psychology of reasoning, and on cognitive psychology in general (Chater & Oaksford, 2008; Elqayam & Over, 2013; Over & Cruz, in press). Bayesians hold that almost all thinking and reasoning, including about causation and conditionals, takes place in contexts of at least some uncertainty, where people have to make subjective probability judgments. The doctors in our example would not be absolutely certain of (1)–(4), but would have some lower degree of confidence in these statements and might even use "probably" to qualify them explicitly:

(5) If he continues to gain weight, he will probably get hypertension.

(6) If he were not to continue to gain weight, he would probably not get hypertension.

There is the probability of causal conditionals of the form if cause then effect, illustrated by (5) and (6), and also of diagnostic conditionals of the form if effect then cause (Oaksford & Chater, 2010). Diabetes can cause a high blood glucose level, and doctors can assert the diagnostic conditional that, if the patient has hyperglycemia, then he probably has diabetes. Other types of conditional can be identified, and the probabilities of some of these have been studied in the psychology of reasoning (Bonnefon & Sloman, 2013; Douven & Verbrugge, 2010; Ohm & Thompson, 2006; Politzer, Over, & Baratgin, 2010), but we will mostly focus in this chapter on causal conditionals, with some attention to diagnostic conditionals. To understand the relation between natural language conditionals and causation in people's judgments, it is necessary to have an account of the conditionals and their probabilities. We will begin with some basic points about conditional reasoning, and will then introduce and examine theories of the conditional that have most influenced psychological research.



Modus Ponens

Conditionals are central, not only to causal reasoning, but to all reasoning above the simplest level, and studying the use of causal conditionals in inferences has to be part of determining the exact relation between causation and conditionals. There are a great many psychological studies of forms of inference for the indicative conditional, but very few of inference forms for counterfactuals (Evans & Over, 2004). In fact, it should be a cause of some embarrassment and regret in the psychology of reasoning that there are so few of these studies compared to the substantial literature in other parts of psychology on counterfactuals and the emotions.

The archetypal inference form in conditional reasoning, whether causal or other, is modus ponens (MP): inferring the conclusion q from the two premises if p then q (the major premise) and p (the minor premise). Thompson and Byrne (2002) have results for MP with narrow sense counterfactuals as major premises that reveal an important point about them. The endorsement rate of MP as a valid inference for indicative conditionals is close to 100% when participants in an experiment are asked to assume its two premises, if p then q and p, and to say whether its conclusion q necessarily follows (Evans & Over, 2004). One possible position on narrow sense counterfactuals, like (4), is that they logically imply that their antecedents and consequents are actually false, that in the actual state of affairs, the patient did gain weight and get hypertension (Bennett, 2003). If this position were correct, MP for a counterfactual like (4) would present participants with contradictory premises. The minor premise, that the patient did not in fact gain weight, would contradict the major premise (4) itself, and the resulting uncertainty about these premises could make the endorsement rate fall. Thompson and Byrne, however, found a high endorsement rate for MP with narrow sense counterfactuals like (4) as the major premise. Somewhat more formally, participants were happy to infer q from the major premise if p had been the case then q would have been the case and the minor premise p is actually the case. This result supports the conclusion that if p had been the case then q would have been the case does not logically imply not-p, but that not-p is only a pragmatic suggestion or presupposition by the speaker who uses the counterfactual.

Thompson and Byrne used a dialogue technique to counteract any pragmatic problems with the assertion of a counterfactual. Applying this to our example, we could imagine that one doctor might assert (4) after losing touch with the patient and becoming unaware of his latest lifestyle. But another doctor could add the minor premise, "But after a struggle with his diet, the patient did not gain weight." The first doctor would then doubtless infer by MP from (4) that, in that case, the patient did not get hypertension.

Another interesting finding in Thompson and Byrne (2002) is on the relative endorsement rates of MP and another valid inference, modus tollens (MT): inferring the conclusion not-p from the major premise if p then q and the minor premise not-q, or equivalently inferring p (not-not-p) from the major premise if not-p then not-q and the minor premise q (not-not-q). With an indicative conditional as the major premise, the endorsement rate for MP is significantly higher than the endorsement rate for MT (Evans & Over, 2004).

is significantly higher than the endorsement rate for MT (Evans & Over, 2004). But for narrow sense counterfactuals, Thompson and Byrne found that the endorsement rate of MT was higher and more like that for MP. This result can also be explained by the pragmatic suggestions or presuppositions of these counterfactuals. When (4) is used, the pragmatic background is that the patient did gain weight and got hypertension. People can use their beliefs about this suggested or presupposed background to respond that the patient gained weight when they are given the premise that he got hypertension. They do not, subjectively, have to employ the MT inference form at all; they have only to use their pragmatic background knowledge or beliefs.

In traditional psychological experiments on MP, participants were asked to assume that the premises if p then q and p were true and then to state whether the conclusion q validly, or necessarily, followed under these assumptions. These are traditional deductive reasoning instructions: the participants are asked to deduce whether a conclusion necessarily follows from assumptions they are asked to make (Evans, Handley, Neilens, & Over, 2010; Evans & Over, 2013). But even with instructions to assume the premises, the endorsement rate of MP can fall from its very high level when extra premises are introduced, as Byrne (1989) found. Consider (1) again, and suppose that the additional conditional premise "If the patient gains weight in muscle, he will not get hypertension" is added to (1), along with the minor premise that the patient will gain weight. The additional conditional premise here introduces uncertainty about the major premise for MP, and the endorsement rate could fall (Politzer, 2005; Stevenson & Over, 1995). In a Bayesian analysis, a limitation of experiments like those in Byrne (1989) is that they ask the participants to do something unnatural and, in effect, contradictory: to assume that some arbitrarily given premises are true, but also effectively to doubt the conditional that is to be used as the major premise of MP (and of other inference forms studied).

For the Bayesian, studying reasoning under arbitrary assumptions with a binary conclusion, about validity or invalidity, is not directly relevant to most reasoning in ordinary or even scientific contexts. Ordinary people generally make inferences from beliefs that they hold with some reasonable degree of confidence, and scientists from hypotheses that they judge have some reasonable "prior" probability of holding. Neither ordinary people nor scientists are much concerned with the type of arbitrary assumptions that they might be asked to make in a traditional reasoning experiment. The doctors in our example can use MP to become fairly confident that the patient will get hypertension after they have asserted (1) with confidence and have learned that he has continued to gain weight. But they cannot make rational decisions or predictions about their patients if they reason from assumptions instead of beliefs that they hold with some well-justified confidence. After all, they could, if they wished, simply assume that, if their patient continues to gain weight, then he will not get hypertension. But clearly, making inferences from this mere assumption about their patient would not help them


make rational decisions about his treatment or well-justified predictions about whether he will get hypertension.

Bayesians do not question the validity of MP, but even in a valid inference, the conclusion does not have to be true if even one of the premises from which it is inferred is false. The conclusion of a valid inference can be false when one of its premises is false, and the conclusion can also be uncertain when one of the premises is uncertain. It might turn out that (1) is actually false. Perhaps the patient continues to gain some weight but does this in a "good" way. He might adopt a healthy diet and get much more exercise, replacing fat with muscle and gaining some weight overall. He becomes highly physically fit and not at all in danger of hypertension. A possibility that can make a conditional like (1) false is called a disabling condition. When it is uncertain whether a disabling condition exists or will occur, the conditional can be uncertain to some degree. The patient's doctors might not be able to rule out the possibility that he will gain weight with muscle instead of fat, and they might not know whether other disabling conditions hold, like his having genetic protection against hypertension. They will not then have full confidence in (1), but will have some tendency to assert (5). Notice as well that (3) has possible disabling conditions; for instance, the patient might have a genetic predisposition to hypertension, whatever his weight, and perhaps the qualified (6) should be asserted. But supposing the doctors learn that the patient has continued to gain weight (without learning how he did this), they will only conclude with fairly high, and not full, confidence that he will get hypertension. In these circumstances, the doctors are absolutely right, by Bayesian standards, to have some doubt about the conclusion of MP, for all the validity of the inference.

The Bayesian position implies that the uncertainty of causal conditionals will affect confidence in the conclusions of causal reasoning. There is experimental support for this implication. The results so far only concern causal indicative conditionals, and not counterfactuals, but nevertheless support the Bayesian conclusion that most reasoning, including causal reasoning, takes place in a context of some uncertainty about the premises. Cummins (1995) showed that participants in an experiment had less confidence in the conclusions of MP where the major premise had relatively many disablers compared to conclusions of MP where the major premise had few disablers (see as well Cummins, Lubart, Alksnis, & Rist, 1991, and both articles on other conditional inferences). Using examples from Cummins, we can compare these two instances of MP:

If John studied hard, then he did well on the test.
John studied hard.
John did well on the test.

If Mary jumped into the swimming pool, then she got wet.
Mary jumped into the swimming pool.
Mary got wet.

The conditional about John has more disabling conditions than the conditional about Mary. John might work hard, but not efficiently or intelligently, or he might be ill on the day of the test and unable to perform well. But there are relatively few ways Mary can jump into the pool and not get wet. The Bayesian implication is that people will have less confidence in the MP conclusion about John than in the MP conclusion about Mary (Oaksford & Chater, 2007, Chapter 19 in this volume; Stevenson & Over, 1995). This loss of confidence is confirmed in experiments (see also Cummins, 1995; De Neys, 2010; Fernbach & Erb, 2013; and Politzer & Bonnefon, 2006).

Cummins and others in her line of research have not used traditional deductive reasoning instructions, but rather pragmatic or belief-based reasoning instructions (Evans et al., 2010; Evans & Over, 2013). These are best suited to the Bayesian study of belief-based reasoning (Cruz, Baratgin, Oaksford, & Over, 2015; Evans, Thompson, & Over, 2015; Singmann, Klauer, & Over, 2014). In these instructions, participants in an experiment are not asked to make arbitrary assumptions, but to consider conditionals which they might have beliefs about, or can generate plausible disabling conditions for, as in the preceding examples about John and Mary. They are not asked whether the conclusion validly or necessarily follows from the premises turned into assumptions, but rather for a judgment of how confident they are in the conclusion given some premises that are uncertain to some degree. Confidence in the conclusions of MP inferences can be decreased just by presenting the major and minor premises as parts of a conversation, with no mention of making assumptions, and by casting doubt on the minor premise, as well as the major premise, by describing it as asserted by someone who lacks the relevant knowledge or expertise to make a reliable judgment (Stevenson & Over, 1995, 2001). For example, when the patient claims that he has not gained weight, rather than a doctor after examining him, we could have less confidence in the conclusion of MP with (3) as the major premise. It is sometimes said that the valid inference MP is suppressed when participants lose confidence in a conclusion because of uncertainty in the premises. But this use can be misleading when it is confidence in the conclusion that is suppressed (see Over & Cruz, in press, on the definition of "suppression"; and Cariani & Rips, in press, Politzer, 2005, and Rehder, 2014, for a range of views of suppression). We know perfectly well that MP is a valid inference, but we can rightly have low confidence in its conclusion when we know of many disabling conditions for its major premise, or that this premise or the minor premise has been asserted by an unreliable person, like a patient with a known tendency to be overoptimistic about his health.

Studying probability judgments about causal conditionals is essential for giving an account of causal reasoning and the relation between causation and conditionals. But to make this study, it is necessary to have a theory of causal conditionals and how people make judgments about their probabilities. For conditionals like (1), (2), and (3) to be more fundamental than causation, there has to be an account of them and their probabilities that does not make reference to causation. A couple of possibilities will be examined in the next two sections.
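Before turning to those accounts, the Bayesian point about MP under uncertainty can be made concrete in a short sketch. The code below is our own illustration, not anything from the studies cited: reading the major premise as a conditional probability P(q|p) (the identification discussed later in this chapter as the Equation), probability theory only constrains the conclusion q to a coherence interval, which shifts downward as confidence in the conditional falls. The numbers standing in for the John and Mary conditionals are invented.

def mp_interval(p_cond, p_minor):
    """Coherence interval for P(q), given P(q|p) and P(p).

    P(q) = P(q|p)P(p) + P(q|not-p)P(not-p), and P(q|not-p) is unconstrained,
    so P(q) can lie anywhere in [P(q|p)P(p), P(q|p)P(p) + 1 - P(p)].
    """
    low = p_cond * p_minor
    return low, low + (1.0 - p_minor)

# Many disablers (John): only modest confidence in the conditional.
lo, hi = mp_interval(0.75, 0.90)
print(round(lo, 3), round(hi, 3))  # 0.675 0.775

# Few disablers (Mary): near-certain conditional.
lo, hi = mp_interval(0.99, 0.90)
print(round(lo, 3), round(hi, 3))  # 0.891 0.991

On this sketch, lower confidence in the major premise (more disablers) directly lowers the highest confidence one can coherently have in the MP conclusion, which is the qualitative pattern Cummins (1995) observed.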


We must first exclude from consideration here what might be called explicitly causal conditionals, in which the consequent explicitly states that the antecedent event will cause the consequent event. Instead of asserting (1), the doctors could have used an explicitly causal conditional by stating that, if the patient continues to gain weight, it will cause him to get hypertension. Some ordinary words contain causation in their meanings. The statement "If the child eats the daffodil bulb, it will poison her" basically means that if she eats the bulb, it will cause her to be ill. It is necessary to exclude these conditionals to avoid begging the question of how causation and conditionals are related, and to avoid trivial explanations of the relationship. This is not to say that there should be no psychological investigation of explicitly causal conditionals. It would be of interest to have studies of them, although there do not appear to be any in the psychological literature.

The Material Conditional

The material conditional (or material implication) is the conditional of elementary propositional logic (Arló-Costa, 2014; Edgington, 2014). This conditional is logically equivalent to the disjunction, not-p or q, and has the same truth table (see Table 18.1). It is true when p & q holds in the first cell of the table, false when p & not-q holds in the second cell, true when not-p & q holds in the third cell, and true when not-p & not-q holds in the fourth cell. It is said to be truth functional because its truth, or falsity, is always the same in these four cells.

Table 18.1 The Truth Table for the Material Conditional, not-p or q

          q = 1    q = 0
p = 1       1        0
p = 0       1        1

1 = true and 0 = false.

According to the mental model theory of Johnson-Laird and Byrne (1991, 2002), people's fully explicit mental models of a natural language conditional correspond to the cells of Table 18.1 in which the material conditional not-p or q holds: the p & q, not-p & q, and not-p & not-q cells. Their (1991) example was "If Arthur is in Edinburgh then Carol is in Glasgow." They pointed out that this conditional is true supposing Arthur is in Edinburgh and Carol is in Glasgow (p & q), and false when Arthur is in Edinburgh and Carol is not in Glasgow (p & not-q). But when they asked whether it is true or false supposing Arthur is not in Edinburgh (the not-p & q and not-p & not-q cases), they replied, "It can hardly be false, and so, since the propositional calculus allows only truth or falsity, it must be true" (1991, p. 7, and note also pp. 73–74). The material conditional not-p or q validly follows from not-p, and not-p or q also validly follows from q. Thus Johnson-Laird and Byrne

(1991, 2002) argue it is "valid" to infer a natural language conditional if p then q from the single premise not-p, and from the single premise q.

Arthur's being in Edinburgh might be the cause of Carol's being in Glasgow, but the definition of the material conditional makes reference only to the four logical possibilities of the truth table for not-p or q, and not at all to causation. The material conditional is more fundamental, in this sense, than causation, but it is highly questionable whether it can be used in a theory of causation. The problems are well illustrated in mental model theory. The probability of a natural language conditional, P(if p then q), should increase, according to mental model theory, as P(not-p) increases, and as P(q) increases, because if p then q is supposed to follow validly from not-p, and also from q. Indeed, P(if p then q) = P(not-p or q) "should" hold (Byrne & Johnson-Laird, 2009), since if p then q has exactly the same mental models as not-p or q. The identity P(if p then q) = P(not-p or q) follows in mental model theory not only for indicative conditionals, but also for subjunctive and "narrow" counterfactuals. These conditionals are equally supposed to have the same mental models as not-p or q (Johnson-Laird & Byrne, 1991, 2002), and that implies that their probabilities must be identical to P(not-p or q) as well (Goldvarg & Johnson-Laird, 2001, apply this account to explicit statements of causation). The following conditionals might be used in our decision-making:

(7) If we take up cigarette smoking, our health will improve.
(8) If we were to take up cigarette smoking, our health would improve.
(9) If we were not to take up cigarette smoking, our health would not improve.

Suppose we have been absolutely convinced of the dangers of cigarette smoking, and have firmly decided never to take up smoking. That makes it highly probable that we will not change our minds and take up cigarette smoking. Then the "correct" probability judgment for us to make, by mental model theory (Byrne & Johnson-Laird, 2009), is that (7) and (8) are highly probable. These conditionals are all of the general form if p then q, and with P(not-p) very high, P(not-p or q) will be very high, and mental model theory implies that P(if p then q) = P(not-p or q). On top of that, suppose we are also confident that our health will not improve. The best we can hope for is that we will stay in a steady state of average health for our age. Then another "correct" probability judgment, by mental model theory, is that (9) is highly probable, for (9) is equivalent to not-p or not-q in this theory, and of course P(not-p or not-q) is high when P(not-q) is high.

It is clear that, if mental model theory is the correct account of natural language conditionals, then these conditionals cannot be used in a counterfactual analysis of causation. This analysis implies that cigarette smoking will probably cause our health to improve if (8) and (9) are both probable, and in our example, mental model theory implies that both (8) and (9) are highly probable. This example does not create an intuitive problem for the counterfactual analysis of causation itself: conditional (8) at least is intuitively highly improbable, but it is highly probable by mental model theory, and so is conditional (9). If this theory of conditionals were correct, and were added to the counterfactual analysis of

causation, then it would absurdly follow that there is a good reason to take up cigarette smoking: it would cause our health to improve.

We have used probability to illustrate one of the "paradoxes" of trying to claim that the material conditional gives the meaning of natural language conditionals (Evans & Over, 2004; Edgington, 1995, 2014). The two inferences, of if p then q from not-p alone and from q alone, are called "paradoxes" because of their absurd consequences when if p then q is a natural language indicative or subjunctive conditional. Not only is it normatively unacceptable to hold that P(if p then q) = P(not-p or q) for causal conditionals, but this claim is strongly disconfirmed in experiments, as we will explain later in the section "Experiments on the Probabilities of Realistic Causal Conditionals" (for some of these experiments, see Feeney & Handley, 2011; Over, Hadjichristidis, Evans, Handley, & Sloman, 2007; Haigh, Stewart, & Connell, 2013; and Singmann et al., 2014; and for evidence against mental model theory, see Ali, Chater, & Oaksford, 2011; and Sloman, Barbey, & Hotaling, 2009). Johnson-Laird, Khemlani, and Goodwin (2015) present a revision of mental model theory in which the paradoxical inferences are no longer "valid" (see also Johnson-Laird & Khemlani, Chapter 10 in this volume). This revision is in its early stages, and we will not critique it here (but see Baratgin, Douven, Evans, Oaksford, Over, & Politzer, 2015; Over & Cruz, in press).
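The probabilistic form of the paradox can be displayed in a few lines of code. This is a minimal sketch with invented numbers for the smoking example; it is not drawn from any of the experiments cited.

# p = "we take up cigarette smoking", q = "our health improves".
cells = {
    ("p", "q"): 0.001,
    ("p", "not-q"): 0.019,
    ("not-p", "q"): 0.080,
    ("not-p", "not-q"): 0.900,
}

# Material conditional: true in every cell except p & not-q.
p_material = sum(v for k, v in cells.items() if k != ("p", "not-q"))

# Conditional probability of q given p.
p_q_given_p = cells[("p", "q")] / (cells[("p", "q")] + cells[("p", "not-q")])

print(round(p_material, 3))   # 0.981: (7) comes out "highly probable"
print(round(p_q_given_p, 3))  # 0.05: the intuitive verdict on (7)

With P(not-p) high, P(not-p or q) is inevitably high, so identifying P(if p then q) with P(not-p or q) certifies (7) as highly probable, however damaging smoking actually is.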

Conditionals and Close Possibilities

The main problem with representing a causal conditional with a material conditional is that the material conditional is true whenever its antecedent is false or its consequent is true, and that certainly does not hold for causal conditionals. It is highly unlikely that, if we take up cigarette smoking, our health will improve, even though we are highly unlikely to take up cigarette smoking. But if we disagree that a causal conditional, or any other natural language conditional, "must" be true when its antecedent is false, how are we to tell when it is true? In Table 18.2, how are we to replace the question marks in the cells of the false antecedent cases, the not-p & q and not-p & not-q cells?

Table 18.2 The Table for if p then q with Question Marks to Be Completed by a Non-Truth-Functional Theory

          q = 1    q = 0
p = 1       1        0
p = 0       ?        ?

1 = true, 0 = false.


Ramsey (1929/1990) suggested the most influential answer to this question, which has become known as the Ramsey test (Edgington, 1995; Evans & Over, 2004; Oaksford & Chater, 2007). Ramsey's original suggestion was terse and in a footnote. Stalnaker (1968) interpreted it as a procedure in which people evaluate a conditional, if p then q, by hypothetically adding p to their stock of beliefs, making minimal changes to preserve consistency, and then assessing whether or not q holds. Also being more precise about the test, Oaksford and Chater (2011) and Pearl (2013) have developed versions of it in cognitive science.

Using Stalnaker's form of the Ramsey test, the patient's doctors would process (1) and (2) by first hypothetically supposing that the patient gained weight. Then under this supposition, they would assess the likelihood of his developing hypertension. Their scientific knowledge of possible disabling conditions could affect the result of the test. If they knew of no likely disabling conditions, they could infer that (1) and (2) were definitely true. In this case, "true" would replace the question marks in Table 18.2 for these conditionals. In a Ramsey test of (7) and (8), we would be aware of practically nothing but disabling conditions, and of much evidence linking cigarette smoking to serious health problems, leading us to conclude that these conditionals are definitely false, or at least highly improbable. Now "false" would replace the question marks in Table 18.2 for these conditionals. A Ramsey test of (7) and (8) certainly does not give us a good reason to take up smoking. A conditional that is processed by the Ramsey test is not truth functional; in particular, it is not like the material conditional, which is always true whenever its antecedent is false, in the not-p & q and not-p & not-q cells.

Stalnaker (1968) asked what is being represented in a Ramsey test when people make a hypothetical supposition with minimal changes to their beliefs. His answer was that the result of the minimal changes is a representation of the closest possible world in which the supposition holds. He defined a natural language conditional if p then q as true if and only if q is true in the closest possible world (to the actual world) in which p is true. Lewis argued that, in some cases, there might not be a unique closest possible world, and he defined the conditional as (non-vacuously) true if and only if q is true in all the closest possible worlds (to the actual world) in which p is true (Lewis, 1973a). These analyses yield advanced logical systems in which conditionals are not truth functional (Arló-Costa, 2014; Edgington, 2014; Lewis, 1973b).

Consider Table 18.2 when if p then q is (7). The first cell of the table represents a greatly simplified possible world in which we take up cigarette smoking and our health improves, p & q, making (7) true. People who know of the dangers of smoking would judge this cell to be a highly "distant" and improbable possibility. The second cell is a world in which we take up cigarette smoking and our health does not improve, p & not-q. In this possibility, (7) is of course false. The third and fourth cells are the possibilities that are most likely to represent the actual world, in which we do not take up cigarette smoking and our health either does or does not improve: the not-p & q and not-p & not-q worlds.
The closest world to these two not-p worlds in which p is true is that of the second cell, the p & not-q world, with the p & q world more distant, making (7) false in not-p & q and not-p &

not-q. Table 18.3 has the overall result of these evaluations for (7), showing what is intuitively correct: (7) is false in the possible worlds most likely to be the actual world.

Stalnaker (1968, 1975) saw the difference between indicative conditionals and counterfactuals as pragmatic. For him, speakers would use (1) or (2), and (7) or (8), depending on how likely, or unlikely, they believed the antecedent to be, and the only difference between (3) and (4) would be the passing of time and the strength of the belief that the antecedent was false. Lewis (1973b) called his book Counterfactuals, but he accepted (pp. 3–4) that his topic was wider than his title suggests, although he was rather vague about how wide it was. He followed Adams (1970) in distinguishing between indicatives and counterfactuals in the past tense, like the following famous examples:

"If Oswald did not kill Kennedy then someone else did."
"If Oswald had not killed Kennedy then someone else would have."

Table 18.3 The Table for Conditional (7) about Cigarette Smoking

          q = 1    q = 0
p = 1       1        0
p = 0       0        0

1 = true and 0 = false.

However, Lewis would agree with Stalnaker on (1)–(4) and (7)–(9): like Stalnaker, he would apply his theory to all these conditionals.

It might be charged that the possible worlds of Stalnaker and Lewis are too philosophical to be relevant to psychology, but psychologists have found that people have firm intuitions about the closeness of possibilities, which affect their reasoning and their emotions. Kahneman and Tversky (1982) proposed a simulation heuristic that is similar to, if not identical with, the Ramsey test when used for evaluating a counterfactual conditional if p then q: simulating the result of p holding and assessing how far that supports q (Evans & Over, 2004; Gerstenberg, Goodman, Lagnado, & Tenenbaum, 2014). Kahneman and Miller (1986), referring to Lewis (1973b), theorized about what is "mutable" when people think about close possibilities, what people change to represent these possibilities. In one of Kahneman and Miller's experimental scenarios, participants were asked to compare Mr. Adams, who had a car accident on his regular route home, with Mr. White, who had an accident on a route he only took for a change of scenery. The participants judged that Mr. White would be more upset than Mr. Adams. Mr. White would think of taking his usual route home as a "close" possibility, and Mr. Adams would consider another, non-regular route for him a "distant" possibility. Much psychological research followed on

people's thoughts about the relative closeness of possibilities and how that affects their counterfactual and causal judgments and their emotions (Hoerl, McCormack, & Beck, 2011; Mandel, Hilton, & Catellani, 2005).

Lewis (1973a) argued for the counterfactual analysis of causation and claimed that he could give this in terms of his theory of counterfactuals, and since ordinary people do make judgments about the closeness of possibilities, his analysis might be relevant to describing their causal judgments. According to Lewis, to state that the patient's hypertension is causally dependent on his gaining weight is to hold (2) together with (3). Supposing the patient continues to gain weight and gets hypertension, the actual world is itself the closest world in which this antecedent is true for Stalnaker and Lewis, and (1) and (2) will be true. In this case, Lewis's analysis of causation implies that the patient's hypertension is causally dependent on his gaining weight if and only if (3) is also true.

Lewis tried to identify the closest possible world, or worlds, by appealing to the overall qualitative similarity between worlds and not to causal relations as basic. For Lewis, conditional (1) is false of a young man who is slowly recovering from anorexia: a possible world in which he continues to gain weight and he does not get hypertension is more similar to the actual world than another possible world in which he continues to gain weight and he does get hypertension. On the other hand, (1) is true of an overweight older man: a possible world in which he continues to gain weight and he gets hypertension is more similar to the actual world than another possible world in which he continues to gain weight and he does not get hypertension.

Lewis's analysis of counterfactuals has positive features, but his attempt to explain counterfactual "closeness" in terms of similarity judgments, without reference to causation, has been criticized by Fine (1975) and others using examples broadly like the following. Suppose that Mr. White's car accident in the actual world, while taking the unusual route home, had many serious consequences. Other people died in the accident, and he had to spend months in the hospital recovering from his injuries. He was full of regret, became depressed, lost his job, was divorced by his wife, and committed suicide. It would appear to be highly probable that

(10) If Mr. White had taken his usual route home, he would have arrived safely.

But how could that be, by Lewis's similarity account of the closeness of possibilities? The possible world in which Mr. White takes his normal route home and has a similar accident, with similar consequences, appears to be more similar to the actual world than a world in which he takes his normal route home and arrives there safely, with no negative results. This similarity judgment would imply that (10) is false or at least highly improbable. Lewis could not reply that a world in which Mr. White takes his normal route and has a similar accident is not very similar to the actual world because it violates a causal relation between his taking his normal, well-practiced route and having a safe journey. Lewis could not argue in a circle. He was supposed to be explaining causation in terms of counterfactuals, and counterfactuals in terms of qualitative similarity between worlds.
He could not then turn around and try to explain similarity in terms of causation (see Lewis,

1979, for his attempt to overcome the problems of his similarity account of closeness, and Bennett, 2003, and Pearl, 2000, 2013, for critical comment).

Pearl (2000) rejects Lewis's similarity account, with its problems, and presents a highly influential alternative. Pearl does not accept the counterfactual analysis of causation, but goes the other way and aims to build a theory of counterfactuals, and of similarity between possible worlds, on causation. It is causation, for Pearl, that is fundamental (Pearl, 2000, pp. xii–xiv). He holds that there are causal mechanisms in the world, which can be modeled in Bayesian networks, or Bayes nets (Rottman, Chapter 6 in this volume; Rottman & Hastie, 2014; Sloman, 2005). His models can be used to evaluate counterfactuals. In Pearl's version of the Ramsey test, a counterfactual is evaluated by a minimal change, termed an intervention, to the representation of an underlying causal mechanism. The antecedent of the counterfactual is represented as holding, and then a judgment can be made about whether, or to what extent, the consequent follows (Pearl, 2013).

Pearl does not have Lewis's problems in the assessment of (10). Underlying (10) for Pearl is a causal mechanism that makes (10) true, or at least highly probable. We might say informally that Mr. White has the ability, and more broadly the disposition, to drive carefully on his usual route home. Although dispositional accounts of causation appear different from theories like Pearl's, they can be combined in a hybrid model (Mayrhofer & Waldmann, 2015). Pearl's causal mechanism, for careful driving under normal conditions, might also be partially identified with the physical basis of a disposition for careful driving, but in any case, the mechanism justifies our judgment about (10) in Pearl's account. The possible world in which Mr. White takes his normal route home and has a similar accident, with similar consequences, is not more similar for Pearl to the actual world than a possible world in which Mr. White takes his normal route home and arrives safely. The causal mechanism makes the latter possibility more similar to, and so closer to, the actual world. Pearl uses causal mechanisms and interventions to define the similarity and closeness of possibilities (2013).

Consider in more detail how people might make a judgment about (10). They could represent an underlying mechanism in which Mr. White first makes a choice between going home by his normal route or by an alternative for a change of scenery. His mental and physical abilities in the former possibility are represented as making a safe journey home much more highly probable than in the latter possibility. In the example, Mr. White has not taken his normal route and has had an accident, perhaps due to being distracted by the scenery. People could hypothetically "undo" this actual state of affairs in Pearl's Ramsey test for (10). They could suppose, in a hypothetical intervention in the represented mechanism, that he takes his normal route, with the more highly probable consequence of less distraction and a safe trip home. The result of this procedure would be high confidence in (10). As Pearl (2013) notes, there is mixed psychological evidence for this account of people's judgments about counterfactuals (Sloman & Lagnado, 2005; Rips, 2010; Rips & Edwards, 2013), and more needs to be done to test it.
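The intervention idea can be sketched in code. The following is a hypothetical two-variable mechanism, Route -> Accident, with invented probabilities; it is only meant to show how an intervention on the route, rather than an observation of it, settles the evaluation of (10).

# P(accident | route), for Mr. White's well-practiced usual route and the
# scenic alternative. These numbers are assumptions for illustration only.
p_accident = {"usual": 0.01, "scenic": 0.20}

def p_safe_given_do(route):
    """P(arrives safely | do(route)): set the route, keep the mechanism."""
    return 1.0 - p_accident[route]

# Evaluating (10): hypothetically "undo" the scenic choice and intervene to
# set the route to "usual"; the mechanism (his careful driving there) then
# makes a safe arrival highly probable, whatever actually happened.
print(p_safe_given_do("usual"))  # 0.99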


Lewis and Pearl differ on a number of basic points, but not all. Natural language conditionals in Pearl's theory are non-truth functional, as in Lewis's system, and Pearl proves that his formal system is identical (with a technical qualification) to Lewis's (Pearl, 2000, pp. 240–242). Lewis and Pearl have different theoretical accounts of the "closeness" of possibilities, and they disagree on the probabilities of many counterfactuals, but they agree about the probabilities of others for related reasons, as explained in the next section.

Imaging

As Ramsey (1929/1990) originally stated what came later to be called the Ramsey test, it was a procedure for judging the conditional subjective probability of q given p. His example was of two people arguing about if p then q, and he held that they are "fixing their degrees of belief" in q given p, P(q|p), when they hypothetically suppose p and, on that basis, make a judgment about q. Of course, this suggested procedure has to be backed up by accounts of the ways in which P(q|p) can be determined (some possibilities are discussed in Fernbach & Erb, 2013, and Evans & Over, 2004).

Stalnaker (1970) tried to argue that the probability of the conditional in his theory, as fixed by his version of the Ramsey test, is the conditional probability, P(if p then q) = P(q|p). This identity, between the probability of the natural language conditional and the conditional probability, has been called the Equation. It has far-reaching consequences, not only for theories of the natural language conditional, but for Bayesian accounts of reasoning in general (Edgington, 1995; Evans et al., 2015; Oaksford & Chater, 2007). However, Lewis (1976) proved that the probability of the conditional in theories like his and Stalnaker's cannot, in general, be the conditional probability. In these theories, the Equation, P(if p then q) = P(q|p), will fail to hold in general (see Douven & Dietz, 2011, for a recent analysis of Lewis's proof).

Lewis's result can best be illustrated using another conditional about the overweight patient:

(11) If his blood pressure were normal, then his cholesterol would be normal.

Let (11) be if p then q, and consider using the Ramsey test to assess the conditional probability of q given p, P(q|p). In supposing p in Ramsey's original version of the test, we ignore not-p possibilities as irrelevant. In this example, we could be fairly confident in q under the supposition of p. There is a positive correlation between blood pressure and cholesterol levels, with common causes of both in diet, exercise, and other lifestyle factors (setting aside, for simplicity, high cholesterol as a causal factor in high blood pressure).

However, in theories like Stalnaker's or Lewis's, the not-p possibilities are not irrelevant for judging the probability of if p then q. In these theories, if p then q will be true, or alternatively false, in not-p possibilities, and its probability, P(if p then q), will be the sum of

the probabilities of the possibilities in which it is true. The possible worlds can be divided into worlds in which p & q is true, worlds in which p & not-q is true, worlds in which not-p & q is true, and worlds in which not-p & not-q is true. To simplify for illustration, and to have a plausible example for the severely limited processing abilities of human beings, suppose these four high-level possibilities are not represented in more detail. A Stalnaker or Lewis conditional, if p then q, is true in the p & q possibility, and false in the p & not-q possibility. Supposing if p then q is true in the two not-p possibilities, P(if p then q) will equal P(p & q) + P(not-p). Supposing if p then q is false in the two not-p possibilities, P(if p then q) will equal P(p & q). When if p then q is true in just one of the not-p possibilities, say, not-p & q, P(if p then q) will be P(p & q) + P(not-p & q). In each of these cases, P(if p then q) will usually not equal P(q|p).

We can say more about (11), symbolized as if p then q, in a more concrete and informal illustration. Consider a patient whose medical tests indicate that he probably has high blood pressure and high cholesterol: P(not-p & not-q) is high. But although his doctors are "close" to prescribing blood pressure-lowering medication, they know that he cannot be given cholesterol-lowering medication because he would suffer badly from its particular side effects. A possibility in which the patient is given blood pressure-lowering, but not cholesterol-lowering, medication is relatively more similar to the actual state of affairs than the possibility in which the doctors knowingly give the patient medication that would be harmful to him. That makes (11), as a Stalnaker or Lewis conditional, false in the highly probable not-p & not-q possibility in this example. The result is that the probability of (11), P(if p then q), could be significantly lower than the conditional probability, P(q|p), inferred from the general correlation between low blood pressure and low cholesterol.

Lewis (1976) introduces a technical term, imaging, to clarify the probability of conditionals in theories like his or Stalnaker's. To illustrate imaging, continue with our example using (11), in which P(not-p & not-q) is high, and this not-p & not-q possibility is "closer" to the p & not-q possibility (the patient gets blood pressure-lowering medication but not cholesterol-lowering medication) than to the p & q possibility (the patient gets both medications). Lewis says that we are imaging on p when we suppose that p holds and do the following in this example. We transfer the high probability of not-p & not-q, P(not-p & not-q), to the probability of p & not-q as the "closer" possibility, revising P(p & not-q) upward, making it higher than P(p & q). With P(p & not-q) now relatively high, and P(p & q) relatively low, P(q) will become relatively low. The probability of a Stalnaker or Lewis conditional, P(if p then q), is the revised probability of q after imaging on p. As already explained, P(q) will become relatively low after imaging in our example, making P(if p then q) low.
In contrast, to assess the conditional probability of q given p, P(q|p), in our example, we use Ramsey's original version of his test, supposing p, ignoring the not-p possibilities, and using the correlation between p and q to infer a high P(q|p) (see Baratgin & Politzer, 2010, and Zhao & Osherson, 2014, for applications of imaging in the psychology of reasoning).
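The contrast can be made concrete in a short sketch. The following code implements imaging of the simple kind just described, over the four coarse possibilities, with invented numbers for conditional (11) and with the assumption, as in the example, that both not-p possibilities are closest to the p & not-q possibility.

# p = "blood pressure normal", q = "cholesterol normal".
prob = {("p", "q"): 0.04, ("p", "not-q"): 0.01,
        ("not-p", "q"): 0.05, ("not-p", "not-q"): 0.90}

# Assumed closeness: each not-p cell maps to p & not-q (the doctors would
# lower blood pressure but withhold the harmful cholesterol medication).
closest_p_cell = {("not-p", "q"): ("p", "not-q"),
                  ("not-p", "not-q"): ("p", "not-q")}

def image_on_p(dist):
    """Transfer each not-p cell's mass to its closest p cell (imaging on p)."""
    out = {("p", "q"): dist[("p", "q")], ("p", "not-q"): dist[("p", "not-q")]}
    for cell, target in closest_p_cell.items():
        out[target] += dist[cell]
    return out

imaged = image_on_p(prob)
print(imaged[("p", "q")])  # 0.04: P(if p then q) by imaging is low

p_q_given_p = prob[("p", "q")] / (prob[("p", "q")] + prob[("p", "not-q")])
print(round(p_q_given_p, 2))  # 0.8: P(q|p) by the Ramsey test is high

The same four cell probabilities thus yield sharply different values depending on whether the not-p mass is shifted (imaging) or simply ignored (conditionalization).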


Pearl (2000, 2013) has a different but related account of imaging to go with his theory of counterfactuals. Consider a simple representation of a causal mechanism, in which an underlying common lifestyle cause affects the blood pressure level and the cholesterol level. Different lifestyles make certain blood pressure levels and certain cholesterol levels probable to various degrees. To evaluate (11) using Pearl's version of the Ramsey test, we imagine an intervention in the representation of the mechanism that makes the blood pressure level normal. Doctors could intervene with blood pressure-lowering medication for the patient with high blood pressure and high cholesterol. This treatment would, with a high probability depending on the effectiveness of the medication, lower his blood pressure, but would not affect his underlying lifestyle or his cholesterol level, making (11) improbable.

The conditional probability, P(q|p), which Pearl calls "conventional" Bayesian conditionalization, again comes from supposing p in Ramsey's original version of his test, and P(q|p) remains fairly high for Pearl in our example. Owing to the correlation between blood pressure and cholesterol levels, observing a measure of one can justify an inference to the other by "conventional" Bayesian conditionalization. But observation is not the same as intervention for Pearl, and it is intervention that is used in Pearl's account of counterfactuals. In this example, the Equation, P(if p then q) = P(q|p), fails for Pearl, with P(if p then q) low and P(q|p) high.

Pearl defines a do operator for referring to interventions: P(q|do(p)) is the probability of q after an intervention to make p hold in Pearl's version of the Ramsey test. In Pearl's theory, P(if p then q) = P(q|do(p)), but P(q|do(p)) is an imaging operation and not always the same as P(q|p). In Pearl's terminology, P(q|p) is the revised probability of q after "conventional" Bayesian conditionalization on p, but P(q|do(p)) is the revised probability of q after imaging on p (i.e., after an intervention to make p hold; Pearl, 2000, 2013).

Pearl's imaging is not generally the same as Lewis's. Suppose p is a cause of q, and not merely correlated with q. For an example, let (10) be if p then q and recall that, due to Mr. White's well-practiced abilities, p is a cause of q in this example. Thanks to this strong causal relation between p and q, the "conventional" conditional probability of q given p, P(q|p), and the probability of q after an intervention to make p hold, P(q|do(p)), are both high, and P(q|do(p)) = P(q|p) in Pearl's account, with of course P(if p then q) equally high. Here P(if p then q) = P(q|do(p)) = P(q|p), and the Equation is satisfied. More generally, P(q|do(p)) = P(q|p), and the Equation is upheld for Pearl, when p causes q.

The Equation fails much more generally for Lewis. We have seen earlier that Lewis's definition of similarity between possible worlds implies that (10) is definitely false or at least highly improbable: P(if p then q) must be low for Lewis. Yet P(q|p) is high for Lewis as well. Mr. White's driving abilities being what they are, he has a safe journey home in almost all the possible worlds that can be conceived in which he takes his usual route home.
More generally, the closest possible world in which p holds to a not-p world always gets the probability of the not-p world in Lewis's account of imaging on p, but the probabilities of not-p worlds have no effect on P(q|p), as we also noted earlier.
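Pearl's observation/intervention contrast for (11) can also be sketched directly. Here is a hypothetical common-cause model, Lifestyle -> blood pressure and Lifestyle -> cholesterol, with invented numbers; it is a sketch of the kind of computation Pearl's theory licenses, not his own example.

p_healthy = 0.2                                     # P(healthy lifestyle)
p_bp_normal = {"healthy": 0.9, "unhealthy": 0.1}    # P(p | lifestyle)
p_chol_normal = {"healthy": 0.9, "unhealthy": 0.1}  # P(q | lifestyle)
prior = {"healthy": p_healthy, "unhealthy": 1 - p_healthy}

# Observation: normal blood pressure is evidence of a healthy lifestyle,
# which in turn makes normal cholesterol probable.
p_p = sum(prior[l] * p_bp_normal[l] for l in prior)
p_pq = sum(prior[l] * p_bp_normal[l] * p_chol_normal[l] for l in prior)
print(round(p_pq / p_p, 2))  # 0.65: P(q|p) is fairly high

# Intervention: medication sets blood pressure without touching lifestyle,
# so cholesterol keeps its prior probability, P(q|do(p)) = P(q).
p_q = sum(prior[l] * p_chol_normal[l] for l in prior)
print(round(p_q, 2))  # 0.26: P(q|do(p)) is low, making (11) improbable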

Setting normative disputes to one side, we can ask which theory, Lewis's or Pearl's, gives a better description of people's probability judgments about conditionals. If Lewis is right, the Equation will generally fail to hold for P(if p then q). If Pearl is right, the Equation will hold when p is a cause of q, and not when p and q are only correlated. But these are not the only possibilities. There is another account of conditionals that implies that the Equation will always hold.

The Probability Conditional

Lewis's proof, of the failure of the Equation P(if p then q) = P(q|p) for theories of the conditional like his and Stalnaker's, assumes that conditionals are true or false at every possibility. However, there is a view going back to de Finetti (1936/1995, 1937/1964) that conditionals cannot always be classified as "true" or "false" (not anyway in an objective sense), but sometimes at least can only be assigned subjective conditional probabilities. Consider this indicative causal conditional:

(12) If his blood pressure is normal, then his cholesterol is normal.

In de Finetti's analysis, (12) is "true" when the patient's blood pressure and cholesterol are normal, is "false" when his blood pressure is normal and his cholesterol is not, and is "void" when his blood pressure is not normal. In all cases, the probability of (12) is the conditional probability that the patient's cholesterol is normal given that his blood pressure is normal. In general, an indicative conditional if p then q is true when p & q is true, false when p & not-q is true, and "void" when not-p is true (see Table 18.4, the de Finetti table). There is a long tradition of experimental research in the psychology of reasoning, finding that people do produce three-valued truth tables like de Finetti's (see Evans & Over, 2004, for a review of early research on "defective" truth tables, and Baratgin, Over, & Politzer, 2013, and Politzer et al., 2010, for recent experiments).

Table 18.4 The de Finetti Table for if p then q

          q = 1    q = 0
p = 1       1        0
p = 0       V        V

1 = true, 0 = false, and V = void.

The indicative conditional in de Finetti's theory has a parallel relation to a conditional bet on if p then q, which is won when p & q is true, lost when p & not-q is true, and "void," or called off, when not-p is true. The probability of the conditional, P(if p then q), and the probability of winning the conditional bet, is always the conditional probability, P(q|p). This is the probability of the p & q outcome assuming that there has been a non-void

assertion or bet: that p holds (Baratgin et al., 2013, and Politzer et al., 2010, have confirmation of this parallel relationship).

After reliable tests indicate that the patient's blood pressure and cholesterol are normal, (12) is established as true, and we win our bet on it. After reliable tests show that the patient's blood pressure is normal but his cholesterol is not, (12) is established as false, and we lose a bet on it. After his blood pressure is found to be abnormal, however, the indicative conditional (12) is "void," and a bet we made on it is equally "void" and neither won nor lost. Finding that his blood pressure is abnormal, we would switch to the counterfactual (11) to express our thought, and as a counterfactual, (11) is not made objectively true, or false, by an actual fact. But we can make a subjective probability judgment about (11), and that should, by de Finetti's account, equal the conditional probability that the patient's cholesterol is normal given that his blood pressure is normal.

Jeffrey (1991) pointed out that de Finetti's theory can be represented by a table in which the "void" outcome is replaced by the subjective conditional probability itself. What might be called a Jeffrey table, Table 18.5, can be constructed in which all the entries can be interpreted as probabilities (Baratgin et al., 2013; Over & Cruz, in press). When we know p & q for sure, P(p & q) = 1, and P(if p then q) = P(q|p) = 1. When we know p & not-q for sure, P(p & not-q) = 1, and P(if p then q) = P(q|p) = 0. When we know not-p for sure, P(not-p) = 1, but the Ramsey test can be used, by supposing that p does hold, to make a judgment that P(q|p) = x, and then P(if p then q) = x by the Equation. Note that if p then q can be certain by the Ramsey test, P(if p then q) = P(q|p) = 1, even when not-p is known for sure, and then the p & q and not-p cells of the Jeffrey table will contain 1. For example, "If Anne of Cleves had had her head cut off, she would have died immediately." This conditional has probability 1, even though we know that Anne of Cleves did not have her head cut off. In this type of example, the probability of 1 arguably corresponds to an "objective" truth for those, like Pearl, who take causation to be a fundamental relation in the world. There is the strongest possible causal connection between decapitation and immediate death.

Table 18.5 The Jeffrey Table for if p then q

          q = 1     q = 0
p = 1       1         0
p = 0     P(q|p)    P(q|p)

1 = true, 0 = false, and P(q|p) = the subjective conditional probability of q given p.

De Finetti used the technical term conditional event for a conditional that satisfied his table and what we would now call the Equation. This conditional can also be called the probability conditional (Adams, 1998). It and the associated Equation are central to

recent Bayesian and probabilistic accounts of conditional reasoning in the psychology of reasoning (Evans & Over, 2013; Gilio & Over, 2012; Oaksford & Chater, 2007; Over, Evans, & Elqayam, 2010; Pfeifer & Kleiter, 2010). Given the Equation, P(if p then q) can be replaced in reasoning under uncertainty by P(q|p) and Bayesian principles applied (see also Oaksford & Chater, 2013, on dynamic reasoning and belief updating).

The Equation, a normative principle, can be tested experimentally as the conditional probability hypothesis: the prediction that people's probability judgments about conditionals will be that P(if p then q) = P(q|p). Many experiments on this hypothesis have focused on artificial conditionals about given frequency distributions, for example, "If the card is yellow then it has a circle printed on it," referring to a randomly selected card from a given pack of yellow or red cards with circles or diamonds on them (Evans, Handley, & Over, 2003; Evans, Handley, Neilens, & Over, 2007; Oberauer & Wilhelm, 2003; Politzer et al., 2010; Wijnbergen-Huitink, Elqayam, & Over, 2014). These experiments find that most people's responses are consistent with the conditional probability hypothesis. Byrne and Johnson-Laird (2010) and Girotto and Johnson-Laird (2010) try to counter this evidence, but their arguments are vitiated by modal fallacies, as shown by Over, Douven, and Verbrugge (2013) and Politzer et al. (2010). There is a minority conjunctive response, P(if p then q) = P(p & q), for this kind of artificial conditional, but it declines and is replaced by P(if p then q) = P(q|p) as more and more probability judgments are made (Fugard, Pfeifer, Mayerhofer, & Kleiter, 2011).

There is a study of artificial causal conditionals, for example, "If the lever 2 is down, the rabbit's cage is open," with arbitrary frequency information showing a number of times in which the lever is, or is not, down, and the rabbit's cage is, or is not, open (Barrouillet & Gauffroy, 2015). In this study, participants sometimes respond to the conditional as if it were the biconditional event (Fugard et al., 2011), which is effectively the conjunction of two conditional events, both of which satisfy the Equation (Over & Cruz, in press). But more relevant to our topic in this chapter are experiments on the conditional probability hypothesis and realistic causal and diagnostic conditionals. These test whether theories like those of Lewis and Pearl are descriptive of people's probability judgments about these conditionals.

Another more straightforward hypothesis about causal conditionals can also be tested experimentally. The delta-p rule, P(q|p) − P(q|not-p), can be used to measure the extent to which p raises the probability of q (Allan, 1980). Of course, p can raise the probability of q, so that P(q|p) − P(q|not-p) is positive, when p is not a cause of q but there is only a positive correlation between p and q. Nevertheless, a causal conditional if p then q might simply mean that p raises the probability of q, P(q|p) > P(q|not-p). This semantic position leads to the descriptive delta-p rule hypothesis: that if p then q is probable for people to the extent that p raises the probability of q. The hypothesis implies that P(q|not-p) will have a negative effect on people's judgments about P(if p then q), while P(q|p) has a positive effect (Over et al., 2007).
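A minimal sketch of the delta-p rule, with invented inputs for both cases:

def delta_p(p_q_given_p, p_q_given_not_p):
    """Extent to which p raises the probability of q (Allan, 1980)."""
    return p_q_given_p - p_q_given_not_p

# A causal case: gaining weight raises the probability of hypertension.
print(round(delta_p(0.80, 0.30), 2))  # 0.5

# A merely correlational case can also give a positive delta-p, so a positive
# value does not by itself establish that p causes q.
print(round(delta_p(0.65, 0.26), 2))  # 0.39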



Experiments on the Probabilities of Realistic Causal Conditionals

Over et al. (2007) investigated people's probability judgments about indicative conditionals, in two experiments, and about counterfactuals, in one experiment. All the conditionals referred to possible causes and effects and so were causal conditionals. An indicative example used in Experiments 1 and 2 was

(13) If global warming continues, then London will be flooded.

The participants were asked to make explicit probability judgments about such conditionals. To measure their implicit conditional probability judgments, they were also asked to complete a probability truth table task, making probability judgments (summing to 100%) about the four conjunctive possibilities in the four cells of the truth table. For (13), these conjunctions were

Global warming continues and London is flooded.
Global warming continues and London is not flooded.
Global warming does not continue and London is flooded.
Global warming does not continue and London is not flooded.

The example of a counterfactual used to introduce Experiment 3 was

(14) If the cost of gasoline had increased, then traffic congestion would have improved.

The participants made probability judgments about counterfactuals in the experiment, but the probabilistic truth table task was modified. We proposed that people evaluate a counterfactual by recalling a point in time at which they did not know that the antecedent would turn out to be false, and then using the Ramsey test to infer the conditional probability at that point (see also Evans & Over, 2004). With (14) as our example, we could obtain the implicit conditional probabilities by asking the participants to report their probability judgments (again summing to 100%) for the four cells of the truth table at a time five years in the past:

The cost of gasoline increases and traffic congestion improves.
The cost of gasoline increases and traffic congestion does not improve.
The cost of gasoline does not increase and traffic congestion improves.
The cost of gasoline does not increase and traffic congestion does not improve.


More abstractly, for the three experiments, the participants were asked to make explicit probability judgments about causal conditionals, P(if p then q), and about the linked conjunctions, P(p & q), P(p & not-q), P(not-p & q), and P(not-p & not-q). From the conjunctive judgments, we derived two implicit conditional probability judgments: P(q|p) = P(p & q)/(P(p & q) + P(p & not-q)) and P(q|not-p) = P(not-p & q)/(P(not-p & q) + P(not-p & not-q)). The conditional probability hypothesis is supported if P(if p then q) = P(q|p). Deriving P(q|not-p) as well allowed us to use the delta-p rule, P(q|p) − P(q|not-p), to determine the extent to which p raised the probability of q for the participants and to test the delta-p rule hypothesis.

In all three experiments, the conditional probability, P(q|p), was the best predictor of the probability of the conditional, P(if p then q), for both causal indicatives and counterfactuals. The participants' responses were inconsistent with holding that the probability of these conditionals is that of the material conditional, P(if p then q) = P(not-p or q), and the conjunctive response, P(if p then q) = P(p & q), was not at all present for these realistic examples. There was a small negative effect of P(q|not-p) on P(if p then q), suggesting some support for the delta-p rule hypothesis. Singmann et al. (2014), in a further experiment on indicative causal conditionals, used a linear mixed model analysis and found that only P(q|p) was a significant predictor of P(if p then q). These findings support the conditional probability hypothesis (but see Skovgaard-Olsen, Singmann, & Klauer, 2016, and Over & Cruz, in press, for comment). They do not support a Stalnaker or Lewis account of how people understand causal conditionals (see additionally Feeney & Handley, 2011, and Haigh et al., 2013). Pearl's theory is also supported, insofar as it implies that P(if p then q) = P(q|do(p)) = P(q|p) for the type of conditional used in these experiments, of the form if cause then effect.

Pearl's theory can be investigated further by studying diagnostic conditionals of the form if effect then cause. An example would be

(15) If the house is warm, then the heating system has been repaired.

Symbolizing (15) as if p then q, the conditional probability hypothesis still implies, of course, that P(if p then q) = P(q|p). However, applying Pearl's imaging account to (15) implies that P(q|do(p)) = P(q): the revised probability of q after imaging on p is the unchanged P(q). In the diagnostic conditional (15), the consequent q is a possible cause of the antecedent p, and an intervention to make p hold will not affect q, by Pearl's account. For example, we could intervene to make the house warm by burning coal in its fireplaces, and that would not change the probability that the heating system had been repaired.
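To make the derivation described above concrete, here is a minimal sketch of how the implicit conditional probabilities, and the delta-p measure, come out of the four conjunctive judgments in the probabilistic truth table task. The cell percentages for (13) are invented for illustration.

def implicit_conditionals(pq, p_notq, notp_q, notp_notq):
    """Implicit P(q|p) and P(q|not-p) from cell judgments summing to 100%."""
    assert abs(pq + p_notq + notp_q + notp_notq - 100.0) < 1e-6
    p_q_given_p = pq / (pq + p_notq)
    p_q_given_notp = notp_q / (notp_q + notp_notq)
    return p_q_given_p, p_q_given_notp

# Hypothetical judgments (%) for (13), in the order
# p & q, p & not-q, not-p & q, not-p & not-q.
c_p, c_notp = implicit_conditionals(30.0, 30.0, 5.0, 35.0)

print(round(c_p, 2))           # 0.5: predicted P(if p then q) under the Equation
print(round(c_notp, 2))        # 0.12
print(round(c_p - c_notp, 2))  # 0.38: the delta-p measure

For the same invented judgments, the material-conditional prediction would be P(not-p or q) = (30 + 5 + 35)/100 = 0.7, and the conjunctive response would be 0.3, so the competing hypotheses come apart sharply even in a toy case.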


In an experiment on diagnostic conditionals like (15), Evans, Handley, Hadjichristidis, Thompson, Over, and Bennett (2007) found that people did judge P(if p then q) = P(q|p), supporting the conditional probability hypothesis and not the application of Pearl's theory. It might be objected, fairly, that Pearl's account is not meant to apply to indicative conditionals like (15), but rather to counterfactual conditionals like

(16) If the house were warm, then the heating system would have been repaired.

Evans et al. (2007) did not have an experiment on counterfactuals, but only on diagnostic indicative conditionals. It might be the case that, in an experiment on (16), people would judge that P(if p then q) = P(q|do(p)) = P(q). On the other hand, it might be that (15) and (16) are not very different from each other, and that people will choose to assert (15) or, alternatively, (16), depending on how confident they are in the antecedent p. When they believe p to be relatively unlikely, they will use (16) rather than (15), and there will be no difference between the overall probabilities of these conditionals. Participants in an experiment would no doubt find both of the following conditionals improbable:

(17) If the house is being warmed by coal fires, then the heating system has been repaired.
(18) If the house were being warmed by coal fires, then the heating system would have been repaired.

Of course, people's responses to (17) and (18) might be different from their responses to (15) and (16), which might both be thought probable. To settle the important questions here, there will have to be experiments comparing people's probability judgments about diagnostic indicative conditionals and the associated counterfactuals (see also Fernbach, Darlow, & Sloman, 2011, and Meder, Mayrhofer, & Waldmann, 2014, for relevant findings on diagnostic causal reasoning).

In technical terms, (16) and (18) are backward or backtracking counterfactuals (Bennett, 2003; Lewis, 1979). In causal cases, these counterfactuals relate an effect in the antecedent to its possible cause in the consequent. Pearl's theory implies that (16) and (18) are false or at least improbable. There are, however, experiments showing that participants sometimes endorse counterfactual backtracking in violation of Pearl's theory. Consider a machine in which the operation of component A always causes component B to operate. When participants in an experiment are asked, "If B were not operating, would A be operating?" they tend to reply "No," implying the truth of the backtracking counterfactual, "If B had not been operating, then A would not have been operating" (Rips, 2010; Rips & Edwards, 2013; see also Sloman & Lagnado, 2005). Some of these results suggest that a modification of Pearl's approach might be more descriptive of people's responses (see Lucas & Kemp, 2012, for one possibility). But these experiments need to be supplemented by studies investigating and comparing more fully the probabilities that participants assign to diagnostic indicative conditionals and the associated counterfactuals. Are these probabilities generally consistent with "conventional" Bayesian conditionalization and so with the

Page 23 of 33

Causation and the Probability of Causal Conditionals conditional probability hypothesis? Or are they more accurately described as the result of imaging according to Pearl’s theory or a modification of it? The theories of conditionals considered so far imply the validity of an inference we will simply term centering (rather than the usual mouthful “conjunctive sufficiency”): infer­ ring the conditional if p then q (whether indicative or counterfactual) from the conjunc­ tion p & q, or from the two premises p and q (see Lewis, 1973b, on precisely what he calls centering). The material conditional is equivalent to not-p or q and obviously follows valid­ ly from p & q. Supposing the Equation holds, P(if p then q) = P(q|p), centering is also probabilistically valid. The probability of if p then q could not be coherently less than the probability of p & q: P(if p then q) ≥ P(p)P(q|p) = P(p & q). In the systems of Stalnaker and Lewis, when p & q is true, the closest world in which p is true is the actual world, and with q true as well, if p then q will follow. Even more broadly, the validity of centering fol­ lows from holding that if p then q is true when q is true after a minimal change to make p true. In Pearl’s system, when p & q is true, a minimal “intervention” has already taken place (by someone or other or nature itself) to make p true, and q has followed, and so if p then q will follow (Pearl, 2000, p. 241). But though many theorists have endorsed center­ ing as an inference, and some of these have proposed a counterfactual analysis of causa­ tion, it is open to question whether there can be a very close relation between causation and counterfactuals if centering is valid. For a close relation between the two, it is ar­ guable that if p then q should validly follow, not from the contingent truth of p & q alone, but rather from the stronger premise that p & (p. 322) q is necessarily true, for some fairly strong notion of necessity (Kratzer, 2012, has a theory of this general type). Another pos­ sibility is to argue that there must be an inferential connection between p and q, beyond the truth of p & q, for if p then q to hold (Douven, 2015; Krzyzanowska, & Wenmackers, & Douven, 2013). It is certainly open to doubt whether ordinary people would endorse centering for coun­ terfactuals in experiments. Consider the following conjunction that most of us believe to be true and then the conditional after it: (19) Kennedy was assassinated and Oswald shot him. (20) If Kennedy had been assassinated, then Oswald would have shot him. Counterfactual (20) follows validly from the conjunction (19) by centering. There are good pragmatic reasons for objecting to the use of this inference in communication. Speakers who asserted (20) when they knew (19) could certainly mislead their hearers. For exam­ ple, a teacher who used (20) in a lecture to young pupils who knew little about the history of the 1960s could be said to fail to facilitate their understanding. However, people might still conform to centering in their subjective degrees of belief, where there are no “speak­ ers” or “hearers,” and pragmatic factors might not operate. The conclusion of a valid inference with one premise cannot be less probable, normative­ ly, than its premise, and centering implies that P(p & q) ≤ P(if p then q). There is evi­ dence that people do conform in their probability judgments to centering for indicative conditionals (Cruz, Baratgin, Oaksford, & Over, 2015; Cruz, Over, Oaksford, & Baratgin, Page 24 of 33

Causation and the Probability of Causal Conditionals 2016; Politzer & Baratgin, 2015), but there are as yet no studies of whether they do so for counterfactuals. Such studies would be a strong test of the wide range of logical, philo­ sophical, and psychological theories of counterfactuals that imply centering as valid infer­ ence form.
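The probabilistic validity of centering noted in this section can be spelled out in one line; the following is our rendering of the step in the text, combining the Equation with the chain rule:

P(p & q) = P(p)P(q|p) ≤ P(q|p) = P(if p then q),

since P(p) ≤ 1. A coherent judge can therefore never assign P(if p then q) a value below P(p & q), which is the bound that the studies just cited test for indicative conditionals.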

Conclusions

There is some relation of considerable interest between conditionals of a certain type and causation. These causal conditionals refer to possible causes and effects in their antecedents and consequents and are of an indicative or counterfactual form. It might be that the conditionals are fundamental and should be used in a counterfactual analysis of causation, or the converse might hold, with causation more fundamental and necessary for a theory of the conditionals.

If the Equation, P(if p then q) = P(q|p), holds generally for counterfactual as well as indicative conditionals, then a full counterfactual analysis of causation is precluded. The probabilities of the two counterfactuals used in the analysis, P(if p then q) and P(if not-p then not-q), will equal P(q|p) and P(not-q|not-p), respectively. When P(q|p) and P(not-q|not-p) are both high, P(q|p) is high and P(q|not-p) is low, but such a difference in the delta-p rule, P(q|p) − P(q|not-p), is not enough for the probability of p causes q to be high. The statement that p causes q is stronger than the statement that P(q|p) > P(q|not-p), which only implies a positive correlation between p and q. The Equation has been highly confirmed as the descriptive conditional probability hypothesis for a wide range of indicative conditionals, but only limited experimental research has so far been done on whether it holds generally for counterfactuals.

The Equation does not generally hold in Pearl's theory, but his theory, or a modification of it, might still better describe people's probability judgments about counterfactuals than the conditional probability hypothesis. If that is so, a counterfactual analysis of causation cannot, again, be given of people's understanding of causation. This analysis implies that counterfactuals can be used to explain causation, as in Lewis's theory. But in Pearl's general type of theory, causation is more fundamental than counterfactuals and is used to explain them.

There have only been very limited studies of inference forms for counterfactuals as premises or conclusions. There are no studies of how possible disabling conditions for counterfactuals affect the confidence that people have in the conclusions of MP and other inferences for counterfactuals. Research on the inference form of centering, inferring if p then q from p & q, has only just begun for indicative conditionals, and has not yet been extended to counterfactuals. More progress can only be made on understanding the relation between causation and counterfactuals if these limitations are overcome in the psychology of reasoning.



Acknowledgments

I am very grateful to Nicole Cruz, Guy Politzer, Bob Rehder, Lance Rips, Benjamin Rottman, and Michael Waldmann for comments on a draft of this chapter.

References

Adams, E. (1970). Subjunctive and indicative conditionals. Foundations of Language, 6, 89–94. (p. 323)

Adams, E. (1998). A primer of probability logic. Stanford, CA: CSLI Publications.

Ali, N., Chater, N., & Oaksford, M. (2011). The mental representation of causal condition­ al reasoning: Mental models or causal models. Cognition, 119, 403–418. Allan, L. G. (1980). A note on the measurement of contingency between two binary vari­ ables in judgment tasks. Bulletin of the Psychometric Society, 15, 147–149. Arló-Costa, H. (2014). The logic of conditionals. In Edward N. Zalta (Ed.), The Stanford encyclopedia of philosophy. http://plato.stanford.edu/archives/sum2014/entries/log­ ic-conditionals/. Baratgin, J., Douven, I., Evans, J. St. B. T., Oaksford, M., Over, D. E., & Politzer, G. (2015). The new paradigm and mental models. Trends in Cognitive Sciences, 19(10), 547–548. Baratgin, J., & Politzer, G. (2010). Updating: A psychological basic situation of probability revision. Thinking & Reasoning, 16, 245–287. Baratgin, J., Over, D. E., & Politzer, G. (2013). Uncertainty and de Finetti tables. Thinking & Reasoning, 19, 308–328. Barrouillet, P., & Gauffroy, C. (2015). Probability in reasoning: A developmental test on conditionals. Cognition, 137, 22–39. Beck, S. R., Riggs, K. J., & Burns, P. (2011). Multiple developments in counterfactual thinking. In Christoph Hoerl, Tessa McCormack, & Sarah R. Beck (Eds.), Understanding counterfactuals, understanding causation (110–122). Oxford: Oxford University Press. Bennett, J. (2003). A philosophical guide to conditionals. Oxford: Oxford University Press. Bonnefon, J. F., & Sloman, S. A. (2013). The causal structure of utility conditionals. Cogni­ tive Science, 37, 193–209. Byrne, R. M. J. (1989). Suppressing valid inferences with conditionals. Cognition, 31, 61– 83. Byrne, R. M. J., & Johnson-Laird, P. N. (2009). “If” and the problems of conditional reason­ ing. Trends in Cognitive Science, 13, 282–287.


Causation and the Probability of Causal Conditionals Byrne, R. M. J., & Johnson-Laird, P. N. (2010). Conditionals and possibilities. In M. Oaks­ ford & N. Chater (Eds.), Cognition and conditionals: Probability and logic in human thought (pp. 55–68). Oxford: Oxford University Press. Cariani, F., & Rips, L. J. (in press). Conditionals, context, and the suppression effect. Cog­ nitive Science. Chater, N., and Oaksford, M. (Eds.) (2008). The probabilistic mind: Prospects for Bayesian cognitive science. Oxford: Oxford University Press. Cruz, N., Baratgin, J., Oaksford, M., & Over, D. E. (2015). Bayesian reasoning with ifs and ands and ors. Frontiers in Psychology, 6, 192. Cruz, N., Over, D., Oaksford, M., & Baratgin, J. (2016). Centering and the meaning of con­ ditionals. In A. Papafragou, D. Grodner, D. Mirman, & J. C. Trueswell (Eds.), Proceedings of the 38th Annual Conference of the Cognitive Science Society (pp. 1104–1109). Austin, US: Cognitive Science Society. Cummins, D. D. (1995). Naive theories and causal deduction. Memory & Cognition, 23, 646–658. Cummins, D. D., Lubart, T., Alksnis, O., & Rist, R. (1991). Conditional reasoning and cau­ sation. Memory & Cognition, 19, 274–282. de Finetti, B. (1995). The logic of probability (translation of 1936 original). Translated in R. B. Angell, The logic of probability. Philosophical Studies, 77, 181–190. de Finetti, B. (1964). Foresight: Its logical laws, its subjective sources (translation of 1937 original). In H. E. Kyburg & H. E. Smokier (Eds.), Studies in subjective probability (pp. 55–118). New York: Wiley. De Neys, W. (2010). Counterexample retrieval and inhibition during conditional reason­ ing: Direct evidence from memory probing. In M. Oaksford & N. Chater (Eds.), Cognition and conditionals: Probability and logic in human thinking (pp. 197–206). Oxford: Oxford University Press. Douven, I. (2015). The epistemology of indicative conditionals. Cambridge: Cambridge University Press. Douven, I., & Dietz, R. (2011). A puzzle about Stalnaker’s hypothesis. Topoi, 30, 31–37. Douven, I., & Verbrugge, S. (2010). The Adams family. Cognition, 117, 302–318. Edgington, D. (1995). On conditionals. Mind, 104, 235–329. Edgington, D. (2014). Indicative conditionals. In Edward N. Zalta (Ed.), The Stanford en­ cyclopedia of philosophy. http://plato.stanford.edu/archives/win2014/entries/condi­ tionals/. Page 27 of 33

Causation and the Probability of Causal Conditionals Elqayam, S., & Over, D. E. (2013). New paradigm psychology of reasoning: An introduc­ tion to the special issue edited by Elqayam, Bonnefon, & Over. Thinking & Reasoning, 19, 249–265. Evans, J. St. B. T., Handley, S., Hadjichristidis, C., Thompson, V., Over, D. E., & Bennett, S. (2007). On the basis of belief in causal and diagnostic conditionals. The Quarterly Journal of Experimental Psychology, 60, 635–643. Evans, J. St. B. T., Handley, S., Neilens, H., & Over, D. E. (2007). Thinking about condition­ als: A study of individual differences. Memory & Cognition, 35, 1772–1784. Evans, J. St. B. T., Handley, S. J., & Over, D. E. (2003). Conditional and conditional proba­ bility. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 321– 335. Evans, J. St. B. T., & Over, D. E. (2004). If. Oxford: Oxford University Press. Evans, J. St. B. T., & Over, D. E. (2013). Reasoning to and from belief: Deduction and in­ duction are still distinct. Thinking & Reasoning, 19, 268–283. Evans, J. St. B. T., Thompson, V., & Over, D. E. (2015). Uncertain deduction and condition­ al reasoning. Frontiers in Psychology, 6, 398. Feeney, A., & Handley, S. (2011). Suppositions, conditionals, and causal claims. In Christoph Hoerl, Tessa McCormack, & Sarah R. Beck (Eds.), Understanding counterfactu­ als, understanding causation (pp. 242–262). Oxford: Oxford Universit y Press. Fernbach, P. M., Darlow, A., & Sloman, S. A. (2011). Asymmetries in predictive and diag­ nostic reasoning. Journal of Experimental Psychology: General, 140, 168–185. Fernbach, P. M., & Erb, C. D. (2013). A quantitative model of causal reasoning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 39, 1327–1343. Fine, K. (1975). Review of Lewis’s Counterfactuals. Mind, 84, 451–458. Fugard, J. B., Pfeifer, N., Mayerhofer, B., & Kleiter, G. D. (2011). How people interpret conditionals: Shifts toward conditional event. Journal of Experimental Psychology: Learn­ ing Memory and Cognition, 37, 635–648. Gerstenberg, T., Goodman, N. D., Lagnado, D. A., & Tenenbaum, J. B. (2014). From coun­ terfactual simulation to causal judgment. In P. Bello, M. Guarini, M. McShane, & B. Scas­ sellati (Eds.), Proceedings of the 36th annual conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society. (p. 324)

Gilio, A., & Over, D. E. (2012). The psychology of inferring conditionals from disjunctions: A probabilistic study. Journal of Mathematical Psychology, 56, 118–131.

Goldvarg, E., & Johnson-Laird, P. N. (2001). Naïve causality: A mental model theory of causal meaning and reasoning. Cognitive Science, 25, 565–610.

Causation and the Probability of Causal Conditionals Girotto, V., & Johnson-Laird, P. N. (2010). Conditionals and probability. In M. Oaksford & N. Chater (Eds.), Cognition and conditionals: Probability and logic in human thought (pp. 103–115). Oxford: Oxford University Press. Haigh, M., Stewart, A. J., & Connell, L. (2013). Reasoning as we read: Establishing the probability of causal conditionals. Memory & Cognition, 41, 152–158. Hoerl, C., McCormack, T., & Beck, S. (Eds.) (2011). Understanding causation, understand­ ing counterfactuals. Oxford: Oxford University Press. Jeffrey, R. C. (1991). Matter of fact conditionals. Aristotelian Society Supplementary Vol­ ume, 65, 161–183. Johnson-Laird, P. N., & Byrne, R. M. J. (1991). Deduction. Hove, Sussex; Hillsdale, NJ: Lawrence Erlbaum Associates. Johnson-Laird, P. N., & Byrne, R. M. J. (2002). Conditionals: A theory of meaning, prag­ matics and inference. Psychological Review, 109, 646–678. Johnson-Laird, P. N., Khemlani, S., & Goodwin, G. (2015). Logic, probability, and human reasoning. Trends in Cognitive Science, 19, 201–214. Kahneman, D., & Miller, D. (1986). Norm theory: Comparing reality to its alternatives. Psychological Review, 93, 136–156. Kahneman, D., & Tversky, A. (1982). The simulation heuristic. In D. Kahneman & A. Tver­ sky (Eds.), Judgment under uncertainty: Heuristics and biases (pp. 201–208). New York: Cambridge University Press. Kaufmann, S. (2013). Causal premise semantics. Cognitive Science, 37, 1136–1170. Kratzer, A. (2012). Modality and conditionals. Oxford: Oxford University Press. Krzyzanowska, K., Wenmackers, S., & Douven, I. (2013). Inferential conditionals and evi­ dentiality. Journal of Logic, Language and Information, 22(3), 315–334. Lewis, D. (1973a). Causation. Journal of Philosophy, 70, 556–567. Lewis, D. (1973b). Counterfactuals. Cambridge, MA: Harvard University Press. Lewis, D. (1976). Probabilities of conditionals and conditional probabilities. Philosophical Review, 85, 297–315. Lewis, D. (1979). Counterfactual dependence and time’s arrow. Noûs, 13, 455–476. Lucas, C. G., & Kemp, C. (2012). A unified theory of counterfactual reasoning. In N. Miyake, D. Peebles, & R. P. Cooper (Eds.), Proceedings of the 34th annual conference of the Cognitive Science Society (pp. 707–712). Austin, TX: Cognitive Science Society.


Causation and the Probability of Causal Conditionals Mandel, D. R. (2005). Counterfactual and causal explanation: From early theoretical views to new frontiers. In D. Mandel, D. J. Hilton, & P. Cartellani (Eds.), The psychology of counterfactual thinking (pp. 11–27). London: Routledge. Mandel, D. R., Hilton, D. J., & Catellani, P. (Eds.) (2005). The psychology of counterfactual thinking. London: Routledge. Mayrhofer, R., & Waldmann, M. R. (2015). Agents and causes: Dispositional intuitions as a guide to causal structure. Cognitive Science, 39, 65–95. Meder, B., Mayrhofer, R., & Waldmann, M. R. (2014). Structure induction in diagnostic causal reasoning. Psychological Review, 121, 277–301. Menzies, P. (2014). Counterfactual theories of causation. In Edward N. Zalta (Ed.), The Stanford encyclopedia of philosophy. http://plato.stanford.edu/archives/spr2014/en­ tries/causation-counterfactual/. Oaksford, M, & Chater, N. (2007). Bayesian rationality: The probabilistic approach to hu­ man reasoning. Oxford: Oxford University Press. Oaksford, M., & Chater, N. (Eds.) (2010). Cognition and conditionals: Probability and log­ ic in human thinking. Oxford: Oxford University Press. Oaksford, M., & Chater, N. (2011). Dual systems and dual processes but a single function. In K. I. Manktelow, D. E. Over, & S. Elqayam (Eds.), The science of reason: A Festschrift for Jonathan St. B. T. Evans (pp. 339–351). Hove, Sussex: Psychology Press. Oaksford M., & Chater, N. (2013). Dynamic inference and everyday conditional reasoning in the new paradigm. Thinking & Reasoning, 19, 346–379. Oberauer, K., & Wilhelm, O. (2003). The meaning(s) of conditionals: Conditional probabili­ ties, mental models and personal utilities. Journal of Experimental Psychology: Learning Memory and Cognition, 29, 680–693. Ohm, E., & Thompson, V. A. (2006). Conditional probability and pragmatic conditionals: Dissociating truth and effectiveness. Thinking & Reasoning, 12, 257–280. Over, D. E., & Cruz, N. (in press). Probabilistic accounts of conditional reasoning. In L. J. Ball & V. A. Thompson (Eds.), International handbook of thinking and reasoning. Hove, Sussex: Psychology Press. Over, D. E., Douven, I., & Verbrugge, S. (2013). Scope ambiguities and conditionals. Thinking & Reasoning, 19, 284–307. Over, D. E., Evans, J. St. B. T., & Elqayam, S. (2010). Conditionals and non-constructive reasoning. In M. Oaksford & N. Chater (Eds.), Cognition and conditionals: Probability and logic in human thinking (pp. 135–151). Oxford: Oxford University Press.


Causation and the Probability of Causal Conditionals Over, D. E., Hadjichristidis, C., Evans, J. St. B. T., Handley, S. J., & Sloman, S. A. (2007). The probability of causal conditionals. Cognitive Psychology, 54, 62–97. Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge: Cambridge Uni­ versity Press. Pearl J. (2013). Structural counterfactuals: A brief introduction. Cognitive Science, 37, 977–985. Perner, J., & Rafetseder, E. (2011). Counterfactuals and other forms of conditional reason­ ing: Children lost in the nearest possible world. In C. Hoerl, T. McCormack, & S. R. Beck (Eds.), Understanding counterfactuals, understanding causation (pp. 90–109). Oxford: Ox­ ford University Press. Pfeifer, N., & Kleiter, G. D. (2010). The conditional in mental probability logic. In M. Oaks­ ford & N. Chater (Eds.), Cognition and conditionals: Probability and logic in human think­ ing (pp. 153–173). Oxford: Oxford University Press. Politzer, G. (2005). Uncertainty and the suppression of inferences. Thinking & Reasoning, 11, 5–33. Politzer, G., & Baratgin, J. (2015). Deductive schemas with uncertain premises using qual­ itative probability expressions. Thinking & Reasoning, 22, 78–98. Politzer, G., & Bonnefon, J. F. (2006). Two varieties of conditionals and two kinds of de­ featers help reveal two fundamental types of reasoning. Mind & Language, 21, 484–503. Politzer, G., Over, D. E., & Baratgin, J. (2010). Betting on conditionals. Thinking & Reason­ ing, 16, 172–197. Ramsey, F. P. (1929/1990). General propositions and causality. In D. H. Mellor (Ed.), Philosophical papers (pp. 145–163). Cambridge: Cambridge University Press. (p. 325)

Rehder, B. (2014). Independence and dependence in human causal reasoning. Cognitive Psychology, 72, 54–107.

Rips, L. J. (2010). Two causal theories of counterfactual conditionals. Cognitive Science, 34, 175–221.

Rips, L. J., & Edwards, B. (2013). Inference and explanation in counterfactual reasoning. Cognitive Science, 37, 1107–1135.

Rottman, B., & Hastie, R. (2014). Reasoning about causal relationships: Inferences on causal networks. Psychological Bulletin, 140, 109–139.

Singmann, H., Klauer, K. C., & Over, D. E. (2014). New normative standards of conditional reasoning and the dual-source model. Frontiers in Psychology, 5, 316.


Causation and the Probability of Causal Conditionals Skovgaard-Olsen, N., Singmann, H., & Klauer, K. C. (2016). The relevance effect and con­ ditionals. Cognition, 150, 26–36. Sloman, S. A. (2005). Causal models: How we think about the world and its alternatives. New York: Oxford University Press. Sloman, S. A. (2013). Counterfactuals and causal models: Introduction to the special is­ sue. Cognitive Science, 37, 969–976. Sloman, S. A., Barbey, A. K., & Hotaling, J. M. (2009). A causal model theory of the mean­ ing of cause, enable, and prevent. Cognitive Science, 33, 21–50. Sloman, S. A., & Lagnado, D. (2005). Do we “do?” Cognitive Science, 29, 5–39. Spellman, B. A., Kincannon, A. P., & Stose, S. J. (2005). The relation between counterfac­ tual and causal reasoning. In D. Mandel, D. J. Hilton, & P. Cartellani (Eds.), The psycholo­ gy of counterfactual thinking (pp. 28–43). London: Routledge. Stalnaker, R. (1968). A theory of conditionals. In N. Rescher (Ed.), Studies in logical theo­ ry (pp. 98–112). Oxford: Blackwell. Stalnaker, R. (1970). Probability and conditionals. Philosophy of Science, 37, 64–80. Stalnaker, R. (1975). Indicative conditionals. Philosophia, 5, 269–286. Stevenson, R. J., & Over, D. E. (1995). Deduction from uncertain premises. Quarterly Jour­ nal of Experimental Psychology, 48A, 613–643. Stevenson, R. J., & Over, D. E. (2001). Reasoning from uncertain premises: Effects of ex­ pertise and conversational context. Thinking & Reasoning, 7, 367–390. Teigen, K. H. (2005). When a small difference makes a big difference. In D. Mandel, D. J. Hilton, & P. Cartellani (Eds.), The psychology of counterfactual thinking (pp. 129–146). London: Routledge. Thompson, V. A., & Byrne, R. M. J. (2002). Reasoning counterfactually: Making inferences about things that didn’t happen. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28, 1154–1170. Wijnbergen-Huitink, J. van, Elqayam, S., and Over, D. E. (2014). The probability of iterat­ ed conditionals. Cognitive Science, 1–16. doi:10.1111/cogs.12169. Woodward, J. (2011). Psychological studies of causal and counterfactual reasoning. In C. Hoerl, T. McCormack, & S. R. Beck (Eds.), Understanding counterfactuals, understanding causation (pp. 16–53). Oxford: Oxford University Press. Zeelenberg, M., & van Dijk, E. (2005). On the comparative nature of regret. In D. Mandel, D. J. Hilton, & P. Cartellani (Eds.), The psychology of counterfactual thinking (pp. 147– 161). London: Routledge. Page 32 of 33

Zhao, J., & Osherson, D. (2014). Category-based updating. Thinking & Reasoning, 20, 1–15. (p. 326)

David E. Over

Psychology Department Durham University Durham, England, UK



Causal Models and Conditional Reasoning   Mike Oaksford and Nick Chater The Oxford Handbook of Causal Reasoning Edited by Michael R. Waldmann Print Publication Date: Jun 2017 Subject: Psychology, Cognitive Psychology Online Publication Date: May 2017 DOI: 10.1093/oxfordhb/9780199399550.013.5

Abstract and Keywords

There are deep intuitions that the meaning of conditional statements relates to probabilistic law-like dependencies. In this chapter it is argued that these intuitions can be captured by representing conditionals in causal Bayes nets (CBNs) and that this conjecture is theoretically productive. This proposal is borne out in a variety of results. First, causal considerations can provide a unified account of abstract and causal conditional reasoning. Second, a recent model (Fernbach & Erb, 2013) can be extended to the explicit causal conditional reasoning paradigm (Byrne, 1989), making some novel predictions on the way. Third, when embedded in the broader cognitive system involved in reasoning, causal model theory can provide a novel explanation for apparent violations of the Markov condition in causal conditional reasoning (Ali et al., 2011). Alternative explanations are also considered (see Rehder, 2014a) with respect to this evidence. While further work is required, the chapter concludes that the conjecture that conditional reasoning is underpinned by representations and processes similar to CBNs is indeed a productive line of research.

Keywords: conditional, causal Bayes nets, conditional reasoning, abstract, cognition

Over the last twenty-five years, the most researched paradigm in the psychology of reasoning has been the causal conditional reasoning paradigm, using conditional statements like if the key is turned then the car starts (Byrne, 1989; Cummins, 1995; Cummins, Lubart, Alksnis, & Rist, 1991). During this period the dominant approach to causal learning and judgment has been based on causal Bayes nets (e.g., Waldmann & Martignon, 1998). In causal model theory (Sloman, 2005; Sloman, Barbey, & Hotaling, 2009; Sloman & Lagnado, 2005), the Bayes net formalism is treated as an account of the mental representations and processes underlying causal reasoning. While there have been proposals that causal conditional reasoning and causal model theory could be related (Chater & Oaksford, 2006; Stenning & Van Lambalgen, 2005),1 these proposals have been tested experimentally only very recently (Ali, Chater, & Oaksford, 2011; Ali, Schlottmann, Shaw, Chater, & Oaksford, 2010; Cummins, 2014; Fernbach & Erb, 2013). In this chapter, we argue that taking causal model theory seriously as an account of the mental representations and processes underlying conditional reasoning may be theoretically fruitful.

We first introduce causal Bayes nets and the conditional inference paradigm. We then briefly argue that it should come as no surprise that causal considerations are relevant to people's understanding of conditionals. This is because the intuitions underlying formal logical theories of the conditional have a causal origin. We then look at some experimental results on conditional inference. We show that causal considerations can allow a unified account of abstract and causal conditional reasoning; that Fernbach and Erb's (2013) recent model can be extended to the explicit causal conditional reasoning paradigm (Byrne, 1989), making some novel (p. 328) predictions on the way; and that, when embedded in the broader cognitive system involved in reasoning, causal model theory can provide a novel explanation for apparent violations of the Markov condition (see next section) in causal conditional reasoning.

Figure 19.1 (a) Causal Bayes net used in Fernbach and Erb (2013); (b) Causal Bayes net with a noisy-OR (ᴠ) integration rule and two generative causes. (c) Causal Bayes net with a noisy-AND-NOT (ᴧ¬) inte­ gration rule with one generative cause and one pre­ ventive cause.

Causal Bayes Nets

Causal model theory suggests that the mental representations and processes underlying causal reasoning are analogous to causal Bayes nets (Sloman, 2005). Causal Bayes nets (CBNs) treat the causal dependencies that people believe to be operative in the world as basic (Pearl, 1988, 2000; see also Rehder, Chapters 20 and 21 in this volume; Rottman, this volume). These are represented as edges in a directed acyclic graph (see Figure 19.1). The nodes represent Bayesian random variables. They also represent the relevant causes and effects, with the arrows running from cause to effect (i.e., the arrows represent causal direction). Nodes that are not connected represent variables that are conditionally independent of each other. The parents of a node are those that connect to it further back down the causal chain. These networks have probability distributions defined over them that partly rely on the dependency structure. Integration rules determine how the multiple parents of a node combine, for example, the noisy-OR rule (see Figure 19.1).

Suppose, in Figure 19.1, p and r represent rain and the sprinklers being on, respectively. These are independent causes of the pavements being wet (i.e., q). Assume there are no other causes of the pavements being wet. On this assumption, the probability of the pavements being wet is then

Pr(q) = 1 − (1 − ind(p)Wp)(1 − ind(r)Wr),

where ind(p) = 1 if it is raining and 0 if it is not (and similarly for ind(r) and the sprinklers), and Wi is the probability of q given cause i. If this were a deterministic system (i.e., Wr = Wp = 1), then this formula is equivalent to logical inclusive or (i.e., it gives probability 1 unless both causes are absent, when it gives probability 0; that is, if it is not raining and the sprinklers are not on, then the pavements are not wet).

This view commits one to more than probability theory (Pearl, 1988, 2000). A recent review by Rottman and Hastie (2014, p. 111) summarizes the additional assumptions made in Bayes nets, which are mainly about making inference tractable. The most important in this respect is the causal Markov property: causes "screen off" their effects, so that inferences about any effect variable depend only on its direct causes and not on any of the other effects or indirect causes. For example, the sun being out causes shadows and high temperatures. If it is known to be sunny, then these effects are independent (i.e., manipulating one, e.g., walking into an air-conditioned room, will not affect the other; there will still be shadows outside). Moreover, if it is known to be sunny, these effects are independent of any of the causes of it being sunny. While there are detractors of the Bayes net approach (e.g., Cartwright, 1999, 2001), Rottman and Hastie (2014, p. 111) argue "that the approach helps us to understand real causal systems and how ordinary people think about causality." CBNs may also help us to understand how people reason with conditionals.
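Computationally, the noisy-OR rule is a one-liner. The sketch below is our own illustration of the formula just given (the variable names are invented); the assertions check the deterministic special case described in the text.

def noisy_or(ind_p, ind_r, w_p, w_r):
    """Pr(q) = 1 - (1 - ind(p)*Wp) * (1 - ind(r)*Wr)."""
    return 1.0 - (1.0 - ind_p * w_p) * (1.0 - ind_r * w_r)

# Deterministic system (Wp = Wr = 1) behaves as logical inclusive or:
for ind_p in (0, 1):
    for ind_r in (0, 1):
        assert noisy_or(ind_p, ind_r, 1.0, 1.0) == float(ind_p or ind_r)

# A noisy case: rain (Wp = .9) and sprinklers (Wr = .8) both present.
print(noisy_or(1, 1, 0.9, 0.8))  # 0.98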

Causal Conditional Reasoning

The psychology of deductive reasoning is the study of the inferences people draw from the logical terms of natural language. In English, these are terms like and, or, not, and the conditional, if p then q. In causal conditional reasoning (Byrne, 1989; Cummins, 1995; Cummins, Lubart, Alksnis, & Rist, 1991; Over, Chapter 18 in this volume), the antecedent, p, is a cause, and the consequent, q, an effect (e.g., if the key is turned, the car starts). Four inference forms are typically investigated in the conditional reasoning paradigm. These are the (p. 329) valid inference rules of modus ponens (MP: if p then q, p, therefore q) and modus tollens (MT: if p then q, not q, therefore not p) and the fallacies of denying the antecedent (DA: if p then q, not p, therefore not q) and affirming the consequent (AC: if p then q, q, therefore p). The most important finding in the causal conditional reasoning paradigm is the discovery of suppression effects (see also Over, Chapter 18 in this volume). We describe these in more detail later, but their nature is straightforward. If people treat conditional inference formally by the application of logical rules of inference, they should always endorse MP. However, consider MP on the causal conditional noted earlier (i.e., if the key is turned, the car starts; the key is turned; therefore, the car starts). If participants are also told that cars need to have fuel in order to start, their endorsements of MP are drastically reduced. The fact that valid inferences can be suppressed in this way has been taken to argue against an approach to reasoning based on mental logic (Byrne, 1989).


Following earlier proposals (Chater & Oaksford, 2006), the range of inferences investigated in causal conditional reasoning has recently been extended to discounting and augmentation inferences (Ali, Chater, & Oaksford, 2011; Ali, Schlottmann, Shaw, Chater, & Oaksford, 2010). For example, suppose you know that turning the ignition key and jump starting both cause cars to start. If you are then told that the car started, then turning the ignition key and jump starting are possible causes. However, if you are then told that the car was jump started, this discounts the key being turned as a possible cause of the car starting. Related effects occur when this information is presented in conditional form, and they cannot be explained by logic-based theories of the conditional. We discuss these results and extend them in more detail later. Despite earlier intimations (Stenning & van Lambalgen, 2005) and discussions (Oaksford & Chater, 2010a, 2010b, 2013), a causal Bayes net approach has only recently been used to model suppression effects in causal conditional reasoning (Fernbach & Erb, 2013; see also Cummins, 2014). We present Fernbach and Erb's (2013) model later in the chapter.

Before discussing how causal model theory can explain results on conditional reasoning, we first provide a brief account of why we believe causation is important to understanding conditionals.

Causation and the Meaning of Conditionals

Logical accounts of the conditional have been driven by the attempt to make sense of causal or law-like relations (for more extended summaries, see Oaksford & Chater, 2010a, 2010b, 2010c). The logical positivists analyzed law-like relations using the material conditional of standard logic (Suppe, 1977). A material conditional, if p then q, is true if and only if the consequent (q) is true or the antecedent (p) is false. So it is only false when the antecedent is true and the consequent is false. In the philosophy of science, it was soon realized that the material conditional was inadequate as an explanation of law-like relations (Chisholm, 1946; Goodman, 1947, 1955). What distinguishes a law-like relation from a non-law-like or accidental generalization is that the former supports counterfactuals. So if the switch is on the light is on supports the truth of the counterfactual conditional if the switch had not been on, the light would not have been on. Hume's second definition of causation was framed using a similar counterfactual conditional formulation. However, the material conditional could not be used to analyze counterfactuals because on a material conditional interpretation, counterfactual conditionals are always true because their antecedents (p) are always false. This is why they are counterfactuals. But counterfactual conditionals can obviously be false. For example, suppose all the coins in my pocket happen to be silver. Is the counterfactual, if this coin had been in my pocket it would have been silver, true? Intuition says it is false because there is no causal or law-like relation between being in my pocket and the metallic constitution of a coin. That the coins in my pocket all happen to be silver is an accidental generalization.

The possible worlds semantics for the counterfactual (Lewis, 1973; Stalnaker, 1968) is recognized as the beginning of our understanding of the logic of the conditional in natural language, as opposed to mathematics (Nute, 1984). To evaluate the truth of a counterfactual conditional requires one to envisage the most similar possible world to the actual world in which the antecedent is true, to see if the consequent is also true in this world. "Similar" is taken to mean one in which the causal laws remain the same. An almost identical procedure also applies to "open" indicative conditionals (Stalnaker, 1984). These conditionals are open in the sense that it is not known whether the antecedent is true or false (e.g., if the switch is on the light is on). Again one envisages the closest possible world in which the antecedent is true and evaluates whether the consequent is true. In his conceptualist interpretation of possible worlds, Stalnaker (1984) conceives of open indicative conditionals as describing our methodological (p. 330) policies for revising our beliefs in the light of new information. Moreover, Stalnaker argues that if an open indicative supports a counterfactual, then the person asserting the counterfactual must believe that the open indicative is making a factual claim about a real causal relation in the world. In terms of the psychology of thinking, it does not matter whether an open indicative conditional that supports a counterfactual actually denotes a real causal relation (i.e., such ontological questions need not directly concern us). That someone must believe this, however, is instructive about the nature of the cognitive system. Our methodological policies for belief change or habits of inference are fundamental, and an assertable conditional in natural language will imply that an inferential dependency exists between antecedent and consequent.2 While there are apparent uses of the conditional that seem to contradict this claim,3 it is arguable that these are not real conditionals (Krzyżanowska, Wenmackers, & Douven, 2013).

The intuition behind the possible worlds semantics for the conditional is the Ramsey test (Ramsey, 1931), which also provides the core intuition behind recent probabilistic approaches to conditional reasoning. Probabilistically construed, the Ramsey test provides a means of determining the subjective probability of a conditional, which is identified with the conditional probability, that is, Pr(if p then q) = Pr(q|p). This identity has been called "the Equation" (Edgington, 1995). According to the Ramsey test, one supposes that p = True, adds it to one's stock of beliefs, makes revisions to accommodate this new belief, and then reads off the probability of q. The Equation is at the heart of the probabilistic new paradigm in reasoning (Evans, Handley, & Over, 2003; Elqayam & Over, 2013; Fugard, Pfeifer, Mayerhofer, & Kleiter, 2011; Oaksford & Chater, 2007, 2009; Oaksford, Chater, & Larkin, 2000; Oberauer & Wilhelm, 2003; Over, Hadjichristidis, Evans, Sloman, & Handley, 2007), which appeals to probability logic of some form (e.g., Adams, 1998). For example, we have construed conditional inference as Bayesian conditionalization. So when we learn that p = True, the new probability of p, Pr1(p) = 1, and the new probability of q, Pr1(q) = Pr0(q|p) (i.e., the old conditional probability of q given p).

While riding roughshod over many important conceptual problems, we would argue that a causal model approach based on causal Bayes nets unifies these core logical intuitions about conditionals. That is, causal or law-like dependencies are what conditionals are used to describe, and these dependencies are treated probabilistically (Oaksford & Chater, 2010a).

Experimental Results

We have argued that people's default is to think about open indicative conditionals as expressing causal relations. Does this point of view provide us with any explanatory purchase: does it provide a new way to understand the patterns of reasoning observed in conditional inference? In this main section of the chapter, we review some of the empirical results on conditional inference. We argue that taking causality and causal models seriously as an account of the representations and processes involved in conditional reasoning does indeed provide new theoretical insights. We start with the canonical results using abstract materials.

The Abstract Results

In this section, we first show how the logical laws of MP and MT and the fallacies AC and DA can be probabilized using Bayesian conditionalization. We argue that people approach even abstract versions of the conditional inference task causally (i.e., they think about conditionals as expressing causal dependencies). This intuition allows a new interpretation of MT and the fallacies as involving learning from a counterexample. We show how this interpretation allows us to explain the abstract results, which provides clues about people's default thinking about causal relations (Oaksford & Chater, 2013). We conclude by arguing that the best way to integrate our causal intuitions and the probabilistic analysis they suggest is to propose that people represent conditionals in causal Bayes nets.

Probabilized Conditional Reasoning

Bayesian conditionalization has been used to provide an account of conditional inference as dynamic belief revision in which we transit from an old, Pr0(·), to a new, Pr1(·), probability distribution (Oaksford & Chater, 2003a, 2003b, 2003c, 2007, 2013; Oaksford, Chater, & Larkin, 2000). On this account, probabilized MP is an inference by Bayesian conditionalization. So take the conditional premise if the key is turned, then the car starts (Pr(if p then q) = Pr0(q|p) = a, by the Equation), and the categorical premise, the key is turned, p (Pr1(p) = 1); then the new probability of the conclusion, that the car starts, Pr1(q), is a. That is, the new probability (p. 331) that the car starts is the same as the old conditional probability that the car starts given the key is turned, Pr1(q) = Pr0(q|p). Bayesian conditionalization relies on the invariance or rigidity condition (Jeffrey, 1992; Sober, 2002): the conditional probabilities remain the same between the old and new distributions, that is, Pr1(q|p) = Pr0(q|p).
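As a worked instance of probabilized MP (the numbers are ours, purely illustrative), suppose Pr0(q|p) = .9 for if the key is turned, the car starts. Learning the categorical premise for certain then gives

Pr1(p) = 1,  Pr1(q) = Pr0(q|p) = .9,

with rigidity, Pr1(q|p) = Pr0(q|p), carrying the conditional premise unchanged across the revision.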

Causal Models and Conditional Reasoning indicate whether a conclusion validly follows from the premises. On the new approach, in­ ference is belief revision (i.e., given your degrees of belief in the premises, what should be your new degree of belief be in the conclusion?). We are concerned not with whether the inference is valid, but whether the degree of belief assigned to the conclusion falls with coherent bounds given someone’s degrees of belief in the premises. There is good evidence that people’s degrees of belief in the conclusion respects probabilistic coher­ ence for a variety of inferences involving conditionals (Cruz, Baratgin, Oaksford, & Over, 2015). The departure we have suggested is that in many contexts the premises of an ar­ gument can lead to non-monotonic changes in the probability distribution, such that in­ variance is violated (Oaksford & Chater, 2013). This is dynamic belief revision, and we have argued that causal examples show that invariance is often violated for conditional inferences. Moreover, explaining what happens in these situations provides a much better account of some of the canonical abstract data on conditional inference (Oaksford & Chater, 2013). For MP, learning that the key was turned has no effect on the probability that the car starts given the key was turned. But what if you learn that the car did not start? By logi­ cal MT, one is supposed to conclude that the key was not turned. However, it seems that asserting that the car did not start only makes sense in a context where the car was ex­ pected to start (i.e., the key has been turned; Adams, 1998; Oaksford & Chater, 2007; So­ bel, 2004). That is, the categorical premise of the MT inference appears to provide a counterexample to the conditional statement (i.e., an instance of the key being turned but the car not starting). We have explored the possibility that people treat the categorical premise of MT as a falsifying, p, ¬q, observation from which they learn a new conditional probability (Oaksford & Chater, 2007, 2013) in violation of the invariance assumption. To see how this could happen, we need to be clear on what further information people need to know about in order to draw the other inferences, MT, AC, and DA. The only infor­ mation participants have is that for each inference, the conditional and the categorical premises have been asserted (and perhaps they have been told to assume these premises are true). This suggests that Pr0(p|q) is high and, for example, Pr1(¬q) = 1, that is, ¬q is true (for MT). However, to draw inferences other than MP they must appeal to prior knowledge. So, to draw AC, for example, requires deriving the probability that the key was turned given the car starts, that is, Pr0(p|q). This probability is the inverse of Pr0(q|p) and so must be calculated using Bayes’s theorem, which requires information from world knowledge about the priors, Pr0(p) and Pr0(q) (Oaksford & Chater, 2007; Oaksford, Chater, & Larkin, 2000). With these three parameters we can derive the conditional prob­ abilities associated with all four inferences, that is, Pr0(q|p) (MP), Pr0(¬q|¬p) (DA), Pr0(p| q) (AC), and Pr0(¬p|¬q) (MT). Conditional inference is then Bayesian conditionalization on the appropriate conditional probability (Oaksford & Chater, 2007; Oaksford, Chater, & Larkin, 2000). We used an equivalent, more informative parameterization, that is, Pr0(p), Pr0(q|p), Pr0(q| ¬p), to model the abstract data (Oaksford & Chater, 2013). 
This is more informative because interpreting the conditional causally means we want to derive information about causal sufficiency and necessity. The higher Pr0(q|p), the more sufficient turning the key is for starting the car. The lower Pr0(q|¬p), the more necessary turning the key is for starting the car.
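The bookkeeping behind this parameterization can be made explicit in a few lines. The function below is our own sketch, not the authors' code: it derives, via the law of total probability and Bayes's theorem, the conditional probabilities that are conditionalized on for all four inferences.

def inference_probabilities(pr_p, pr_q_p, pr_q_np):
    """Map Pr(p), Pr(q|p), Pr(q|not-p) to the probabilities of the
    MP, DA, AC, and MT conclusions under conditionalization."""
    pr_q = pr_q_p * pr_p + pr_q_np * (1.0 - pr_p)  # law of total probability
    return {
        "MP": pr_q_p,                                         # Pr(q|p)
        "DA": 1.0 - pr_q_np,                                  # Pr(not-q|not-p)
        "AC": pr_q_p * pr_p / pr_q,                           # Pr(p|q), Bayes's theorem
        "MT": (1.0 - pr_q_np) * (1.0 - pr_p) / (1.0 - pr_q),  # Pr(not-p|not-q)
    }

# Cluster 1-style parameter values from the fits reported in the next
# subsection:
print(inference_probabilities(0.48, 0.98, 0.04))

In the fits described next, the revised value Pr1(q|p) replaces Pr0(q|p) in these formulas when MT, AC, and DA are modeled.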

Fitting the Abstract Data

The abstract data pose an apparent problem for a simple model of conditional inference like Bayesian conditionalization. The endorsement rate of MP is very high, which sets too high a value (≈ .97) for Pr0(q|p) to provide good fits for the remaining inferences. However, as we saw, it appears that on a causal interpretation the categorical premise for the remaining inferences suggests a counterexample. For MT, this counterexample is an instance of the car not starting even though the key has been turned, that is, a counterexample that questions the sufficiency of turning the key for starting the car. For AC and DA, this counterexample is an instance of (p. 332) the car starting even though the key has not been turned, that is, a counterexample that questions the necessity of turning the key for starting the car. Our model assumes that for these other inferences, people learn a new probability, Pr1(q|p) (such that Pr1(q|p) ≠ Pr0(q|p)), from the observation of a single counterexample. For simplicity, we assumed that the new probability Pr1(q|p) is the same whether they learn a p, ¬q counterexample or a ¬p, q counterexample.4 This means that for MT, AC, and DA, Pr1(q|p) was always lower than the probability of drawing MP, Pr0(q|p), from which it is derived. Pr1(q|p) was calculated by simple Bayesian updating of the empirical value of Pr0(q|p) based on the MP endorsement rate, assuming a single observation of a counterexample. This value was then used to model the data with Pr0(q|¬p) and Pr0(p) as free parameters.

This model was fitted (Oaksford & Chater, 2013) to a large meta-analysis of the abstract conditional inference task (Schroyens & Schaeken, 2003). The model provided as good a fit to the data as in Oaksford and Chater (2007). It also provided an interesting clustering of the parameter values, which we suggest may be indicative of people's default understanding of a causal dependency.

The cluster analysis over the fitted parameter values yielded two clusters. In cluster 1 (N = 29) the mean values of Pr0(p), Pr0(q|p), and Pr0(q|¬p) were .48 (SD = .091), .98 (SD = .030), and .04 (SD = .065), respectively. In cluster 2 (N = 26) the mean values were .40 (SD = .084), .96 (SD = .038), and .46 (SD = .165), respectively. Pr0(q|p) did not differ significantly between clusters. The value of Pr1(q|p), used to model DA, AC, and MT, was .72. For cluster 1, Pr0(q|¬p) ≈ 1 − Pr0(q|p). That is, these parameter values are consistent with the interpretation that for the studies in this cluster, people initially regard turning the key to be as necessary as it is sufficient for starting the car, that is, Pr0(¬q|¬p) ≈ Pr0(q|p). In the other cluster, Pr0(p) was significantly less than 0.5 and Pr0(q|¬p) did not differ significantly from .5; that is, learning that the key was not turned is initially regarded as uninformative, leaving one maximally uncertain about whether the car started.
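The chapter does not spell out the updating computation here, but one natural way to implement simple Bayesian updating on a single counterexample is beta-binomial updating. The sketch below, including the choice of prior strength, is our assumption rather than a reconstruction of the authors' actual calculation.

def update_on_counterexample(pr0_q_p, prior_strength=10.0):
    """Treat Pr0(q|p) as the mean of a beta prior whose equivalent
    sample size is prior_strength, then observe one p, not-q case."""
    alpha = pr0_q_p * prior_strength
    beta = (1.0 - pr0_q_p) * prior_strength
    return alpha / (alpha + beta + 1.0)  # posterior mean after one failure

print(update_on_counterexample(0.97))  # about 0.88 with these settings

Lower prior strengths move the revised value further from Pr0(q|p); for instance, a prior strength of about 2.9 would take .97 to the reported value of .72.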



Causal Interpretation and CBNs

We think that it is no coincidence that a causal interpretation makes sense of what is going on in these results. In the first cluster, people initially assume very high necessity and sufficiency, that is, there are no other causes that can start a car other than turning the key and there are very few occasions when turning the key fails to have this effect. The latter suggests there are few ways in which the cause can be disabled or that disablers (e.g., a flat battery or no fuel) occur infrequently (see Geiger & Oberauer, 2007). However, when people are asked to consider the MT, AC, and DA inferences, the revised values of Pr0(q|p) fall and those of Pr0(q|¬p) rise. That is, these inferences suggest counterexamples. However, the experiments used abstract alphanumeric stimuli. Consequently, people can't be generating explicit alternative causes (jump starting) or disabling conditions (no fuel). Rather, the fact that they are possible and that their possibility is triggered for the non-MP inferences for almost any conditional statement must be represented implicitly (see also Rehder, 2014); that is, the existence of alternative causes or disablers is represented purely in terms of lower values of Pr(¬q|¬p) and Pr(q|p), respectively.

As we have seen, this analysis used causal intuitions to motivate a new probabilistic model of the conditional inference paradigm. We earlier proposed that CBNs may unify probabilistic theories of the conditional (Adams, 1998) with the view that open indicative conditionals describe inferential dependencies. The obvious next move is to translate our probabilistic analysis into the CBN formalism. The CBN in Figure 19.1 corresponds to a single causal dependency between p and q. In this network, the variables of interest are parameterized as causal powers (Cheng, 1997) and other alternative causes are explicitly represented. So Wp corresponds to the causal power of p to cause q in the absence of alternative causes, a (Pr(q|p, ¬a)), and Wa corresponds to the causal power of a to cause q in the absence of p (Pr(q|¬p, a)); p and a are assumed to independently affect q, and their combined effect is modeled using the noisy-OR integration rule. High Wp and low Wa correspond to high causal sufficiency and high causal necessity, respectively. Recasting our account in terms of CBNs has the advantage that there are well-defined algorithms for learning both causal structure and the strength of causal powers from data (Griffiths & Tenenbaum, 2005). This fact suggests a unified way of implementing the learning-from-counterexamples approach to violations of invariance for MT and the fallacies. However, the area where CBNs have the greatest potential to illuminate the psychology of conditional reasoning is the suppression effect, to which we now turn. (p. 333)

Suppression Effects

We initially adopted a similar implicit interpretation of the effects of manipulating alternative causes and disablers in the causal conditional reasoning paradigm (Oaksford & Chater, 2003c, 2007). There are two versions of the causal conditional reasoning paradigm, which we have briefly introduced. However, the original experiment did not use causal materials (Byrne, 1989; see also Byrne, Espino, & Santamaria, 1999). Instead, materials about individual intentions or dispositions were used (e.g., if Lisa met her friend then she went to a play). For reasons we have already discussed (see also Oaksford & Chater, 2010a, 2010b), we don't think the difference is consequential. People still infer that some inferential dependency, her intention or perhaps a promise, causes Lisa to go to the play if she meets her friend. Consequently, we introduce both paradigms using causal material.

In the Byrne (1989) paradigm, participants are provided with pairs of conditional sentences, (1) and (1') or (1) and (1''):

(1) If the key is turned, the car starts
(1') If jump started, the car starts
(1'') If there is fuel, the car starts

(1') describes an alternative cause and (1'') describes a disabler (i.e., if the antecedent of (1'') is false, the car will not start). As our discussion in the previous section indicated, pairs like (1) and (1') should lead to lower levels of AC and DA inferences, and pairs like (1) and (1'') should lead to lower levels of MP and MT. We have referred to this paradigm as the explicit suppression paradigm (Oaksford & Chater, 2003c, 2007).

Cummins, Lubart, Alksnis, and Rist (1991; see also Cummins, 1995) introduced what we have called the implicit suppression paradigm (Oaksford & Chater, 2003c, 2007). Rather than present participants with explicit alternative causes (1') or disablers (1''), conditionals like (1) are pretested for the number of alternative causes or disablers with which they are associated. In the pretests, people are asked to generate as many alternative ways of getting to the effect as they can, or to generate as many reasons as they can for why the effect does not occur even though the cause has occurred. Post hoc, the conditional statements are then divided into four groups: few alternatives and few disablers (FF), few alternatives and many disablers (FM), many alternatives and few disablers (MF), and many alternatives and many disablers (MM). In this literature, the general conclusion is that these paradigms produce similar results (e.g., Dieussaert, De Neys, & Schaeken, 2005).5

Figure 19.2 shows the results of Byrne (1989) and Cummins et al. (1991). We have aligned comparable conditions in the panels. So, for example, Panel B shows the alternative cause condition (1 and 1') for Byrne (1989) and the many alternative causes and few disablers condition (MF) for Cummins et al. (1991). Comparing conditions shows that the magnitude of the suppression effects was far greater for Byrne's (1989) explicit paradigm. One possible reason for this discrepancy is also the reason that 95% confidence intervals are shown only for Cummins et al. (1991): rather than using a rating scale (Cummins et al., 1991), Byrne (1989) asked participants for a binary "endorse or not endorse" response. So the Byrne (1989) data in Figure 19.2 are frequencies rather than mean ratings. Could the ratings for Cummins et al. (1991) yield categorical binary responses similar to Byrne (1989) (e.g., if we assumed a 0.5 cutoff such that if Pr(Conclusion) < 0.5 respond "no" and if Pr(Conclusion) > 0.5 respond "yes")? The error bars in Panel B for DA span Pr(Conclusion) = .5. Consequently, using this cutoff should yield roughly equal numbers of "yes" and "no" responses. Yet in Byrne's results, 96% of participants responded "no." Moreover, using the same criterion for MP in Panel C, all participants should respond "yes," yet in Byrne (1989) 62% responded "no." In sum, there is no immediately obvious way to escape the conclusion that the magnitude of the suppression effects was far greater for Byrne's (1989) explicit paradigm.6

We initially modeled both sets of data implicitly (Oaksford & Chater, 2003c, 2007, 2008) by adjusting the parameters, Pr(q|p) and Pr(q|¬p). First, we assumed that the presence of an alternative cause (Byrne, 1989) or of a conditional associated with many alternative causes (Cummins et al., 1991) raised the probability that, for example, the car could start without turning the key (Pr(q|¬p)). That is, turning the key is considered less necessary to start the car. Second, we assumed that the presence of a disabler (Byrne, 1989) or of a conditional associated with many disablers (Cummins et al., 1991) lowered the probability that, for example, the car would start when the key was turned (Pr(q|p)); that is, turning the key is considered less sufficient to start the car.

However, there is a flaw in this account. If the mental processes involved in both paradigms are (p. 334) exactly the same, then regardless of whether a disabler or alternative cause is retrieved from memory (Cummins et al., 1991) or is explicitly presented (Byrne, 1989) it should have the same effect on these parameters. Moreover, the more disablers or alternative causes retrieved or presented, the greater the effects should be (i.e., decreases in Pr(q|p) and increases in Pr(q|¬p)). However, in Byrne (1989) the magnitude of the effect of a single presented disabler or alternative cause was much greater than when many can be retrieved from memory in the Cummins et al. (1991) paradigm. That is, if exactly the same mental processes are involved (e.g., implicit representation), Figure 19.2 shows that people's behavior is the wrong way around: they are showing greater effects for a single counterexample than for many.
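On the implicit account, suppression is what the conditionalization arithmetic predicts when these parameters move. The snippet below repeats the bookkeeping from the earlier sketch so that it runs on its own; the parameter values are invented for illustration.

def endorsement(pr_p, pr_q_p, pr_q_np):
    """Pr(conclusion) for MP, DA, AC, and MT from Pr(p), Pr(q|p), Pr(q|not-p)."""
    pr_q = pr_q_p * pr_p + pr_q_np * (1.0 - pr_p)
    return {"MP": pr_q_p,
            "DA": 1.0 - pr_q_np,
            "AC": pr_q_p * pr_p / pr_q,
            "MT": (1.0 - pr_q_np) * (1.0 - pr_p) / (1.0 - pr_q)}

baseline    = endorsement(0.5, 0.95, 0.05)  # no exceptions salient
alternative = endorsement(0.5, 0.95, 0.40)  # Pr(q|not-p) raised
disabler    = endorsement(0.5, 0.60, 0.05)  # Pr(q|p) lowered

Raising Pr(q|¬p) pulls down AC and DA; lowering Pr(q|p) pulls down MP and MT. As the passage above notes, however, the same mechanism applied to both paradigms predicts the wrong ordering of effect sizes across Byrne (1989) and Cummins et al. (1991).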


Figure 19.2 Results from Byrne (1989) and Cummins et al. (1991) showing the probability of drawing each inference, aligning comparable conditions. Panel A shows the standard control (1) for Byrne (1989) and the FF condition for Cummins et al. (1991); Panel B shows the alternative cause condition (1 and 1') for Byrne (1989) and the MF condition for Cummins et al. (1991); Panel C shows the disabler condition (1 and 1'') for Byrne (1989) and the FM condition for Cummins et al. (1991); Panel D shows the MM condition for Cummins et al. (1991). FF = few alternatives, few disablers; MF = many alternatives, few disablers; FM = few alternatives, many disablers; MM = many alternatives, many disablers. Error bars = 95% CIs.

This anomaly suggests that people do not represent and reason about the two paradigms in the same way, as we and others have largely assumed. A possible difference between the tasks is that without the explicit presence of exceptions (i.e., disablers or alternative causes), people do not spontaneously retrieve them from memory (Byrne et al., 1999). However, the dominant paradigm in reasoning research over almost the last 25 years has been the Cummins et al. (1991) paradigm, and the many replications of their results have been interpreted as demonstrating that people do indeed spontaneously retrieve exceptions from memory.

The most obvious explanation of the difference between paradigms is that in the Byrne (1989) paradigm the explicit presence of these exceptions leads participants to explicitly represent the specific alternative causes or disablers mentioned in the materials. Ideally, such explicit representations would allow people to explore a greater range of possibilities by interrogating the model. For example, they could explore whether the car could start even though the key had not been turned. They should be able to determine that this is possible as long as the car was jump started. Similarly, they could explore whether the car could fail to start even though the key had been turned. They should be able to determine that this is possible as long as the car was out of fuel. Simulating these possibilities in an explicit model of the situation described in the premises may have a far greater effect on people's judgments than implicitly encoding the possible presence of alternative causes and disablers in the parameters Pr(q|p) and Pr(q|¬p). One can see why, by considering that these parameters would also have to encode information about the prior probability of a disabler or an alternative cause; that is, Pr(q|p) is high because, for example, running out of fuel is a rare event, and Pr(q|¬p) is low because, for example, jump starting cars is similarly a rare event. However, explicit representation allows consideration of what happens when the probabilities of these events go to 1 (i.e., of what would happen should they actually occur). We now look at current applications of CBNs to causal conditional reasoning. (p. 335)
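Before turning to those applications, a toy calculation may help make the implicit, parameter-based encoding concrete. The sketch below is ours, not taken from the studies discussed, and all parameter values are invented: raising Pr(q|¬p), as many alternative causes should, suppresses DA and AC, while lowering Pr(q|p), as many disablers should, suppresses MP and MT.

```python
# A minimal sketch of the conditional probability model of the four
# inferences; the parameter values below are invented for illustration.

def endorsements(pr_p, pr_q_given_p, pr_q_given_not_p):
    """Pr(conclusion) for MP, DA, AC, and MT by Bayesian conditioning."""
    pr_q = pr_q_given_p * pr_p + pr_q_given_not_p * (1 - pr_p)
    return {
        "MP": pr_q_given_p,                                     # Pr(q|p)
        "DA": 1 - pr_q_given_not_p,                             # Pr(~q|~p)
        "AC": pr_q_given_p * pr_p / pr_q,                       # Pr(p|q)
        "MT": (1 - pr_q_given_not_p) * (1 - pr_p) / (1 - pr_q), # Pr(~p|~q)
    }

for label, pr_qp, pr_qnp in [("baseline", 0.9, 0.1),
                             ("many alternatives (raise Pr(q|~p))", 0.9, 0.5),
                             ("many disablers (lower Pr(q|p))", 0.6, 0.1)]:
    result = endorsements(0.5, pr_qp, pr_qnp)
    print(label, {k: round(v, 2) for k, v in result.items()})
```

The printed pattern (lower DA and AC when Pr(q|¬p) rises; lower MP and MT when Pr(q|p) falls) is the suppression pattern described above.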

Causal Bayes Nets and Conditional Reasoning

We show how causal Bayes nets have been applied to causal conditional reasoning, first, by qualitatively extending the paradigm to discounting and augmentation inferences (Ali et al., 2011) and, second, in quantitatively modeling the predictive MP inference and the diagnostic AC inference (Fernbach & Erb, 2013). We then extend the approach to the Byrne paradigm and introduce some novel predictions.

Discounting

Some of the unique predictions of a CBN approach to conditional inference have recently been confirmed (Ali et al., 2011). As in Byrne (1989), participants were presented with two conditional statements, either a pair of causal conditionals—if c then e—or a pair of diagnostic conditionals—if e then c. We describe two conditions of these experiments that are logically identical but which make contrasting predictions under a causal interpretation (see Waldmann & Holyoak, 1992, for the first intimation that there may be profound differences between the causal or predictive cases and diagnostic cases; see also in this volume, Rehder, Chapters 20 and 21, and Rottman, Chapter 6).

If it rains, the streets are wet (2)
If the sprinklers are on, the streets are wet (2')
If it is warm outside, it is sunny (3)
If there are shadows, it is sunny (3')

For the causal conditionals (2) and (2'), participants are told that the streets are wet and are asked for an estimate of how likely it is to have rained. They are then told that the sprinklers are on and again asked for an estimate of how likely it is to have rained. The likelihood judgment that it rained should fall from the first to the second judgment, because finding out that the sprinklers are on explains away why the streets are wet, discounting rain as a possible cause. For the diagnostic conditionals, a directly analogous sequence of information should not lead to a fall in the likelihood judgment. This is because once it is known that it is sunny, learning that there are shadows does not discount the possibility of it being warm, because they have a common cause, it being sunny. These patterns of likelihood judgments have been observed with adults (Ali et al., 2011) and with children (Ali et al., 2010).

These experiments extended the range of inferences investigated with conditionals. Moreover, the results cannot be explained by logic-based theories, as they rely on the conditionals being interpreted causally. This research on discounting inferences in causal conditional reasoning made qualitative predictions for the patterns of inference. However, recently the Cummins paradigm has been modeled quantitatively.
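Before turning to that quantitative work, the qualitative discounting pattern itself can be illustrated with a small enumeration over a common-effect noisy-OR network. This is our own sketch, not Ali et al.'s materials or code, and the priors and causal powers are invented:

```python
# Rain (p) and sprinklers (r) as independent noisy-OR causes of wet
# streets (q). Learning q raises Pr(rain); additionally learning r
# "explains away" the wetness and discounts rain.

from itertools import product

PR_RAIN, PR_SPRINKLER = 0.2, 0.2   # invented priors
W_RAIN, W_SPRINKLER = 0.9, 0.8     # invented causal powers

def joint(rain, sprinkler, wet):
    pr_wet = 1 - (1 - W_RAIN) ** rain * (1 - W_SPRINKLER) ** sprinkler
    prior = ((PR_RAIN if rain else 1 - PR_RAIN)
             * (PR_SPRINKLER if sprinkler else 1 - PR_SPRINKLER))
    return prior * (pr_wet if wet else 1 - pr_wet)

def pr_rain_given(**evidence):
    """Pr(rain = 1 | evidence) by brute-force enumeration."""
    def consistent(world):
        return all(world[k] == v for k, v in evidence.items())
    worlds = [dict(zip(("rain", "sprinkler", "wet"), vals))
              for vals in product((0, 1), repeat=3)]
    den = sum(joint(**w) for w in worlds if consistent(w))
    num = sum(joint(**w) for w in worlds if consistent(w) and w["rain"])
    return num / den

print(round(pr_rain_given(wet=1), 2))               # wet streets: ~0.59
print(round(pr_rain_given(wet=1, sprinkler=1), 2))  # ...sprinklers on: ~0.23
print(round(pr_rain_given(sprinkler=1), 2))         # effect unknown: 0.20
```

The third line illustrates the independence of the two causes when the effect is unobserved, a point that matters for the discussion of Markov condition violations later in the chapter.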

Modeling the Cummins Paradigm

Using CBNs to quantitatively model the Cummins paradigm has been characterized (Fernbach & Erb, 2013) as a combination of the mental models approach (Johnson-Laird, 1983; see Johnson-Laird & Khemlani, Chapter 10 in this volume) and the conditional probability approach (Oaksford & Chater, 2003c, 2007). In mental models theory, reasoning involves a search for counterexamples. Situations that could lead to the car not starting even though the key has been turned are counterexamples to sufficiency (i.e., p, ¬q cases). Situations that could lead to the car starting even though the key has not been turned are counterexamples to necessity (i.e., ¬p, q cases). The conditional probability model does not specify where the probabilities come from, but accessing counterexamples from memory is an obvious possibility and the one endorsed by CBNs. This connection between mental models and the conditional probability model has been pointed out before (Oaksford & Chater, 2003b): "However, there are many areas of agreement. … The intuition behind the conditional probability model, the Ramsey (1931) thought experiment, makes this explicit. … For repeatable events [the Ramsey test] will involve searching memory for counterexamples" (p. 155).

Fernbach and Erb (2013) developed a CBN model for just the MP and AC inferences. In the conditional probability model, this involves computing the probability that the car starts given the key has been turned (Pr(q|p)) and the probability that the key has been turned given that the car started (Pr(p|q)). In contrast, (p. 336) Fernbach and Erb (2013) model MP as causal power (Cheng, 1997; see also Cheng & Lu, Chapter 5 in this volume); that is, the probability that the car starts given that the key has been turned in the absence of all alternative causes (a) (Pr(q|p, ¬a)). The probabilities Pr(q|p, ¬a) and Pr(p|q) are derived by accessing alternative causes and disablers from memory. Alternative causes and the actual cause (p) are combined in a common effect structure (p → q ← a), which assumes that the cause and its alternatives make independent contributions to the effect. The effects of different combinations of the cause and its alternatives on the effect variable are determined by the noisy-OR function (Pearl, 1988). All alternative causes are always assumed to be present in the causal background, Pr(a) = 1, and their power is assumed to encode information about their collective prior probability. We use Wa to denote Pr(q|a) and Wp to denote causal power, Pr(q|p, ¬a). From basic probability theory:

Pr(q|¬p) = Wa (4)

Using the noisy-OR function, which we explained earlier, taking into account other possible causes of q, Pr(q|p) is


Pr(q|p) = 1 − (1 − Wa)(1 − Wp)^ind(p) (5)

where ind(p) is 1 when the cause is present and 0 when the cause is absent. Equations (4) and (5) are sufficient to specify all the possible joint probabilities of the cause (p) and the effect (q) or their absences:

Pr(p, q) = Pr(p)(Wp + Wa − WpWa)
Pr(p, ¬q) = Pr(p)(1 − Wp)(1 − Wa)
Pr(¬p, q) = Pr(¬p)Wa
Pr(¬p, ¬q) = Pr(¬p)(1 − Wa) (6)

They therefore allow the derivation of an expression for the conditional probability for the AC inference (i.e., Pr(p|q)):

Pr(p|q) = Pr(p)(Wp + Wa − WpWa) / [Pr(p)(Wp + Wa − WpWa) + Pr(¬p)Wa] (7)
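As a concrete rendering of equations (4) through (7), the following sketch (our reconstruction of the model's arithmetic, not Fernbach and Erb's code; parameter values are arbitrary) computes Pr(q|p) by noisy-OR and the AC probability Pr(p|q) by Bayes's rule:

```python
# Eqs. (4)-(7): with Pr(a) = 1, Wa = Pr(q|a), and Wp = Pr(q|p, ~a).

def pr_q(ind_p, w_p, w_a):
    """Eq. (5): noisy-OR likelihood of the effect; ind_p is 0 or 1."""
    return 1 - (1 - w_a) * (1 - w_p) ** ind_p

def pr_p_given_q(pr_p, w_p, w_a):
    """Eq. (7): the AC inference, Pr(p|q), from the joint in eq. (6)."""
    num = pr_p * pr_q(1, w_p, w_a)               # Pr(p, q)
    den = num + (1 - pr_p) * pr_q(0, w_p, w_a)   # + Pr(~p, q); eq. (4) gives Wa
    return num / den

pr_p, w_p, w_a = 0.5, 0.8, 0.3   # arbitrary illustrative values
print("Pr(q|p)  =", round(pr_q(1, w_p, w_a), 2))   # Wp + Wa - Wp*Wa = 0.86
print("Pr(q|~p) =", round(pr_q(0, w_p, w_a), 2))   # = Wa = 0.30
print("Pr(p|q)  =", round(pr_p_given_q(pr_p, w_p, w_a), 3))
```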

In one experiment, Fernbach and Erb (2013) collected estimates of the three parameters Pr(p), Wa, and Wp for the causal conditionals used in Cummins (1995) to fit (7) to data on MP and AC inference ratings without using free parameters.

In another experiment, rather than treat Wp as primitive, they exploited the noisy-OR approach to calculating causal power (Pearl, 1988). The idea is that causal power can be derived from retrieved disablers (di), and the derivation depends on their prior probability, Pr(di), and their disabling power, Wdi. The disabling probability is then Ai = Pr(di)Wdi. The aggregated disabling power, A', is then:

A' = 1 − ∏i (1 − Ai) (8)

Causal power then equals

Wp = 1 − A' (9)

Equations (8) and (9) show exactly how causal power depends on information about retrieved disablers. This allowed Fernbach and Erb (2013) to collect estimates of the parameters Pr(di) and Wdi for various causal conditionals to fit (9) to data on MP inference ratings without using free parameters. Here participants were explicitly asked to generate possible disablers and to rate their prior probability and disabling power.
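The same derivation can be run numerically in a few lines. The disabler list below (priors and disabling powers) is invented purely for illustration, and the formulas are our reading of (8) and (9):

```python
# Causal power from retrieved disablers, eqs. (8)-(9):
# A_i = Pr(d_i) * W_di, A' = 1 - prod_i(1 - A_i), Wp = 1 - A'.

from math import prod

disablers = [(0.10, 0.95),   # (Pr(d_i), W_di), e.g., battery dead
             (0.05, 0.90),   # e.g., no fuel
             (0.02, 0.99)]   # e.g., starter motor broken

a_i = [pr_d * w_d for pr_d, w_d in disablers]
aggregate = 1 - prod(1 - a for a in a_i)   # eq. (8)
causal_power = 1 - aggregate               # eq. (9)
print(round(causal_power, 3))  # Wp: the key alone starts the car, ~0.85 here
```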


Correlations across the causal conditionals used in Cummins (1995) between estimates of MP and AC and their predicted values using (7) and (9) were very good (Fernbach & Erb, 2013). Consequently, CBNs may provide a good model of causal conditional reasoning using the Cummins implicit paradigm. In this setup, no information about particular alternative causes or disablers is explicitly represented. Alternative causes are treated as part of the ever-present causal background, and this causal background is combined with the actual cause, p, via the noisy-OR function. Another limitation is that Fernbach and Erb (2013) did not look at the denial inferences, DA and MT.

We modeled the Cummins et al. (1991) data including the denial inferences. In the Cummins paradigm, the conditionals used in each condition vary to achieve the manipulation of the independent variable (i.e., many or few alternative causes or disablers). Consequently, we allowed the parameters to vary between conditions (i.e., we used separate Pr(p), Wa, and Wp parameters in (p. 337) each condition). The fits, shown in Figure 19.2, were acceptable, R2 = .79. However, to achieve these fits we used Pr(q|p) to model MP rather than causal power. This was because in modeling these data, the plausible parameter values always yielded a value of Pr(¬p|¬q), the probability of the MT inference, greater than causal power (Pr(q|p, ¬a)). But Figure 19.2 shows that the conclusion of MP is always believed more strongly than the conclusion of MT. These results suggest that more detailed model fitting is required. Fernbach and Erb (2013), who modeled MP using causal power, found good parameter-free fits between the data and the model, which were better than plausible alternative theories. But modeling the denial inferences seems to require using Pr(q|p) to capture the ordering in MP and MT endorsements. For causal conditional inference, this modeling result argues against the claim that people ignore alternative causes in predictive inference (i.e., MP or DA), but not in diagnostic inference (i.e., AC or MT) (Fernbach, Darlow, & Sloman, 2010, 2011). These anomalies need to be resolved.

One possibility explored by Cummins (2014) is that the questions posed in Fernbach and Erb (2013; see also Fernbach, Darlow, & Sloman, 2010, 2011) were easily misconstrued as being about causal power. That is, the standard question posed for the predictive MP inference invites the interpretation: does the cause alone produce the effect? When participants are asked a question more appropriate to the noisy-OR interpretation (e.g., how likely is it that the cause and/or other things produce the effect), people's MP judgments more closely approximate Pr(q|p). This finding may explain why, when we modeled Cummins et al.'s (1991) data, including the denial inferences, we needed to use Pr(q|p) to capture the ordering in MP and MT. Cummins (2014) also found that even when asked an appropriate question, people's MP judgments were still lower than they should be according to the noisy-OR function. She also showed (Experiment 4) that, as in the Byrne (1989) paradigm, the explicit presentation of a single disabler had a very pronounced effect on predictive MP inferences. This result suggests that the noisy-OR rule may not sufficiently take into account the effect of disablers on these inferences.
In a final experiment, she asked participants to generate disablers after making predictive MP inferences and to judge their prior probability and disabling power. The results were not fully consistent with equations (8) and (9), but instead seemed to suggest primacy and recency effects for early and late retrieved disablers. This finding suggests that a more psychologically nuanced account of counterexample retrieval from long-term memory is required.

However, we would argue that while important, these data require no fundamental revisions to the model to take account of the actual retrieval process. Ultimately, these processes are captured in the model as modifications of the prior probability and disabling power for particular disablers. The way they are then combined and the way they affect inference remains the same. What may require greater changes are the effects Cummins observed when disablers are explicitly presented, as in the Byrne paradigm, to which we now turn.

Modeling the Byrne (1989) Paradigm

To model the Byrne paradigm requires extending the model to include either an explicit alternative cause or an explicit disabler. For the alternative cause, explicit mention means that one alternative cause (r) is promoted from the causal background to be explicitly represented in the CBN. Because turning the key (p) and jump starting (r) are independent causes of starting the car, (4) becomes

Pr(q|¬p, ¬r) = Wa (10)

Incorporating a further alternative cause in (5) yields

Pr(q|p, r) = 1 − (1 − Wa)(1 − Wp)^ind(p) (1 − Wr)^ind(r) (11)

All eight joint probabilities over these three variables can be calculated using (10) and (11), as in (6). This means that all the relevant conditional probabilities for the four inferences, MP, DA, AC, and MT, can also be calculated.

For disabling conditions, explicit mention means that one disabler is promoted from the calculation of causal power for the cause p, to be explicitly represented as an inhibitory or preventive cause. To model this setup we use the noisy-AND-NOT function (Novick & Cheng, 2004; Rottman & Hastie, 2014). This function assumes that there is one generative cause, p, and one preventive cause, r, and that these have independent effects on the effect variable, q (so (10) applies).7 So the effect, q, occurs if the generative cause, p, is present and (p. 338) the preventive cause, r, is absent. For noisy-AND-NOT, (5) becomes

Pr(q|p, r) = [1 − (1 − Wa)(1 − Wp)^ind(p)] (1 − Wr)^ind(r) (12)
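To make the two integration rules concrete, here is a sketch of equations (10) through (12) under our reading of the text; all parameter values in the sanity check are placeholders:

```python
# Explicit alternative cause r: noisy-OR over a, p, r (eqs. (10)-(11)).
# Explicit disabler r: noisy-AND-NOT (eq. (12)).

def q_given_alternative(ind_p, ind_r, w_p, w_r, w_a):
    """Eq. (11): Pr(q) when r is a second generative cause."""
    return 1 - (1 - w_a) * (1 - w_p) ** ind_p * (1 - w_r) ** ind_r

def q_given_disabler(ind_p, ind_r, w_p, w_r, w_a):
    """Eq. (12): Pr(q) when r is a preventive cause."""
    return (1 - (1 - w_a) * (1 - w_p) ** ind_p) * (1 - w_r) ** ind_r

def joint_table(likelihood, pr_p, pr_r, w_p, w_r, w_a):
    """The eight joint probabilities over p, r, q (cf. eq. (6))."""
    table = {}
    for p in (0, 1):
        for r in (0, 1):
            pq = likelihood(p, r, w_p, w_r, w_a)
            base = (pr_p if p else 1 - pr_p) * (pr_r if r else 1 - pr_r)
            table[(p, r, 1)] = base * pq
            table[(p, r, 0)] = base * (1 - pq)
    return table

# Sanity check: each joint table sums to 1.
for fn in (q_given_alternative, q_given_disabler):
    print(round(sum(joint_table(fn, 0.5, 0.4, 0.9, 0.7, 0.5).values()), 6))
```

From either table, all the conditional probabilities needed for MP, DA, AC, and MT follow by summing and dividing, as the text notes next.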


Using (10) and (12), expressions can again be derived for all eight joint probabilities from which the conditional probabilities for the four inferences, MP, DA, AC, and MT, can be calculated.

Fitting these data using the conditional probability model (Oaksford & Chater, 2003c, 2007) yielded very good fits, but the parameter values were not as interpretable as could be desired. For example, using the parameters Pr(q|p), Pr(p), and Pr(q) showed very large variations, especially for the priors. These variations might be more explicable with a different parameterization (e.g., using Wa, Wp, and Pr(p)). However, explicitly representing the alternative cause or disabler presented in the Byrne paradigm suggests that people might be assessing these inferences in a different way. This is similar to the question of whether MP is evaluated using the conditional probability, Pr(q|p), or causal power, Pr(q|p, ¬a). As we suggested earlier, the most obvious explanation for the magnitude of the effects in the Byrne paradigm is that people explicitly represent the presented alternative cause or disabler, r, and assume that it is present when drawing conditional inferences. Consequently, when r is an alternative cause, people may respond to DA and AC assuming that the explicit alternative cause also occurs (i.e., Pr(¬q|¬p, r) and Pr(p|q, r), respectively). Similarly, when r is a disabler, people may respond to MP and MT assuming that the explicit disabler also occurs (i.e., Pr(q|p, r) and Pr(¬p|¬q, r), respectively). We used these probabilities to model the Byrne data.

In modeling the Byrne paradigm, there should also be a constant causal background for all three cases (i.e., A, B, and C in Figure 19.2), because the same causal conditional, if p then q, occurs in all three. The only change from A is the addition of the alternative cause (B) or the disabler (C). Consequently, Wa should not vary across these cases. This contrasts with the Cummins paradigm, in which different conditionals are used to achieve the manipulation of the independent variables. In fitting the Byrne data, we modeled case A in the same way as the Cummins model, using the parameters Wa, Wp, and Pr(p). For each of the other cases, B and C, we added Pr(r) and Wr. We let Wr take the same value when it was an alternative cause (B) and when it was a disabler (C). Figure 19.2 shows acceptable model fits with an R2 of .81 (Pr(p) = .46, Pr(r) = .41, Wa = .55, Wp = .91, and Wr = .73). Clearly, more work is required on these model fits. In particular, more data are required to carry out fits to individual participants' results and also to provide parameter-free fits to the Byrne paradigm, as in Fernbach and Erb (2013). Modeling the Byrne paradigm using explicit representations of the presented alternative causes or disablers also provides an account of Cummins's (2014) Experiment 4, in which comparable effects on the MP inference were observed to those found in Byrne (1989; see Figure 19.2).
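The following sketch plugs the fitted values just reported (Pr(p) = .46, Wa = .55, Wp = .91, Wr = .73; Pr(r) drops out once r is assumed present) into this scheme. The inference formulas, with r assumed present, are our reconstruction of the modeling described above, not the authors' code:

```python
# Byrne-paradigm inferences with the explicit r assumed present:
# DA = Pr(~q|~p, r) and AC = Pr(p|q, r) under noisy-OR (eq. (11));
# MP = Pr(q|p, r) and MT = Pr(~p|~q, r) under noisy-AND-NOT (eq. (12)).

PR_P, W_A, W_P, W_R = 0.46, 0.55, 0.91, 0.73  # fitted values from the text

def q_alt(p, r):   # eq. (11)
    return 1 - (1 - W_A) * (1 - W_P) ** p * (1 - W_R) ** r

def q_dis(p, r):   # eq. (12)
    return (1 - (1 - W_A) * (1 - W_P) ** p) * (1 - W_R) ** r

def pr_p_given(likelihood, q_obs, r=1):
    """Pr(p = 1 | q = q_obs, r): q depends only on p and r here."""
    def lk(p):
        pq = likelihood(p, r)
        return pq if q_obs else 1 - pq
    num = lk(1) * PR_P
    return num / (num + lk(0) * (1 - PR_P))

print("alternative cause present (B):")
print("  DA Pr(~q|~p,r) =", round(1 - q_alt(0, 1), 2))        # suppressed
print("  AC Pr(p|q,r)   =", round(pr_p_given(q_alt, 1), 2))   # suppressed
print("disabler present (C):")
print("  MP Pr(q|p,r)   =", round(q_dis(1, 1), 2))            # suppressed
print("  MT Pr(~p|~q,r) =", round(1 - pr_p_given(q_dis, 0), 2))
```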

Further Predictions

In the previous section, we extended the scope of causal model theory to explain causal conditional reasoning in the Byrne paradigm. This involved exploiting the noisy-AND-NOT integration rule. There is a range of possible integration rules (Waldmann, 2007), some of which may make further interesting predictions about the way in which people interpret causal conditionals using causal models. For example, Rehder (2015) has recently extended the framework to conjunctive causation, where two causes are only jointly sufficient to produce an effect. So the effect, q, occurs only if both generative causes, p and r, are present. In this model, the separate causes are modeled as only possessing a combined causal power, Wpr. Equation (5) then becomes:

Pr(q|p, r) = 1 − (1 − Wa)(1 − Wpr)^(ind(p)·ind(r)) (13)

Using (10) and (13), expressions can again be derived for all eight joint probabilities. We consider the implications of (13) for the discounting paradigm used by Ali et al. (2011). Using Rehder's (2015) own material, possible pairs of conditionals are

If butane-laden fuel is used (p), then the engine will overheat (q). (14)
If the fuel filter gasket is loose (r), then the engine will overheat (q).

For (13) to apply, it would also have to be emphasized that both p and r are required for q to occur. Given this interpretation, rather than discounting, augmentation should occur.8 So if it is known that the engine overheated, q, the probability that (p. 339) butane-laden fuel was used, Pr(p), will increase. And once it is known that the fuel filter gasket is loose, r, this probability will increase even further. This is a novel prediction of the causal model approach.

A further prediction derives from the observation that if no other alternative causes are available (i.e., Wa = 0), then augmentation is not predicted. This is because in the absence of alternative causes, once it is known that the engine overheated, both conjunctive causes must have occurred. Consequently, being told that the fuel filter gasket is loose, r, could not further raise the probability that butane-laden fuel was used. Examples of this situation, in which causes are individually necessary for an effect, are given by pairs of conditionals like (15).

If the plant is watered (p), then it will grow (q). (15)
If the plant receives light (r), then it will grow (q).

The plant will only grow if it receives water and light (and some other things). There is no alternative cause that will make plants grow in the absence of water and light. Again, using materials like (15) makes the novel prediction of no augmentation, in contrast to the setup in (14).

In closing this section, we note that taking seriously the idea that people think about conditionals as expressing causal or similar relations, and that CBNs provide a satisfactory account of these relations, can account reasonably well for some of the canonical data on conditional reasoning, including the abstract results and the suppression effects in the Cummins and the Byrne paradigms, and, moreover, makes some interesting novel predictions, which we are now testing.
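These two predictions can be checked with a small enumeration. The sketch below is our own illustration with invented priors and an invented conjunctive power, not a fitted model:

```python
# Conjunctive causation, eq. (13): only p AND r carry the power Wpr.
# With Wa > 0, Pr(p) rises on learning q and rises again on learning r
# (augmentation). With Wa = 0, learning q already entails both causes.

PR_P = PR_R = 0.3   # invented priors
W_PR = 0.9          # invented combined causal power

def pr_q(p, r, w_a):
    return 1 - (1 - w_a) * (1 - W_PR) ** (p * r)   # eq. (13)

def pr_p_given_q(w_a, r=None):
    """Pr(p = 1 | q = 1[, r = 1]) by enumeration."""
    num = den = 0.0
    for p in (0, 1):
        for r_val in ((r,) if r is not None else (0, 1)):
            prior = (PR_P if p else 1 - PR_P) * (PR_R if r_val else 1 - PR_R)
            w = prior * pr_q(p, r_val, w_a)
            den += w
            num += w if p else 0.0
    return num / den

for w_a in (0.2, 0.0):
    print(f"Wa = {w_a}:  Pr(p) = {PR_P}",
          f" Pr(p|q) = {pr_p_given_q(w_a):.2f}",
          f" Pr(p|q,r) = {pr_p_given_q(w_a, r=1):.2f}")
```

With Wa = 0.2 the posterior climbs in two steps (augmentation); with Wa = 0 it jumps to 1 at the first step, so the second piece of evidence adds nothing, matching the no-augmentation prediction for materials like (15).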

Extending the Framework as a Psychological Model

In modeling these data, we have treated CBNs purely as a mathematical model. We have fitted the equations governing the behavior of particular CBNs to experimental results. Doing so leaves open the question of the level of explanation that we regard CBNs as providing. They could be regarded as providing a computational-level explanation. At this level they outline the functions we think the mind/brain is computing, without any assumption that CBNs provide the underlying mental representations and processes involved in conditional reasoning. However, the assumption underlying causal model theory is that CBNs provide an account of the actual mental representations and processes underpinning conditional reasoning. In this section we therefore explore the consequences of taking this assumption seriously in order to look at the kind of cognitive system it implies.

CBNs as Mental Simulations

The CBNs we have used to model the data are shown in Figure 19.1. These mental representations allow us to simulate the causal system described in the conditional sentences used as premises in the arguments we have modeled. The term "simulation" is usually restricted to running a causal model forward in time. Here we also use this term to describe running a model backward in diagnostic inference to infer possible causes of an effect. It is an advantage of the level of abstraction from real causal systems implicit in CBNs that both predictive and diagnostic inferences can be made. It is important to bear in mind that in the mental simulation people do not have available to them the equations that govern the model's behavior. This is rather like the difference between setting up a model in HUGIN and investigating the model after it has been compiled. In setting up the model, one must create the appropriate nodes corresponding to Bayesian random variables, introduce the appropriate causal links, select the integration rule, and specify the parameters. Once compiled, this information is no longer available. But what you can do now is explore how the model behaves as you simulate the presence or absence of the variables that make up the model by setting their probabilities to 0 or 1. In the cognitive system, the model is set up depending on the causal interpretation of the conditionals in the premises, and the parameters are set from world knowledge in long-term memory.

A simulation involves interrogating some of the variables in the model in order to investigate the behavior of the remaining variables. By interrogation, we mean considering hypothetical observations (i.e., interrogating a model is not to intervene on it in Pearl's sense; Pearl, 1988). There are many possible interrogations. So if we consider the model in Figure 19.1 (i.e., the model underpinning our account of abstract reasoning and the Cummins paradigm), there are four possible interrogations corresponding to the four conditional inferences. That is, p or q can be set to 0 or 1, and the probability of the other variable can be read off the simulation. Once we move to the models in Figures 19.2 B and 19.2 C, the number of possible interrogations considerably expands. So we can have unary interrogations involving one variable, of which there will be six, by analogy with the model in Figure 19.1. But there will also be binary interrogations of variables being (p. 340) set to either 0 or 1. There are three possible pairs of variables, and each pair can be set to four combinations of values. Consequently, there are 12 possible binary interrogations that can be explored to examine the effect on the remaining variable. Each interrogation corresponds to an inference by Bayesian conditionalization.

People are not consciously aware of the process of setting up the appropriate CBN, nor do we think the specific CBNs in Figure 19.1 are available to consciousness. People are aware only of the results of making a particular interrogation or sequence of interrogations.
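The interrogation counts above are easy to verify mechanically; this tiny enumeration (ours, over hypothetical variable sets) makes the combinatorics explicit:

```python
# Counting interrogations: choose a subset of variables and assign each
# chosen variable 0 or 1 (a hypothetical observation, not an intervention).

from itertools import combinations, product

def interrogations(variables, arity):
    return [dict(zip(chosen, values))
            for chosen in combinations(variables, arity)
            for values in product((0, 1), repeat=arity)]

print(len(interrogations(("p", "q"), 1)))        # 4: the model in Figure 19.1
print(len(interrogations(("p", "r", "q"), 1)))   # 6 unary interrogations
print(len(interrogations(("p", "r", "q"), 2)))   # 12 binary interrogations
```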

Interrogation and Working Memory

Carrying out inferences may often require a sequence of interrogations of an underlying causal model. This is required, for example, in responding to the sequence of information provided in some discounting inferences. Causal conditionals like (2) and (2') will lead to the construction of the underlying causal model in Figure 19.1 (i.e., the noisy-OR representation). When participants are told that the streets are wet, this corresponds to a unary interrogation of this structure in which the probability of the effect, Pr(q), is set to 1. This will lead the probabilities associated with the p and r nodes to increase above their base rates. Participants then provide their estimate of Pr(p) (i.e., how likely it was to have rained). Participants are then told that the sprinklers are on, which corresponds to a binary interrogation of the structure in which the probability of one of the causes, r, is set to 1 (Pr(r) = 1) and Pr(q) = 1. They then have to provide another estimate of how likely it is to have rained, Pr(p). In Ali et al.'s (2011) Experiment 1, participants were given the information sequentially and were asked for judgments of the probability that it rained (Pr(p)) one after the other. This experiment produced, bar one result, the standard pattern of discounting inferences.

However, in Ali et al.'s (2011) Experiment 2, participants were given the same sequence of information, but at the end of the sequence they were asked only for a single judgment of whether the probability that it rained had risen, fallen, or stayed the same given the second piece of information that the sprinklers are on. This version of the task imposes a memory load not present in the first version of the task. Moreover, while the results were broadly in line with prediction, there were some differences from Experiment 1. For example, there should be no discounting when people do not know whether the streets are wet. According to the CBN approach, in the absence of knowledge about whether the effect occurred, the causes are independent for the common effect structure. However, as we discuss in more detail later (under "Errors"), in this condition people tended to violate independence in Experiment 2, suggesting that the sprinklers being on precludes it from raining; they did not do so in Experiment 1. We suggest that this is because of the need in Experiment 2 to store the results of the first interrogation of the model in some further representational medium in working memory (WM) and compare it to the result of the second interrogation.

How do people record the results of their interrogations in WM? We suggest that people use a representation like a mental model (see Figure 19.3). A node is recorded as, for example, p if the probability that it is true, Pr(p = True), is greater than .5, and as ¬p if this probability is less than .5 (i.e., it is more likely in the model that p is false). This representation shows the variables interrogated in bold. Consequently, the variables in bold also show the nodes on which the probability of the target node p is conditioned. This probability is added as an annotation.

Figure 19.3 Sequence of annotated mental-model-like representations created in WM as a result of building the causal model (Figure 19.1 b) corresponding to the conditionals if it rains (p), then the pavements are wet (q) and if the sprinklers are on (r), then the pavements are wet (q) (a), from which the probability of p can be read off as the base rate. See the text for how the mental models are created. The model is then interrogated for the probability that it is raining on the assumption that the sprinklers are on (b). Interrogations on variables are shown as shaded nodes; the probabilities of variables consequent on an interrogation on other variables are shown using nodes with a double border, with the weight of the border proportional to the probability. The priors are set to the normal situation (i.e., it is not normally raining and the sprinklers are not normally on). Consequently, the priors are low and the streets are probably not wet. The target node is always p.

In Figure 19.3 a, the mental model reflects the normal situation, where the probabilities of rain or the sprinklers being on, and consequently the pavements being wet, are all at their presumably quite low base rates. Figure 19.3 b shows the state of the causal model once it has been interrogated on the assumption that the sprinklers are on (i.e., r = True). Here Pr(p) does not change. The mental model in Figure 19.3 b retains the model from the previous interrogations. Unless such cumulative records of what happens at each step in (p. 341) the interrogation process are retained in working memory, the results will not be available to make comparisons.

The situation in Ali et al.'s (2011) Experiment 2 is probably more like the real world. When reading, for example, we build underlying causal models of described situations. Depending on the author's intentions, we will receive snippets of information about what happened. To fully understand what went on will require storing away the potential consequences of each snippet of information to see if they tell a coherent story. The need for a cumulative record of our interrogations of a causal model may also have consequences for the errors that people make.

Errors

The need to consider alternative scenarios has long been identified as a source of error in reasoning (e.g., Evans, 2007; Johnson-Laird, 1983). This is the reason for the long-standing claim in mental models theory that people perform better on reasoning tasks that only require them to construct a single model. It also underpins Evans's (2007) singularity principle: the principle that people only consider one possibility at a time. Error may also arise in our current account when people consider more than one possibility. For example, in the Byrne paradigm, we argued that people will tend to evaluate the four inferences assuming r is present, whether r is interpreted as an alternative cause or a disabler. This in itself is an error, given that only p has been asserted categorically. A further possible source of error might arise from considering two simulations. So, for MP, people may evaluate both Pr(q|p, r) and Pr(q|p, ¬r). But to combine these estimates to get a proper estimate of Pr(q|p) requires accurate estimates of the priors. This would require looking at the model without interrogating it to read off these estimates (as in Figure 19.3). Without this further information, errors may result from not combining Pr(q|p, r) and Pr(q|p, ¬r) in accordance with probability theory by marginalizing out r (Pr(q|p) = Pr(q|p, r)Pr(r) + Pr(q|p, ¬r)Pr(¬r)). This kind of potential error arises from taking r into account, even when only p is categorically asserted as a premise. If participants just interrogated the model using p, then this potential error would not arise.

Another potential source of error may arise from the notation used to store the possibilities investigated by interrogating a causal model. As we mentioned earlier, violations of independence were observed in Ali et al.'s Experiment 2, but not in their Experiment 1. That is, in Experiment 2, participants seemed to regard rain and the sprinklers being on as negatively correlated.

How is this difference to be explained as a function of the simple change in response format? As we suggested earlier, the response format in Experiment 2 imposes a memory load not present using the format in Experiment 1. It is well documented that relatively minor increases in memory load can disrupt reasoning (e.g., Knauff, 2013). In Experiment 1, there is no need to retain the annotations recording the probability that it rained (Pr(p)), as these are requested immediately after each interrogation.9 In Experiment 2, extra operations are required to record the nature of each interrogation and its result so that these can be compared. In Experiment 2, the memory load leads people to focus on their mental-model-like cumulative records. Mental models are assumed to be constructed as the interpretations of logical terms. A mental model depicts the possibilities allowed by a logical term, which correspond to the true truth-table cases. For example, for inclusive disjunction, p or q, the following mental model is constructed: p q, ¬p q, p ¬q. In the current context, the process is reversed: a set of mental models is created as a consequence of interrogating a causal model, and people propose the best verbal description of these possibilities. Such a description would be required for someone to communicate the results of their deliberations to others.

This process may provide an explanation of Ali et al.'s (2011) data. We showed in Figure 19.3 the series of interrogations required for the case where it is not known whether the pavements are wet. The final piece of information people are given is that the sprinklers are on. But they could just as well have been told that it is raining (i.e., this was an arbitrary choice). Consequently, a final model, p ¬r q, could be added. Together with this model, the set of models in Figure 19.3 provides the mental model for the linguistic statement if p or r then q, but with "or" ambiguous between the inclusive and the exclusive (XOR) reading. That is, a fourth possibility could (p. 342) be p r q (inclusive) or p r ¬q (exclusive). The XOR interpretation would lead participants to assume that if one cause is present, the other must be absent. Interpreted this way, without regard to the causal interpretation, participants would seemingly violate independence (i.e., indicate that since it is known the sprinklers are on, it cannot be raining, even if they do not know whether the pavements are wet).

It could be argued that this interpretation follows from the nature of the integration rule in the causal model. However, by hypothesis, this information is not consciously available and has to be recovered by running simulations. This is rather like coming across a HUGIN model compiled by someone else and trying to discover the integration rule by trial and error.

Alternative Explanations

The account in the last section may provide a better explanation than alternatives for the difference between Ali et al.'s (2011) Experiments 1 and 2. The observation of independence violations in the condition where it is not known whether the pavements are wet seems to constitute a violation of the Markov condition (Rehder, 2014). Such violations have been observed before (Mayrhofer & Waldmann, 2015; Rehder & Burnett, 2005; Walsh & Sloman, 2008). Potential explanations include the idea that underlying the pattern of relations between observable variables is a hidden causal mechanism connecting all variables (Rehder & Burnett, 2005). This hidden mechanism can be represented as a further node in the causal model connected generatively or preventively to all the other observable variables. These connections could explain the relations between observable variables that otherwise should be independent. Indeed, there are a variety of related attempts to explain away apparent violations of the Markov condition (see Rehder, 2014). However, a common factor in these explanations is that these violations are explained by additional causally relevant factors from world knowledge being incorporated in the causal model. This is exactly the explanatory strategy used in the psychology of reasoning to explain the implicit suppression effects in Cummins et al.'s (1991) experimental paradigm and a variety of other results, whether modeled in causal models or not. However, if these additional causal factors are embedded in the causal model, then both Experiments 1 and 2 of Ali et al. (2011) should have been equally affected. So, the second interrogation of the model in Experiment 1, even when it is not known that the streets are wet, should lead to a lower value of the probability that it is raining given that you now know that the sprinklers are on. But the violation of independence for this condition was only significant in Experiment 2.

Another possibility is that observable variables that are represented as independent in the model are actually related in the real world, albeit not causally. It has been shown that a significant group of participants respond associatively in causal reasoning tasks, ignoring most causal structure (Rehder, 2014). For example, although the sprinklers could be on while it is raining, typically they are only turned on when it is not raining, leading to an inverse correlation. However, for instrumental causes, such inverse correlations may not arise. For example, one would usually only try jump starting a car once the key had been turned and the car had not started (i.e., key turning and jump starting generally co-occur in the same time frame). A question that also arises is at what point in the reasoning process this world knowledge has its effects. We have assumed so far that world knowledge affects the construction of the causal model. If non-causal correlational knowledge is included in people's actual mental representations at this stage, then the same objection as applied to the causal mechanism proposal applies: both experiments should have been equally affected.

Page 25 of 33

(p. 343)

observed

Causal Models and Conditional Reasoning

Conclusions We have argued that some of the deep intuitions about the meaning of conditional state­ ments, that people believe that they relate to probabilistic law-like dependencies, are cap­ tured by causal Bayes nets. Consequently, treating this formalism, as in causal models, as an account of the mental representations underlying conditional reasoning may be a pro­ ductive conjecture. This conjecture was borne out in a variety of results. First, we showed that causal considerations can provide a unified account of abstract and causal condition­ al reasoning. Second, we showed that Fernbach and Erb’s (2013) recent model can be ex­ tended to the explicit causal conditional reasoning paradigm (Byrne, 1989), making some novel predictions on the way. Third, we showed that, when embedded in the broader cog­ nitive system involved in reasoning, causal model theory can provide a novel explanation for apparent violations of the Markov condition in causal conditional reasoning (Ali et al., 2011). Moreover, we related this account to alternative explanations in the literature on causal reasoning (see Rehder, 2014). The latter extension to consider the role that causal models play in the broader cognitive system has several advantages. First, it suggests an integrative role for causal models and mental models. Second, it seems more consistent with Rehder’s (2014) recent finding that apparently associative responses in causal discounting take time and cognitive ef­ fort. Third, it suggests that some errors in human reasoning may arise because of our need to summarize the results of our mental processes in language in order to communi­ cate them to others. Consequently, rather like decision-making (see, e.g., Monteiro, Vas­ concelos, & Kacelnik, 2013), animals may show normative performance in causal dis­ counting, while humans occasionally do not. Fourth, it suggests a novel source of error in human reasoning emanating from System 2 rather than System 1 (for more discussion, see Oaksford & Hall, 2016). There have been arguments put forward, however, that performance with conditionals, if cause (c) then effect (e), does not map well onto performance with causal statements, c causes e (Rips, 2011; Sloman & Lagnado, 2005). We have discussed the problems identi­ fied by Sloman and Lagnado (2005) before (Ali et al., 2011). It would be beyond the scope of this chapter to attempt a detailed response to Rips’s (2011) objections, but we feel that the differences might be more related to linguistic and pragmatic differences about when it would be appropriate to use each formulation, and we do not view these objections as a fundamental threat to the underlying conjecture that conditionals relate to probabilistic law-like dependencies. An extension of this account will also be required to account for deontic or more general­ ly utility conditionals (Bonnefon, 2009), for example, the conditional underlying the slip­ pery slope argument if we legalize cannabis, then we will have to legalize heroin (Corner, Hahn, & Oaksford, 2011). The argument suggests that the antecedent action should not be taken because of the dire consequences of the consequent action to which it will inex­ orably lead. The conditional clearly suggests there is some dependency between these two actions. What is required to further understand this inference is the incorporation of Page 26 of 33

Causal Models and Conditional Reasoning utilities, which might be achieved by generalizing Bayes nets to influence diagrams incor­ porating value nodes (Howard & Matheson, 1981; Schacter, 1986). In conclusion, while further work is clearly required, as we and others have argued (Chater & Oaksford, 2006; Fernbach & Erb, 2013; Oaksford & Chater, 2010a,c, 2013; Rottman & Hastie, 2014; Sloman, Barbey, & Hotaling, 2009), the conjecture that condi­ tional reasoning is underpinned by representations and processes similar to causal mod­ els would seem to be a productive line of research.

References Adams, E. W. (1998). A primer of probability logic. Stanford, CA: CLSI Publications. Ali, N., Chater, N., & Oaksford, M. (2011). The mental representation of causal condition­ al inference: Causal models or mental models. Cognition, 119, 403–418. Ali, N., Schlottmann, A., Shaw, C., Chater, N., & Oaksford, M., (2010). Conditionals and causal discounting in children. In M. Oaksford & N. Chater (Eds.), Cognition and condi­ tionals: Probability and logic in human thinking (pp. 117–134). Oxford: Oxford University Press. Bonnefon, J. F. (2009). A theory of utility conditionals: Paralogical reasoning from decision theoretic leakage. Psychological Review, 116, 888–907. Byrne, R. M. (1989). Suppressing valid inferences with conditionals. Cognition, 31, 61–83. doi:10.1016/0010-0277(89)90018-8 Byrne, R. M. J., Espino, O., & Santamaria, C. (1999). Counterexamples and the suppres­ sion of inferences. Journal of Memory & Language, 40, 347–373. Cartwright, N. (1999). The dappled world: A study of the boundaries of science, Cam­ bridge: Cambridge University Press. Cartwright, N. (2001). What is wrong with Bayes nets? The Monist, 84, 242–264. Chater, N., & Oaksford, M. (2006). Mental mechanisms. In K. Fiedler, & P. Juslin (Eds.), Information sampling and adaptive cognition (pp. 210–236). Cambridge: Cambridge Uni­ versity Press. Cheng, P. W. (1997). From covariation to causation: A causal power theory. Psychological Review, 104, 367–405. Chisolm, R. (1946). The contrary-to-fact conditional. Mind, 55, 289–307. Corner, A., Hahn. U., & Oaksford, M. (2011). The psychological mechanism of the slippery slope argument. Journal of Memory and Language, 64, 133–152. Cruz, N., Baratgin, J., Oaksford, M., & Over, D. E. (2015). Bayesian reasoning with ifs and ands and ors. Frontiers in Psychology, 6, 192. doi:10.3389/fpsyg.2015.00192 Page 27 of 33

Causal Models and Conditional Reasoning Cummins, D. D. (1995). Naïve theories and causal deduction. Memory & Cognition, 23, 646–658. Cummins, D. D. (2014). The impact of disablers on predictive inference. Journal of Experi­ mental Psychology: Learning, Memory, and Cognition, 40, 1638–1655. doi:10.1037/ xlm0000024 Cummins, D. D., Lubart, T., Alksnis, O., & Rist, R. (1991). Conditional reasoning and cau­ sation. Memory & Cognition, 19, 274–282. Dagum, P., & Luby, M. (1993). Approximating probabilistic inference in Bayesian belief networks is NP-hard. Artificial Intelligence, 60, 141–153. Dieussaert, K., De Neys, W., & Schaeken, W. (2005). Suppression and belief revision, two sides of the same coin? Psychologica Belgica, 45, 29–46. doi:10.5334/pb-45-1-29 Edgington, D. (1995). On conditionals. Mind, 104, 235–329. Elqayam, S., & Over, D. E. (2013). New paradigm psychology of reasoning: An introduc­ tion to the special issue edited by Elqayam, Bonnefon, and Over. Thinking & Reasoning, 19, 249–265. doi:10.1080/13546783.2013.841591 Evans, J. St. B. T. (2007). Hypothetical thinking. Hove, East Sussex: Psychology Press. Evans, J. St. B. T., Handley, S.H., Over, D.E. (2003). Conditionals and conditional probabili­ ty. Journal of Experimental Psychology: Learning, Memory, & Cognition, 29, 321–355. Fernbach, P. M., Darlow, A., & Sloman, S. A. (2010). Neglect of alternative causes in pre­ dictive but not diagnostic reasoning. Psychological Science, 21, 329–336. doi: 10.1177/0956797610361430. Fernbach, P. M., Darlow, A., & Sloman, S. A. (2011). Asymmetries in predictive and diag­ nostic reasoning. Journal of Experimental Psychology: General, 140, 168–185. doi: 10.1037/a0022100. Fernbach, P. M., & Erb, C. D. (2013). A quantitative causal model theory of conditional reasoning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 39, 1327–1343. doi:10.1037/a0031851 Fugard, J. B., Pfeifer, N., Mayerhofer, B., & Kleiter, G. (2011). How people interpret condi­ tionals: Shifts toward the conditional event. Journal of Experimental Psychology: Learn­ ing, Memory, & Cognition, 37, 635–648. Geiger, S. M., & Oberauer, K. (2007). Reasoning with conditionals: Does every counterex­ ample count? It’s frequency that counts. Memory & Cognition, 35, 2060–2074. Goodman, N. (1947). The problem of counterfactual conditionals. Journal of Philosophy, 44, 113–128. Page 28 of 33

Causal Models and Conditional Reasoning Goodman, N. (1955). Fact, fiction, and forecast. Cambridge, MA: Harvard University Press. Griffiths, T. L., & Tenenbaum, J. B. (2005). Structure and strength in causal induction. Cognitive Psychology, 51, 354–384. Howard, R. A., & Matheson, J. E. (1981). Influence diagrams. In R. A. Howard and J. E. Matheson (Eds.), Readings on the principles and applications of decision analysis (Vol. II, pp. 719–762). Menlo Park, CA: Strategic Decisions Group. Jeffrey, R. (1992). Probability and the art of judgment. Cambridge: Cambridge University Press. Johnson-Laird, P. N. (1983). Mental models. Cambridge: Cambridge University Press. Johnson-Laird, P. N., & Byrne, R. J. (2002). Conditionals: A theory of meaning, pragmatics, and inference. Psychological Review, 109, 646–678. doi:10.1037/0033-295X.109.4.646 Kahneman, D. (2011). Thinking, fast and slow. London: Penguin Books. Knauff, M. (2013). Space to reason: A spatial theory of human thought. Cambridge, MA: MIT Press. Krzyżanowska, K. K., Wenmackers, S. S., & Douven, I. I. (2013). Inferential conditionals and evidentiality. Journal of Logic, Language and Information, 22, 315–334. doi:10.1007/ s10849-013-9178-4 (p. 345)

Lewis, D. (1973). Counterfactuals. Cambridge, MA: Harvard University Press

Mayrhofer, R., & Waldmann, M. R. (2015). Agents and causes: Dispositional intuitions as a guide to causal structure. Cognitive Science, 39, 65–95. Monteiro, T., Vasconcelos, M., Kacelnik, A. (2013). Starlings uphold principles of econom­ ic rationality for delay and probability of reward. Proceedings of the Royal Society B, 280, 20122386 doi:10.1098/rspb.2012.2386 Novick, L. R., & Cheng, P. W. (2004). Assessing interactive causal influence. Psychological Review, 111, 455–485. doi:10.1037/0033-295X.111.2.455 Nute, D. (1984). Conditional logic. In D. Gabbay, and F. Guenthner (Eds.), Handbook of philosophical logic (Vol. 11, pp. 387–439). Dordrecht: D. Reidel. Oaksford, M. & Chater, N. (1994). A rational analysis of the selection task as optimal data selection. Psychological Review, 101, 608–631. Oaksford, M., & Chater, N. (2003a). Conditional probability and the cognitive science of conditional reasoning. Mind & Language, 18, 359–379.

Page 29 of 33

Causal Models and Conditional Reasoning Oaksford, M., & Chater, N. (2003b). Computational levels and conditional reasoning: Re­ ply to Schroyens and Schaeken (2003). Journal of Experimental Psychology: Learning, Memory & Cognition, 29, 150–156. Oaksford, M. & Chater, N. (2003c). Probabilities and pragmatics in conditional inference: Suppression and order effects. In D. Hardman & L. Macchi (Eds.) Thinking: Psychological perspectives on reasoning, judgement and decision making. Chichester, UK: John Wiley & Sons. Oaksford, M., & Chater, N. (2007). Bayesian rationality. Oxford: Oxford University Press. Oaksford, M., & Chater, N. (2008). Probability logic and the Modus Ponens–Modus Tollens asymmetry in conditional inference. In N. Chater, & M. Oaksford (Eds.), The probabilistic mind: Prospects for Bayesian cognitive science (pp. 97–120). Oxford: Oxford University Press. Oaksford, M., & Chater, N. (2009). Precis of “Bayesian rationality: The probabilistic ap­ proach to human reasoning.” Behavioral and Brain Sciences, 32, 69–121. Oaksford, M., & Chater, N. (2010a). Causation and conditionals in the cognitive science of human reasoning. Open Psychology Journal, 3, 105–118. Oaksford, M., & Chater, N. (2010b). Conditionals and constraint satisfaction: Reconciling mental models and the probabilistic approach? In M. Oaksford & N. Chater (Eds.), Cogni­ tion and conditionals: Probability and logic in human thinking (pp. 309–334). Oxford: Ox­ ford University Press. Oaksford, M., & Chater, N. (2010c). Cognition and conditionals: An introduction. In M. Oaksford & N. Chater (Eds.), Cognition and conditionals: Probability and logic in human thinking (pp. 3–36). Oxford: Oxford University Press. Oaksford, M., & Chater, N. (2013). Dynamic inference and everyday conditional reasoning in the new paradigm. Thinking and Reasoning, 19, 346–379. doi: 10.1080/13546783.2013.808163 Oaksford, M., Chater, N., & Larkin, J. (2000). Probabilities and polarity biases in condi­ tional inference. Journal of Experimental Psychology: Learning, Memory & Cognition, 26, 883–899. Oaksford, M., & Hall, S. (2016). On the source of human irrationality. Trends in Cognitive Sciences, 20, 336–344. doi:10.1016/j.tics.2016.03.002 Oberauer, K., & Wilhelm, O. (2003). The meaning(s) of conditionals: Conditional probabili­ ties, mental models, and personal utilities. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 680–693. doi:10.1037/0278-7393.29.4.680

Page 30 of 33

Causal Models and Conditional Reasoning Over, D. E., Hadjichristidis, C., Evans, J. St. B. T., Handley, S. J., & Sloman, S. A. (2007). The probability of causal conditionals. Cognitive Psychology, 54, 62–97. doi:10.1016/ j.cogpsych.2006.05.002 Pearl, J. (1988). Probabilistic reasoning in intelligent systems. San Mateo: Morgan Kauf­ mann. Pearl, J. (2000). Causality: Models, reasoning and inference. Cambridge: Cambridge Uni­ versity Press. Ramsey, F. P. (1931). The foundations of mathematics and other logical essays. London: Routledge and Kegan Paul. Rehder, B. (2014). Independence and dependence in human causal reasoning. Cognitive Psychology, 72, 54–107. doi:10.1016/j.cogpsych.2014.02.002 Rehder, B. (2015). The role of functional form in causal-based categorisation. Journal of Experimental Psychology: Learning, Memory and Cognition, 41, 670–692. doi:10.1037/ xlm0000048 Rehder, B., & Burnett, R. C. (2005). Feature inference and the causal structure of object categories. Cognitive Psychology, 50, 264–314. doi:10.1016/j.cogpsych.2004.09.002 Rips, L. J. (1994). The psychology of proof: Deductive reasoning in human thinking. Cam­ bridge, MA: MIT Press. Rips, L. J. (2011). Lines of thought. Oxford: Oxford University Press. Rottman, B., & Hastie, R. (2014). Reasoning about causal relationships: Inferences on causal networks. Psychological Bulletin, 140, 109–139. doi:10.1037/a0031903 Shachter, R. D. (1986). Evaluating influence diagrams. Operations Research, 34, 871–882. doi:10.1287/opre.34.6.871 Sloman, S. A. (2005). Causal Models: How people think about the world and its alterna­ tives. New York: Oxford University Press. Sloman, S., Barbey, A., & Hotaling, J. (2009). A causal model theory of the meaning of cause, enable, and prevent. Cognitive Science, 33, 21–50. Sloman, S., & Lagnado, D. (2005). Do we “do?” Cognitive Science, 29, 5–39. Sobel, J. H. (2004). Probable modus ponens and modus tollens and updating on uncertain evidence. Unpublished manuscript, Department of Philosophy, University of Toronto, Scarborough. (www.scar.toronto.ca/~sobel/ConfDisconf.pdf). Sober, E. (2002). Intelligent design and probability reasoning. International Journal for Philosophy of Religion, 52, 65–80.

Page 31 of 33

Causal Models and Conditional Reasoning Stalnaker, R.C. (1968). A theory of conditionals. In N. Rescher (Eds.), Studies in logical theory (pp. 98–112). American Philosophical Quarterly Monograph Series, 2. Oxford: Blackwell. Stalnaker, R. C. (1984). Inquiry. Cambridge, MA: MIT Press. Stanovich, K. E. (2011). Rationality and the reflective mind. Oxford: Oxford University Press. Stenning, K., & van Lambalgen, M. (2005). Semantic interpretation as computation in nonmonotonic logic: The real meaning of the suppression task. Cognitive Science, 29, 919–960. doi:10.1207/s15516709cog0000_36 Suppe, F. (1977). The structure of scientific theories. Chicago: University of Illinois Press. Waldmann, M. R. (2007). Combining versus analyzing multiple causes: How domain as­ sumptions and task context affect integration rules. Cognitive Science, 31, 233–256. (p. 346)

Waldmann, M. R., & Holyoak, K. J. (1992). Predictive and diagnostic learning with­

in causal models: Asymmetries in cue competition. Journal of Experimental Psychology: General, 121, 222–236. doi:10.1037/0096-3445.121.2.222 Waldmann, M. R., & Martignon, L. (1998). A Bayesian network model of causal learning. In M. A. Gernsbacher & S. J. Derry (Eds.), Proceedings of the 20th annual conference of the Cognitive Science Society (pp. 1102–1107). Mahwah, NJ: Lawrence Erlbaum Asso­ ciates. Walsh, C. R., & Sloman, S. A. (2008). Updating beliefs with causal models: Violations of screening off. In G. H. Bower, M. A. Gluck, J. R. Anderson, & S. M. Kosslyn (Eds.), Memory and mind: A festschrift for Gordon Bower (pp. 345–358). Hove, East Sussex: Psychology Press.

Notes:

(1.) This is in spite of our initial skepticism that Bayes nets could be psychologically relevant to conditional reasoning due to computational tractability issues (Dagum & Luby, 1993; Oaksford & Chater, 1994).

(2.) As we said, whether the inference is backed up by a real dependency in the world, or the belief in a dependency is a mere projection of our habits of inference onto the world, as Hume argued, need not concern us from a psychological perspective. However, to the extent that our habits of inference lead to successful behavior in the real world, our confidence in a realist interpretation of these dependencies will no doubt increase.

(3.) For example, non-interference conditionals: "If it snows in July, the government will fall"; biscuit conditionals: "If you are hungry, there are biscuits on the table" (although here there is a dependency between the antecedent and the "relevance" of the consequent); and Dutchman conditionals: "If John passes the exam, I'm a Dutchman."

(4.) This assumption actually turns out to explain an interesting feature of earlier model fits (Oaksford & Chater, 2007) in which very good fits were obtained on the assumption that Pr0(p) ≈ .5. It turns out that if the revised value of Pr0(q|p) is the same assuming either a p, ¬q counterexample or a ¬p, q counterexample, then Pr0(p) = .5.

(5.) This is the only study we could find directly comparing these paradigms, although the comparison was only made for the MP inference.

(6.) This difference between paradigms also marks the difference between the old assumption-based framework (Byrne), i.e., assume the premises are true and assess whether the inference is valid, vs. the new belief-based paradigm (Cummins) (see Over, Chapter 18 in this volume).

(7.) Novick and Cheng (2004) also discuss at length how such a model can be extended to cases where causes, generative or preventive, have interactive causal effects.

(8.) Augmentation also occurs for a common cause structure, i.e., (3) and (3'), when it is not known whether the cause occurred.

(9.) But, as we suggested earlier, the normal situation is probably more like Experiment 2, where a cumulative record of previous interrogations is retained in order to interpret upcoming information, like constructing a discourse model in sentence comprehension.

Mike Oaksford
Birkbeck College, University of London, London, England, UK

Nick Chater
Behavioural Sciences Group, Warwick Business School, Warwick University, Coventry, England, UK


Concepts as Causal Models: Categorization

Concepts as Causal Models: Categorization   Bob Rehder The Oxford Handbook of Causal Reasoning Edited by Michael R. Waldmann Print Publication Date: Jun 2017 Subject: Psychology, Cognitive Psychology Online Publication Date: May 2017 DOI: 10.1093/oxfordhb/9780199399550.013.39

Abstract and Keywords

This chapter evaluates the case for treating concepts as causal models, the view that people conceive of a category as consisting of not only features but also the causal relations that link those features. In particular, it reviews the role of causal models in categorization, the process of inferring an object's category membership by observing its features. The studies reviewed include those testing categories that are either real-world or artificial (made up by the experimenters) and subjects who are either adults or children. The chapter concludes that causal models provide accounts of causal-based categorization judgments that are superior to those of alternative accounts.

Keywords: causal model, categorization, real world, testing, category

As long as there have been theorists interested in thinking, their thoughts have turned to the nature of concepts. Concepts—mental representations of categories of objects in the world—make up much of the contents of our thoughts. If you’re not thinking one now, then wait a second and one will come along. And just as long as there has been empirical psychology, experiments have been conducted to uncover how concepts are learned, rep­ resented, and reasoned with. The history of this venerable line of research is well re­ hearsed (Komatsu, 1992; Murphy, 2002; Smith & Medin, 1981). Many early experiments tested novel perceptual categories—categories consisting of collections of dot patterns, geometric shapes, or schematic faces. This practice, which arose in part because such materials provide experimental control—investigators could teach brand new categories that were isolated from anything the subject already knew—led to a panoply of sophisti­ cated learning models based on rules, prototypes, or exemplars. But by the 1980s a num­ ber of theorists observed that this isolation meant that such materials lacked a key prop­ erty of many real-world categories, namely, that features are related to other features, and concepts are related to other concepts, in intricate ways (e.g., Carey, 1985; Keil, 1989, 1995; Murphy, 1993; Murphy & Medin, 1985; Wellman & Gelman, 1992). This in­ sight was accompanied by demonstrations of how those relations have profound effects on virtually every judgment involving categories.


The "theory-based" view thus joined the other views of the mental representation of concepts. Yet, for many years it remained the club's poor man because it lacked something the others had: namely, a plausible formal computational model from which precise predictions could be derived and tested. This has now changed with the advent of proposals regarding how intricate networks of interrelated conceptual knowledge can be represented and processed computationally (e.g., Kemp & Tenenbaum, 2009; Rehder, 2003a; Rehder & Murphy, 2003; Rogers & McClelland, 2004, 2011). One such proposal, and the focus of this chapter, is based on adapting the well-known formalism of Bayesian networks or causal graphical models (Glymour, 2001; Pearl, 1988, 2000; Rottman, Chapter 6 in this volume; Sloman, 2005; Spirtes, Glymour, & Scheines, 2000) to serve as a theory of concepts, in particular, to model the causal relations that permeate our conceptual knowledge. For example, on this view, people conceive of a category as consisting of not only a set of features (or a set of exemplars, each with a set of features), but also the causal relations that link those features. Note that there is no claim that this framework captures all conceptual knowledge, as not all knowledge is causal (and not all knowledge can be represented as a graphical model). Yet, findings such as those of Ahn, Marsh, Luhmann and Lee (2002), who reported that 52% of inter-feature relations in real-world categories are thought by people to be causal, suggest that causal models can go a long way toward capturing the empirical regularities that arise from conceptual relations.

This chapter and a companion ("Concepts as Causal Models: Induction," Chapter 21 in this volume) evaluate the case for treating concepts as causal models (also see Danks, 2014). Two central topics in research on concepts are reviewed. This chapter reviews categorization, the process of inferring an object's category membership given its features. The companion chapter reviews category-based induction, the process of inferring features given an object's category membership. Category-based induction, in turn, consists of drawing inferences about both objects and categories; in the former case, one infers the features of a particular object, whereas in the latter, one generalizes a feature to a category (and thus most or many of its members). How causal knowledge influences how categories are formed in the first place (causal-based category discovery) is also examined. Existing causal model accounts of these phenomena have names; in my own work I refer to a model of the effects of causal knowledge on categorization as the generative model and to one of the effects on induction as category-based generalization. But because they presume the same basic representations and inferential processes, these chapters present these phenomena as aspects of the same underlying representation.

These chapters are related to a number of others in this volume. Predicting features will be equated to deriving conditional probabilities on the basis of causal information, and so this topic connects to chapters that discuss causal reasoning more generally (e.g., Meder & Mayrhofer, Chapter 23; Oaksford & Chater, Chapter 19). Analogies are an important type of inductive inference considered in Holyoak and Lee (Chapter 24).
Learning the causal structure of categories will partially depend on inducing causal relations on the basis of observed data, a topic considered here by numerous authors (e.g., Cheng & Lu, Chapter 5; Griffiths, Chapter 7; Perales, Catena, Cándido, & Maldonado, Chapter 3; Rottman, Chapter 6).

Before launching into empirical findings regarding categories as causal models, I start by considering the relationship between causal models and the categories they aim to represent, in order to paint a complete picture of how concepts might be causal models.

Concepts as Causal Models

As contributions to this collection show, "causal model" typically refers to a representation that includes variables and the causal relationships between them. For example, a category's causally related features might be represented by the causal model shown in Figure 20.1 a. It is worth reviewing some of the key properties of such graphs. First, they may be incomplete, because variables are allowed to have external causes and effects not shown in the graph (so long as the external causes aren't correlated; more about this later). This is an advantageous property because it is safe to say that there are few completely self-contained causal systems. Second, in their most basic form causal models do not represent temporal information (e.g., regarding the time course of how causes yield their effects) or causal cycles (variables that mutually influence one another). Third, one may, but need not, represent many of the details of the causal relations themselves, such as whether they are generative or inhibitory, their strength, or, when a variable has multiple incoming links, how those causal influences combine. Without such information, the graphs alone make non-trivial predictions regarding patterns of conditional independence among variables, patterns that are sensitive to the direction of causality. But when a theorist is also willing to flesh out the graph with additional assumptions (e.g., that links are generative, that they integrate according to a "noisy-OR" function, that they have a certain strength), quantitative predictions are easily derived. The conclusion of the companion chapter on induction will briefly touch on dynamic versions of these networks that represent cycles and temporal information, as well as on how one might represent uncertainty about the components of a model (e.g., the strength of a causal relation).
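To fix ideas, here is one minimal way such a model could be represented in code (an illustrative Python sketch, not a formalism from the chapter): the features, each feature's parents, and optional slots for the strength and background parameters that become relevant once quantitative predictions are wanted. The field names are my own.

```python
from dataclasses import dataclass, field

@dataclass
class CategoryCausalModel:
    """A category's features plus the directed causal links among them."""
    features: list                      # e.g., ["X", "Y1", "Y2", "Y3"]
    parents: dict                       # feature -> list of its causes
    strengths: dict = field(default_factory=dict)    # (cause, effect) -> m
    backgrounds: dict = field(default_factory=dict)  # feature -> b

# The common cause network used as the running example below, qualitatively:
myastar = CategoryCausalModel(
    features=["X", "Y1", "Y2", "Y3"],
    parents={"X": [], "Y1": ["X"], "Y2": ["X"], "Y3": ["X"]},
)
```

Leaving the strengths and backgrounds empty corresponds to the bare graph, which already fixes the qualitative conditional-independence predictions; filling them in is what licenses the quantitative predictions discussed next.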


Figure 20.1 (a) A schematic representation of a category's features. (b) Those features delineated into those inside the category's causal model (intrinsic features) and those not (inferential features).

Whereas the preceding points apply to any application of causal models, an additional issue that arises when applying them to concepts concerns which variables should be treated as "features" of the concept. Beavers, for example, have some uncontroversial features: they are nocturnal, brown, and semi-aquatic. Yet, if we start drawing a causal network of their features, we would likely include something that beavers cause, namely dams. Focusing on dams, we might remember that beaver dams often cause flooding. And that flooding often causes (in populated areas) property damage, which causes an increase in flood insurance, and so forth. But while on the list of features of beavers most would include "brown" and many might include "build dams," none would include increased flood insurance. We could follow beavers' features backward in time as well, first to their DNA and then to the long evolutionary history that led to the features' presence. Analogous examples arise with artifacts (e.g., computers have distal downstream effects, such as the health problems that arise when they are deposited into African dumps, and distal upstream causes, such as the designers' desire to express creativity or make money).

Clearly then, causal models of concepts must be delineated, because if they're not, the persistent forward and backward tracing of causal relations will result in their swallowing everything, so that we're left with one causal model of everything (Oppenheimer, Tenenbaum, & Krynski, 2013; Rehder, 2007, 2010). If causal models are to serve as useful stand-ins for concepts, there must at least be a one-to-one relation between the two, which entails that not everything that is causally related to a concept's features is a feature of the concept. Figure 20.1 b, for example, delineates the features of the example concept into those that are, in a sense, "part" of the concept and those that are not. Delineation does more than just enforce a one-to-one relationship between models and concepts. Although a causal model can be used to compute the degree of evidence that a feature provides for category membership using the standard inferential processes that accompany causal models, which process is used depends on the type of feature. The claim will be that features that are part of the causal model (i.e., those that are inside the box in Figure 20.1 b) provide direct evidence for category membership. They will be referred to as intrinsic features for this reason. The rest provide indirect evidence because they are used to infer the intrinsic ones; these will be referred to as inferential features. The consequence of this distinction is that different phenomena are implied for different features. For example, it will be shown that intrinsic but not inferential features exhibit something called coherence effects, and that inferential but not intrinsic features exhibit explaining away and screening off (both are defined later). Thus, where the boundary in Figure 20.1 b is drawn has important behavioral consequences.

Causal-Based Categorization

This chapter now reviews the ways that causal knowledge has been shown to influence judgments of category membership. Other reviews of causal-based categorization exist (Ahn & Kim, 2001; Oppenheimer et al., 2013; Rehder, 2010); this section builds and expands on that work. As usual, a categorization judgment is modeled as some function from potential sources of evidence (an object's observed features) onto category labels. But, as mentioned, the features in Figure 20.1 b provide evidence of category membership in different ways. The following section presents studies supporting the view that such evidence is computed from intrinsic features by estimating the likelihood that they were generated by the category's causal model. The subsequent section then presents those suggesting that evidence is derived from inferential features via some of the standard causal reasoning procedures that accompany causal models.

Part 1: Categorization with Intrinsic Features

For historical reasons, the presentation of how causal relations among intrinsic features affect classification is divided into two subtopics. This structure is dictated by the type of questions traditionally asked in empirical studies. One subtopic concerns the importance of features considered individually. Ever since Rosch and Mervis (1975), a central question in research on concepts has been how and why features vary in the amount of evidence they provide for category membership (also see Hampton, 1979; Reed, 1972). Rosch and Mervis emphasized the statistical information inherent in the environment regarding how features and categories covary. One factor that increases feature f's importance or weight is its category validity, the probability of f in members of category k, hereafter referred to as pk(f). Another is its cue validity, the extent to which the feature helps determine which category (of a set of possible categories) an object belongs to (Hampton, 1995; Sloman, Love, & Ahn, 1998). Numerous other factors have since been identified, including features' salience (Lamberts, 1995, 1998; Sloman et al., 1998), the extent to which they support further inferences (Ross, 1997, 1999, 2000), and whether they are functional, that is, whether they serve some goal or purpose associated with the category (e.g., a tiger has claws in order to catch prey; Chaigneau, Barsalou, & Sloman, 2004; Kemler-Nelson et al., 1995; Lin & Murphy, 1997; Lombrozo & Rehder, 2012; Murphy & Wisniewski, 1989; Puebla & Chaigneau, 2014; Wisniewski, 1995; cf. Ahn, 1998). The question here is how features' importance varies as a function of their position in the category's network of causally related features.

Another subtopic is how certain combinations of features make for better category members. There is precedent for considering these two different types of effects. For example, Medin and Schaffer (1978) distinguished independent cue models, in which each feature provides an independent source of evidence for category membership (prototype models are an example), from interactive cue models, in which a feature's influence also depends on what other features are present (exemplar models are an example). Of course, whereas these models are traditionally concerned with how feature weights and interactions are learned from category members observed in the past, here the question is how they are affected by inter-feature causal relations. This chapter will show that the predictions derived from causal models make them very much a type of interactive cue model.

It is important to emphasize that the effects of causal knowledge on feature weights and on feature interactions are not mutually exclusive. Instead, feature weights and interactions can be analyzed as two orthogonal effects (corresponding to "main effects" and "interactions" in an analysis of variance, respectively). Theoretically, then, the presence of causal relations can affect just feature weights, just interactions, both, or neither. In fact, most of the studies reviewed here will reveal that they affect both.

Because they have been shown to be often larger (Rehder, 2010), I start with feature interactions. The key finding is the existence of coherence effects: the fact that objects are considered more likely category members (all else being equal) to the extent that they are consistent with (i.e., cohere with) the category's inter-feature causal relations. The section on feature weights that follows will review evidence regarding the importance of individual features as a function of their causal role (e.g., whether they are causes or effects).
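The weight/interaction distinction can be made concrete with a small regression sketch (hypothetical Python, not the analysis code of any study reviewed here): coding the 16 test items of a four-feature design with ±1 contrasts makes the main-effect (weight) and interaction predictors orthogonal, so the two kinds of effects can be estimated separately, in the spirit of the regression analyses described later in this chapter.

```python
import numpy as np
from itertools import product

items = np.array(list(product([-1.0, 1.0], repeat=4)))  # the 16 test items

def design_matrix(items):
    cols = [np.ones(len(items))]                         # intercept
    cols += [items[:, i] for i in range(4)]              # feature weights
    cols += [items[:, i] * items[:, j]                   # pairwise interactions
             for i in range(4) for j in range(i + 1, 4)]
    return np.column_stack(cols)

# "ratings" would be a subject's 16 category membership judgments; a
# placeholder vector is used here purely so the sketch runs end to end.
ratings = np.linspace(10, 90, 16)
betas, *_ = np.linalg.lstsq(design_matrix(items), ratings, rcond=None)
weights, interactions = betas[1:5], betas[5:]
```

Because the full factorial design makes the predictors orthogonal, a causal manipulation can in principle move the weight terms, the interaction terms, or both, which is exactly the decomposition the following sections exploit.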

Interacting Features: The Coherence Effect

This section begins with the coherence effects that have been observed in artificial categories (i.e., categories based on materials made up by the experimenters) and then turns to those observed with natural categories. It closes with some special considerations that arise in the case of artifacts, ones that elaborate coherence effects in a way that further explicates the distinction between intrinsic and inferential features.

Artificial Categories

The coherence effect is defined as an interaction in which the importance of a feature to judgments of category membership depends on the state of other features. It is easy to see that inter-feature causal relations might induce such interactions because they direct classifiers' attention to entire constellations of causally related features. In an initial demonstration, Rehder and Hastie (2001, Experiment 2) instructed undergraduates on one of a number of artificial categories with four binary feature dimensions. One feature on each dimension was described as typical of the category, and the typical features were described as forming a common cause network in which one feature was the cause of the other three. For example, some subjects were informed of a type of star named Myastars and how one feature of Myastars (ionized helium) caused three others (hot temperature, high density, a large number of planets). Myastars were one member of a set of six experimental categories that included biological kinds (species of ants and shrimp), non-living natural kinds (types of stars and molecules), and artifacts (types of cars and computers). A description of the causal mechanism associated with each causal link was also provided (see Table 20.1 for Myastars). A schematic of the causal network formed by the features is depicted in Figure 20.2, in which the cause feature is referred to as X and the effects as Y1, Y2, and Y3. Subjects were then presented with the 16 test items that can be formed on four binary dimensions and were asked to rate, on a 100-point scale, how likely each was a category member.

Table 20.1 One of the Experimental Categories from Rehder and Hastie (2001)

Features: Ionized helium [X]; Very hot [Y1]; High density [Y2]; Large number of planets [Y3]

Causal relationships:

[X → Y1] Ionized helium causes the star to be very hot. Ionized helium participates in nuclear reactions that release more energy than the nuclear reactions of normal hydrogen-based stars, and the star is hotter as a result.

[X → Y2] Ionized helium causes the star to have high density. Ionized helium is stripped of electrons, and helium nuclei without surrounding electrons can be packed together more tightly.

[X → Y3] Ionized helium causes the star to have a large number of planets. Because helium is a heavier element than hydrogen, a star based on helium produces a greater quantity of the heavier elements necessary for planet formation (e.g., carbon, iron) than one based on hydrogen.

Note: The assignment of features and causal relationships to the causal roles shown in Figure 20.2 is given in brackets.

The left chart in Figure 20.2 presents those ratings as a function of the number of effects (i.e., Ys) present in the test item and whether the common cause feature (X) was present or not. For test items in which X was present, ratings varied in the expected way: namely, an item was judged a more likely category member to the extent that it had more features typical of that category (i.e., more Ys). But this pattern changed when X was absent, such that adding Ys led to virtually no increase in ratings; indeed, going from zero to one Y feature present resulted in a decrease. The nature of this interaction is such that some test items were rated lower than others even though they had more typical features (e.g., the item with all three Ys but not X, which possesses three typical features, was rated lower than those with one Y plus X, which possess two typical features). The intuition behind this result is straightforward: adding an effect feature when its cause is absent decreases the item's coherence, resulting in a decrease (or, at best, a very small increase) in category membership ratings. The interaction in Figure 20.2 illustrates the coherence effect. The analogous chart for a control condition that was identical except for the absence of the causal relations contained two parallel lines (i.e., no interaction), confirming that the coherence effect was due to the instructed causal links.

Figure 20.2 Category membership ratings from Rehder and Hastie (2001, Experiment 2). Error bars are standard errors of the mean.

Causal models provide an account of this result by assuming that categorizers’ judgments are a monotonic function of the joint distribution associated with the category’s causal model. I now present how a causal model’s joint distribution is derived (also see Rottman, Chapter 6 in this volume) and then applied to categorization judgments. Recall that the variables of a causal model may have exogenous influences that are not included in the model; however, these influences are constrained to be uncorrelated (ruling out, e.g., all hidden common causes whose values are not constant). This property, referred to as causal sufficiency (Spirtes et al., 2000), not only licenses the well-known causal Markov condition (which specifies the conditions under which variables are conditionally indepen­ dent), but also allows a model’s joint distribution to be factorized as follows,

pk(f1, f2, …, fn) = ∏j pk(fj | pa(fj))  (1)

where f1, f2, …, fn are the features of category k and pa(fj) denotes the parents of fj in k's causal model. For example, for the causal model in Figure 20.2, pa(Y1) = pa(Y2) = pa(Y3) = {X} and pa(X) = ∅. As mentioned, to allow for quantitative predictions, one must make further assumptions regarding the functions that link causes to their effects. Assuming that variables are binary (present, 1, or absent, 0), that the causal links are generative (a cause makes its effects more likely) and independent (each causal link operates autonomously), and that multiple causal influences integrate according to a noisy-OR function (Cheng, 1997), then

pk(fj = 1 | pa(fj)) = 1 − (1 − bj) ∏i∈pa(fj) (1 − mij)^ind(fi)  (2)

where mij is the strength of the causal link between feature j and parent i, bj is the effect of background causes (causal influences exogenous to the model) on feature j, and ind(fi) is an indicator function that yields 1 if feature fi is present and 0 otherwise. Causal sufficiency also entails that root nodes (features with no parents) are independent of one another, and the probability of each is represented with its own parameter, cj.1

For example, for the common cause network in Figure 20.2, the generating function implied by Equation 2 is

pk(Yi = 1 | X) = 1 − (1 − bYi)(1 − mXYi)^ind(X)  (3)

which, when X is present, evaluates to 1 − (1 − mXYi)(1 − bYi), reflecting the probability that Yi is brought about by X (mXYi) or by an alternative cause not shown in the model (bYi). When X is absent, the probability of Yi is just bYi. Combining Equations 1 and 3 means, for instance, that the probability that X and Y3 are present and Y1 and Y2 are absent is

pk(X = 1, Y1 = 0, Y2 = 0, Y3 = 1) = cX (1 − mXY1)(1 − bY1)(1 − mXY2)(1 − bY2)[1 − (1 − mXY3)(1 − bY3)]

Or, the probability that Y1 and Y3 are present and X and Y2 are absent is

pk(X = 0, Y1 = 1, Y2 = 0, Y3 = 1) = (1 − cX) bY1 (1 − bY2) bY3  (4)
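Equations 1–3 are mechanical enough to check in a few lines of code. The following sketch (Python; illustrative rather than taken from the chapter) builds the joint distribution of a common cause network by multiplying the root-node parameter cX into a noisy-OR term for each effect. The parameter values cX = .750, m = .667, and b = .200 are the example values used with the tables in this chapter; with two effects, the sketch reproduces Table 20.2 below.

```python
from itertools import product

def common_cause_joint(n_effects=3, c_x=0.750, m=0.667, b=0.200):
    """Joint distribution (Equation 1) for a network in which one cause X
    generates n_effects independent effects Y1..Yn (Equations 2 and 3)."""
    joint = {}
    for states in product([0, 1], repeat=n_effects + 1):
        x, ys = states[0], states[1:]
        p = c_x if x else 1 - c_x              # root-node parameter cX
        p_y = 1 - (1 - b) * (1 - m) ** x       # Equation 3
        for y in ys:
            p *= p_y if y else 1 - p_y
        joint[states] = p
    return joint

t = common_cause_joint(n_effects=2)
print(round(t[(1, 1, 1)], 3))  # 0.403, the first entry of Table 20.2
print(round(t[(0, 0, 0)], 3))  # 0.160, its last entry
```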

Table 20.2 presents the equations that specify a full joint distribution for a common cause network, albeit one with two rather than three effects. It also specifies the marginal probability of each feature; these marginals are discussed later. The causal model represented by Equations 1–3 was quantitatively fit to Rehder and Hastie's empirical data.2 The fits, shown in the right chart in Figure 20.2, demonstrate that the category's causal model accounts for the coherence effect. It does so because the category's joint distribution carries information about which configurations of features are most probable. In particular, configurations that honor coherence (ones in which causes and effects are both present or both absent) are probable (all else being equal) and those that don't (cause present and effect absent, or vice versa) are improbable.
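This preference for coherent configurations falls out of the sketch above directly: two test items with the same number of typical features can differ sharply in probability depending on whether cause and effects agree. (The snippet reuses common_cause_joint from the previous sketch.)

```python
t = common_cause_joint(n_effects=2)
# Each item below has exactly two typical features, but only the first
# is coherent (the cause X plus one effect):
print(round(t[(1, 1, 0)], 3))  # 0.147
print(round(t[(0, 1, 1)], 3))  # 0.010
```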

Table 20.2 Equations for Joint and Marginal Probabilities for a Common Cause Network with Two Effects

Common Cause Network (cX = .750)

X  Y1  Y2  pk(XY1Y2) = pk(Y1|X) pk(Y2|X) pk(X)
1  1   1   .403
1  1   0   .147
1  0   1   .147
0  1   1   .010
0  0   1   .040
0  1   0   .040
1  0   0   .053
0  0   0   .160

Marginals: pk(X = 1) = cX = .750; pk(Y1 = 1) = .600; pk(Y2 = 1) = .600


Concepts as Causal Models: Categorization Subsequent work has shown that the coherence effect responds to experimental manipu­ lations in the predicted manner. First, one can ask whether classifiers are sensitive to the direction of causality or whether coherence is simply a function of (undirected) semantic relationships between features. The latter possibility has been tested by comparing a common cause network with a common effect network, which is identical except that the direction of causality has been reversed (Figure 20.3 a). Table 20.3 presents the equa­ tions that specify the joint distribution for a common effect network (with two rather than three causes). Rehder and Hastie (2001) also compared these two topologies, but here I report a comparison from Rehder (2003b) that tested the same artificial categories but, in order to conduct a more sensitive test of coherence, omitted the information about which features were typical. Figure 20.3 a, which shows classification ratings when the X fea­ ture was present as a function of the number of Ys and network topology (CC = common cause; CE = common effect), reveals that the pattern of ratings differed for the two net­ works, providing prima facie evidence that classifiers were sensitive to the direction of causality. To gain insight into why the ratings differed in this way, it is useful to re-plot the logarithm of the ratings (right chart of Figure 20.3 a). Whereas the ratings for the com­ mon cause network increased linearly (in log units), those for the common effect network exhibit a non-linearity such that ratings incur a large increase when the first cause is in­ troduced. The intuition behind this non-linearity is simple: when the common effect fea­ ture X is present in an object, that object will of course be a better category member if a cause feature (e.g., Y1) is also present, but the importance of Y1 will be reduced if another cause (e.g., Y2) is already present to “explain” the presence of X. In other words, common effect networks exhibit higher-order interactions such that the importance of a cause in­ teracts not just with its (p. 354) effect but also with other causes when the effect is present (see Rehder, 2010, for additional discussion). Rehder (2003b) demonstrated that the CGM defined by Equations 1 and 2 applied to a common effect network readily ac­ counts for such higher order effects, which are related to the phenomena of explaining away (a.k.a., discounting) for the case of multiple sufficient causation during causal attri­ bution (Ali, Chater, & Oaksford, 2011; Jones & Harris, 1967; Kelley, 1973; McClure, 1998; Morris & Larrick, 1995). More will be said about explaining away later.
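Explaining away can be read directly off a common effect network's joint distribution. As a quick illustration (a minimal Python sketch, not taken from the studies above; the parameterization is the independent cause network of Table 20.3), the probability of one cause given the effect drops once the other cause is known to be present:

```python
from itertools import product

def common_effect_joint(c=0.750, m=0.667, b=0.200):
    """Joint p(Y1, Y2, X) for the independent common effect network
    Y1 -> X <- Y2, with root probability c for each cause and a
    noisy-OR generating function for X (Equation 2)."""
    joint = {}
    for y1, y2, x in product([0, 1], repeat=3):
        p_x1 = 1 - (1 - b) * (1 - m) ** y1 * (1 - m) ** y2
        p = (c if y1 else 1 - c) * (c if y2 else 1 - c)
        joint[(y1, y2, x)] = p * (p_x1 if x else 1 - p_x1)
    return joint

j = common_effect_joint()
p_x = sum(j[(y1, y2, 1)] for y1, y2 in product([0, 1], repeat=2))
p_y1_given_x = sum(j[(1, y2, 1)] for y2 in (0, 1)) / p_x
p_y1_given_x_y2 = j[(1, 1, 1)] / sum(j[(y1, 1, 1)] for y1 in (0, 1))
print(round(p_y1_given_x, 2))     # ~0.81: Y1 is likely given X alone
print(round(p_y1_given_x_y2, 2))  # ~0.79: less likely once Y2 "explains" X
```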


Concepts as Causal Models: Categorization

Figure 20.3 Category membership ratings from three studies: (a) Rehder (2003a); (b) Rehder (2014, Experiment 1); (c) Rehder and Kim (2010, Experiment 1). CC = common cause network and CE = common effect network. Error bars are standard errors of the mean.

Second, one might ask whether classifiers are sensitive to not only causal direction but al­ so the form of the generating function that relates causes to their effects. The preceding studies assumed that subjects treated each generative causal relation as independent, an interpretation bolstered (p. 355) by the descriptions of the causal mechanisms (e.g., Table 20.1) that implied independence (and codified by the generating function in Equation 2). But causes might also be interactive; for example, two conjunctive causes might both need to be present in order for an effect to arise (Lucas & Griffiths, 2010; Novick & Cheng, 2004). These situations are depicted by the two common effect networks in Fig­ ure 20.3 b. Whereas the causes in the left network correspond to two independent causal mechanisms (depicted as diamonds), those on the right conjunctively activate a single mechanism. Rehder (2014, Experiment 1) tested whether classifiers exhibit the unique patterns of coherence implied by the two networks in Figure 20.3 b by instructing them on categories whose features formed either independent or conjunctive common effect networks. Whereas the results were analogous to those found in Figure 20.3 a when caus­ es were independent (left chart of Figure 20.3 b), a different pattern emerged when they were conjunctive (right chart). When the effect X was present, a large increase in ratings only occurred when two Ys were present, because now two causes are necessary to ex­ plain the presence of X (i.e., achieve coherence). Rehder (2014) modeled these results by replacing the generating function for independent causes implied by Equation 2,

Table 20.3 Equations for Joint and Marginal Probabilities for an Independent Cause Network with Two Causes

Independent Cause Network

Y1  Y2  X  pk(Y1Y2X) = pk(X|Y1Y2) pk(Y1) pk(Y2)
1   1   1  .513
1   1   0  .050
1   0   1  .138
0   1   1  .138
0   0   1  .013
0   1   0  .050
1   0   0  .050
0   0   0  .050

Marginals: pk(Y1 = 1) = .750; pk(Y2 = 1) = .750; pk(X = 1) = .800



pk(X = 1 | Y1, Y2) = 1 − (1 − bX)(1 − mY1,X)^ind(Y1) (1 − mY2,X)^ind(Y2)  (5)

with a conjunctive generating function, namely,

pk(X = 1 | Y1, Y2) = 1 − (1 − bX)(1 − mY1Y2,X)^ind(Y1)·ind(Y2)  (6)

where mY1Y2,X is the strength of the conjunctive causal relation between Y1, Y2, and X. Equation 6 codifies the intuition that the Ys only influence the probability of X when both are present; it evaluates to 1 − (1 − bX)(1 − mY1Y2,X) when ind(Y1) = ind(Y2) = 1 and to bX otherwise. Table 20.4 presents the equations that specify the joint probabilities for a conjunctive cause network based on this alternative generating function.3
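Before turning to the table, the two generating functions are easy to compare side by side. The sketch below (illustrative Python, using the chapter's example strength and background values m = .667 and b = .200) shows that a single cause raises the probability of the effect under Equation 5 but leaves it at the background rate under Equation 6:

```python
def p_independent(y1, y2, m=0.667, b=0.200):
    """Equation 5: independent noisy-OR integration of two causes."""
    return 1 - (1 - b) * (1 - m) ** y1 * (1 - m) ** y2

def p_conjunctive(y1, y2, m=0.667, b=0.200):
    """Equation 6: the causes matter only when both are present."""
    return 1 - (1 - b) * (1 - m) ** (y1 * y2)

print(round(p_independent(1, 0), 2), round(p_conjunctive(1, 0), 2))  # 0.73 0.20
print(round(p_independent(1, 1), 2), round(p_conjunctive(1, 1), 2))  # 0.91 0.73
```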

Table 20.4 Equations for Joint and Marginal Probabilities for a Conjunctive Cause Network

Conjunctive Cause Network (bX = .200)

Y1  Y2  X  pk(Y1Y2X) = pk(X|Y1Y2) pk(Y1) pk(Y2)
1   1   1  .413
1   1   0  .150
1   0   1  .038
0   1   1  .038
0   0   1  .013
0   1   0  .150
1   0   0  .150
0   0   0  .050

Marginals: pk(Y1 = 1) = .750; pk(Y2 = 1) = .750; pk(X = 1) = .500


Concepts as Causal Models: Categorization Finally, one might ask whether classifiers are sensitive to a causal network’s parameteri­ zation. For example, whereas the previous studies provided no information about the strength of the causal relations (e.g., the mij parameters in Equation 2), Rehder and Kim (2010, Experiment 1) instructed subjects on categories whose inter-feature causal links were described as either weak (the cause produces the effect 75% of the time) or strong (100%). The categories consisted of three features that formed a causal chain, as shown in Figure 20.3 c (the equations that specify the joint for this network are presented in Ta­ ble 20.5). The charts of Figure 20.3 c present the ratings as a function of both the pres­ ence of the intermediate feature Y and the number of other features (X and Z) present, along with the strength of the causal links (75% or 100%). The weak condition yielded an interaction analogous to those already seen: adding more features (X and/or Z) led to a large increase in category membership ratings when Y was present but a much smaller (and sometimes negative) one when Y was absent. The new finding is that this interaction was larger still for stronger (i.e., 100%) causal relations (right chart of Figure 20.3 c). Re­ hder and Kim showed that this result is readily predicted by the causal models in Figure 20.2 because stronger causal links imply stronger expected inter-feature correlations and so greater degrees of incoherence when those correlations are violated. (Rehder, 2014, also manipulated the strength of causal relations in independent and conjunctive common effect networks and found that the coherence effects in Figure 20.3 b varied in the pre­ dicted manner.) In another experiment, Rehder and Kim manipulated the strength of the alternative causes of Y and Z (the b parameters in Equation 2) and found, as predicted, that coherence effects were larger when inter-feature correlations should be stronger, that is, when alternative causes are weak. Two other results obtained in this paradigm are worthy of note. First, the preced­ ing studies instructed subjects on novel categories but did not present “data,” that is, ac­ tual examples of category members (as is the norm in learning studies in which cate­ gories are induced from observations). One might ask whether coherence effects hold when data is present. They do. In other experiments, Rehder and Hastie (2001) also pre­ sented subjects with a series of category exemplars that manifested the feature typicality information and that either did (Experiment 3) or did not (Experiment 1) manifest interfeature correlations consistent with the causal links. Subjects’ subsequent category mem­ bership judgments exhibited coherence in both experiments. Second, whereas these stud­ ies elicited category membership judgments with respect to a single category, one might ask whether coherence effects hold when classifiers decide which of two categories an (p. 357)
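The logic of the strength manipulation can be seen in a few lines. In the hedged sketch below (not Rehder and Kim's code; the chain parameterization follows Table 20.5), raising the causal strength m makes an incoherent item (cause present, intermediate effect absent) sharply less probable, which is why stronger links should produce larger coherence effects:

```python
def chain_joint(x, y, z, c=0.750, m=0.667, b=0.200):
    """Joint p(X, Y, Z) for the chain X -> Y -> Z of Figure 20.3c."""
    def p_on(parent_present):          # Equation 3 applied to each link
        return 1 - (1 - b) * (1 - m) ** parent_present
    p = c if x else 1 - c
    p *= p_on(x) if y else 1 - p_on(x)
    p *= p_on(y) if z else 1 - p_on(y)
    return p

for m in (0.75, 1.00):                 # the "weak" and "strong" conditions
    print(m, round(chain_joint(1, 0, 1, m=m), 3))
# prints 0.03 for m = 0.75 and 0.0 for m = 1.0: with deterministic links,
# a cause present without its effect is impossible
```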

item belongs to (see note 1). They do. Martin and Rehder (2013) instructed subjects on two artificial categories (Myastars and Terrastars), each of which had one causal link be­ tween two of its typical features. Subjects’ forced-choice classification decisions were sensitive to the coherence exhibited by to-be-classified test items. For example, a test item with the same number of typical features from each category was nonetheless cate­ gorized as a Myastar (Terrastar) if its features exhibited coherence (incoherence) with re­ spect to Myastars’ causal structure.

Page 19 of 51

Concepts as Causal Models: Categorization Table 20.5 Equations for Joint and Marginal Probabilities for a Chain Network Chain Network

cX = .750 bY = bZ = .200

X

Y

Z

1

1

1

.403

1

1

0

.147

1

0

1

.040

0

1

1

.037

0

0

1

.040

0

1

0

.013

1

0

0

.160

0

0

0

.160

pk(X = 1)

Page 20 of 51

pk(XYZ) = pk(Z| Y)pk(Y|X)pk(X)

cX

.750

Concepts as Causal Models: Categorization pk(Y = 1)

.600

pk(Z = 1)

.520


Concepts as Causal Models: Categorization Whereas the preceding studies involved university undergraduates, Hayes and Rehder (2012) asked whether moderately young children are sensitive to coherence. Both adults and 5- to 6-year-olds were taught a novel category with four features, two of which were causally related. For example, the features of a novel animal named Rogos were the fol­ lowing: have big lungs, can stay underwater for a long time, have long ears, and sleep during the day. In addition, big lungs were described as the cause of staying underwater for a long time. Subjects were then presented with test trials in which two animals were presented and asked which was more likely to be a Rogo. Logistic regression analyses re­ vealed that both subject groups exhibited coherence. For example, an animal whose fea­ tures exhibited coherence would be chosen as the Rogo over one that did not, even when both animals exhibited the same number of typical Rogo features (see Barrett, Abdi, Mur­ phy, & Gallagher, 1993, for related results). Although (p. 358) the effect was substantially weaker for children than adults, this study established that sophisticated (i.e., adult level) reasoning is not necessary for coherence to influence categorization.

Real-World Categories The studies just reviewed tested categories and causal relations that were created for purposes of the experiment. The advantage of testing artificial materials, of course, is that they control for the numerous other factors that influence category membership. On the other hand, one might ask whether the coherence effects reported in the preceding arose only because of the salience of causal relations that are instructed during an exper­ imental session. Thus, demonstrating that those effects arise with natural materials pro­ vides evidence that coherence contributes to everyday classification. Malt and Smith (1984) provide one such demonstration. Their initial question was whether classifiers are sensitive to the inter-feature correlations that implicitly exist in people’s representations of real-world categories. To identify such correlations, these re­ searchers asked one group of adult subjects to list features of a number of real-world cat­ egories (bird, clothing, flower, etc.) and then another to indicate which features were pos­ sessed by which category members. Within-category feature correlations were computed on the basis of this information. In their Experiment 1, yet another group was presented with test items that were equated on their overall feature-wise typicality but either pre­ served or broke inter-feature correlations, and was asked to rate the typicality of each. Importantly, no effect of coherence obtained in this experiment. However, a coherence ef­ fect emerged in their Experiment 2 when the only correlations tested were those between features that subjects explicitly rated as related. This result cannot be attributed to expe­ rienced inter-feature correlations because those correlations were no stronger in the sec­ ond experiment than the first; rather, it was subjects’ explicit beliefs that the features were related that made the difference. Ahn et al. (2002) replicated these results and fur­ ther found that the majority of the inter-feature relations (52%) were causal in nature. Hampton, Storms, Simmons, and Heussen (2009, Experiment 1) presented evidence for another form of coherence in real-world categories (also see Hampton & Simmons, 2003). In their Experiment 1, Hampton et al. tested pairs of biological categories (plants and ani­ mals) that were contrasted on three abstract dimensions: appearance, appearance of off­ Page 22 of 51

Concepts as Causal Models: Categorization spring, and “innards” (a biochemical property specific to one species). Subjects then cate­ gorized the eight items that can be formed on those three dimensions. For example, for the crab/lobster pair, a test item with two features of crabs and one of lobsters was “a creature with legs and claws that looks and acts just like a crab. The scientists found that the structure of its eyes was identical to that typically found only in lobsters. They found that the creature had offspring that looked and acted just like crabs.” Hampton et al. found that the features exhibited an interactive contribution to categorization decisions. Figure 20.4 a, which presents the z scores of categorization probabilities associated with test items that either did or didn’t have the target category’s typical appearance and ap­ pearance of offspring (collapsing over the third feature, innards), reveals the same sort of interaction seen in Figures 20.2 and 20.3, in which the weight of one feature was greater to the extent that another was already present. In fact, all three two-way feature interac­ tions reached significance. The authors interpreted these results as indicating that bio­ logical kinds are viewed as possessing “a strong set of causal principles within each or­ ganism that lead to the homogeneity of the class as a whole” (p. 1159). Note that the co­ herence exhibited in this study may have been based on abstract knowledge in which classifiers assume that features are causally related but lack any deep understanding of the nature of those relationships.

Complications: The Case of Artifacts In a second experiment, Hampton et al. (2009) conducted an analogous test of artifact categories. The abstract dimensions on which pairs of artifacts were contrasted were ap­ pearance, current function, and historical function. Historical function—the function orig­ inally intended by the artifact’s designer—has been thought to play an important role in artifact classification by many theorists (Bloom, 1996, 1998; Gelman & Bloom, 2000; Kele­ men & Carey, 2007; Matan & Carey, 2001).4 Although the contribution to categorization decisions made by these artifact features was more independent as compared to those for biological kinds, a modest but significant two-way interaction between appearance and current function (Figure 20.4 b) established the contribution of coherence to artifact cate­ gorization. The other two pairwise interactions were non-significant. Other studies have found that coherence between an artifact’s appearance and its func­ tion influences categorization judgments. For example, (p. 359) Wisniewski (1995) defined a novel artifact category (labeled “plapels”) in terms of its function, namely “captures ani­ mals.” He found that objects were considered better category examples when they pos­ sessed combinations of features that enabled an instance of that function (e.g., “contains peanuts” and “has a large metal container,” enables “caught a squirrel”) as compared to those that did not (“contains acorns” and “has a small metal container” fails to enable “caught an elephant”) (also see Murphy & Wisniewski, 1989). Similarly, Rehder and Ross (2001) showed that artifacts were considered better examples of a category of pollutioncleaning devices (“morkels”) when their features cohered (e.g., “has a metal pole with a sharpened end” enables “works to gather discarded paper”), and worse ones when their features were incoherent (“has a magnet” fails to enable “removes mosquitoes”).

Page 23 of 51

Concepts as Causal Models: Categorization

Figure 20.4 Z scores of categorization probabilities as a function of the presence or absence of features in Hampton et al. (2009). (a) Biological categories (Experiment 1). (b) Artifacts (Experiment 2). In each panel, two of the three pairwise feature interactions are presented: in panel (a), between appearance and appearance of offspring; in panel (b), between cur­ rent function and appearance. Z scores have been collapsed over the third variable: “innards” in panel (a), historical function in panel (b).

Nevertheless, the question remains why Hampton et al. found a coherence effect between appearance and function but not one involving historical function. Of course, a causal model predicts coherence only between features thought to be causally connected, and so this result could be explained by assuming that historical function is causally unrelated to the other two features. Yet it is clear that the function intended by an artifact’s designer has substantial causal responsibility for the physical features it ends up with. Indeed, this view has been codified by Chaigneau et al.’s (2004) HIPE theory of artifacts in which his­ torical function (H) results in its physical structure (P) that in turn enables a functional outcome (O), a causal model depicted Figure 20.5 a. If this is the causal model that peo­ ple have of most artifacts, why is categorization affected by the coherence between P and O but not H and P? This result encourages us to step back and ask why coherence effects emerge in the first place. The literature’s popular example of an inter-feature causal relation is bird’s fly be­ cause they have wings. But though this proposition would be affirmed by most, it is clear that more than wings are needed for flight. A bird must also be properly nourished, not be diseased or injured, not have a body size too large relative to its wings, and so forth. Said differently, wings cause flying but only when various enabling conditions are met (see Murphy, 2002, for related discussion). This situation in represented schematically in Figure 20.6 b in which the causal relation of Figure 20.6 a has been re-represent­ ed as a conjunctive relation in which the mechanism variable MWF represents the en­ abling condition for . MWF is depicted with a dashed line because it is usually un­ observed. In Figure 20.6 b, coherence is now represented by the (inferred) presence of MWF. For example, MWF is likely present if W and F are both present and unlikely if W is present and F is absent.5 Thus an animal that has wings (W) and (p. 360) flies (F) allows one to infer the likely presence of the underlying processes of birds that produce flight Page 24 of 51

Concepts as Causal Models: Categorization (MWF), making one doubly sure that it is a bird. Wings and the absence of flight suggest the likely absence of MWF. Such an animal may not be just a malnourished or injured bird, it may not be a bird at all.

Figure 20.5 Causal models of artifacts with explicit mechanism nodes (the Ms).

Figure 20.6 (a) A causal model of birds in depicting the causal relationship between wings (W) and flying (F). (b) Interpretations of the bird causal model in which the presence of an unobserved mechanism node (M_WF) is made explicit.

This deconstruction of coherence effects highlights that coherence between two features will provide evidence for category membership only if the causal mechanism that relates them is construed to be part of the category’s causal model. In this light, I propose that people’s causal model of most artifacts corresponds to the one shown in Figure 20.5 b. Just as for in Figure 20.6, the causal relationships for artifacts ( and ) have been re-represented with mechanism nodes that represent those relationships’ en­ abling conditions (MHP and MPO, respectively). Importantly, though, MPO is part of the ac­ tual causal model but MHP is not. This account provides an intuitive account of the lack of coherence between historical function and physical structure: the factors that determine whether a designer can successfully render an artifact’s physical structure (skill, money, the proper tools) have no influence on what the artifact is. Note that another alternative model in Figure 20.5 c in which the H node is also excluded is inconsistent with Hampton Page 25 of 51

Concepts as Causal Models: Categorization et al.’s (2009) findings that historical function had a main effect on categorization deci­ sions (I will return to this finding in the section “Classification as Prospective Forward Reasoning”). A final study of coherence in artifact categorization is that of Puebla and Chaigneau (2014), who instructed subjects on novel artifacts (e.g., a device described as a fish catch­ er consisting of a net of vegetable fibers that could be used to catch fish) and then con­ structed test items by systematically varying the values on a number of stimulus dimen­ sions, including H, P, and O. Subjects then rated the identity of the object (e.g., “Would you say this object is a fish catcher?”). Unlike Hampton et al. (2009) (and Wisniewski, 1995, and Rehder & Ross, 2001 (p. 361) ), they failed to observe any interactions between features, including between P and O. One important difference between studies may have involved the category labels used. In Puebla and Chaigneau, categories did not have la­ bels in the traditional sense, such as “plapels” (in Wisniewski) and “morkels” (in Rehder & Ross). Rather, the question that asked for the object’s identity (e.g., “Would you say this object is a fish catcher?”) simply reiterated the outcome (catching fish). This practice may have led many subjects to treat the outcome as a near-defining feature of the catego­ ry and so convert the problem into a logical reasoning exercise (e.g., “it caught a fish, thus it’s a fish catcher”; cf. Malt & Sloman, 2007), that is, to use Figure 20.5 d as the arti­ facts’ causal model. Results reviewed later (in the section “Classification as Prospective Forward Reasoning”) will provide support for this conjecture. In contrast, it is likely that the use of traditional category labels in Rehder and Ross, Hampton et al., and Wisniewski worked against this focus on a single feature and, perhaps unsurprisingly, coherence ef­ fects emerged when all the objects’ features were taken into account.

Feature Weights Recall that the second way that relations between intrinsic features affect categorization is to alter the importance of features individually. Different methods have been used to assess feature importance in causal-based categorization experiments. First, Rehder and Hastie (2001) performed linear regression on classification ratings in which there was one predictor for each feature coding whether that feature was present or absent in a test item. Because they presented subjects with all the test items that can be formed on all bi­ nary dimensions, this method yields feature weights that are orthogonal to feature inter­ actions (that were assessed by two-way and higher-order interaction terms in the regres­ sion equation). A second method involves asking subjects to explicitly rate the importance of features. For example, Rehder and Kim (2010) had subjects rate the prevalence of each feature within a category where prevalence was presumed to reflect the importance of that feature to category membership. Because features are presented in isolation, this method also yields a measure that is independent of feature interactions. A third method involves presenting test items that display all of a category’s typical features except for one (which is explicitly stated to be absent). The classification ratings for such items are interpreted as providing a relative measure of feature importance. We shall see how this missing feature method is not guaranteed to yield a measure of feature importance that is Page 26 of 51

Concepts as Causal Models: Categorization orthogonal to coherence. This section begins with a review of the substantial number of studies done testing adult subjects and then turns to those testing children.

Adult Studies of Feature Weights Sloman et al. (1998) and Ahn, Kim, Lassaline, and Dennis (2000a) were among the first studies to assess the effect of causal knowledge on classification. Their focus was on fea­ ture weights. For example, Ahn et al. tested novel categories with features related in a causal chain ( ). For instance, undergraduates were instructed on a type of bird called roobans that typically eat fruit (X), have sticky feet (Y), and climb trees (Z) and that (“Eating fruit tends to cause roobans to have sticky feet because sugar in fruits is secreted through pores under their feet.”) and and (“Sticky feet tend to al­ low roobans to build nests on trees because they can climb up the trees easily with sticky feet.”). Feature importance was assessed by the missing feature method in which partici­ pants rated on a 0–100 scale the category membership of test items missing one feature (one missing only X, one missing only Y, one missing only Z). The results, shown in the left chart of Figure 20.7, revealed that the item missing X was rated lower than one missing Y, which in turn was lower than the one missing Z, suggesting that X is more important than Y, which is more important than Z. (Because lower ratings imply greater feature impor­ tance, ratings in Figure 20.7 have been subtracted from 100.) The phenomenon in which cause features are more important than their effects is referred to as the causal status ef­ fect. In addition to this study and others with artificial categories (Kim et al. 2009; Luhmann et al. 2006; Sloman et al. 1998, Study 3), causal status effects have been observed with realworld categories. In these studies, researchers first assessed the theories that subjects hold via a theory drawing task in which they are presented with a category’s features and are asked to draw (and estimate the strength of) the inter-feature relations. With this method, Sloman et al. (1998) found a causal status effect for common-day objects (e.g., apples and guitars); Kim and Ahn (2002a, 2002b) and Ahn, Levin, and Marsh (2005) did so for psychiatric disorders such as depression and schizophrenia (and see Ahn, Kim, & Lebowitz, Chapter 30 in this volume). Each of these studies allowed subjects an unlimited amount of time to respond. One might ask whether a causal status effect obtains when an object must be categorized (p. 362)

quickly. It does. Using materials similar to those in Ahn et al. (2000a), Luhmann, Ahn, and Palmeri (2006) imposed a response deadline and found a causal status effect even when subjects responded within only 500 milliseconds. This finding is consistent with others that have evaluated the time course of knowledge effects in categorization (Lin & Mur­ phy, 1997; Palmeri & Blalock, 2000). The importance of category features can be derived from a causal model by assuming that they are monotonically related to their within-category marginal probability (a.k.a., their category validity) derived from the joint distribution implied by the category’s CGM. For independent, generative causes, those marginal probabilities are given by


$p_k(F) = 1 - (1 - b_F)\prod_i \left[1 - m_{iF}\, p_k(C_i)\right]$  (7)

where the product ranges over F's causes Ci, miF is the strength of the causal link from Ci to F, and bF is the strength of F's background causes.

That is, a feature's category importance is determined by the probability that it will be generated by the category's causal relations. Equation 7 reveals that an effect will be less important than its cause(s) (a causal status effect will obtain) when the strengths of its incoming causal links and its background causes are sufficiently weak. Table 20.5 presents the expressions for the marginal probabilities of X, Y, and Z in Figure 20.7's chain network implied by Equation 7. The bars on the left of the right-hand chart of Figure 20.7 show the decreasing marginal probabilities of X, Y, and Z predicted by a chain model for the parameters cx = .750, mXY = mYZ = .667, and bY = bZ = .200.

Sloman et al. (1998) themselves offered an alternative account of the relative importance of causes and effects. Their dependency model is based on the intuition that features are more important to category membership (i.e., are more conceptually central) to the extent that they have more dependents, that is, features that depend on them (directly or indirectly). According to the original dependency model, feature i's weight or centrality, ci, can be computed from the iterative equation

$c_{i,t+1} = \sum_j d_{ij}\, c_{j,t}$  (8)

where ci,t is feature i's weight at iteration t and dij is the strength of the causal link between i and its dependent j. For example, for the causal chain X → Y → Z, when cZ,1 is initialized to 1 and each causal link has a strength of 2, after two iterations the centralities for X, Y, and Z are 4, 2, and 1, representing that feature X is more important to category membership than Y, which is more important than Z (right-hand chart in Figure 20.7).6 Although the formulation of the dependency model in Equation 8 makes it technically inapplicable to some causal models, Kim, Luhmann, Pierce, and Ryan (2009) introduced variants that are applicable to any network topology.

Which account is correct? Studies that have explicitly compared them have consistently favored the causal model approach. Paralleling the presentation of coherence effects, I discuss experiments that manipulated network topology, functional form, and then parameterization (i.e., causal strengths).
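To make the contrast between the two accounts concrete, the following sketch (hypothetical code using the example parameters from the text, not code from any of the cited studies) reproduces both sets of predictions for the chain X → Y → Z:

# Minimal sketch, assuming a noisy-OR chain X -> Y -> Z with the example
# parameters from the text; contrast the two accounts' importance predictions.
def chain_marginals(c_x, m, b):
    """Equation 7 applied to a chain: each marginal feeds the next link."""
    p_x = c_x
    p_y = 1 - (1 - b) * (1 - m * p_x)
    p_z = 1 - (1 - b) * (1 - m * p_y)
    return p_x, p_y, p_z

def dependency_centralities(d, iterations=2):
    """Equation 8 on the same chain: each centrality is d times its dependent's."""
    c_x, c_y, c_z = 1.0, 1.0, 1.0   # c_{Z,1} (and the others) initialized to 1
    for _ in range(iterations):
        c_x, c_y, c_z = d * c_y, d * c_z, c_z
    return c_x, c_y, c_z

print(chain_marginals(0.750, 0.667, 0.200))  # ~(.75, .60, .52): causal status effect
print(chain_marginals(0.750, 0.900, 0.200))  # ~(.75, .74, .73): effect nearly vanishes
print(dependency_centralities(2))            # (4.0, 2.0, 1.0)
print(dependency_centralities(3))            # (9.0, 3.0, 1.0): effect grows with strength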


Figure 20.7 Category membership ratings (subtracted from 100) from Ahn et al. (2000a) and the predictions of a chain causal model and the dependency model. CGM = causal graphical model.

First, the dependency model predicts that a feature will increase in importance to the extent it has additional effects. Rehder and Kim (2006, Experiment 3) explicitly manipulated the number of effects by comparing the 1-1-3 topology in Figure 20.8 a with a 1-1-1 network (i.e., a causal chain). The critical comparison concerns the relative importance of feature Y, which has a different number of effects in the two networks but is identical otherwise. Feature weights derived from a regression analysis are presented in Figure 20.8 a and show that feature Y was not relatively more important in the 1-1-3 network than in the 1-1-1 network; indeed, the two networks yielded the same overall pattern of weights. Whereas this result contradicts the predictions of the dependency model, it is expected from the causal model framework. For the same example parameters used earlier (cX = .750, ms = .667, and bs = .200), pk(Y) is the same in both networks (.60).

In another experiment, Rehder and Kim tested the effect of increasing Y's causes by comparing the 3-1-1 and 1-1-1 networks in Figure 20.8 b. Y will be more probable in the 3-1-1 network exactly because it has more causes that generate it. For instance, for the example parameters, pk(Y) = .90 in the 3-1-1 network as compared to .60 in the 1-1-1 network. Regression weights shown in Figure 20.8 b confirmed that Y had a greater relative weight in the 3-1-1 condition; indeed, in this network Y was weighed more heavily than its causes (the Xs).7 Rehder and Hastie (2001) and Rehder (2003a) also found that an effect feature in a common effect network is weighed more heavily than its causes (and see Table 20.3 for a quantitative example). That is, the causal status effect can be overturned when an effect is generated by multiple causes. In contrast, the dependency model incorrectly predicts that Y should be weighed equally in the 3-1-1 and 1-1-1 networks. Note that others have shown that an event will be judged as more likely when multiple potential causes of that event are known (Fischhoff, Slovic, & Lichtenstein, 1978; Tversky & Koehler, 1994).

Second, Rehder (2014, Experiment 1) tested how the importance of a common effect varies depending on whether its causes generate it independently or conjunctively. For the independent cause network in Figure 20.8 c, the marginal probability of X derived from Equation 7 is

$p_k(X) = 1 - (1 - b_X)\left[1 - m_{Y_1X}\, p_k(Y_1)\right]\left[1 - m_{Y_2X}\, p_k(Y_2)\right]$

(see Table 20.3); the corresponding expression for the conjunctive cause network is

$p_k(X) = 1 - (1 - b_X)\left[1 - m_{YX}\, p_k(Y_1)\, p_k(Y_2)\right]$

(Table 20.4). For the example parameters (i.e., cs = .750, ms = .667, and bs = .200), pk(X) for the conjunctive cause network (.50) is lower than that for the independent network (.80). This result obtains because both causes must be present to generate X in the conjunctive network, whereas either cause may generate X in the independent network. The predictions were borne out by subjects' explicit feature likelihood ratings from Rehder's Experiment 1 (Figure 20.8 c), which show that the effect was rated as more likely than its causes for the independent network but less likely for the conjunctive network. Because the two networks in Figure 20.8 c have the same dependency structure (X depends on both Y1 and Y2), the dependency model incorrectly predicts no difference in the importance of X as a function of whether Y1 and Y2 are independent or conjunctive causes.
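Both marginals are easy to verify numerically; here is a quick check under the same example parameters (hypothetical code, nothing here is from Rehder's materials):

# Hypothetical check: marginal probability of a common effect X under
# independent vs. conjunctive causes (Equation 7 and its conjunctive analogue),
# with p(Y1) = p(Y2) = .75, m = .667, b = .2.
p_y, m, b = 0.750, 0.667, 0.200

independent = 1 - (1 - b) * (1 - m * p_y) ** 2   # either cause suffices
conjunctive = 1 - (1 - b) * (1 - m * p_y * p_y)  # both causes required
print(round(independent, 2), round(conjunctive, 2))  # 0.8 0.5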

Figure 20.8 (a) Feature regression weights from Rehder and Kim (2006, Experiment 3). (b) Feature regression weights from Rehder and Kim (2006, Experiment 2). (c) Feature likelihood ratings from Rehder (2014, Experiment 1). (d) Feature likelihood ratings from Rehder and Kim (2010, Experiment 1). Error bars are standard errors of the mean.

Third, Rehder and Kim (2010) tested how the importance of the causes and effects in a causal chain varies with causal strength. According to Equation 7, the causal status effect should decrease with strength: the marginal probabilities of .75, .60, and .52 for X, Y, and Z that obtain when parameter m = .667 (Table 20.5) become .75, .74, and .73 when m = .90 instead. The causal status effect can even become negative (i.e., effects are more important than their causes) when parameters m and b are sufficiently high. These predictions were borne out by subjects' explicit feature likelihood ratings from Rehder and Kim's Experiment 1, shown in Figure 20.8 d: whereas a causal status effect obtained for causal links of strength 75%, it was absent entirely for links of strength 100%. Again, regression analyses of subjects' classification ratings revealed the same pattern. These results were replicated with strengths of 90% and 60% and when causal strength was manipulated as a within-subjects variable. The dependency model incorrectly predicts that the causal status effect should increase with causal link strength: the centralities of 4, 2, and 1 that obtain for X, Y, and Z when the link strength parameter d is 2 (Figure 20.7) become 9, 3, and 1 when it is 3.

The finding that a causal status effect arises only in scenarios involving relatively weak causal links is consistent with the studies described earlier. In the Ahn et al. (2000a) study discussed earlier that found a large causal status effect (Figure 20.7), the causal link descriptions stated that the cause only "tends to" produce the effect (e.g., "Eating fruit tends to cause roobans to have sticky feet."). (Kim et al., 2009, also used the "tends to" phrase.) In fact, Rehder and Kim reported replications of Ahn et al. in which the large causal status effect disappeared when "tends to" was replaced with "always." In Kim and Ahn (2002a) as well, the links for the psychological disorders they tested were generally weak, rarely exceeding 1.5 on a 1–3 scale (see their Figures 19.7–19.11).

Two additional studies support the causal model prediction that effects increase in importance as a function of the strength of the causal relations that generate them. First, Rehder and Kim (2010, Experiment 2) manipulated the strength of the alternative causes in a causal chain and found that effects became more prevalent (i.e., the causal status effect became weaker) as alternative causes became stronger. Second, Rehder (2014, Experiment 2) manipulated the strength of causal links in independent and conjunctive cause networks and found that the effect feature was rated as more prevalent for strong causal links as compared to weak ones in both types of network.
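For intuition on how the effect can reverse, consider the same assumed noisy-OR chain with strong links and strong background causes (again, purely illustrative parameter values): the marginals now increase downstream.

# Illustrative parameters only: strong links (m) plus strong background causes
# (b) make downstream features *more* probable than their causes.
c_x, m, b = 0.750, 0.900, 0.500
p_x = c_x
p_y = 1 - (1 - b) * (1 - m * p_x)
p_z = 1 - (1 - b) * (1 - m * p_y)
print(round(p_x, 2), round(p_y, 2), round(p_z, 2))  # 0.75 0.84 0.88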

The Importance of Functional Features

As mentioned, it has also been shown that features are more important to categorization when they are functional (e.g., a tiger has claws in order to catch prey) (see also Booth & Waxman, 2002; DiYanni & Kelemen, 2005; Keil, 1989; Kemler-Nelson et al., 1995; Lin & Murphy, 1997; Murphy & Wisniewski, 1989; Wisniewski, 1995; cf. Barton & Komatsu, 1989; Malt & Johnson, 1992; see Oakes & Madole, 2008, for a review). Some theorists argue that the importance of such features derives not from function per se but rather from their position in the network of causal mechanisms that enables that function (Ahn, 1998; Lombrozo, 2009; Lombrozo & Rehder, 2012). For example, using novel biological categories and features, Lombrozo and Rehder (2012) both confirmed that the exact same feature was considered more diagnostic of category membership when it was described as functional as compared to non-functional and investigated why function was so privileged. A series of manipulations established that functional features were not substantially more important to categorization than non-functional features when they were ontogenetic (i.e., arose from recent changes in the environment) rather than phylogenetic (arose from natural selection), when they were temporary (unlikely to be present in future category members), and when they were equated on prevalence among current category members. These findings were interpreted as indicating that functional features are privileged because they are typically seen as being involved in causal interactions extending both backward and forward in time that ensure the feature's stability, that is, its prevalence among past, current, and future category members. Note that whereas these studies tested biological categories, the role of function in the classification of artifacts has also received much attention. Although functional features are privileged for artifacts as well, evidence presented in the section "Categorization as Causal Reasoning" will reveal that supposedly discrepant research findings can be reconciled by appreciating the unique types of causal networks in which artifact features are embedded.


Developmental Studies of Feature Weights

A number of studies have shown that children will classify objects on the basis of causal properties. For example, Gopnik and Sobel (2000) presented 3- and 4-year-old children with four blocks that varied in shape and color. Each block was placed on a machine (a "blicket detector"); two of the blocks caused the machine to "activate" (play a song) and two did not. The experimenter then held up one of the blocks that activated the machine, referred to it as a "blicket," and then asked the child to identify the "other" blicket. In a neutral/causal condition in which there was no other basis to group the blocks other than on the causal property, the other block that activated the machine was chosen 74% of the time. To establish that this effect indeed arose due to a causal property, Gopnik and Sobel also tested a neutral/baseline condition in which the blocks were held above the machine and the machine went off only when the experimenter's hand was placed on it (providing an alternative explanation for the activation other than the block); in that condition, the other block that activated the machine was chosen only 45% of the time. An experiment that assessed 2-year-olds yielded a similar albeit weaker pattern of results. Even young children readily extend category labels on the basis of a shared causal property (also see Booth, 2008; Nazzi & Gopnik, 2000).

Gopnik and Sobel also assessed the relative importance of causal and perceptual properties by testing conflict conditions in which the category label could instead be extended on the basis of a common perceptual property. For example, a machine would be activated by one of two red blocks and one of two blue blocks; when then told that the red block that activated the machine was a blicket, a child might choose as the other blicket either the other red block (i.e., extend on the basis of color) or the other block that activated the machine (extend on the basis of a causal feature). In a conflict/baseline condition in which the blocks didn't touch the machine (just as in the preceding neutral/baseline condition), they chose the block with the shared perceptual feature 85% of the time, consistent with research indicating that children often extend category labels on the basis of perceptual similarity (Landau, Smith, & Jones, 1988). But this proportion dropped to 47% in a conflict/causal condition in which the blocks touched the machine (they chose the causal match 40% of the time). That is, it is not the case that children extend category labels on the basis of causal properties only when shared perceptual properties are absent. Instead, the causal and perceptual features were about equally important for extending the label.

Walker, Lombrozo, Legare, and Gopnik (2014, Experiment 2) investigated what factors might enhance the salience of causal properties in children's classifications using a design similar to Gopnik and Sobel's conflict condition. They presented 3-, 4-, and 5-year-olds with four trials each consisting of three blocks. When placed on top of the machine, one block (the target) caused the toy to play music. Another block (the perceptual match) had the same shape and color as the target but did not cause activation. A third (the causal match) led the toy to play music but had a different shape and color. An experimenter then provided a category label ("blicket") for the target block and asked children to identify the "other" blicket. Children in a control condition chose (on an average of more than three of the four trials) the perceptual rather than the causal match. Yet children in an explain condition, who were first asked, for each block, why it did (or didn't) make the toy play, were far more likely (extending on about two of the four trials) to extend the label to the causal match. Even in 3-year-olds, the causal thinking apparently potentiated by the explanation prompt was sufficient to raise the importance of a causal feature to match that of a highly salient perceptual cue.

A number of other experiments in this paradigm have been conducted, including ones using a blicket detector. In some cases (Gopnik, Glymour, Sobel, Schulz, & Kushnir, 2004; Sobel, Tenenbaum, & Gopnik, 2004), the findings are more appropriately construed as instances of children using causal reasoning in the service of classification and so are reviewed in the section "Categorization as Causal Reasoning."

Whereas Gopnik and Sobel (2000) and Walker et al. (2014) focused on the relative importance of causal and perceptual features to children's classifications, Ahn, Gelman, Amsterdam, Hohenstein, and Kalish (2000b) focused on the relative importance of cause and effect features, that is, whether a causal status effect obtains in children. In Ahn et al.'s experiment, 7- to 9-year-olds were taught a novel category with three features in which one feature was the cause of the other two. For example, children were told about fictitious animals called taliboos that have promicin in their nerves (X), thick bones (Y1), and large eyes (Y2), and, moreover, that promicin caused thick bones (X → Y1) and large eyes (X → Y2). Ahn et al. found that an animal missing only the cause—that is, (X, Y1, Y2) = (0, 1, 1)—was judged to be a less likely category member than one missing only one of the effects ((1, 0, 1) or (1, 1, 0)). Meunier and Cordier (2009) found a similar result with 5-year-olds (albeit only when the cause was an internal rather than a surface feature).

Although Ahn et al. interpreted their findings as indicating a causal status effect, we have seen (note 6) how the missing feature method is not guaranteed to yield estimates of feature importance that are independent of their interactions. Indeed, their results can be interpreted as reflecting coherence instead: the item missing only the cause feature may have been rated a poor category member because it violated two expected correlations (one with each of the effects), whereas an item missing only one effect violated only one (with the cause). To ask whether children exhibit a causal status effect, I return to the study of Hayes and Rehder (2012) that taught both adults and 5- and 6-year-olds a novel category with four features and one inter-feature causal link. Importantly, the logistic regression analyses of subjects' classification choices allowed an assessment of feature weights independent of their interactions. Not only did both groups exhibit coherence (as reviewed earlier), neither group exhibited a causal status effect. The presence of a coherence effect but not a causal status effect lends credence to the possibility that the findings interpreted as a causal status effect by Ahn et al. and Meunier and Cordier reflected coherence instead.8 In a second experiment, Hayes and Rehder used the qualifier "sometimes" in the causal link description to convey that it operated probabilistically. Whereas adults now exhibited a causal status effect along with coherence (consistent with Rehder and Kim, 2010), children continued to only exhibit coherence.
Although these findings might be interpreted as indicating that the children had trouble understanding probabilistic language ("sometimes"), other studies showing that children understand such language suggest that the findings are more likely to reflect children's general bias that causal links operate deterministically (Shultz & Somerville, 2006). Thus, the causal status effect may develop later because it depends on probabilistic causal reasoning, whereas coherence effects do not.

Part 2: Categorization as Causal Reasoning with Inferential Features

This section now presents evidence that categorization can sometimes be an act of causal reasoning in which observed features are used to infer the presence of other features that establish category membership. By so doing, it will further explicate the distinction made in this chapter between intrinsic and inferential features. This distinction first received support in considering conditions under which inter-feature coherence effects can be expected (see the section "Interacting Features: The Coherence Effect"). The studies in the following section will show that certain causal-based categorization phenomena can be understood by assuming that inferential features are external to a category's causal model and are used by classifiers to infer the presence of unobserved intrinsic ones. The studies are organized into the two types of causal inferences implied by Figure 20.1 b. Diagnostic inferences involve reasoning from effects to the intrinsic features that cause them. Prospective inferences involve reasoning from causes to the intrinsic features they affect. As mentioned, two signatures of causal reasoning, explaining away and screening off, will be taken as evidence for diagnostic and prospective reasoning, respectively. These causal reasoning phenomena will be revisited when instances of causal-based feature prediction are reviewed in the companion chapter, "Concepts as Causal Models: Induction" (Chapter 21 in this volume). That chapter takes up the question of whether such inferences are drawn correctly, that is, in full accord with the predictions of causal models. Here, they are used to establish the presence of causal reasoning in the service of categorization.

Classification as Diagnostic Reasoning

Oppenheimer et al. (2013) provide an example of one form of diagnostic reasoning, explaining away, during categorization with real-world materials. Subjects were presented with several objects, each with three features characteristic of a category, and rated its membership in that category. For example, an animal would have four legs, a tail, and be very smelly, and subjects rated the likelihood that it was a skunk. In addition, the animal had a fourth feature that varied in its causal relationship to the characteristic ones. In what they referred to as the discounting (a.k.a. explaining away) condition, that feature could serve as an alternative explanation for the smelliness: the animal had been wading through a sewer. In fact, categorization ratings in the discounting condition were lower than those in a baseline condition in which the additional feature was causally unrelated to the typical ones. The authors argued that the causal model operative in the discounting condition was the one shown in Figure 20.9. An unobserved node (depicted with a dashed line) represents the underlying causal processes associated with skunks that yield its characteristic features. Importantly, the alternative explanation of smelliness (waded through a sewer) is not an intrinsic feature of skunks. Nevertheless, the fact that it provides an alternative explanation of smelliness "explains away" the evidence that smelliness normally provides for membership in the skunk category.9 Indeed, smelliness is now consistent with many different kinds of animals (see Table 20.3 for a quantitative example of explaining away). Oppenheimer et al. also tested an augmentation condition in which the additional feature was preventative, that is, tended to make the target feature less likely (e.g., the animal with the typical skunk features had also been "sprayed with a pleasant smelling chemical agent"). Categorization ratings were higher with the preventive feature relative to the baseline condition. Note that these results obtained not just for biological kinds (skunks, bears) but for social kinds (firefighter, cheerleader) and artifacts (golf cart, refrigerator) as well (see Rehder & Kim, 2009, for additional examples of explaining away).
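The discounting pattern falls out of Bayes' rule. The sketch below uses hypothetical numbers and a deliberately tiny model (one feature, smelliness, with the skunk mechanism and the sewer as independent noisy-OR causes); it is not Oppenheimer et al.'s actual model or data, but it shows the category inference dropping when the alternative cause is present:

# Hypothetical explaining-away demo: P(skunk | smelly) falls when "waded
# through a sewer" supplies an alternative cause of smelliness.
prior_skunk = 0.3        # assumed prior that the animal is a skunk
m, b, w = 0.9, 0.1, 0.8  # skunk->smelly strength, background, sewer->smelly strength

def p_smelly(skunk, sewer):
    # noisy-OR combination of the skunk mechanism, the sewer, and background causes
    return 1 - (1 - b) * (1 - m) ** skunk * (1 - w) ** sewer

def p_skunk_given_smelly(sewer):
    like_skunk = prior_skunk * p_smelly(1, sewer)
    like_other = (1 - prior_skunk) * p_smelly(0, sewer)
    return like_skunk / (like_skunk + like_other)

print(p_skunk_given_smelly(sewer=0))  # ~0.80: smelliness is diagnostic of skunks
print(p_skunk_given_smelly(sewer=1))  # ~0.34: the sewer explains the smell away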

Figure 20.9 The causal model assumed to be operative during Oppenheimer et al.'s (2013) categorization experiment.

Diagnostic reasoning for the purpose of categorization is exhibited by young children. For example, using the "blicket detector" paradigm described earlier, Gopnik, Sobel, and colleagues have shown that children can reason causally from observed evidence to an unobserved feature that determines category membership (Gopnik & Sobel, 2000; Gopnik et al., 2004; Sobel & Buchanan, 2009; Sobel et al., 2004; Sobel & Kirkham, 2006; Sobel, Yoachim, Gopnik, Meltzoff, & Blumenthal, 2007). In these studies, children are told that a blicket detector activates (i.e., plays music) whenever blickets are placed on it. In a backward blocking condition tested in Sobel et al. (2004, Experiment 1), two blocks (A and B) were twice placed on the machine, causing it to activate, followed by a third trial in which A alone caused activation. On a subsequent test, 4-year-olds judged that B was unlikely to be a blicket. This result obtained despite the fact that the machine had been observed to activate whenever B was on top. In contrast, B was judged to be a blicket (with probability 1) in an indirect screening off condition that was identical except that the machine didn't activate on the final A trial. Apparently, children in the backward blocking condition engaged in a form of explaining away in which the trial in which A alone activated the machine was sufficient to discount evidence that B was a blicket. Sobel and Kirkham (2006) reached similar conclusions for 24-month-olds (and 8-month-olds using anticipatory eye movements as a dependent measure). Kushnir and Gopnik (2005) have shown that children can also infer category membership via another type of causal reasoning, namely, on the basis of interventions (in which the subject rather than the experimenter places blocks on the machine).

Kim and Keil (2003) found evidence of diagnostic reasoning in yet another paradigm. They tested both artificial and real-world diseases with symptoms that were causally related, as shown in Figure 20.10. Subjects were presented with two hypothetical patients, each with two symptoms that were leaf nodes in Figure 20.10, and were asked which was more likely to have the disease. For one patient, the two symptoms were from the same causal chain (e.g., AX and AY). In the other, they were from different chains (e.g., AX and BX). Subjects judged that the latter patient was more likely to have the disease. This result, which Kim and Keil dubbed the causal diversity effect, is sensible because two symptoms from the same chain provide redundant information (e.g., AX already implies A, and so relatively little additional information is provided by AY).10 Kim, Yopchick, and de Kwaadsteniet (2008) showed that the diversity effect extends to an information search paradigm in which subjects choose which sources of information they find most useful for making an accurate diagnosis.
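The diversity effect can be checked by enumeration on the Figure 20.10 topology. The sketch below is hypothetical (made-up prior and link strengths, with every link noisy-OR); it simply confirms that two symptoms from different chains support D more than two from the same chain:

# Hypothetical enumeration over the network D -> A -> {AX, AY}, D -> B -> {BX, BY}
# with made-up parameters; every link is noisy-OR with strength m and background b.
from itertools import product

prior_d, m, b = 0.3, 0.8, 0.1

def p(child_on, parent_on):
    on = 1 - (1 - b) * (1 - m) ** parent_on  # P(child = 1 | parent)
    return on if child_on else 1 - on

def posterior_d(second_symptom_same_chain):
    num = den = 0.0
    for d, a, b_node in product([0, 1], repeat=3):
        joint = (prior_d if d else 1 - prior_d) * p(a, d) * p(b_node, d)
        joint *= p(1, a)  # symptom AX is present
        joint *= p(1, a) if second_symptom_same_chain else p(1, b_node)  # AY or BX
        num += joint * d
        den += joint
    return num / den

print(posterior_d(True))   # P(D | AX, AY) ~ .76: same chain, redundant evidence
print(posterior_d(False))  # P(D | AX, BX) ~ .87: different chains, stronger support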

Figure 20.10 Schematic representation of the causal structure of the diseases (D) tested by Kim and Keil (2003).

Relationship to Essentialism

The notion of diagnostic reasoning during categorization is intimately related to essentialism, the view that many kinds are defined by an underlying property or characteristic (an essence) that is shared by all category members and by members of no other categories (Gelman, 2003; Keil, 1989; Medin & Ortony, 1989; Rehder, 2007; Rips, 1989, 2001) and that is presumed to generate, or cause, perceptual features. Thus, the causal models of many categories are likely to include underlying causes that people assume produce a category's observable features, which in turn are used to infer the underlying causes.


Research with natural categories supports the view that people reason diagnostically to essential properties. In a replication of Rips's (1989) well-known transformation experiments, Hampton, Estes, and Simmons (2007) found that whether a transformed animal (e.g., a bird that comes to look like an insect due to exposure to hazardous chemicals) was judged to have changed category membership often depended on what participants inferred about underlying causal processes and structures. As in Rips's study, a (small) majority of subjects in Hampton et al. judged the transformed animal to still be a bird, whereas a (large) minority judged that it was now an insect. But although the judgments of the latter group (dubbed the phenomenalists by Hampton et al.) would seem to be based on the animals' appearance, the justifications they provided for their choices indicated that many used the animals' new properties to infer deeper changes. For example, subjects assumed that a giraffe that lost its long neck also exhibited new behaviors that were driven by internal changes (e.g., to its nervous system) that in turn signaled a change in category membership (to a camel). Conversely, subjects who judged that the transformed animal's category was unchanged (the essentialists) often appealed to the fact that it produced offspring from its original category, from which they inferred the absence of important internal changes (e.g., to the animal's DNA). That is, both groups used observable features to infer the state of internal causal structures and decided category membership on that basis.

Classification as Prospective (Forward) Reasoning The notion of explicit causal reasoning in the service of classification also allows for for­ ward, or prospective, reasoning. For example, a physician may suspect the presence of HIV given the presence of the forms of sarcoma, lymphoma, and pneumonia that HIV is known to produce (diagnostic reasoning). But the case for HIV is made stronger still by

Page 37 of 51

Concepts as Causal Models: Categorization the presence of one or more of its known causes, such as blood transfusions, sharing of intravenous needles, or unsafe sex. For compelling evidence of prospective causal reasoning, I return to Puebla and Chaigneau’s (2014) study of artifact categorization first reviewed in the section “Interact­ ing Features: The Coherence Effect.” Recall that test items in their Experiment 1 varied on the function historically intended by the artifact’s designer (H), physical structure (P), and functional outcome (O). The full HIPE theory of artifacts (Chaigneau et al., 2004) stipulates the two additional variables shown in Figure 20.11 a (an agent with the rele­ vant goal or intention [G] who then acts [A] toward the artifact in a manner appropriate to yield the outcome) and in fact test items varied on all five dimensions (H, P, O, G, and A). Regression analyses of (the log of) the categorization ratings, shown in Figure 20.11 a, revealed that outcome (O) had the largest influence on ratings, with H a distant second and P an even more distant third (G and A had no effect on ratings). This result is theoret­ ically interesting in its own right because it contrasts with other claims that artifact cate­ gory membership is largely determined by P (Malt, 1994; Malt & Johnson, 1992) or H; in­ deed, according to some, H plays the role of an essential (Bloom, 1996, 1998; Gelman & (p. 369)

Bloom, 2000; Keil, 1995; Kelemen & Carey, 2007; Matan & Carey, 2001; Rips, 1989). Nev­ ertheless, recall that rather than using traditional labels, Puebla and Chaigneau’s catego­ ry membership question restated the outcome (e.g., asked if something is a “fish catch­ er”) and in this light it may be unsurprising that O’s regression weight dominated. In­ deed, the significant weight on H under these circumstances stands as impressive evi­ dence for the role of a designer’s intentions in artifact categorization. It also bolsters the claim, made earlier in this chapter, that Figure 20.5 b, in which H is an intrinsic feature, is a more viable view of people’s default causal model for artifacts as compared to one in which it is not (Figure 20.5 c).


Figure 20.11 Feature regression weights from Puebla and Chaigneau (2014). (a) Experiment 1. (b) Experiment 2. Error bars are standard errors of the mean.


Importantly, evidence for prospective reasoning comes from Puebla and Chaigneau's Experiment 2, which was identical except that information about the functional outcome (O) was omitted from the test items. Regression weights (Figure 20.11 b) revealed a new pattern in which physical structure (P) dominated. Their Experiment 3 revealed a similar pattern when the presence of information about O was manipulated as a within-subjects variable. A causal reasoning perspective provides an intuitive account of these results, assuming that some subjects in these experiments reasoned with the causal model in Figure 20.5 d, in which O served as a defining feature of the category. When information about outcome O is provided, P plays a minor role in determining category membership. But when it is not provided, P plays a major role by virtue of allowing categorizers to infer the state of O. For these subjects, H provides no evidence of category membership because the presence of P blocks, or "screens off," the flow of information between H and O.11 Apart from this empirical contribution, the method introduced by Puebla and Chaigneau is innovative in that it allows an independent assessment of the inferential and intrinsic evidence for category membership provided by a feature.
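The screening-off claim is easy to verify on the chain H → P → O. The following sketch uses made-up noisy-OR parameters (not Puebla and Chaigneau's fitted values); it shows H carrying information about O only while P is unobserved:

# Hypothetical screening-off demo on the chain H -> P -> O (made-up noisy-OR
# parameters, not fitted values from the cited study).
m, b = 0.8, 0.1   # assumed link strength and background strength for both links

def link(parent_on):
    return 1 - (1 - b) * (1 - m) ** parent_on   # P(child = 1 | parent)

def p_o(h, p=None):
    if p is None:            # P unobserved: sum it out, so H still matters
        p_p = link(h)
        return p_p * link(1) + (1 - p_p) * link(0)
    return link(p)           # P observed: O depends on P alone; H is screened off

print(p_o(h=1), p_o(h=0))            # differ: H is informative when P is unknown
print(p_o(h=1, p=1), p_o(h=0, p=1))  # identical: P screens H off from O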

Summary of Causal-Based Categorization

This chapter has reviewed the numerous ways that causal knowledge influences how objects are classified. Three key phenomena were reviewed and were shown to be explicable under a causal model account. First, causal relations induce strong coherence effects, that is, interactive effects among features such that objects with more coherent features are judged better category members. A causal model predicts coherence effects because the degree to which a set of features is coherent (corroborates the causal relations, i.e., has few cases in which a cause is present and its effect absent or vice versa) is encoded in the joint probability distribution derived from the model. Second, features considered individually grow in importance to the extent that their incoming causal links are more numerous, stronger, and combine independently rather than conjunctively. It was shown how these findings reflect the features' marginal probability in the corresponding causal model. Features (e.g., those that serve a purpose or are "functional") may also be important when they are involved in causal interactions that extend over time and that ensure their prevalence in the past, present, and future. Finally, it was shown that features also provide support for category membership inferentially, that is, through the other features they imply. This evidence took the form of explaining away for diagnostic reasoning and screening off for prospective reasoning, effects that are each predicted by the conditional probabilities implied by a causal model.

Causal models were contrasted with an alternative view of how causal relations affect categorization, namely, the dependency model. The dependency model not only incorrectly predicts that features grow in importance to the extent that their outgoing causal links are more numerous and stronger (and predicts no difference for independent versus conjunctive causes), but also provides no account of coherence effects. Because it conceives of causal knowledge as only changing the weight (centrality) of individual features, the dependency model is a type of independent cue (or prototype) model and so is unable to account for the large feature interactions induced by causal knowledge. It should be noted that whereas this counter-evidence to the dependency model largely comes from tests of artificial categories, much of the model's original support came from studies of natural categories (e.g., Kim & Ahn, 2002a, 2002b; Sloman et al., 1998). Thus, the door remains open to the possibility that its central insight, that features grow in importance as more features depend on them, may yet prove to have merit.

This chapter emphasized the distinction between intrinsic and inferential features in causal-based categorization. The intrinsic/inferential distinction was important to determining when a feature decreases the likelihood of category membership (via explaining away) or increases it (via coherence). And I proposed a boundary condition on when coherence effects arise; namely, they do so when the underlying causal mechanisms implied by a coherent set of features are themselves viewed as intrinsic to the category. To return to beavers as an example, although their inter-feature relations might include how gnawing at trees enables the building of dams, which sometimes causes flooding, which sometimes leads to higher insurance rates, the coherence between flooding and insurance says nothing about whether the dimly viewed creatures in the woods are beavers. Future work should strive to clarify how these two sorts of causal influences affect category membership decisions. For example, the intrinsic/inferential distinction may instead be a continuum in which a feature can provide category membership evidence both directly and indirectly by virtue of other features it implies (see Rehder, 2007). Further, Oppenheimer et al. (2013) emphasize that classifiers can incorporate context-specific variables into their causal representations of situations, which are then used to help decide category membership (Murphy & Medin, 1985). For example, the property "waded through a sewer" is not part of our causal model of skunks, and its potential for providing an alternative explanation for an animal's smelliness is unlikely to be explicitly represented as part of people's semantic knowledge. The question of exactly how causal models of categories and such context-specific information become integrated remains an important open question.

References Ahn, W. (1998). Why are different features central for natural kinds and artifacts? The role of causal status in determining feature centrality. Cognition, 69, 135–178.

Page 41 of 51

Concepts as Causal Models: Categorization Ahn, W., Gelman, S. A., Amsterdam, A., Hohenstein, J., & Kalish, C. W. (2000b). Causal sta­ tus effect in children’s categorization. Cognition, 76, B35–B43. Ahn, W., & Kim, N. S. (2001). The causal status effect in categorization: An overview. In D. L. Medin (Ed.), The psychology of learning and motivation (Vol. 40, pp. 23–65). San Diego, CA: Academic Press. Ahn, W., Kim, N. S., Lassaline, M. E., & Dennis, M. J. (2000a). Causal status as a determi­ nant of feature centrality. Cognitive Psychology, 41, 361–416. Ahn, W., Levin, S., & Marsh, J. K. (2005). Determinants of feature centrality in clinicians’ concepts of mental disorders, Proceedings of the 25th annual conference of the Cognitive Science Society. Mahwah, NJ: Lawrence Erlbaum Associates. Ahn, W., Marsh, J. K., Luhmann, C. C., & Lee, K. (2002). Effect of theory based correla­ tions on typicality judgments. Memory & Cognition, 30, 107–118. Ali, N., Chater, N., & Oaksford, M. (2011). The mental representation of causal condition­ al reasoning: Mental models or causal models. Cognition, 119, 403–418. Barrett, S. E., Abdi, H., Murphy, G. L., & Gallagher, J. M. (1993). Theory-based correla­ tions and their role in children’s concepts. Child Development, 64, 1595–1616. Barton, M. E., & Komatsu, L. K. (1989). Defining features of natural kinds and artifacts. Journal of Psycholinguistic Research, 18, 433–447. Bloom, P. (1996). Intention, history, and artifact concepts. Cognition, 60, 1–29. Bloom, P. (1998). Theories of artifact categorization. Cognition, 66, 87–93. Booth, A. (2008). The cause of infant categorization. Cognition, 106, 984–993. Booth, A. E., & Waxman, S. R. (2002). Object names and object functions serve as cues to categories for infants. Developmental Psychology, 38, 948–957. Carey, S. (1985). Conceptual change in childhood. Cambridge, MA: MIT Press. Chaigneau, S. E., Barsalou, L. W., & Sloman, S. A. (2004). Assessing the causal structure of function. Journal of Experimental Psychology: General, 133, 601–625. Cheng, P. (1997). From covariation to causation: A causal power theory. Psychological Re­ view, 104, 367–405. Danks, D. (2014). Unifying the mind: Cognitive representations as graphical models: Cam­ bridge, MA: MIT Press. DiYanni, C. & Kelemen, D. (2005). Time to get a new mountain? The role of function in children’s conceptions of natural kinds. Cognition, 97, 327–335.

Page 42 of 51

Concepts as Causal Models: Categorization Fischoff, B., Slovic, P., & Lichtenstein, S. (1978). Fault trees: Sensitivity of estimated fail­ ure probabilities to problem representation. Journal of Experimental Psychology: Human Perception and Performance, 4, 330–344. Gelman, S. A. (2003). The essential child: The origins of essentialism in everyday thought. New York: Oxford University Press. Gelman, S. A., & Bloom, P. (2000). Young children are sensitive to how an object was cre­ ates when deciding what to name it. Cognition, 76, 91–103. Gelman, S. A., & Wellman, H. M. (1991). Insides and essences: Early understandings of the nonobvious. Cognition, 38, 213–244. Glymour, C. (2001). The mind’s arrows: Bayes nets and graphical causal models in psychology. Cambridge, MA: MIT Press. (p. 373)

Gopnik, A., Glymour, C., Sobel, D. M., Schulz, L. E., & Kushnir, T. (2004). A theory of causal learning in children: Causal maps and Bayes nets. Psychological Review, 111, 3– 23. Gopnik, A., & Sobel, D. M. (2000). Detecting blickets: How young children use informa­ tion about novel causal powers in categorization and induction. Child Development, 71, 1205–1222. Hampton, J. A. (1979). Polymorphous concepts in semantic memory. Journal of Verbal Learning and Verbal Behavior, 18, 441–461. Hampton, J. A. (1995). Testing the prototype theory of concepts. Journal of Memory and Language, 34, 686–708. Hampton, J. A., & Simmons, C. L. (2003, November). Feature independence in natural cat­ egories. Poster presented at the 44th Annual Meeting of the Psychonomic Society, Van­ couver. Hampton, J. A., Estes, Z., & Simmons, S. (2007). Metamorphosis: Essence, appearance, and behavior in the categorization of natural kinds. Memory & Cognition, 35, 1785–1800. Hampton, J. A., Storms, G., Simmons, C. L., & Heussen, D. (2009). Feature integration in natural language concepts. Memory & Cognition, 37, 1150–1163. Hayes, B. K., & Rehder, B. (2012). Causal categorization in children and adults. Cognitive Science, 36, 1102–1128. Johnson, S. C., & Solomon, G. E. A. (1997). Why dogs have puppies and cates have kit­ tens: The role of birth in young children’s understanding of biological origins. Child De­ velopment, 68, 404–419. Jones, E. E., & Harris, V. A. (1967). The attribution of attitudes. Journal of Experimental Social Psychology, 3, 1–24. Page 43 of 51

Concepts as Causal Models: Categorization Keil, F. C. (1989). Concepts, kinds, and cognitive development. Cambridge, MA: MIT Press. Keil, F. C. (1995). The growth of causal understandings of natural kinds. In D. Sperber, D. Premack & A. J. Premack (Eds.), Causal cognition: A multidisciplinary approach (pp. 234– 262). Oxford: Clarendon Press. Kelemen, D., & Carey, S. (2007). The essence of artifacts: Developing the design stance. In S. Laurence & E. Margolis (Eds.), Creations of the mind: Theories of artifacts and their representation. Oxford: Oxford University Press. Kelley, H. H. (1973). The process of causal attribution. American Psychologist, 28, 107– 128. Kemler-Nelson, D. G., et al. (1995). Principle-based inferences in young children’s catego­ rization: Revisiting the impact of function on the naming of artifacts. Cognitive Develop­ ment, 10, 347–380. Kemp, C., & Tenenbaum, J. B. (2009). Structured statistical models of inductive reason­ ing. Psychological Review, 116, 20–58. Kim, N. S., & Ahn, W. (2002a). Clinical psychologists’ theory-based representation of men­ tal disorders affect their diagnostic reasoning and memory. Journal of Experimental Psy­ chology: General, 131, 451–476. Kim, N. S., & Ahn, W. (2002b). The influence of naive causal theories on lay concepts of mental illness. American Journal of Psychology, 115, 33–65. Kim, N. S., & Keil, F. C. (2003). From symptoms to causes: Diversity effects in diagnostic reasoning. Memory & Cognition, 31, 155–165. Kim, N. S., Luhmann, C. C., Pierce, M. L., & Ryan, M. M. (2009). Causal cycles in catego­ rization. Memory & Cognition, 37, 744–758. Kim, N. S., Yopchick, J. E., & de Kwaadsteniet, L. (2008). Causal diversity effects in infor­ mation seeking. Psychonomic Bulletin & Review, 15, 81–88. Komatsu, L. K. (1992). Recent views of conceptual structure. Psychological Bulletin, 112, 500–526. Kushnir, T., & Gopnik, A. (2005). Young children infer causal strength from probabilities and interventions. Psychological Science, 16, 678–683. Lamberts, K. (1995). Categorization under time pressure. Journal of Experimental Psy­ chology: General, 124, 161–180. Lamberts, K. (1998). The time course of categorization. Journal of Experimental Psycholo­ gy: Learning, Memory, and Cognition, 24, 695–711. Page 44 of 51

Concepts as Causal Models: Categorization Landau, B., Smith, L. B., & Jones, S. S. (1988). The importance of shape in early lexical learning. Cognitive Development, 3, 299–321. Lin, E. L., & Murphy, G. L. (1997). The effects of background knowledge on object catego­ rization and part detection. Journal of Experimental Psychology: Human Perception and Performance, 23, 1153–1163. Lombrozo, T. (2009). Explanation and categorization: How “why?” informs “what?.” Cog­ nition, 110, 248–253. Lombrozo, T., & Rehder, B. (2012). The role of functional features in biological kind con­ cepts. Cognitive Psychology, 65, 457–485. Lucas, C. G., & Griffiths, T. L. (2010). Learning the form of causal relationships using hier­ archical Bayesian models. Cognitive Science, 34, 113–147. Luhmann, C. C., Ahn, W., & Palmeri, T. J. (2006). Theory-based categorization under speeded conditions. Memory & Cognition, 34, 1102–1111. Malt, B. C. (1994). Water is not H2O. Cognitive Psychology, 27, 41–70. Malt, B. C., & Johnson, E. C. (1992). Do artifacts have cores? Journal of Memory and Lan­ guage, 31, 195–217. Malt, B. C., & Smith, E. E. (1984). Correlated properties in natural categories. Journal of Verbal Learning and Verbal Behavior, 23, 250–269. Martin, J. B., & Rehder, B. (2013). Causal knowledge and information search during cate­ gorization. Paper presented at the 46th Annual Meeting of the Society for Mathematical Psychology. Potsdam, Germany. Matan, A., & Carey, S. (2001). Developmental changes within the core of artifact con­ cepts. Cognition, 78, 1–26. McClure, J. (1998). Discounting causes of behavior: Are two reasons better than one? Journal of Personality and Social Psychology, 74, 7–20. Medin, D. L., & Ortony, A. (1989). Psychological essentialism. In S. Vosniadou & A. Ortony (Eds.), Similarity and analogical reasoning (pp. 179–196). Cambridge,UK: Cambridge Uni­ versity Press. Medin, D. L., & Schaffer, M. M. (1978). Context theory of classification learning. Psycho­ logical Review, 85, 207–238. Meunier, B., & Cordier, F. (2009). The biological categorizations made by 4 and 5-year olds: The role of feature type versus their causal status. Cognitive Development, 24, 34– 48.

Page 45 of 51

Concepts as Causal Models: Categorization Morris, M. W., & Larrick, R. P. (1995). When one cause casts doubt on another: A norma­ tive analysis of discounting in causal attribution. Psychological Review, 102, 331–355. Murphy, G. L. (1993). Theories and concept formation. In I. V. Mechelen, J. Hampton, R. Michalski & P. Theuns (Eds.), Categories and concepts: Theoretical views and inductive data analysis (pp. 173–200). London: Academic Press. (p. 374)

Murphy, G. L. (2002). The big book of concepts: MIT Press.

Murphy, G. L., & Medin, D. L. (1985). The role of theories in conceptual coherence. Psy­ chological Review, 92, 289–316. Murphy, G. L., & Wisniewski, E. J. (1989). Feature correlations in conceptual representa­ tions. In G. Tiberchien (Ed.), Advances in cognitive science: Vol. 2. Theory and applica­ tions (pp. 23–45). Chichester, England: Ellis Horwood. Nazzi, T., & Gopnik, A. (2000). A shift in children’s use of perceptual and causal cues to categorization. Developmental Science, 3, 389–396. Newman, G. E., Herrmann, P., Wynn, K., & Keil, F. C. (2008). Biases towards internal fea­ tures in infants’ reasoning about objects. Cognition, 107, 420–432. Newman, G. E., & Keil, F. C. (2008). Where is the essence? Developmental shifts in children’s beliefs about internal features. Child Development, 79, 1344–1356. Oakes, L. M., & Madole, K. L. (2008). Function revisited: How infants construe functional features in their representation of objects. In R. Kail (Ed.). Advances in child development and behavior (Vol. 36, pp. 135–185). San Diego: Elsevier. Opfer, J. E., & Bulloch, M. J. (2007). Causal relations drive young children’s induction, naming, and categorization. Cognition, 105, 206–217. Oppenheimer, D. M., Tenenbaum, J. B., & Krynski, T. R. (2013). Categorization as causal explanation: Discounting and augmenting in a Bayesian framework. Psychology of Learn­ ing and Motivation, 58, 203–231. doi:http://dx.doi.org/10.1016/ B978-0-12-407237-4.00006-2 Palmeri, T. J., & Blalock, C. (2000). The role of background knowledge in speeded percep­ tual categorization. Cognition, 77, B45–B47. Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible infer­ ence. San Mateo, CA: Morgan Kaufman. Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge, UK: Cambridge University Press. Puebla, G., & Chaigneau, S. E. (2014). Inference and coherence in causal-based artifcat categorization. Cognition, 130, 50–65. Page 46 of 51

Concepts as Causal Models: Categorization Reed, S. K. (1972). Pattern recognition and categorization. Cognitive Psychology, 3, 383– 407. Rehder, B. (2003a). Categorization as causal reasoning. Cognitive Science, 27, 709–748. Rehder, B. (2003b). A causal-model theory of conceptual representation and categoriza­ tion. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 1141– 1159. Rehder, B. (2007). Essentialism as a generative theory of classification. In A. Gopnik & L. Schultz (Eds.), Causal learning: Psychology, philosophy, and computation (pp. 190–207). Oxford: Oxford University Press. Rehder, B. (2010). Causal-based classification: A review. In B. Ross (Ed.), The psychology of learning and motivation (Vol. 52, pp. 39–116). San Diego, CA: Elsevier Academic Press. Rehder, B. (2014). The role of functional form in causal-based categorization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 41, 670–692. Rehder, B. & Hastie, R. (2001). Causal knowledge and categories: The effects of causal beliefs on categorization, induction, and similarity. Journal of Experimental Psychology: General, 130, 323–360. Rehder, B. & Kim, S. (2006). How causal knowledge affects classification: A generative theory of categorization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 659–683. Rehder, B. & Kim, S. (2009). Classification as diagnostic reasoning. Memory & Cognition, 37, 715–729. Rehder, B. & Kim, S. (2010). Causal status and coherence in causal-based categorization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 36, 1171–1206. Rehder, B., & Murphy, G. L. (2003). A Knowledge-Resonance (KRES) model of category learning. Psychonomic Bulletin & Review, 10, 759–784. Rehder, B. & Ross, B. H. (2001). Abstract coherent concepts. Journal of Experimental Psy­ chology: Learning, Memory, and Cognition, 27, 1261–1275. Rips, L. J. (1989). Similarity, typicality, and categorization. In S. Vosniadou & A. Ortony (Eds.), Similarity and analogical reasoning (pp. 21–59). New York: Cambridge University Press. Rips, L. J. (2001). Necessity and natural categories. Psychological Bulletin, 127, 827–852. Rogers, T. T., & McClelland, J. L. (2004). Semantic cognition: A parallel distributed pro­ cessing approach. Cambridge, MA: MIT Press.

Page 47 of 51

Concepts as Causal Models: Categorization Rogers, T. T., & McClelland, J. L. (2011). Semantics without categorization. In E. M. Pothos & A. J. Wills (Eds.), Formal approaches in categorization (pp. 88–119). New York: Cambridge University Press. Ross, B. H. (1997). The use of categories affects classification. Journal of Memory and Language, 37, 240–267. Ross, B. H. (1999). Postclassification category use: The effects of learning to use cate­ gories after learning to classify. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 743–757. Ross, B. H. (2000). The effects of category use on learned categories. Memory & Cogni­ tion, 28, 51–63. Rosch, E. H., & Mervis, C. B. (1975). Family resemblance: Studies in the internal struc­ ture of categories. Cognitive Psychology, 7, 573–605. Sloman, S. A. (2005). Causal models: How people think about the world and its alterna­ tives. Oxford: Oxford University Press. Sloman, S. A., Love, B. C., & Ahn, W. (1998). Feature centrality and conceptual coher­ ence. Cognitive Science, 22, 189–228. Smith, E. E., & Medin, D. L. (1981). Categories and concepts. Cambridge, MA: Harvard University Press. Sobel, D. M., & Buchanan, D. W. (2009). Bridging the gap: Causality at a distance in children’s categorization and inferences about internal properties. Cognitive Develop­ ment, 24, 274–283. Sobel, D. M., & Kirkham, N. Z. (2006). Blickets and babies: The development of causal reasoning in toddlers and infants. Developmental Psychology, 42, 1103–1115. Sobel, D. M., Tenenbaum, J. B., & Gopnik, A. (2004). Children’s causal inferences from in­ direct evidence: Backwards blocking and Bayesian reasoning in preschoolers. Cognitive Science, 28, 303–333. Sobel, D. M., Yoachim, C. M., Gopnik, A., Meltzoff, A. N., & Blumenthal, E. J. (2007). The blicket within: Preschoolers’ inferences about insides and causes. Journal of Cognition and Development, 8, 159–182. Spirtes, P., Glymour, C., & Scheines, R. (2000). Causation, prediction, and search. New York: Springer-Verlag. (p. 375)

Tversky, A., & Koehler, D. J. (1994). Support theory: A nonextensional representa­

tion of subjective probability. Psychological Review, 101, 343–357. Walker, C. M., Lombrozo, T., Legare, C. H., & Gopnik, A. (2014). Explaining prompts chil­ drent to privilege inductively rich properties. Cognition, 133, 420–432. Page 48 of 51

Concepts as Causal Models: Categorization Wellman, H. M., & Gelman, S. A. (1992). Cognitive development: Foundational theories of core domains. Annual Review of Psychology, 43, 337–375. Wisniewski, E. J. (1995). Prior knowledge and functionally relevant features in concept learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 449– 468. (p. 376)

Notes:

(1.) Equations 1 and 2 reasonably apply in situations in which subjects rate an object's degree of category membership with respect to a single category, which was the case in Rehder and Hastie (2001) and the large majority of studies reviewed in this section. When there are multiple categories to which an object o may belong, Equation 1 can be entered into Bayes's rule,

p(oc = k | oF) = p(oF | oc = k) p(oc = k) / Σk′∈K p(oF | oc = k′) p(oc = k′),

where oc is o's category label, oF are its features, and K is the set of potential categories.

(2.) In these fits the strengths of the three causal links were constrained to be equal (and estimated by a single parameter m), as were the strengths of the alternative causes of the effects (b). The probabilities of the resulting joint distribution were exponentiated by a parameter γ (to account for non-linear usage of the response scale) and then multiplied by 100 (to map the result onto the range of responses). The best fitting parameters were cX = .95, m = .76, b = .44, and γ = .17.

(3.) Whereas Equation 6 defines a conjunctive causal mechanism as one that is only effective when both causes are present, Novick and Cheng's (2005) framework for learning conjunctive causes allows each cause to also have an independent influence on the effect X. Analogously, the animal learning literature includes models in which cues can have both an "elemental" (i.e., independent) and a configural (interactive) effect on the outcome (Gluck & Bower, 1988; Pearce, 1987, 1994, 2002; Wagner & Rescorla, 1972). Rehder (2014) presents an alternative version of Equation 6 in which the Ys in Figure 20.2 b have both independent and conjunctive causal influences.

(4.) For example, for the church/art gallery pair, a test item with two features of churches and one of art galleries was "A large building with stained glass windows, and a steeple with a cross on the top, which looks just like a church. It was originally built just to be an exhibition hall for displaying large works of art, and had that function in the past. It is presently occasionally used for Christian services, and has no other function."

(5.) Figure 20.6 b is an example of a functional model in which the probabilistic operation of a causal relation is instead represented as a property of the enabling condition for a deterministic causal relation (Kemp, Shafto, & Tenenbaum, 2012; Pearl, 2000). In fact, the parameters of the networks in Figure 20.6 b can be derived from those in Figure 20.6 a so as to yield the same joint distribution (e.g., the joint distribution that obtains when the network in Figure 20.6 a is instantiated with a given set of parameters is identical to the one obtained when the network in Figure 20.6 b is instantiated with the same parameters and then integrated over MWF).

(6.) Note that whereas Figure 20.7 reveals that both theories predict a decelerating pattern of decreasing feature weights, the empirical results exhibit an accelerating pattern instead. This result is likely due to an overestimation of the importance of Y attributable to the use of the missing feature method. Whereas the item missing only X (XYZ = (0,1,1)) and the one missing only Z (1,1,0) involve one violation of the expected inter-feature correlations, the item missing only Y (1,0,1) involves two. That is, the ratings in Figure 20.7 likely reflect coherence effects in addition to the importance of individual features. See Rehder (2010), pp. 44–49, for additional discussion and examples.

(7.) The decreasing feature weights (i.e., a causal status effect) for the 1-1-1 network in Figure 20.8 a but not 20.8 b is due to the fact that they were constructed from different physical features. Categories in these experiments consisted of five feature dimensions and the features of to-be-classified test items were listed, top to bottom, on the computer screen. This technique usually yields a greater weight on the first feature dimension. In Figure 20.8 a, feature X was always instantiated by the first feature, whereas in Figure 20.8 b it was instantiated by either the first, second, or third (depending on counterbalancing condition). Feature position thus contributes more to feature X in Figure 20.8 a than it does in Figure 20.8 b.

(8.) Whereas Hayes and Rehder tested 5- and 6-year-olds, Ahn et al. and Meunier and Cordier tested 7- and 9-year-olds. It is possible, of course, that children at the older age might exhibit a causal status effect.

(9.) This result only follows if wading through a sewer is represented as an inferential feature. Representing it as part of the causal model of skunks instead leads to two non-intuitive results. First, an animal that has both waded through a sewer and is smelly will be considered more likely to be a skunk (because of the coherence between sewer and smelliness). Second, the extra potential cause of smelliness will now make it more diagnostic of skunks (because of the increase in smelliness's within-category marginal probability brought about by wading through a sewer).

(10.) This result holds for most but not all of the parameterizations of the network in Figure 20.10. In particular, it doesn't hold for very weak causal links, in which case redundant information leads to stronger inferences to D.

(11.) A later experiment suggested that subjects sometimes reason not only from P to O but from O to P, suggesting that for some subjects P may have served as the artifact's defining feature. (No evidence for reasoning between H and P was found, however.)

Bob Rehder
Department of Psychology, New York University, New York, New York, USA

Concepts as Causal Models: Induction

Concepts as Causal Models: Induction
Bob Rehder
The Oxford Handbook of Causal Reasoning
Edited by Michael R. Waldmann
Print Publication Date: Jun 2017
Subject: Psychology, Cognitive Psychology
Online Publication Date: May 2017
DOI: 10.1093/oxfordhb/9780199399550.013.21

Abstract and Keywords

This chapter evaluates the case for treating concepts as causal models, the view that people conceive of categories as consisting not only of features but also of the causal relations that link those features. In particular, it reviews the role of causal models in category-based induction. Category-based induction consists of drawing inferences about either objects or categories; in the latter case one generalizes a feature to a category (and thus its members). How causal knowledge influences how categories are formed in the first place—causal-based category discovery—is also examined. Whereas the causal model approach provides a generally compelling account of a large variety of inductive inferences, certain key discrepancies between the theory and empirical findings are highlighted. The chapter concludes with a discussion of the new sorts of representations, tasks, and tests that should be applied to the causal model approach to concepts.

Keywords: causal model, induction, category, inference, concept, empirical

Introduction

This chapter continues the argument—begun in the companion chapter, "Concepts as Causal Models: Categorization" (Chapter 20 in this volume)—that human concepts are represented as causal models that explicitly represent, for example, the causal relations between features of the categories. Categories with inter-feature causal links are commonplace. Birds nest in trees because they can fly, and fly because they have wings. Diseases don't just have symptoms but cause them (and in turn have causes themselves). Cars have features such as "burn gasoline" and "produce heat and carbon monoxide," but most of us have at least some rudimentary understanding that it is the burning of the gasoline that produces the heat and carbon monoxide. To treat a concept as a causal model is to treat it as a Bayesian network or causal graphical model (Glymour, 2001; Pearl, 1988, 2000; Sloman, 2005; Spirtes, Glymour, & Scheines, 2000), in which features are represented as nodes and inter-feature causal relations as directed arcs between those nodes. The companion chapter reviews how such causal models provide a framework in which to understand a wide variety of phenomena involving how objects are categorized, that is, how their features are taken as evidence about the category to which the object belongs.

This chapter presents the causal model account of category-based induction. Inductive inferences—inferences to uncertain conclusions—often involve propositions about categories. The effect of causal knowledge on three forms of induction is discussed. In one, referred to here as feature prediction, a reasoner is given an object's category label (and perhaps some of its features) and asked the likelihood that it has another feature (see the section "Causal-Based Feature Prediction"). In another, feature generalization, an object displays a novel (never before seen) feature, and the question is whether that feature should be included in the representation of the object's category (see the section "Causal-Based Feature Generalization"). Finally, in category discovery, one judges how objects should be grouped together to (p. 378) form categories in the first place (see the section "Causal-Based Category Discovery"). This chapter demonstrates how causal knowledge has profound influences on all three types of judgments.

The topics of the two companion chapters—classification and induction—represent two sides of the same coin. Once an object is classified, one may use that knowledge to infer the presence of features that cannot be observed directly. And, once one concludes that a feature of an object should be treated as a stable property of its category, that feature can be projected onto category members encountered in the future. Together, these inferences highlight the central function of categories, which is to serve as repositories of experience and knowledge that enable reasoners to reduce uncertainty about the objects and events they encounter (Anderson, 1991; Rosch, 1978; Smith & Medin, 1981).
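To make the representational claim concrete, the sketch below shows one minimal way a category's causal model could be encoded as a directed graph. This is an illustration only; the feature names and the `parents` encoding are mine (adapted from the chapter's "car" example), not code from the chapter, and any graph or Bayes-net library would serve equally well.

```python
# A minimal sketch of a concept represented as a causal graphical model:
# features are nodes, inter-feature causal relations are directed arcs.
# Feature names are illustrative (the chapter's "car" example).

parents = {
    "burns_gasoline": [],                  # root feature (no parents)
    "produces_heat": ["burns_gasoline"],   # caused by burning gasoline
    "produces_CO": ["burns_gasoline"],     # caused by burning gasoline
}

def is_acyclic(parents):
    """Check that the causal relations form a DAG, as a causal model requires."""
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return True
        if node in visiting:
            return False          # revisiting an open node means a causal cycle
        visiting.add(node)
        ok = all(visit(p) for p in parents[node])
        visiting.discard(node)
        done.add(node)
        return ok

    return all(visit(node) for node in parents)

print(is_acyclic(parents))  # True: a legal causal model
```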

Causal-Based Feature Prediction

Acts of causal-based feature prediction are ubiquitous. Recognizing an animal as a skunk leads one to back away in anticipation of a bad smell; identifying a can opener (and using it appropriately) leads one to expect an open can; diagnosing a patient's disease leads a doctor to anticipate symptoms that could appear in the future. This section begins with a presentation of the formal mechanism via which the causal model associated with an object's category is used to infer the object's unobserved features (also see, in this volume, Meder & Mayrhofer, Chapter 23; Oaksford & Chater, Chapter 19; Rottman, Chapter 6). I then present the basic empirical evidence that bears on those predictions.1 The final two subsections will ask whether two sorts of causal reasoning errors that have been documented in the literature also manifest themselves in causal-based feature predictions.

Feature Prediction as Causal-Based Conditional Probabilities

The causal model approach stipulates that feature inferences are based on the conditional probabilities implied by a category's causal model. That is, given an object whose category membership is known (k) along with some subset of its features (F), the probability that it also displays feature fi is pk(fi = 1 | F). (Throughout this chapter, "1" denotes that a feature is present and "0" denotes that it is absent.) Rottman and Hastie (2014) present a tutorial on how to draw such inferences on the basis of causal knowledge and review the many types of causal inferences that have been studied in the literature, including ones involving categories (also see Ali et al., 2011; Fernbach & Erb, 2013; Khemlani & Oppenheimer, 2010; Sloman & Lagnado, 2015). Here I demonstrate how conditional probabilities can be computed from the same joint distribution that was deemed responsible for category membership judgments in the companion chapter on categorization. That two judgment types (categorization and feature prediction) can be computed on the basis of the same underlying representation is an important advantage of the causal model approach.

Repeating the presentation from the companion chapter on categorization, the well-known causal Markov condition (which specifies the conditions under which variables are conditionally independent2) allows a model's joint distribution to be factorized as follows,

pk(f1, f2, …, fn) = ∏j pk(fj | pa(fj))    (1)

where f1, f2, …, fn are the features of category k and pa(fj) denotes the parents of fj in k's causal model. In other words, the Markov condition implies that the probability of any feature is a function of (only) its immediate parents. Consider the four networks shown in Tables 21.1, 21.2, 21.3, and 21.4, networks that were tested in many of the studies reviewed below. For the common cause network shown in Table 21.1, in which a feature referred to schematically as X is the common cause of effect features Y1 and Y2, pk(X, Y1, Y2) = pk(Y1 | X) pk(Y2 | X) pk(X). For the independent and conjunctive cause networks of Tables 21.2 and 21.3, pk(Y1, Y2, X) = pk(X | Y1, Y2) pk(Y1) pk(Y2). Finally, for the chain network in Table 21.4, pk(X, Y, Z) = pk(Z | Y) pk(Y | X) pk(X).

Deriving quantitative predictions for these networks requires that one make further assumptions regarding how effect features are functionally related to their causes. Assuming that causal links are generative (a cause makes its effects more likely) and independent (each causal link operates autonomously), and that multiple causal influences integrate according to a noisy-OR function (Cheng, 1997), then,

pk(fj = 1 | pa(fj)) = 1 − (1 − bj) ∏i (1 − mij)^ind(fi)    (2)

where mij is the strength of the causal link between feature j and parent i, bj is the effect of background (p. 379) causes (causal influences exogenous to the model) of feature j, and ind(fi) is an indicator function that yields 1 if feature fi is present and 0 otherwise. The noisy-OR function embodied by Equation 2 represents how a feature can be caused by one of its parents or some alternative cause. For example, for the common cause network of Table 21.1 the generating function implied by Equation 2 is

pk(Yi = 1 | X) = 1 − (1 − bYi)(1 − mXYi)^ind(X)    (3)

Table 21.1 Equations for Joint and Conditional Probabilities for a Common Cause Network with Two Effects

Common Cause Network (cX = .750, mXY1 = mXY2 = .667, bY1 = bY2 = .200)

Joint distribution: pk(X, Y1, Y2) = pk(Y1 | X) pk(Y2 | X) pk(X)

X  Y1  Y2  pk(X, Y1, Y2)
1  1   1   .403
1  1   0   .147
1  0   1   .147
0  1   1   .010
0  0   1   .040
0  1   0   .040
1  0   0   .053
0  0   0   .160

Conditional probabilities (each defined from the joint distribution):
pk(Y1 = 1 | Y2 = 0) = pk(Y1 = 1, Y2 = 0)/pk(Y2 = 0)
pk(Y1 = 1 | Y2 = 1) = pk(Y1 = 1, Y2 = 1)/pk(Y2 = 1)
pk(Y1 = 1 | X = 1, Y2 = 0) = pk(Y1 = 1, X = 1, Y2 = 0)/pk(X = 1, Y2 = 0)
pk(Y1 = 1 | X = 1) = pk(Y1 = 1, X = 1)/pk(X = 1)
pk(Y1 = 1 | X = 1, Y2 = 1) = pk(Y1 = 1, X = 1, Y2 = 1)/pk(X = 1, Y2 = 1)
pk(X = 1 | Y1 = 0, Y2 = 0) = pk(X = 1, Y1 = 0, Y2 = 0)/pk(Y1 = 0, Y2 = 0)
pk(X = 1 | Y1 = 1, Y2 = 0) = pk(X = 1, Y1 = 1, Y2 = 0)/pk(Y1 = 1, Y2 = 0)
pk(X = 1 | Y1 = 1, Y2 = 1) = pk(X = 1, Y1 = 1, Y2 = 1)/pk(Y1 = 1, Y2 = 1)
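The joint probabilities in Table 21.1 can be regenerated mechanically from Equations 1–3. The short sketch below does so, assuming the parameter values recoverable from the table (cX = .75, m = 2/3, b = .2 for both effects); it is a worked illustration, not code from the chapter.

```python
# Regenerate Table 21.1's joint distribution for the common cause network
# X -> Y1, X -> Y2 from Equations 1-3 (noisy-OR). Parameters are those
# recoverable from the table: c_X = .75, m = 2/3, b = .2 for both effects.
from itertools import product

c_X, m, b = 0.75, 2 / 3, 0.2

def p_effect(y, x):
    """Equation 3: P(Y_i = y | X = x) under a generative noisy-OR link."""
    p_present = 1 - (1 - b) * (1 - m) ** x
    return p_present if y == 1 else 1 - p_present

for x, y1, y2 in product((1, 0), repeat=3):
    p_x = c_X if x == 1 else 1 - c_X
    joint = p_effect(y1, x) * p_effect(y2, x) * p_x   # Equation 1 factorization
    print(f"X={x} Y1={y1} Y2={y2}  p={joint:.3f}")
# Matches Table 21.1, e.g., p(1,1,1) = .403 and p(0,0,0) = .160.
```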

When X is present, Equation 3 evaluates to 1 − (1 − bYi)(1 − mXYi), reflecting the probability that Yi is brought about by X (mXYi) or by an alternative cause not shown in the model (bYi). When X is absent, the probability of Yi is bYi. Note that root nodes (features with no parents) are assumed to be independent of one another (see note 2) and the probability of each is represented with its own parameter, cj. Thus, pk(X) = cX in Tables 21.1 and 21.4 and pk(Y1) = pk(Y2) = cY in Tables 21.2 and 21.3.

Equations 1–3 can be used to derive the equations that specify the full joint distributions for any causal network, and those for the four networks in Tables 21.1–21.4 are included in the tables. Two examples illustrate how a conditional probability judgment is then derived from a joint distribution. Both are based on Table 21.1's common cause network and assume that that network is instantiated with parameters cX = .750, mXY1 = mXY2 = .667, and bY1 = bY2 = .200. First consider pk(Y1 = 1 | Y2 = 1). By definition, pk(Y1 = 1 | Y2 = 1) = pk(Y1 = 1, Y2 = 1)/pk(Y2 = 1). Marginalizing the joint distribution in Table 21.1 (p. 380) over X yields pk(Y1 = 1, Y2 = 1) = .403 + .010 = .413; marginalizing over X and Y1 yields pk(Y2 = 1) = .403 + .147 + .010 + .040 = .600. Thus, pk(Y1 = 1 | Y2 = 1) = .413/.600 = .688.

Table 21.2 Equations for Joint and Conditional Probabilities for an Independent Cause Network with Two Causes

Independent Cause Network (cY1 = cY2 = .750, mY1X = mY2X = .667, bX = .200)

Joint distribution: pk(Y1, Y2, X) = pk(X | Y1, Y2) pk(Y1) pk(Y2)

Y1  Y2  X  pk(Y1, Y2, X)
1   1   1  .513
1   1   0  .050
1   0   1  .138
0   1   1  .138
0   0   1  .013
0   1   0  .050
1   0   0  .050
0   0   0  .050

Conditional probabilities (each defined from the joint distribution):
pk(Y1 = 1 | Y2 = 0) = pk(Y1 = 1, Y2 = 0)/pk(Y2 = 0)
pk(Y1 = 1 | Y2 = 1) = pk(Y1 = 1, Y2 = 1)/pk(Y2 = 1)
pk(Y1 = 1 | X = 1, Y2 = 0) = pk(Y1 = 1, X = 1, Y2 = 0)/pk(X = 1, Y2 = 0)
pk(Y1 = 1 | X = 1) = pk(Y1 = 1, X = 1)/pk(X = 1)
pk(Y1 = 1 | X = 1, Y2 = 1) = pk(Y1 = 1, X = 1, Y2 = 1)/pk(X = 1, Y2 = 1)
pk(X = 1 | Y1 = 0, Y2 = 0) = pk(X = 1, Y1 = 0, Y2 = 0)/pk(Y1 = 0, Y2 = 0)
pk(X = 1 | Y1 = 1, Y2 = 0) = pk(X = 1, Y1 = 1, Y2 = 0)/pk(Y1 = 1, Y2 = 0)
pk(X = 1 | Y1 = 1, Y2 = 1) = pk(X = 1, Y1 = 1, Y2 = 1)/pk(Y1 = 1, Y2 = 1)

Similarly, pk(Y1 = 1 | X = 1, Y2 = 1) is given by pk(Y1 = 1, X = 1, Y2 = 1)/pk(X = 1, Y2 = 1) = .403/(.403 + .147) = .733.

Tables 21.1–21.4 present the conditional probabilities for each of the four networks for a number of theoretically important types of causal inferences discussed below. Having shown how the probability of one category feature can be computed given the state of others on the basis of a causal model, I now turn to the question of whether human reasoners in fact predict features in this manner.

Basic Findings

A number of studies have investigated causal-based feature prediction using artificial materials (i.e., materials made up by the experimenters). The presentation of these studies is organized so as to address three questions. First, one can ask whether feature predictions are sensitive to the direction of causality or whether they are simply a function of (undirected) semantic relationships between features. Second, one can ask whether they are sensitive to the form of the generating function that relates causes to their effects. For example, two causes of a single effect might combine their influences disjunctively (i.e., independently, as in Equation 2) or conjunctively. Finally, one might ask whether they are sensitive to a causal network's (p. 381) parameterization, in particular to the strength of the causal relations.

Table 21.3 Equations for Joint and Conditional Probabilities for a Conjunctive Cause Network

Conjunctive Cause Network (cY1 = cY2 = .750, mY1Y2,X = .667, bX = .200)

Joint distribution: pk(Y1, Y2, X) = pk(X | Y1, Y2) pk(Y1) pk(Y2)

Y1  Y2  X  pk(Y1, Y2, X)
1   1   1  .413
1   1   0  .150
1   0   1  .038
0   1   1  .038
0   0   1  .013
0   1   0  .150
1   0   0  .150
0   0   0  .050

Conditional probabilities (each defined from the joint distribution):
pk(Y1 = 1 | Y2 = 0) = pk(Y1 = 1, Y2 = 0)/pk(Y2 = 0)
pk(Y1 = 1 | Y2 = 1) = pk(Y1 = 1, Y2 = 1)/pk(Y2 = 1)
pk(Y1 = 1 | X = 1, Y2 = 0) = pk(Y1 = 1, X = 1, Y2 = 0)/pk(X = 1, Y2 = 0)
pk(Y1 = 1 | X = 1) = pk(Y1 = 1, X = 1)/pk(X = 1)
pk(Y1 = 1 | X = 1, Y2 = 1) = pk(Y1 = 1, X = 1, Y2 = 1)/pk(X = 1, Y2 = 1)
pk(X = 1 | Y1 = 0, Y2 = 0) = pk(X = 1, Y1 = 0, Y2 = 0)/pk(Y1 = 0, Y2 = 0)
pk(X = 1 | Y1 = 1, Y2 = 0) = pk(X = 1, Y1 = 1, Y2 = 0)/pk(Y1 = 1, Y2 = 0)
pk(X = 1 | Y1 = 1, Y2 = 1) = pk(X = 1, Y1 = 1, Y2 = 1)/pk(Y1 = 1, Y2 = 1)
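As with Table 21.1, Table 21.3's entries can be regenerated from the model's generating function (Equation 5, introduced later in this section). The sketch below does so under the parameters recoverable from the table; it is an illustration of the calculation, not code from the chapter.

```python
# Regenerate Table 21.3's joint for the conjunctive cause network
# (Y1, Y2) -> X using Equation 5: the causal strength m operates only
# when both causes are present. Parameters recoverable from the table.
from itertools import product

c, m, b = 0.75, 2 / 3, 0.2   # p(Yi = 1), conjunctive strength, background cause

def p_X(x, y1, y2):
    p_present = 1 - (1 - b) * (1 - m) ** (y1 * y2)   # Equation 5
    return p_present if x == 1 else 1 - p_present

for y1, y2, x in product((1, 0), repeat=3):
    p_y1 = c if y1 == 1 else 1 - c
    p_y2 = c if y2 == 1 else 1 - c
    print(f"Y1={y1} Y2={y2} X={x}  p={p_X(x, y1, y2) * p_y1 * p_y2:.3f}")
# Values match Table 21.3 up to rounding (e.g., p(1,1,1) = .4125, tabled as .413);
# with only one conjunct present, only the background cause b operates on X.
```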

Network Topology and Causal Asymmetry

Rehder and Burnett (2005) taught subjects artificial categories whose typical features were organized into a common cause (Experiment 3) or a common effect (Experiment 4) causal network, shown in panels (a) and (b) of Figure 21.1. For example, subjects in the common cause condition were informed of a type of star named Myastars and how one feature of Myastars (ionized helium) caused three others (hot temperature, high density, a large number of planets). Subjects in the common effect condition were instead told that one feature (a large number of planets) was caused by the other three. A description of the causal mechanism associated with each causal link was also provided (see Table 21.5 for the causal mechanisms for Myastars in the common effect condition). Myastars were one member of a set of six experimental categories that included biological kinds (species of ants and shrimp), non-living natural kinds (types of stars and molecules), and artifacts (types of cars and computers).

After learning their category, subjects were presented with test items that were known category members with (typical or atypical) values on two of the four feature dimensions and were asked to rate the likelihood that one of the unobserved dimensions had its typical value. For example, on a particular trial a subject might be presented with a Myastar with a large number of planets and normal density (an atypical density value for Myastars), and would be asked how likely it was to have ionized helium. Recall that the joint distributions for common cause and common effect networks and a number of example conditional probabilities are (p. 382) presented in Tables 21.1 and 21.2 (albeit for networks with three rather than four variables).

Table 21.4 Equations for Joint and Conditional Probabilities for a Chain Network

Chain Network (cX = .750, mXY = mYZ = .667, bY = bZ = .200)

Joint distribution: pk(X, Y, Z) = pk(Z | Y) pk(Y | X) pk(X)

X  Y  Z  Equation                                           pk(X, Y, Z)
1  1  1  (1 − (1 − bZ)(1 − mYZ))(1 − (1 − bY)(1 − mXY))cX   .403
1  1  0  (1 − bZ)(1 − mYZ)(1 − (1 − bY)(1 − mXY))cX         .147
1  0  1  bZ(1 − bY)(1 − mXY)cX                              .040
0  1  1  (1 − (1 − bZ)(1 − mYZ))bY(1 − cX)                  .037
0  0  1  bZ(1 − bY)(1 − cX)                                 .040
0  1  0  (1 − bZ)(1 − mYZ)bY(1 − cX)                        .013
1  0  0  (1 − bZ)(1 − bY)(1 − mXY)cX                        .160
0  0  0  (1 − bZ)(1 − bY)(1 − cX)                           .160

Conditional probabilities (each defined from the joint distribution):
pk(Z = 1 | X = 0) = pk(Z = 1, X = 0)/pk(X = 0)
pk(Z = 1 | X = 1) = pk(Z = 1, X = 1)/pk(X = 1)
pk(Z = 1 | Y = 1, X = 0) = pk(Z = 1, Y = 1, X = 0)/pk(Y = 1, X = 0)
pk(Z = 1 | Y = 1) = pk(Z = 1, Y = 1)/pk(Y = 1)
pk(Z = 1 | Y = 1, X = 1) = pk(Z = 1, Y = 1, X = 1)/pk(Y = 1, X = 1)
pk(Y = 1 | X = 0, Z = 0) = pk(Y = 1, X = 0, Z = 0)/pk(X = 0, Z = 0) = .077
pk(Y = 1 | X = 1, Z = 0) = pk(Y = 1, X = 1, Z = 0)/pk(X = 1, Z = 0) = .478
pk(Y = 1 | X = 1, Z = 1) = pk(Y = 1, X = 1, Z = 1)/pk(X = 1, Z = 1) = .909

Comparing how such inferences are drawn with common cause and common effect networks is of theoretical importance because they are identical except that the direction of causality has been reversed. Thus, a different pattern of feature inferences in the two conditions indicates that reasoners are sensitive to causal direction. Rehder and Burnett's empirical results are presented in the bottom half of Figure 21.1. The ratings when the to-be-predicted dimension is X (the common cause or common effect) are shown on the left side of each chart, and those when it is a Y are shown on the right. The x-axis varies the number of Y features present in the test item, and the Y inferences are divided into whether X is present, absent, or unobserved. One basic finding is that features are considered more likely to the extent that their causes/effects are present. For example, in the common cause network, ratings for the presence of the common cause X increase as its number of effects increases. These inference types are referred to as diagnostic inferences in Table 21.1 because they involve reasoning backward from effects to causes. In the common effect network, ratings for the presence of the common effect X increase as its number of causes increases (forward inferences in Table 21.2).

Of greater theoretical importance is the fact that the ratings exhibited a different pattern for the two networks. For example, when predicting a Y when X was present (black filled circles), ratings were lower when another Y was present versus absent in the common effect condition. That is, subjects exhibited that signature of causal reasoning known as explaining away (a.k.a. discounting) (Ali et al., 2011; Jones & Harris, 1967; Kelley, 1973; McClure, 1998; Morris & Larrick, 1995). For example, if one knows that bacteria and viruses are two independent causes of fever and then observes (p. 383) the presence of fever in a patient, then learning that bacteria are present in that patient lowers the probability that the fever is due to a virus (Pearl, 2000). Table 21.2 presents a quantitative example of explaining away that shows that when X is present, a Y is less probable when the other Y is present. The results in Figure 21.1 b establish that explaining away also obtains in causal-based feature prediction. In contrast, the corresponding ratings in the common cause condition (Figure 21.1 a) show that the presence of the other Y made X more likely.
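The explaining-away pattern can be read directly off Table 21.2's joint distribution. The sketch below computes the relevant conditionals; the dictionary encoding is mine, the numbers are the table's.

```python
# Quantitative illustration of explaining away using Table 21.2's joint
# (independent cause network Y1 -> X <- Y2). Keys are (Y1, Y2, X).
joint = {
    (1, 1, 1): .513, (1, 1, 0): .050, (1, 0, 1): .138, (0, 1, 1): .138,
    (0, 0, 1): .013, (0, 1, 0): .050, (1, 0, 0): .050, (0, 0, 0): .050,
}

def cond(y2):
    """p(Y1 = 1 | X = 1, Y2 = y2): is one cause discounted given the other?"""
    num = joint[(1, y2, 1)]
    return num / (num + joint[(0, y2, 1)])

print(round(cond(0), 3))  # p(Y1=1 | X=1, Y2=0) = .138/.151 = .914
print(round(cond(1), 3))  # p(Y1=1 | X=1, Y2=1) = .513/.651 = .788 (lower: explained away)
```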

Figure 21.1 Conditional feature likelihood ratings. (a) From Rehder and Burnett (2005), Experiment 3. (b) From Rehder and Burnett (2005), Experiment 4. In each chart, ratings for the presence of feature X are shown on the left and those for a Y are shown on the right. Circled plot points are cases where conditional independence is violated. Error bars are standard errors of the mean.

The results in Figure 21.1 thus provide prima facie evidence that causal-based feature predictions are sensitive to the direction of causality. (p. 384) But while those results provide important confirming evidence for the distinctive predictions of common cause and common effect causal models, there are other aspects of the results in Figure 21.1 that are not in accord with those predictions. Discussion of these additional results is deferred until later in the chapter (see the section "Markov Violations").

Table 21.5 One of the Experimental Categories from Rehder and Burnett (2005)

Features: Ionized helium [Y1]; Very hot [Y2]; High density [Y3]; Large number of planets [X]

Causal Relationships:
Ionized helium causes the star to have a large number of planets. Because helium is a heavier element than hydrogen, a star based on helium produces a greater quantity of the heavier elements necessary for planet formation (e.g., carbon, iron) than one based on hydrogen. [Y1 → X]
A hot temperature causes the star to have a large number of planets. The heat provides the extra energy required for planets to coalesce from the gas in orbit around the star. [Y2 → X]
High density causes the star to have a large number of planets. Helium, which cannot be compressed into a small area, is spun off the star, and serves as the raw material for many planets. [Y3 → X]

The assignment of features and causal relationships to the causal roles shown in Figure 21.1 b is shown in brackets.

Functional Form

A second question is whether reasoners are sensitive to the form of the generating function that relates causes to their effects. Whereas the common effect networks in Table 21.2 and Figure 21.1 b assumed that each link operated independently, causes might also be interactive; for example, causes might be conjunctive in that both need to be present in order for an effect to arise (Lucas & Griffiths, 2010; Novick & Cheng, 2004). Compare the independent and conjunctive cause networks shown in Figures 21.2 a and 21.2 b. Whereas the causes in Figure 21.2 a correspond to two independent causal mechanisms (depicted as diamonds), those in Figure 21.2 b operate via a single mechanism. The different joint distributions (and thus inferences) implied by these networks arise from their distinct generating functions, namely,

pk(X = 1 | Y1, Y2) = 1 − (1 − bX)(1 − mY1X)^ind(Y1) (1 − mY2X)^ind(Y2)    (4)

when Y1 and Y2 are independent and, when Y1 and Y2 are conjunctive,

pk(X = 1 | Y1, Y2) = 1 − (1 − bX)(1 − mY1Y2,X)^ind(Y1)ind(Y2)    (5)

where mY1Y2,X is the strength of the conjunctive causal relation between Y1, Y2, and X. Equation 5 codifies the intuition that the Ys only influence the probability of X when both are present; it evaluates to 1 − (1 − bX)(1 − mY1Y2,X) when Y1 = Y2 = 1 and to bX otherwise. Tables 21.2 and 21.3 present the joint distributions for these two types of networks.3

Rehder (2014a, Experiment 1) assessed how subjects reason with independent versus conjunctive common effect networks. Again using Myastars as an example, whereas in the independent condition subjects were told that ionized helium and high temperature each individually cause high density, in the conjunctive condition they were told that ionized helium and high temperature "together" cause high density. Subjects were then asked to draw a number of feature inferences. The results are presented in the lower half of Figure 21.2. As in Figure 21.1, ratings when the to-be-predicted feature dimension is X are shown on the left of each chart, and those when it is a Y are shown on the right (the x-axis again varies the number of Y features present in the test item). The different pattern of ratings in the two conditions indicates that subjects were sensitive to the difference between independent versus conjunctive causes. First, when predicting the effect (X) when one cause (a Y) is present (open squares in the charts), ratings were much higher for independent versus conjunctive causes, because the latter require both causes to be present to yield the effect. Second, when predicting a Y when X was present (black filled circles), ratings were lower when the other Y was present versus absent for the independent network (i.e., subjects explained away), whereas the conjunctive network exhibited the opposite pattern (namely, ratings were higher when the other Y was present). This result is sensible because one conjunctive cause is potentially responsible for an effect only when the other conjunct is present. Rehder (2014a) referred to this as the exoneration effect. For example, discovering that a murder suspect didn't possess the means to carry out the crime (e.g., proximity to the victim) decreases his or her likely guilt and makes the guilt of an alternative suspect more likely. Table 21.3 presents a quantitative example of the exoneration effect.
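The exoneration effect can likewise be computed from Table 21.3's joint distribution, as in the sketch below (dictionary encoding mine, numbers the table's).

```python
# The exoneration effect, computed from Table 21.3's joint for the
# conjunctive cause network. Keys are (Y1, Y2, X).
joint = {
    (1, 1, 1): .413, (1, 1, 0): .150, (1, 0, 1): .038, (0, 1, 1): .038,
    (0, 0, 1): .013, (0, 1, 0): .150, (1, 0, 0): .150, (0, 0, 0): .050,
}

def p_y1_given(y2):
    """p(Y1 = 1 | X = 1, Y2 = y2) for the conjunctive cause network."""
    num = joint[(1, y2, 1)]
    return num / (num + joint[(0, y2, 1)])

print(round(p_y1_given(0), 3))  # .038/.051 = .745
print(round(p_y1_given(1), 3))  # .413/.451 = .916: the other conjunct raises Y1
```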

Parameterization (Causal Strength)

Finally, Rehder (2014a, Experiment 2) asked whether feature inferences are sensitive to a causal model's parameterization, namely, whether the strengths of the causal relations were weak or strong (e.g., the mij parameters in Equation 2). In the weak condition (Figure 21.3 a), the links of an independent cause network were described as producing the effect "occasionally." In the strong condition (Figure 21.3 b), "often" was used instead. The results reveal that each of the effects observed with weak causal links (Figure 21.3 a) was greater when they were strong (Figure 21.3 b). For example, the rated likelihood of X grew more quickly as the number of Ys increased with strong versus weak links. And, explaining away was stronger for strong causal links as compared to weak ones. Note that the same experiment also manipulated the strength of conjunctive causes and found, as predicted, that the effects shown in Figure 21.2 b were more pronounced for stronger links (e.g., the exoneration effect was larger). (p. 385)
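A small calculation shows why stronger links should produce steeper inference curves. The sketch below uses Equation 4's noisy-OR with illustrative parameters; the m values are stand-ins for "occasionally" versus "often," not the study's numbers.

```python
# How causal strength modulates forward inference, sketched with Equation 4's
# noisy-OR. The b and m values below are illustrative assumptions only.
def p_effect(y1, y2, m, b=0.2):
    """p(X = 1 | Y1, Y2) for two independent causes of strength m."""
    return 1 - (1 - b) * (1 - m) ** (y1 + y2)

for label, m in (("weak", 0.3), ("strong", 0.9)):
    one_cause = p_effect(1, 0, m)
    two_causes = p_effect(1, 1, m)
    print(f"{label}: p(X|one Y) = {one_cause:.2f}, p(X|both Ys) = {two_causes:.2f}")
# Stronger links make X more sensitive to the number of causes present,
# mirroring the steeper slopes in Figure 21.3 b.
```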


Figure 21.2 Conditional feature likelihood ratings. (a) From Rehder (2014), Experiment 1, independent cause condition. (b) From Rehder (2014), Experiment 1, conjunctive cause condition. In each chart, ratings for the presence of feature X are shown on the left and those for a Y are shown on the right. Circled plot points are cases where conditional independence is violated. Error bars are standard errors of the mean.

Markov Violations

The empirical results presented in Figures 21.1–21.3 reveal that causal-based feature predictions are generally in accord with the conditional probabilities implied by a concept's causal model. Yet, each of the conditions in Figures 21.1–21.3 also exhibits a theoretically important departure from those predictions. As mentioned, a key property of causal models is the Markov condition, which stipulates the patterns of conditional independence that arise, given knowledge of the state of other variables in a network—specifically, when the state of a variable's direct parents is known, the variable is conditionally independent of each of its non-descendants (Hausman & Woodward, 1999; Pearl, 1988, 2000; Reichenbach, 1956; Spirtes et al., 2000). This condition has a natural causal interpretation: apart from its descendants, one has learned as much as possible about a variable once the state of its direct causes is known. Because non-descendants only provide information about the variable through the parents, the variable is said to be screened off from those non-descendants by the parents.

Tables 21.1–21.4 present quantitative examples of conditional independence for each of the network types. I now summarize the evidence regarding whether subjects in the previous experiments honor the Markov condition.

• For the common cause network in Figure 21.1 a, screening off entails that the Ys are independent when the state of X is known. Subjects judged otherwise: the presence of one Y made another more likely both when X was known present (filled circles) and when it was known absent (open circles). For example, according to these subjects, pk(Y1 = 1 | X = 1, Y2 = 1) > pk(Y1 = 1 | X = 1, Y2 = 0) and pk(Y1 = 1 | X = 0, Y2 = 1) > pk(Y1 = 1 | X = 0, Y2 = 0) (cf. Table 21.1).

• The causes of a common effect network should be independent when the state of X is unknown.4 Yet, in Figure 21.1 b subjects judged that the likelihood of one Y increased as a function of the number of other Ys present (gray circles). The causes should also be independent when X is known to be absent, yet subjects failed to (p. 386) exhibit independence on these inference types in Figures 21.1 b, 21.2 a, 21.3 a, and 21.3 b (open circles).

• In a conjunctive cause network, the probability of an effect should be the same regardless of whether zero or one cause is present. Yet subjects in Figure 21.2 b judged that X was more likely when one Y was present versus none (open squares). For example, they judged that pk(X = 1 | Y1 = 1, Y2 = 0) > pk(X = 1 | Y1 = 0, Y2 = 0) (cf. Table 21.3).

• Rehder and Burnett (2005) also found independence violations with the chain network in Table 21.4 (albeit one with four features rather than three). For example, in Table 21.4 feature Z should be screened off from X when the state of Y is known. Yet, subjects judged, for example, that pk(Z = 1 | Y = 1, X = 1) > pk(Z = 1 | Y = 1, X = 0).
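That last normative claim, that X is screened off from Z given Y, can be checked numerically against Table 21.4's parameters; the sketch below shows that p(Z = 1 | Y = 1, X) comes out identical for both values of X, which is exactly the equality subjects' judgments violated.

```python
# Screening off in the chain X -> Y -> Z, checked against Table 21.4's
# parameters (cX = .75, mXY = mYZ = 2/3, bY = bZ = .2). Normatively,
# p(Z=1 | Y=1, X) must not depend on X.
c_X, m, b = 0.75, 2 / 3, 0.2

def p_child(parent_value):
    """Noisy-OR probability that a child node is present given its parent."""
    return 1 - (1 - b) * (1 - m) ** parent_value

def joint(x, y, z):
    p_x = c_X if x == 1 else 1 - c_X
    p_y = p_child(x) if y == 1 else 1 - p_child(x)
    p_z = p_child(y) if z == 1 else 1 - p_child(y)
    return p_x * p_y * p_z

for x in (0, 1):
    num = joint(x, 1, 1)
    den = joint(x, 1, 1) + joint(x, 1, 0)
    print(f"p(Z=1 | Y=1, X={x}) = {num / den:.3f}")   # .733 for both values of X
```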

Figure 21.3 Conditional feature likelihood ratings. (a) From Rehder (2014), Experiment 2, weak condition. (b) From Rehder (2014), Experiment 2, strong condition. In each chart, ratings for the presence of feature X are shown on the left and those for a Y are shown on the right. Circled plot points are cases where conditional independence is violated. Error bars are standard errors of the mean.

Each circled set of plot points in Figures 21.1–21.3 reflects an independence violation. Rehder and Burnett referred to this result as a typicality effect in which the presence of typical features implies the presence of still more typical features. Indeed, they found a strong typicality effect in each of their experiments' control conditions, which were identical except for the absence of any inter-feature causal links.

Studies of causal reasoning in domains other than feature prediction reveal that Markov violations are quite common (Burnett, 2004; Lagnado & Sloman, 2004; Luhmann & Ahn, 2007; Mayrhofer et al., 2010; Mayrhofer & Waldmann, 2015; Park & Sloman, 2013; Perales et al., 2004; Rehder, 2014b; Walsh & Sloman, 2004, 2008; see Rottman & Hastie, 2014, for a review). Two classes of explanation of these violations have been offered. One emphasizes the possibility that subjects sometimes reason with knowledge that is different from that assumed by the experimenter. A number of the possibilities suggested in the literature are depicted in Figure 21.4. First, Park and Sloman proposed that reasoners sometimes assume that a common cause network is elaborated with a shared disabler (node D in Figure 21.4 a). On their account, when D is present, it prevents the operation of the causal mechanisms both between X and Y1 and between X and Y2 (also see Mayrhofer & Waldmann, 2015).5 Second, a number of these (p. 387) examples of Markov violations can be interpreted as indicating that subjects reasoned with a common cause model, but one in which the two causal links are actually mediated by a common mechanism (node C in Figure 21.4 b). Finally, Rehder and Burnett (2005) suggested that people assume that categories possess underlying properties or mechanisms that produce or generate a category's observable properties, situations represented in Figures 21.4 c and 21.4 d in which node UM serves as the shared underlying mechanism cause for common cause and common effect networks. The important point is that for these causal models, many of the cases of the lack of independence in Figures 21.1–21.3 are no longer Markov violations. These possibilities highlight the need for researchers to carefully consider the causal knowledge that people actually reason with.


Figure 21.4 Alternative causal structures proposed to account for the types of independence violations.

The second class of explanation attributes independence violations to alternative reasoning processes rather than knowledge. For example, Rehder (2014b) taught subjects variables in the domains of economics, meteorology, and sociology with causal relationships that formed a common cause, common effect (with independent causes), or chain network. I found many of the same independence violations shown in Figures 21.1–21.3 (also see Rehder & Waldmann, in press). That these results obtained despite the fact that the materials were not features of categories (plus the extensive counterbalancing of the materials) rules out the sort of accounts in Figure 21.4 as complete accounts of Markov violations. Instead, I concluded that people's causal reasoning processes include a general tendency to reason "associatively," that is, to assume that the presence of one variable makes the presence of all other variables in the network more likely. That reasoners were still quite sensitive to causal direction indicated that this associative bias contributes to but does not completely override normative causal reasoning. If this account is correct, the associative bias is likely to have contributed to the feature prediction results shown in Figure 21.1. Nevertheless, note that this explanation does not account for the lack of feature independence in Rehder and Burnett's control conditions.

It is likely that each of the explanations just presented contributes to apparent independence violations in causal-based feature prediction. It is probable that there are circumstances in which reasoners augment their causal models with shared disablers (Figure 21.4 a), shared mediators (Figure 21.4 b), or a shared mechanism (Figure 21.4 c), and that an associative bias contaminates otherwise veridical causal inferences (Rehder, 2014b).


Neglect of Alternative Causes

Markov violations represent one kind of systematic causal reasoning error. (p. 388) Fernbach, Darlow, and Sloman (2010, 2011a, 2011b) established the existence of another. In one study, they tested simple predictive inferences (inferring an effect E from the presence of a cause C) in real-world situations where there were other potential causes of E. For each scenario, subjects not only judged p(E | C) but also the strength of the C → E relation and p(E | ¬C), which indexes the strength of causes of E other than C. Although predictive inferences ought to increase as a function of both the strength of C → E and the strength of alternative causes, Fernbach et al. found that subjects reasoned as if they were ignoring the strength of potential alternative causes, resulting in estimates of p(E | C) that were too low. Indeed, when C → E was weak, p(E | C) was sometimes judged lower than the marginal probability of E, p(E). For instance, subjects told about positive but weak evidence that the Republicans would win the House of Representatives in the 2010 US mid-term election (a newspaper endorsement of a single candidate) were actually less likely to gamble on a Republican win than those given no evidence (a result that Fernbach et al. referred to as the weak evidence effect). Apparently, the focus on a single present cause leads reasoners to ignore alternative causes.

Fernbach and Rehder (2012, Experiment 1) tested whether alternative causes are also neglected during causal-based feature prediction. Subjects were taught artificial categories (e.g., Myastars) whose features were described as causally related in one of the ways shown in Figure 21.5 a. In both conditions, the strength of C → E was described as 80%. But the strength of the alternative causes of E was varied: in the weak condition it was described as 25%, whereas in the strong condition it was described as 75%. Subjects then drew forward, p(E | C), and diagnostic, p(C | E), inferences. The normative predictions for this experiment are presented in Figure 21.5 b,6 which reveals that inferences should be sensitive to alternative cause strength. The forward inference p(E | C) should be higher when there are strong alternative causes of E because it is probable that E is present due to those alternatives. The diagnostic inference p(C | E) should be lower when alternatives are strong because it is more likely that the presence of E is due to the alternatives rather than C. The empirical results in Figure 21.5 c show that diagnostic inferences were appropriately sensitive to the strength of alternative causes. But just as in Fernbach et al., predictive inferences were insensitive to the manipulation. This neglect of alternative causes also obtained for weaker causal links (40% vs. 80%) and when the alternative cause was an explicit feature of the category (Experiment 2); only when the alternative was an explicit category feature that was explicitly described as present did alternative strength affect predictive inferences appropriately (Experiment 3). Fernbach and Rehder concluded that reasoners look for opportunities to simplify their causal model (and the required calculations) by deleting variables they believe to be irrelevant to the inference at hand. That the neglect of alternative causes did not vary with the potential size of the reasoning error ruled out the hypothesis that reasoners neglect alternative causes only when potential error is low.
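The arithmetic behind the weak evidence effect is simple enough to verify directly. The sketch below uses illustrative numbers of my own (not Fernbach et al.'s) under a noisy-OR integration of the cause and its alternatives.

```python
# Arithmetic behind the weak evidence effect, under illustrative numbers:
# a weak cause C (m = .1), strong alternatives (b = .6), prior p(C) = .5,
# combined by noisy-OR.
m, b, p_c = 0.1, 0.6, 0.5

p_e_given_c = 1 - (1 - b) * (1 - m)      # normative p(E | C) = .64
p_e = p_c * p_e_given_c + (1 - p_c) * b  # marginal p(E) = .62

print(round(p_e_given_c, 2), round(p_e, 2))
# A reasoner who neglects alternatives reports roughly m = .10 for p(E | C),
# which falls far below the marginal p(E): weak positive evidence then
# lowers the judged probability of E, as in the election example above.
```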

Page 25 of 64

Concepts as Causal Models: Induction Summarizing the review of causal-based features prediction, it is clear that such predic­ tions exhibit many important properties of reasoning based on causal models, including sensitivity to causal direction, causal strength, and the manner in which multiple causes integrate. These results corroborate those reported in the companion chapter (“Concepts as Causal Models: Categorization,” Chapter 20 in this volume), which showed how causal direction, strength, and the integration function also affect how people make classifica­ tion decisions. Yet, whereas the causal model framework clearly provides a good first-or­ der approximation of people’s feature inferences, those inferences also exhibit systematic biases, such as Markov violations and neglect of alternative causes. I will return to these findings in the chapter summary which considers future directions for the causal model approach.

Causal-Based Feature Generalization This section reviews acts of induction in which reasoners generalize a feature to a catego­ ry. Whereas studies in the previous section asked whether one of a category’s known features is present in an object, in this paradigm the feature is usually described as novel and subjects are asked to consider how widespread it might be. Nevertheless, acts of fea­ ture generalization have the same ultimate purpose: once generalized to a category, a novel feature can then be projected onto its members. Indeed, the dependent variable in many of the following studies is the proportion of category members that are likely to possess the novel feature. Studies vary depending on the source of the novel feature (also referred to as the base). An object-to-category generalization arises when the source is a specific object. For exam­ ple, one might ask whether (p. 389) the shape and size of the cones of a particular fir tree generalize to all firs. A category-to-category generalization occurs when the source is it­ self a category. For example, one might ask if the cones of fir trees generalize to a super­ ordinate category (pines), a subordinate category (Douglas firs), or a category at the same taxonomic level (cedars). Because they tend to be treated as part of the same litera­ ture (and explained by the same computational models), some of the studies that follow report object-to-object generalizations (where the two objects can be from either the same or different categories). Kemp and Jern (2013) have proposed a useful taxonomy of the inductive inferences supported by conceptual knowledge.

Page 26 of 64

Concepts as Causal Models: Induction

Figure 21.5 (a) Experimental design in Fernbach and Rehder (2012), Experiment 1. (b) Normative predic­ tions. (c) Feature likelihood ratings (on a 0–20 scale). Error bars are standard errors of the mean.

The category-based induction literature is quite large and includes numerous reviews (Feeney & Heit, 2007; Gelman, 2003; Hayes, Heit, & Swendsen, 2010; Hayes & Heit, 2013; Heit, 2000; Kemp & Tenenbaum, 2009; Rehder, 2007b; Rips, 2001; Sloman & Lagnado, 2005b; in addition to Kemp & Jern, 2013). Much early research focused on the role of the similarity relations among the categories that played the roles of source(s) and target in an inductive argument. To assess the effect of those relations in isolation from any involving the to-be-generalized feature itself, these studies traditionally used blank features—features about which people have no (or at least little) prior knowledge. In a typical experiment, participants might be told that "sparrows have sesamoid bones" (sesamoid bones are blank) and then asked how they generalize to all birds (Osherson, Smith, Wilkie, & Lopez, 1990; Rips, 1975; Sloman, 1993). Typicality effects refer to the fact that more typical source examples support stronger generalizations. For example, people will be more confident that all birds have sesamoid bones when the source category is sparrows rather than penguins, because sparrows are more typical birds than penguins. Diversity effects refer to the fact that a more diverse set of base examples leads to stronger generalizations. All else being equal, people are more likely to conclude that all birds have sesamoid bones given that {sparrows, hawks, chickens} do as compared to {sparrows, robins, blue jays}, because the first set is more diverse than the second. (p. 390) Finally, the basic similarity effect itself is that generalizations between items that are not hierarchically nested will be determined by their similarity. For example, people will be more confident that turkeys will have sesamoid bones when the source category is chickens rather than sparrows, because chickens are more similar to turkeys than sparrows.

In contrast, this section reviews more recent research that, because of its focus on causal relations, often uses "non-blank" features—features about which the reasoner has (or is given) beliefs about the causal mechanisms via which the features arise. The next subsection reviews studies that have juxtaposed themselves against some of the standard similarity-based findings just described. The second subsection presents extensions to causal models that have been proposed as accounts of a number of kinds of feature generalizations.

Relationship with Similarity-Based Effects Rehder and Hastie (2004) asked whether the coherence effect that influences judgments of category membership also influences object-to-category generalizations. The coher­ ence effect (reviewed in the companion chapter on categorization) consists of an interac­ tion between features such that the presence of a feature can be construed as positive ev­ idence for category membership when its causes and effects are also present, but as neg­ ative (or at least less positive) evidence when those causes and effects are absent (be­ cause of the violations of causal laws—cases where causes are present and effects are ab­ sent and vice versa—that are introduced). On the basis of the standard typicality effect just reviewed, Rehder and Hastie reasoned that an object that appeared to be a good cat­ egory member because it cohered with a category’s causal relations (and was thus more “typical”) should also support stronger generalizations. Subjects learned a category (e.g., Myastars) with four typical features. They were then presented with a particular category member that displayed a fifth novel feature and were asked what proportion of all catego­ ry members had that feature. (The novel feature was causally unrelated to the existing features and thus was a traditional blank feature.) The number of typical features of the category member was varied. Figure 21.6 a presents the results when the category’s typi­ cal features formed a common cause network. The right panel of Figure 21.6 a presents the generalization ratings as a function of the number of effects (i.e., Ys) present and whether the common cause feature (X) was present or not in the category member with the novel feature. The left panel presents the classification ratings for the same items. The left panel displays the interaction that characterizes the coherence effect: for test items in which X was present, items were judged as more likely category members to the extent that they had more features typical of that category (i.e., more Ys). But when X was absent, adding Ys led to virtually no increase (and sometimes a decrease) in ratings (because adding Ys when X is absent decreases the items’ coherence). The figure shows that coherence that had the same effect on generalization: coherent (i.e., typical) items supported strong generalizations and incoherent (i.e., atypical) items supported weak ones. Over three experiments that each tested three different network topologies, the correlation between classification and generalization ratings was never less than .95. These results indicate that the changes to item typicality brought about by inter-feature causal relations transfer directly to object-to-category generalizations. Relatedly, Patalano and Ross (2007) found that properties were generalized more strongly to categories that were rated as coherent on a variety of measures (e.g., skydivers) as compared to less co­ herent categories (e.g., joggers; Patalano et al., 2006).

Page 28 of 64

Concepts as Causal Models: Induction Rehder (2006) asked instead how the standard similarity-based effects on inductions are influenced when the causal knowledge involves the to-be-generalized feature itself (thus, a non-blank feature). Experiment 1 instructed subjects on artificial categories (e.g., Myas­ tars) with four typical features and then presented generalization trials in which a partic­ ular category member displayed a novel feature. On non-blank trials, the novel feature was accompanied by a causal explanation (e.g., the Myastar was said to experience fre­ quent gravitational fluctuations, which were caused by one of its typical features, high density), whereas on blank trials no explanation was provided. The second experimental factor was the typicality of the category member, which had one, two, three, or four typi­ cal features. (On non-blank trials, the source category member always possessed the typi­ cal feature that was described as the cause of the novel one.). The left panel of Figure 21.6 b reveals that, as expected, generalization ratings for blank features increased with the exemplars’ number of typical features, indicating that this experiment replicated the standard typicality effect. However, the effect of typicality was significantly reduced when a causal explanation was provided. Whereas Rehder’s (2006) Experiment 1 addressed typicality, Experiment 2 focused on di­ versity. Subjects were instructed on categories (p. 391) with five typical features. General­ ization trials then presented two category members with the same novel feature. The two category members exhibited either low diversity (i.e., they shared all five features) or high diversity (they shared only one feature). The two category members always each had three typical features so that their typicality was held constant across low- and high-di­ versity trials. Whether the novel property had a causal explanation was manipulated or­ thogonally. (Again, the category members always possessed the novel feature’s cause fea­ ture on non-blank trials.) The middle panel in Figure 21.6 b shows that whereas blank properties exhibited a modest diversity effect in which more diverse pairs of category members supported stronger generalizations, that effect was absent entirely for nonblank properties.

Page 29 of 64

Concepts as Causal Models: Induction

Figure 21.6 (a) Classification and generalization rat­ ings from Rehder (2003a) and Rehder and Hastie (2004), Experiment 1, respectively. (b) Generaliza­ tion ratings from Rehder (2006), Experiments 1 (left panel), 2 (middle panel), and 3 (right panel). Error bars are standard errors of the mean.

Finally, to assess how causal explanations affect the role of similarity, Experiment 3 asked subjects to generalize a property from one category member to another (thus, an objectto-object generalization). One experimental factor was whether the source and target ex­ emplars shared three of four typical features (high similarity) or only one (low). The source and target were chosen so that they always possessed three and two characteris­ tic features, respectively, so that their typicality was held constant over similarity condi­ tions. The other factor was whether a causal explanation was provided for the novel prop­ erty. The results are presented in the right panel of Figure 21.6 b as a function of the source/target similarity and whether the trial was blank or non-blank (and, if non-blank, whether the cause feature was present or absent in the target). As expected, blank prop­ erties were more strongly projected when the target (p. 392) exemplar was more similar to the source exemplar and non-blank properties were more strongly projected when the cause of that property appeared in the target exemplar versus when it did not. The impor­ tant finding is that the generalization of non-blank properties was much less sensitive to the similarity of the source and target (see Lassaline, 1996; Stephens, Navarro, Dunn, & Lee, 2009; and Wu & Gentner, 1998, for related findings). These results suggest that causal explanations direct attention away from the features not involved in the causal explanation such that overall typicality, diversity, and similarity often become irrelevant. Moreover, the influences of causal relations and similarity some­ times operate in an either-or manner. In Experiment 1, for example, the responses of half the subjects were completely determined by the causal relations and the other half by typicality. On the other hand, that Experiment 3 found a small but significant effect of Page 30 of 64

Concepts as Causal Models: Induction similarity on non-blank trials (and that this pattern was exhibited by the majority of the participants) reveals that causal-based and similarity-based reasoning can sometimes in­ fluence the same judgment. This conclusion will be reinforced by additional studies re­ viewed in the subsection “Models of Causal-Based Feature Generalization.” A large literature has established that many of these causal-based effects obtain with re­ al-world materials. For example, the similarity that often dominates category-to-category generalizations can be overturned when causal explanations are present. The well-known study of Heit and Rubinstein (1994) found that whereas a novel behavioral feature (e.g., travels in a zigzag path) was generalized more strongly to whales when tunas was the source category as compared to bears, the reverse was true for a novel biological feature (e.g., a liver with two chambers that acts as one). This result obtained despite the fact that the categories involved (bears, tunas, and whales) were the same and thus so too were the inter-category similarity relations. Apparently, participants thought that bears and whales share biological properties because such properties are likely to arise from causal mechanisms associated with mammals. Tunas and whales are instead more likely to share behaviors because they are both prey animals living in a common ecology (also see Coley & Vasilyeva, 2010; Sloman, 1994, 1997; Smith, Shafir, & Osherson, 1993; Springer & Keil, 1989).7 In some circumstances, feature generalizations are determined not by the causal rela­ tions among features, but by those between the source and target categories themselves. For example, Medin, Coley, Storms, and Hayes (2003) found that blank properties were generalized more strongly to an animal (e.g., cows) from something it eats (grass) than in the reverse direction. Similarly, in their investigation of how fishermen reason about species of fish, Shafto and Coley (2003, Experiment 2) found that diseases were general­ ized more strongly from a prey species to a predator than from predator to prey. Like the other findings reviewed earlier, this phenomenon—known as the causal asymmetry effect —can override similarity based effects. For example, Medin et al. (2003) found that a more diverse pair of categories (e.g., cats and sparrows, a mammal and a bird) resulted in weaker generalizations of a blank property to lizards as compared to a less diverse pair (cats and rhinos, two mammals), because the fact that cats eat sparrows suggest a mech­ anism by which they but not lizards share the feature. Finally, inter-category relations in­ teract with the type of feature being generalized. Shafto, Kemp, Bonawitz, Coley, and Tenenbaum (2008) found that predators and prey exhibited a causal asymmetry effect for diseases but not genes. Presumably, reasoners recognized that genes are not readily transferred via ingestion and so generalized them on the basis of taxonomic similarity in­ stead. Many of these effects have been shown to vary over subject groups depending on their domain knowledge, which presumably includes knowledge of causal relations. Lopez, Atran, Coley, Medin, and Smith (1997) found that US undergraduates but not the Itza’ Maya (an indigenous population in central Guatemala) exhibited diversity effects (also see Bailenson, Shum, Atran, Medin, & Coley, 2002). 
Groups of North American tree experts tested by Proffitt, Coley, and Medin (2000) exhibited diversity effects inconsistently and typicality effects not at all (also see Lynch, Coley, & Medin, 2000).

The causal asymmetry effect exhibited by fishermen in Shafto and Coley (2003) was absent in university undergraduates. What these studies have in common is that the to-be-generalized property was a disease and the experts used their ecological knowledge to generalize on the basis of specific causal mechanisms (e.g., predator–prey relationships, species' geographic distribution, their susceptibility to disease, etc.).

Researchers have asked whether the effects of causal knowledge depend on cognitive load and on the amount of time in which subjects must respond. Shafto, Coley, and Baldwin (2007) investigated the generalization of novel features (genes or diseases) (p. 393) across pairs of categories that were either from the same taxonomic superordinate category but a different habitat (e.g., tigers and camels) or from the same habitat but a different superordinate (tigers and parrots). When subjects had 15 seconds to respond, the generalization of genes was stronger for taxonomic pairs, whereas the generalization of diseases was stronger for ecological pairs (also see Shafto, Coley, & Vitkin, 2007; cf. Heit & Rubinstein, 1994). The latter result is sensible given that a common ecology provides multiple causal pathways for the spread of disease (e.g., proximity, predator–prey relations, etc.). But when given only one second to respond, the effect disappeared for diseases but not genes, a result the authors attributed to the greater availability of taxonomic knowledge (cf. Ross & Murphy, 1999). More recently, however, Bright and Feeney (2014, Experiment 1) found the opposite result when controlling for the strength of association between the two categories: namely, diseases were generalized more strongly for causally related pairs (e.g., fly and frog) at both short and long response deadlines, but types of cells were generalized more strongly for taxonomic pairs (e.g., fly and ant) only at long deadlines. On the other hand, Bright and Feeney's Experiment 2 found a causal asymmetry effect when subjects were under light cognitive load but not when they performed a secondary task. On balance, then, these results suggest that causal generalizations may require considerable time and cognitive resources to compute (cf. Luhmann et al., 2006).

Developmental Studies

The literature on children's inductive inferences is also quite large and supports the view that causal relations affect children's generalizations. Considerable work has investigated how children's systems of categories develop so as to support the projection of features (Carey, 1985; Keil, 1989, 1995). For example, the classic study of Gelman and Markman (1986) found that the generalization of a feature from one animal to another was stronger when the two shared category membership rather than perceptual features (also see Gelman, 1988; Gelman & Coley, 1990; Gelman & O'Reilly, 1988; Lopez, Gelman, Gutheil, & Smith, 1992). Results like these often have a causal interpretation (animals in the same category share underlying causal processes and thus features), but they speak most directly to children's assumptions regarding the distribution of non-obvious (e.g., internal) properties rather than to causal reasoning per se (Gelman & Wellman, 1991; Newman, Herrmann, Wynn, & Keil, 2008; Newman & Keil, 2008; Sobel et al., 2007; Walker, Lombrozo, Legare, & Gopnik, 2014). Moreover, some have argued that these phenomena have similarity-based accounts (e.g., Sloutsky & Fisher, 2004, 2008, 2012). Nevertheless, especially unambiguous evidence for the use of causal relations comes from a study by Hayes and Thompson (2007).

Children (5- and 8-year-olds) and adults were given two source objects, each with three features and a causal relation between two of them. They were then presented with a third, target object with one feature from each source and asked which of two target features (one from each source) it was more likely to possess. Critically, one of the target features was known (on the basis of the information in the two source objects) to be an effect of a feature in the target, whereas the other was not. All three groups chose the causally supported feature (cf. Rehder, 2006, Experiment 3). Whereas in Experiments 1 and 2 the perceptual similarity of the target to the two sources was equated (one feature each), the causally supported feature was also favored in a third experiment even when the source with the applicable causal relation had fewer features than the other source. (This latter result obtained with the children so long as the causal relations in the source objects were explicitly pointed out.) Related evidence comes from Opfer and Bulloch (2007), who found that the similarity of insects' parents determined not only category membership but also how a novel feature (a property inside the blood) was generalized; that is, causal relations can trump similarity in generalizations not only in adults but also in children as young as 5 years old.

Just as in adults, children's inductions vary depending on their experience with a domain. Coley (2012) used a forced-choice task in which subjects (adults and 6-, 8-, and 10-year-olds) were given a source category with a novel feature (either "stuff inside called andro" or "a disease called andro") and were asked to choose whether it was more likely to generalize to a taxonomically related category or to an ecologically related one. Consistent with the results of Heit and Rubinstein (1994) and Shafto et al. (2007), adults exhibited the expected interaction in which the inside "stuff" and the disease were generalized more strongly to the taxonomic and the ecological category, respectively. The key results concern the 6-year-olds, who exhibited this interaction when they came from a rural but not a suburban or urban environment, again suggesting that experience with an ecology (and the (p. 394) causal interactions that operate within it) influences inductions (also see Inagaki, 1990; Medin et al., 2010; Ross et al., 2003; Tarlowski, 2006; Waxman et al., 2007). By age 10, children from all three environments exhibited the more adult-like property/target category interaction.

Models of Causal-Based Feature Generalization

A number of models have been proposed to extend the causal model approach to these sorts of generalization phenomena. Note that because causal models represent semantic information (e.g., about a category) that can be used to reason about specific objects (e.g., category members), they can be readily applied to the category-to-object feature predictions reviewed earlier. In contrast, additional assumptions are required for feature generalizations that project a feature from an object onto a category's causal model (an object-to-category generalization), onto another object from another category (an object-to-object generalization), or from one category to another (a category-to-category generalization). The following subsections review models designed to account for these cases.

Object-to-Category Generalizations

Rehder (2009) extended the causal model approach to object-to-category generalizations by assuming that they occur in two steps. First, reasoners compute, for every potential category member that they think of, the likelihood that the to-be-generalized novel property is present in that object. Second, those probabilities are summed, each weighted by the probability that the object is a category member. That is, the prevalence of a novel property N in category k is

P_k(N) = \sum_{o} P(N \mid o) \, P(o \in k)    (6)

The probability that object o has N, P(N \mid o), is determined by the rules of causal-based feature prediction, previously discussed in the section "Causal-Based Feature Prediction."

The predictions implied by Equation 6 were tested in a series of experiments. First, most of the instances of causal-based generalization presented earlier (e.g., in Figure 21.6 b) were cases in which the novel feature was an effect of an existing category feature; Equation 6 predicts that generalizations should also be supported when it is a cause. Experiment 1 of Rehder (2009) taught subjects categories via standard supervised classification. Of 16 training items, 8 were members of the target category (e.g., Myastars) and 8 were non-members. Of the four stimulus dimensions, two were of high validity (the typical value appeared in 7/8 of category members and 1/8 of non-members) and two were of low validity (5/8 and 3/8). Subjects were then presented with generalization trials in which a novel feature was described as causally related to one of the category's typical features (hereafter referred to as the base feature), as shown in Figure 21.7 a. One factor was whether the base feature was described as a cause or an effect of the novel feature (N in Figure 21.7 a). Another was whether the base feature was of high validity (H in Figure 21.7 a) or low validity (L). Subjects were asked to judge the proportion of category members that had the novel feature. The results in Figure 21.7 a reveal two important findings. First, generalizations were stronger for high- versus low-validity base features. This result follows from Equation 6 because, for those potential category members (the os) with the base feature (and thus likely to also possess N), the category membership probability P(o \in k) is generally higher when the base feature is of high versus low validity. Second, this effect obtained even when the novel feature was a cause. That is, causal knowledge promotes generalizations not only through forward reasoning but through diagnostic reasoning as well. The same pattern of results obtained when the validity of the typical features was conveyed via direct instruction rather than through experience via supervised training.

Experiment 2 tested the effect of varying the strength of the causal link with the novel feature, as shown in Figure 21.7 b. As in Experiment 1, the novel feature was either a cause or an effect of an existing category feature. But the causal link with the novel feature was described as either 100% effective (the strong condition) or 67% effective (weak), and subjects were told that there was no other cause of the effect.

The causal model prediction when N is an effect is straightforward: N will be more prevalent when the causal link that produces it is more likely to operate. But when N is a cause, its prevalence should decrease as the causal link gets stronger, because a stronger link allows a less prevalent cause to account for the prevalence of the effect. Unlike Experiment 1, subjects learned the validities of the typical category features (67%) through direct instruction rather than through classification learning. The results in Figure 21.7 b show the predicted interaction: generalization ratings increased with causal strength when N was an effect but decreased when it was a cause. (p. 395)
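To see how Equation 6 yields these opposing predictions, consider a minimal sketch in Python. It assumes a single base feature B with validity c, a causal link of strength m, no alternative cause of the effect (as in Experiment 2), and an illustrative prior p on N when N is the cause; the numerical values are for illustration only and are not those used in the experiment.

def prevalence_n_is_effect(c, m):
    # B -> N: forward (predictive) reasoning; N arises only via B.
    p_n_given_b1 = m
    p_n_given_b0 = 0.0
    return c * p_n_given_b1 + (1 - c) * p_n_given_b0

def prevalence_n_is_cause(c, m, p):
    # N -> B: diagnostic reasoning via Bayes' rule. With no alternative
    # cause of B, observing B implies that N must be present.
    p_n_given_b1 = 1.0
    # P(N=1 | B=0) = P(B=0 | N=1) P(N=1) / P(B=0)
    p_n_given_b0 = (1 - m) * p / (1 - p * m)
    return c * p_n_given_b1 + (1 - c) * p_n_given_b0

for m in (0.67, 1.0):  # weak versus strong causal link
    print(f"m={m}: N as effect -> {prevalence_n_is_effect(0.67, m):.2f}, "
          f"N as cause -> {prevalence_n_is_cause(0.67, m, 0.5):.2f}")

Running the sketch reproduces the predicted interaction: the prevalence of N rises with link strength when N is the effect (0.45 to 0.67 under these assumed values) and falls when N is the cause (0.75 to 0.67), because a stronger link makes a rarer cause sufficient to explain the base feature's observed validity.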

Figure 21.7 Generalization ratings from Rehder (2009). (a) Experiment 1. (b) Experiment 2. (c) Experiment 3. Error bars are standard errors of the mean.

Experiment 3 varied the number of typical features to which the novel property was causally related, between one and three, as shown in Figure 21.7 c. The results confirmed the intuition that generalizations would be stronger when the novel property was involved in more causal relations. For example, when N is an effect, it will be generated (p. 396) more often when it has three causes that generate it rather than just one.

Of course, the results in Figure 21.7 again reveal that feature generalizations are promoted by diagnostic as well as prospective reasoning.8

Whereas the results in Figure 21.7 c generally support the model represented by Equation 6, Rehder (2009) also reported individual difference analyses showing that in several experiments a substantial minority of subjects failed to exhibit any sensitivity to the direction of the causal relations. This finding echoes those of Bright and Feeney (2014) reviewed earlier: given that causal-based generalizations require considerable cognitive resources to compute, it may not be surprising that some subjects fall back on simpler, "associative" reasoning strategies (cf. Rehder, 2014b).


Others have found evidence of diagnostic reasoning in support of generalization. Hadjichristidis, Sloman, Stevenson, and Over (2004) investigated category-to-category generalizations by manipulating the number of category features that "depend on" the to-be-generalized property. For example, whereas some subjects were told that many of a seal's physiological functions depend on a particular hormone, others were told that only a few did. Still others were given no information about the hormone's dependents. Subjects were then asked to rate the likelihood that the hormone appeared in another category (e.g., dolphins). Generalization ratings were highest when the hormone had many dependents and lowest when no information about dependents was provided. Whereas Hadjichristidis et al. interpreted these findings in terms of Sloman et al.'s (1998) dependency model (a hormone with many dependents was more strongly generalized because it was more "central"), Rehder (2009) interpreted them as reflecting diagnostic reasoning: because dolphins are similar to seals, they can be assumed to possess many of the same physiological functions, which can then be used to infer the presence of the hormone. Consistent with this interpretation, Hadjichristidis et al. found that the effect of the number of dependents decreased when the target category was less similar to the source (sparrows rather than dolphins). Because sparrows are unlikely to have many of a seal's physiological functions, they are also unlikely to have the hormone.

Object-to-Object Generalizations

While inter-feature causal relations often dominate similarity-based effects in feature generalization, we have seen how similarity sometimes continues to have an influence even on non-blank features (e.g., Rehder, 2006, Experiment 3, Figure 21.6 b, and Hadjichristidis et al., 2004). Kemp et al. (2012) developed a hybrid model that combines the effects of both causal reasoning and taxonomic similarity when features are generalized across objects from different categories. This model, summarized in Figure 21.8 a, includes two important innovations. First, categories are organized in a hierarchy, and the path length between any two categories in the hierarchy represents their similarity. According to the hierarchy in Figure 21.8 a, a mouse is most similar to a rat, less similar to a squirrel, and least similar to a sheep. Similarity then influences feature generalization (e.g., a rat will be more likely than a sheep to possess a novel mouse feature, all else being equal; also see Tenenbaum, Kemp, & Shafto, 2007, and Kemp & Tenenbaum, 2009).

Second, inter-feature causal relations are represented in a way that allows the generalization of their operation to also be influenced by similarity, namely, as a type of functional (causal) model (Pearl, 2000). For example, suppose mice have features X, Y, and Z related in a causal chain, X → Y → Z. Whereas the causal models considered thus far have treated causal links as probabilistic relations, in Figure 21.8 a the links X → Y and Y → Z are deterministic, and T_XY and T_YZ (T for "transmission") are unobserved variables that function as enabling conditions. That is, X and T_XY are conjunctive causes in that both must be present in order to produce Y (and likewise for Y, T_YZ, and Z). In addition, B_X, B_Y, and B_Z are deterministic but unobserved ("background") causes of X, Y, and Z, respectively. The advantage of representing causal relations as a functional model is that the T and B variables, though unobserved, are themselves subject to similarity-based generalization.
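The functional representation just described can be sketched in a few lines of Python. The Bernoulli priors below are illustrative assumptions, and the model's key feature, the similarity-based sharing of the T and B variables across animals, is omitted for brevity; here those variables are simply sampled independently.

import random

def sample_chain(t_xy, t_yz, b_x, b_y, b_z):
    # One sample from the functional causal chain X -> Y -> Z. The t_* values
    # are the unobserved "transmission" enablers of each link and the b_*
    # values the unobserved background causes; each is a boolean. Links are
    # deterministic: an effect occurs iff (its cause AND the link's enabler)
    # holds, or its background cause is present.
    x = b_x
    y = (x and t_xy) or b_y
    z = (y and t_yz) or b_z
    return x, y, z

random.seed(0)
samples = [sample_chain(*(random.random() < p for p in (0.9, 0.9, 0.5, 0.1, 0.1)))
           for _ in range(10000)]
p_y_given_x = sum(y for x, y, z in samples if x) / sum(x for x, y, z in samples)
print(f"P(Y | X present) is approximately {p_y_given_x:.2f}")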

Results from Kemp et al.'s Experiment 2 illustrate this principle. Subjects were told that they would learn the results of tests of a number of enzymes in a mouse, rat, squirrel, and sheep. They were also given causal knowledge relating the enzymes (and shown training data that instantiated those relations); here I focus on the condition in which three enzymes, X, Y, and Z, were described as forming a causal chain, X → Y → Z. Subjects were then told that the mouse had tested positive for X (a situation represented in the left side of Figure 21.8 b) and were asked to predict the likelihood that each (p. 397) animal had each enzyme. The results in the right side of Figure 21.8 b show that subjects were sensitive to both similarity and causal relations. X was generalized most strongly from the mouse to the rat and most weakly to the sheep, reflecting similarity. And Y was rated more likely in the mouse than Z, reflecting causality. Importantly, both factors combined: Y was rated more likely in the rat than in the sheep, suggesting that the (similarity-based) inference from X in the mouse to X in the rat was followed by a (causal-based) inference from X in the rat to Y in the rat.

Figure 21.8 (a) A taxonomic representation of four categories, capturing the fact that, for example, mice are more similar to rats than to sheep. (b and c) Generalization ratings from Kemp et al. (2012), Experiment 2. Error bars are standard errors of the mean. White plot symbols in the charts represent information that was given to the subjects. Variable settings (e.g., X = 1) reflect the state of the variable as either present (1) or absent (0).

After the first set of test questions, subjects were given the additional information that the rat, squirrel, and sheep also had X but that the mouse did not have Y, and were then asked to again estimate the presence of each enzyme. The results in Figure 21.8 c illustrate how the failure of a causal link to operate in one animal can generalize to another: (p. 398) because X → Y didn't operate in the mouse (X is present but Y is absent), subjects apparently reasoned that it was also unlikely to operate in the rat, and gave a low likelihood rating to Y as a result.


That this generalization was also sensitive to similarity is reflected in the much higher rating for Y in the least similar category (sheep). The Kemp et al. model reproduces this effect because the absence of the enabling condition for X → Y in the mouse (i.e., variable T_XY in Figure 21.8 a) was generalized, along with all the other variables, according to inter-category similarity. Kemp et al. demonstrated that their model was also able to reproduce the results for a number of topologies besides causal chains. Note that unlike Experiment 1 from Rehder (2006) (but like Experiment 3 in that study), Kemp et al. found that the results in Figure 21.8 were manifested by the majority of subjects, demonstrating that causal- and similarity-based inferences can simultaneously influence the same judgment (also see Stephens et al., 2009).

Category-to-Category Generalizations

Shafto et al. (2008) developed a model of the feature generalizations made on the basis of the inter-category causal relations reviewed earlier, namely, predator–prey relations (Medin et al., 2003; Shafto & Coley, 2003). This model consists of a causal network—a "food web"—whose nodes represent categories rather than features and whose edges represent the causal transmission of diseases between species. Nevertheless, it includes the same types of parameters seen previously, namely, ones that represent the strength of the causal relations (what Shafto et al. referred to as the "transmission rate") and alternative causes (the "background rate"); it also assumes the standard noisy-OR integration function. From this causal model, a joint distribution can be derived that represents how diseases are likely to be distributed over species; from the joint, any conditional probability (e.g., that one animal has a disease if another does) can be derived. An innovative aspect of this model is that it is abstract—it applies to any property that might be transmitted through ingestion.

Shafto et al.'s Experiment 1 asked subjects to generalize transmittable properties (e.g., "bacteria XD") across three species related by predator–prey relations (e.g., carrots–rabbits–foxes). Their findings revealed a causal asymmetry effect (e.g., bacteria XD was generalized more strongly from carrots to rabbits than vice versa) and also a causal distance effect (XD was generalized more strongly from carrots to rabbits than to foxes). These findings also obtained when the animal categories were "blank" (were referred to as "Animal X," "Animal Y," etc.) and subjects were instructed on the predator–prey relations (Experiment 2), and for more complicated causal network topologies (Experiments 2 and 3). The Shafto et al. model can reproduce the causal asymmetry effect assuming that all nodes have the same background rate, and the causal distance effect assuming a transmission rate < 1 (for simple three-node chain networks). Also as predicted, the pattern of results changed when a property presumably not transmittable via ingestion was tested: the generalization of "gene XD" was dominated by the categories' taxonomic similarity rather than by the food web.
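A toy version of such a food web illustrates how a noisy-OR chain with equal background rates produces both effects. The sketch below assumes a carrot → rabbit → fox chain; the background rate B and transmission rate T are invented values, not parameters fitted by Shafto et al.

from itertools import product

B, T = 0.1, 0.6  # assumed background and transmission rates

def p_node(value, parent_sick):
    # Noisy-OR probability that a species is sick given its prey's state.
    p_sick = 1 - (1 - B) * ((1 - T) if parent_sick else 1)
    return p_sick if value else 1 - p_sick

def joint(c, r, f):
    p_c = B if c else 1 - B  # the carrot has only its background cause
    return p_c * p_node(r, c) * p_node(f, r)

def conditional(query, given):
    num = sum(joint(c, r, f) for c, r, f in product((0, 1), repeat=3)
              if query((c, r, f)) and given((c, r, f)))
    den = sum(joint(c, r, f) for c, r, f in product((0, 1), repeat=3)
              if given((c, r, f)))
    return num / den

print("prey -> predator:", conditional(lambda s: s[1], lambda s: s[0]))  # P(rabbit | carrot)
print("predator -> prey:", conditional(lambda s: s[0], lambda s: s[1]))  # P(carrot | rabbit)
print("two links away:  ", conditional(lambda s: s[2], lambda s: s[0]))  # P(fox | carrot)

Under these assumed values, the prey-to-predator inference (0.64) exceeds the predator-to-prey inference (0.42), the asymmetry effect, and the two-link inference (0.45) is weaker than the one-link inference, the distance effect.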



Causal-Based Category Discovery

This third and final section asks how the causal regularities that obtain in the world affect which categories are formed in the first place. According to the Kemp and Jern (2013) taxonomy alluded to earlier, learning which categories there are is a problem of category discovery. Since the 1970s, a standard view of the origins of humans' representations of categories has been that they reflect the objective similarity-based (i.e., "family resemblance") clusters that obtain in the environment (Rosch, 1978), and numerous computational models that simulate this process have been proposed (e.g., Anderson, 1991; Fisher, 1987; Love, Medin, & Gureckis, 2004; Pothos & Chater, 2002; Pothos, Chater, & Hines, 2011; Sanborn, Griffiths, & Navarro, 2010). But research has begun to assess how those representations are also determined by causal knowledge. The first subsection below presents studies that have assessed the effect of causal relations among category features, whereas the second presents those examining causal relations among the categories themselves.

Category Discovery and Inter-Feature Causal Relations

Researchers interested in the origins of concepts have used the category construction task, in which (p. 399) subjects are asked to sort items with multiple features into whatever categories they find most natural. Note that category construction studies differ from many of those reported in the category learning literature that used supervised learning (i.e., subjects learn by trying to classify items and getting immediate feedback about whether they were right or wrong). Although supervised learning sheds light on basic learning processes, it says little about how people naturally group items together (because that grouping is determined by the experimenters). By virtue of being unsupervised (people are free to sort items however they choose), the category construction task fills that gap.

One main finding from this literature is that, surprisingly, people generally do not freely sort items into the similarity-based clusters that characterize natural categories. Instead, they usually sort on the basis of a single feature dimension (Medin, Wattenmaker, & Hampson, 1987; Regehr & Brooks, 1995; also see Ahn & Medin, 1992). In a review, G. L. Murphy (2002) interpreted this finding as indicating that the tendency to form similarity-based clusters is overridden by an even stronger bias (perhaps a product of Western education) to identify a single property that explains a phenomenon. Importantly, though, this uni-dimensional sorting bias can itself be overridden in particular circumstances, namely, when there is semantic knowledge that links the features of the categories (Kaplan & Murphy, 1999; Lassaline & Murphy, 1996; Medin et al., 1987; Spalding & Murphy, 1996).

Here I review a study by Ahn (1999) that asked how the categories that people construct vary with the introduction of inter-feature causal relations. Subjects were presented with 10 index cards, each presenting an object with features drawn from four trinary (three-valued) dimensions. For example, in one set of materials the dimensions reflected features of houses: floor type (concrete, wood, or straw), wall type (reinforced, half-reinforced, or non-reinforced), number of windows (3, 6, or 10), and door size (small, medium, or large).

The features were distributed among objects so that they naturally formed two "family resemblance" clusters: half the cards had at least three out of four of one set of features (concrete floor, reinforced walls, 10 windows, large doors), which thus played the role of the prototype of one cluster. The other half had at least three out of four features of another prototype (straw floor, non-reinforced walls, 3 windows, small doors). The causal knowledge that accompanied these materials was varied. In a common cause condition, subjects were instructed on causal knowledge that linked the four stimulus dimensions to a fifth variable. For buildings, that variable was whether the house was designed for a public or a private purpose. Reasons were given for why (a) a public house would need concrete floors, reinforced walls, many windows, and large doors, and (b) a private house would need straw floors, non-reinforced walls, few windows, and small doors. (Note that the to-be-sorted index cards did not specify whether a house had a public or private function.) In a common effect condition, each of the dimensions was described as a cause of a fifth variable. In a third condition, the dimensions caused each other so as to form a causal chain, and in a fourth, control condition, no additional knowledge was provided. Subjects were then asked to sort the 10 cards into two categories (of any size). As expected, most sorts in the control condition reflected the familiar uni-dimensional bias. But those in the common cause and common effect conditions instead reflected the family resemblance clusters. Interestingly, sorts in the chain condition were also uni-dimensional and did not differ significantly from those in the control condition.

These results establish that inter-feature causal knowledge can affect how people choose to cluster items into categories. Nevertheless, whether it does so depends on the topology of the causal relations. Ahn concluded that what distinguishes common cause and common effect structures is that they provide the means to assimilate the features to a single explanatory property (in the case of buildings, whether they have a public or private function). In this regard, her results might be interpreted as reflecting a uni-dimensional strategy, with the caveat that the dimension is itself not directly observable. Rather, its presence has to be inferred from the observable features. The chain structure, in contrast, provided no single variable around which to unify the observable features. Nonetheless, these results establish that inter-feature causal relations can affect the categories that people spontaneously form. A sketch of the family-resemblance structure that subjects had to sort appears below.
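The following Python sketch is an illustrative reconstruction of a card set with the family-resemblance structure just described: two prototypes over four trinary dimensions, with every card keeping at least three prototype values. The exact items Ahn used are not specified here; this merely makes the structure concrete.

import random

DIMENSIONS = {
    "floor": ("concrete", "wood", "straw"),
    "walls": ("reinforced", "half-reinforced", "non-reinforced"),
    "windows": (10, 6, 3),
    "doors": ("large", "medium", "small"),
}
PROTO_A = {d: vals[0] for d, vals in DIMENSIONS.items()}   # public-style house
PROTO_B = {d: vals[-1] for d, vals in DIMENSIONS.items()}  # private-style house

def cluster(prototype, seed=0):
    # The prototype plus one card deviating on each dimension: every card
    # matches its prototype on at least three of the four dimensions.
    rng = random.Random(seed)
    cards = [dict(prototype)]
    for dim in DIMENSIONS:
        card = dict(prototype)
        card[dim] = rng.choice([v for v in DIMENSIONS[dim] if v != prototype[dim]])
        cards.append(card)
    return cards

deck = cluster(PROTO_A)[:5] + cluster(PROTO_B, seed=1)[:5]  # 10 cards total
for card in deck:
    print(card)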

Category Discovery and Inter-Category Causal Relations

A traditional view from philosophy is that categories are delineated not by similarity but instead by their causal relations with each other (Lewis, 1929). The intimate connection between categories and causal relations becomes apparent when one recognizes that a causal relation holds between one type of object or event that serves as the cause and (p. 400) another that serves as the effect. Thus, in situations where causal learners have some latitude regarding how to group objects, how they do so might be affected by the observed pairings of cause-and-effect objects and events.


Indeed, Lien and Cheng (2000) presented evidence of how causal learning can affect how stimuli are classified. Their study asked subjects to learn how the addition of various kinds of substances to a type of flowering plant led to the plant blooming or not. In their Experiment 1, the substances varied along two continuous dimensions. One was color, which varied from midnight blue on one end of the spectrum to brown on the other. The other was shape, which varied from highly regular (high symmetry) to irregular (no symmetry). These dimensions lend themselves to alternative classification schemes. The colors could be grouped at an intermediate level (blue, green, yellow, and red), a more subordinate level (e.g., the greens could be divided into forest green, pine green, and light aqua), or a more superordinate level (e.g., the blues and greens could be grouped as "cool" colors and the yellows and reds as "warm" colors). The shape dimension had analogous alternative hierarchical groupings. In one condition, subjects observed learning data in which the addition of substances with cool colors to the flowering plants in a particular garden usually led to blooming, whereas ones with warm colors usually did not. In another, irregular shape predicted blooming and regular shape predicted its absence. The results from a subsequent test that presented substances outside the range of the training stimuli (e.g., ones with a very warm color [orange] or a very irregular shape) suggested that subjects had induced a causal relation at the highest level of abstraction (e.g., cool colors → blooming). The authors interpreted this finding as indicating that causal learners induce categories at the level that affords the greatest predictability.9 This conclusion was bolstered by the additional finding that the induced causal rule affected how subsequent data from a novel context (i.e., from gardens other than the one in which the original training data were presented) were interpreted.

These findings amount to an existence proof that inter-object causal regularities provide one source of constraint on how objects are categorized. Yet one might ask whether that constraint is always sufficient to overturn a system of categorization that is already established. A study by Waldmann and Hagmayer (2006) suggests that it is not. As in Lien and Cheng, the stimuli varied along two dimensions: the size of a virus and its brightness. Unlike Lien and Cheng, however, subjects first learned how to classify these items before observing any causal learning data. In Waldmann and Hagmayer's Experiment 3, for example, subjects in one condition learned to classify the viruses into categories (labeled allovedic and hemovedic) that were defined by size. This condition is illustrated in Figure 21.9 a, in which the larger (allovedic) viruses presented during training are shown in the bottom half of the grid and the smaller (hemovedic) viruses are shown in the top half. In a second condition, brightness was the defining dimension: in Figure 21.9 b, the darker, allovedic viruses are on the right and the lighter, hemovedic viruses are on the left.
This category learning phase was followed by a causal learning phase in which subjects were told that certain viruses caused a disease-related symptom called splenomegaly (a swelling of the spleen), in the manner shown in Figure 21.9 c: only viruses below and to the left of the diagonal line were accompanied by splenomegaly. Importantly, Figure 21.9 c indicates that the allovedic–hemovedic distinction was an imperfect predictor of splenomegaly, regardless of whether those categories were defined by size or brightness.

In a subsequent test phase, subjects were presented with viruses not seen in the causal learning phase and were asked whether they led to splenomegaly. To determine whether the allovedic and hemovedic categories would affect the generalization of splenomegaly, Waldmann and Hagmayer focused on the four test items shown in Figure 21.9 d. Consider the test items in the upper left quadrant. For those subjects who learned that allovedic and hemovedic viruses were distinguished by size (Figure 21.9 a), assimilating the causal learning data (Figure 21.9 c) to the categories would lead one to conclude that allovedic viruses usually cause splenomegaly and thus that the upper left test items (which are hemovedic for these subjects) would not result in splenomegaly. Conversely, for those who learned that allovedic and hemovedic viruses were distinguished by brightness (Figure 21.9 b), assimilating the causal learning data would lead one to conclude that hemovedic viruses usually cause splenomegaly and thus that the upper left test items would result in splenomegaly. This is what was found: the judged likelihood that those viruses would yield splenomegaly was lower in the size condition than in the brightness condition. An analogous result obtained for the test viruses in the lower right quadrant of Figure 21.9 d. Importantly, these (p. 401) results obtained despite the fact that the causal data implied that the most predictive categories were those defined by the diagonal line in Figure 21.9 c. Clearly, although causal data provide one constraint on categories (Lien & Cheng, 2000), that constraint does not obligate learners to abandon a pre-existing system of categories.

Figure 21.9 Design of Experiment 3 of Waldmann and Hagmayer (2006). (a) A category distinction between the allovedic and hemovedic viruses based on size. (b) A category distinction based on brightness. (c) Causal learning data indicating which viruses led to splenomegaly. (d) Critical test items assessing how splenomegaly was generalized.

Nevertheless, another set of conditions in Waldmann and Hagmayer's Experiment 3 reveals that causal learners' commitment to pre-existing categories is not absolute. In the causal learning phase of those conditions, subjects were informed that the viruses in Figure 21.9 c had been shown to designers who judged whether people would like their (p. 402) abstract pattern.

Rather than being told whether the viruses led to splenomegaly, these subjects were told which viruses were liked by the designers. When generalizing the likability judgments to the test items in Figure 21.9 d, the effect of whether the initial categories were defined by size or brightness disappeared. Waldmann and Hagmayer concluded that this result obtained because, although a type of virus, by virtue of being a natural kind, can easily be conceived of as responsible for a biological feature like splenomegaly, it is unlikely to be viewed as lawfully related to whether people like its abstract shape. That is, although existing categories can affect how causal data are interpreted, whether they do so depends on the plausibility of those categories as a potential cause of the effect.

Marsh and Ahn (2009) demonstrated yet another way in which causal learning affects how stimuli are classified. Subjects viewed a sequence of causal learning trials that represented how one-dimensional stimuli (e.g., bacteria that varied in height) covaried with a compound (e.g., a type of protein). In their Experiment 1, a strong positive correlation obtained such that tall bacteria predicted the presence of the protein and short ones predicted its absence. However, a subset of ambiguous trials presented bacteria of medium height. In the ambiguous cause, effect present (AE) condition, those bacteria were always accompanied by the protein; in the ambiguous cause, effect absent (AĒ) condition, they never were. In a subsequent test, subjects estimated how many instances of each of the non-ambiguous trial types they had seen. Subjects in the AE condition estimated that they had seen more tall-bacteria/protein-present trials than did those in an appropriate control condition. Because the actual frequencies were in fact equal in the two conditions, Marsh and Ahn concluded that the medium-bacteria/protein-present trials were interpreted as cases in which the bacteria were tall. Conversely, subjects in the AĒ condition overestimated the frequency of short-bacteria/protein-absent trials, a result suggesting that the medium-bacteria/protein-absent trials were interpreted as cases in which the bacteria were short. In other words, causal learning data led subjects to alter how they grouped the medium bacteria (with the tall bacteria in the AE condition and with the short ones in the AĒ condition). In a subsequent experiment, Marsh and Ahn showed how these alternative interpretations of ambiguous stimuli led to differences in learners' estimates of the causal strength between a cause and effect.

Summary of Category-Based Induction

This chapter has reviewed studies that have examined the influence of causal knowledge on category-based induction. In feature prediction, inter-feature causal relations produce signature causal reasoning phenomena such as explaining away. In feature generalization, causal knowledge has large effects on how novel features are generalized between categories, between objects, and from an object to a category. Finally, causal knowledge—both between features and between the categories themselves—influences how people naturally cluster objects into categories.


These findings contrast with early research in category-based induction that emphasized the influence of similarity relations. It is important to note, however, that experiments of that era often made use of materials that afforded subjects no opportunity to make use of the sorts of inter-feature and inter-object causal relations that are emphasized here. This chapter has shown that when such relations are available, attention is directed away from causally irrelevant properties, such that the role of overall similarity is reduced and sometimes eliminated entirely. On the assumption that reasoners usually have at least some (perhaps vague) ideas regarding the mechanisms via which features arise, it is likely that the use of causal knowledge in category-based induction is the norm rather than the exception.

Despite the predictive successes of causal models, some systematic differences between the causal model-based predictions and human judgments were found. There were two prominent failures. One was that feature predictions often violate the patterns of conditional independence stipulated by the Markov condition. Another was that the influence of alternative causes is often ignored in forward (cause to effect) inferences. One lesson to be taken from these demonstrations is that it is important to consider the possibility that subjects are reasoning on the basis of knowledge that is different from that assumed by the experimenters—an apparent reasoning error may disappear once an accurate representation of the reasoners' causal knowledge is obtained. Yet it appears that some errors—those reflecting an "associative" bias and a tendency to neglect alternative causes—cannot be attributed to knowledge. If good news can be taken from these failures, it is that they are not (p. 403) specific to categories, but rather reflect general principles of human causal reasoning. So, at this juncture there is every reason to believe that a complete theory of human causal reasoning will provide a comprehensive account of causal-based inductive judgments.

The category discovery studies reviewed in the previous section demonstrate that the causal interactions between types of entities influence how those entities are organized into types in the first place. Importantly, the reciprocal relationship between categories and causal learning suggests the need for models that integrate multiple learning inputs, and indeed such models have begun to appear. For example, Kemp, Goodman, and Tenenbaum (2007, 2010) propose a hierarchical Bayesian model that simultaneously learns both how objects should be organized into kinds and how the kinds are causally related (and demonstrate how it accounts for data such as Lien and Cheng's, and Waldmann and Hagmayer's) (also see Griffiths & Tenenbaum, 2009; Tenenbaum & Niyogi, 2003; Waldmann, 1996). The need for additional constraints on how categories are formed is key because of well-known issues associated with purely similarity-based clustering approaches, namely, the fact that the features that people perceive in objects (and which contribute to inter-object similarity) are themselves context dependent and vary depending on how items are classified (Austerweil & Griffiths, 2011, 2013; Goodman, 1970; Medin, Goldstone, & Gentner, 1993; Murphy & Medin, 1985).

A full account of concept discovery will require specification of how the learning of features, categories, and causal relations mutually influence one another.

Summary of Concepts as Causal Models

This chapter and its companion ("Concepts as Causal Models: Categorization," Chapter 20 in this volume) have reviewed how treating concepts as causal models provides accounts of two key phenomena, classification and category-based induction. The first subsection of this summary compares the causal model approach to alternative frameworks. The following sections then present some new problems to which causal models need to be applied and ways in which the basic representational machinery of causal models needs to be extended. Some preliminary evidence that bears on those proposals is presented.

Alternative Models

Causal models can be contrasted with the connectionist model of semantic knowledge proposed by Rogers and McClelland (2004, 2011). As a feed-forward network, this model accepts concepts (and contexts10) as inputs and yields concepts and features as outputs. As a result, it can verify propositions such as "robins have wings" and "robins are birds." One problem faced by this model is that because features are not part of the input, it is inapplicable to many of the problems addressed in these chapters, including categorization (a mapping from features to concepts) and feature prediction (a mapping from category labels and features to other features). Nevertheless, Rogers and McClelland suggest that a more complete implementation based on recurrent networks could address these limitations (cf. Kurtz, 2007; Rehder & Murphy, 2003). Of more fundamental concern is the model's lack of any explicit representation of the inter-feature and inter-category causal relations that have been the focus of these chapters. Of course, networks extended with the appropriate connections may be able to learn the complex patterns of inter-feature and inter-category correlations implied by causal structure.11 However, this ignores both the many studies reviewed earlier that simply instructed subjects on causal relations without providing data, and those in which judgments varied with explicit causal beliefs when correlations were held constant (e.g., Malt & Smith, 1984). While learning from experience is obviously important, my own suspicion is that the large majority of causal inferences that people draw daily are based on beliefs they have been told (by friends, family, teachers, and the web) rather than on correlations they have observed. Other concerns include whether such networks are suitable for capturing more complex representations, such as higher-order features and relations (Kemp & Jern, 2013). Of course, to be fair, advocates of structured representations (e.g., causal models) must stipulate how those representations come into being in the first place (Kemp & Tenenbaum, 2008, 2009; Kemp, Tenenbaum, Niyogi, & Griffiths, 2010).

Another alternative is Sloman et al.'s (1998) dependency model, which models the effects of inter-feature causal relations (where a cause–effect relation is a type of "dependency").

Concepts as Causal Models: Induction dence in favor of this model was reviewed earlier in this chapter, where it was shown that a feature (e.g., a hormone) was generalized from one category to another (e.g., (p. 404) from seals to dolphins) more strongly when the feature was a cause of many other fea­ tures of seals (albeit this effect was moderated by the degree of inter-category similarity) (Hadjichristidis et al., 2004). Yet, I showed there that such results can be interpreted as an act of diagnostic reasoning in which the central feature’s effects are used to infer its presence in a target category. More decisive evidence against the dependency model was presented in the companion chapter on categorization, where it was shown that it ac­ counts for very few of the phenomena reported in the causal-based categorization litera­ ture (e.g., the coherence effect briefly presented here). In contrast, the causal model approach benefits from explicitly representing certain key properties of causal relations—properties that reasoners are especially sensitive to. In both categorization and induction, they are sensitive to causal direction, as indicated by the different kinds of inferences they draw for common cause and common effect net­ works (which are identical except for the direction of causality). They are sensitive to whether multiple causal influences combine—these chapters have presented multiple ex­ amples of how inferences vary as a function of whether causes combine independently or conjunctively. Finally, the magnitudes of the phenomena presented here were influenced by how the causal models were parameterized, namely, they increased as a function of the strength of the causal links. Results like these will remain a challenge for any account that fails to model these key aspects of causal knowledge.

New Tasks An important desiderata for any theory of conceptual representation is that it account for performance on multiple tasks. This is so because explanations of behavior that appeal to complex cognitive representations face the problem of identifiability, that is, the possibili­ ty that the behavioral phenomenon being explained arises instead from mental processes specific to the task (Anderson, 1990). Because a defining property of any mental repre­ sentation is that it be accessible from multiple mental processes, evidence for the psycho­ logical reality of a representation can accrue through converging operations (Danks, 2014; Jones & Love, 2011). In fact, the chapter on categorization demonstrated how two distinct types of judgments—category membership and the prevalence of features within a category—can be accounted for by a causal model (by mapping them onto its joint and marginal probabilities). And, the current chapter showed how feature predictions maps onto a causal model’s conditional probabilities. Of course, causal models have also been implicated in a variety of tasks not directly related to categories, such as causal reason­ ing more generally (Holyoak et al., 2010; Kemp & Tenenbaum, 2009; Kemp et al., 2012; Lee & Holyoak, 2008; Meder, Mayrhofer, & Waldmann, 2014; Oppenheimer, 2004), causal learning (Cheng, 1997; Gopnik et al., 2004; Griffiths & Tenenbaum, 2005; 2009; Lu et al., 2008; Sobel, Tenenbaum, & Gopnik, 2004; Waldmann & Holyoak, 1992), interventions and counterfactuals (Rips, 2010; Rips & Edwards, 2013; Sloman & Lagnado, 2005a; Wald­ mann & Hagmayer, 2005), and decision-making (Hagmayer & Meder, 2013; Hagmayer & Page 46 of 64

Concepts as Causal Models: Induction Sloman, 2009). Together, these demonstrations present converging evidence for the psy­ chological reality of causal models. One important goal for future research is to show how a causal model can account for the multiple judgment types generated by the same participant. Some new work has started in this direction. For example, Rehder (2014a) asked each subject to make judgments of category membership, conditional feature inferences, and feature likelihoods. By simulta­ neously fitting all the data from each subject, I showed that the same causal model (with the same parameters) could account for the three judgment types at once. There are other types of category-based judgments for which causal model accounts are still developing or absent entirely. For example, because information about some of an object’s features is often missing, classifiers sometimes have the option of acquiring it (e.g., doctors can choose which medical tests to order). Studies have shown that classi­ fiers generally choose sources of evidence that are diagnostic, that is, that increase cate­ gorization accuracy (e.g., Skov & Sherman, 1986; Slowiaczek et al., 1992) a conclusion corroborated by eye- and mouse-tracking studies (e.g., Blair, Watson, Walshe, & Maj, 2009; Kim & Rehder, 2011; Matsuka & Corter, 2008; Rehder & Hoffman, 2005a, 2005b). More recent work (Nelson, 2005, 2010) has assessed which of a number of quantitative measures (e.g., expected information gain, or EIG; Shannon, 1948) best describes their search performance. Preliminary studies suggest that those (p. 405) choices are strongly influenced by causal knowledge. For example, Martin and Rehder (2013) taught two con­ trasting categories, each with inter-feature causal relations, and then asked subjects to choose which dimension they would like to examine in order to classify an item. In fact, causal structure had a large impact on search choices; in particular, subjects were strong­ ly biased to sample a dimension if a value on a causally related dimension was already known (e.g., if a cause feature was present, select the corresponding effect dimension) (also see Morais, Olsson, & Schooler, 2011). Another important question concerns how people’s existing causal models of categories are integrated with observed category members. Some studies have shown that existing semantic relations between features alter how categories are learned (e.g., Murphy & Al­ lopenna, 1994; Rehder & Ross, 2001; Wattenmaker et al., 1986); others have given sub­ jects both a causal model and learning data and then tested their subsequent inferences (e.g., Meder et al., 2008, 2009; von Sydow et al., 2009, 2010; Waldmann & Hagmayer, 2005; see Rottman & Hastie, 2014, for a review). Yet few have specifically examined causal models and category members. One exception is Waldmann, Holyoak, and Fra­ tianne (1995), who found that the pattern inter-feature correlations that inhered in ob­ served category members interacted with whether the to-be-learned category’s features naturally formed a common cause and common effect causal network. This result pro­ vides yet more evidence that reasoners are sensitive to causal direction; Waldmann et al. found that the exact pattern of those correlations affected how the causal relations them­ selves were interpreted (cf. Rehder & Hastie, 2001). More research is needed to deter­

Page 47 of 64

Concepts as Causal Models: Induction mine how and to what extent the correlational structure of observed data is integrated in­ to a category’s causal model. Finally, too little research has considered how causal models relate to the fact that con­ cepts are related in a taxonomic hierarchy. On one hand, the study of Lien and Cheng (2000) reviewed earlier implies that causal relations can influence the hierarchical level at which items are naturally clustered. Yet, people have learned multiple clusters that give them the option of classifying the same object at different levels. No work has con­ sidered how the causal models of hierarchically nested categories, such as poodles, dogs, and animals, relate to one another, or what causal models have to say about one of the most important phenomenon in the field, namely, the existence of a basic level of catego­ rization (that an object will be labeled a dog before a poodle or an animal; Rosch et al., 1976). Further, the multiple categories to which an object belongs need not be nested, as in the case of cross-classification (a single individual can be a female, a mother, a runner, a board member, and a Libertarian). Little is known about whether and how multiple causal models are combined to, say, predict a feature of a single cross-classified individ­ ual.

New Representations Although the fact that the same representation can be applied to a diverse set of tasks ex­ hibits the power of causal models, it is also clear that this representation requires a num­ ber of extensions. One pressing problem is the presence of causal cycles in people’s rep­ resentations of concepts. For example, Kim and Ahn (2002) found that almost two-thirds of subjects’ causal models of mental disorders such as depression included cycles. Like­ wise, Sloman et al. (1998) found numerous cycles in subjects’ theories of everyday biolog­ ical kinds and artifacts. Cycles are a problem because causal models are stipulated to be acyclic. Two solutions for this problem have been offered. First, Kim, Luhmann, Pierce, and Ryan (2009) proposed that classifier’s treat causal models with cycles like the one in Figure 21.10 a as being “unfolded” in time so as to yield the new model in Figure 21.10 b, in which cycle has been replaced by (the second subscript represents time). Moreover, they proposed that features’ weights should be computed from the unfolded representation by applying the dependency model. Rehder and Martin (2011) proposed the use of dynamic Bayesian networks (DBNs) as a represen­ tation of cycles. DBNs represent how systems change over time (Dean & Kanazawa, 1989; Doshi-Velez, Wingate, Tenenbaum, & Roy, 2011; Friedman, Murphy, & Russell, 1998; Ghahramani, 1998; P. K. Murphy, 2002), and Bernard and Hartemink (2005) have advocated their usefulness in modeling causal cycles. On this account, the improper graphical model in Figure 21.10 a is replaced with the proper (dynamic) one in Figure 21.10 c. Finally, Rehder (in press) proposed that cycles should be represented as chain graphs, generalization of CGMs that permit the presence of undirected edges that can be interpreted as representing feedback relationships (Lauritzen & Richardson, 2002). While empirical tests of causal cycles are few, (p. 406) current evidence favors chain graphs (Re­ hder, in press). Page 48 of 64

Concepts as Causal Models: Induction

Figure 21.10 (a) A causal model with a cycles. (b) An unfolded dependency model represention. (c) A dy­ namic causal graphical model (DBN) with an unlimit­ ed number of unfolding. (d) A chain graph.

Another outstanding problem is the representation of uncertainty. For example, you may believe that all Roobans have sticky feet and that most of Myastars have high density and gravitational fluctuations and that the former causes the latter, but your confidence in these beliefs may vary depending on their source (e.g., samples that are large or small, individuals that are reliable or unreliable, etc.). The construct of uncertainty has become an important one in the field of causal learning. Griffiths and Tenenbaum (2005) reinterpreted the standard causal learning paradigm as one in which learners rate the relative merits of two structural hypotheses (one causal model in which the target causal link exists, and another where it does not). Lu, Yuille, Liljeholm, Cheng, and Holyoak (2008) proposed a parametric variant of this idea in which the prior distribution over a single causal model’s parameters is a two-dimensional density function on the strengths of the to-be-learned causal link and the alternative causes (also see Carroll, Cheng, & Lu, 2013; Meder et al., 2014). However, these proposals are oriented toward causal learning situations in which an assumption of mutual exclusivity is natural (the effect is produced by the target causal link or something else). McDonnell, Tsividis, and Rehder (2013) proposed a model for representing the uncertainty applicable to cases where the causal model is acquired via instruction, learning, or both. For example, Figure 21.11 represents the uncertainty associated with the causal model as a ß distribution on each of the model parameters. Because we assumed that beliefs about the parameters of a causal model come from multiple sources (e.g., instruction, first-hand observations, generic prior beliefs, etc.), the ß distributions are a result of integrating beliefs from each of those sources.

Concluding Remarks

More than 25 years have now passed since the observation that concepts of categories are embedded in the rich knowledge structures that make up our conceptual systems. What has changed is that insights regarding the effects of such knowledge have now been cashed out as explicit computational models. If anything is taken from these chapters, it

should be that the effects of causal relations are often large, broad (they are manifested in virtually all category-related judgments), and often eliminate the more traditional effects of similarity. As a result, key phenomena such as coherence effects (in categorization), explaining away (in feature prediction), and asymmetry effects (in feature generalization) have become (p. 407) benchmarks that all future models are obligated to account for. Causal models will not be the last word on concepts, of course, and phenomena captured by previous models (e.g., the abstraction of central tendencies by prototype models and the influence of individual cases by exemplar models) will continue to be important. But any model that fails to incorporate the causality that saturates our conceptual systems will remain a radically incomplete account of human categories.

Figure 21.11 A representation of uncertainty in a causal model.

References

Ahn, W. (1999). Effect of causal structure on category construction. Memory & Cognition, 27, 1008–1023.
Ahn, W., & Medin, D. L. (1992). A two-stage model of category construction. Cognitive Science, 16, 81–121.
Ali, N., Chater, N., & Oaksford, M. (2011). The mental representation of causal conditional reasoning: Mental models or causal models. Cognition, 119, 403–418.
Anderson, J. R. (1990). The adaptive character of thought. Hillsdale, NJ: Lawrence Erlbaum Associates.
Anderson, J. R. (1991). The adaptive nature of human categorization. Psychological Review, 98, 409–429.

Austerweil, J. L., & Griffiths, T. L. (2011). A rational model of the effects of distributional information on feature learning. Cognitive Psychology, 63, 173–209.
Bailenson, J. N., Shum, M. S., Atran, S., Medin, D. L., & Coley, J. D. (2002). A bird's eye view: Biological categorization and reasoning within and across cultures. Cognition, 84, 1–53.
Bernard, A., & Hartemink, A. (2005). Informative structure priors: Joint learning of dynamic regulatory networks from multiple types of data. In R. Altman, A. K. Dunker, L. Hunter, T. Jung, & T. Klein (Eds.), Proceedings of the Pacific Symposium on Biocomputing (pp. 459–470). Hackensack, NJ: World Scientific.
Blair, M., Watson, M. R., Walshe, R. C., & Maj, F. (2009). Extremely selective attention: Eyetracking studies of dynamic attention allocation to stimulus features in categorization. Journal of Experimental Psychology: Learning, Memory and Cognition, 35, 1196–1206.
Bright, A. K., & Feeney, A. (2014). The engine of thought is a hybrid: Roles of associative and structured knowledge in reasoning. Journal of Experimental Psychology: General, 143, 2082–2102.
Burnett, R. C. (2004). Inference from complex causal models (Doctoral dissertation). Retrieved from ProQuest Dissertations and Theses. (UMI No. 3156566)
Carey, S. (1985). Conceptual change in childhood. Cambridge, MA: MIT Press.
Carroll, C. D., Cheng, P. W., & Lu, H. (2013). Inferential dependencies in causal inference: A comparison of belief distribution and associative approaches. Journal of Experimental Psychology: General, 142, 845–863.
Cheng, P. (1997). From covariation to causation: A causal power theory. Psychological Review, 104, 367–405.
Coley, J. D. (2012). Where the wild things are: Informal experience and ecological reasoning. Child Development, 83, 992–1006.
Coley, J. D., & Vasilyeva, N. Y. (2010). Generating inductive inferences: Premise relations and property effects. In B. H. Ross (Ed.), The psychology of learning and motivation (Vol. 53, pp. 183–226). Burlington, VT: Academic Press.
Danks, D. (2014). Unifying the mind: Cognitive representations as graphical models. Cambridge, MA: MIT Press.
Dean, T., & Kanazawa, K. (1989). A model for reasoning about persistence and causation. Computational Intelligence, 5, 142–150.
Doshi-Velez, F., Wingate, D., Tenenbaum, J. B., & Roy, N. (2011). Infinite dynamic Bayesian networks. In Proceedings of the 28th International Conference on Machine Learning (ICML) (pp. 913–920).

Feeney, A., & Heit, E. (2007). Inductive reasoning: Experimental, developmental, and computational approaches. New York: Cambridge University Press.
Fernbach, P. M., Darlow, A., & Sloman, S. A. (2010). Neglect of alternative causes in predictive but not diagnostic reasoning. Psychological Science, 21, 329–336.
Fernbach, P. M., Darlow, A., & Sloman, S. A. (2011a). Asymmetries in predictive and diagnostic reasoning. Journal of Experimental Psychology: General, 140, 168–185. (p. 409)

Fernbach, P. M., Darlow, A., & Sloman, S. A. (2011b). When good evidence goes bad: The weak evidence effect in judgment and decision-making. Cognition, 119, 459–467.
Fernbach, P. M., & Erb, C. D. (2013). A quantitative causal model theory of conditional reasoning. Journal of Experimental Psychology: Learning, Memory, & Cognition, 39, 1327–1343.
Fernbach, P. M., & Rehder, B. (2012). Toward an effort reduction framework for causal inference. Argument and Computation, 4, 1–25.
Fisher, D. H. (1987). Knowledge acquisition via incremental conceptual clustering. Machine Learning, 2, 139–172.
Friedman, N., Murphy, K. P., & Russell, S. J. (1998). Learning the structure of dynamic probabilistic networks. In Proceedings of the fourteenth conference on uncertainty in artificial intelligence (UAI) (pp. 139–147). San Mateo, CA: Morgan Kaufmann.
Gelman, S. A. (1988). The development of induction within natural kinds and artifact categories. Cognitive Psychology, 20, 65–95.
Gelman, S. A. (2003). The essential child: The origins of essentialism in everyday thought. New York: Oxford University Press.
Gelman, S. A., & Coley, J. D. (1990). The importance of knowing that a dodo is a bird: Categories and inferences in 2-year-old children. Developmental Psychology, 26, 796–804.
Gelman, S. A., & Markman, E. M. (1986). Categories and induction in young children. Cognition, 23, 183–208.
Gelman, S. A., & O'Reilly, A. W. (1988). Children's inductive inferences with superordinate categories: The role of language and category structure. Child Development, 59, 876–887.
Gelman, S. A., & Wellman, H. M. (1991). Insides and essences: Early understandings of the nonobvious. Cognition, 38, 213–244.
Ghahramani, Z. (1998). Learning dynamic Bayesian networks. In C. Giles & M. Gori (Eds.), Adaptive processing of sequences and data structures (Vol. 1387, pp. 168–197). Berlin: Springer.


Glymour, C. (2001). The mind's arrows: Bayes nets and graphical causal models in psychology. Cambridge, MA: MIT Press.
Goodman, N. (1970). Seven strictures on similarity. In L. Foster & J. W. Swanson (Eds.), Experience and theory (pp. 19–29). Amherst: University of Massachusetts Press.
Gopnik, A., Glymour, C., Sobel, D. M., Schulz, L. E., & Kushnir, T. (2004). A theory of causal learning in children: Causal maps and Bayes nets. Psychological Review, 111, 3–23.
Griffiths, T. L., & Tenenbaum, J. B. (2005). Structure and strength in causal induction. Cognitive Psychology, 51, 334–384.
Griffiths, T. L., & Tenenbaum, J. B. (2009). Theory-based causal induction. Psychological Review, 116, 661–716. doi:10.1037/a0017201
Hadjichristidis, C., Sloman, S. A., Stevenson, R., & Over, D. (2004). Feature centrality and property induction. Cognitive Science, 28, 45–74.
Hagmayer, Y., & Meder, B. (2013). Repeated causal decision making. Journal of Experimental Psychology: Learning, Memory, and Cognition, 39, 33–50.
Hagmayer, Y., & Sloman, S. A. (2009). Decision makers conceive of themselves as interveners. Journal of Experimental Psychology: General, 138, 22–38.
Hausman, D. M., & Woodward, J. (1999). Independence, invariance, and the causal Markov condition. British Journal for the Philosophy of Science, 50, 521–583.
Hayes, B. K., & Heit, E. (2013). Induction. In D. Reisberg (Ed.), Oxford handbook of cognitive psychology (pp. 618–634). New York: Oxford University Press.
Hayes, B. K., Heit, E., & Swendsen, H. (2010). Inductive reasoning. Wiley Interdisciplinary Reviews: Cognitive Science, 1, 278–292.
Hayes, B. K., & Thompson, S. P. (2007). Causal relations and feature similarity in children's inductive reasoning. Journal of Experimental Psychology: General, 136, 470–484.
Heit, E. (2000). Properties of inductive reasoning. Psychonomic Bulletin & Review, 7, 569–592.
Heit, E., & Rubinstein, J. (1994). Similarity and property effects in inductive reasoning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 411–422.
Inagaki, K. (1990). The effects of raising animals on children's biological knowledge. British Journal of Developmental Psychology, 8, 119–129.
Jones, E. E., & Harris, V. A. (1967). The attribution of attitudes. Journal of Experimental Social Psychology, 3, 1–24.

Jones, M., & Love, B. C. (2011). Bayesian fundamentalism or enlightenment? On the explanatory status and theoretical contributions of Bayesian models of cognition. Behavioral and Brain Sciences, 34, 169–231.
Kaplan, A. S., & Murphy, G. L. (1999). The acquisition of category structure in unsupervised learning. Memory & Cognition, 27, 699–712.
Keil, F. C. (1989). Concepts, kinds, and cognitive development. Cambridge, MA: MIT Press.
Keil, F. C. (1995). The growth of causal understandings of natural kinds. In D. Sperber, D. Premack, & A. J. Premack (Eds.), Causal cognition: A multidisciplinary approach (pp. 234–262). Oxford: Clarendon Press.
Kelley, H. H. (1973). The process of causal attribution. American Psychologist, 28, 107–128.
Kemp, C., Goodman, N. D., & Tenenbaum, J. B. (2007). Learning causal schemata. In D. S. McNamara & G. Trafton (Eds.), Proceedings of the 29th annual conference of the Cognitive Science Society (pp. 389–394). Mahwah, NJ: Lawrence Erlbaum Associates.
Kemp, C., Goodman, N. D., & Tenenbaum, J. B. (2010). Learning to learn causal models. Cognitive Science, 34, 1185–1243.
Kemp, C., & Jern, A. (2013). A taxonomy of inductive problems. Psychonomic Bulletin & Review, 66, 85–125.
Kemp, C., Shafto, P., & Tenenbaum, J. B. (2012). An integrated account of generalization across objects and features. Cognitive Psychology, 64, 35–73.
Kemp, C., & Tenenbaum, J. B. (2008). The discovery of structural form. Proceedings of the National Academy of Sciences of the United States of America, 105, 10687–10692.
Kemp, C., & Tenenbaum, J. B. (2009). Structured statistical models of inductive reasoning. Psychological Review, 116, 20–58.
Khemlani, S. S., & Oppenheimer, D. M. (2010). When one model casts doubt on another: A levels-of-analysis approach to causal discounting. Psychological Bulletin, 137, 1–16.
Kim, N. S., & Ahn, W. (2002). Clinical psychologists' theory-based representations of mental disorders predict their diagnostic reasoning and memory. Journal of Experimental Psychology: General, 131, 451–476.
Kim, N. S., Luhmann, C. C., Pierce, M. L., & Ryan, M. M. (2009). Causal cycles in categorization. Memory & Cognition, 37, 744–758.
Kim, S., & Rehder, B. (2011). How prior knowledge affects selective attention during category learning: An eyetracking study. Memory & Cognition, 39, 649–665. (p. 410)


Komatsu, L. K. (1992). Recent views of conceptual structure. Psychological Bulletin, 112, 500–526.
Kurtz, K. J. (2007). The divergent autoencoder (DIVA) model of category learning. Psychonomic Bulletin & Review, 14, 560–576.
Lagnado, D. A., & Sloman, S. A. (2004). The advantage of timely intervention. Journal of Experimental Psychology: Learning, Memory, & Cognition, 30, 856–876.
Lassaline, M. E. (1996). Structural alignment in induction and similarity. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 754–770.
Lassaline, M. E., & Murphy, G. L. (1996). Induction and category coherence. Psychonomic Bulletin & Review, 3, 95–99.
Lauritzen, S. L., & Richardson, T. S. (2002). Chain graph models and their causal interpretations. Journal of the Royal Statistical Society: Series B, 64, 321–361.
Lee, H. S., & Holyoak, K. J. (2008). The role of causal models in analogical inference. Journal of Experimental Psychology: Learning, Memory and Cognition, 34, 1111–1122.
Lewis, C. I. (1929). Mind and the world order. New York: Charles Scribner's Sons.
Lien, Y., & Cheng, P. W. (2000). Distinguishing genuine from spurious causes: A coherence hypothesis. Cognitive Psychology, 40, 87–137.
Lopez, A., Atran, S., Coley, J. D., Medin, D. L., & Smith, E. E. (1997). The tree of life: Universal and cultural features of folkbiological taxonomies and inductions. Cognitive Psychology, 32, 251–295.
Lopez, A., Gelman, S. A., Gutheil, G., & Smith, E. E. (1992). The development of category-based induction. Child Development, 63, 1070–1090.
Love, B. C., Medin, D. L., & Gureckis, T. M. (2004). SUSTAIN: A network model of category learning. Psychological Review, 111, 309–332.
Lu, H., Yuille, A. L., Liljeholm, M., Cheng, P. W., & Holyoak, K. J. (2008). Bayesian generic priors for causal learning. Psychological Review, 115, 955–984.
Luhmann, C. C., & Ahn, W. (2007). BUCKLE: A model of unobserved cause learning. Psychological Review, 114, 657–677.
Luhmann, C. C., Ahn, W., & Palmeri, T. J. (2006). Theory-based categorization under speeded conditions. Memory & Cognition, 34, 1102–1111.
Lynch, E. B., Coley, J. D., & Medin, D. L. (2000). Tall is typical: Central tendency, ideal dimensions, and graded category structure among tree experts and novices. Memory & Cognition, 28, 41–50.


Malt, B. C., & Smith, E. E. (1984). Correlated properties in natural categories. Journal of Verbal Learning and Verbal Behavior, 23, 250–269.
Marsh, J. K., & Ahn, W.-k. (2009). Spontaneous assimilation of continuous values and temporal information in causal induction. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35(2). doi:10.1037/a0014929
Martin, J. B., & Rehder, B. (2013). Causal knowledge and information search during categorization. Paper presented at the 46th annual meeting of the Society for Mathematical Psychology, Potsdam, Germany.
Matsuka, T., & Corter, J. E. (2008). Observed attention allocation processes in category learning. The Quarterly Journal of Experimental Psychology, 61, 1067–1097.
Mayrhofer, R., Hagmayer, Y., & Waldmann, M. R. (2010). Agents and causes: A Bayesian error attribution model of causal reasoning. In S. Ohlsson & R. Catrambone (Eds.), Proceedings of the 32nd annual conference of the Cognitive Science Society (pp. 925–930). Austin, TX: Cognitive Science Society.
Mayrhofer, R., & Waldmann, M. R. (2015). Agents and causes: Dispositional intuitions as a guide to causal structure. Cognitive Science, 39, 65–95.
McClure, J. (1998). Discounting causes of behavior: Are two reasons better than one? Journal of Personality and Social Psychology, 74, 7–20.
McDonnell, J. V., Tsividis, P., & Rehder, B. (2013). Reasoning with inconsistent causal beliefs. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th annual conference of the Cognitive Science Society (pp. 1002–1007). Austin, TX: Cognitive Science Society.
Meder, B., Hagmayer, Y., & Waldmann, M. R. (2008). Inferring interventional predictions from observational learning data. Psychonomic Bulletin & Review, 15, 75–80.
Meder, B., Hagmayer, Y., & Waldmann, M. R. (2009). The role of learning data in causal reasoning about observations and interventions. Memory & Cognition, 37, 249–264.
Meder, B., Mayrhofer, R., & Waldmann, M. R. (2014). Structure induction in diagnostic causal reasoning. Psychological Review, 121, 277–301.
Medin, D. L., Coley, J. D., Storms, G., & Hayes, B. K. (2003). A relevance theory of induction. Psychonomic Bulletin & Review, 10, 517–532.
Medin, D. L., Goldstone, R. L., & Gentner, D. (1993). Respects for similarity. Psychological Review, 100, 254–278.
Medin, D. L., Wattenmaker, W. D., & Hampson, S. E. (1987). Family resemblance, conceptual cohesiveness, and category construction. Cognitive Psychology, 19, 242–279.
Medin, D. L., Waxman, S., Woodring, J., & Washinawatok, K. (2010). Human-centeredness is not a universal feature of young children's reasoning: Culture and experience matter when reasoning about biological entities. Cognitive Development, 25, 197–207.

Meunier, B., & Cordier, F. (2009). The biological categorizations made by 4- and 5-year-olds: The role of feature type versus their causal status. Cognitive Development, 24, 34–48.
Morais, A. S., Olsson, H., & Schooler, L. J. (2011). Does the structure of causal models predict information search? In B. Kokinov, A. Karmiloff-Smith, & N. J. Nersessian (Eds.), European perspectives on cognitive science (pp. 1–6). Sofia: New Bulgarian University Press.
Morris, M. W., & Larrick, R. P. (1995). When one cause casts doubt on another: A normative analysis of discounting in causal attribution. Psychological Review, 102, 331–355.
Murphy, G. L. (2002). The big book of concepts. Cambridge, MA: MIT Press.
Murphy, G. L., & Allopenna, P. D. (1994). The locus of knowledge effects in concept learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 904–919.
Murphy, G. L., & Medin, D. L. (1985). The role of theories in conceptual coherence. Psychological Review, 92, 289–316.
Murphy, K. P. (2002). Dynamic Bayesian networks: Representation, inference and learning (Doctoral dissertation). University of California, Berkeley.
Nelson, J. D. (2005). Finding useful questions: On Bayesian diagnosticity, probability, impact, and information gain. Psychological Review, 112, 979–999.
Nelson, J. D., McKenzie, C. R. M., Cottrell, G. W., & Sejnowski, T. J. (2010). Experience matters: Information acquisition optimizes probability gain. Psychological Science, 21, 960–969.
Newman, G. E., Herrmann, P., Wynn, K., & Keil, F. C. (2008). Biases towards internal features in infants' reasoning about objects. Cognition, 107, 420–432. (p. 411)

Newman, G. E., & Keil, F. C. (2008). Where is the essence? Developmental shifts in children's beliefs about internal features. Child Development, 79, 1344–1356.
Novick, L. R., & Cheng, P. W. (2004). Assessing interactive causal influence. Psychological Review, 111, 455–485.
Opfer, J. E., & Bulloch, M. J. (2007). Causal relations drive young children's induction, naming, and categorization. Cognition, 105, 206–217.
Osherson, D. N., Smith, E. E., Wilkie, O., Lopez, A., & Shafir, E. (1990). Category-based induction. Psychological Review, 97, 185–200.
Park, J., & Sloman, S. A. (2013). Mechanistic beliefs determine adherence to the Markov property in causal reasoning. Cognitive Psychology, 67, 186–216.


Patalano, A., Chin-Parker, S., & Ross, B. H. (2006). The importance of being coherent: Category coherence, cross-classification, and reasoning. Journal of Memory & Language, 54, 407–424.
Patalano, A. L., & Ross, B. H. (2007). The role of category coherence in experience-based prediction. Psychonomic Bulletin & Review, 14, 629–634.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. San Mateo, CA: Morgan Kaufmann.
Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge: Cambridge University Press.
Perales, J., Catena, A., & Maldonado, A. (2004). Inferring non-observed correlations from causal scenarios: The role of causal knowledge. Learning and Motivation, 35, 115–135.
Pothos, E. M., & Chater, N. (2002). A simplicity principle in unsupervised human categorization. Cognitive Science, 26, 303–343.
Pothos, E. M., Chater, N., & Hines, P. (2011). The simplicity model of unsupervised categorization. In E. M. Pothos & A. J. Wills (Eds.), Formal approaches in categorization (pp. 199–219). New York: Cambridge University Press.
Proffitt, J. B., Coley, J. D., & Medin, D. L. (2000). Expertise and category-based induction. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 811–828.
Regehr, G., & Brooks, L. R. (1995). Category organization in free classification: The organizing effect of an array of stimuli. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 347–363.
Rehder, B. (2006). When similarity and causality compete in category-based property induction. Memory & Cognition, 34, 3–16.
Rehder, B. (2007b). Property generalization as causal reasoning. In A. Feeney & E. Heit (Eds.), Inductive reasoning: Experimental, developmental, and computational approaches (pp. 81–113). New York: Cambridge University Press.
Rehder, B. (2009). Causal-based property generalization. Cognitive Science, 33, 301–343.
Rehder, B. (2014a). The role of functional form in causal-based categorization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 41, 670–692.
Rehder, B. (2014b). Independence and dependence in human causal reasoning. Cognitive Psychology, 72, 54–107.
Rehder, B. (in press). Reasoning with causal cycles. Cognitive Science.
Rehder, B., & Burnett, R. C. (2005). Feature inference and the causal structure of object categories. Cognitive Psychology, 50, 264–314.

Rehder, B., & Hastie, R. (2001). Causal knowledge and categories: The effects of causal beliefs on categorization, induction, and similarity. Journal of Experimental Psychology: General, 130, 323–360.
Rehder, B., & Hastie, R. (2004). Category coherence and category-based property induction. Cognition, 91, 113–153.
Rehder, B., & Hoffman, A. B. (2005a). Eyetracking and selective attention in category learning. Cognitive Psychology, 51, 1–41.
Rehder, B., & Hoffman, A. B. (2005b). Thirty-something categorization results explained: Selective attention, eyetracking, and models of category learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 811–829.
Rehder, B., & Martin, J. (2011). A generative model of causal cycles. In L. Carlson, C. Hoelscher, & T. F. Shipley (Eds.), Proceedings of the 33rd annual conference of the Cognitive Science Society (pp. 2944–2949). Austin, TX: Cognitive Science Society.
Rehder, B., & Murphy, G. L. (2003). A Knowledge-Resonance (KRES) model of category learning. Psychonomic Bulletin & Review, 10, 759–784.
Rehder, B., & Ross, B. H. (2001). Abstract coherent concepts. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 1261–1275.
Rehder, B., & Waldmann, M. R. (in press). Failures of explaining away and screening off in described versus experienced causal learning scenarios. Memory & Cognition.
Reichenbach, H. (1956). The direction of time. Berkeley: University of California Press.
Rips, L. J. (1975). Inductive judgments about natural categories. Journal of Verbal Learning and Verbal Behavior, 14, 665–681.
Rips, L. J. (2001). Necessity and natural categories. Psychological Bulletin, 127, 827–852.
Rips, L. J. (2010). Two causal theories of counterfactual conditionals. Cognitive Science, 34, 175–221.
Rips, L. J., & Edwards, B. J. (2013). Inference and explanation in counterfactual reasoning. Cognitive Science, 37, 1107–1135.
Rogers, T. T., & McClelland, J. L. (2004). Semantic cognition: A parallel distributed processing approach. Cambridge, MA: MIT Press.
Rogers, T. T., & McClelland, J. L. (2011). Semantics without categorization. In E. M. Pothos & A. J. Wills (Eds.), Formal approaches in categorization (pp. 88–119). New York: Cambridge University Press.
Ross, B. H., & Murphy, G. L. (1999). Food for thought: Cross-classification and category organization in a complex real-world domain. Cognitive Psychology, 38, 495–553.

Rosch, E. (1978). Principles of categorization. In E. Rosch & B. B. Lloyd (Eds.), Cognition and categorization (pp. 28–71). Hillsdale, NJ: Lawrence Erlbaum Associates.
Rosch, E. H., Mervis, C. B., Gray, W., Johnson, D., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8, 382–439.
Ross, N., Medin, D. L., Coley, J. D., & Atran, S. (2003). Cultural and experiential differences in the development of biological induction. Cognitive Development, 18, 25–47.
Rottman, B., & Hastie, R. (2014). Reasoning about causal relationships: Inferences on causal networks. Psychological Bulletin, 140, 109–139.
Sanborn, A. N., Griffiths, T. L., & Navarro, D. J. (2010). Rational approximations to rational models: Alternative algorithms for category learning. Psychological Review, 117, 1144–1167.
Shafto, P., & Coley, J. D. (2003). Development of categorization and reasoning in the natural world: Novices to experts, naive similarity to ecological knowledge. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 641–649. (p. 412)

Shafto, P., Coley, J. D., & Baldwin, D. (2007). Effects of time pressure on context-sensitive property induction. Psychonomic Bulletin & Review, 14, 890–894.
Shafto, P., Coley, J. D., & Vitkin, A. (2007). Availability in category-based induction. In A. Feeney & E. Heit (Eds.), Inductive reasoning (pp. 114–136). Cambridge, England: Cambridge University Press.
Shafto, P., Kemp, C., Bonawitz, E. B., Coley, J. D., & Tenenbaum, J. B. (2008). Inductive reasoning about causally transmitted properties. Cognition, 109, 175–192.
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 379–423, 623–656.
Skov, R. B., & Sherman, S. J. (1986). Information-gathering processes: Diagnosticity, hypothesis-confirmatory strategies, and perceived hypothesis confirmation. Journal of Experimental Social Psychology, 22, 93–121.
Sloman, S. A. (1993). Feature-based induction. Cognitive Psychology, 25, 231–280.
Sloman, S. A. (1994). When explanations compete: The role of explanatory coherence on judgments of likelihood. Cognition, 52, 1–21.
Sloman, S. A. (1997). Explanatory coherence and the induction of properties. Thinking and Reasoning, 3, 81–110.
Sloman, S. A. (2005). Causal models: How people think about the world and its alternatives. Oxford: Oxford University Press.
Sloman, S. A., & Lagnado, D. A. (2005a). Do we "do"? Cognitive Science, 29, 5–39.

Sloman, S. A., & Lagnado, D. A. (2005b). The problem of induction. In K. J. Holyoak & R. G. Morrison (Eds.), The Cambridge handbook of thinking and reasoning (pp. 95–116). New York: Cambridge University Press.
Sloman, S. A., & Lagnado, D. A. (2015). Causality in thought. Annual Review of Psychology, 66, 223–247.
Sloman, S. A., Love, B. C., & Ahn, W. (1998). Feature centrality and conceptual coherence. Cognitive Science, 22, 189–228.
Sloutsky, V. M., & Fisher, A. V. (2004). Induction and categorization in young children: A similarity-based model. Journal of Experimental Psychology: General, 133, 166–188.
Sloutsky, V. M., & Fisher, A. V. (2008). Attentional learning and flexible induction: How mundane mechanisms give rise to smart behaviors. Child Development, 79, 639–651.
Sloutsky, V. M., & Fisher, A. V. (2012). Linguistic labels: Conceptual markers or object features? Journal of Experimental Child Psychology, 111, 65–86.
Slowiaczek, L. M., Klayman, J., Sherman, S. J., & Skov, R. B. (1992). Information selection and use in hypothesis testing: What is a good question, and what is a good answer? Memory & Cognition, 20, 392–405.
Smith, E. E., & Medin, D. L. (1981). Categories and concepts. Cambridge, MA: Harvard University Press.
Smith, E. E., Shafir, E., & Osherson, D. (1993). Similarity, plausibility, and judgments of probability. Cognition, 49, 67–96.
Sobel, D. M., Tenenbaum, J. B., & Gopnik, A. (2004). Children's causal inferences from indirect evidence: Backwards blocking and Bayesian reasoning in preschoolers. Cognitive Science, 28, 303–333.
Sobel, D. M., Yoachim, C. M., Gopnik, A., Meltzoff, A. N., & Blumenthal, E. J. (2007). The blicket within: Preschoolers' inferences about insides and causes. Journal of Cognition and Development, 8, 159–182.
Spalding, T. L., & Murphy, G. L. (1996). Effects of background knowledge on category construction. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 525–538.
Spirtes, P., Glymour, C., & Scheines, R. (2000). Causation, prediction, and search (2nd ed.). Cambridge, MA: MIT Press.
Springer, K., & Keil, F. C. (1989). On the development of biologically specific beliefs: The case of inheritance. Child Development, 60, 637–648.


Stephens, R. G., Navarro, D. J., Dunn, J. C., & Lee, M. D. (2009). The effect of causal strength on the use of causal and similarity-based information in feature inference. In W. Christensen, E. Schier, & J. Sutton (Eds.), Proceedings of the 9th conference of the Australasian Society for Cognitive Science. Sydney: Macquarie Center for Cognitive Science.
Tarlowski, A. (2006). If it's an animal it has axons: Experience and culture in preschool children's reasoning about animates. Cognitive Development, 21, 249–265.
Tenenbaum, J. B., Kemp, C., & Shafto, P. (2007). Theory-based Bayesian models of inductive reasoning. In A. Feeney & E. Heit (Eds.), Inductive reasoning: Experimental, developmental, and computational approaches (pp. 167–204). Cambridge: Cambridge University Press.
Tenenbaum, J. B., & Niyogi, S. (2003). Learning causal laws. In Proceedings of the 25th annual conference of the Cognitive Science Society. Boston, MA.
von Sydow, M., Hagmayer, Y., Meder, B., & Waldmann, M. R. (2010). How causal reasoning can bias empirical evidence. In S. Ohlsson & R. Catrambone (Eds.), Proceedings of the 32nd annual conference of the Cognitive Science Society (pp. 2087–2092). Austin, TX: Cognitive Science Society.
von Sydow, M., Meder, B., & Hagmayer, Y. (2009). A transitivity heuristic of probabilistic causal reasoning. In N. A. Taatgen & H. van Rijn (Eds.), Proceedings of the 31st annual conference of the Cognitive Science Society (pp. 803–808). Amsterdam: Cognitive Science Society.
Waldmann, M. R. (1996). Knowledge-based causal induction. In D. R. Shanks, K. J. Holyoak, & D. L. Medin (Eds.), The psychology of learning and motivation, Vol. 34: Causal learning (pp. 47–88). San Diego, CA: Academic Press.
Waldmann, M. R., & Hagmayer, Y. (2005). Seeing versus doing: Two modes of accessing causal knowledge. Journal of Experimental Psychology: Learning, Memory, & Cognition, 31, 216–227.
Waldmann, M. R., & Hagmayer, Y. (2006). Categories and causality: The neglected direction. Cognitive Psychology, 53, 27–58.
Waldmann, M. R., & Holyoak, K. J. (1992). Predictive and diagnostic learning within causal models: Asymmetries in cue competition. Journal of Experimental Psychology: General, 121, 222–236.
Waldmann, M. R., Holyoak, K. J., & Fratianne, A. (1995). Causal models and the acquisition of category structure. Journal of Experimental Psychology: General, 124, 181–206.
Walker, C. M., Lombrozo, T., Legare, C. H., & Gopnik, A. (2014). Explaining prompts children to privilege inductively rich properties. Cognition, 133, 420–432.


Walsh, C. R., & Sloman, S. A. (2004). Revising causal beliefs. In K. Forbus, D. Gentner, & T. Regier (Eds.), Proceedings (p. 413) of the 26th annual conference of the Cognitive Science Society (pp. 1423–1429). Mahwah, NJ: Lawrence Erlbaum Associates.
Walsh, C. R., & Sloman, S. A. (2008). Updating beliefs with causal models: Violations of screening off. In G. H. Bower, M. A. Gluck, J. R. Anderson, & S. M. Kosslyn (Eds.), Memory and mind: A festschrift for Gordon Bower (pp. 345–358). Mahwah, NJ: Lawrence Erlbaum Associates.
Wattenmaker, W. D., Dewey, G. I., Murphy, T. D., & Medin, D. L. (1986). Linear separability and concept learning: Context, relational properties, and concept naturalness. Cognitive Psychology, 18, 158–194.
Waxman, S. R., Medin, D. L., & Ross, N. (2007). Folkbiological reasoning from a cross-cultural developmental perspective: Early essentialist notions are shaped by cultural beliefs. Developmental Psychology, 43, 294–308.
Wu, M. L., & Gentner, D. (1998). Structure in category-based induction. Paper presented at the 20th annual conference of the Cognitive Science Society, Madison, WI.


Notes:
(1.) Whereas the companion chapter "Concepts as Causal Models: Categorization" (Chapter 20 in this volume) has already demonstrated how reasoners exhibit some properties of causal reasoning in service of classification, here I examine subjects' causal inferences in the service of inferring features.
(2.) Note that the variables of a causal model may have exogenous influences that are not included in the model. However, these influences are constrained to be uncorrelated (ruling out, e.g., all hidden common causes whose values are not constant). This property, referred to as causal sufficiency (Spirtes et al., 2000), licenses the causal Markov condition.
(3.) See note 3 in the companion chapter "Concepts as Causal Models: Categorization" (Chapter 20 in this volume) for an alternative definition of conjunctive causes in which each conjunct can also have an independent influence on the effect X.
(4.) The independence of the causes (the Ys) follows from the fact that they are root nodes and the causal sufficiency constraint, which rules out the presence of varying common causes that are exogenous to the model (see note 2).
(5.) Mayrhofer and Waldmann interpreted the Markov violations they observed as reflecting the operation of multiple disablers (which they called preventers) whose contribution to subjects' inferences varied over experimental conditions. In particular, they advocated a dispositional view of causal relations in which causal powers are attributed to properties of objects. See Mayrhofer and Waldmann (2015) for details.


(6.) Consistent with the experimental instruction, mCE = .80 and bE = .25 and .75 in the weak and strong alternative conditions, respectively. For purposes of generating the predictions in Figure 21.5a, the base rate of the cause feature C (cC) was assumed to be .75.
(7.) Heit and Rubinstein themselves attributed these results to subjects computing similarity between the source and target differently, depending on whether the novel property was anatomical or behavioral.
(8.) Although generalizations via diagnostic reasoning should indeed often get stronger with more causal links, technical application of Equation 6 requires more information than was provided in this experiment. In particular, it requires assumptions regarding the degree to which the category's typical features are inter-correlated. This is the case because in the 3-link, diagnostic condition in Figure 21.7c, N and X1, X2, and X3 form a common cause network for which the effects (the Xs) are expected to be correlated; thus, evidence for the presence of N will be stronger when the Xs are correlated. These predictions were tested, and confirmed, in Experiments 4 and 5.
(9.) In the authors' terms, the level that affords the maximal contrast, where contrast is the difference between the probability of the effect E when the cause C is present versus absent, that is, ΔP = P(E | C) − P(E | ¬C). For example, although the probability of blooming given a substance of the specific target color and the probability of blooming given a colored substance were equally high in Lien and Cheng's condition in which color predicted blooming, the probability of blooming in the absence of any color was lower than the probability of blooming in the absence of the specific color (because green substances also led to blooming). Thus, the contrast for color was greater than the contrast for the specific color, and so the maximal contrast occurred at the superordinate level of color.
(10.) Contexts specify different sorts of relations, such as can (e.g., a robin can sing), has (a robin has feathers), and is a (a robin is a bird).
(11.) Although doing so would require substantial representational machinery. For example, simple symmetric inter-feature relations are unable to reproduce effects such as explaining away, which consists of a higher-order interaction among three or more features.

Bob Rehder

Department of Psychology, New York University, New York, New York, USA



Causal Explanation   Tania Lombrozo and Nadya Vasilyeva The Oxford Handbook of Causal Reasoning Edited by Michael R. Waldmann Print Publication Date: Jun 2017 Subject: Psychology, Cognitive Psychology Online Publication Date: May 2017 DOI: 10.1093/oxfordhb/9780199399550.013.22

Abstract and Keywords

Explanation and causation are intimately related. Explanations often appeal to causes, and causal claims are often answers to implicit or explicit questions about why or how something occurred. This chapter considers what we can learn about causal reasoning from research on explanation. In particular, it reviews an emerging body of work suggesting that explanatory considerations—such as the simplicity or scope of a causal hypothesis—can systematically influence causal inference and learning. It also discusses proposed distinctions among types of explanations and reviews the effects of each explanation type on causal reasoning and representation. Finally, it considers the relationship between explanations and causal mechanisms and raises important questions for future research.

Keywords: causation, explanation, causal inference, learning, reasoning, causal mechanisms

A doctor encounters a patient: Why does she have a fever and a rash? An engineer investigates a failure: Why did the bridge collapse? A parent wonders about her child: Why did she throw a tantrum? In each of these cases, we seek an explanation for some event—an explanation that's likely to appeal to one or more antecedent causes. The doctor might conclude that a virus caused the symptoms, the engineer that defects in cast iron caused the collapse, and the parent that the toy's disappearance caused the tantrum.

Not all explanations are causal, and not all causes are explanatory. Explanations in mathematics, for example, are typically taken to be non-causal, and many causal factors are either not explanatory at all, or only explanatory under particular circumstances. (Consider, for instance, appealing to the big bang as an explanation for today's inflation rates, or the presence of oxygen as an explanation for California wildfires.) Nonetheless, causation and explanation are closely related, with many instances of causal reasoning featuring explanations and explanatory considerations, and many instances of abductive inference and explanation appealing to causes and causal considerations. The goal of the present chapter is to identify some of the connections between explanation and causation, with a special


focus on how the study of explanation can inform our understanding of causal reasoning. The chapter is divided into five sections. In the first three, we review an emerging body of work on the role of explanation in three types of causal reasoning: drawing inferences about the causes of events, learning novel causal structures, and assigning causal responsibility. In the fourth section, we consider different kinds of explanations, including a discussion of whether each kind is properly "causal" and how different kinds of explanations can differentially influence causal judgments. In the fifth section, we focus on causal explanations that appeal to mechanisms, and consider the relationship between explanation, causal claims, and mechanisms. Finally, we conclude with some important questions for future research.

Causal Inference and Inference to the Best Explanation (p. 416)

Consider a doctor who infers, on the basis of a patient's symptoms, that the patient has a particular disease—one known to cause that cluster of symptoms. We will refer to such instances of causal reasoning as "causal inference," and differentiate them from two other kinds of causal reasoning that we will discuss in subsequent sections: causal learning (which involves learning about novel causes and relationships at the type level) and assigning causal responsibility (which involves attributing an effect to one or more causes, all of which could have occurred and could have contributed to the effect).

How might explanation influence causal inference? One possibility is that people engage in a process called "inference to the best explanation" (IBE). IBE was introduced into the philosophical literature by Gilbert Harman in a 1965 paper, but the idea is likely older, and closely related to what is sometimes called "abductive inference" (Douven, 2011; Lombrozo, 2012, 2016; Peirce, 1955). The basic idea is that one infers that a hypothesis is likely to be true based on the fact that it best explains the data. To borrow vocabulary from another influential philosopher of explanation, Peter Lipton, one uses an explanation's "loveliness" as a guide to its "likeliness" (Lipton, 2004).

A great deal of work has aimed to characterize how people go about inferring causes from patterns of evidence (Cheng, 1997; Cheng & Novick, 1990, 1992; Glymour & Cheng, 1998; Griffiths & Tenenbaum, 2005; Kelley, 1973; Perales & Shanks, 2003; Shanks & Dickinson, 1988; Waldmann & Hagmayer, 2001; see Buehner, 2005; Holyoak & Cheng, 2011; Waldmann & Hagmayer, 2013, for reviews), and this work is summarized in other chapters of this volume (see Part I, "Theories of Causal Cognition," and Meder & Mayrhofer, Chapter 23, on diagnostic reasoning). Thus a question that immediately presents itself is whether IBE is distinct from the kinds of inference these models typically involve, such as analyses of covariation or Bayesian inference. For most advocates of IBE, the answer is "yes": IBE is a distinct inferential process, where the key commitment is that explanatory


considerations play a role in guiding judgments. These considerations can include the simplicity, scope, or other "virtues" of the explanatory hypotheses under consideration. To provide evidence for IBE as a distinctly explanatory form of inference, it is thus important to identify explanatory virtues, and to demonstrate their role in inference. The most direct evidence of this form comes from research on simplicity (Bonawitz & Lombrozo, 2012; Lombrozo, 2007; Pacer & Lombrozo, in preparation), scope (Khemlani, Sussman, & Oppenheimer, 2011), and explanatory power (Douven & Schupbach, 2015a, 2015b). We focus on this research for the remainder of the section.

In one study from Lombrozo (2007), participants learned novel causal structures describing the relationships between diseases and symptoms on an alien planet. For example, the conjunction of two particular symptoms—say "sore minttels" and "purple spots"—could be explained by appeal to a single disease that caused both symptoms (Tritchet's syndrome), or by appeal to the conjunction of two diseases that each caused one symptom (Morad's disease and a Humel infection). Lombrozo set out to test whether participants would favor the explanation that was simpler in the sense that it invoked a single common cause over two independent causes, and whether they would do so even when probabilistic evidence, in the form of disease base rates, favored the more complex explanation. Lombrozo found that participants' explanation choices were a function of both simplicity and probability, with a substantial proportion of participants selecting the simpler explanation even when it was less likely than the complex alternative. This is consistent with the idea that an explanation's "loveliness"—in this case, its simplicity—is used as a basis for inferring its "likeliness."

In subsequent work, Bonawitz and Lombrozo (2012) replicated the same basic pattern of results with 5-year-old children in a structurally parallel task: children observed a toy generating two effects (a light and a spinning fan), and had to infer whether one block (which generated both effects) or two blocks (which each generated one effect) fell into the toy's activator bin. In this case, probabilistic information was manipulated across participants by varying the number of blocks of each type and the process by which they fell into the bin. Interestingly, adults did not show a preference for simplicity above and beyond probability in this task, while the 5-year-olds did. Bonawitz and Lombrozo suggest that in the face of probabilistic uncertainty—of the kind that is generated by a more complex task like the alien diagnosis problems used in Lombrozo (2007)—adults rely on explanatory considerations such as simplicity to guide assessments of probability. But when a task involves a transparent and seemingly deterministic causal system, and when the numbers involved are small (as was the case for the task developed for young children in Bonawitz and Lombrozo, 2012), adults may engage in more explicit probabilistic (p. 417) reasoning, and may bypass explanatory considerations altogether. Consistent with this idea, adults in Lombrozo (2007) also ceased to favor simplicity when they were explicitly told that the complex hypothesis was most likely to be true.
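For intuition about how base rates can favor the complex explanation, consider a worked sketch (Python; the base rates are invented for illustration and are not Lombrozo's actual materials). With deterministic causal links and independently occurring diseases, the probability of each candidate explanation of the symptom pair reduces to a comparison of priors:

```python
# Illustrative base rates (per patient); not Lombrozo's (2007) values.
p_tritchet = 0.01   # single disease causing BOTH symptoms
p_morad = 0.15      # disease causing symptom 1 only
p_humel = 0.15      # disease causing symptom 2 only

# Assuming each disease deterministically produces its symptom(s) and the
# diseases occur independently, the two candidate explanations of the
# observed symptom pair have probabilities:
p_simple = p_tritchet           # one common cause
p_complex = p_morad * p_humel   # conjunction of two causes

print(p_simple, p_complex)      # 0.01 vs 0.0225
# Here the "complex" two-disease explanation is more than twice as
# probable, yet many participants still preferred the single-cause one.
```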

In more recent work, Pacer and Lombrozo (in preparation) provide a more precise characterization of how people assess an explanation's simplicity. They differentiate two intuitive metrics for causal explanations, both of which are consistent with prior results: "node simplicity," which involves counting the number of causes invoked in an explanation; and "root simplicity," which involves counting the number of unexplained causes invoked in an explanation. For example, suppose that Dr. Node explains a patient's symptoms by appeal to pneumonia and sarcoma—two diseases—and that Dr. Root explains the symptoms by appeal to pneumonia, sarcoma, and HIV, where HIV is a cause (or at least a contributing factor) for both pneumonia and sarcoma. Dr. Root has invoked more causes than Dr. Node (three versus two), and so her explanation is less simple according to node simplicity. But Dr. Root has explained the symptoms by appeal to only one unexplained cause (HIV), as opposed to Dr. Node's two (pneumonia and sarcoma), so her explanation is simpler according to root simplicity. Extending the basic method developed by Lombrozo (2007), Pacer and Lombrozo found strong evidence that people favor explanations with low root simplicity (above and beyond what is warranted on the basis of the frequency information with which they were provided), but no evidence that people are sensitive to node simplicity. By using appropriate causal structures, they were able to rule out alternative explanations for these results (e.g., that people prefer explanations that involve intervening variables).

These findings suggest that in drawing causal inferences, people do not simply engage in probabilistic inference on the basis of frequency information. In addition to frequency information, they use explanatory considerations (in this case, low root simplicity) to guide their judgments, at least in the face of probabilistic uncertainty. The findings therefore suggest that IBE plays a role in inferences concerning causal events. But is this effect restricted to simplicity, or do other explanatory considerations play a role as well? Research to date supports a role for two additional factors: narrow latent scope and explanatory power.

An explanation's "latent scope" refers to the number of unverified effects that the explanation predicts. For example, an observed symptom could be explained by appeal to a disease that predicts that single symptom, or by appeal to a disease that additionally predicts an effect that has not yet been tested for and is hence unobserved (e.g., whether the person has low blood levels of some mineral). In this case, the former explanation has narrower latent scope. Khemlani, Sussman, and Oppenheimer (2011) found that people favor explanations with narrow latent scope, even if the two diseases are equally prevalent. Importantly, they also find that latent scope affects probability estimates: explanations with narrow latent scope are judged more likely than those with broader latent scope (see also Johnson, Johnston, Toig, & Keil, 2014, for evidence that explanatory scope informs causal strength inferences, and Johnston, Johnson, Koven, & Keil, 2015, for evidence of latent scope bias in children). Thus latent scope appears to be among the cues to explanatory "loveliness" that affect the perceived "likeliness" of explanatory hypotheses.

Finally, recent work by Douven and Schupbach (2015a, 2015b) provides further evidence of a role for explanatory considerations in inference, with hints that the relevant consideration is "explanatory power." Employing a quite different paradigm, Douven and Schupbach demonstrate that people's explanatory judgments better predict their estimates of
Finally, recent work by Douven and Schupbach (2015a, 2015b) provides further evidence of a role for explanatory considerations in inference, with hints that the relevant consid­ eration is “explanatory power.” Employing a quite different paradigm, Douven and Schup­ bach demonstrate that people’s explanatory judgments better predict their estimates of Page 4 of 32

Causal Explanation posterior probability than do objective probabilities on their own. In a study reported in Douven and Schupbach (2015a), participants observed 10 balls successively drawn from one of two urns, which was selected by a coin flip. One urn contained 30 black balls and 10 white balls, and the other contained 15 black balls and 25 white ones. After each draw, participants were asked to consider the evidence so far, and to rate the “explanatory goodness” of each of two hypotheses: the hypothesis that the balls were drawn from the 30/10 urn, or the hypothesis that the balls were drawn from the 15/25 urn. Participants were also asked to estimate a posterior probability for each hypothesis after each draw. In a series of models, Douven and Schupbach tested whether people’s judgments of the explanatory “goodness” of each hypothesis improved model predictions of their subjec­ tive posterior probabilities, above and beyond the objective posteriors calculated on the basis of the data presented to each participant. They found that models incorporating these explanatory judgments outperformed alternatives, even when appropriately penal­ ized for using additional predictors. Douven and Schupbach’s (2015a) results suggest that explanatory considerations do in­ form (p. 418) assessments of probability, and that these considerations diverge from poste­ rior probability. However, the findings do not pinpoint the nature of the explanatory con­ siderations themselves. On what basis were participants judging one hypothesis more or less explanatory than the other? Additional analyses of these data, reported in Douven and Schupbach (2015b), provide some hints: models that took into account some measure of “explanatory power”—computed on the basis of the objective probabilities—outper­ formed the basic model that only considered posteriors. The best-performing model em­ ployed a measure based on Good (1960) that roughly tracks confirmation: it takes the log of the ratio of the probability of the data given the hypothesis to the probability of the da­ ta. In other work, Schupbach (2011) finds evidence that people’s judgments of an explanation’s “goodness” are related to another measure of explanatory power, proposed by Schupbach and Sprenger (2011), which is also related to Bayesian measures of confir­ mation. These findings suggest that explanatory considerations—in the form of root simplicity, la­ tent scope, and explanatory power—inform causal inference, and in so doing reveal some­ thing potentially surprising: that while people’s responses to evidence are systematic, they do not (always) lead to causal inferences that track the posterior probabilities of each causal hypothesis. This not only supports a role for explanatory considerations in causal inference, but also challenges the idea that identifying causes to explain effects is essentially a matter of conditionalizing on the effects to infer the most likely cause. Fur­ ther challenging this idea, Pacer, Williams, Chen, Lombrozo, and Griffiths (2013) compare judgments of explanatory goodness from human participants to those generated by four distinct computational models of explanation in causal Bayesian networks, and find that models that compute measures of evidence or information considerably outperform those that compute more direct measures of (posterior) probability.

Page 5 of 32

Causal Explanation In sum, there is good evidence that people engage in a process like IBE when drawing in­ ferences about causal events: they use explanatory considerations to guide their assess­ ments of which causes account for observed effects, and of how likely candidate hypothe­ ses are to be true. The most direct evidence to date concerns root simplicity, latent scope, and explanatory power, but there is indirect evidence that other explanatory considera­ tions, such as coherence, completeness, and manifest scope, may play a similar role (Pen­ nington & Hastie, 1992; Read & Marcus-Newhall, 1993; Preston & Epley, 2005; Thagard, 1989; Williams & Lombrozo, 2010). Before concluding this section on IBE in causal inference, it is worth considering the nor­ mative implications of this work. It is typically assumed that Bayesian updating provides the normatively correct procedure for revising belief in causal hypotheses in light of the evidence. Do the findings reported in this section describe a true departure from Bayesian inference, and therefore a systematic source of error in human judgment? This is certainly one possibility. For example, it could be that IBE describes an imperfect algo­ rithm by which people approximate Bayesian inference. If this is the case, it becomes an interesting project to spell out when and why explanatory considerations ever succeed in approximating more direct probabilistic inference. There are other possibilities, however. In particular, an appropriately specified Bayesian model could potentially account for these results. In fact, some have argued that IBE-like inference could simply fall out of hierarchical Bayesian inference with suitably assigned priors and likelihoods (Henderson, 2014), in which case there could be a justified, Bayesian account of this behavior. It could also be that the Bayesian models implicit in the comparisons between people’s judgments and posterior probabilities fail to describe the inference that people are actually making. In their chapter in this volume on diagnos­ tic reasoning, for example, Meder and Mayrhofer (Chapter 23) make the important point that there can be more than one “Bayesian” model for a given inference, and in fact find different patterns of inference for models that make different assumptions when it comes to elemental diagnostic reasoning: inferring the value of a single binary cause from a sin­ gle binary effect, which has clear parallels to the cases considered here. In particular, they argue for a model that takes into account uncertainty in causal structures over one that simply computes the empirical conditional probability of a cause given an effect. Sim­ ilarly, it could be that the “departures” from Bayesian updating observed here reflect the consequences of a Bayesian inference that involves more than a straight calculation of posteriors. Finally, some argue that IBE corresponds to a distinct but normatively justifiable alterna­ tive to Bayesianism (e.g., Douven & Schupbach, 2015a). In particular, while Bayesian in­ ference may be the best approach for minimizing expected inaccuracy in the long run, it could be that a process like IBE (p. 419) dominates Bayesian inference when the goal is, say, to get things mostly right in the short term, or to achieve some other aim (Douven, 2013). It could also be that explanation judgments take considerations other than accura­ cy into account, such as the ease with which the explanation can be communicated, re­

Page 6 of 32

Causal Explanation membered, or used in subsequent processing. These are all important possibilities to ex­ plore in future research.

Causal Learning and the Process of Explaining

Consider a doctor who, when confronted with a recurring pattern of symptoms, posits a previously undocumented disease, or a previously unknown link between some pathogen and those symptoms. In each case, the inference involves a change in the doctor's beliefs about the causal structure of the world, not only about the particular patient's illness. This kind of inference, which we will refer to as causal model learning, differs from the kinds of causal inferences considered in the preceding section in that the learner posits a novel cause or causal relation, not (only) a new token of a known type.

Just as explanatory considerations can influence causal inference, it is likely that a process like IBE can guide causal model learning. In fact, "Occam's Razor," the classic admonition against positing unnecessary types of entities (Baker, 2013), is typically formulated and invoked in the context of positing novel types, not tokens of known types. However, research to date has not (to our knowledge) directly explored IBE in the context of causal model learning. Doing so would require assessing whether novel causes or causal relations are more likely to be inferred when they provide better explanations.

What we do know is that engaging in explanation—the process—can affect the course of causal learning. In particular, a handful of studies with preschool-aged children suggest that being prompted to explain, even without feedback on the content or quality of explanations, can promote understanding of number conservation (Siegler, 1995) and of physical phenomena (e.g., a balance beam; Pine & Siegler, 2003), and recruit causal beliefs that are not invoked spontaneously to guide predictions (Amsterlaw & Wellman, 2006; Bartsch & Wellman, 1995; Legare, Wellman, & Gelman, 2009). Prompts to explain can also accelerate children's understanding of false belief (Amsterlaw & Wellman, 2006; Wellman & Lagattuta, 2004; see Wellman & Liu, 2007, and Wellman, 2011, for reviews), which requires a revision from one causal model of behavior to a more complex model involving an unobserved variable (belief) and a causal link between beliefs and behavior (e.g., Goodman et al., 2006). Finally, there is evidence that prompting children to explain can lead them to preferentially learn about and remember causal mechanisms over causally irrelevant perceptual details (Legare & Lombrozo, 2014), and that prompting children to explain makes them more likely to generalize internal parts and category membership from some objects to others on the basis of shared causal affordances as opposed to perceptual similarity (Walker, Lombrozo, Legare, & Gopnik, 2014; see also Muentener & Bonawitz, Chapter 33 in this volume, for more on children's causal learning).

To better understand the effects of explanation on children's causal learning, Walker, Lombrozo, Williams, Rafferty, and Gopnik (2016) set out to isolate effects of explanation on two key factors in causal learning: evidence and prior beliefs. Walker et al. used the classic "blicket detector" paradigm (Gopnik & Sobel, 2000), in which children observe blocks placed on a machine, where some of the blocks make the machine play music. Children have to learn which blocks activate the machine, which can involve positing a novel kind corresponding to a subset of blocks, and/or positing a novel causal relationship between those blocks (or some of their features) and the machine's activation.

In Walker et al.'s studies, 5-year-old children observed eight blocks successively placed on the machine, where four activated the machine and four did not. Crucially, half the children were prompted to explain after each observation ("Why did [didn't] this block make my machine play music?"), and the remaining children, in the control condition, were asked to report the outcome ("What happened to my machine when I put this block on it? Did it play music?"). This control task was intended to match the explanation condition in eliciting a verbal response and drawing attention to the relationship between each block and the machine, but without requiring that the child explain.

Across studies, Walker et al. (2016) varied the properties of the blocks to investigate whether prompting children to explain made them more likely to favor causal hypotheses that were more consistent with the data (i.e., one hypothesis accounted for 100% of observations and the other for 75%) and/or more consistent with prior beliefs (i.e., one hypothesis involved heavier blocks activating the machine, which matched children's initial assumptions; the other involved blocks of a given color activating the machine).

When competing causal hypotheses were matched in terms of prior beliefs but varied in the evidence they accounted for, children who were prompted to explain were significantly more likely than controls to favor the hypothesis with stronger evidence. And when competing causal hypotheses were matched in terms of evidence but varied in their consistency with prior beliefs, children who were prompted to explain were significantly more likely than controls to favor the hypothesis with a higher prior. In other words, explaining made children more responsive to both crucial ingredients of causal learning: evidence and prior beliefs. (p. 420)

In their final study, Walker et al. (2016) considered a case in which evidence and prior beliefs came into conflict: a hypothesis that accounted for 100% of the evidence ("blue blocks activate the machine") was pitted against a hypothesis favored by prior beliefs ("big blocks activate the machine"), but that only accounted for 75% of the evidence. In this case, children who were prompted to explain were significantly more likely than controls to go with prior beliefs, guessing that a novel big block rather than a novel blue block would activate the machine. This pattern of responses was compared against the predictions of a Bayesian model that incorporated children's own priors and likelihoods as estimated from an independent task. The results suggested that children who were prompted to explain were less likely than children in the control condition to conform to Bayesian inference. This result may seem surprising in light of explainers' greater sensitivity to both evidence and prior beliefs, which suggests that explaining results in "better" performance. However, it is less surprising in light of the findings reported in the previous section, which consistently point to a divergence between explanation-based judgments and assessments of posterior probability.
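A toy reconstruction conveys why the ideal-learner benchmark favors the evidence-backed hypothesis in this design. The prior values and the per-trial error rate below are assumptions for illustration only; the actual model used priors and likelihoods estimated from children's own responses in an independent task.

    # Toy version of the model comparison in the final study of
    # Walker et al. (2016); all parameter values are assumed.
    def likelihood(n_correct, n_trials, eps=0.1):
        """P(data | hypothesis) if the hypothesis predicts each of
        n_trials outcomes correctly with probability 1 - eps."""
        return (1 - eps) ** n_correct * eps ** (n_trials - n_correct)

    priors = {"big blocks activate": 0.7,   # favored by prior beliefs
              "blue blocks activate": 0.3}  # accounts for all the evidence
    fit = {"big blocks activate": 6,        # consistent with 6 of 8 trials
           "blue blocks activate": 8}       # consistent with all 8 trials

    unnorm = {h: priors[h] * likelihood(fit[h], 8) for h in priors}
    z = sum(unnorm.values())
    posterior = {h: unnorm[h] / z for h in priors}
    print(posterior)
    # ~{'big blocks activate': 0.03, 'blue blocks activate': 0.97}:
    # for an ideal learner the evidence swamps the prior, whereas
    # children prompted to explain favored the prior-consistent hypothesis.

Under almost any reasonable setting of these assumed parameters, the data advantage of the 100%-consistent hypothesis dominates the prior, which is what makes the explainers' preference for the prior-consistent hypothesis a genuine departure from the Bayesian benchmark.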
While the evidence summarized thus far is restricted to preschool-aged children, it is likely that similar processes operate in older children and adults. For instance, Kuhn and Katz (2009) had fourth-grade children engage in a causal learning task that involved identifying the causes of earthquakes by observing evidence. The children subsequently participated in a structurally similar causal learning task involving an ocean voyage, where half were instructed to explain the basis for each prediction that they made, and those in a control group were not. When the same students completed the earthquake task in a post-test, those who had explained generated a smaller number of evidence-based inferences; instead, they seemed to rely more heavily on their (mistaken) prior beliefs, in line with the findings from Walker et al. (2016). In a classic study with eighth-grade students, Chi, De Leeuw, Chiu, and LaVancher (1994) prompted students to "self-explain" as they read a passage about the circulatory system, with students in the control condition instead prompted to read the text twice. Students who explained were significantly more likely to acquire an accurate causal model of the circulatory system, in part, they suggest, because explaining "involved the integration of new information into existing knowledge"—that is, the coordination of evidence with prior beliefs. Finally, evidence with adults investigating the effects of explanation in categorization tasks mirrors the findings from Walker et al. (2016), with participants who explain both more responsive to evidence (Williams & Lombrozo, 2010) and more likely to recruit prior beliefs (Williams & Lombrozo, 2013).

Why does the process of explaining affect causal learning? One possibility is that explaining simply leads to greater attention or engagement. This is unlikely for a variety of reasons. Prior work has found that while explaining leads to some improvements in performance, it also generates systematic impairments. In one study, children prompted to explain were significantly less likely than controls to remember the color of a gear in a gear toy (Legare & Lombrozo, 2014); in another, they were significantly less likely to remember which sticker was placed on a block (Walker et al., 2014). Research with adults has also found that a prompt to explain can slow learning and increase error rates in a category learning task (Williams, Lombrozo, & Rehder, 2013). Moreover, the findings from the final study of Walker et al. (2016) suggest that prompting children to explain makes them look less, not more, like ideal Bayesian learners. Far from generating a global boost in performance, explanation seems to generate highly selective benefits.

A second possibility is that explaining plays a motivational role that is specifically tied to causal learning. In a provocatively titled paper ("Explanation as Orgasm and the Drive for Causal Understanding"), Gopnik (2000) argues that the phenomenological satisfaction that accompanies a good explanation is part of what motivates us to learn about the causal structure of the world. Prompting learners to explain could potentially ramp up this motivational process, directing children and adults to causal relationships over causally irrelevant details (consistent with Legare & Lombrozo, 2014; Walker et al., 2014). Explaining could also affect the course of causal inquiry itself, with effects on (p. 421) which data are acquired and how they inform beliefs (see Legare, 2012, for preliminary evidence that explanation guides exploration).

Finally (and not mutually exclusively), it could be that effects of explanation on learning are effectively a consequence of IBE—that is, that in the course of explaining, children generate explanatory hypotheses, and these explanatory hypotheses are evaluated with "loveliness" as a proxy for "likeliness." For instance, in Walker et al. (2016), children may have favored the hypothesis that accounted for more evidence because it had greater scope or coverage, and the hypothesis consistent with prior knowledge because it provided a specification of mechanism or greater coherence. We suspect that this is mostly, but only mostly, correct. Some studies have found that children who are prompted to explain outperform those in control conditions even when they fail to generate the right explanation, or any explanation at all (Walker et al., 2014). This suggests the existence of some effects of engaging in explanation that are not entirely reducible to the effects of having generated any particular explanation.

While such findings are puzzling on a classic interpretation of IBE, they can potentially be accommodated with a modified and augmented version (Lombrozo, 2012, 2016; Wilkenfeld & Lombrozo, 2015). Wilkenfeld and Lombrozo (2015) argue for what they call "explaining for the best inference" (EBI), an inferential practice that differs from IBE in focusing on the process of explaining as opposed to candidate explanations themselves. While IBE and EBI are likely to go hand in hand, there could be cases in which the explanatory processes that generate the best inferences are not identical with those promoted by possessing the best explanations, and EBI allows for this possibility.

In sum, there is good evidence that the process of engaging in explanation influences causal learning. This is potentially driven by effects of explanation on the evaluation of both evidence and prior beliefs (Walker et al., 2016). One possibility is that by engaging in explanation, learners are more likely to favor hypotheses that offer "lovely" explanations (Lombrozo, 2012, 2016), and to engage in cognitive processes that affect learning even when a lovely or accurate explanation is not acquired (Wilkenfeld & Lombrozo, 2015). It is not entirely clear, however, whether and when these effects of explanation lead to "better" causal learning. The findings from Amsterlaw and Wellman (2006) and Chi et al. (1994) suggest that effects can be positive, accelerating conceptual development and learning. Other findings are more mixed (e.g., Kuhn & Katz, 2009), with the modeling result from Walker et al. (2016) suggesting that prompting children to explain makes them integrate evidence and prior beliefs in a manner that corresponds less closely to Bayesian inference. Better delineating the contours of explanation's beneficial and detrimental effects will be an important step for future research. It will also be important to investigate how people's tendency to engage in explanation spontaneously corresponds to these effects. That is, are the conditions under which explaining is beneficial also the conditions under which people tend to spontaneously explain?

Assigning Causal Responsibility

The previous sections considered two kinds of causal reasoning, one involving novel causal structures and the other causal events generated by known structures. Another important class of causal judgments involves the assignment of causal responsibility: to which cause(s) do we attribute a given effect? For instance, a doctor might attribute her patient's disease to his weak immune system or to a cold virus, when both are in fact present and play a causal role.

Causal attribution has received a great deal of attention within social psychology, with the classic conundrum concerning the attribution of some behavior to a person ("she's so clumsy!") versus a situation ("the staircase is so slippery!") (for reviews, see Fiske & Taylor, 2013; Kelley & Michela, 1980; Malle, 2004). While this research is often framed in terms of causation, it is natural to regard attribution in terms of explanation, with attributions corresponding to an answer to the question of why some event occurred ("Why did Ava slip?"). In his classic "ANOVA model," Kelley (1967, 1973) proposed that people effectively carry out an analysis of covariation between the behavior and a number of internal and external factors, such as the person, stimulus, and situation. For example, to explain why Ava slipped on the staircase yesterday, one would consider how this behavior fares along the dimensions of consensus (did other people slip?), the distinctiveness of the stimulus (did she slip only on that staircase?), and consistency across situations (does she usually slip, or was it the only time she did so?). Subsequent work, however, has identified a variety of additional factors that influence (p. 422) people's attributions (e.g., Ahn, Kalish, Medin, & Gelman, 1995; Försterling, 1992; Hewstone & Jaspars, 1987; McArthur, 1972), and some have challenged the basic dichotomy on which the person-versus-situation analysis is based (Malle, 1999, 2004; Malle, Knobe, O'Laughlin, Pearce, & Nelson, 2000). (We direct readers interested in social attribution to Hilton, Chapter 32 in this volume.)
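The covariation logic of the ANOVA model can be caricatured in a few lines. The mapping below is a deliberately coarse sketch of the textbook pattern (classically tested by McArthur, 1972): the function, its discrete high/low inputs, and its output labels are our own simplification, and real judgments are graded and shaped by the additional factors just noted.

    def kelley_attribution(consensus, distinctiveness, consistency):
        """Map the three covariation dimensions (each 'high' or 'low')
        to the textbook attribution pattern. A caricature, not a model
        of graded human judgment."""
        if consistency == "low":
            return "circumstances"      # the behavior doesn't recur: a one-off
        if consensus == "high" and distinctiveness == "high":
            return "stimulus"           # everyone slips, and only on these stairs
        if consensus == "low" and distinctiveness == "low":
            return "person"             # only Ava slips, and she slips everywhere
        return "mixed (person and stimulus)"

    # Why did Ava slip on the staircase?
    print(kelley_attribution("high", "high", "high"))  # -> stimulus (slippery stairs)
    print(kelley_attribution("low", "low", "high"))    # -> person (Ava is clumsy)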
Assignments of causal responsibility also arise in the context of what is sometimes called "causal selection": the problem of deciding which cause or causes in a chain or other causal structure best explain or account for some effect. Such judgments are especially relevant in moral and legal contexts, where they are closely tied to attributions of blame. For example, suppose that someone steps on a log, which pushes a boulder onto a picnic blanket, crushing a chocolate pie. The person, the log, and the boulder all played a causal role in the pie's destruction, but various factors might influence our assignment of causal responsibility, including the location of each factor in the chain, whether and by how much it increased the probability of the outcome, and whether the person intended and foresaw the culinary catastrophe (see, e.g., Hart & Honoré, 1985; Hilton, McClure, & Sutton, 2009; Lagnado & Channon, 2008; McClure, Hilton, & Sutton, 2007; Spellman, 1997). (Chapter 29 in this volume, in which Lagnado and Gerstenberg discuss moral and legal reasoning, explores these issues in detail; also relevant is Chapter 12 by Danks on singular causation.)

While research has not (to our knowledge) investigated whether explanatory considerations such as simplicity and explanatory power influence judgments of causal responsibility, ideas from the philosophy and psychology of explanation can usefully inform research on this topic. For example, scholars of explanation often emphasize the ways in which an explanation request is underspecified by a why-question itself. When we ask, "Why did Ava slip on the stairs?" the appropriate response is quite different if we're trying to get at why Ava slipped (as opposed to Boris) than if we're trying to get at why Ava slipped on the stairs (as opposed to the landing). These questions involve a shift in what van Fraassen (1980) calls a "contrast class," that is, the set of alternatives to the target event that the explanation should differentiate from the target via some appropriate relation (see also Cheng & Novick, 1991).
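To see how shifting the contrast class reshapes an answer, consider a minimal sketch in which a contrastive explanation cites just the features that distinguish the target event from its contrast. The events and their feature dictionaries are invented for illustration, and real contrastive explanation involves much more than feature differencing.

    def contrastive_explanation(target, contrast):
        """Cite the features that distinguish the target event from its
        contrast (after van Fraassen, 1980): different contrasts yield
        different answers to the 'same' why-question."""
        return {k: v for k, v in target.items() if contrast.get(k) != v}

    ava_stairs = {"person": "Ava", "place": "stairs",
                  "shoes": "smooth soles", "surface": "wet"}
    boris_stairs = {"person": "Boris", "place": "stairs",
                    "shoes": "rubber soles", "surface": "wet"}
    ava_landing = {"person": "Ava", "place": "landing",
                   "shoes": "smooth soles", "surface": "dry"}

    # "Why did Ava (rather than Boris) slip?"
    print(contrastive_explanation(ava_stairs, boris_stairs))
    # -> {'person': 'Ava', 'shoes': 'smooth soles'}: what sets her apart
    # "Why on the stairs (rather than the landing)?"
    print(contrastive_explanation(ava_stairs, ava_landing))
    # -> {'place': 'stairs', 'surface': 'wet'}: what sets the location apart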

McGill (1989) showed in a series of studies that a number of previously established effects in causal attribution—effects of perspective (actor vs. observer; Jones & Nisbett, 1971), covariation information (consensus and distinctiveness; Kelley, 1967), and the valence of the behavior being explained (positive vs. negative; Weiner, 1985)—are related to shifts in the contrast class. Specifically, by manipulating the contrast class adopted by participants, McGill was able to eliminate the actor–observer asymmetry, interfere with the roles of consensus and distinctiveness information, and counteract self-serving attributions of positive versus negative performance. These findings underscore the close relationship between attribution and explanation.

Focusing on explanation is also helpful in bringing to the foreground questions of causal relevance as distinct from probability. In a 1996 paper, Hilton and Erb presented a set of studies designed to clearly differentiate these notions. In one study, Hilton and Erb showed that contextual information can influence the perceived "goodness" and relevance of an explanation without necessarily affecting its probability. For example, participants were asked to rate the following explanation of why a watch broke (an example adapted from Einhorn & Hogarth, 1986): "the watch broke because the hammer hit it." This explanation was rated as fairly good, relevant, and likely to be true; however, after learning that the hammer hit the watch during a routine testing procedure at a watch factory, participants' ratings of explanation quality and relevance dropped. In contrast, ratings of probability remained high, suggesting that causal relevance and the probability of an explanation can diverge, and that these two factors differ in their susceptibility to this contextual manipulation. It is possible that these effects were generated by a shift in contrast, from "Why did this watch break now (as opposed to not breaking now)?" to "Why did this watch break (as opposed to some other watch breaking)?"

More recently, Chin-Parker and Bradner (2010) showed that effects of background knowledge and implicit contrasts extend to the generation of explanations. They manipulated participants' background assumptions by presenting a sequence of causal events that either did or did not seem to unfold toward a particular functional outcome (when it did, the sequence appeared to represent a closed-loop system functioning in a self-sustaining manner). Participants' explanations of an ambiguous observation at the end of the sequence tended to (p. 423) invoke a failure of a system to perform its function in the former case, but featured proximal causes in the latter case. (In contrast to prior research, context did not affect explanation evaluation in this design.)

Taken together, these studies offer another set of examples of how explanatory considerations (in this case, the contextually determined contrast class) can influence causal judgments, and suggest that ascriptions of causal responsibility may vary depending on how they are framed: in terms of causal relevance and explanation, or in terms of probability and truth. It is also possible that considerations such as simplicity and scope play a role in assigning causal responsibility, above and beyond their roles in causal inference and learning. These are interesting questions for future research.

The Varieties of Causal Explanation

There is no agreed-upon taxonomy for explanations; in fact, even the distinction between causal and non-causal explanation generates contested cases. For instance, consider an example from Putnam (1975). A rigid board has a round hole and a square hole. A peg with a square cross-section passes through the square hole, but not the round hole. Why? Putnam suggests that this can be explained by appeal to the geometry of the rigid objects (which is not causal), without appeal to lower-level physical phenomena (which are presumably causal). Is this a case of non-causal explanation? Different scholars provide different answers.

One taxonomy that has proven especially fruitful in the psychological study of explanation has roots in Aristotle's four causes (efficient, material, final, and formal), which are sometimes characterized not as causes per se, but in terms of explanation—as distinct answers to a "why?" question (Falcon, 2015). Efficient causes, which identify "the primary source of the change or rest" (e.g., a carpenter who makes a table), seem like the most canonically causal. Material causes, which specify "that out of which" something is made (e.g., wood for a table), are not causal in a narrow sense (for instance, we wouldn't say that the wood causes or is a cause of the table), but they nonetheless play a clear causal role in the production of an object. Final and formal causes are less clearly causal; but, as we consider in the following discussion, there are ways in which each could be understood causally, as well.

First, consider final causes, which offer "that for the sake of which a thing is done." Final cause explanations (or perhaps more accurately, their contemporary counterparts) are also known as teleological or functional explanations, as they offer a goal or a function. For instance, we might explain the detour to the café by appeal to a goal (getting coffee), or the blade's sharpness by appeal to its function (slicing vegetables). On the face of it, these explanations defy the direction of causal influence: they explain a current event (the detour) or property (the sharpness) by appeal to something that occurs only later (the coffee acquisition or the vegetable slicing). Nonetheless, some philosophers have argued that teleological explanations can be understood causally (e.g., Wright, 1976), and there is evidence that adults (Lombrozo & Carey, 2006) and children (Kelemen & DiYanni, 2005) treat them causally, as well (see also Chaigneau, Barsalou, & Sloman, 2004, and Lombrozo & Rehder, 2012, for more general investigations of the causal structure of functions).
How can teleological explanations be causal? On Wright's view, teleological explanations do not explain the present by appeal to the future—rather, the appeal to an unrealized goal or function is a kind of shorthand for a complex causal process that brought about (and hence preceded) what is being explained. In cases of intentional action, the function or goal could be a shorthand for the corresponding intention that came first: the detour to the café was caused by a preceding intention to get coffee, and the blade's sharpness was caused by the designer's antecedent intention to create a tool for slicing vegetables. Other cases, however, can be more complex. For instance, we might explain this zebra's stripes by appeal to their biological function (camouflage) because its ancestors had stripes that produced effective camouflage, and in part for that reason, stripes were increased or maintained in the population. If past zebra stripes didn't produce camouflage, then this zebra wouldn't have stripes (indeed, this zebra might not exist at all). In this case, the function can be explanatory because it was produced by "a causal process sensitive to the consequences of changes it produces" (Lombrozo & Carey, 2006; Wright, 1976), even in the absence of a preceding intention to realize the function.

Lombrozo and Carey (2006) tested these ideas as a descriptive account of the conditions under which adults accept teleological explanations. In one study, they presented participants with causal stories in which a functional property did or did not satisfy Wright's conditions. For example, participants learned about genetically engineered gophers that eat weeds, and whose pointy claws damage the roots of weeds as they dig, making them popular among (p. 424) farmers. The causal role of "damaging roots" in bringing about the pointy claws varied across conditions, from no role (the genetic engineer accidentally introduced a gene sequence that resulted in gophers with pointy claws), to a causal role stemming from an intention to damage roots (the genetic engineer intended to help eliminate weeds, and to that end engineered pointy claws), to a causal role without an intention to damage roots (the genetic engineer didn't realize that pointy claws damaged weed roots, but did notice that the pointy claws were popular and decided to create all of his gophers with pointy claws). Participants then rated the acceptability and quality of teleological (and other) explanations. For the vignette involving genetically engineered gophers, they were asked why the gophers had pointy claws, and rated "because the pointy claws damage weed roots" as a response.

In this and subsequent studies, Lombrozo and Carey (2006) found that teleological explanations are understood causally in the sense that participants only accepted teleological explanations when the function or goal invoked in the explanation played an appropriate causal role in bringing about what was being explained. More precisely, this causal requirement was necessary for teleological explanations to be accepted, but not sufficient. In the preceding examples, teleological explanations were accepted at high levels when the function was intended, at moderate levels when the function played a non-intentional causal role, and at low levels when the function played no causal role at all. Lombrozo and Carey suggest (and provide evidence) that in addition to satisfying certain causal requirements, teleological explanations might call for the existence of a general pattern that makes the function predictively useful.

Kelemen and DiYanni (2005) conducted a study with elementary school children (6–7 and 9–10-year-olds) investigating the relationship between their acceptance and generation of teleological explanations for natural phenomena, on the one hand, and their causal commitments concerning their origins, on the other hand—specifically, whether they believed that an intentional designer of some kind ("someone or something") made them or they "just happened." The tendency to endorse and generate teleological explanations of natural events, non-living natural objects, and animals was significantly correlated with belief in the existence of an intentional creator of some kind, be it God, a human, or an unspecified force or agent. While these findings do not provide direct support for the idea that teleological explanations are grounded in a preceding intention to produce the specific function in question, the link between teleological explanations and intentional design more generally is consistent with the idea that teleological explanations involve some basic causal commitments. Along the same lines, Kelemen, Rottman, and Seston (2013) found that adults (including professional scientists) who believe in God or "Gaia" are more likely to accept scientifically unwarranted teleological explanations (see also ojalehto, Waxman, & Medin, 2013, for a relevant discussion). Thus, the findings to date suggest that teleological explanations are understood causally by both adults and children.

What about formal explanations? Within Aristotle's framework, a formal explanation offers "the form" of something or "the account of what-it-is-to-be." Within psychology, what little work there is on formal explanation has focused on explanations that appeal to category membership. For example, Prasada and Dillingham (2006) define formal explanations as stating that tokens of a type have certain properties because they are the kinds of things they are (i.e., tokens of the respective type): we can say that Zach diagnoses ailments because he is a doctor, or that a particular object is sharp because it is a knife.

In their original paper and in subsequent work, Prasada and Dillingham (2006, 2009) argue that formal explanations are not causal, but instead are explanatory by virtue of a part–whole relationship. They show that only properties that are considered to be aspects of the kind support formal explanations, in contrast to "statistical" properties that are merely reliably associated with the kind. For example, people accepted a formal explanation of why something has four legs by reference to its category ("because it's a dog"), and also accepted the claim that "having four legs" is one aspect of being a dog. In contrast, participants rejected formal explanations such as "that (pointing to a barn) is red because it's a barn," and also denied that being red is one aspect of being a barn (even though most barns are red). Prasada and Dillingham (2009) argue that the relationship underlying such formal explanation is constitutive (not causal): aspects are connected to kinds via a part–whole relationship, and such relationships are explanatory because the "existence of a whole presupposes the existence of its parts, and thus the existence of a part is rendered intelligible by identifying the whole of which it is a part" (p. 421).

Prasada and Dillingham offer two additional pieces of evidence for the proposal that formal (p. 425) explanations are constitutive, and not causal. First, they demonstrate the explanatory potential of the part–whole relationship by showing that when this relationship is made explicit, even statistical features can support formal explanations. For example, we can explain, "Why is that (pointing to a barn) red? Because it is a red barn," where being red is understood as part of being a red barn (Prasada & Dillingham, 2009). This explanation isn't great, but neither is it tautological: it identifies the source of the redness in something about the red barn, as opposed, for instance, to the light that happens to be shining on it (see also Cimpian & Salomon, 2014, on "inherent" explanations). Less convincingly, they attempt to differentiate formal explanations from causal-essentialist explanations. On causal-essentialist accounts, a category's essence is viewed as the cause of the category members' properties (Gelman, 2003; Gelman & Hirschfeld, 1999; Medin & Ortony, 1989), which could ground formal explanations in a causal relationship. To test this, Prasada and Dillingham had participants evaluate explanations such as "Why does that (pointing to a dog) have four legs? Because it has the essence of a dog which causes it to have four legs" (Prasada & Dillingham, 2006). While there was a trend for formal explanations to be rated more highly than causal-essentialist explanations for properties that were taken to be aspects of a given kind, the results were inconclusive. As Prasada and Dillingham acknowledge, the wording of the causal-essentialist explanations was awkward, which could partially account for their middling acceptance. It thus remains a possibility that at least some formal explanations are understood causally, as pointers to some category-associated essence or causal factor responsible for the properties being explained.

One reason it is valuable to recognize the diversity of explanations is that different kinds of explanations lead to systematically different patterns of causal judgment. For example, Lombrozo (2009) investigated the relationship between different kinds of causal explanations and the relative importance of features in classification (see also Ahn, 1998). Participants learned about novel artifacts and organisms with three causally related features. To illustrate, one item involved "holings," a type of flower with "brom" compounds in its stem, which makes it bend over as it grows, which means its pollen can be spread to other flowers by wandering field mice. Participants were asked a why-question about the middle feature (e.g., "Why do holings typically bend over?"), which was ambiguous as a request for a mechanistic explanation (e.g., "Because of the brom compounds") or a teleological explanation (e.g., "In order to spread their pollen"). Participants provided an explanation and were subsequently asked to decide whether novel flowers were holings, where some shared the mechanistic feature (brom compounds) and some shared the functional feature (bending over). Lombrozo found that participants who provided functional explanations in response to the ambiguous why-question were significantly more likely than participants who did not to then privilege the functional feature relative to the mechanistic feature when it came to classification. Similarly, a follow-up study found that experimentally prompting participants to generate a particular explanation type by disambiguating the why-question ("In other words, what purpose might bending over serve?") had the same effect (see also Lombrozo & Rehder, 2012, for additional evidence about the relationship between functions and kind classification).
Additional studies suggest that the effects of mechanistic versus functional explanations extend beyond judgments of category membership. Lombrozo and Gwynne (2014) employed a method similar to Lombrozo (2009), presenting participants with causal chains consisting of three elements, such as a certain gene that causes a speckled pattern in a plant, which attracts butterflies that play a role in pollination. Participants explained the middle feature (the speckled pattern) and generalized a number of aspects of that feature (e.g., its density, contrast, and color) to novel entities that shared either a causal or a functional feature with the original. Lombrozo and Gwynne found that explaining a property functionally (versus mechanistically) promoted the corresponding type of generalization.

Vasilyeva and Coley (2013) demonstrated a similar link between explanation and generalization in an open-ended task. Participants learned about plants and animals possessing novel but informative properties (e.g., ducks have parasite X [or X-cells]) and generated hypotheses about which other organisms might share the property. In the course of generating these hypotheses, participants spontaneously produced formal, causal, and teleological explanations in a manner consistent with the property they reasoned about. Most important, the type of explanation predicted the type of generalization: for example, people were most likely to generalize properties to entities related via causal interactions (e.g., plants and insects that ducks eat, or things that eat ducks) after generating causal (p. 426) explanations (e.g., they got it from their food). In a separate set of studies, Vasilyeva and Coley (in preparation) ruled out an alternative account based exclusively on the direct effects of generalized properties on generalizations.

Beyond highlighting some causal relationships over others, different kinds of explanations could change the way participants represent and reason about causal structure. Indeed, findings from Lombrozo (2010) suggest that this is the case. In a series of studies, Lombrozo presented participants with causal structures drawn from the philosophical literature and intended to disambiguate two accounts of causation: those based on some kind of dependence relationship (see Le Pelley, Griffiths, and Beesley, Chapter 2 in this volume) and those based on some kind of transference (see Wolff and Thorstad, Chapter 9 in this volume). According to one version of the former view, C is a cause of E if it is the case that had C not occurred, E would not have occurred. In other words, E depends upon C in the appropriate way, in this case counterfactually. According to one version of transference views, C is a cause of E if there was a physical connection between C and E—some continuous mechanism or conserved physical quantity, such as momentum. While dependence and transference often go hand in hand, they can come apart in cases of "double prevention" and "overdetermination" (the sketch following this discussion works through an overdetermination case). Lombrozo presented participants with such cases and found that judgments were more closely aligned with dependence views than transference views when the causal structures were directed toward a function or goal, and therefore supported a teleological explanation.

Lombrozo (2010) explains this result, in part, by appeal to the idea of equifinality: when a process is goal-directed, the end may be achieved despite variations in the means. To borrow William James's famous example, Romeo will find his way to Juliet whatever obstacle is placed in his path (James, 1890). He might scale a fence or wade through a river, but the end—reaching Juliet—will remain the same. When participants reason about a structure in teleological or goal-directed terms, they may similarly represent it as means- or mechanism-invariant, and therefore focus on dependence relationships irrespective of the specific transference that happened to obtain.
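A toy but-for test makes it easy to see how dependence and transference can pull apart. The events and the outcome rule below are invented for illustration; this is a sketch of the counterfactual criterion, not of any particular experimental stimulus.

    def counterfactually_depends(effect_model, world, candidate):
        """But-for test: the effect depends on `candidate` if removing it
        from the set of occurring events would flip the outcome."""
        return effect_model(world) and not effect_model(world - {candidate})

    # Overdetermination: either rock alone would shatter the bottle.
    shatters = lambda events: "rock_A" in events or "rock_B" in events

    world = {"rock_A", "rock_B"}
    print(counterfactually_depends(shatters, world, "rock_A"))  # False
    # Rock A transferred momentum to the bottle (transference holds),
    # yet the shattering does not counterfactually depend on it:
    # the two accounts deliver different verdicts about the same event.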

In sum, pluralism has long been recognized as a feature of explanation, with Aristotle's taxonomy providing a useful starting point for charting variation in explanations (although it is by no means the only taxonomy of explanation; see, for example, Cimpian & Salomon, 2014, on inherent versus extrinsic explanations). We have reviewed evidence that teleological explanations are causal explanations, but that they are nonetheless treated differently from mechanistic explanations, which do not appeal to functions or goals. The evidence concerning formal explanations is less conclusive, but points to a viable alternative to a causal interpretation, with formal explanation instead depending on constitutive part–whole relations.

Recognizing explanatory pluralism can provide a useful road map for thinking about pluralism when it comes to causation and causal relations. In fact, as we have seen, different kinds of explanations do lead to systematic differences in classification and inference, with evidence that causal relationships themselves may be represented differently under different "explanatory modes." In the following section, we take a closer look at mechanistic explanations and their relationship to causation and mechanisms.

Explanation and Causal Mechanisms

The "mechanistic explanations" considered in the previous section concerned the identification of one or more causes that preceded some effect. Often, however, causal explanations do not simply identify causes, but instead aim to articulate how the cause brought about the effect. That is, they involve a mechanism. But what, precisely, is a mechanism? Are all mechanisms causal? And do mechanisms have a privileged relationship to explanation? In this section, we begin to address these questions about the relationship between mechanisms and explanations. For a more general discussion of mechanisms, we direct readers to the chapter on mechanisms by Johnson and Ahn (Chapter 8 in this volume).

Within psychology, there is growing interest in the role of mechanisms in causal reasoning. For example, Ahn, Kalish, Medin, and Gelman (1995) found that people seek "mechanistic" information in causal attribution. Park and Sloman (2013) found that people's violations of the Markov assumption depended on their "mechanistic" beliefs about the underlying causal structure. Buehner and McGregor (2006) showed that beliefs about mechanism type moderate effects of temporal contiguity in causal judgments (see also Ahn & Bailenson, 1996; Buehner & May, 2004; Fugelsang & Thompson, 2000; Koslowski & Okagaki, 1986; Koslowski, Okagaki, Lorenz, & Umbach, 1989; for reviews, see Ahn & Kalish, 2000; Johnson & Ahn, Chapter 8 in this volume; Koslowski, 1996, 2012; Koslowski & Masnick, 2010; Sloman & Lagnado, 2014; Waldmann & Hagmayer, 2013). (p. 427) Despite these frequent appeals to mechanisms and mechanistic information, however, there is no explicitly articulated and widely endorsed conception of "mechanism."
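The Markov (screening-off) property whose violations Park and Sloman (2013) tied to mechanistic beliefs can be stated in a few lines. The common-cause network and its parameters below are arbitrary values of our own choosing, used only to show what the property asserts.

    # Minimal common-cause network C -> E1, C -> E2 with made-up parameters.
    P_C = 0.5
    P_E1 = {True: 0.8, False: 0.1}   # P(E1 | C)
    P_E2 = {True: 0.7, False: 0.2}   # P(E2 | C)

    def joint(c, e1, e2):
        pc = P_C if c else 1 - P_C
        p1 = P_E1[c] if e1 else 1 - P_E1[c]
        p2 = P_E2[c] if e2 else 1 - P_E2[c]
        return pc * p1 * p2

    # Screening off: conditional on the common cause, E2 carries no
    # further information about E1, so P(E1 | C, E2) == P(E1 | C).
    p_e1_given_c_e2 = joint(True, True, True) / sum(
        joint(True, e1, True) for e1 in (True, False))
    print(p_e1_given_c_e2, P_E1[True])  # 0.8 0.8

When people's judgments make P(E1 | C, E2) differ from P(E1 | C), they are violating this property, and Park and Sloman's finding is that whether they do so depends on their beliefs about the mediating mechanisms.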
Most often, a mechanism is taken to spell out the intermediate steps between some cause and some effect. For example, Park and Sloman (2014) define a mechanism as "the set of causes, enablers, disablers, and preventers that are directly involved in producing an effect, along with information about how the effect comes about, including how it unfolds over time" (p. 807). Research that adopts a perspective along these lines often goes further in explicitly identifying such mechanisms as explanations (and these terms are often used interchangeably, as in Koslowski & Masnick, 2010). Other work operationalizes mechanisms using measures of explanation, implicitly suggesting a correspondence. For example, to validate a manipulation of mechanism, Park and Sloman asked participants whether the same explanation applies to both effects in a common-cause structure (see also Park & Sloman, 2013). Similarly, in a study examining mental representations of mechanisms, Johnson and Ahn (2015) considered (but did not ultimately endorse) an "explanatory" sense of mechanism, which they operationalized by asking participants to rate the extent to which some event B explains why event A led to event C.

Shifting from psychology to philosophy, we find a class of accounts of explanation that likewise associate explanations with a specification of mechanisms (e.g., Bechtel & Abrahamsen, 2005; Glennan, 1996, 2002; Machamer, Darden, & Craver, 2000; Railton, 1978; Salmon, 1984). Consistent with the empirical work reviewed earlier, some of these accounts (e.g., Railton, 1978; Salmon, 1984) consider mechanisms to be "sequences of interconnected events" (Glennan, 2002, p. S345). Canonical examples include causal chains or networks of events leading to a specific outcome, such as a person who kicks a ball, which bounces off a pole, which breaks a window. On these views, explanation, causation, and mechanisms are not only intimately related, but potentially interdefined.

A second view of mechanisms within philosophy, however, departs more dramatically from work in psychology, and also suggests a more circumscribed role for causation. These views analyze mechanisms as complex systems that involve a (typically hierarchical) structure and arrangement of parts and processes, such as that exhibited by a watch, a cell, or a socioeconomic system (e.g., Bechtel & Abrahamsen, 2005; Glennan, 1996, 2002; Machamer, Darden, & Craver, 2000). Within this framework, Craver and Bechtel (2007) offer an insightful analysis of causal and non-causal relationships within a multilevel mechanistic system. Specifically, they suggest that interlevel (i.e., "vertical") relationships within a mechanism are not causal, but constitutive. For instance, a change in rhodopsin in retinal cells can partially explain how signal transduction occurs, but we wouldn't say that this change causes signal transduction; it arguably is signal transduction (or one aspect of it). Craver and Bechtel point out that constitutive relations conflict with many common assumptions about event causation: that causes and effects must be distinct events, that causes precede their effects, that the causal relation is asymmetrical, and so on. Unlike causation, explanation can accommodate both causal (intralevel) relationships and constitutive (interlevel) relationships, of the kind documented by Prasada and Dillingham's (2009) work on formal explanation.
Although Craver and Bechtel convincingly argue that the causal reading of interlevel relationships is erroneous (see also Glennan, 2010, for related claims), as a descriptive matter, it could be that laypeople nonetheless interpret them in causal terms. An example from the Betty Crocker Cookbook, discussed by Patricia Churchland (1994), illustrates the temptation. In the book, Crocker is correct to explain that microwave ovens work by accelerating the molecules comprising the food, but she wrongly states that the excited molecules rub against one another and that their friction generates heat. Crocker assumes that the increase in mean kinetic energy of the molecules causes heat, when in fact heat is constituted by the mean kinetic energy of the molecules (Craver & Bechtel, 2007). A study by Chi, Roscoe, Slotta, Roy, and Chase (2012) showed that eighth and ninth graders, like Crocker, tended to misconstrue non-sequential, emergent processes as direct sequential causal relationships. It's possible that adults might make similar errors as well, assimilating non-causal explanations to a causal mold.

There are thus many open questions about how best to define mechanisms for the purposes of psychological theory, and about the extent to which mechanisms are represented in terms of strictly causal relationships. What we do know, however, is that explanations and mechanisms seem to share a privileged relationship. More precisely, there is evidence that the association between mechanisms and explanation claims is closer than that between (p. 428) mechanisms and corresponding causal claims (Vasilyeva & Lombrozo, 2015).

The studies by Vasilyeva and Lombrozo (2015) used "minimal pairs": causal and explanatory claims that were matched as closely as possible. For example, participants read about a person, PK, who spent some time in the portrait section of a museum and made an optional donation to the museum. They were then asked to evaluate how good they found an explanation for the donation ("Why did PK make an optional donation to the museum? Because PK spent some time in the portrait section"), or how strongly they endorsed a causal relationship ("Do you think there exists a causal relationship between PK spending some time in a portrait section and PK making an optional donation to the museum?").

Vasilyeva and Lombrozo varied two factors across items and participants: the strength of covariation evidence between the candidate cause and effect, and knowledge of a mediating mechanism. In the museum example, some participants learned the speculative hypothesis that "being surrounded by many portraits (as opposed to other kinds of paintings) creates a sense that one is surrounded by watchful others. This reminds the person of their social obligations, which in turn encourages them to donate money to the public museum." Both explanation and causal judgments were affected by these manipulations of covariation and mechanism information. However, they were not affected equally: specifying a mechanism had a stronger effect on explanation ratings than on causal ratings, while the strength of covariation evidence had a stronger effect on causal ratings than on explanation ratings.
The findings from Vasilyeva and Lombrozo (2015) support a special relationship between explanations and mechanisms. They also challenge views that treat explanations as equivalent to identifying causal relationships, since matched explanation and causal claims were differentially sensitive to mechanisms and covariation. The findings thus raise the possibility that explanatory and causal judgments are tuned to support different cognitive functions. For example, explanation could be especially geared toward reliable and broad generalizations (Lombrozo & Carey, 2006), which can benefit from mechanistic information: when we understand the mechanism by which some cause generates some effect, we can more readily infer whether the same relationship will obtain across variations in circumstances. By learning the mechanism that mediates the relationship between visiting a portrait gallery and making an optional museum donation, for example, we are in a better position to predict whether visiting a figurative versus an abstract sculpture garden will have the same effect. This benefit can potentially be realized with quite skeletal mechanistic (Rozenblit & Keil, 2002) or functional understanding (Alter, Oppenheimer, & Zemla, 2010); people need not understand a mechanism in full detail to gain some inferential advantage. Causal claims, by contrast, could more closely track the evidence concerning a particular event or relationship, rather than the potential for broad generalization.

In sum, the picture that emerges is one of partial overlap between causality, explanation, and mechanisms. Work in philosophy offers a variety of proposals emphasizing different aspects of mechanisms: structure, functions, temporally unfolding processes connecting starting conditions to the end state, and so on. Explanatory and causal judgments could track different aspects of mechanisms, resulting in the patterns of association and divergence observed. We suspect that adopting more explicit and sophisticated notions of mechanism will help research in this area move forward. On a methodological note, we think the strategy adopted in Vasilyeva and Lombrozo (2015)—of contrasting the characteristics of causal explanation claims with "matched" causal claims—could be useful in driving a wedge between different kinds of judgments, thus shedding light on their unique characteristics and potentially unique roles in human cognition. This strategy can also generalize to other kinds of judgments. For example, Dehghani, Iliev, and Kaufmann (2012) and Rips and Edwards (2013) both report systematic patterns of divergence between explanations and counterfactual claims, another judgment with a potentially foundational relationship to both explanation and causation.

Conclusions Throughout the chapter, we have presented good evidence that explanatory considera­ tions affect causal reasoning, with implications for causal inference, causal learning, and attribution. We have also considered different kinds of explanations, including their dif­ ferential effects on causal generalizations and causal representation, and the role of mechanisms in causal explanation. However, many questions remain open. We highlight four especially pressing questions here.


Acknowledgments The preparation of this chapter was partially supported by the Varieties of Understanding Project funded by the Templeton Foundation, as well as an NSF CAREER award to the first author (DRL-1056712). We are also grateful to David Danks, Samuel Johnson, and Michael Waldmann for helpful comments on a previous draft if this chapter.

References Ahn, W. (1998). Why are different features central for natural kinds and artifacts? The role of causal status in determining feature centrality. Cognition, 69(2), 135–178. Ahn, W. K., & Bailenson, J. (1996). Causal attribution as a search for underlying mecha­ nisms: An explanation of the conjunction fallacy and the discounting principle. Cognitive Psychology, 31(1), 82–123. Ahn, W. K., & Kalish, C. (2000). The role of mechanism beliefs in causal reasoning. In F. C. Keil (Ed.), Explanation and cognition. Cambridge, MA: MIT Press. Page 22 of 32

Causal Explanation Ahn, W. K., Kalish, C. W., Medin, D. L., & Gelman, S. A. (1995). The role of covariation ver­ sus mechanism information in causal attribution. Cognition, 54, 299–352. Alter, A. L., Oppenheimer, D. M., & Zemla, J. C. (2010). Missing the trees for the forest: A construal level account of the illusion of explanatory depth. Journal of Personality and So­ cial Psychology, 99, 436–451. Amsterlaw, J., & Wellman, H. M. (2006). Theories of mind in transition: A microgenetic study of the development of false belief understanding. Journal of Cognition and Develop­ ment, 7(2), 139–172. Baker, A. (2013). Simplicity. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy (Fall 2013 ed.). http://plato.stanford.edu/archives/fall2013/entries/simplicity/. Bartsch, K., & Wellman, H. M. (1995). Children talk about the mind. Oxford: Oxford Uni­ versity Press. Bechtel, W., & Abrahamsen, A. (2005). Explanation: A mechanist alternative. Studies in History and Philosophy of Science Part C :Studies in History and Philosophy of Biological and Biomedical Sciences, 36(1995), 421–441. Bonawitz, E. B., & Lombrozo, T. (2012). Occam’s rattle: Children’s use of simplicity and probability to constrain inference. Developmental Psychology, 48(4), 1156–1164. Buehner, M. J. (2005). Contiguity and covariation in human causal inference. Learning & Behavior: A Psychonomic Society Publication, 33(2), 230–238. Buehner, M. J., & May, J. (2004). Abolishing the effect of reinforcement delay on human causal learning. The Quarterly Journal of Experimental Psychology, 57B, 179–191. Buehner, M. J., & McGregor, S. (2006). Temporal delays can facilitate causal attribution: Towards a general timeframe bias in causal induction. Thinking & Reasoning, 12, 353– 378. Chaigneau, S. E., Barsalou, L. W., & Sloman, S. A. (2004). Assessing the causal structure of function. Journal of Experimental Psychology: General, 133(4), 601–25. Cheng, P. W. (1997). From covariation to causation: A causal power theory. Psychological Review, 104, 367–405. Cheng, P. W., & Novick, L. R. (1990). A probabilistic contrast model of causal induction. Journal of Personality and Social Psychology, 58(4), 545. Cheng, P. W., & Novick, L. R. (1991). Causes versus enabling conditions. Cognition, 40(1– 2), 83–120. Cheng, P. W., & Novick, L. R. (1992). Covariation in natural causal induction. Psychologi­ cal Review, 99(2), 365–382. Page 23 of 32

Causal Explanation Chi, M. T. H., De Leeuw, N., Chiu, M.-H., & Lavancher, C. (1994). Eliciting self-explana­ tions improves understanding. Cognitive Science, 18(3), 439–477. Chi, M. T. H., Roscoe, R. D., Slotta, J. D., Roy, M., & Chase, C. C. (2012). Misconceived causal explanations for emergent processes. Cognitive Science, 36(1), 1–61. Chin-Parker, S., & Bradner, A. (2010). Background shifts affect explanatory style: How a pragmatic theory of explanation accounts for background effects in the generation of ex­ planations. Cognitive Processing, 11(3), 227–249. Churchland, P. S. (1994). Can neurobiology teach us anything about consciousness? Pro­ ceedings and Addresses of the American Philosophical Association, 67(4), 23–40. Cimpian, A., & Salomon, E. (2014). The inherence heuristic: An intuitive means of making sense of the world, and a potential precursor to psychological essentialism. Be­ havioral and Brain Sciences, 37(5), 461–480. (p. 430)

Craver, C. F., & Bechtel, W. (2007). Top-down causation without top-down causes. Biology and Philosophy, 22, 547–563. Dehghani, M., Iliev, R., & Kaufmann, S. (2012). Causal explanation and fact mutability in counterfactual reasoning. Mind & Language, 27(1), 55–85. Douven, I. (2011). Abduction. In E. N. Zalta (Ed.), The Stanford encyclopedia of philoso­ phy. (Spring 2011 ed.). http://plato.stanford.edu/archives/spr2011/entries/abduc­ tion/. Douven, I. (2013). Inference to the best explanation, Dutch books, and inaccuracy minimi­ sation. Philosophical Quarterly, 63(252), 428–444. Douven, I., & Schupbach, J. N. (2015a). The role of explanatory considerations in updat­ ing. Cognition, 142, 299–311. Douven, I., & Schupbach, J. N. (2015b). Probabilistic alternatives to Bayesianism: The case of explanationism. Frontiers in Psychology, 6, 1–9. Einhorn, H. J., & Hogarth, R. M. (1986). Judging probable cause. Psychological Bulletin, 99(1), 3–19. Falcon, A. (2015). Aristotle on causality. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy (Spring 2015 Edition). http://plato.stanford.edu/archives/spr2015/entries/ aristotle-causality/. Fiske, S. T., & Taylor, S. E. (2013). Social cognition: From brains to culture. Thousand Oaks, CA: Sage. Försterling, F. (1992). The Kelley model as an analysis of variance analogy: How far can it be taken? Journal of Experimental Social Psychology, 28(5), 475–490. Page 24 of 32

Causal Explanation Fugelsang, J. A., & Thompson, V. A. (2000). Strategy selection in causal reasoning: When beliefs and covariation collide. Canadian Journal of Experimental Psychology, 54, 15–32. Gelman, S. A. (2003). The essential child: Origins of essentialism in everyday thought. Ox­ ford: Oxford University Press. Gelman, S. A., & Hirschfeld, L. A. (1999). How biological is essentialism. Folkbiology, 9, 403–446. Glennan, S. (1996). Mechanisms and the nature of causation. Erkenntnis, 44(1), 49–71. Glennan, S. (2002). Rethinking mechanistic explanation. Philosophy of Science, 69(3), S342–S353. Glennan, S. (2010). Mechanisms, causes, and the layered model of the world. Philosophy and Phenomenological Research, 81(2), 362–381. Glymour, C., & Cheng, P. W. (1998). Causal mechanism and probablity: A normative ap­ proach. In M. Oaksford & N. Chater (Eds.), Rational models of cognition (pp. 295–313). Oxford: Oxford University Press. Good, I. J. (1960). Weight of evidence, corroboration, explanatory power, information and the utility of experiments. Journal of the Royal Statistical Society: Series B (Methodologi­ cal), 22(2), 319–331. Goodman, N. D., Baker, C. L., Bonawitz, E. B., Mansinghka, V. K., Gopnik, A., Wellman, H., et al. (2006). Intuitive theories of mind: A rational approach to false belief. In R. Sun & N. Miyake (Eds.), Proceedings of the 28th annual conference of the Cognitive Science Soci­ ety (pp. 1382–1387). Mahwah, NJ: Lawrence Erlbaum Associates. Gopnik, A. (2000). Explanation as orgasm and the drive for causal knowledge: The func­ tion, evolution, and phenomenology of the theory formation system. In F. Keil & R. A. Wil­ son (Eds.), Explanation and cognition (pp. 299–323). Cambridge, MA: MIT Press. Gopnik, A., & Sobel, D. M. (2000). Detecting blickets: How young children use informa­ tion about novel causal powers in categorization and induction. Child Development, 71(5), 1205–1222. Griffiths, T. L., & Tenenbaum, J. B. (2005). Structure and strength in causal induction. Cognitive Psychology, 51(4), 334–384. Harman, G. H. (1965). The inference to the best explanation. Philosophical Review, 74(1), 88–95. Hart, H. L. A., & Honoré, T. (1985). Causation in the Law. Oxford: Oxford University Press. Henderson, L. (2014). Bayesianism and inference to the best explanation. The British Journal for the Philosophy of Science, 65, 687–715. Page 25 of 32

Hewstone, M., & Jaspars, J. (1987). Covariation and causal attribution: A logical model of the intuitive analysis of variance. Journal of Personality and Social Psychology, 53(4), 663–672.
Hilton, D. J., & Erb, H.-P. (1996). Mental models and causal explanation: Judgments of probable cause and explanatory relevance. Thinking and Reasoning, 2(4), 273–308.
Hilton, D. J., McClure, J., & Sutton, R. M. (2009). Selecting explanations from causal chains: Do statistical principles explain preferences for voluntary causes? European Journal of Social Psychology, 39, 1–18.
Holyoak, K. J., & Cheng, P. W. (2011). Causal learning and inference as a rational process: The new synthesis. Annual Review of Psychology, 62, 135–163.
James, W. (1890). The principles of psychology. New York: H. Holt.
Johnson, S. G. B., & Ahn, W. (2015). Causal networks or causal islands? The representation of mechanisms and the transitivity of causal judgment. Cognitive Science, 1–36.
Johnson, S. G. B., Johnston, A. M., Toig, A. E., & Keil, F. C. (2014). Explanatory scope informs causal strength inferences. In P. Bello, M. Guarini, M. McShane, & B. Scassellati (Eds.), Proceedings of the 36th annual conference of the Cognitive Science Society (pp. 2453–2458). Austin, TX: Cognitive Science Society.
Johnston, A. M., Johnson, S. G. B., Koven, M. L., & Keil, F. C. (2015). Probabilistic versus heuristic accounts of explanation in children: Evidence from a latent scope bias. In D. C. Noelle, R. Dale, A. S. Warlaumont, J. Yoshimi, T. Matlock, C. D. Jennings, & P. P. Maglio (Eds.), Proceedings of the 37th annual conference of the Cognitive Science Society (pp. 1021–1026). Austin, TX: Cognitive Science Society.
Jones, E. E., & Nisbett, R. E. (1971). The actor and the observer: Divergent perceptions of the causes of behavior. In E. E. Jones et al. (Eds.), Attribution: Perceiving the causes of behavior. Morristown, NJ: General Learning Press.
Kelemen, D., & DiYanni, C. (2005). Intuitions about origins: Purpose and intelligent design in children's reasoning about nature. Journal of Cognition and Development, 6, 3–31.
Kelemen, D., Rottman, J., & Seston, R. (2013). Professional physical scientists display tenacious teleological tendencies: Purpose-based reasoning as a cognitive default. Journal of Experimental Psychology: General, 142(4), 1074–1083.
Kelley, H. H. (1967). Attribution theory in social psychology. Nebraska Symposium on Motivation, 15, 192–238.
Kelley, H. H. (1973). The process of causal attributions. American Psychologist, 28, 107–128.


Kelley, H. H., & Michela, J. L. (1980). Attribution theory and research. Annual Review of Psychology, 31, 457–501.
Kemp, C., Goodman, N., & Tenenbaum, J. (2010). Learning to learn causal models. Cognitive Science, 34(7), 1185–1243.
Khemlani, S. S., Sussman, A. B., & Oppenheimer, D. M. (2011). Harry Potter and the sorcerer's scope: Latent scope biases in explanatory reasoning. Memory & Cognition, 39(3), 527–535.

Koslowski, B. (1996). Theory and evidence: The development of scientific reasoning. Cambridge, MA: MIT Press.
Koslowski, B. (2012). Scientific reasoning: Explanation, confirmation bias, and scientific practice. In G. Feist & M. Gorman (Eds.), Handbook of the psychology of science. New York: Springer.
Koslowski, B., & Masnick, A. (2010). Causal reasoning and explanation. In U. C. Goswami (Ed.), The Wiley-Blackwell handbook of childhood cognitive development (2nd ed., pp. 377–398). Malden, MA: Wiley-Blackwell.
Koslowski, B., & Okagaki, L. (1986). Non-Humean indices of causation in problem-solving situations: Causal mechanism, analogous effects, and the status of rival alternative accounts. Child Development, 57(5), 1100–1108.
Koslowski, B., Okagaki, L., Lorenz, C., & Umbach, D. (1989). When covariation is not enough: The role of causal mechanism, sampling method, and sample size in causal reasoning. Child Development, 60(6), 1316–1327.
Kuhn, D., & Katz, J. (2009). Are self-explanations always beneficial? Journal of Experimental Child Psychology, 103(3), 386–394.
Lagnado, D. A., & Channon, S. (2008). Judgments of cause and blame: The effects of intentionality and foreseeability. Cognition, 108(3), 754–770.
Legare, C. H. (2012). Exploring explanation: Explaining inconsistent evidence informs exploratory, hypothesis-testing behavior in young children. Child Development, 83(1), 173–185.
Legare, C. H., & Lombrozo, T. (2014). Selective effects of explanation on learning during early childhood. Journal of Experimental Child Psychology, 126, 198–212.
Legare, C. H., Wellman, H. M., & Gelman, S. A. (2009). Evidence for an explanation advantage in naïve biological reasoning. Cognitive Psychology, 58(2), 177–194.
Lipton, P. (2004). Inference to the best explanation. London: Routledge.


Lombrozo, T. (2007). Simplicity and probability in causal explanation. Cognitive Psychology, 55(3), 232–257.
Lombrozo, T. (2009). Explanation and categorization: How "why?" informs "what?" Cognition, 110(2), 248–253.
Lombrozo, T. (2010). Causal-explanatory pluralism: How intentions, functions, and mechanisms influence causal ascriptions. Cognitive Psychology, 61(4), 303–332.
Lombrozo, T. (2012). Explanation and abductive inference. In Oxford handbook of thinking and reasoning (pp. 260–276). Oxford: Oxford University Press.
Lombrozo, T. (2016). Explanatory preferences shape learning and inference. Trends in Cognitive Sciences, 20, 748–759.
Lombrozo, T., & Carey, S. (2006). Functional explanation and the function of explanation. Cognition, 99(2), 167–204.
Lombrozo, T., & Gwynne, N. Z. (2014). Explanation and inference: Mechanistic and functional explanations guide property generalization. Frontiers in Human Neuroscience, 8, 700.
Lombrozo, T., & Rehder, B. (2012). Functions in biological kind classification. Cognitive Psychology, 65(4), 457–485.
Machamer, P., Darden, L., & Craver, C. F. (2000). Thinking about mechanisms. Philosophy of Science, 67(1), 1–25.
Malle, B. F. (1999). How people explain behavior: A new theoretical framework. Personality and Social Psychology Review, 3(1), 23–48.
Malle, B. F. (2004). How the mind explains behavior: Folk explanations, meaning, and social interaction. Cambridge, MA: MIT Press.
Malle, B. F., Knobe, J., O'Laughlin, M. J., Pearce, G. E., & Nelson, S. E. (2000). Conceptual structure and social functions of behavior explanations: Beyond person-situation attributions. Journal of Personality and Social Psychology, 79(3), 309–326.
Mansinghka, V. K., Kemp, C., Tenenbaum, J. B., & Griffiths, T. L. (2006). Structured priors for structure learning. In Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence (UAI 2006).
McArthur, L. A. (1972). The how and what of why: Some determinants and consequences of causal attribution. Journal of Personality and Social Psychology, 22(2), 171–193.
McClure, J., Hilton, D. J., & Sutton, R. M. (2007). Judgments of voluntary and physical causes in causal chains: Probabilistic and social functionalist criteria for attributions. European Journal of Social Psychology, 37, 879–901.

McGill, A. L. (1989). Context effects in judgments of causation. Journal of Personality and Social Psychology, 57(2), 189–200.
Medin, D. L., & Ortony, A. (1989). Psychological essentialism. In S. Vosniadou & A. Ortony (Eds.), Similarity and analogical reasoning (pp. 179–195). Cambridge: Cambridge University Press.
ojalehto, b., Waxman, S. R., & Medin, D. L. (2013). Teleological reasoning about nature: Intentional design or relational perspectives? Trends in Cognitive Sciences, 17(4), 166–171.
Pacer, M., & Lombrozo, T. (2015). Ockham's razor cuts to the root: Simplicity in causal explanation. Manuscript in revision.
Pacer, M., Williams, J. J., Chen, X., Lombrozo, T., & Griffiths, T. L. (2013). Evaluating computational models of explanation using human judgments. In A. Nicholson & P. Smyth (Eds.), Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (pp. 498–507). Corvallis, OR: AUAI Press.
Park, J., & Sloman, S. A. (2013). Mechanistic beliefs determine adherence to the Markov property in causal reasoning. Cognitive Psychology, 67(4), 186–216.
Park, J., & Sloman, S. A. (2014). Causal explanation in the face of contradiction. Memory & Cognition, 42(5), 806–820.
Peirce, C. S. (1955). Abduction and induction. In Philosophical writings of Peirce (Vol. 11). New York: Dover.
Pennington, N., & Hastie, R. (1992). Explaining the evidence: Tests of the Story Model for juror decision making. Journal of Personality and Social Psychology, 62(2), 189–206.
Perales, J. C., & Shanks, D. R. (2003). Normative and descriptive accounts of the influence of power and contingency on causal judgement. The Quarterly Journal of Experimental Psychology A: Human Experimental Psychology, 56(6), 977–1007.
Pine, K. J., & Siegler, R. S. (2003). The role of explanatory activity in increasing the generality of thinking. Paper presented at the biennial meeting of the Society for Research in Child Development, Tampa, FL.
Potochnik, A. (2010). Levels of explanation reconceived. Philosophy of Science, 77(1), 59–72.
Prasada, S., & Dillingham, E. M. (2006). Principled and statistical connections in common sense conception. Cognition, 99(1), 73–112.
Prasada, S., & Dillingham, E. M. (2009). Representation of principled connections: A window onto the formal aspect of common sense conception. Cognitive Science, 33(3), 401–448.

Preston, J., & Epley, N. (2005). Explanations versus applications: The explanatory power of valuable beliefs. Psychological Science, 16(10), 826–832.
Putnam, H. (1975). Philosophy and our mental life. In H. Putnam, Mind, language and reality: Philosophical papers (Vol. 2). New York: Cambridge University Press.

Railton, P. (1978). A deductive-nomological model of probabilistic explanation. Philosophy of Science, 44(2), 206–226.
Read, S. J., & Marcus-Newhall, A. (1993). Explanatory coherence in social explanations: A parallel distributed processing account. Journal of Personality and Social Psychology, 65(3), 429.
Rips, L. J., & Edwards, B. J. (2013). Inference and explanation in counterfactual reasoning. Cognitive Science, 37(6), 1107–1135.
Rozenblit, L., & Keil, F. (2002). The misunderstood limits of folk science: An illusion of explanatory depth. Cognitive Science, 26, 521–562.
Salmon, W. (1984). Scientific explanation and the causal structure of the world. Princeton, NJ: Princeton University Press.
Schupbach, J. N. (2011). Comparing probabilistic measures of explanatory power. Philosophy of Science, 78(5), 813–829.
Schupbach, J. N., & Sprenger, J. (2011). The logic of explanatory power. Philosophy of Science, 78(1), 105–127.
Shanks, D. R., & Dickinson, A. (1988). Associative accounts of causality judgment. Psychology of Learning and Motivation: Advances in Research and Theory, 21, 229–261.
Siegler, R. S. (1995). How does change occur: A microgenetic study of number conservation. Cognitive Psychology, 28, 225–273.
Sloman, S. A., & Lagnado, D. (2014). Causality in thought. Annual Review of Psychology, 66, 223–247.
Spellman, B. A. (1997). Crediting causality. Journal of Experimental Psychology: General, 126(4), 323–348.
Thagard, P. (1989). Explanatory coherence. Behavioral and Brain Sciences, 12, 435–502.
van Fraassen, B. C. (1980). The scientific image. Oxford: Oxford University Press.
Vasilyeva, N., & Coley, J. C. (2013). Evaluating two mechanisms of flexible induction: Selective memory retrieval and evidence explanation. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th annual conference of the Cognitive Science Society (pp. 3645–3650). Austin, TX: Cognitive Science Society.


Vasilyeva, N., & Lombrozo, T. (2015). Explanation and causal judgments are differentially sensitive to covariation and mechanism information. In Proceedings of the 37th annual conference of the Cognitive Science Society (pp. 2663–2668). Austin, TX: Cognitive Science Society.
Waldmann, M. R., & Hagmayer, Y. (2001). Estimating causal strength: The role of structural knowledge and processing effort. Cognition, 82(1), 27–58.
Waldmann, M. R., & Hagmayer, Y. (2013). Causal reasoning. In D. Reisberg (Ed.), Oxford handbook of cognitive psychology (pp. 733–752). New York: Oxford University Press.
Walker, C. M., Lombrozo, T., Legare, C. H., & Gopnik, A. (2014). Explaining prompts children to privilege inductively rich properties. Cognition, 133(2), 343–357.
Walker, C. M., Lombrozo, T., Williams, J. J., Rafferty, A., & Gopnik, A. (2016). Explaining constrains causal learning in childhood. Child Development.
Weiner, B. (1985). An attributional theory of achievement motivation and emotion. Psychological Review, 92(4), 548–573.
Wellman, H. M. (2011). Reinvigorating explanations for the study of early cognitive development. Child Development Perspectives, 5(1), 33–38.
Wellman, H. M., & Lagattuta, K. H. (2004). Theory of mind for learning and teaching: The nature and role of explanation. Cognitive Development, 19, 479–497.
Wellman, H. M., & Liu, D. (2007). Causal reasoning as informed by the early development of explanations. In A. Gopnik & L. Schulz (Eds.), Causal learning: Psychology, philosophy, and computation (pp. 261–279). Oxford: Oxford University Press.
Wilkenfeld, D. A., & Lombrozo, T. (2015). Inference to the Best Explanation (IBE) vs. Explaining for the Best Inference (EBI). Science and Education, 24(9–10), 1059–1077.
Williams, J. J., & Lombrozo, T. (2010). The role of explanation in discovery and generalization: Evidence from category learning. Cognitive Science, 34(5), 776–806.
Williams, J. J., & Lombrozo, T. (2013). Explanation and prior knowledge interact to guide learning. Cognitive Psychology, 66(1), 55–84.
Williams, J. J., Lombrozo, T., & Rehder, B. (2013). The hazards of explanation: Overgeneralization in the face of exceptions. Journal of Experimental Psychology: General, 142(4), 1006–1014.
Woodward, J. (2010). Causation in biology: Stability, specificity, and the choice of levels of explanation. Biology & Philosophy, 25(3), 287–318.
Wright, L. (1976). Teleological explanations: An etiological analysis of goals and functions. Berkeley: University of California Press.


Tania Lombrozo

Department of Psychology, University of California, Berkeley, Berkeley, California, USA

Nadya Vasilyeva

Department of Psychology, University of California, Berkeley, Berkeley, California, USA


Diagnostic Reasoning

Diagnostic Reasoning   Björn Meder and Ralf Mayrhofer The Oxford Handbook of Causal Reasoning Edited by Michael R. Waldmann Print Publication Date: Jun 2017 Subject: Psychology, Cognitive Psychology Online Publication Date: May 2017 DOI: 10.1093/oxfordhb/9780199399550.013.23

Abstract and Keywords

This chapter discusses diagnostic reasoning from the perspective of causal inference. The computational framework that provides the foundation for the analyses—probabilistic inference over graphical causal structures—can be used to implement different models that share the assumption that diagnostic inferences are guided and constrained by causal considerations. This approach has provided many critical insights, with respect to both normative and empirical issues. For instance, taking into account uncertainty about causal structures can entail diagnostic judgments that do not reflect the empirical conditional probability of cause given effect in the data, the classic, purely statistical norm. The chapter first discusses elemental diagnostic inference from a single effect to a single cause, then examines more complex diagnostic inferences involving multiple causes and effects, and concludes with information acquisition in diagnostic reasoning, discussing different ways of quantifying the diagnostic value of information and how people decide which information is diagnostically relevant.

Keywords: diagnostic reasoning, causal inference, uncertainty, causal structure, probabilistic inference

Diagnostic reasoning is ubiquitous in everyday life. A physician diagnoses diseases from observed symptoms. An engineer engages in diagnostic reasoning when trying to identify what caused a plane to crash. A cognitive scientist reasons diagnostically when figuring out if an experimental manipulation proved successful in an experiment that did not yield any of the expected outcomes. A judge makes a diagnostic inference when reasoning about how strongly a piece of evidence supports the claim that the defendant has committed the crime. More generally, diagnostic reasoning concerns inferences from observed effects to (as yet) unobserved causes of these effects. Thus, diagnostic reasoning usually involves a kind of backward inference, as people typically infer (often unobserved) conditions that existed prior to what they have observed (in contrast to predictive reasoning from causes to effects, which is a kind of forward inference from present conditions or events into the future). Diagnostic reasoning from effect to cause can, therefore, be conceptualized as a special case of inductive inference, in which a datum e, the observed effect, is used to update beliefs about a hypothesis c, the unobserved target cause of the effect.


Diagnostic reasoning, as discussed in this chapter, needs to be differentiated from other related kinds of inference. Diagnostic reasoning is tightly connected to explanatory reasoning (see Lombrozo & Vasilyeva, Chapter 22 in this volume) and abductive reasoning (Josephson & Josephson, 1996), as all are concerned with reasoning about the causes of observed effects. However, the scope and aim differ in that both explanatory and abductive reasoning are broader and less constrained. In diagnostic reasoning, as we define it here, the set of potential causes is fixed and known; the target inference is about the presence of (one of) these causes (with the potential goal of an intervention on these causes). In abductive reasoning, by contrast, the set of variables the inference operates on is often not known a priori and has to be actively constructed. In explanatory reasoning, the target of the inference (p. 434) is the explanation of the observed effect by means of its causes; the diagnostic inference may be part of it, but other considerations play a role as well (see Lombrozo & Vasilyeva, Chapter 22 in this volume).

In this chapter, we discuss diagnostic reasoning from the perspective of probabilistic causal inference. Pearl (2000), Spirtes, Glymour, and Scheines (1993), and Spohn (1976/1978; as cited in Spohn, 2001) laid the foundations with the development of causal Bayes nets theory, which provides a comprehensive modeling framework for a formal treatment of probabilistic inference over causal graphical models. This computational framework has been used to address several theoretical and empirical key issues within a unified account (for overviews, see Rottman & Hastie, 2014; Waldmann & Hagmayer, 2013; Waldmann, Hagmayer, & Blaisdell, 2006; see also chapters in this volume by Cheng & Lu [Chapter 5]; Griffiths [Chapter 7]; Oaksford & Chater [Chapter 19]; Rehder [Chapters 20 and 21]; and Rottman [Chapter 6]). Examples include the formal analysis of different measures of causal strength (Griffiths & Tenenbaum, 2005; Lu, Yuille, Liljeholm, Cheng, & Holyoak, 2008), the distinction between inferences based on observations and interventions (Lagnado & Sloman, 2004; Meder, Hagmayer, & Waldmann, 2008, 2009; Sloman & Lagnado, 2005; Waldmann & Hagmayer, 2005), categorization (Rehder, 2003, 2010; Waldmann, Holyoak, & Fratianne, 1995), causal structure learning (Bramley, Lagnado, & Speekenbrink, 2015; Coenen, Rehder, & Gureckis, 2015; Mayrhofer & Waldmann, 2011, 2015a, 2015b; Steyvers, Tenenbaum, Wagenmakers, & Blum, 2003), and analogical reasoning in causal domains (Holyoak, Lee, & Lu, 2010).

The framework of probabilistic inference over causal graphical models has also provided new pathways for the formal analysis of diagnostic reasoning in causal domains. Several computational models of diagnostic inference have been proposed that differ in their theoretical assumptions, technical implementation, and empirical scope (Fernbach, Darlow, & Sloman, 2011; Meder, Mayrhofer, & Waldmann, 2014; Waldmann, Cheng, Hagmayer, & Blaisdell, 2008).

The remainder of this chapter is structured as follows. We first consider the case of elemental diagnostic reasoning, based on a single causal relation between two events (i.e., cause and effect). We discuss different computational models of elemental diagnostic reasoning, the issues they address, and their role in empirical research as descriptive or normative models. In the second part of this chapter, we discuss more complex cases of diagnostic inferences, involving multiple causes or effects, from both a theoretical and an empirical perspective. The third section highlights different ways of quantifying the diagnostic value of information and how people decide which information is diagnostically relevant. We conclude by discussing key questions for future research and by outlining pathways for developing an empirically grounded and normatively justified theory of diagnostic causal reasoning.

Elemental Diagnostic Reasoning

In this section, we focus on the most basic type of diagnostic causal reasoning, which concerns an inference from a single binary effect to a single binary cause. We refer to this kind of diagnostic inference as elemental diagnostic reasoning. Although this most basic type of diagnostic inference seems quite simple compared with real-world scenarios involving a complex network of multiple causes and multiple effects, it highlights a number of critical questions about both how people should reason diagnostically (i.e., what would constitute an adequate normative model) and how people in fact do reason diagnostically (i.e., what would constitute an adequate descriptive model).

In the following, we provide an overview of alternative models of diagnostic inference from a single effect to a single cause and the empirical studies that have been used to test the respective models. These accounts provide computational-level models (in Marr's, 1982, terminology), in that they specify the cognitive task being solved, the information involved in solving it, and the rationale by which it can be solved (Anderson, 1990; Chater & Oaksford, 1999, 2008; for critical reviews, see Brighton & Gigerenzer, 2012; M. Jones & Love, 2011). Our goals are to highlight the ways in which a causal inference perspective provides novel insights into the computational analysis of diagnostic reasoning and to discuss how different models have informed empirical research.

Simple Bayes: Diagnostic Reasoning with Empirical Probabilities

When reasoning from effect to cause, for instance, when assessing the probability of a particular disease given the presence of a symptom, it seems natural to estimate the conditional probability of a cause given the effect. A critical question is how exactly this diagnostic probability is inferred. Many researchers have endorsed Bayes's rule applied to the empirical probabilities as the natural (p. 435) normative—and potentially descriptive—model for computing the diagnostic probability.

Let random variables C and E denote a binary cause and binary effect, respectively, and let {c, ¬c} and {e, ¬e} indicate the presence and absence of the cause and the effect event (Figure 23.1 a). Consider a physician examining a sample of 40 patients. Each patient has been tested for the presence of a certain genetic predisposition (cause event C) and the presence of elevated blood pressure (effect event E). This set of observations forms a joint frequency distribution over C and E, which can be represented in a 2 × 2 contingency table (Figure 23.1 b). The conditional probability of the cause given the effect (i.e., genetic predisposition given elevated blood pressure), P(c|e), can be inferred by using Bayes's rule:

\[ P(c \mid e) \;=\; \frac{P(e \mid c)\, P(c)}{P(e \mid c)\, P(c) \;+\; P(e \mid \neg c)\, P(\neg c)} \qquad (1) \]

where P(c) denotes the prior probability (base rate) of the cause [with P(¬c) = 1 − P(c)], P(e|c) is the likelihood of the effect conditional on the presence of the cause, and P(e|¬c) is the likelihood of the effect in the absence of the cause. For the data shown in Figure 23.1 b, the corresponding (frequentist) estimates are P(c) = 20/40 = .5, P(e|c) = 6/20 = .3, P(e|¬c) = 2/20 = .1, and P(e) = 8/40 = .2. Plugging these numbers into Equation 1 yields P(c|e) = .75.

An alternative way of computing the diagnostic probability is to estimate it directly from the observed joint frequencies: the number of cases in which both C and E are present, N(c, e), and the number of cases in which C is absent and E is present, N(¬c, e):

\[ P(c \mid e) \;=\; \frac{N(c, e)}{N(c, e) + N(\neg c, e)} \qquad (2) \]

For the data shown in Figure 23.1 b, this computation yields the same result as applying Bayes's rule: P(c|e) = 6/(6+2) = .75.

Under the simple Bayes account, no reference is made to the causal processes that may have generated the observed data, and no uncertainty regarding the probability estimates is incorporated in the model. This model is strictly non-causal in that it can be applied to arbitrary hypotheses and data; whether these events refer to causes or effects does not matter (Waldmann & Hagmayer, 2013).
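To make the two routes concrete, here is a minimal Python sketch (ours, not the chapter's); the four cell counts (6, 14, 2, 18) are the ones implied by the worked frequencies above for Figure 23.1 b:

    # Simple Bayes account for the Figure 23.1b data:
    # N(c,e)=6, N(c,~e)=14, N(~c,e)=2, N(~c,~e)=18.

    def diagnostic_probability_bayes(n_ce, n_cne, n_nce, n_ncne):
        """Equation 1: Bayes's rule on empirical (frequentist) estimates."""
        n = n_ce + n_cne + n_nce + n_ncne
        p_c = (n_ce + n_cne) / n            # base rate P(c) = 20/40 = .5
        p_e_c = n_ce / (n_ce + n_cne)       # likelihood P(e|c) = 6/20 = .3
        p_e_nc = n_nce / (n_nce + n_ncne)   # likelihood P(e|~c) = 2/20 = .1
        return (p_e_c * p_c) / (p_e_c * p_c + p_e_nc * (1 - p_c))

    def diagnostic_probability_direct(n_ce, n_nce):
        """Equation 2: direct estimate from the joint frequencies."""
        return n_ce / (n_ce + n_nce)

    print(diagnostic_probability_bayes(6, 14, 2, 18))  # 0.75
    print(diagnostic_probability_direct(6, 2))         # 0.75

Both routes return .75, illustrating that, with frequentist point estimates, Bayes's rule and the direct frequency computation coincide.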



Figure 23.1 (a) A 2 × 2 contingency table for representing the joint frequency distribution of a binary cause, C = {c, ¬c}, and a binary effect, E = {e, ¬e}. (b) Example data set. Numbers denote frequencies of co-occurrence (e.g., cause and effect were both present in 6 of 40 cases). (c) Causal structure hypothesis S1, the default causal model in power PC theory (bc = prior probability of cause C; wc = causal strength of C; wa = strength of background cause A). (d) Causal structure hypothesis S0, according to which C and E are independent variables, that is, there is no causal relation between candidate cause C and candidate effect E (bc = prior probability of cause C; wa = strength of background cause A).

(p. 436)

Empirical Studies

The simple Bayes model has a long-standing tradition in research on elemental diagnostic reasoning in a broader sense. Starting roughly in the 1950s, psychologists began using this model as a normative, and potentially descriptive, account of sound probabilistic rea­ soning. The most common tasks involved book bag and poker chip (or urn) scenarios with a well-defined statistical structure (e.g., Peterson & Beach, 1967; Phillips & Edwards, 1966). A key question was whether and to what extent people’s intuitive belief revision would correspond to the prescriptions of Bayes’s rule. Many studies found that subjects did take into account the diagnostic impact of the observed data, but to a lesser extent than prescribed by Bayes’s rule (a phenomenon referred to as conservatism; Edwards, 1968). By and large, however, the conclusion was that people have good statistical intu­ itions, leading to the metaphor of “man as intuitive statistician” (Peterson & Beach, 1967). With the advent of the heuristics and biases program (Kahneman & Tversky, 1972, 1973; Tversky & Kahneman, 1974), research on probabilistic inference and elemental diagnos­ tic reasoning continued. However, the studies conducted within this program led to a very different view of people’s capacity for making sound diagnostic inferences. Findings from scenarios such as the lawyer–engineer problem (Kahneman & Tversky, 1973), the cab problem (Bar-Hillel, 1980), and the mammography problem (Eddy, 1982) seemed to indi­ cate that people’s judgments are inconsistent with Bayes’s rule and generally are biased and error prone. Specifically, it was argued that people tend to neglect base rate informa­ tion (i.e., the prior probability of the hypothesis) when reasoning diagnostically. In the Page 5 of 42

mammography problem, for example, people were asked to give a diagnostic judgment regarding the posterior probability of breast cancer, based on a verbal description of the prior probability of the disease, P(c), the likelihood of obtaining a positive test result for a woman who has cancer, P(e|c), and the likelihood of a positive test result for a woman who does not have cancer, P(e|¬c). For instance, people were told that the prior probability of breast cancer is 1%, the likelihood of having a positive mammogram given cancer is 80%, and the probability of having a positive test result given no cancer is 9.6% (e.g., Gigerenzer & Hoffrage, 1995). Given these numbers, the posterior probability of breast cancer given a positive mammogram is about 8%. In stark contrast, a common finding was that people's diagnostic judgments of the probability of breast cancer given a positive mammogram were often much higher than Bayes's theorem suggests (often around 70%–80%), which was explained by assuming that people do not take into account the low prior probability of having breast cancer in the first place.

However, the claim that people neglect base rate information on a regular basis is too strong. Koehler (1996; see also Barbey & Sloman, 2007) critically reviewed the literature, concluding that there are a variety of circumstances under which base rates are appreciated. One important factor is the way in which probabilistic information is presented (e.g., specific frequency formats vs. conditional probabilities), which can facilitate or impede people's sensitivity to base rate information when making diagnostic inferences. Gigerenzer and Hoffrage (1995; see also Sedlmeier & Gigerenzer, 2001) provided the information in the mammography problem and several other problems as natural frequencies (i.e., the joint frequencies of cause and effect, such as the number of women who have cancer and have a positive mammogram). Providing information this way facilitates derivation of the diagnostic probability because Equation 2 can be used and base rate information does not need to be introduced via Bayes's rule. These findings served as a starting point for identifying and characterizing the circumstances under which base rate information is utilized and have informed more applied issues, such as risk communication in medicine (for a review, see Meder & Gigerenzer, 2014).

The question of whether and to what extent people use base rate information has been the focus of many studies on elemental diagnostic reasoning. In contrast, the relation between causal inference and elemental diagnostic reasoning has received surprisingly little attention in the literature, with respect to both normative and descriptive issues. Ajzen (1977) noted that "people utilize information, including information supplied by population base rates, to the extent that they find it possible to incorporate the information within their intuitive theories of cause and effect" (p. 312). At that time, however, the necessary tools for a formal treatment of diagnostic reasoning in terms of causal inference were not yet available, so that the exact nature of the interplay between diagnostic reasoning and causal representations was left largely unspecified (see also Tversky & Kahneman, 1982a, 1982b). Recent theoretical advances in causal modeling have made it possible to address this issue in a more rigorous way.
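As a quick arithmetic check on the mammography numbers above, the following Python snippet (ours, not the chapter's) plugs the stated values into Equation 1:

    # Mammography problem: 1% prior, 80% hit rate, 9.6% false-positive rate.
    p_c, p_e_c, p_e_nc = 0.01, 0.80, 0.096
    posterior = (p_e_c * p_c) / (p_e_c * p_c + p_e_nc * (1 - p_c))
    print(round(posterior, 3))  # 0.078, i.e., about 8%

The normative posterior of roughly 8% contrasts sharply with the 70%–80% judgments typically reported, which is the base rate neglect at issue.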



Power PC Theory: Diagnostic Reasoning Under Causal Power Assumptions (p. 437)

In contrast to the simple Bayes account, Cheng's (1997) power PC theory separates the data level (i.e., covariation information) from estimates of causal power that refer to the underlying but unobservable causal relations. The theory assumes that people aim to infer causal strength estimates because one goal of cognitive systems is to acquire knowledge of stable causal relations, rather than arbitrary statistical associations in noisy environments.

The theoretical assumptions underlying the power PC model instantiate a particular generative causal structure known as a noisy-OR gate (Glymour, 2003; Pearl, 1988): a common-effect structure with an observable effect E and two causes, namely an observable cause C and an amalgam of unobservable background causes A, which can independently bring about the effect (graph S1 in Figure 23.1 c). The original version of the power PC model (Cheng, 1997) is equivalent to estimating the probability of C bringing about E (i.e., causal power) in causal structure S1 using maximum likelihood estimates (MLEs) for the parameters derived from the relative frequencies in the data (see Griffiths & Tenenbaum, 2005, for a formal proof). An estimate for the strength of the background cause A, denoted wa, is given by P(e|¬c) in the sample data, as the occurrence of E in the absence of C necessarily has to be attributed to some (unknown) background cause or causes (for mathematical convenience, A is assumed to be constantly present; Cheng, 1997; Griffiths & Tenenbaum, 2005). The observed rate of occurrence of C in the sample, P(c), provides an estimate of the base rate of C, denoted bc. The unobservable probability with which C produces E, its generative causal power, is denoted wc (see Cheng, 1997, for analogous derivations for preventive causal power). This estimate of causal strength is computed from P(e|c) by partializing out the influence of the background causes that may also have generated the effect (Cheng, 1997).1 It can be estimated from the observed relative frequencies by

\[ w_c \;=\; \frac{P(e \mid c) - P(e \mid \neg c)}{1 - P(e \mid \neg c)} \qquad (3) \]

Waldmann and colleagues (2008) showed how diagnostic inferences can be modeled in the power PC framework, that is, using the parameters of causal structure S1. Given the causal structure's parameters and a noisy-OR parameterization, the diagnostic probability of candidate cause c given an effect e is given by

\[ P(c \mid e) \;=\; \frac{(w_c + w_a - w_c w_a)\, b_c}{(w_c + w_a - w_c w_a)\, b_c \;+\; w_a\,(1 - b_c)} \qquad (4) \]
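A minimal Python sketch (ours, and assuming the noisy-OR reading of Equations 3 and 4 reconstructed above) traces the power PC route for the Figure 23.1 b data:

    # Power PC with maximum likelihood point estimates for Figure 23.1b:
    # P(c) = .5, P(e|c) = .3, P(e|~c) = .1.
    b_c, p_e_c, p_e_nc = 0.5, 0.3, 0.1

    w_a = p_e_nc                           # strength of background cause A
    w_c = (p_e_c - p_e_nc) / (1 - p_e_nc)  # Equation 3: generative causal power

    # Equation 4: diagnostic probability under the noisy-OR structure S1.
    p_e_if_c = w_c + w_a - w_c * w_a       # P(e|c) implied by the parameters
    p_c_e = (p_e_if_c * b_c) / (p_e_if_c * b_c + w_a * (1 - b_c))
    print(round(w_c, 2), round(p_c_e, 2))  # 0.22 0.75

With maximum likelihood estimates, the result matches the simple Bayes value of .75, in line with the equivalence noted in the text.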


If this diagnostic inference is based on maximum likelihood point estimates directly derived from the observed frequencies, the power PC model yields the same numeric predictions as the simple Bayes approach. For instance, for the data set shown in Figure 23.1 b, the standard power PC account predicts that P(c|e) = .75.2 Thus, although the inference operates on the causal rather than the data level, the two accounts make the same prediction, namely, that diagnostic judgments should reflect the empirical conditional probability of the cause given the effect in the sample data.

A Bayesian variant of the power PC model can be implemented by associating prior distributions with the parameters of structure S1 and updating the parameter distributions in light of the available data via Bayesian updating (Holyoak et al., 2010; Lu et al., 2008). In this case, the predictions of the power PC model do not necessarily correspond to the simple Bayes model, with the specific differences varying as a function of the used prior and sample size (see Meder et al., 2014, for a detailed discussion and example predictions). Bayesian variants of the power PC account allow it to incorporate prior knowledge and expectations of the reasoner into the diagnostic inference task via specific priors over the parameters of structure S1 (Lu et al., 2008) and are also able to quantify (via distributions over parameters) the amount of uncertainty associated with the parameter estimates of structure S1.

Empirical Studies Krynski and Tenenbaum (2007; see also Hayes, Hawkins, & Newell, 2015) studied the role of causal structure in elemental diagnostic reasoning tasks designed to investigate the use of base rate information, such as the mammography problem (Eddy, 1982; Gigerenzer & Hoffrage, 1995). The question they were interested in was whether people’s diagnostic inferences would be mediated by the match between the provided sta­ tistics and the causal representations that people construct from the task information (e.g., a causal structure with one observed and one unobserved cause, or a causal (p. 438) structure with two observed causes). According to their experiments, when the given sta­ tistics can be clearly mapped onto the structure of the respective mental causal model, people’s diagnostic inferences are more sensitive to normatively relevant variables, such as base rate information. For instance, if in the mammography problem an explicit cause for the false positive rate is provided (e.g., a benign cyst that can also cause a positive mammogram), people’s diagnostic judgments improve substantially relative to the stan­ dard version of the problem in which no causal explanation for the false positive rate is provided. In follow-up research, McNair and Feeney (2014; see also McNair & Feeney, 2015) ex­ plored the role of individual differences. They assessed people’s numeracy, that is, the ability to perform elementary mathematical operations (Cokely, Galesic, Schulz, Ghazal, & Garcia-Retamero, 2012; Lipkus, Samsa, & Rimer, 2001). According to their results, clari­ fying the causal structure among the domain variables seems helpful only for participants with high numeracy skills; the performance of participants with low numeracy did not im­ prove.


Diagnostic Reasoning Fernbach, Darlow, and Sloman (2010, 2011) investigated to what extent people consider the influence of alternative causes in diagnostic reasoning from effect to cause, compared with predictive reasoning from cause to effect. They used a simple causal Bayes net equivalent to the power PC model (structure S1; Figure 23.1 c) as the normative bench­ mark for people’s predictive and diagnostic inferences. To derive model predictions for different real-world scenarios, they elicited participants’ existing causal beliefs about the relevant quantities, that is, the parameters associated with structure S1 (base rate bc, causal strength wc, and strength of alternative causes, wa). For instance, Fernbach and colleagues (2011) asked participants to estimate the prior probability that a mother of a newborn baby is drug addicted, how likely it is that the mother’s drug addiction causes her baby to be drug addicted, and how likely a newborn baby is to be drug addicted if the mother is not. These estimates were then used to derive model predictions for predictive and diagnostic inferences (e.g., estimates for how likely a baby is to be drug addicted giv­ en that the mother is drug addicted, and how likely a mother is to be drug addicted given that her baby is drug addicted). Different methods were used across experiments, such as deriving posterior distributions of P(c|e) and P(e|c) via sampling from participants’ esti­ mates, or generating predictions for each reasoner separately based on his or her individ­ ual estimates. According to their findings, people are more sensitive to the existence and strength of alternative causes when reasoning diagnostically from effect to cause than when making predictive inferences from cause to effect (but see Meder et al., 2014; Tver­ sky & Kahneman, 1982a).

Structure Induction Model: Diagnostic Reasoning with Causal Struc­ ture Uncertainty Although the power PC model operates on causal parameters that are estimated from the observed data (in one way or another), it brings the strong assumption to the task that there is actually a causal link between C and E. The only situation in which the account assumes that there is no causal relation is when P(c|e) = P(c|¬e), and therefore wc = 0. This approach lacks the expressive power to take into account the possibility that an ob­ served contingency in the data [i.e., P(c|e) ≠ P(c|¬e)] is just coincidental. Consider again the data set in Figure 23.1 b. The observed data indicate that the candidate cause (e.g., genetic predisposition) raises the probability of the effect (e.g., elevated blood pressure) from P(e|¬c) = 2/20 = .1 to P(e|c) = 6/20 = .3; accordingly, the estimated causal strength of C is wc = 0.22 (Equation 4). But how reliable is this estimate, given the available data? If the estimate is based on a data sample, it may well be that the observed contingency is merely accidental and not diagnostic for a causal relation. This is similar to a situation in which one tosses a fair coin 40 times—one would not be surprised if the observed number of heads was not exactly 20 but, say, 24. The important point here is that when inductive inferences are drawn based on samples, there is usually uncertainty about whether the observed contingency is indicative of a causal relation or is merely coincidental. The structure induction model of diagnostic reasoning (Meder, Mayrhofer, & Waldmann, 2009, 2014) formalizes the intuition that diagnostic reasoning should be sensitive to the question of whether the sample data warrant the existence of a causal relation between C Page 9 of 42

Diagnostic Reasoning and E. The characteristic feature of the model is that it does not operate on a single causal structure, as the power PC model does (and its isomorphic Bayes nets representa­ tion, i.e., structure S1; Figure 23.1 c). Rather, it also considers the possibility that C and E are, in fact, independent of each other (Anderson, 1990; Griffiths & Tenenbaum, 2005; see also McKenzie (p. 439) & Mikkelsen, 2007), as illustrated with structure S0 in Figure 23.1 d. Importantly, the two causal structures have different implications for the diagnos­ tic inference from cause to effect. Under S1, observing effect E provides (probabilistic) ev­ idence for the presence of the cause, so that P(c|e) > P(c) (except for the limiting case in which P(c|e) = P(c|¬e), and therefore wc = 0). For instance, for the data set in Figure 23.1 b, structure S1 entails that P(c|e) = .71. Note that this value is similar but not identical to the empirical probability of .75, with the divergence resulting from the fact that the ac­ count does not use maximum likelihood but Bayesian estimates (i.e., independent uniform priors over the structures’ parameters are used, which are updated in light of the avail­ able sample data). Structure S0, however, entails a very different value for the diagnostic probability. According to S0, C and E are independent events; therefore observing the presence of E does not increase the probability of C; that is, P(c|e) = P(c). Since in the da­ ta set shown in Figure 23.1 b the cause is present in 20 of the 40 cases, S0 entails P(c|e) = P(c) = .5. To take into account the diverging implications of the causal structures and their relative probability given the data, the structure induction model integrates out the two struc­ tures to arrive at a single estimate for the diagnostic probability. Formally, this is done by weighting the two diagnostic estimates derived from the parameterized structures by the corresponding posterior probability of structures S0 and S1, respectively (i.e., Bayesian model averaging; Chickering & Heckerman, 1997), which are P(S0|data) = .49 and P(S1| data) = .51 in our example (assuming a uniform prior over the structures, i.e., P(S0) = P(S1) = .5). For instance, for the data set in Figure 23.1 b the structure induction model predicts P(c|e; data) = .61, which results from weighting each structure’s diagnostic esti­ mate with the structure’s posterior probability (i.e., .49 × .5 + .51 × .71 = .61). This diag­ nostic probability then reflects the uncertainty with respect to the true underlying causal structure and the uncertainty of the parameter estimates.3 In sum, the structure induction model of diagnostic reasoning takes into account uncer­ tainty regarding possible causal models that may have generated the observed data. As a consequence, the derived diagnostic probabilities can systematically deviate from the em­ pirical diagnostic probability of a cause given an effect in the sample data and the predic­ tions of the simple Bayes account and power PC theory.
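The model's final averaging step can be made concrete with a short Python sketch (ours); the structure posteriors and the per-structure diagnostic probabilities are taken as given from the worked example above, since computing them from the data requires integrating over the parameter priors:

    # Bayesian model averaging over S0 and S1 for the Figure 23.1b data.
    p_s0, p_s1 = 0.49, 0.51          # posterior probabilities of the structures
    p_c_e_s0, p_c_e_s1 = 0.50, 0.71  # diagnostic probability under each structure

    p_c_e = p_s0 * p_c_e_s0 + p_s1 * p_c_e_s1
    print(round(p_c_e, 2))  # 0.61

The averaged estimate of .61 sits between the two structures' implications, weighted by how strongly the data support each structure.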

Empirical Studies Meder and colleagues (2014) tested the structure induction model using a medical diag­ nosis paradigm in which participants were provided with learning data about the co-oc­ currences of a (fictitious) virus and a (fictitious) symptom. Given this sample data, partici­ pants were asked to make a diagnostic judgment for a novel patient who has the symp­ tom. The studies used nine different data sets, factorially combining different levels of the (empirical) diagnostic probability, P(c|e), with different levels of the (empirical) predictive Page 10 of 42

Diagnostic Reasoning probability, P(e|c). The experimental rationale was to fix the empirical diagnostic proba­ bility but to vary other aspects of the data in order to generate predictions that distin­ guish the structure induction model from the simple Bayes model and different variants of power PC theory. Consider Figure 23.2 a: in all three data sets the base rate of the cause is P(c) = P(¬c) = . 5 and the empirical diagnostic probability is P(c|e) = .75. In contrast, the predictive prob­ ability of effect given cause, P(e|c), and the causal strength of C, wc, vary across the three data sets (from left to right the causal strength estimate increases; Equation 3). Figure 23.2 b shows the models’ predictions for the three data sets. The simple Bayes and power PC model entail the same diagnostic judgment across the data sets, since the empirical diagnostic probability is invariant. The structure induction model, however, makes a very different prediction, entailing diagnostic probabilities that systematically deviate from the empirical diagnostic probability. Specifically, the model predicts an upward trend, yield­ ing an increasing probability of target cause c given the effect e across the data sets. This upward trend results from the posterior probabilities of structures S0 and S1, whose pos­ terior Bayesian estimates vary across the data samples (see Meder et al., 2014, for de­ tails). As a consequence, the inferred diagnostic probability increases when the posterior probability of S1 becomes higher (i.e., when it becomes more likely that the observed con­ tingency is indicative of an underlying causal relation). Empirically, participants’ diagnostic judgments showed the upward trends predicted by the structure induction model, that is, human diagnostic judgments were not invariant for different data sets entailing the same empirical diagnostic probability P(c|e). These stud­ ies demonstrated that people’s diagnostic judgments do not solely reflect the empirical probability of a cause given an effect, but systematically vary as a function of causal structure (p. 440) uncertainty. These findings support the idea that people’s diagnostic in­ ferences operate on the causal level, rather than on the data level, and that their diagnos­ tic inferences are sensitive to alternative causal structures that may underlie the data.


Diagnostic Reasoning

Figure 23.2 Predictions of different computational models of elemental diagnostic inference. (a) Three data sets in which the empirical diagnostic probabili­ ty of a cause given an effect is P(c|e) = .75. The pre­ dictive probability, P(e|c), and the causal strength of C, wc, vary across the data sets (numbers are maxi­ mum likelihood estimates of causal power, based on the empirical probabilities.) (b) Predictions of the structure induction model, the simple Bayes model, and the power PC model, using maximum likelihood estimates (MLEs), for the three data sets. The latter two models predict identical diagnostic probabilities across the data sets, whereas the structure induction model predicts a systematic upward trend, resulting from different structure posteriors entailed by the data samples.

Summary: Elemental Diagnostic Reasoning Different computational models of elemental diagnostic inference share the assumption that the goal of the diagnostic reasoner is to infer the conditional probability of the candi­ date cause given the effect. However, the accounts differ strongly in their theoretical as­ sumptions and the ways in which the diagnostic probability is computed from the avail­ able data. The simple Bayes model, which is usually presumed to provide the rational benchmark in diagnostic reasoning, prescribes that causal judgments should reflect the empirical probability of the cause given the effect in the data. Power PC theory and its isomorphic Bayes net representation conceptualize diagnostic reasoning as an inference on the causal level, using structure S1 as the default structure. The structure induction model advances this idea by considering a causal structure hypothesis according to which C and E are in fact independent events, with the inferred diagnostic probability taking in­ to account the uncertainty about the existence of a causal relation. As a consequence, di­

Page 12 of 42

Diagnostic Reasoning agnostic probabilities derived from the structure induction model can systematically di­ verge from the empirical probability of the cause given the effect.

Diagnostic Reasoning with Multiple Causes and Effects Our discussion thus far has centered on elemental diagnostic inferences from a single ef­ fect to a single cause. In this section, we discuss diagnostic causal reasoning with more complex causal models that (p. 441) can involve multiple causes or effects. For instance, the same symptom could be caused by different diseases, such as a viral or bacterial in­ fection. In this case, a single piece of evidence can have differential diagnostic implica­ tions for different possible causes. Conversely, a viral infection (cause) can generate sev­ eral symptoms (effects), such as headache, fever, and nausea. In this case, different pieces of evidence need to be combined to make a diagnostic judgment about one target cause. In the framework of probabilistic inference over graphical causal models, the causal de­ pendencies in the graph determine the factorization of the joint probability distribution over the domain variables (Pearl, 2000; Spirtes et al., 1993). The factorization follows from applying the causal Markov condition to the graph, which states that the value of any variable in the graph is a function only of its direct causes (its Markovian parents). In other words, conditional on its direct causes, each variable in the model is independent of all other variables, except its causal descendants (i.e., its direct and indirect effects). This causally based factorization implies specific relations of conditional dependence and inde­ pendence for the probability distribution associated with the graph, which facilitate and constrain inferences across multiple variables. Importantly for the present discussion, the particular dependency and independency relations entail specific diagnostic inference patterns when reasoning with different causal structures. In the following, we discuss key issues related to diagnostic reasoning in causal models with multiple causes or effects, focusing on common-effect and common-cause models (see also Rehder, Chapters 20 and 21, and Rottman, Chapter 6, in this volume). Subse­ quently, we address the relation between diagnostic reasoning and information search, which is an important aspect of diagnostic reasoning in a broader sense.

Diagnostic Reasoning with Common-Effect Structures: Explaining Away An important property of diagnostic reasoning in common-effect structures is explaining away (Morris & Larrick, 1995; Pearl, 1988, 2000; Rottman & Hastie, 2014).4 Consider the example of a common-effect structure shown in Figure 23.3 a, according to which C1 = {c1, ¬c1} (e.g., virus present vs. absent) and C2 = {c2, ¬c2} (e.g., bacteria present vs. ab­ sent) are independent, not mutually exclusive, causes of a common effect E = {e, ¬e} (e.g., symptom present vs. absent). Associated with the causal structure is a set of para­ Page 13 of 42

Diagnostic Reasoning meters: the base rates of the two cause events, their respective causal strengths, and the strength of the background cause (not shown). These parameters fully specify the joint probability distribution over the two causes and the effect. Figure 23.3 shows an example data set for 100 cases generated from setting the base rate of each independent cause to .5 and the strength of the background cause to zero (i.e., the effect never occurs when both C1 and C2 are absent). The two causes, virus and bacte­ ria, vary in their causal strength: a virus infection (C1) generates the symptom with a probability of .8, and a bacterial infection (C2) generates the symptom with a probability of .6 (in this example scenario, C1 and C2 are the sole causes of E, i.e., there are no alter­ native background causes; therefore these probabilities correspond to the individual causal power estimates of C1 and C2).5 Assuming a noisy-OR parameterization, the proba­ bility of the symptom is .92 when both causes are present (i.e., P(e|c1, c2) = wc1 + wc2 − wc1wc2 = .8 + .6 − .8 ⋅ .6 = .92). Explaining away occurs in common-effect structures when reasoning diagnostically from the effect to the causes. Since both C1 and C2 are (independent) causes of their common effect, observing the presence of the effect raises the probability of both: If we know that a patient has the symptom, this increases the probability of having a virus as well as of having a bacterial infection. The particular diagnostic probabilities depend on the causes’ base rates and their causal strengths, as well as on the strength of the unobserved back­ ground causes. For instance, for the example data in Figure 23.3 b, P(c1|e) = 43/58 = .74 and P(c2|e) = 38/58 = .66: both causes are equally likely a priori, but C1 is more likely to cause the symptom, so the diagnostic probability for C1 is higher than for C2. This diag­ nostic inference can be modeled by Bayes’s rule using a structure parameterized with conditional probability estimates (Pearl, 1988) or using estimates of causal strength, as similarly discussed in the section on elemental diagnostic reasoning. If available, the di­ agnostic probabilities can also be computed directly from a joint frequency distribution, as done above with the example data in Figure 23.3 b. Explaining away with respect to some target cause occurs when conditioning not only on the effect, but also on the known presence of an alternative cause. In the present sce­ nario, with respect to cause C1, explaining away corresponds to the inequality P(c1|e) > P(c1|e, c2). In words, the diagnostic (p. 442) probability of cause c1 conditional on effect e alone is higher than when conditioning on both the effect e and the alternative cause c2; thus, the presence of c2 explains away some of the diagnostic evidence of e with respect to c1. Consider again the medical scenario: if a patient has the symptom, reasoning diag­ nostically increases the probability of the virus being present. Now imagine you also learn that the patient has a bacterial infection, which is the other of the two possible causes that could have produced the symptom. Intuitively, if we learn that the patient has a bacterial infection this “explains away” (some of) the diagnostic evidence of the symp­ tom regarding the presence of the virus; that is, it reduces the probability of the virus be­ ing present relative to a situation in which we only condition on the effect.

Page 14 of 42

Diagnostic Reasoning

Figure 23.3 Explaining away. (a) A common-effect model with the independent, not mutually exclusive causes C1 = {c1, ¬c1} and C2 = {c2, ¬c2}, virus and bacteria, and an effect E = {e, ¬e}, a symptom. (b) Joint frequency distribution generated from a noisyOR parameterization of the common-effect model, as­ suming no background cause and P(e|c1, ¬c2) = .8 and P(e|¬c1, c2) = .6. (c) Explaining away of c1 across different prior probabilities of the two causes, with P(c1) = P(c2). The difference between P(c1|e) and P(c1|e, c2) is the amount of explaining away, ex­ emplified with the two diagnostic probabilities for P(c1) = P(c2) = .5.

Consider the example data set shown in Figure 23.3 b: Given this joint frequency distribu­ tion, what are the diagnostic probabilities P(c1|e) and P(c1|e, c2)? In other words, how likely is the virus to be present if the symptom is present, and how likely is the virus to be present given (p. 443) the presence of both the symptom and the bacteria? Based on the joint frequency distribution, P(c1|e) = 43/58 = .74; that is, the virus is present in about 74% of the cases in which the symptom is present. The probability of c1 given both e and c2 can be computed analogously, yielding P(c1|e, c2) = 23/38 = .61; that is, the virus is present in about 61% of the cases in which both the symptom and the bacteria are present—the presence of the alternative cause c2 has “explained away” some of the diag­ nostic evidence of e with respect to c1. The amount of explaining away is the difference between the two diagnostic probabilities; that is, P(c1|e) – P(c1|e, c2) = .13. Note that the probability of c1 does not reduce to zero: Because the virus and the bacterial infection are independently occurring causes, the presence of the bacterial infection does not rule out that the patient also has a viral infection—it only makes it less likely than before (see Morris & Larrick, 1995, for a detailed analysis of the conditions of explaining away). In fact, the diagnostic probability is still higher than the base rate of the virus, which is .5 in this example.
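Both diagnostic probabilities can also be reproduced directly from the model parameters rather than from the tabulated frequencies; here is a minimal Python sketch (ours, assuming the noisy-OR parameterization described above):

    # Explaining away in the common-effect model of Figure 23.3:
    # P(c1) = P(c2) = .5, strengths .8 and .6, no background cause.
    from itertools import product

    p_c1 = p_c2 = 0.5
    w1, w2 = 0.8, 0.6

    def p_e(c1, c2):  # noisy-OR: probability of the effect given the causes
        return 1 - (1 - w1 * c1) * (1 - w2 * c2)

    # Joint probability P(C1, C2, E) over all eight value combinations.
    joint = {(c1, c2, e): (p_c1 if c1 else 1 - p_c1)
                          * (p_c2 if c2 else 1 - p_c2)
                          * (p_e(c1, c2) if e else 1 - p_e(c1, c2))
             for c1, c2, e in product([0, 1], repeat=3)}

    p_e_total = sum(p for (c1, c2, e), p in joint.items() if e)
    print(round((joint[1, 0, 1] + joint[1, 1, 1]) / p_e_total, 2))       # P(c1|e) = 0.74
    print(round(joint[1, 1, 1] / (joint[1, 1, 1] + joint[0, 1, 1]), 2))  # P(c1|e,c2) = 0.61

The drop from .74 to .61 is the explaining-away effect; the difference of .13 matches the amount computed from the frequencies in Figure 23.3 b.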


Figure 23.3c (cf. Figure 5 in Morris & Larrick, 1995) illustrates a more general case, showing the amount of explaining away for different base rates of the two cause events, under the constraint that P(c1) = P(c2). The causal strengths are fixed to the same values as above (i.e., the individual likelihoods are .8 for c1 and .6 for c2, no background cause, noisy-OR parameterization). The curves correspond to the two diagnostic probabilities P(c1|e) and P(c1|e, c2) across different base rates of the two causes, showing how the amount of explaining away varies as a function of the causes' prior probability. The two dots are the data points from the preceding example, in which P(c1) = P(c2) = .5.
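A rough sketch of how the two curves in Figure 23.3c can be traced, assuming equal base rates, the causal strengths from above, and no background cause (the closed-form expressions follow from summing the relevant cells of the noisy-OR joint distribution):

```python
def diagnostic_probs(b, w1=0.8, w2=0.6):
    """Return P(c1|e) and P(c1|e, c2) for P(c1) = P(c2) = b under a noisy-OR parameterization."""
    p11 = w1 + w2 - w1 * w2  # P(e | c1, c2)
    p_c1_e = (b * b * p11 + b * (1 - b) * w1) / (b * b * p11 + b * (1 - b) * (w1 + w2))
    p_c1_e_c2 = b * p11 / (b * p11 + (1 - b) * w2)
    return p_c1_e, p_c1_e_c2

for b in (0.1, 0.3, 0.5, 0.7, 0.9):
    # difference between the two values = amount of explaining away at base rate b
    print(b, diagnostic_probs(b))
```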

Empirical Studies

Empirical research on explaining away in diagnostic causal reasoning with common-effect structures has yielded mixed findings. While there are many studies on discounting in a broader sense (see Khemlani & Oppenheimer, 2011, for an overview), few studies have directly investigated explaining away from the perspective of inductive causal inference.

Morris and Larrick (1995; Experiment 1) investigated whether and to what extent people demonstrate explaining away in a social inference scenario. They used a paradigm by E. E. Jones and Harris (1967), in which the task was to infer the political attitude of the writer of an essay (E). For instance, the potential causes of a positive essay about Fidel Castro were a pro-Castro attitude (A) of the writer or the instruction (I) to write a positive essay. This situation can be conceptualized as a common-effect model A→E←I. The independence and base rate of I were instructed through a cover story; quantitative model predictions were derived by eliciting participants' subjective judgments of the other relevant probabilities (e.g., the base rates of causes A and I, that is, the prevalence of pro-Castro attitudes and the probability of having been instructed to write a pro-Castro essay, and the corresponding likelihoods). Explaining away can be tested by comparing judgments for P(A|E), the probability that the writer has a positive attitude given a pro-Castro essay, with P(A|E, I), the probability that the writer has a positive attitude given a pro-Castro essay and given that the writer was instructed to write a positive essay. Consistent with explaining away, lower judgments were obtained for P(A|E, I) than for P(A|E): given a pro-Castro essay, participants increased their judgment of the probability that the writer had a pro-Castro attitude, but lowered their judgments when informed that the writer had been instructed to write a positive essay.

More recent research has tested explaining away in the context of causal Bayes net theories. Rehder (2014; see also Rehder & Waldmann, in press) used common-effect structures with two binary causes and one binary effect in different domains, such as economics, meteorology, and sociology. Participants were taught qualitative causal models based on described causal relations between binarized variables, such as "a low amount of ozone causes high air pressure" or "low interest rates cause high retirement savings." The instructions also explicated the causal mechanisms underlying these relations (see Rehder, 2014, for details). No quantitative information on the exact parameters of the instructed causal networks was provided; the studies focused on qualitative diagnostic inference patterns. The studies used a forced-choice task in which participants were presented with a pair of situations, corresponding to judgments about P(c1|e) and P(c1|e, c2). The task was to choose in which situation a target cause C1 was more likely to take a particular value: when only the state of the effect was known, or when both the effect and the alternative cause were known. If people's inferences exhibit explaining away, they should prefer the former over the latter, corresponding to the inequality P(c1|e) > P(c1|e, c2). Human behavior was at variance with explaining away; in fact, participants tended to exhibit the opposite pattern [i.e., choosing P(c1|e, c2) over P(c1|e)].

(p. 444) Rottman and Hastie (2015; see also Rottman & Hastie, 2016) investigated explaining away using a learning paradigm in which participants observed probabilistic data generated from a parameterized common-effect model with binary variables. Quantitative predictions for patterns of explaining away were derived from the parameterized causal model. However, people's inferences were inconsistent with the model predictions, and most of the diagnostic judgments did not exhibit explaining away.

Summary

The currently available evidence on explaining away in human reasoning with common-effect models is limited. While some studies observed explaining away, others found diagnostic inference patterns at variance with it. These are critical findings for adopting causal Bayes net theories as a modeling framework for human causal induction and diagnostic inference. Further empirical research is needed to identify and characterize the circumstances under which human diagnostic reasoning is sensitive to explaining away.

Diagnostic Inference in Common-Cause Structures: Sequential Diagnostic Reasoning

In many diagnostic-reasoning situations, such as medical diagnosis, several pieces of evidence (e.g., results of different medical tests) are observed sequentially at different points in time. In this case, multiple effects are used to reason about the presence of an underlying cause (e.g., a disease), constituting a common-cause structure (Figure 23.4). Sequential diagnostic inferences also raise the question of possible order effects (Hogarth & Einhorn, 1992), such as those resulting from temporal weighting of the sequentially acquired information (e.g., primacy or recency effects).

Hayes, Hawkins, Newell, Pasqualino, and Rehder (2014; see also Hayes et al., 2015), drawing on the work of Krynski and Tenenbaum (2007) discussed earlier, explored sequential diagnostic reasoning in the mammography problem (Eddy, 1982). In the standard version of the problem, participants are presented with a single piece of evidence, a positive mammogram, and are asked to make an inference about the probability of the target cause, breast cancer. In the studies by Hayes and colleagues, diagnostic judgments based on one versus two positive test results from two different machines were elicited. The crucial manipulation concerned information on possible causes of false-positive results. In the non-causal condition, participants were merely informed about the relative frequency of false positives (e.g., that 15% of women without breast cancer had a positive mammogram). In this situation, the false-positive rates of the two machines are assumed to be independent of each other, so that the second mammogram provides additional diagnostic evidence (i.e., participants' diagnostic judgments regarding the target cause, breast cancer, should further increase relative to diagnostic judgments based on a single test result). In the causal condition, participants received the same statistical information but were also told about a possible alternative cause that can lead to false positives, a benign cyst. The underlying rationale was that the benign cyst would constitute a stable common cause within a tested person, so that a second positive mammogram provides little diagnostic value over the first one. Participants' diagnostic judgments closely resembled these predictions: in the non-causal condition the second mammogram was treated as providing further diagnostic evidence, raising the probability of the target cause relative to the situation with just a single positive test result. By contrast, in the causal condition the second positive mammogram had very little influence on diagnostic judgments. These findings show that people are sensitive to the causal underpinnings of different situations and their implications for probabilistic diagnostic inferences.

Meder and Mayrhofer (2013) investigated sequential diagnostic reasoning with a common-cause model consisting of a binary cause (two chemicals) and four binary effects (different symptoms, e.g., fever and headache). They presented participants with a series of three symptoms, one after the other, with a diagnostic judgment required after each piece of evidence. Information on the individual cause–effect relations was given either in a numerical format (e.g., "Chemical X causes symptom A in 66% of the cases") or in verbal frequency terms (e.g., "Chemical X frequently causes symptom A"). Diagnostic probabilities for the verbal reasoning condition were derived using the numerical equivalents of the verbal terms, taken from an unrelated study (Bocklisch, Bocklisch, & Krems, 2012; see Mosteller & Youtz, 1990, for an overview). The diagnostic task for participants was to estimate the posterior probabilities of the two causes, given all effects observed so far. In this study, people's sequential diagnostic inferences were remarkably accurate, with judgments closely (p. 445) tracking the diagnostic probabilities derived from the parameterized common-cause model. This was the case regardless of whether information on the cause–effect relations was provided numerically or through rather vague verbal frequency terms. This finding is also interesting with respect to studies showing Markov violations (see the following discussion), because participants' diagnostic judgments were very close to the predictions of a common-cause model in which the effects are independent given the cause. Finally, the study points to interindividual differences in the temporal weighting of evidence in sequential diagnostic reasoning. For instance, when previously observed symptoms had to be recalled from memory, the judged diagnostic probabilities reflected a stronger influence of the current evidence relative to earlier observed symptoms.
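The normative computation in such tasks is sequential Bayesian updating under the Markov assumption. A minimal sketch follows; only the 66% value appears in the text, and the remaining likelihoods, names, and the uniform prior are hypothetical stand-ins for illustration.

```python
# Hypothetical likelihoods P(symptom | chemical); only the .66 value is from the text
likelihoods = {
    "Chemical X": {"fever": 0.66, "headache": 0.30, "nausea": 0.10},
    "Chemical Y": {"fever": 0.20, "headache": 0.70, "nausea": 0.50},
}
posterior = {"Chemical X": 0.5, "Chemical Y": 0.5}  # uniform prior over the two causes

for symptom in ("fever", "headache", "nausea"):     # symptoms observed one after the other
    # Effects are treated as conditionally independent given the cause (Markov property)
    posterior = {c: p * likelihoods[c][symptom] for c, p in posterior.items()}
    z = sum(posterior.values())
    posterior = {c: p / z for c, p in posterior.items()}
    print(symptom, posterior)
```

Note that this normative updater weights early and late evidence equally; primacy or recency effects arise only when the update deviates from this scheme, for instance by down-weighting symptoms that have to be recalled from memory.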


Figure 23.4 Information search scenario in a common-cause model with a binary cause C and two binary effects, E1 and E2. (a) Parameterized causal structure; numbers denote unconditional and conditional probabilities. (b) Joint frequency distribution of 100 cases generated from the parameterized causal model. (c) Diagnostic tree. On the first step, the diagnostic reasoner has to decide which symptom to query (i.e., fever or nausea). On the next step, information about the state of the symptom is obtained, with the numbers referring to the probability of the different states. For instance, the probability that a patient has fever is .55, and the probability that a patient has nausea is .17. The bottom of the tree shows the diagnostic probabilities entailed by the symptom status. For instance, if a patient has fever, the probability of the virus being present is .49; if the patient has no fever, the probability of the virus being present is .07.
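The numbers in the diagnostic tree follow directly from the parameterized model; a minimal sketch (our illustration) reproduces them via Bayes's rule:

```python
p_virus = 0.3
p_fever_given = {1: 0.9, 0: 0.4}     # P(fever | virus), P(fever | no virus)
p_nausea_given = {1: 1 / 3, 0: 0.1}  # P(nausea | virus), P(nausea | no virus)

# Marginal probabilities of the two symptoms
p_fever = p_virus * p_fever_given[1] + (1 - p_virus) * p_fever_given[0]      # .55
p_nausea = p_virus * p_nausea_given[1] + (1 - p_virus) * p_nausea_given[0]   # .17

# Posterior probability of the virus for each fever outcome
p_virus_fever = p_virus * p_fever_given[1] / p_fever                         # .49
p_virus_no_fever = p_virus * (1 - p_fever_given[1]) / (1 - p_fever)          # .07
print(p_fever, p_nausea, p_virus_fever, p_virus_no_fever)
```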

Rebitschek, Bocklisch, Scholz, Krems, and Jahn (2015; see also Jahn & Braatz, 2014; Jahn, Stahnke, & Rebitschek, 2014; Rebitschek, Krems, & Jahn, 2015) investigated order effects in sequential diagnostic reasoning more closely. They used a medical diagnosis task with four chemicals as possible causes and six symptom categories, with each category including two symptoms (e.g., "twinge" and "sting" belonged to the category "pain"). Participants saw four sequentially presented (p. 446) symptoms, with the symptom sequences designed to examine possible order effects (e.g., whether it matters which of two hypotheses was supported more strongly by the first symptom, even if the total diagnostic evidence supported them equally). The diagnostic task was to choose the chemical that was most likely to have caused the symptom(s). Diagnostic judgments were obtained either after participants saw the full sequence of symptoms or after each symptom (see Hogarth & Einhorn, 1992, for a discussion of different elicitation methods with respect to order effects). Diagnostic judgments were not invariant with respect to presentation order, with the diagnoses often being influenced by the initially presented piece of evidence. This primacy effect was mediated by the testing procedure: diagnostic judgments elicited after the full symptom sequence showed a strong primacy effect, whereas when participants were asked to rate their diagnostic beliefs after each symptom, the final diagnosis was only weakly influenced by the initially observed symptom, and late symptoms exerted a stronger influence (i.e., recency effects).

Summary and Discussion

Diagnostic reasoning in common-cause models has been investigated primarily from the perspective of order effects. The exact nature of order effects, the conditions under which they occur, and how they can be formally modeled from the perspective of causal inference remain important issues for future research (see also Trueblood & Busemeyer, 2011).

In common-cause models, it is assumed that the effects are conditionally independent of each other given their common cause (i.e., the Markov property), such that they provide independent evidence for the cause. (In the machine-learning literature, this property is referred to as class-conditional independence of features, implemented in the naïve Bayes classifier; see Domingos & Pazzani, 1997; Jarecki, Meder, & Nelson, 2016.) Making this assumption strongly simplifies the diagnostic inference process, because the number of estimates required to parameterize the causal structure is greatly reduced. However, a growing body of research on human causal reasoning shows that people's inferences in related tasks, such as (conditional) predictive causal reasoning, do not honor the Markov condition (Mayrhofer & Waldmann, 2015a; Park & Sloman, 2013; Rehder, 2014; Rehder & Burnett, 2005; Rottman & Hastie, 2015; Walsh & Sloman, 2008; but see Jarecki, Meder, & Nelson, 2013; von Sydow, Hagmayer, & Meder, 2015): typically, people seem to expect a stronger correlation between effects of a common cause than is normatively justified. These findings raise the question of to what extent and under what conditions human causal reasoning is consistent with the Markov condition and the entailed dependency and independency relations that should guide and constrain diagnostic inferences.

Diagnostic Reasoning and Information Search

How do people decide what information is diagnostically relevant? So far our discussion has focused on situations in which the reasoner makes diagnostic inferences from one or more effects to possible causes. In many circumstances, however, diagnostically relevant information needs to be actively acquired before making a diagnostic inference, such as when deciding which medical test to conduct.

A key theoretical question is how to quantify the diagnostic value of possible information queries (Nelson, 2005). Different models of the value of information, based on a probabilistic framework, have been proposed in the literature. The models entail different types of informational utility functions that quantify the diagnostic value of a datum (e.g., the outcome of a medical test; Benish, 1999) according to some formal metric, such as expected reduction in uncertainty or expected improvement in classification accuracy. In the following, we introduce key ideas pertaining to diagnostic causal reasoning and discuss the application of information-theoretic concepts in empirical research.

Quantifying Diagnostic Value

Consider a medical scenario in which a virus (binary cause event C) probabilistically generates two symptoms, fever (E1) and nausea (E2). This scenario can be represented as a common-cause structure (Figure 23.4a). The parameters associated with the causal structure are unconditional and conditional probabilities.6 The virus has a base rate of P(virus) = .3 and generates fever and nausea with likelihoods P(fever|virus) = .9 and P(nausea|virus) = 1/3. The symptoms can also occur in the absence of the virus, with P(fever|¬virus) = .4 and P(nausea|¬virus) = .1. Figure 23.4b shows an example data set of 100 cases, generated from the parameterized common-cause model.

Now imagine a physician diagnosing a new patient. It is unknown whether the patient has fever or nausea, but the doctor can acquire information about the symptoms. Is it more useful to find out (p. 447) about the presence or absence of fever, or of nausea? Note the crucial difference from the diagnostic reasoning scenarios considered so far, in which the diagnostic inference was based on knowing the state of the effect. In the present scenario, the critical question is which query is more useful to conduct, with the outcome being uncertain. For instance, when testing the patient for fever there are two possible outcomes, namely, fever or no fever. Both states have implications for the diagnostic inference about the virus, but prior to gathering information the state of the effect is uncertain.

Since the virus is causally related to both fever and nausea, learning about either of them provides diagnostic information about the presence of the virus. This is illustrated in the diagnostic tree in Figure 23.4c, which shows the probability of observing the different symptom states, as well as the resulting posterior probabilities of the cause. For instance, if testing for the presence of fever (left branch), the probability that the patient has fever is .55, in which case the probability of the virus being present will increase to .49. Conversely, if the patient does not have fever, which happens with probability .45, the posterior probability of the virus being present is .07. (These probabilities can be computed from the parameterized causal model via Bayes's rule or directly from the joint frequencies in Figure 23.4b.)

But which query has higher diagnostic value: Is it better to test for the presence of fever or for the presence of nausea? The answer to this question crucially depends on how we value a query's outcome. Different measures for quantifying the usefulness of a datum (e.g., the outcome of a medical test) have been suggested in statistics, philosophy of science, and psychology (for reviews, see Crupi & Tentori, 2014; Nelson, 2005). Typically, the different measures are based on a comparison of the prior versus posterior probability distributions, for each possible outcome of a query (a pre-posterior analysis, in the terminology of Raiffa & Schlaifer, 1961). The expected usefulness of a query Q (e.g., a medical test) is computed by weighting the usefulness of each possible query outcome by its probability of occurrence. In the present example there are two queries, referring to gathering information about whether the patient has fever or nausea, with each query having two possible outcomes (e.g., fever present or absent).

Importantly, alternative measures of the value of information are not formally equivalent, as they rank the usefulness of possible diagnostic queries differently (Nelson, 2005). To illustrate, we here focus on two prominent measures: information gain (Lindley, 1956), which values queries according to the expected reduction in uncertainty, measured via Shannon (1948) entropy, and probability gain, which values queries according to the expected improvement in classification accuracy (Baron, 1985).

Information gain quantifies the usefulness of a datum by the expected reduction in Shannon entropy.7 (Note that in expectation, information gain is equivalent to Kullback-Leibler (1951) divergence, although the usefulness of individual outcomes may differ.) In the current scenario, to compute the information gain of, say, testing a patient for the presence of fever, the posterior entropy of the cause's distribution given the two possible test outcomes (fever vs. ¬fever) is considered. The information gain of a test outcome (which can be positive or negative) is the difference between the entropy of the prior distribution and the entropy of the posterior distribution, conditional on the status of the effect. The expected information gain is then computed by weighting the (positive or negative) gain of each possible outcome of the query by the probability of observing the outcome. Given the parameters of the common-cause model, the expected information gain of testing for fever is 0.172 bits. In other words, learning whether the patient has fever will, in expectation, reduce the diagnostic reasoner's uncertainty about the virus by 0.172 bits.8 The analogous calculation for the alternative effect, nausea, yields an expected information gain of 0.054 bits. Thus, from the perspective of uncertainty (entropy) reduction, testing a patient for the presence of fever is more useful than testing for the presence of nausea, because the former entails a higher reduction in Shannon entropy.

A different model for quantifying the usefulness of diagnostic tests is probability gain (Baron, 1985), which values information by the expected improvement in classification accuracy (Nelson, McKenzie, Cottrell, & Sejnowski, 2010). Formally, this measure is based on the difference between accuracy prior to conducting a query and accuracy after conducting a query. Consider a patient drawn randomly from the data sample in Figure 23.4b. If the goal is classification accuracy, one should predict the most likely hypothesis, namely, that the patient does not have the virus, because the virus is present in only 30% of the cases (see Meder & Nelson, 2012, for analyses of scenarios with situation-specific payoffs). In other words, the probability of making a correct classification decision is .7 prior to obtaining any information about the effects (symptoms).

Can a higher accuracy be expected if testing the patient for fever or nausea? The basic rationale is the same as with the information gain model. First, the posterior distribution (p. 448) of the cause given each state of the effect is considered. For instance, when fever is present, accuracy decreases to .51, because 51% of patients with fever do not have the virus. By contrast, when the patient does not have fever, accuracy increases to .93, because in 93% of the cases the patient does not have the virus. To compute the overall probability gain of the query, an expectation is computed by weighting each outcome's gain by the probability that a patient does or does not have fever. Interestingly, the expected probability gain of testing a patient for the presence of fever in our example is zero.9 Thus, from the perspective of the probability gain model this query is useless. By contrast, the same computations for the second effect, nausea, give a probability gain of .03; that is, testing a patient for the presence of nausea will, on average, increase classification accuracy by 3%. Thus, a diagnostic reasoner who aims to increase classification accuracy should find out whether the patient has nausea. By contrast, a diagnostic reasoner who aims to reduce uncertainty should find out whether a patient has fever, because this query entails the higher expected reduction in Shannon entropy.

This divergence between different models of the value of information is critical because it highlights that the usefulness of possible queries depends on which metric is used to quantify the diagnostic value of information. In the scenario considered here, if the goal is to reduce uncertainty (measured via Shannon entropy) about the virus, the diagnostic reasoner should test for the presence of fever. By contrast, if the goal is classification accuracy, the diagnostic reasoner should test for the presence of nausea. (Similar divergences hold for other models of the value of information; see Nelson et al., 2010.)
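Both measures can be computed in a few lines from the model parameters. The following sketch (our illustration, not code from the cited work) reproduces the values reported above:

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a binary distribution with P(hypothesis) = p."""
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)

p_c = 0.3  # P(virus)

def expected_values(p_e_c, p_e_nc):
    """Expected information gain and probability gain of querying one symptom."""
    p_e = p_c * p_e_c + (1 - p_c) * p_e_nc          # P(symptom present)
    post = {True: p_c * p_e_c / p_e,                # P(virus | symptom present)
            False: p_c * (1 - p_e_c) / (1 - p_e)}   # P(virus | symptom absent)
    weight = {True: p_e, False: 1 - p_e}
    exp_post_entropy = sum(weight[o] * entropy(post[o]) for o in post)
    exp_accuracy = sum(weight[o] * max(post[o], 1 - post[o]) for o in post)
    info_gain = entropy(p_c) - exp_post_entropy
    prob_gain = exp_accuracy - max(p_c, 1 - p_c)
    return info_gain, prob_gain

print(expected_values(0.9, 0.4))    # fever:  ~0.172 bits, probability gain 0
print(expected_values(1 / 3, 0.1))  # nausea: ~0.054 bits, probability gain .03
```

The two print statements make the divergence explicit: fever maximizes expected entropy reduction while offering zero expected improvement in classification accuracy, and nausea shows the reverse pattern.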

Empirical Studies

Different models of the value of information have been used to explain human behavior in a variety of cognitive tasks involving active information acquisition (Austerweil & Griffiths, 2011; Baron & Hershey, 1988; Markant, Settles, & Gureckis, 2015; Meder & Nelson, 2012; Meier & Blair, 2013; Nelson et al., 2010; Nelson, Divjak, Gudmundsdottir, Martignon, & Meder, 2014; Rusconi & McKenzie, 2013; Wells & Lindsay, 1980). For instance, Oaksford and Chater (1994) re-analyzed Wason's (1968) selection task from the perspective of inductive probabilistic inference, arguing that human behavior is inconsistent with the classic logico-deductive analysis but constitutes rational behavior from the perspective of active information sampling (Oaksford and Chater used Shannon entropy to quantify the usefulness of queries; Nelson, 2005, showed that alternative models of the value of information yield similar predictions). Crupi, Tentori, and Lombardi (2009) provided an analysis of the pseudodiagnosticity paradigm (Doherty, Mynatt, Tweney, & Schiavo, 1979)—a task that has been interpreted as demonstrating flawed human thinking regarding the diagnostic value of information. Crupi and colleagues showed that this interpretation relies on a specific model for computing diagnostic value, and that participants' behavior is, in fact, consistent with seeking high-probability-gain information.

Most of these studies have not explicitly adopted a causal modeling framework, but there are important connections between the key theoretical ideas. Nelson and colleagues (2010; see also Meder & Nelson, 2012) examined information search in a classification task. First, participants learned about the statistical structure of the environment in a trial-by-trial learning procedure, categorizing artificial biological stimuli into one of two classes based on two binary features. The generative model underlying the task environment corresponds to a common-cause structure, in which the likelihoods of the features are conditionally independent given the true class. This situation is analogous to the preceding common-cause scenario, with the class corresponding to the cause variable and the stimuli's features corresponding to its effects. In a subsequent search task, learners could query one of the two features to obtain information before making a classification decision. The structure of the environment was such that one query would improve classification accuracy (i.e., had higher probability gain), whereas the alternative query was more useful from the perspective of information gain (or some other model of the value of information; Nelson and colleagues considered several models from the literature). Across several experiments, participants' search behavior was best accounted for by probability gain. The studies also highlight the importance of how information about the relevant probabilities is conveyed. A clear preference for the diagnostic query with the higher probability gain was obtained only when people learned about the statistical structure of the environment through experience, whereas conveying probability information (base rates and likelihoods) through words and (p. 449) numbers was not very helpful for identifying the higher-probability-gain query, with search decisions often being close to chance level (see also Meder & Nelson, 2012).

Summary

There is a rich theoretical literature on quantifying the diagnostic value of information queries. Different models have been suggested, based on different assumptions about what makes information valuable with respect to the goals of the diagnostic reasoner. An important insight is that different models can make similar predictions in many statistical environments (Nelson, 2005), which highlights the need for carefully designed experiments that allow researchers to disentangle competing models (Meder & Nelson, 2012; Nelson et al., 2010). This is also an important issue for the normative analysis of human search behavior and people's sensitivity to the diagnostic value of queries (e.g., Crupi et al., 2009; Oaksford & Chater, 1994).

Most empirical studies on information search have not explicitly adopted a causal modeling framework, but there are relations in terms of the generative models that have been used (e.g., the close relation between the independency relations in common-cause models and the notion of class-conditional independence, which can be considered a special case of the Markov condition). More recently, empirical studies on causal structure induction have applied different models of the value of information (see also Rottman, Chapter 6 in this volume). Steyvers and colleagues (2003) explored different variants of models based on information gain to predict intervention decisions on causal networks. Bramley et al. (2015) considered different models besides entropy reduction for quantifying the usefulness of interventions in causal structure learning. Coenen and colleagues (2015) contrasted information gain with a positive test strategy (e.g., Klayman & Ha, 1987) in structure induction. These studies provide pathways for future research by bringing together information-theoretic ideas about the diagnostic value of information with studies on human causal reasoning.


General Discussion

The goal of this chapter was to discuss diagnostic reasoning from the perspective of causal inference. The computational framework that provides the foundation for our analyses, probabilistic inference over graphical causal models, makes it possible to implement a variety of different models that share the assumption that diagnostic inferences are guided and constrained by causal considerations. The first part of this chapter highlighted that causal-based models of diagnostic inference can make systematically different predictions from purely statistical accounts, such as the simple Bayes model. This is a critical insight for both the normative and descriptive analysis of human diagnostic reasoning, regardless of whether computational (or "rational") models of cognition (in the sense of Marr, 1982, and Anderson, 1990) are treated as normative standards or as psychological theories of human behavior (McKenzie, 2003). In the second part, we discussed more complex diagnostic inferences involving multiple causes or multiple effects. A causal-model-based factorization of probability distributions entails specific relations of conditional dependence and independence among the domain variables, which constrain diagnostic inferences when reasoning with more complex causal models. The third section considered the question of how to quantify the diagnostic value of information. Deciding what information is diagnostically relevant is a key issue in diagnostic reasoning, and future research should aim to explicate the relations between models of diagnostic inference, measures of the value of information, and human information-acquisition strategies in the context of diagnostic causal reasoning.

Key Issues for Future Research

The analysis of diagnostic reasoning from the perspective of causal inference has provided a number of novel theoretical insights and guided empirical research on people's diagnostic reasoning. In the following, we discuss key theoretical and empirical issues that should be addressed in future work.

The Indeterminacy of Rational Models

The development of the framework of probabilistic inference over graphical causal models (Pearl, 2000; Spirtes, Glymour, & Scheines, 1993) has advanced research on human causal reasoning, from both a theoretical and an empirical perspective. One way to think about the relation between the general modeling framework and particular models is in terms of the "building blocks" that can be used to characterize existing models or develop new accounts. One differentiating feature concerns the question of parameter estimation and representation, that is, whether a particular model uses maximum likelihood estimates (e.g., Cheng, 1997; Fernbach et al., 2011) or distributions over parameters (e.g., Holyoak et al., 2010; Lu et al., 2008; (p. 450) Meder et al., 2014). In the case of elemental diagnostic reasoning, a power PC model based on maximum likelihood estimates directly derived from the data makes the same predictions as the simple Bayes model. A Bayesian power PC model, in contrast, leads to different predictions, depending on what kinds of priors are used. Examples include uniform priors, which the structure induction model uses, the "sparse-and-strong" prior suggested by Lu and colleagues (2008), and a "sufficiency prior" formalizing a tendency to assume that causal relations are almost deterministic (i.e., high causal strengths; see Mayrhofer & Waldmann, 2011, 2015b; Yeung & Griffiths, 2015).

Another key issue concerns structure uncertainty. One idea is to use a single causal structure whose parameters are estimated from data (in one way or another); another is to consider multiple causal structures that may have generated the data. These considerations can lead to quite different model behavior, as exemplified by the diverging predictions of the power PC account and the structure induction model.

Another issue that we have not discussed so far concerns the functional form of the considered causal structures, that is, how multiple causes combine to generate an effect. We focused on a noisy-OR parameterization as a default functional form for how multiple causes produce the effect (Cheng, 1997; Pearl, 1988), but different functional forms are plausible in other circumstances (Novick & Cheng, 2004; Waldmann, 2007). For instance, a causal model of a food allergy may state that two ingredients (e.g., peanuts and raisins) are jointly necessary to produce an allergic shock. Thus, the assumed functional form constitutes another building block, with the question being whether the functional form is fixed or is assumed to be part of the inference problem (Lucas & Griffiths, 2010).

The upshot is that it is important to distinguish a computational modeling framework such as probabilistic inference over graphical causal models from specific model instantiations, which can differ strongly in their scope and predictions. The framework supports the development of different computational models that can be tested empirically, but the framework itself is not subject to direct empirical tests. A possible exception might be to test the psychological validity of central theoretical assumptions, such as people's sensitivity to particular dependency and independency relations based on the Markov condition (e.g., conditional independence in common-cause models or explaining away in common-effect models).

In sum, research should be guided by competitive model testing. Instead of comparing human behavior to a single "rational" model (Anderson, 1990), multiple (rational or otherwise) models should be considered and evaluated with respect to their psychological validity and normative desirability.
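To illustrate how the functional form acts as a separate building block, here is a minimal sketch contrasting a noisy-OR combination with a conjunctive parameterization in the spirit of the food-allergy example; the strength value of .9 for the conjunctive case is a hypothetical choice of ours.

```python
def noisy_or(c1, c2, w1=0.8, w2=0.6, w_bg=0.0):
    """Independent generative causes: each present cause can produce E on its own."""
    return 1 - (1 - w_bg) * (1 - w1) ** c1 * (1 - w2) ** c2

def conjunctive(c1, c2, w=0.9):
    """Both causes jointly necessary (e.g., peanuts and raisins for the allergic shock)."""
    return w * c1 * c2

for c1, c2 in ((1, 1), (1, 0), (0, 1), (0, 0)):
    print((c1, c2), noisy_or(c1, c2), conjunctive(c1, c2))
```

The two forms agree that the effect is most likely when both causes are present, but they entail very different diagnostic inferences when only one cause is known to be present.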

From Diagnostic Probabilities to Estimates of Causal Responsibility

Models of diagnostic reasoning typically assume that the computational goal is to infer the probability of a cause given an effect. Another plausible goal of a reasoner might be a judgment of causal responsibility (or causal attribution). Such a diagnostic judgment refers to the probability that the occurrence of effect E was in fact brought about by target cause C, which is different from the diagnostic conditional probability P(c|e) (Cheng & Novick, 2005).

Consider a medical diagnosis scenario concerning the causal relation between a genetic predisposition and elevated blood pressure. Assume that a study tests 100 patients and finds that 50 have a genetic predisposition, so the cause's empirical base rate in the sample is P(c) = P(¬c) = .5. Of the 50 patients with a genetic predisposition, 30 have elevated blood pressure, so P(e|c) = .6. On the other hand, 30 of the 50 patients without the genetic predisposition also have elevated blood pressure; that is, P(e|¬c) = .6. These estimates suggest that having a genetic predisposition does not raise the probability of elevated blood pressure, which implies that the causal strength is zero (Equation 3). In this case, the probability that a patient from the sample with elevated blood pressure has the genetic predisposition is 50%, as P(c|e) = P(c) = .5.

A different diagnostic inference concerns the probability that the genetic predisposition is causally responsible for the elevated blood pressure. Intuitively, the answer to this question is very different: if there is no causal relation, then the probability that the genetic predisposition is causally responsible for the elevated blood pressure is zero. (Note that the difference between estimates of diagnostic probability and estimates of causal responsibility holds not only when the data indicate that there is no causal relation from C to E, but also in situations in which there is a relation, i.e., wc > 0.)

The difference between estimates of conditional probability and causal responsibility is intuitively plausible, but a purely statistical account of diagnostic reasoning (i.e., the simple Bayes model) lacks (p. 451) the expressive power to provide a formal treatment of causal responsibility. Cheng and Novick (2005; see also Holyoak et al., 2010) showed how to formally derive different estimates of causal responsibility within power PC theory. Their analyses can also be incorporated into the structure induction model of diagnostic reasoning, which allows for deriving estimates of causal responsibility that take into account structure uncertainty (Meder et al., 2014). For instance, in the case of elemental diagnostic reasoning, an estimate of causal responsibility is computed separately under structures S0 and S1. According to structure S0 there is no causal relation between C and E; therefore this structure entails that estimates of causal responsibility are zero. Under structure S1, the model's parameters are used to derive an estimate of causal responsibility, as in power PC theory. The final step is to integrate out the two causal structures, with the resulting estimate of causal responsibility depending on the relative posterior probabilities of the two structures.

The critical point is that, depending on the goal of the diagnostic reasoner, different quantities are of interest, and these can systematically diverge from each other. A common assumption in the normative and descriptive analysis of diagnostic reasoning is that the computational goal is to assess the conditional probability of a cause given an effect. However, sometimes it might be more appropriate to ask for the probability that the effect was indeed brought about by the target cause.

With respect to empirical studies, little is known about the extent to which human diagnostic reasoners are sensitive to the distinction between different types of diagnostic inferences. Holyoak and colleagues (2010) used a Bayesian variant of the power PC causal attribution account to model predictive inferences and causal attribution judgments in the context of analogical reasoning.
Meder and colleagues (2014) examined the power PC model of causal responsibility in the context of elemental diagnostic reasoning, in which the goal is to infer the conditional probability of a cause given an effect, but their studies were not specifically designed to investigate different types of diagnostic inferences. Stephan and Waldmann (2016) tested which model of causal responsibility best accounts for human judgments, pitting the standard power PC model and a Bayesian variant of it against the structure induction model. The results of three studies supported the structure induction model of causal responsibility, showing that people's judgments of causal responsibility are sensitive to causal structure uncertainty. These findings provide pathways for future research on different kinds of diagnostic inferences.
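Returning to the genetic-predisposition example, the contrast between the two quantities can be made explicit in a few lines. The responsibility expression below is our paraphrase of the power PC attribution idea (Cheng & Novick, 2005), not a quotation of the chapter's Equation 3, and assumes a single target cause with an independent background cause.

```python
p_e_c, p_e_nc, p_c = 0.6, 0.6, 0.5   # from the 100-patient example

w_c = (p_e_c - p_e_nc) / (1 - p_e_nc)    # causal power of C: 0 here (no covariation)
p_e = p_c * p_e_c + (1 - p_c) * p_e_nc   # P(e)
p_c_given_e = p_c * p_e_c / p_e          # diagnostic probability P(c|e) = .5

# Causal responsibility: probability that C was present AND actually produced E, given E
p_responsibility = p_c * w_c / p_e       # 0 here, despite P(c|e) = .5
print(w_c, p_c_given_e, p_responsibility)
```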

Diagnostic Hypothesis Generation

Throughout this chapter, we discussed diagnostic reasoning in situations in which the set of variables and their causal roles (i.e., causes vs. effects) were predefined and well specified. While this assumption eases theoretical analysis as well as experimental investigation, in most real-world situations diagnostic inferences are embedded in a complex web of causally related and often unknown variables. This naturally raises the question of how diagnostic inferences might be carried out under such circumstances and how the causal structures on which the inferences operate are learned or determined.

The most unconstrained way is to infer the relevant causal structure (i.e., the causal roles of the variables, as well as the relations between them) directly from data, such as patterns of co-occurrence, as the causal structure of the world imposes constraints on the data that can potentially be observed (e.g., Gopnik et al., 2004; Steyvers et al., 2003). However, the number of possible causal structures that have to be taken into account grows exponentially with the number of variables considered, which poses several computational challenges. In addition, humans have been shown to fail in contingency-based structure induction, even in quite simple cases (Fernbach & Sloman, 2009; White, 2006), unless specific constraints or assumptions are met (e.g., determinism; Deverett & Kemp, 2012; Mayrhofer & Waldmann, 2015b; Rothe, Deverett, Mayrhofer, & Kemp, 2016).

From a psychological perspective, it seems plausible that humans consider only a subset of possible causal structures, with different cues to causality constraining the hypothesis space, such as temporal information (Lagnado & Sloman, 2004, 2006), hierarchical event structures (Johnson & Keil, 2014), linguistic markers in causal language (Mayrhofer & Waldmann, 2015a), and prior knowledge (Waldmann, 1996). In line with this idea, Griffiths and Tenenbaum (2007; Tenenbaum, Griffiths, & Niyogi, 2007) proposed a "causal grammar" that specifies the variables that form the causal structure, the possible relations between the domain variables (i.e., their causal roles), and the functional form of the considered relations. This knowledge is at a higher level of abstraction than a specific causal structure hypothesis, much like a grammar in language constrains the set of possible sentences. This approach can be formalized as a hierarchical (p. 452) Bayesian model, in which abstract knowledge about the domain generates and constrains the hypothesis space over causal structures, thereby addressing the problem of combinatorial explosion (Griffiths & Tenenbaum, 2009).


A very different approach addresses the question of diagnostic hypothesis generation and evaluation from the perspective of memory processes and cue recall. According to the HyGene model (Thomas, Dougherty, Sprenger, & Harbison, 2008), diagnostic hypotheses in long-term semantic memory are activated, matched against sampled probes (i.e., previously encountered and stored diagnosis–cue sets) from episodic memory, and then potentially placed in working memory (constituting the set of leading contending hypotheses) in an iterative fashion. In the end, the diagnostic judgment for the target hypothesis is computed relative to the memory strengths of the alternatives in working memory. This model is non-causal in nature, as the causal relations between hypotheses (to-be-diagnosed causes) and cues are not relevant for the judgments; essentially, it can be applied—just like the simple Bayes model—to any arbitrarily related set of hypotheses and data.

Concluding Remarks

Research on diagnostic reasoning has a long tradition in psychology. Much of the literature on judgment and decision-making has focused on the conditions under which people utilize base rate information and make judgments in accordance with a simple statistical model, Bayes's rule. We think it is time to take a fresh look at the problem of diagnostic reasoning from the perspective of causal inference under uncertainty. The framework of probabilistic inference over graphical causal models provides a strong formal foundation for modeling diagnostic reasoning, and a variety of empirically testable models can be realized within this computational framework. The discussion of elemental diagnostic reasoning illustrates that there is not a single normative benchmark for diagnostic reasoning under uncertainty against which human behavior can be evaluated; rather, different ideas exist about what may constitute an appropriate standard of rational inference from effect(s) to cause(s). Importantly, these models make diverging predictions, for instance, on whether a diagnostic judgment should reflect solely the observed probability of a cause given an effect. A key goal for future research is to systematically investigate the descriptive validity of the alternative accounts in different circumstances, as well as their theoretical behavior under different conditions. This will support the development of a comprehensive theory of human diagnostic reasoning that is informed and constrained by normative considerations and empirical data.

Author Note

This research was supported by grants ME 3717/2-2 and Ma 6545/1-2 from the Deutsche Forschungsgemeinschaft (DFG) as part of the priority program "New Frameworks of Rationality" (SPP 1516). We thank Michael Waldmann for helpful feedback on this chapter and Anita Todd for editing the manuscript.


References

Ajzen, I. (1977). Intuitive theories of events and the effects of base-rate information on prediction. Journal of Personality and Social Psychology, 35, 303–314. doi:10.1037/0022-3514.35.5.303

Anderson, J. R. (1990). The adaptive character of thought. Hillsdale, NJ: Lawrence Erlbaum Associates.

Austerweil, J. L., & Griffiths, T. L. (2011). Seeking confirmation is rational for deterministic hypotheses. Cognitive Science, 35, 499–526. doi:10.1111/j.1551-6709.2010.01161.x

Barbey, A. K., & Sloman, S. A. (2007). Base-rate respect: From ecological rationality to dual process. Behavioral and Brain Sciences, 30, 241–297. doi:10.1017/S0140525X07001653

Bar-Hillel, M. (1980). The base-rate fallacy in probability judgments. Acta Psychologica, 44, 211–233. doi:10.1016/0001-6918(80)90046-3

Baron, J. (1985). Rationality and intelligence. Cambridge, UK: Cambridge University Press.

Baron, J., & Hershey, J. C. (1988). Heuristics and biases in diagnostic reasoning: I. Priors, error costs, and test accuracy. Organizational Behavior and Human Decision Processes, 41, 259–279. doi:10.1016/0749-5978(88)90030-1

Benish, W. A. (1999). Relative entropy as a measure of diagnostic information. Medical Decision Making, 19, 202–206. doi:10.1177/0272989X9901900211

Bocklisch, F., Bocklisch, S. F., & Krems, J. F. (2012). Sometimes, often, and always: Exploring the vague meanings of frequency expressions. Behavior Research Methods, 44, 144–157. doi:10.3758/s13428-011-0130-8

Bramley, N. R., Lagnado, D. A., & Speekenbrink, M. (2015). Conservative forgetful scholars: How people learn causal structure through sequences of interventions. Journal of Experimental Psychology: Learning, Memory, and Cognition, 41, 708–731. doi:10.1037/xlm0000061

Brighton, H., & Gigerenzer, G. (2012). Are rational actor models "rational" outside small worlds? In S. Okasha & K. Binmore (Eds.), Evolution and rationality: Decisions, co-operation, and strategic behaviour (pp. 84–109). Cambridge, UK: Cambridge University Press.

Chater, N., & Oaksford, M. (1999). Ten years of the rational analysis of cognition. Trends in Cognitive Sciences, 3, 57–65. doi:10.1016/S1364-6613(98)01273-X

Chater, N., & Oaksford, M. (Eds.). (2008). The probabilistic mind. New York: Oxford University Press.


Cheng, P. W. (1997). From covariation to causation: A causal power theory. Psychological Review, 104, 367–405. doi:10.1037/0033-295X.104.2.367

Cheng, P. W., & Novick, L. R. (1990). A probabilistic contrast model of causal induction. Journal of Personality and Social Psychology, 58, 545–567. doi:10.1037/0022-3514.58.4.545

Cheng, P. W., & Novick, L. R. (1992). Covariation in natural causal induction. Psychological Review, 99, 365–382. doi:10.1037/0033-295X.99.2.365

Cheng, P. W., & Novick, L. R. (2005). Constraints and nonconstraints in causal reasoning: Reply to White (2005) and to Luhmann & Ahn (2005). Psychological Review, 112, 694–707. doi:10.1037/0033-295X.112.3.694

Chickering, D. M., & Heckerman, D. (1997). Efficient approximations for the marginal likelihood of Bayesian networks with hidden variables. Machine Learning, 29, 181–212. doi:10.1023/A:1007469629108

Coenen, A., Rehder, B., & Gureckis, T. M. (2015). Strategies to intervene on causal systems are adaptively selected. Cognitive Psychology, 79, 102–133. doi:10.1016/j.cogpsych.2015.02.004

Cokely, E. T., Galesic, M., Schulz, E., Ghazal, S., & Garcia-Retamero, R. (2012). Measuring risk literacy: The Berlin Numeracy Test. Judgment and Decision Making, 7, 25–47.

Crupi, V., Nelson, J. D., Meder, B., Cevolani, G., & Tentori, K. (2016). Generalized information theory meets human cognition: Introducing a unified framework to model uncertainty and information search. Manuscript in preparation.

Crupi, V., & Tentori, K. (2014). State of the field: Measuring information and confirmation. Studies in History and Philosophy of Science, 47(C), 81–90. doi:10.1016/j.shpsa.2014.05.002

Crupi, V., Tentori, K., & Lombardi, L. (2009). Pseudodiagnosticity revisited. Psychological Review, 116, 971–985. doi:10.1037/a0017050

Deverett, B., & Kemp, C. (2012). Learning deterministic causal networks from observational data. In N. Miyake, D. Peebles, & R. P. Cooper (Eds.), Proceedings of the 34th annual conference of the Cognitive Science Society (pp. 288–293). Austin, TX: Cognitive Science Society.

Doherty, M. E., Mynatt, C. R., Tweney, R. D., & Schiavo, M. D. (1979). Pseudodiagnosticity. Acta Psychologica, 43, 111–121. doi:10.1016/0001-6918(79)90017-9

Domingos, P., & Pazzani, M. (1997). On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 29, 103–130. doi:10.1023/A:1007413511361


Eddy, D. M. (1982). Probabilistic reasoning in clinical medicine: Problems and opportunities. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases (pp. 249–267). New York: Cambridge University Press.

Edwards, W. (1968). Conservatism in human information processing. In B. Kleinmuntz (Ed.), Formal representation of human judgment (pp. 17–52). New York: John Wiley & Sons.

Fernbach, P. M., Darlow, A., & Sloman, S. A. (2010). Neglect of alternative causes in predictive but not diagnostic reasoning. Psychological Science, 21, 329–336. doi:10.1177/0956797610361430

Fernbach, P. M., Darlow, A., & Sloman, S. A. (2011). Asymmetries in predictive and diagnostic reasoning. Journal of Experimental Psychology: General, 140, 168–185. doi:10.1037/a0022100

Fernbach, P. M., & Sloman, S. A. (2009). Causal learning with local computations. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35, 678–693. doi:10.1037/a0014928

Gigerenzer, G., & Hoffrage, U. (1995). How to improve Bayesian reasoning without instruction: Frequency formats. Psychological Review, 102, 684–704. doi:10.1037/0033-295X.102.4.684

Glymour, C. (2003). Learning, prediction and causal Bayes nets. Trends in Cognitive Science, 7, 43–47. doi:10.1016/S1364-6613(02)00009-8

Goedert, K. M., Harsch, J., & Spellman, B. A. (2005). Discounting and conditionalization: Dissociable cognitive processes in human causal inference. Psychological Science, 18, 590–595. doi:10.1111/j.1467-9280.2005.01580.x

Gopnik, A., Glymour, C., Sobel, D. M., Schulz, L. E., Kushnir, T., & Danks, D. (2004). A theory of causal learning in children: Causal maps and Bayes nets. Psychological Review, 111, 3–32. doi:10.1037/0033-295X.111.1.3

Griffiths, T. L., & Tenenbaum, J. B. (2005). Structure and strength in causal induction. Cognitive Psychology, 51, 334–384. doi:10.1016/j.cogpsych.2005.05.004

Griffiths, T. L., & Tenenbaum, J. B. (2007). Two proposals for causal grammars. In A. Gopnik & L. Schulz (Eds.), Causal learning: Psychology, philosophy, and computation (pp. 323–346). Oxford, UK: Oxford University Press.

Griffiths, T. L., & Tenenbaum, J. B. (2009). Theory-based causal induction. Psychological Review, 116, 661–716. doi:10.1037/a0017201

Hayes, B. K., Hawkins, G. E., & Newell, B. R. (2015, November 16). Consider the alternative: The effects of causal knowledge on representing and using alternative hypotheses in judgments under uncertainty. Journal of Experimental Psychology: Learning, Memory, and Cognition, 42, 723–739. doi:10.1037/xlm0000205

Hayes, B. K., Hawkins, G. E., Newell, B. R., Pasqualino, M., & Rehder, B. (2014). The role of causal models in multiple judgments under uncertainty. Cognition, 133, 611–620. doi:10.1037/0096-3445.134.4.596

Hogarth, R. M., & Einhorn, H. J. (1992). Order effects in belief updating: The belief-adjustment model. Cognitive Psychology, 24, 1–55. doi:10.1016/0010-0285(92)90002-J

Holyoak, K. J., Lee, H. S., & Lu, H. (2010). Analogical and category-based inference: A theoretical integration with Bayesian causal models. Journal of Experimental Psychology: General, 139, 702–727. doi:10.1037/a0020488

Jahn, G., & Braatz, J. (2014). Memory indexing of sequential symptom processing in diagnostic reasoning. Cognitive Psychology, 68, 59–97. doi:10.1016/j.cogpsych.2013.11.002

Jahn, G., Stahnke, R., & Rebitschek, F. G. (2014). Parallel belief updating in sequential diagnostic reasoning. In P. Bello, M. Guarini, M. McShane, & B. Scassellati (Eds.), Proceedings of the 36th annual conference of the Cognitive Science Society (pp. 2405–2410). Austin, TX: Cognitive Science Society.

Jarecki, J., Meder, B., & Nelson, J. D. (2013). The assumption of class-conditional independence in category learning. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th annual conference of the Cognitive Science Society (pp. 2650–2655). Austin, TX: Cognitive Science Society.

Jarecki, J., Meder, B., & Nelson, J. D. (2016). Naïve and robust: Class-conditional independence in human classification learning. Manuscript submitted for publication.

Johnson, S. G. B., & Keil, F. C. (2014). Causal inference and the hierarchical structure of experience. Journal of Experimental Psychology: General, 143, 2223–2241. doi:10.1037/a0038192

Jones, E. E., & Harris, V. A. (1967). The attribution of attitudes. Journal of Experimental Social Psychology, 3, 1–24. doi:10.1016/0022-1031(67)90034-0

Jones, M., & Love, B. C. (2011). Bayesian fundamentalism or enlightenment? On the explanatory status and theoretical contributions of Bayesian models of cognition. Behavioral and Brain Sciences, 34, 169–188. doi:10.1017/S0140525X10003134

Josephson, J. R., & Josephson, S. G. (1996). Abductive inference: Computation, philosophy, technology. Cambridge, UK: Cambridge University Press.

Kahneman, D., & Tversky, A. (1972). Subjective probability: A judgment of representativeness. Cognitive Psychology, 3, 430–454. doi:10.1016/0010-0285(72)90016-3


Kahneman, D., & Tversky, A. (1973). On the psychology of prediction. Psychological Review, 80, 237–251. doi:10.1037/h0034747

Khemlani, S. S., & Oppenheimer, D. M. (2011). When one model casts doubt on another: A levels-of-analysis approach to causal discounting. Psychological Bulletin, 137, 195–210. doi:10.1037/a0021809

Klayman, J., & Ha, Y.-W. (1987). Confirmation, disconfirmation, and information in hypothesis testing. Psychological Review, 94, 211–228. doi:10.1037/0033-295X.94.2.211

Koehler, J. J. (1996). The base rate fallacy reconsidered: Descriptive, normative and methodological challenges. Behavioral and Brain Sciences, 19, 1–54. doi:10.1017/S0140525X00041157

Krynski, T. R., & Tenenbaum, J. B. (2007). The role of causality in judgment under uncertainty. Journal of Experimental Psychology: General, 136, 430–450. doi:10.1037/0096-3445.136.3.430

Kullback, S., & Leibler, R. A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22, 79–86. doi:10.1214/aoms/1177729694

Lagnado, D. A., & Sloman, S. A. (2004). The advantage of timely intervention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 856–876. doi:10.1037/0278-7393.30.4.856

Lagnado, D. A., & Sloman, S. A. (2006). Time as a guide to cause. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 451–460. doi:10.1037/0278-7393.32.3.451

Lindley, D. V. (1956). On a measure of the information provided by an experiment. The Annals of Mathematical Statistics, 27, 986–1005. doi:10.1214/aoms/1177728069

Lipkus, I. M., Samsa, G., & Rimer, B. K. (2001). General performance on a numeracy scale among highly educated samples. Medical Decision Making, 21, 37–44. doi:10.1177/0272989X0102100105

Lu, H., Yuille, A. L., Liljeholm, M., Cheng, P. W., & Holyoak, K. J. (2008). Bayesian generic priors for causal learning. Psychological Review, 115, 955–984. doi:10.1037/a0013256

Lucas, C. G., & Griffiths, T. L. (2010). Learning the form of causal relationships using hierarchical Bayesian models. Cognitive Science, 34, 113–147. doi:10.1111/j.1551-6709.2009.01058.x

Markant, D. B., Settles, B., & Gureckis, T. M. (2015). Self-directed learning favors local, rather than global, uncertainty. Cognitive Science, 40, 100–120. doi:10.1111/cogs.12220

Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. San Francisco, CA: Freeman.

Mayrhofer, R., & Waldmann, M. R. (2011). Heuristics in covariation-based induction of causal models: Sufficiency and necessity priors. In L. Carlson, C. Hoelscher, & T. F. Shipley (Eds.), Proceedings of the 33rd annual conference of the Cognitive Science Society (pp. 3110–3115). Austin, TX: Cognitive Science Society.

Mayrhofer, R., & Waldmann, M. R. (2015a). Agents and causes: Dispositional intuitions as a guide to causal structure. Cognitive Science, 39, 65–95. doi:10.1111/cogs.12132

Mayrhofer, R., & Waldmann, M. R. (2015b). Sufficiency and necessity assumptions in causal structure induction. Cognitive Science. doi:10.1111/cogs.12318

McKenzie, C. R. M. (2003). Rational models as theories—not standards—of behavior. Trends in Cognitive Sciences, 7, 403–406. doi:10.1016/S1364-6613(03)00196-7

McKenzie, C. R. M., & Mikkelsen, L. A. (2007). A Bayesian view of covariation assessment. Cognitive Psychology, 54, 33–61. doi:10.1016/j.cogpsych.2006.04.004

McNair, S., & Feeney, A. (2014). When does information about causal structure improve statistical reasoning? Quarterly Journal of Experimental Psychology, 67, 625–645. doi:10.1080/17470218.2013.821709

McNair, S., & Feeney, A. (2015). Whose statistical reasoning is facilitated by a causal structure intervention? Psychonomic Bulletin & Review, 22, 258–264. doi:10.3758/s13423-014-0645-y

Meder, B., & Gigerenzer, G. (2014). Statistical thinking: No one left behind. In E. J. Chernoff & B. Sriraman (Eds.), Advances in mathematics education: Probabilistic thinking: Presenting plural perspectives (pp. 127–148). Dordrecht, Netherlands: Springer.

Meder, B., Hagmayer, Y., & Waldmann, M. R. (2008). Inferring interventional predictions from observational learning data. Psychonomic Bulletin & Review, 15, 75–80. doi:10.3758/PBR.15.1.75

Meder, B., Hagmayer, Y., & Waldmann, M. R. (2009). The role of learning data in causal reasoning about observations and interventions. Memory & Cognition, 37, 249–264. doi:10.3758/MC.37.3.249

Meder, B., & Mayrhofer, R. (2013). Sequential diagnostic reasoning with verbal information. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th annual conference of the Cognitive Science Society (pp. 1014–1019). Austin, TX: Cognitive Science Society.

Meder, B., Mayrhofer, R., & Waldmann, M. R. (2009). A rational model of elemental diagnostic inference. In N. A. Taatgen & H. van Rijn (Eds.), Proceedings of the 31st annual conference of the Cognitive Science Society (pp. 2176–2181). Austin, TX: Cognitive Science Society.


Meder, B., Mayrhofer, R., & Waldmann, M. R. (2014). Structure induction in diagnostic causal reasoning. Psychological Review, 121, 277–301. doi:10.1037/a0035944

Meder, B., & Nelson, J. D. (2012). Information search with situation-specific reward functions. Judgment and Decision Making, 7, 119–148.

Meier, K. M., & Blair, M. R. (2013). Waiting and weighting: Information sampling is a balance between efficiency and error-reduction. Cognition, 126, 319–325. doi:10.1016/j.cognition.2012.09.014

Melz, E. R., Cheng, P. W., Holyoak, K. J., & Waldmann, M. R. (1993). Cue competition in human categorization: Contingency or the Rescorla-Wagner Learning Rule? Comment on Shanks (1991). Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 1398–1410. doi:10.1037/0278-7393.19.6.1398

Morris, M. W., & Larrick, R. P. (1995). When one cause casts doubt on another: A normative analysis of discounting in causal attribution. Psychological Review, 102, 331–355. doi:10.1037/0033-295X.102.2.331

Mosteller, F., & Youtz, C. (1990). Quantifying probabilistic expressions. Statistical Science, 5, 2–12. doi:10.1214/ss/1177012251

Nelson, J. D. (2005). Finding useful questions: On Bayesian diagnosticity, probability, impact, and information gain. Psychological Review, 112, 979–999. doi:10.1037/0033-295X.112.4.979

Nelson, J. D., Divjak, B., Gudmundsdottir, G., Martignon, L. F., & Meder, B. (2014). Children’s sequential information search is sensitive to environmental probabilities. Cognition, 130, 74–80. doi:10.1016/j.cognition.2013.09.007

Nelson, J. D., McKenzie, C. R. M., Cottrell, G. W., & Sejnowski, T. J. (2010). Experience matters: Information acquisition optimizes probability gain. Psychological Science, 21, 960–969. doi:10.1177/0956797610372637

Novick, L. R., & Cheng, P. W. (2004). Assessing interactive causal power. Psychological Review, 111, 455–485. doi:10.1037/0033-295X.111.2.455

Oaksford, M., & Chater, N. (1994). A rational analysis of the selection task as optimal data selection. Psychological Review, 101, 608–631. doi:10.1037/0033-295X.101.4.608

Park, J., & Sloman, S. A. (2013). Mechanistic beliefs determine adherence to the Markov property in causal reasoning. Cognitive Psychology, 67, 186–216. doi:10.1016/j.cogpsych.2013.09.002

Pearl, J. (1988). Probabilistic reasoning in intelligent systems. San Francisco, CA: Morgan Kaufmann.


Pearl, J. (2000). Causality: Models, reasoning and inference. Cambridge, UK: Cambridge University Press.

Peterson, C. R., & Beach, L. R. (1967). Man as intuitive statistician. Psychological Bulletin, 68, 29–46. doi:10.1037/h0024722

Phillips, L. D., & Edwards, W. (1966). Conservatism in a simple probability inference task. Journal of Experimental Psychology, 72, 346–354. doi:10.1037/h0023653

Raiffa, H., & Schlaifer, R. O. (1961). Applied statistical decision theory. Cambridge, MA: Division of Research, Graduate School of Business Administration, Harvard University.

Rebitschek, F. G., Bocklisch, F., Scholz, A., Krems, J. F., & Jahn, G. (2015). Biased processing of ambiguous symptoms favors the initially leading hypothesis in sequential diagnostic reasoning. Experimental Psychology, 62, 287–305. doi:10.1027/1618-3169/a000298

Rebitschek, F. G., Krems, J. F., & Jahn, G. (2015). Memory activation of multiple hypotheses in sequential diagnostic reasoning. Journal of Cognitive Psychology, 27, 780–796. doi:10.1080/20445911.2015.1026825

Rehder, B. (2003). Categorization as causal reasoning. Cognitive Science, 27, 709–748. doi:10.1016/S0364-0213(03)00068-5

Rehder, B. (2010). Causal-based classification: A review. In B. Ross (Ed.), The psychology of learning and motivation (Vol. 52, pp. 39–116). San Diego, CA: Academic Press.

Rehder, B. (2014). Independence and dependence in human causal reasoning. Cognitive Psychology, 72, 54–107. doi:10.1016/j.cogpsych.2014.02.002

Rehder, B., & Burnett, R. C. (2005). Feature inference and the causal structure of categories. Cognitive Psychology, 50, 264–314. doi:10.1016/j.cogpsych.2004.09.002

Rehder, B., & Waldmann, M. R. (in press). Failures of explaining away and screening off in described versus experienced causal learning scenarios. Memory & Cognition.

Rényi, A. (1961). On measures of entropy and information. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1, 547–561.

Rothe, A., Deverett, B., Mayrhofer, R., & Kemp, C. (2016). Successful structure learning from observational data. Unpublished manuscript.

Rottman, B. M., & Hastie, R. (2014). Reasoning about causal relationships: Inferences in causal networks. Psychological Bulletin, 140, 109–139. doi:10.1037/a0031903

Rottman, B. M., & Hastie, R. (2015). Do Markov violations and failures of explaining away persist with experience? In R. Dale, C. Jennings, P. Maglio, T. Matlock, D. Noelle, A. Warlaumont, & J. Yoshimi (Eds.), Proceedings of the 37th annual conference of the Cognitive Science Society (pp. 2027–2032). Austin, TX: Cognitive Science Society.

Rottman, B. M., & Hastie, R. (2016). Do people reason rationally about causally related events? Markov violations, weak inferences, and failures of explaining away. Cognitive Psychology, 87, 88–134. doi:10.1016/j.cogpsych.2016.05.002

Rusconi, P., & McKenzie, C. R. M. (2013). Insensitivity and oversensitivity to answer diagnosticity in hypothesis testing. The Quarterly Journal of Experimental Psychology, 66, 2443–2464. doi:10.1080/17470218.2013.793732

Sedlmeier, P., & Gigerenzer, G. (2001). Teaching Bayesian reasoning in less than two hours. Journal of Experimental Psychology: General, 130, 380–400. doi:10.1037/0096-3445.130.3.380

Shannon, C. E. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27, 379–423, 623–656.

Sloman, S. A., & Lagnado, D. A. (2005). Do we “do”? Cognitive Science, 29, 5–39. doi:10.1207/s15516709cog2901_2

Spirtes, P., Glymour, C., & Scheines, R. (1993). Causation, prediction, and search. New York: Springer.

Spohn, W. (1976/1978). Grundlagen der Entscheidungstheorie [Foundations of decision theory]. Dissertation, University of Munich, 1976; published Kronberg/Ts.: Scriptor, 1978.

Spohn, W. (2001). Bayesian nets are all there is to causal dependence. In M. C. Galavotti, P. Suppes, & D. Costantini (Eds.), Stochastic dependence and causality (pp. 157–172). Stanford, CA: CSLI Publications.

Stephan, S., & Waldmann, M. R. (2016). Answering causal queries about singular cases. In A. Papafragou, D. Grodner, D. Mirman, & J. C. Trueswell (Eds.), Proceedings of the 38th annual conference of the Cognitive Science Society (pp. 2795–2801). Austin, TX: Cognitive Science Society.

Steyvers, M., Tenenbaum, J. B., Wagenmakers, E.-J., & Blum, B. (2003). Inferring causal networks from observations and interventions. Cognitive Science, 27, 453–489. doi:10.1207/s15516709cog2703_6

Tenenbaum, J. B., Griffiths, T. L., & Niyogi, S. (2007). Intuitive theories as grammars for causal inference. In A. Gopnik & L. Schulz (Eds.), Causal learning: Psychology, philosophy, and computation (pp. 301–322). Oxford, UK: Oxford University Press.

Thomas, R. P., Dougherty, M. R., Sprenger, A. M., & Harbison, J. I. (2008). Diagnostic hypothesis generation and human judgment. Psychological Review, 115, 155–185. doi:10.1037/0033-295X.115.1.155


Trueblood, J. S., & Busemeyer, J. R. (2011). A quantum probability account of order effects in inference. Cognitive Science, 35, 1518–1552. doi:10.1111/j.1551-6709.2011.01197.x

Tsallis, C. (1988). Possible generalization of Boltzmann-Gibbs statistics. Journal of Statistical Physics, 52, 479–487. doi:10.1007/BF01016429

Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124–1131. doi:10.1126/science.185.4157.1124

Tversky, A., & Kahneman, D. (1982a). Causal schemas in judgments under uncertainty. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases (pp. 117–128). New York: Cambridge University Press.

Tversky, A., & Kahneman, D. (1982b). Evidential impact of base rates. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases (pp. 153–160). New York: Cambridge University Press.

von Sydow, M., Hagmayer, Y., & Meder, B. (2015). Transitive reasoning distorts induction in causal chains. Memory & Cognition, 44, 469–487. doi:10.3758/s13421-015-0568-5

Waldmann, M. R. (1996). Knowledge-based causal induction. In D. R. Shanks, K. J. Holyoak, & D. L. Medin (Eds.), The psychology of learning and motivation (Vol. 34, pp. 47–88). San Diego, CA: Academic Press.

Waldmann, M. R. (2007). Combining versus analyzing multiple causes: How domain assumptions and task context affect integration rules. Cognitive Science, 31, 233–256. doi:10.1080/15326900701221231

Waldmann, M. R., Cheng, P. W., Hagmayer, Y., & Blaisdell, A. P. (2008). Causal learning in rats and humans: A minimal rational model. In N. Chater & M. Oaksford (Eds.), The probabilistic mind: Prospects for Bayesian cognitive science (pp. 453–484). Oxford, UK: Oxford University Press.

Waldmann, M. R., & Hagmayer, Y. (2005). Seeing versus doing: Two modes of accessing causal knowledge. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 216–227. doi:10.1037/0278-7393.31.2.216

Waldmann, M. R., & Hagmayer, Y. (2013). Causal reasoning. In D. Reisberg (Ed.), Oxford handbook of cognitive psychology (pp. 733–752). New York: Oxford University Press.

Waldmann, M. R., Hagmayer, Y., & Blaisdell, A. P. (2006). Beyond the information given: Causal models in learning and reasoning. Current Directions in Psychological Science, 15, 307–311.

Waldmann, M. R., Holyoak, K. J., & Fratianne, A. (1995). Causal models and the acquisition of category structure. Journal of Experimental Psychology: General, 124, 181–206.

Walsh, C., & Sloman, S. (2008). Updating beliefs with causal models: Violations of screening off. In M. A. Gluck, J. R. Anderson, & S. M. Kosslyn (Eds.), Memory and mind: A Festschrift for Gordon H. Bower (pp. 345–357). New York: Erlbaum.

Wason, P. C. (1968). Reasoning about a rule. Quarterly Journal of Experimental Psychology, 20, 273–281. doi:10.1080/14640746808400161

Wells, G. L., & Lindsay, R. C. (1980). On estimating the diagnosticity of eyewitness nonidentifications. Psychological Bulletin, 88, 776–784. doi:10.1037/0033-2909.88.3.776

White, P. (2006). How well is causal structure inferred from co-occurrence information? European Journal of Cognitive Psychology, 18, 454–480. doi:10.1080/09541440500264861

Yeung, S., & Griffiths, T. L. (2015). Identifying expectations about the strength of causal relationships. Cognitive Psychology, 76, 1–29. doi:10.1016/j.cogpsych.2014.11.001

Notes:

(1.) The noisy-OR gate assumes independent causes C and A; thus, the probability $P(e|c)$ is given by $w_c + w_a - w_c w_a$, because when C is present, E is present when either C produced it (with probability $w_c$) or the background A generated it (with probability $w_a$); the last term corrects for double counting cases in which both causes brought about E (with probability $w_c w_a$).

(2.) For the data set in Figure 22.1 b, the maximum likelihood estimates (MLEs) for the parameters of structure S1 (Figure 22.1 c) are $b_c = 0.5$, $w_c = 0.222$, and $w_a = 0.1$. Plugging these numbers into Equation 4 yields $P(c|e) = .75$.

(3.) From a mathematical perspective, the structure induction model contains Bayesian variants of power PC theory (e.g., Holyoak et al., 2010; Lu et al., 2008) as special cases. If the prior for structure S1 is set to 1, structure S0 plays no role when integrating out the structures to obtain a single estimate for the diagnostic probability $P(c|e)$. Note, however, that certain technical differences exist between proposed Bayesian variants of power PC theory and the structure induction model, such as the priors used (e.g., so-called sparse-and-strong priors in Lu et al., 2008, vs. uniform priors in the structure induction model in Meder et al., 2014).

(4.) A related term is discounting, which in the literature has sometimes been used interchangeably with the notion of explaining away, but has also been used to describe different empirical phenomena, such as variations in causal strength judgments of a target cause in the presence of alternative causes (e.g., Goedert, Harsch, & Spellman, 2005). See Khemlani and Oppenheimer (2011) for a review of the use of both terms and an overview of different models and findings. We here focus on explaining away as conceptualized in the context of diagnostic inference over graphical causal models.
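To make the arithmetic in notes 1 and 2 concrete, the following minimal Python sketch reproduces the diagnostic computation from the quoted maximum likelihood estimates. The function names are ours, and the sketch assumes the single-link structure S1 described in note 2.

```python
# Minimal sketch of the noisy-OR computation in notes 1 and 2. The parameter
# values are the MLEs quoted in note 2; function names are illustrative.

def noisy_or(w_c: float, w_a: float) -> float:
    """P(e | c): cause C and background A act independently (noisy-OR)."""
    return w_c + w_a - w_c * w_a

def diagnostic_probability(b_c: float, w_c: float, w_a: float) -> float:
    """P(c | e) via Bayes' rule for structure S1: C produces E with
    noisy-OR power; the background alone produces E with probability w_a."""
    p_e_given_c = noisy_or(w_c, w_a)   # P(e | c present)
    p_e_given_not_c = w_a              # P(e | c absent): background only
    p_e = b_c * p_e_given_c + (1 - b_c) * p_e_given_not_c
    return b_c * p_e_given_c / p_e

print(round(diagnostic_probability(b_c=0.5, w_c=0.222, w_a=0.1), 2))  # 0.75
```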


(5.) More generally, estimating individual causal powers in situations with multiple causes according to Equation 2 requires conditioning on the absence of the alternative causes to determine the relevant strength estimates. See Cheng and Novick (1990, 1992; Melz, Cheng, Holyoak, & Waldmann, 1993) for details; see also Novick and Cheng (2004) for a formal analysis of strength estimates when causes are interacting.

(6.) An alternative way would be to parameterize the model with causal strength estimates, analogous to the cases discussed earlier. The computations for deriving the information value of different queries as discussed later can then be based on these parameters (e.g., derivation of diagnostic probabilities via causal strength estimates; e.g., Equation 4). To simplify matters, we here consider only the case in which the causal structure is parameterized by conditional probabilities.

(7.) Other entropy measures besides Shannon (1948) could be used, for instance, the Rényi (1961) or Tsallis (1988) families of entropy measures (for a detailed discussion, see Crupi, Nelson, Meder, Cevolani, & Tentori, 2016).

(8.) If the patient has fever, the entropy increases, because the posterior probability distribution over the cause (virus) is close to uniform (.49 vs. .51, respectively; Figure 22.4 c). Therefore, this datum entails a negative information gain (i.e., an increase in uncertainty about the true state of the cause variable). (The entropy of a binary random variable is maximal when the distribution is uniform, i.e., both states are equiprobable.) Conversely, if the patient has no fever, this datum decreases the entropy, because conditional on the absence of fever it is very likely that the virus is not present (.07 vs. .93; Figure 22.4 c). To compute the expected information gain of testing for fever, the individual gains are integrated by weighting each gain by the probability of observing each of the two states, fever vs. ¬fever (.55 vs. .45; Figure 22.4 c). (Values in the text are not based on the rounded values in the tree in Figure 22.4 c but are the exact values.)

(9.) The probability gain model assumes that the diagnostic reasoner always selects the more likely hypothesis, i.e., uses an arg-max decision rule. Accuracy based on the prior distribution of the cause is .7. If fever is present, accuracy decreases to .51; the probability gain relative to the prior is negative. By contrast, when the patient does not have fever, accuracy increases to .93; accordingly, the probability gain is positive. To compute the overall probability gain of the query, an expectation is computed by taking into account the probability of each state (i.e., that a patient does or does not have fever).
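The computations sketched in notes 8 and 9 can likewise be reproduced from the rounded values reported there. In the sketch below, the prior P(virus) = .30 is our inference from the stated prior accuracy of .70 under the arg-max rule; because the inputs are rounded, the outputs only approximate the exact values referred to in the text.

```python
import math

# Sketch of expected information gain (note 8) and probability gain (note 9),
# using the rounded probabilities reported in the notes (Figure 22.4 c).
p_virus = 0.30                      # prior P(virus); prior accuracy = 0.70
p_fever = 0.55                      # P(fever)
p_virus_given_fever = 0.49          # posterior close to uniform
p_virus_given_no_fever = 0.07       # posterior strongly favors "no virus"

def entropy(p: float) -> float:
    """Shannon entropy (in bits) of a binary variable with P(present) = p."""
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)

# Information gain per outcome: prior entropy minus posterior entropy.
gain_fever = entropy(p_virus) - entropy(p_virus_given_fever)        # negative
gain_no_fever = entropy(p_virus) - entropy(p_virus_given_no_fever)  # positive
expected_info_gain = p_fever * gain_fever + (1 - p_fever) * gain_no_fever

# Probability gain: expected arg-max accuracy minus prior accuracy.
prior_accuracy = max(p_virus, 1 - p_virus)
expected_accuracy = (
    p_fever * max(p_virus_given_fever, 1 - p_virus_given_fever)
    + (1 - p_fever) * max(p_virus_given_no_fever, 1 - p_virus_given_no_fever)
)
probability_gain = expected_accuracy - prior_accuracy

print(f"expected information gain: {expected_info_gain:.3f} bits")
print(f"expected probability gain: {probability_gain:.3f}")
```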

Björn Meder

Center for Adaptive Behavior and Cognition, Max Planck Institute for Human Development, Berlin, Germany

Ralf Mayrhofer

Department of Psychology, University of Göttingen, Göttingen, Germany



Inferring Causal Relations by Analogy
Keith J. Holyoak and Hee Seung Lee
The Oxford Handbook of Causal Reasoning
Edited by Michael R. Waldmann
Print Publication Date: Jun 2017 Subject: Psychology, Cognitive Psychology Online Publication Date: May 2017
DOI: 10.1093/oxfordhb/9780199399550.013.25

Abstract and Keywords

When two situations share a common pattern of relationships among their constituent elements, people often draw an analogy between a familiar source analog and a novel target analog. This chapter reviews major subprocesses of analogical reasoning and discusses how analogical inference is guided by causal relations. Psychological evidence suggests that analogical inference often involves constructing and then running a causal model. It also provides some examples of analogies and models that have been used as tools in science education to foster understanding of critical causal relations. A Bayesian theory of causal inference by analogy illuminates how causal knowledge, represented as causal models, can be integrated with analogical reasoning to yield inductive inferences.

Keywords: analogical reasoning, causal relations, analogy, Bayesian, inference

The human mind (perhaps uniquely among all species presently on earth; see Penn, Holyoak, & Povinelli, 2008) provides the capability to represent and manipulate structured relationships. Relational reasoning is apparent in the ability to see analogies between very different situations. In essence, an analogy between a familiar source analog and a novel target analog can serve as a source of hypotheses about the latter. For example, this is how Nobel Laureate economist Paul Krugman (2011) described a proposed “debt ceiling deal” between the US Congress and President Obama: “… those demanding spending cuts now are like medieval doctors who treated the sick by bleeding them, and thereby made them even sicker.” As in many everyday analogies, this one foregrounds an abstract type of cause–effect relationship (cure makes patient worse), strongly implies a moral evaluation (this intervention is wrong), and conveys a prescription for action (stop it). Research has confirmed that moral evaluations are guided by causal understanding (Wiegmann & Waldmann, 2014). In this chapter we consider in more detail the role played by relational reasoning in general, and analogy in particular, in generating causal inferences. We also review applied research on the use of causal analogies in educational practice, and consider theoretical treatments of the connections between causal and analogical reasoning.


Inferences are often based on induction of causal relationships from a series of examples. In typical laboratory studies of causal learning, the training examples are very similar to one another in terms of perceptual and semantic features (e.g., a series of cases in which a person eats a specific type of food, and then may or may not have an allergic reaction). In contrast, analogical reasoning typically involves examples (as few as two) that are not particularly similar in semantic and perceptual features. Two situations are generally said to be analogous if they share a common pattern of relationships among their constituent elements, even though the elements themselves differ across the two situations (as in Krugman’s comparison of a debt ceiling deal to medieval medicine). Our focus will be on such cases, where it is not immediately obvious that the examples would be expected to exhibit similar causal regularities.

To clarify these distinctions, Figure 24.1 schematizes two major dimensions along which knowledge representations appear to vary (Holyoak, Lee, & Lu, 2010; see also Kemp & Jern, 2014). The x-axis represents variation in degree of abstraction of the knowledge representations (specific to general), and the y-axis represents variation in “relational richness” (low to high), which is related to the complexity of the relations involved. Abstraction ranges from the low end, at which reasoning depends heavily on specific cases, to the high end, at which it depends primarily on generalizations over multiple examples. Relational richness ranges from the low end, where the representations are primarily based on simple features of individual objects, to the high end, at which representations include many complex relations potentially instantiated with dissimilar objects.

Figure 24.1 Schematic relationships among types of knowledge representations related to causal inference. From Holyoak et al. (2010); reprinted by permission.

Each corner of the quadrant in Figure 24.1 is labeled with a “prototypical” psychological concept related to inductive inference. The lower left corner represents inferences based on feature-defined instances. The role of causal knowledge is minimal in this quadrant, in which inferences are primarily based on featural similarity. The lower right corner corresponds to relatively simple categories based on distributions of properties over their instances, potentially accompanied by causal generalizations. This is the type of knowledge typically used in paradigms associated with category-based induction (e.g., Rehder, 2006; for a review, see Rehder, Chapters 20 and 21 in this volume). The top left corner focuses on relationally complex instances, the central focus of work on analogical reasoning (Gentner, 1983; for a review, see Holyoak, 2012). Even a single example can potentially trigger causal inferences, especially if the example involves some degree of abstraction (e.g., Ahn, Brewer, & Mooney, 1992).

The upper right corner focuses on abstract relational categories, often referred to as schemas. Comparison of multiple analogs can lead to the development of a schema representing a category of situations (Gick & Holyoak, 1983; Gentner, Loewenstein & Thompson, 2003), and the computational processes involved in deriving inferences based on individual analogs and on generalized schemas appear to be very similar (Hummel & Holyoak, 1997, 2003). In actual inductive reasoning, multiple sources of knowledge at varying levels of abstraction and relational richness may be used together. In this chapter we will often refer to “analogical” reasoning when in fact we mean reasoning based on some mixture of a specific relational instance and more schematic causal generalizations.

In this chapter, we will consider the major subprocesses of analogical reasoning, and critical factors that may influence each. We will then survey examples of analogies and models that have been used as tools in science education, as analogies are often used to foster understanding of critical causal relations. Such examples highlight the key role of causal relations in guiding analogical inference. Finally, we will discuss the representation of causal relations and how analogical reasoning can be integrated with causal models, focusing on a Bayesian theory of causal inference by analogy.

Analogical Reminding, Mapping, and Inference

Analogical reasoning involves several subprocesses. Though taxonomies vary (e.g., Holyoak, Novick, & Melz, 1994), the most basic distinctions are between retrieval of a source analog given the target as a cue (reminding), aligning the components of source and target (mapping), and using the source to fill gaps in knowledge about the target (inference). Analogy is an inductive mechanism (Holland, Holyoak, Nisbett, & Thagard, 1986), and like all forms of induction, is fallible. Any of the component processes may fail, and even if an inference is made successfully, it may prove erroneous if the causal structure of the source situation is importantly different from that of the target. (For examples of both successful and unsuccessful instances of analogy use in strategic business planning, see Gavetti & Rivkin, 2005.)

When the source and target are drawn from dissimilar domains, people often fail to notice the potential relevance of the former to the latter, even when presented in close temporal contiguity (Gick & Holyoak, 1980; Holyoak & Koh, 1987). Conversely, relatively superficial similarities can cause a target to activate a potential source in memory.

As a consequence, retrieval processes may trigger suboptimal inferences. Not all similarities between situations are normatively relevant to causal inferences. Rather, only those aspects of the source that in some way matter in producing an outcome (i.e., causal aspects) are important in deciding whether an analogous outcome is likely in the target (Bartha, 2010). However, people (particularly domain novices) often make analogical inferences that are influenced by non-causal similarities. In a classic study, Gilovich (1981) had Stanford University undergraduates enrolled in a political science class make judgments about possible American responses to a hypothetical foreign crisis. At that time, the best-known US military actions were World War II (where intervention led to victory) and the Vietnam War (where it led to quagmire and ultimate failure). One group of students read a description of the hypothetical crisis couched in terms that were intended to evoke World War II (e.g., an impending invasion was described as a “blitzkrieg,” with minorities fleeing the threatened country in “boxcars on freight trains”). A second group read about the same basic crisis, except that it was described in terms meant to evoke the Vietnam War (e.g., the impending invasion was a “quickstrike,” and minorities were fleeing in “small boats sailing up the coast”). Gilovich found that the students given the description reminiscent of World War II made more interventionist recommendations for an American response than did those given the Vietnam-like description.

These findings indicate that superficial aspects of the description of a target situation can influence inferences about it, presumably because these aspects trigger reminding of one or another source analog. But importantly, Gilovich (1981) also found that the same students did not rate the hypothetical crisis as especially similar to one of the two sources when directly asked (at the end of the experiment). In addition, after both potential sources had been explicitly mentioned, participants rated the expected degree of success for military intervention in the hypothetical crisis to be the same, regardless of which description of it they had read. It thus appears that although analogical inferences can be biased by causally irrelevant target features that call to mind one or another source, once alternative source analogs have been activated and subjected to attentional scrutiny, people are able to focus on causally relevant similarities.

Moreover, although non-functional similarities certainly influence spontaneous analogical reminding (e.g., Gentner, Rattermann, & Forbus, 1993; Keane, 1987; Ross, 1987, 1989), similarities based on functionally relevant relations also have an impact (Holyoak & Koh, 1987; Wharton et al., 1994; Wharton, Holyoak & Lange, 1996). In general, factors that increase attention to functionally relevant relations increase spontaneous transfer based on such relations. For example, students who are relatively expert in mathematics (as assessed by scores on the math section of the Scholastic Aptitude Test) were more likely to show spontaneous transfer between math problems that shared the same underlying structure, despite surface differences in cover stories (Novick, 1988).
Active comparison of multiple analogous examples encourages induction of a relational schema, which in turn facilitates spontaneous transfer (Catrambone & Holyoak, 1989; Gentner et al., 2003; Gick & Holyoak, 1983). Adding animation to the source analog can also enhance spontaneous transfer (Kubricht, Lu, & Holyoak, 2015). In essence, acquiring a relational schema for a causal system is a key element of expertise (Hummel & Holyoak, 2003).

Spontaneous recognition of causally relevant relational similarities plays a critical role in creative thinking, including scientific discovery (Hesse, 1966; Holyoak & Thagard, 1995). A good example is the earliest major scientific analogy, dating from the era of imperial Rome. The concrete source analog of water waves provided a deeper understanding of sound. Sound is analogous to water waves in that sound exhibits a pattern of behavior corresponding to that of water waves: propagating across space with diminishing intensity, passing around small barriers, rebounding off large barriers, and so on. The perceptual features are very different (water is wet, air is not), but the underlying pattern of relations among the elements is similar. In this example, like most analogies involving empirical phenomena, the key functional relations involve causes and their effects. By transferring knowledge about causal relations, the analogy provides a new explanation of why various phenomena occur.

Analogies and Models as Tools in Science Education

Developmental evidence indicates that causal relations can serve as the basis for problem-solving by analogical transfer, even for preschool children (e.g., Brown, 1989; Brown & Kane, 1990; Holyoak, Junn, & Billman, 1984), suggesting the potential usefulness of analogy as a device for teaching science to young children as well as older students. And indeed, analogies have been widely used to teach scientific topics, and are often recommended as a useful tool (Glynn, 1997; Kolodner, 1997; Lawson, 1993; Venville & Treagust, 1997). Note that although the originator of the water-wave/sound-wave analogy had to spontaneously notice a connection between the two domains, the same analogy can now be used as an instructional device, with a teacher actively posing the source as a possible way to understand the target. Although near analogies are easier to map than far analogies (Keane, 1987), far source analogs that have clear relational parallels to the target are likely to be more informative than near analogs, if the source domain is better understood than the target (Halpern, Hansen, & Riefer, 1990). Note that if (as will usually be the case) the target domain is not well understood, then a very-near source likely will not be well understood either.

There are many examples of cross-domain analogies that are often used in science education. Biology teachers explain the way that enzymes interact with substances by making an analogy to a lock and key; chemistry teachers explain the structure of the atom by analogy to the solar system; and physics teachers explain electricity flow by drawing an analogy with water flow. Such analogies can potentially help students to understand the causal relationships involved in a new concept based on information the students already understand.


Several studies based on textbook analyses suggest that analogies constitute one of the most common instructional methods found in science education. Since the development of a widely used classification scheme by Curtis and Reigeluth (1984), science textbook analyses have been conducted on many different levels and for many subtypes of science, including social science (Curtis, 1988), elementary school science books (Newton, 2003), high-school chemistry (Thiele & Treagust, 1994, 1995; Thiele, Venville, & Treagust, 1995), high-school biology (Thiele et al., 1995), high-school physics (Stocklmayer & Treagust, 1994), and college-level biochemistry (Orgill & Bodner, 2006). Harrison and Coll (2008) provide a list of 50 concepts from biology, chemistry, physics, and earth and space science that have been taught using a model or analogy.

As an example, consider one of the widely used analogies in the field of geoscience instruction (Jee et al., 2010), which involves a comparison between the target domain of convection of the mantle and the source of a simmering pot of water (an example that would fall in the top left quadrant of Figure 24.1). The source and target share a substantial amount of relational structure (including causal relations); accordingly, many facts about the convection process can be inferred from the simmering-pot analog. In the source domain, a stove heats the water at the bottom of the pot, and this heating lowers the density of water. The lower density will in turn cause the water at the bottom to rise. In the analogical structure, water at the bottom of the pot corresponds to the lower mantle, water at the top of the pot to the upper mantle, and the stove (heat source) to the earth’s core. Based on the correspondence between the water and the mantle, students can make an inference about what will cause the mantle to rise. Specifically, the earth’s core heats the lower mantle, and the heat lowers the density of the lower mantle. The lower density in turn causes the lower part of the mantle to rise.

In elementary science education, a study by Asoko (1996), conducted in northern England, provides a good example of how analogy can be used to support the development of children’s causal understanding of energy transfer and current flow in electrical circuits. In this study, child participants played a direct part in a “string circuit” model by standing in a circle and loosely supporting with their hand a continuous string loop in order to represent an electric circuit, a target concept. The teacher encouraged students to describe what was happening inside the wire, and explained basic ideas regarding current travel around the circuit and mechanisms of how energy makes a bulb light up. After being given this analogy instruction, most students succeeded in identifying whether the bulb would light or not, given a few specific arrangements of circuits.

Guy et al. (2013) demonstrated that analogy can be an effective tool for building public knowledge, and possibly lead to behavioral change, on the issue of climate change. Even highly educated adults often have difficulty in understanding the process of CO2 accumulation and stabilization (Sweeney & Sterman, 2000), and lack of such understanding could be a barrier to motivating behavioral change. Guy et al. compared the effectiveness of analogy materials versus direct information conveyed in graph format, for both Australian university students and the Australian public.
The analogy is made between carbon accumulation (the buildup of carbon in the atmosphere) and a bathtub scenario (the process of water flow in a bathtub). Analogical processing improved performance on an inference task, and led to greater preference for strong action on climate change. In contrast, using graphs to convey information about emissions rate conveyed no positive benefit.

As illustrated by examples such as the preceding, analogy can be a powerful instructional device for explaining causal principles that govern a target domain, which might not be obvious to students. Based on understanding of the causal structure of the source domain, students can make inferences about causal relations in the target domain (Bowdle & Gentner, 1997; Clement, 1991; Brown & Clement, 1989; Hesse, 1966; Holyoak & Thagard, 1995). For instance, in the well-known analogy between electric circuits and water flow (Gentner & Gentner, 1983), causal relations between voltage and current in the electric circuit can be inferred based on an understanding of causal relations between pressure difference and flow rate of water. Such causal understanding is critical when drawing scientific inferences, because inference that is not based on causal relations could produce incorrect mappings and misconceptions. For this reason, Glynn, Britton, Semrud-Clikeman, and Muth (1989) referred to analogies as “double-edged swords.” Similarly, Harrison and Treagust (2006) noted that analogies may not lead students to develop sound scientific conceptions, but they may instead mislead students by triggering the generation of misconceptions. In the preceding analogy between water flow and an electric circuit, for example, students have to understand that two water tanks in series will create greater pressure, and this will in turn affect the flow rate of water. Without understanding that pressure causes a higher flow rate of water, students may erroneously infer that serial and parallel batteries would yield the same voltage.

Although several empirical studies have supported the effectiveness of instructional analogies (e.g., Bean, Searles, Singer, & Cowan, 1990; Dagher, 1995; Donnelly & McDaniel, 1993; Glynn & Takahashi, 1998; Halpern et al., 1990; Vosniadou & Schommer, 1988; Yanowitz, 2001), others have found inconsistent or negative effects (e.g., Didierjean, Cauzinille-Marmeche, & Savina, 1999; Gilbert, 1989; McDaniel & Donnelly, 1996; Iding, 1993; Zook & Maier, 1994). Clement and Yanowitz (2003; see also Brown & Clement, 1989) proposed that such mixed results may in part reflect failure to model causal mechanisms in instructional analogies. When a source analog fails to provide an explanatory model that illustrates the causal mechanisms of a target, students might not accept the analogy. To test this hypothesis, Clement and Yanowitz (2003) assessed the effect of an analogical text that provided a model of causal mechanisms, using a fictional, pseudo-scientific story of an “adaptation” schema. Participants who were given an analogous source story were more likely to construct a better representation of how the causal mechanism was instantiated, and generated significantly more of the causal mechanisms that had to be inferred, relative to participants who were given only the target story. Similarly, Yanowitz (2001) demonstrated the effectiveness of analogical texts in fostering inferential reasoning about scientific concepts among children in third to sixth grade.
In the analogy condition, the text explicitly compared the source (e.g., power plant) and target (e.g., mitochondria) concepts, and elaborated on how those concepts were similar. In the no-analogy condition, non-analogical texts provided only information about the science concept (e.g., mitochondria without mention of a power plant). The results showed that students who received an analogical text were more accurate in answering inference questions such as, “What would happen if mitochondria stopped working?” relative to those who received a non-analogical text. Though being able to answer an inference question does not necessarily indicate complete causal understanding of the target domain, the analogies appeared to help to identify the causal relations involved in the target domain, thereby helping elementary school students to develop a deeper understanding of the novel domain.

Although analogies have the potential to support causal understanding in science instruction, the presence of analogies does not always make the instruction effective in supporting understanding. Treagust and his colleagues (Treagust, Duit, Joslin, & Lindauer, 1992), in their investigation of how teachers taught science in 50 lessons, pointed out that analogies were often poorly used in science classes, without the appropriate explanation and analysis required for appropriate conceptual change. Similarly, Newton and Newton (2000) noted that elementary-level teachers emphasize factual description and do not necessarily support causal understanding in science class.

To make instructional analogies effective, an extensive body of research has been used to generate guides for teaching with analogies, both in science and mathematics (for the latter, see Richland, Holyoak, & Stigler, 2004; Richland, Zur, & Holyoak, 2007). For science topics, example guides include the “bridging with analogies” approach (Clement, 1993; Clement, Brown, & Zietsman, 1989), the TWA model (Teaching-With-Analogies; Glynn, 1991) and the FAR guide (Focus—Action—Reflection; Treagust, Harrison & Venville, 1998).

Representation of Causal Relations

Research on the use of analogies in science instruction further highlights the key role of causal relations in guiding analogical inference. Whenever a source includes properties that an unfamiliar target initially lacks, any of these missing properties in the target might appear to be candidates for analogical inference. Yet people do not draw all possible inferences. For example, consider the earth as a source analog for inferring properties of the moon. It seems more likely that this analogy will lead to the inference that the moon may contain iron deposits than that the moon has a system of freeways, even though the earth contains iron deposits and also has freeways. The difference in the plausibility of the two inferences hinges on our understanding that human life on earth is causally necessary for the existence of freeways but not iron deposits, and that the moon lacks any acceptable analog of human life.

To understand in greater depth how causal knowledge may guide analogical inferences, we need to consider how such knowledge is represented. In the literature on analogical reasoning, causal relations have sometimes been treated as one salient example of the broader class of “higher-order” relations, which are relations that take other relations as arguments (Gentner, 1983). Thus in a rough predicate-calculus style, the fact that human life is causally related to freeways might be represented as CAUSE ((live-on (humans, earth)), (exist-on (freeways, earth))). (In this representation, “live-on” and “exist-on” exemplify first-order relations.) Gentner proposed that all higher-order relations, including “cause,” have a preferential impact on analogical mapping and transfer.

Experimental evidence suggests that causal relations indeed guide analogical transfer. In a task based on imaginary animals, Lassaline (1996) demonstrated that when a causal relation in the source is unmapped, and the causal property is shared by the source and target, people are likely to infer the corresponding effect in the target. For example,

Animal A has properties X, W, and Z.
For Animal A, X causes Z.
Animal B has X, W, and Y.
Therefore, Animal B also has Z.

Here property X is the causal property shared by Animal A and Animal B, leading to the inference that effect Z found in Animal A will also be present in the target, Animal B. But importantly, Lassaline (1996) also demonstrated that people make stronger inferences on the basis of the higher-order relation “cause” than on the basis of a non-causal relation, “temporally prior to,” which appears to have the same formal structure as “cause” (i.e., a higher-order relation that takes propositions as its arguments). Thus the same syntactic “order” of relations does not always yield the same degree of inductive strength about the target property to be inferred.

Other models of analogical transfer have attributed priority specifically to causal relations, rather than to higher-order relations in general (Holyoak & Thagard, 1989; Hummel & Holyoak, 1997; Winston, 1980). Holyoak (1985) emphasized that causal knowledge is the basis for pragmatic constraints on analogical inference: the goal is the reason for the solution plan; the resources enable it; the constraints prevent alternative plans; and the outcome is the result of executing the solution plan. In situations involving reasoning about the world, the typical goal of analogical inference is to predict the presence or absence of some outcome in the target, a determination that hinges on understanding of causal relations. Moreover, the importance of specific causal relationships is modulated by the reasoner’s goals in making inferences. Using complex stories, Spellman and Holyoak (1996) showed that when the source-target mapping was ambiguous by structural criteria, which causal relations had the greatest impact on inferences was determined by the reasoner’s goal. Thus, a complete model of analogical inference requires a more nuanced account of how causal knowledge is represented and used.
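A small computational sketch may help fix the notion of relational “order.” The encoding below is our own illustration, not a representation used in the studies discussed; it shows how a higher-order relation such as CAUSE takes first-order propositions as arguments.

```python
# Our own toy encoding of propositions in which a higher-order relation such
# as CAUSE takes other relations as arguments (cf. the predicate-calculus
# example in the text).

from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Proposition:
    relation: str                                # e.g., "live-on", "CAUSE"
    args: Tuple[Union["Proposition", str], ...]  # objects or propositions

    def order(self) -> int:
        """First-order if all arguments are objects; higher otherwise."""
        inner = [a.order() for a in self.args if isinstance(a, Proposition)]
        return 1 + max(inner, default=0)

live_on = Proposition("live-on", ("humans", "earth"))
exist_on = Proposition("exist-on", ("freeways", "earth"))
cause = Proposition("CAUSE", (live_on, exist_on))

print(live_on.order(), cause.order())  # 1 2
```

On this encoding, “cause” and “temporally prior to” would receive exactly the same order, which is why, as Lassaline’s results show, syntactic order alone cannot explain the special inductive strength of causal relations.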



A General Algorithm for Analogical Inference

To understand how causal knowledge might guide analogical inference, it is necessary to consider how analogical inferences can be made in general. The basic algorithm used by all major computational models has been termed “copy with substitution and generation,” or CWSG (Holyoak, Novick, & Melz, 1994), and involves constructing target analogs based on unmapped source propositions by substituting the corresponding target element (if known) for each source element, and if no corresponding target element is known, postulating one as needed. CWSG allows the generation of structured propositions about the target (as opposed to simple associations) because of its reliance on variable binding and mapping. In this respect, inference by CWSG is similar to rule-based inferences of the sort modeled by production systems (e.g., Anderson & Lebiere, 1998). However, the constraints on analogical mapping are more fluid than are the typical constraints on matching in a production system. CWSG is more flexible in that unlike production rules, there is no strict division between a “left-hand side” to be matched and a “right-hand side” that creates an inference. Rather, any subset of the two analogs may provide an initial mapping, and the unmapped remainder of the source may be used to create target inferences.

The CWSG algorithm, and analogical inference in general, can fail in a variety of ways. If elements are mismapped, corresponding inference errors will result (Holyoak et al., 1994; Reed, 1987). Most important, the great fluidity of CWSG has its downside. Without additional constraints on when CWSG is invoked, any unmapped source proposition could be used to generate an inference about the target (simply by postulating a corresponding target element for each element of the source). In addition, CWSG is not sensitive to systematic asymmetries involving causal inferences. For example, suppose we hear that a certain rare disease causes a patient to have headaches. Intuitively, we are more likely to infer that another patient with the disease would also have a headache, than to infer that someone else with a headache suffers from the disease. The former inference depends on the probability of the effect (headache) given the disease (cause), which is high; whereas the latter depends on the probability of the disease given the headache, which is low (because headache has many alternative causes). But whereas causal inferences are sensitive to such probabilities, CWSG is not. Rather, this algorithm simply creates target propositions by generating elements to correspond to “missing” source elements, regardless of whether a missing element is cause or effect, or whether the conditional probabilities associated with the “cause” relation are asymmetric.
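As a rough illustration of the algorithm (our own toy sketch; the chapter describes CWSG only abstractly), propositions can be treated as relation-argument tuples, with target elements substituted where a correspondence is known and postulated where it is not:

```python
# Toy sketch of "copy with substitution and generation" (CWSG). Data
# structures and names are ours. Propositions are (relation, arg1, ...) tuples.

def cwsg(unmapped_source_props, mapping):
    """For each unmapped source proposition, substitute each element's target
    counterpart; if none is known, generate (postulate) a target element."""
    inferred = []
    for relation, *args in unmapped_source_props:
        new_args = []
        for a in args:
            if a not in mapping:                # no known correspondence:
                mapping[a] = f"inferred_{a}"    # postulate a new element
            new_args.append(mapping[a])
        inferred.append((relation, *new_args))
    return inferred

# Lassaline-style example: "X causes Z" is unmapped in the source, and X is
# shared by source and target, so an analog of Z is postulated in the target.
print(cwsg([("causes", "X", "Z")], {"X": "X"}))
# [('causes', 'X', 'inferred_Z')]
```

Note that the sketch generates an inference just as readily whether the missing element is a cause or an effect, which is precisely the insensitivity to causal asymmetries discussed in the preceding paragraph.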

Integrating Analogical Reasoning with Causal Models

In order to develop a theoretical integration of analogical reasoning with causal inference, we (Holyoak et al., 2010; Lee & Holyoak, 2008) adapted the general framework of causal models (see in this volume, Griffiths, Chapter 7; Oaksford & Chater, Chapter 19; Rehder, Chapters 20 and 21; Rottman, Chapter 6). Inspired by the work of Pearl (1988) in artificial intelligence, Waldmann and Holyoak (1992) introduced graphical causal models as a psychological account of human causal learning (for a recent review, see Holyoak & Cheng, 2011). A causal model is a representation of cause–effect relations, expressing such information as the direction of the causal arrow, the polarity of causal links (generative causes make things happen, preventive causes stop things from happening), the strength of individual causal links, and the manner in which the influences of multiple causes combine to determine their joint influence on an effect.

Empirical evidence indicates that causal models guide both initial formation of categories (Kemp, Goodman, & Tenenbaum, 2010; Lien & Cheng, 2000; Waldmann, Holyoak, & Fratianne, 1995) and inferences based on learned categories (Ahn, 1999; Heit, 1998; Rehder, 2006, 2007, 2009; Rehder & Burnett, 2005; Rehder & Kim, 2006). For example, Rehder (2006) taught participants the causal relations that influenced the distribution of features associated with novel categories, and showed that inferences about properties of category members are then guided by these causal relations, which can override the influence of overall similarity. Rehder (2009) developed a formal model of category-based inferences within the framework of causal-based generalization (CBG). His CBG model predicts various inductive inferences that depend on integrating knowledge about the distributions of category properties and causal relations that influence (or are influenced by) these properties.

Work by Lee and Holyoak (2008) demonstrated the close connection between analogical inference and the operation of causal models, and Holyoak et al. (2010) formalized this integration by extending a Bayesian model of causal learning (Lu, Yuille, Liljeholm, Cheng, & Holyoak, 2008) to deal with analogical inference. Lu et al.’s model is itself a Bayesian instantiation of the power PC theory (Cheng, 1997). In this theory (see Cheng & Lu, Chapter 5 in this volume), the strength of a causal link represents the power of the cause operating in the external world to produce (or prevent) the corresponding effect. Generative causal power corresponds to the probability that the target cause, if it were to act alone (i.e., in the absence of other causal factors), would produce the effect. Analogously, preventive power corresponds to the probability that the target cause acting alone would stop the effect from occurring. The power PC theory partitions all causes of an effect E into candidate cause C and the rest of the causes, represented by B, an amalgam of observed and unobserved background causes that occur with unknown frequencies that may vary from one situation to another. C, B, and E are typically binary variables with a “present” and an “absent” value. The focus is typically on $w_1$, a random variable representing the strength of candidate cause C to influence effect E. The power PC theory postulates that people approach causal learning with four general prior beliefs:

1. B and C influence effect E independently,
2. B could produce E but not prevent it,
3. causal powers are independent of the frequency of occurrences of the causes (e.g., the causal power of C is independent of the frequency of occurrence of C), and
4. E does not occur unless it is caused.
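One consequence of these prior beliefs under the power PC theory (Cheng, 1997) is a simple estimator of generative power: the probabilistic contrast P(e|c) − P(e|¬c), rescaled by the room the background leaves for C to act. A minimal sketch, with illustrative numbers of our own choosing:

```python
# Minimal sketch of the power PC estimate of generative causal power
# (Cheng, 1997): deltaP = P(e|c) - P(e|not-c), rescaled by 1 - P(e|not-c).
# The input probabilities below are illustrative, not from the chapter.

def generative_power(p_e_given_c: float, p_e_given_not_c: float) -> float:
    """Estimated probability that candidate cause C alone would produce E."""
    return (p_e_given_c - p_e_given_not_c) / (1.0 - p_e_given_not_c)

# E.g., if P(e|c) = 0.2998 and P(e|not-c) = 0.1, the estimated power of C is
# 0.222: the noisy-OR combination of 0.222 and 0.1 gives back 0.2998.
print(round(generative_power(0.2998, 0.1), 3))  # 0.222
```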

From these basic assumptions, Cheng (1997) derived a normative theory of causal learning that predicts (for binary variables) that multiple generative causes operate according to a noisy-OR function, and that preventive causes operate according to a noisy-AND-NOT function. Lu et al. (2008) gave the power PC theory a Bayesian formulation, which captures uncertainty in causal inferences by treating causal strength (e.g., $w_1$) as a probability distribution rather than a fixed point. In addition, Lu et al. proposed that people adopt generic priors on causal strength that favor causes that are individually strong and few in number (see also Powell, Merrick, Lu, & Holyoak, 2016).

Holyoak et al. (2010) proposed a model that incorporated this specific account of causal learning into a model of analogical inference based on transfer of causal models. Rather than considering analogical inference in isolation, they proposed that it is useful to view the entire transfer process as the joint product of causal learning and relational mapping: the reasoner learns the causal structure of the source, maps the source to target, applies CWSG to augment the causal model of the target, and then uses the resulting model to evaluate an open-ended range of potential inferences about the target. The basic idea is that for empirical analogies, the causal model of the source analog (including both causal links and any associated information about the strength of each link), coupled with the mapping of source to target, provides the input to CWSG. This procedure constrains CWSG in a way that favors accurate and useful inferences, generating as its output an elaborated causal model of the target. This elaborated causal model can then be used to evaluate the probability of specific inferences about the target.

By treating analogical and causal inference within a unifying theoretical framework, it proved possible to explain situations in which the strengths of inferences about the target are dissociable from the overall similarity of the source and target. A series of experiments reported by Lee and Holyoak (2008) demonstrated how causal knowledge guides analogical inference, and that analogical inference is not solely determined by the quality of the overall mapping between source and target. Using a common-effect structure (Waldmann & Holyoak, 1992), Lee and Holyoak manipulated structural correspondences between the source and the target as well as the causal polarity (generative or preventive) of multiple causes present in the source. In Figure 24.2, panels (a), (b), and (c) show examples of causal structures used in their experiments. In the source (a), three causes (two generative, G1 and G2, and one preventive, P) are simultaneously present, and when the influences of these three causes are combined, the effect occurs. The target analog (b) shares all three causal factors with the source, whereas target (c) shares only the two generative factors with the source, not the preventive one. Accordingly, target (b) has greater semantic and structural overlap with the source than does target (c). All previous computational models of analogical mapping and inference, which predict that the plausibility of target inferences will increase monotonically with some measure of the quality of the overall mapping between the source and target analogs, therefore predict that target (b) is more likely to have the effect E than is target (c).
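The reversal predicted for targets (b) and (c) follows directly from the noisy-OR and noisy-AND-NOT combination rules. The sketch below uses hypothetical causal strengths (our numbers, not values from Lee and Holyoak's materials):

```python
# Why dropping a preventive cause raises P(E): combine the generative causes
# by noisy-OR and apply the preventer by noisy-AND-NOT. The strengths are
# hypothetical illustrations, not parameters from the experiments.

w_g1, w_g2, w_p = 0.6, 0.5, 0.7   # strengths of G1, G2, and preventer P

p_generative = 1 - (1 - w_g1) * (1 - w_g2)   # noisy-OR of G1 and G2
p_target_b = p_generative * (1 - w_p)        # target (b): preventer present
p_target_c = p_generative                    # target (c): preventer absent

print(f"P(E) in target (b): {p_target_b:.2f}")  # 0.24
print(f"P(E) in target (c): {p_target_c:.2f}")  # 0.80
```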


Figure 24.2 The use of causal models in analogical inference. G, P, and E represent generative causes, a preventive cause, and an effect, respectively; + and – indicate generative and preventive causes, respectively. Dotted elements are initially missing from the target and must be inferred based on the source. From Holyoak et al. (2010); reprinted by permission.

If analogical inference is guided by causal models, however, the prediction reverses, because dropping a preventive cause, as in target (c) relative to target (b), yields a causal model of the target in which the probability that the effect occurs will increase. Lee and Holyoak (2008) found that people in fact rated target (c) as more likely to exhibit the effect than target (b), even though participants rated (c) as less similar than (b) to the source analog (a). In other words, if the source exhibited an effect despite the presence of a preventive cause, then people judged the effect to be more likely in the target if it lacked the preventer (even though absence of the preventer reduced overall similarity of the source and target).

Figure 24.3 schematizes how analogical and causal inference are integrated in the theory proposed by Holyoak et al. (2010). The two networks represent causal models for a source (left) and target analog (right). The nodes represent variable causes (C) and effects (E). The superscripts (S, T) indicate the source and the target, respectively. The links represent the causal structure (only linked nodes have direct causal connections). The vectors wi represent the causal polarity (generative or preventive) and the causal strength for links.

A key assumption is that causal knowledge of the source is used to develop a causal model of the target, which can in turn be used to derive a variety of inferences about the values of variables in the target. Unlike other formalisms that have been adopted by analogy models (e.g., predicate calculus), causal relations in Bayesian causal models can carry information about both the existence of causal links (e.g., causal structure) and distributions of causal strength (Griffiths & Tenenbaum, 2005; Lu et al., 2008), as well as about the generating function by which multiple causes combine to influence effects (see Cheng & Lu, Chapter 5 in this volume; Griffiths, Chapter 7 in this volume). In the integrated theory, the first step in analogical inference is to learn a causal model of the source. The source model is then mapped to the initial (typically impoverished) representation of the target. Based on the mapping, the causal structure and strengths associated with the source are transferred to the target, creating or extending the causal model of the latter. The model of the target can then be “run,” (p. 468) using causal reasoning to derive inferences about the values of endogenous variables in the target. Accordingly, as summarized in Figure 24.3, the four basic components in causal inference based on analogy are learning of a causal model for a source (step 1); assessment of the analogical mapping between the source and a target (step 2); transfer of causal knowledge from the source to the target based upon the analogical mapping to construct the causal model of the target (step 3); and inference based on the causal model of the target (step 4).
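To make these four steps concrete, the following minimal sketch (ours, for illustration only; it is not the implementation of Holyoak et al., 2010) runs the pipeline with fixed point-estimate strengths in place of the full model's strength distributions. The invented numbers reproduce the qualitative reversal described earlier: dropping the preventive cause raises the predicted probability of the effect even though it lowers source-target similarity.

# Illustrative sketch only: analogical transfer over a common-effect causal
# model, using invented point-estimate strengths rather than the Bayesian
# strength distributions of the full model.

def p_effect(generative, preventive):
    """Probability of the effect under a noisy-OR integration of generative
    causes and a noisy-AND-NOT integration of preventive causes. Both
    arguments are lists of strengths (probabilities) for causes that are
    present in the situation being evaluated."""
    p_none = 1.0
    for w in generative:
        p_none *= 1.0 - w            # chance that no generative cause fires
    p = 1.0 - p_none                 # noisy-OR: at least one generative cause fires
    for w in preventive:
        p *= 1.0 - w                 # each present preventer independently blocks E
    return p

# Step 1: learn a causal model of the source (strengths assumed here).
source_strengths = {"G1": 0.6, "G2": 0.6, "P": 0.5}

# Steps 2-3: map source to target and transfer structure and strengths.
# Step 4: "run" the transferred model on the causes present in the target.
def predict(strengths, present):
    gens = [strengths[c] for c in present if c.startswith("G")]
    prevs = [strengths[c] for c in present if c.startswith("P")]
    return p_effect(gens, prevs)

print(predict(source_strengths, ["G1", "G2", "P"]))  # No-drop: ~0.42
print(predict(source_strengths, ["G1", "G2"]))       # P-drop:  ~0.84, higher despite lower similarity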

Figure 24.3 Framework for analogical transfer based on causal models. G, P, and E represent generative causes, a preventive cause, and an effect, respectively; w1S, w2S, w1T, and w2T each represent a distribution over causal strength for causal links in the source (S) and in the target (T), respectively. Dotted lines indicate knowledge transferred from source to target (see text). From Holyoak et al. (2010); reprinted by permission.

To further test the proposed model, Holyoak et al. (2010, Experiment 1) had participants read descriptions of pairs of fictional animals (e.g., “trovids”). This fictional animal was described as having an abnormal characteristic (dry flaky skin) and three different gene mutations (mutations A, B, and C). The mutations were described as tending either to produce or prevent the abnormal characteristic. It was stated that each of these gene mutations occurred randomly for unknown reasons, so any individual might have 0, 1, 2, or 3 distinct mutations. A source analog was simply referred to as “trovid #1” and a target analog was referred to as “trovid #2.” The source analog always had three causal properties (i.e., three mutations) that were causally connected to the effect property (i.e., the abnormal characteristic). In the “positive” condition, the source exhibited the effect property; in the “negative” condition, it did not. An example of the positive condition is the following:

For trovid #1, it happens that all three mutations have occurred. For trovid #1, Mutation A tends to PRODUCE dry flaky skin; Mutation B tends to PRODUCE dry flaky skin; Mutation C tends to PREVENT dry flaky skin. Trovid #1 has dry flaky skin.

For the negative condition, in the last statement in the preceding, “has” was simply replaced with “does NOT have.”

After reading the description of the source analog, participants made three different judgments, one for each argument type. The presence of the effect property was unknown, and the presence or absence of each of the three mutations was listed. Each target analog had two or three mutations depending on the argument type (No-drop: G1G2P; P-drop: G1G2; and G-drop: G2P). For example, in the No-drop (G1G2P) condition, all three mutations were present in the target. When a causal factor was dropped, that mutation was explicitly described as absent. In making judgments, participants were asked to suppose there were 100 animals “just like” the target animal described, and to estimate how many of these 100 would have the effect property, choosing a number between 0 and 100.

Figure 24.4 (a) shows the human ratings, and also the predictions derived from the Bayesian causal model. The effect was rated as more likely to occur in the target when it had been exhibited in the source (Figure 24.4 a) than when it had been absent (Figure 24.4 b), indicating that people updated their estimates of causal strength based on the single example. In addition (replicating Lee & Holyoak, 2008), the effect was rated as more likely to occur when the preventive cause in the source was dropped in the target than in the No-drop condition, even though source-target similarity was reduced when any factor present in the source was dropped in the target.

The experiments described so far have focused on predictive causal inferences, in which information about the presence of various causes is used to infer the state of the effect. However, people are also able to use causal models to reason in the opposite direction, from information about an effect to inferences about possible causes. Such “backward” causal inferences are typically referred to as causal “attribution” or “diagnosis” (see Waldmann, Cheng, Hagmayer, & Blaisdell, 2008; Meder & Mayrhofer, Chapter 23 in this volume). Causal attribution requires considering possible combinations of alternative causes that may have been present. For example, if you find the grass to be wet in the morning, you might infer that it rained overnight. But if you discover that a sprinkler had been on, you might attribute the wet grass to the sprinkler and discount the probability that it was due to rain.

In a second experiment, Holyoak et al. (2010) used a similar design (positive source only), except that instead of asking participants a predictive question (how likely is the effect to occur?), they asked a causal attribution question (how likely is it that the unknown cause has occurred and produced the effect?). In the causal attribution experiment, the source analog always exhibited the effect (as in the source-positive condition of Experiment 1). In the target analog, the presence of one of the mutations was described as unknown, and the presence or absence of each of the other mutations was explicitly stated (see Figure 24.2 (d) and (e) for an example of the causal structure used in this experiment). Participants were to suppose there were 100 animals just like the target animal, and to estimate in how many an unknown mutation (G1) had occurred and had produced the (p. 469) effect property, assigning a number between 0 and 100. In sharp contrast to the pattern observed for the corresponding predictive inference (Figure 24.4 a), dropping a preventive cause (G2E condition) decreased the rated probability of an inference about a potential generative cause (Figure 24.4 c) relative to the No-drop condition (G2PE) that kept the preventive cause. For the case of causal attribution, the Bayesian model predicts that because the target lacks the preventive cause and is known to have a generative cause (G2), an additional generative cause (G1) is not as likely (a phenomenon termed “causal discounting”; Kelley, 1973; see also Novick & Cheng, 2004). Thus, predictive and attribution questions each yielded a distinct pattern of dissociation between analogy-based inferences and overall quality of the source-target mapping.

Figure 24.4 Mean predictive inference ratings (Experiment 1) when source outcome was positive (a) or negative (b); and (c) mean causal attribution ratings (Experiment 2), for each argument type. G, P, and E represent generative causes, a preventive cause, and an effect, respectively. Error bars represent 1 standard error of the mean. Predictions derived from the Bayesian model are shown in the right panel of each graph. From Holyoak et al. (2010); reprinted by permission.


Contributions of the Bayesian Model of Causal Inference by Analogy

The Bayesian theory of causal inference by analogy (Holyoak et al., 2010; Lee & Holyoak, 2008) shows in some detail how causal knowledge, represented as causal models, can be integrated with analogical reasoning to yield inductive inferences. The theory has much in common with previous models specifically focused on the role of causal models in making category-based inferences, notably the CBG model of Rehder (2009). However, the integrated theory is more general than previous models of this sort in that it can be applied to situations involving high uncertainty about causal powers (including cases in which only one or even zero specific cases are available to guide inference). (p. 470) Moreover, Lee and Holyoak (2008, Experiment 3) found a similar pattern of results for predictive causal inferences based on cross-domain analogies (transfer between artificial situations set in the domains of astronomy and chemistry). Such cross-domain analogical transfer cannot be accounted for by inferences based on pre-existing categories (i.e., category-based induction).

References Ahn, W. (1999). Effect of causal structure on category construction. Memory & Cognition, 27, 1008–1023. Page 17 of 25

Ahn, W., Brewer, W. F., & Mooney, R. J. (1992). Schema acquisition from a single example. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 391–412.
Anderson, J. R., & Lebiere, C. (1998). The atomic components of thought. Mahwah, NJ: Lawrence Erlbaum Associates.
Asoko, H. (1996). Developing scientific concepts in the primary classroom: Teaching about electric circuits. In G. Welford, J. Osborne, & P. Scott (Eds.), Research in science education in Europe: Current issues and themes. London: Falmer Press.
Bartha, P. (2010). By parallel reasoning: The construction and evaluation of analogical arguments. New York: Oxford University Press.
Bean, T. W., Searles, D., Singer, H., & Cowan, S. (1990). Learning concepts from biology text through pictorial analogies and an analogical study guide. Journal of Educational Research, 83, 233–237.
Bowdle, B. F., & Gentner, D. (1997). Informativity and asymmetry in comparisons. Cognitive Psychology, 34, 244–286.
Brown, A. L. (1989). Analogical learning and transfer: What develops? In S. Vosniadou & A. Ortony (Eds.), Similarity and analogical reasoning (pp. 369–412). New York: Cambridge University Press.
Brown, D., & Clement, J. (1989). Overcoming misconceptions via analogical reasoning: Abstract transfer versus explanatory model construction. Instructional Science, 18, 237–261.
Brown, A., & Kane, M. J. (1990). Preschool children can learn to transfer: Learning to learn and learning from example. Cognitive Psychology, 20, 493–523.
Catrambone, R., & Holyoak, K. J. (1989). Overcoming contextual limitations on problem-solving transfer. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 1147–1156.
Cheng, P. W. (1997). From covariation to causation: A causal power theory. Psychological Review, 104, 367–405.
Clement, J. (1991). Nonformal reasoning in experts and in science students: The use of analogies, extreme cases, and physical intuition. In J. Voss, D. N. Perkins, & J. Segal (Eds.), Informal reasoning and education (pp. 345–362). Hillsdale, NJ: Lawrence Erlbaum Associates.
Clement, J. (1993). Using bridging analogies and anchoring intuitions to deal with students’ preconceptions in physics. Journal of Research in Science Teaching, 30, 1241–1257.


Clement, J., Brown, D. E., & Zietsman, A. (1989). Not all preconceptions are misconceptions: Finding ‘anchoring conceptions’ for grounding instruction on students’ intuitions. International Journal of Science Education, 11(5), 554–565.
Clement, C. A., & Yanowitz, K. L. (2003). Using an analogy to model causal mechanisms in a complex text. Instructional Science, 31(3), 195–225.
Curtis, R. V. (1988). When is a science analogy like a social studies analogy: A comparison of text analogies across two disciplines. Instructional Science, 17, 169–177.
Curtis, R., & Reigeluth, C. (1984). The use of analogies in written text. Instructional Science, 13, 99–117.
Dagher, Z. R. (1995). Review of studies on the effectiveness of instructional analogies in science education. Science Education, 79, 295–312.
Didierjean, A., Cauzinille-Marmeche, E., & Savina, Y. (1999). Learning from examples: Case based reasoning in chess for novices. Current Psychology of Cognition, 18, 337–361.
Donnelly, C. M., & McDaniel, M. A. (1993). Use of analogy in learning scientific concepts. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 975–987.
Gavetti, G., Levinthal, D. A., & Rivkin, J. W. (2005). Strategy-making in novel and complex worlds: The power of analogy. Strategic Management Journal, 26, 691–712.
Gentner, D. (1983). Structure-mapping: A theoretical framework for analogy. Cognitive Science, 7, 155–170.
Gentner, D., & Gentner, D. R. (1983). Flowing waters or teeming crowds: Mental models of electricity. In D. Gentner & A. L. Stevens (Eds.), Mental models (pp. 99–129). Hillsdale, NJ: Lawrence Erlbaum Associates. (p. 471)

Gentner, D., Loewenstein, J., & Thompson, L. (2003). Learning and transfer: A general role for analogical encoding. Journal of Educational Psychology, 95, 393–408.
Gentner, D., Rattermann, M., & Forbus, K. (1993). The roles of similarity in transfer: Separating retrievability from inferential soundness. Cognitive Psychology, 25, 524–575.
Gick, M. L., & Holyoak, K. J. (1980). Analogical problem solving. Cognitive Psychology, 12, 306–355.
Gick, M. L., & Holyoak, K. J. (1983). Schema induction and analogical transfer. Cognitive Psychology, 15, 1–38.
Gilovich, T. (1981). Seeing the past in the present: The effect of associations to familiar events on judgments and decisions. Journal of Personality and Social Psychology, 40, 797–808.


Gilbert, S. W. (1989). An evaluation of the use of analogy, simile, and metaphor in science texts. Journal of Research in Science Teaching, 26, 315–327.
Glynn, S. M. (1991). Explaining science concepts: A teaching-with-analogies model. In S. Glynn, R. Yeany, & B. Britton (Eds.), The psychology of learning science (pp. 219–240). Hillsdale, NJ: Lawrence Erlbaum Associates.
Glynn, S. M. (1997). Learning from science text: Role of an elaborate analogy. Reading Research Report, 71, 1–23.
Glynn, S. M., Britton, B. K., Semrud-Clikerman, M., & Muth, K. D. (1989). Analogical reasoning and problem solving in science textbooks. In J. A. Glover, R. R. Ronning, & C. R. Reynolds (Eds.), Handbook of creativity (pp. 383–398). New York: Plenum Press.
Glynn, S. M., & Takahashi, T. (1998). Learning from analogy-enhanced science text. Journal of Research in Science Teaching, 35, 1129–1149.
Griffiths, T. L., & Tenenbaum, J. B. (2005). Structure and strength in causal induction. Cognitive Psychology, 51, 334–384.
Guy, S., Kashima, Y., Walker, I., & O’Neill, S. (2013). Comparing the atmosphere to a bathtub: Effectiveness of analogy for reasoning about accumulation. Climatic Change, 121(4), 579–594.
Halpern, D. F., Hansen, C., & Riefer, D. (1990). Analogies as an aid to understanding and memory. Journal of Educational Psychology, 82, 298–305.
Harrison, A. G., & Coll, R. K. (Eds.). (2008). Using analogies in middle and secondary science classrooms: The FAR guide—An interesting way to teach with analogies. Thousand Oaks, CA: Corwin Press.
Harrison, A. G., & Treagust, D. F. (2006). Teaching and learning with analogies: Friend or foe. In P. J. H. Abusson, A. G. Harrison, & S. M. Ritchie (Eds.), Metaphor and analogy in science education (pp. 11–24). Dordrecht, Netherlands: Springer.
Heit, E. (1998). A Bayesian analysis of some forms of inductive reasoning. In M. Oaksford & N. Chater (Eds.), Rational models of cognition (pp. 248–274). Oxford: Oxford University Press.
Hesse, M. B. (1966). Models and analogies in science. Notre Dame, IN: University of Notre Dame Press.
Holland, J. H., Holyoak, K. J., Nisbett, R. E., & Thagard, P. (1986). Induction: Processes of inference, learning, and discovery. Cambridge, MA: MIT Press.
Holyoak, K. J. (1985). The pragmatics of analogical transfer. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 19, pp. 59–87). New York: Academic Press.


Holyoak, K. J. (2012). Analogy and relational reasoning. In K. J. Holyoak & R. G. Morrison (Eds.), The Oxford handbook of thinking and reasoning (pp. 234–259). New York: Oxford University Press.
Holyoak, K. J., & Cheng, P. W. (2011). Causal learning and inference as a rational process: The new synthesis. Annual Review of Psychology, 62, 135–163.
Holyoak, K. J., Junn, E. N., & Billman, D. O. (1984). Development of analogical problem-solving skill. Child Development, 55, 2042–2055.
Holyoak, K. J., & Koh, K. (1987). Surface and structural similarity in analogical transfer. Memory & Cognition, 15, 323–340.
Holyoak, K. J., Lee, H. S., & Lu, H. (2010). Analogical and category-based inference: A theoretical integration with Bayesian causal models. Journal of Experimental Psychology: General, 139, 702–727.
Holyoak, K. J., Novick, L. R., & Melz, E. R. (1994). Component processes in analogical transfer: Mapping, pattern completion, and adaptation. In K. J. Holyoak & J. A. Barnden (Eds.), Advances in connectionist and neural computation theory, Vol. 2: Analogical connections (pp. 113–180). Norwood, NJ: Ablex.
Holyoak, K. J., & Thagard, P. (1989). Analogical mapping by constraint satisfaction. Cognitive Science, 13, 295–355.
Holyoak, K. J., & Thagard, P. (1995). Mental leaps: Analogy in creative thought. Cambridge, MA: MIT Press.
Hummel, J. E., & Holyoak, K. J. (1997). Distributed representations of structure: A theory of analogical access and mapping. Psychological Review, 104, 427–466.
Hummel, J. E., & Holyoak, K. J. (2003). A symbolic-connectionist theory of relational inference and generalization. Psychological Review, 110, 220–264.
Iding, M. K. (1993). Instructional analogies and elaborations in science text: Effects on recall and transfer performance. Reading Psychology, 14, 33–55.
Jee, B. D., Uttal, D. H., Gentner, D., Manduca, C., Shipley, T. F., Tikoff, B., … Sageman, B. (2010). Commentary: Analogical thinking in geoscience education. Journal of Geoscience Education, 58(1), 2–13.
Keane, M. T. (1987). On retrieving analogues when solving problems. Quarterly Journal of Experimental Psychology, 39A, 29–41.
Kelley, H. H. (1973). The process of causal attribution. American Psychologist, 28, 107–128.
Kemp, C., Goodman, N., & Tenenbaum, J. (2010). Learning to learn causal models. Cognitive Science, 34, 1185–1243.

Kemp, C., & Jern, A. (2014). A taxonomy of inductive problems. Psychonomic Bulletin & Review, 21(1), 23–46.
Kolodner, J. L. (1997). Educational implications of analogy. American Psychologist, 52, 57–66.
Krugman, P. (2011). The President surrenders. New York Times, Opinion Pages, July 31. http://www.nytimes.com/2011/08/01/opinion/the-president-surrenders-on-debtceiling.html?_r=2&hp&
Kubricht, J., Lu, H., & Holyoak, K. J. (2015). Animation facilitates source understanding and spontaneous analogical transfer. In R. Dale, C. Jennings, P. Maglio, D. Noelle, A. Warlaumont, & J. Yoshimi (Eds.), Proceedings of the 37th annual conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society.
Lassaline, M. E. (1996). Structural alignment in induction and similarity. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 754–770. (p. 472)

Lawson, A. E. (1993). The importance of analogy. Journal of Research in Science Teaching, 30, 1213–1214.
Lee, H. S., & Holyoak, K. J. (2008). The role of causal models in analogical inference. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34, 1111–1122.
Lien, Y., & Cheng, P. W. (2000). Distinguishing genuine from spurious causes: A coherence hypothesis. Cognitive Psychology, 40, 87–137.
Lu, H., Yuille, A. L., Liljeholm, M., Cheng, P. W., & Holyoak, K. J. (2008). Bayesian generic priors for causal learning. Psychological Review, 115, 955–982.
McDaniel, M. A., & Donnelly, C. M. (1996). Learning with analogy and elaborative interrogation. Journal of Educational Psychology, 88, 508–519.
Murphy, G. L., & Medin, D. L. (1985). The role of theories in conceptual coherence. Psychological Review, 92, 289–316.
Newton, L. D. (2003). The occurrence of analogies in elementary school science books. Instructional Science, 31(6), 353–375.
Newton, D. P., & Newton, L. D. (2000). Do teachers support causal understanding through their discourse when teaching primary science? British Educational Research Journal, 26(5), 599–613.
Novick, L. R. (1988). Analogical transfer, problem similarity, and expertise. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 510–520.
Novick, L. R., & Cheng, P. W. (2004). Assessing interactive causal influence. Psychological Review, 111, 455–485.

Orgill, M., & Bodner, G. M. (2006). An analysis of the effectiveness of analogy use in college-level biochemistry textbooks. Journal of Research in Science Teaching, 43(10), 1040–1060.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. San Mateo, CA: Morgan Kaufmann.
Penn, D. C., Holyoak, K. J., & Povinelli, D. J. (2008). Darwin’s mistake: Explaining the discontinuity between human and nonhuman minds. Behavioral and Brain Sciences, 31, 109–178.
Powell, D., Merrick, M. A., Lu, H., & Holyoak, K. J. (2016). Causal competition based on generic priors. Cognitive Psychology, 86, 62–86.
Reed, S. K. (1987). A structure-mapping model for word problems. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13, 124–139.
Rehder, B. (2006). When similarity and causality compete in category-based property generalization. Memory & Cognition, 34, 3–16.
Rehder, B. (2007). Property generalization as causal reasoning. In A. Feeney & E. Heit (Eds.), Inductive reasoning: Experimental, developmental, and computational approaches (pp. 81–113). Cambridge, UK: Cambridge University Press.
Rehder, B. (2009). Causal-based property generalization. Cognitive Science, 33, 301–343.
Rehder, B., & Burnett, R. C. (2005). Feature inference and the causal structure of categories. Cognitive Psychology, 50, 264–314.
Rehder, B., & Kim, S. (2006). How causal knowledge affects classification: A generative theory of categorization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 659–683.
Richland, L. E., Holyoak, K. J., & Stigler, J. W. (2004). The role of analogy in teaching middle-school mathematics. Cognition and Instruction, 22, 37–60.
Richland, L. E., Zur, O., & Holyoak, K. J. (2007). Cognitive supports for analogy in the mathematics classroom. Science, 316, 1128–1129.
Ross, B. H. (1987). This is like that: The use of earlier problems and the separation of similarity effects. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13, 629–639.
Ross, B. H. (1989). Distinguishing types of superficial similarities: Different effects on the access and use of earlier problems. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 456–468.
Spellman, B. A., & Holyoak, K. J. (1996). Pragmatics in analogical mapping. Cognitive Psychology, 31, 307–346.

Stocklmayer, S. M., & Treagust, D. F. (1994). A historical analysis of electric currents in textbooks: A century of influence on physics education. Science & Education, 3(2), 131–154.
Sweeney, L. B., & Sterman, J. D. (2000). Bathtub dynamics: Initial results of a systems thinking inventory. System Dynamics Review, 16(4), 249–286.
Thiele, R. B., & Treagust, D. F. (1994). The nature and extent of analogies in secondary chemistry textbooks. Instructional Science, 22, 61–74.
Thiele, R. B., & Treagust, D. F. (1995). Analogies in chemistry textbooks. International Journal of Science Education, 17, 783–795.
Thiele, R. B., Venville, G. J., & Treagust, D. F. (1995). A comparative analysis of analogies in secondary biology and chemistry textbooks used in Australian schools. Research in Science Education, 25, 221–230.
Treagust, D. F., Duit, R., Joslin, P., & Lindauer, I. (1992). Science teachers’ use of analogies: Observations from classroom practice. International Journal of Science Education, 14(4), 413–422.
Treagust, D. F., Harrison, A. G., & Venville, G. J. (1998). Teaching science effectively with analogies: An approach for preservice and inservice teacher education. Journal of Science Teacher Education, 9(2), 85–101.
Venville, G. J., & Treagust, D. F. (1997). Analogies in biology education: A contentious issue. The American Biology Teacher, 59, 282–287.
Vosniadou, S., & Schommer, M. (1988). Explanatory analogies can help children acquire information from expository text. Journal of Educational Psychology, 80, 524–536.
Waldmann, M. R., Cheng, P. W., Hagmayer, Y., & Blaisdell, A. P. (2008). Causal learning in rats and humans: A minimal rational model. In N. Chater & M. Oaksford (Eds.), The probabilistic mind: Prospects for Bayesian cognitive science (pp. 453–484). Oxford: Oxford University Press.
Waldmann, M. R., & Holyoak, K. J. (1992). Predictive and diagnostic learning within causal models: Asymmetries in cue competition. Journal of Experimental Psychology: General, 121, 222–236.
Waldmann, M. R., Holyoak, K. J., & Fratianne, A. (1995). Causal models and the acquisition of category structure. Journal of Experimental Psychology: General, 124, 181–206.
Wharton, C. M., Holyoak, K. J., Downing, P. E., Lange, T. E., Wickens, T. D., & Melz, E. R. (1994). Below the surface: Analogical similarity and retrieval competition in reminding. Cognitive Psychology, 26, 64–101.
Wharton, C. M., Holyoak, K. J., & Lange, T. E. (1996). Remote analogical reminding. Memory & Cognition, 24, 629–643. (p. 473)


Wiegmann, A., & Waldmann, M. R. (2014). Transfer effects between moral dilemmas: A causal model theory. Cognition, 131, 28–43.
Winston, P. H. (1980). Learning and reasoning by analogy. Communications of the ACM, 23, 689–703.
Yanowitz, K. L. (2001). Using analogies to improve elementary school students’ inferential reasoning about scientific concepts. School Science and Mathematics, 101(3), 133–142.
Zook, K. B., & Maier, J. M. (1994). Systematic analysis of variables that contribute to the formation of analogical misconceptions. Journal of Educational Psychology, 86, 589–600. (p. 474)

Keith J. Holyoak

Department of Psychology, University of California, Los Angeles, Los Angeles, California, USA

Hee Seung Lee

Department of Education, Yonsei University, Seoul, South Korea


Causal Argument

Causal Argument
Ulrike Hahn, Roland Bluhm, and Frank Zenker
The Oxford Handbook of Causal Reasoning
Edited by Michael R. Waldmann
Print Publication Date: Jun 2017
Subject: Psychology, Cognitive Psychology
Online Publication Date: May 2017
DOI: 10.1093/oxfordhb/9780199399550.013.26

Abstract and Keywords

This chapter outlines the range of argument forms involving causation that can be found in everyday discourse. It also surveys empirical work concerned with the generation and evaluation of such arguments. This survey makes clear that there is presently no unified body of research concerned with causal argument. It highlights the benefits of a unified treatment both for those interested in causal cognition and those interested in argumentation, and identifies the key challenges that must be met for a full understanding of causal argumentation.

Keywords: argument, causal argument, cognition, causation, argumentation

Although causality is fundamental to human cognition in many different ways, a review of theoretical work in argumentation studies and cognitive science suggests that “causal argument” remains ill understood. This holds for both senses of the term “argument”: argument as a situated process of actors engaged in an actual dispute, on the one hand, and, on the other, the essential component of such a dialectical exchange, namely arguments as individual claims and reasons for those claims. In particular, surprisingly little psychological research has been devoted to how people construct, process, and evaluate causal arguments in this latter sense of “argument,” that is, arguments as abstract inferential objects comprising premises and conclusions.

Nevertheless, there are a number of independent strands of research involving causation and argument. The goal of this chapter is to bring these currently separate bodies of work together in order to provide a coherent basis for future empirical research on causal argument that elucidates the dialectical and inferential role of cause–effect relationships in reasoned discourse. Such a program has immediate implications for what is a central aspect of everyday argumentation, and thus a central aspect of our everyday lives. Consequently, an improved understanding of causal argument should benefit directly a range of areas such as reasoning, argumentation, learning, science comprehension, and communication, to name but a few.


However, insight into causal argument seems beneficial also for those projects within cognitive science aimed at understanding what “cause” actually is, both normatively and in laypeople’s understanding. Causal argument, we will seek to show, thus not only constitutes an important topic for research in its own right, but also can provide new impetus to the many, more familiar, aspects of human behavior and cognition concerned with causality, such as causal learning (see in this volume, Rottman, Chapter 6; Le Pelley, Griffiths, & Beesley, Chapter 2) or causal inference (see Griffiths, Chapter 7 in this volume).

“Because”: Reasons as Causes Versus Causes as Reasons

The comparative scarcity of research on causal argument relative to work on causal learning or causal inference may surprise, since few topics seem to be more intimately related than argumentation and causation. This is nowhere more apparent, perhaps, than in the term “because” itself. While it marks causal relations in the empirical world (p. 476) (e.g., “Socrates died because the jury had him drink poison”), the same term is used more generally to identify justificatory and explanatory reasons. Imagine, for example, an argument about whether or not Jones killed Smith. Questions about Jones might be answered by stating something like “we know it was Jones, not his wife, who killed Smith, because Miller saw him. …” In this case, the word “because” signifies a relation of evidential support.1 To the extent that such evidential uses of “because” themselves imply a causal relation, that causal relation pertains to human beliefs and the relationships between them (i.e., knowledge of Miller’s testimony should change our beliefs), as well as between beliefs and world (e.g., Miller’s testimony “I saw Jones kill Smith” is putatively caused by Miller witnessing that Jones killed Smith). In short, causes can provide reasons, and reasons can provide causes.

Both of these aspects are central to an understanding of “causal argument.” The main function of argumentation as we use the term here is to effect changes in beliefs (e.g., Goodwin, 2010; Hahn, 2011; Hahn & Oaksford, 2012), specifically changes in beliefs that may be considered to be rational (as contrasted with “mere persuasion”). Understanding causal argument therefore involves both asking to what extent causal arguments should change people’s beliefs and how successful they actually are in changing people’s beliefs. This chapter introduces relevant work on both of these questions.

Understanding people’s responses to causal arguments, both from a normative and a descriptive perspective, requires understanding what those arguments typically are. The natural starting point for investigation is thus the typical argument forms that involve aspects of causation found in everyday life, and we describe present findings on this in the next section. The material of that section—“Arguments for Causes and Causes for Arguments”—reflects the reasons/causes duality just described, as people in everyday life both argue about causes and use causes as arguments for particular claims. This initial consideration of types of causal arguments is then followed by examination of what literature there is on how people actually respond to causal arguments. The chapter concludes with a research agenda for future work on causal argumentation.

Arguments for Causes and Causes for Arguments

When asked to provide reasons why something has happened, might happen, or should happen, the term “because” will soon arise. Similarly, we suspect, few disagreements (scholarly ones included) last longer than those about causes. In brief, causality is as ubiquitous in argumentation as it is difficult to understand fully the causal structure of the real world.

It seems reasonable to suppose that the language of causal arguments is intimately related to the cognition of causal relations in the world, and to the way people think about causal structure. As things stand, however, “[t]hough basic to human thought, causality is a notion shrouded in mystery, controversy, and caution, because scientists and philosophers have had difficulties defining when one event truly causes another” (Pearl, 2009, p. 401). A paradigmatic example of these difficulties is J. L. Mackie’s (1965) conceptual analysis of “cause” as “an insufficient but necessary part of an unnecessary but sufficient condition (INUS)”—which, at least on initial reading, seems as difficult to understand as the notion it is trying to unpack.2 Such conceptual difficulties in clarifying the basic concept are likely to be reflected in actual discourse whenever “cause” and related terms such as “effect” are used argumentatively.

In order to examine causal argument, however, one need not start with a clear theoretical notion of “cause.” Instead, studying causal argument may itself be informative of laypeople’s underlying conception of “cause,” and thus provide raw material for both theoretically refined notions of “cause” and for the psychology of causal cognition. Rather than first elucidate the notion of cause in an abstract manner, we thus start from a survey of causal argument types arising in everyday discourse as they have been identified both through corpus analysis and in the argumentation literature. Only with a sense of the range of kinds of causal arguments is it possible to start addressing questions of their cogency and actual persuasiveness, and to examine possible implications of causal argument for an understanding of the notion of “cause” itself.
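Before turning to that survey, Mackie’s INUS formula can be unpacked with a standard textbook illustration (our reconstruction, based on Mackie’s well-known house-fire case, not his own wording). Suppose house fires (E) occur exactly when a short circuit (S) co-occurs with flammable material (F) and a failed sprinkler (not-P), or when some altogether different sufficient condition D (say, arson) obtains:

\[ E \iff (S \land F \land \lnot P) \lor D. \]

Here S is Insufficient on its own (it needs F and the absence of P), but a Non-redundant part of the first conjunct; that conjunct is itself Unnecessary (D would also suffice) but Sufficient for E. Hence S is an INUS condition of E.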

Causal Argument Patterns from Corpus Analysis

Based on an analysis of corpora of natural language text, Oestermeier and Hesse (2000) provided an extensive typology for causal argument. Their categories rest on “the basic argumentative moves of defending, attacking, and qualifying claims” (p. 68). For each type, this typology specifies three types of premises involved in causal arguments:

Observational (i.e. spatial, temporal, or episodic), explanatory (i.e. intentional or causal), and abstract knowledge (i.e. conceptual knowledge about (p. 477) criteria for causation) […] [along with] the inference patterns which are needed to come up with a causal conclusion, namely, inferences from observations, generalizations, comparisons, mental simulations, and causal explanations. (2000, p. 68)

Figure 25.1 shows Oestermeier and Hesse’s basic typology (stripped of the examples and historical references they provide). This typology features 11 pro types (arguments offered in defense of a causal claim), 10 con types (arguments that attack a causal claim), and 6 qualifying types (which refine a causal claim). For example, “wrong temporal order” is a type of argument advanced against a causal claim. The premises of such an argument consist of episodic knowledge about the observed temporal order of the events A and B. These premises support an inference to the effect that A has not caused B because A happened after B, as in the example, “The server problems have not caused your system crash, the server problems occurred afterward” (for examples of all types listed in Figure 25.1, see Oestermeier & Hesse, 2000).

In the texts analyzed by Oestermeier and Hesse, these types varied considerably in prevalence. The vast majority of instances of causal argument in the corpora analyzed by Oestermeier and Hesse were of the causal mechanism type (78.2%). These are arguments that cite, and so seek to explain through, a causal mechanism. Structurally, these take the form “A caused C because A led to C via the process/mechanism B”; for example, “His anger caused the accident. It affected his concentration.” After causal explanations, the next most numerous type accounts for a mere 3.9% (!) of the total number of causal arguments in their corpus (Oestermeier & Hesse, 2000, p. 76).

Given the pervasiveness of the causal mechanism type, one might ask whether Oestermeier and Hesse’s typology is fine-grained enough, or whether causal mechanism arguments should be further divided into subtypes. At the same time, however, Oestermeier and Hesse’s notion of causal argument (and hence their typology) is limited to causal claims or conclusions (e.g., “smoking causes cancer”) and the premises (reasons) offered to support those causal claims (e.g., “because smokers have a much higher risk of getting cancer”). However, research within argumentation theory, described next, shows that everyday discourse features not only arguments about causes, and causal explanations, but also arguments from causes.

Scheme-Based Approaches

A long-standing tradition within argumentation theory has sought to devise a typology of causal argument, though based on less systematic descriptive (p. 478) procedures than Oestermeier and Hesse’s (2000) corpus analysis. Argument types, in this tradition, are referred to as schemes. The specific schemes identified by this tradition overlap only partially with those identified by Oestermeier and Hesse, so that a complete typology will need to consider both.


Figure 25.1 Types of arguments for causal claims (pro, con, qualifier); italicized terms name types of pro-evidence (circumstantial, contrastive, causal explanatory), types of con-evidence ([circumstantial] counter-evidence, alternative explanation, insufficient evidence), and types of qualifiers (causal complexities, causation without responsibility). Adapted from Oestermeier & Hesse (2000, p. 69).

In contrast to basic corpus analysis, the so-called scheme-based approach to argumentation not only seeks to describe different types of informal argument schemes, but also pursues normative questions concerning their use. In other words, it seeks to provide guidance on which arguments should convince (on normative foundations for argumentation theory, see Corner & Hahn, 2013). Hence, much of the extant work on causal argument in argumentation theory has remained tightly connected to classical fallacies such as post hoc propter hoc (inferring cause from correlation) and slippery slope arguments (more on these later). To provide normative guidance, authors typically associate “critical questions” with schemes in order to allow evaluation of the quality of particular instances. This tradition thus feeds directly into the burgeoning, applied literature on critical thinking (e.g., Inch & Warnick, 2009; but see also Hamby, 2013, and Willingham, 2007, for a critical perspective on this literature).

The central status of causal argument in everyday argument is reflected in the typologies of the scheme-based tradition. For instance, Garssen (2001) views causal arguments as one of three top-level argumentation schemes (“symptomatic argumentation,” “argumentation by analogy,” and “causal argumentation”), and maintains that all other schemes found in everyday informal argument are reducible to these three.3 However, the scheme-based tradition itself is fairly heterogeneous and has produced a number of competing classification schemes that vary considerably in the number of basic schemes (argument types) assumed, ranging from 3 (as in Garssen, 2001) to 60 (in Walton, Reed, and Macagno, 2008).

Walton et al.’s (2008) volume is certainly the most comprehensive treatment within the scheme-based approach, and has sought to amalgamate all individual schemes found in the prior literature, yet Oestermeier and Hesse (2000) include schemes not found in that treatment. There are thus continuing theoretical questions about what should constitute a separate argument scheme and how many distinct schemes there are, a question that is unlikely to be independent of the intended use of the typology (see also Hahn & Hornikx, 2015, for discussion of principled scheme typology). In the following, we provide a brief overview of the causal schemes identified within the scheme-based literature, following, by and large, Walton et al. (2008).

First, the seminal work in the scheme-based tradition, Hastings (1962), distinguishes two basic types of causal argument: argument from cause to effect, and its converse from effect to cause. Hastings sees both as involving further sub-schemes. For the first type, cause to effect, he distinguishes the sub-schemes “prediction on the basis of existing conditions,” referring to an argument whose conclusion states that certain events will occur, and “causal argument based on a hypothetical,” which concerns conclusions that would or will obtain (e.g., “if we were to adopt the proposal, the budget would be overdrawn”). As Hastings notes, the hypothetical sub-scheme appears to be the more common version of the argument, and hypothetical causal arguments are particularly prevalent in policy debates. For the second basic type of causal argument, effect to cause, a close relation obtains to two sub-schemes that Hastings calls “sign reasoning” (e.g., “there are bear tracks, so there is a bear around”), and “argument from evidence to a hypothesis,” both of which typically involve causes. From this very general perspective, then, most arguments about facts are likely to be causal arguments.

For the argument from cause to effect, Hastings (1962, p. 74) states four critical questions that are assumed to be relevant regardless of the specific sub-type:

1. Does the cause have a valid causal relation with the effect? That is, is it the true cause?
2. How probable is the effect on the basis of the correlation?
3. Is the cause a sufficient cause to produce the effect?
4. Are any other factors operating to interfere with the production of the effect?

These questions reflect Hastings’s view, formed on the basis of text analysis, that real-world causal arguments are typically complex, involving many causal and correlational sub-components. It is also for this reason, or so Hastings speculates, that causal generalizations invoked in real-world causal arguments are rarely provided with explicit argumentative support. He consequently describes an assertion such as “if the government nationalised industries, poor planning of the operation of those industries will ensue,” for instance, as a claim with “many elements with varying probabilities” (1962, p. 72), rather than a claim about a fully fleshed out causal model. It may be that in many, or even most, real-world contexts, people’s causal models are rather sparse. This aspect (p. 479) seems important to any more detailed normative considerations about causal arguments in everyday life.

We next describe in more detail different types of causal argument schemes both for cause-to-effect and effect-to-cause.

From Cause to Effect

Since Hastings, many authors have included some form of argument from cause to effect within their basic classification of argumentation schemes (e.g., Grennan, 1997; Kienpointner, 1992; Perelman & Olbrechts-Tyteca, 1969; Prakken & Renooij, 2001; van Eemeren & Kruiger, 1987; Walton, 1996). What varies across these authors’ work is their characterization of the nature of the scheme and how, if at all, it may be formalized.

Hastings (1962), Walton (1996), and Grennan (1997), for instance, use Toulmin’s (1958) framework to characterize informal argument, which has been popular also in the context of studying the development of argumentation skills (Kuhn, 1991). Toulmin’s framework rests on the insight that classical logic has little to say about everyday informal argument, which must typically deal with uncertainty. As a more appropriate model, Toulmin suggested the inherently dialectical argumentation between two opposing parties in a courtroom.4 Following from this, Toulmin outlined a general format for representing arguments (Figure 25.2). Arguments are broken down into the basic components of “claim” (the conclusion to be established), “data” (the facts appealed to in support of the claim), “warrants” (reasons that support the inferential link between data and claim), “backing” (basic assumptions that justify particular warrants), “rebuttals” (exceptions to the claim or to the link between warrant and claim) and, finally, “qualifiers” (indications of the degree of regularity with which the claim may be stated, such as “certain,” “highly probable,” “rare”).

Figure 25.2 shows an example argument from cause to effect analyzed by Hastings (1962, p. 67) in this way. This argument, taken from a speech by US president Dwight D. Eisenhower, runs as follows:

Europe cannot attain the towering material stature possible to its people’s skills and spirit so long as it is divided by patchwork territorial fences. They foster localized instead of common interest. They pyramid every cost with middlemen, tariffs, taxes, and overheads. Barred, absolutely, are the efficient division of labor and resources and the easy flow of trade. In the political field, these barriers promote distrust and suspicion. They served vested interests at the expense of peoples and prevent truly concerted action for Europe’s own and obvious good. (quoted in Hastings, 1962, from Harding, 1952, p. 532)

This framework provides a potentially useful way of identifying the various components of an overall argument, and has consequently been widely used in psychological research on the development and pedagogy of argument skills (see, e.g., van Gelder, Bissett, & Cumming, 2004; von Aufschaiter, Erduran, Osborne, & Simon, 2008). On its own, however, it does little to provide an evaluative (p. 480) framework for everyday informal argument that might rival classical logic. Arguably, an argument might be better if a warrant is provided than if it is not (see Kuhn, 1991, and later in this chapter), but the Toulmin framework itself offers no grounds for a judgment on whether the warrant itself is more or less compelling.

Figure 25.2 Hastings’s (1962) Toulmin diagram of an argument from cause to effect taken from a speech by Dwight Eisenhower. Full quote appears in text.
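For readers who find a structural rendering helpful, here is a minimal sketch of Toulmin’s six components as a plain data structure. The field names follow Toulmin’s labels as described earlier; the filler content is invented for illustration (loosely based on the budget example quoted earlier), and is not Hastings’s analysis from Figure 25.2.

# Minimal sketch: Toulmin's six argument components as a data structure.
# All filler content below is invented for illustration.

from dataclasses import dataclass
from typing import Optional

@dataclass
class ToulminArgument:
    claim: str                 # the conclusion to be established
    data: str                  # the facts appealed to in support of the claim
    warrant: str               # why the data support the claim
    backing: Optional[str]     # assumptions that justify the warrant
    rebuttal: Optional[str]    # exceptions that would defeat the claim
    qualifier: str             # degree of regularity ("certain", "probably", ...)

arg = ToulminArgument(
    claim="The budget will be overdrawn.",
    data="The proposal commits us to new spending.",
    warrant="New spending commitments generally exceed projected revenue.",
    backing="Past budgets under comparable proposals were overdrawn.",
    rebuttal="Unless offsetting cuts are made elsewhere.",
    qualifier="probably",
)

Note that, as the surrounding discussion stresses, such a representation identifies the parts of an argument but by itself says nothing about how compelling any part is.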

It is clearly desirable to have a normative account of argument, that is, an account that tells us how we should argue, and what arguments we, as rational agents, should find compelling and which ones weak. Classical logic sought to provide such a normative standard for “good argument”; Toulmin’s diagnosis of its weakness in the context of everyday argument is right, but his framework does not provide a more suitable replacement. Somehow, an evaluative, normative perspective must be able to engage with the actual content of claim, data, warrant, and backing, not just their structural relationships per se. Indeed, in its structural orientation, Toulmin’s model ultimately shows the same limitations as the framework of classical logic that it seeks to replace.

Subsequent authors have therefore tried to introduce a stronger normative component via critical questions. They have also tried to formalize adequately the type of inference involved in order to bring out more clearly normative aspects of causal argument. Within the scheme-based tradition, the point of departure for such attempts has typically again been a dialectical perspective. Such a perspective tries to bring together the two different senses of “argument” identified at the beginning of this chapter in order to evaluate arguments. It maintains that argument must be understood in the context of wider, dialectical exchange; that is, “argument,” in the narrow sense of an inferential object, can be evaluated properly only by reference to the wider argumentative discourse (“argument” in the wider sense) in which it occurs (e.g., van Eemeren & Grootendorst, 2004).

More specifically, the basic premise underlying much of the more recent work within the scheme-based tradition is that an argument such as that from cause to effect can be treated as containing a defeasible generalization (Walton, 1996; Walton et al., 2008), as indicated by the quasi-quantifier “generally” (echoing Hastings’s point that causal generalizations are typically not references to fully fleshed out causal models):

Generally, if A occurs then B will (might) occur.
In this case, A occurs (might occur).
Therefore, in this case, B will (might) occur.

From a formal perspective, however, this is not to be read as a probabilistic modus ponens (as in, e.g., Edgington, 1995; Evans & Over, 2004; Oaksford & Chater, 1994; a more detailed discussion of probabilistic modus ponens will follow). Instead, authors within the scheme-based tradition take this to be a defeasible argument (in the narrow sense) that is embedded in a (potential) series of dialectical moves (i.e., an argument in the wider sense) where, once a proponent has raised the argument, it is the respondent’s task to reply either by challenging a premise, asking an appropriate critical question, or accepting the argument. Whether or not an argument ultimately “goes through” depends crucially on the allocation of the burden of proof (see, e.g., Walton, 1988; but also Hahn & Oaksford, 2007b). Raising critical questions may shift the burden of proof, so that the defeasible conclusion can no longer be maintained unless further evidence is provided.

While this provides a semi-formal rendition, at best, work in artificial intelligence (AI) over the last decade has sought to embed such an approach within more well-defined systems of non-classical logic (e.g., Gordon, Prakken, & Walton, 2007; Prakken & Renooij, 2001). Much of this computational work on defeasible reasoning and argumentation has been driven by the belief that a probabilistic approach would be inadequate or impossible. This work has generally ignored probabilistic treatments of conditional inferences such as modus ponens (e.g., Oaksford & Chater, 1994). However, these burden of proof–based approaches have recently been explicitly contrasted with Bayesian, probabilistic approaches to argumentation more generally (e.g., Hahn & Oaksford, 2006, 2007a; Hahn, Oaksford, & Harris, 2013; Hahn & Hornikx, 2015), and we will see positive examples of probabilistic treatments of such generalizations in the following.

Concerning the nature of the generalization from which a defeasible argument about causation is to unfold, Walton et al. (2008) provide several alternative bases for that generalization (where Si is the cause and Sj the effect):

1. Regularity: Sj regularly follows Si.
2. Temporal sequence: Si occurs earlier than (or at the same time as) Sj.
3. Malleability: Si is changeable/could be changed.
4. Causal status: Si is a necessary or sufficient or INUS condition of Sj.5 (p. 481)
5. Pragmatic status: Pragmatic criteria, like voluntariness or abnormality, may single out a cause.

Page 9 of 33

Causal Argument claims, and so the analysis of causal argumentation schemes should permit all three possibilities. (Walton et al., 2008, 185) Notice that clause 4 allows for causal overdetermination, that is, cases where causes are sufficient but non-necessary for their alleged effects, that is, the effects may be brought about by one or more of several sufficient causes. Here, at the very least, it becomes clear that the normative project concerning the evaluation of causal arguments must nec­ essarily engage in the question of what a cause is. While clauses 1–4 more or less repeat standard criteria from work on causation within the philosophy of science, clause 5 references pragmatic factors to also include the wider context of causal argument. Walton et al. (2008) motivate this by reference to causal ar­ gument in law (e.g., Hart & Honore, 1985) where particular types of causes are most rele­ vant, most notably voluntary human actions, which are singled out from the overall causal chain of events in order to assign various responsibilities to agents. This echoes Pearl’s (2009, p. 401) observation that, at least since Aristotelian scholarship, causality has served a dual role: causes are both “the target of credit and blame,” on one hand, and “the carriers of physical flow and control on the other.” Both aspects figure prominently in causal arguments. Indeed, it presently remains unclear whether there is a single, uni­ tary notion of causality that suffices for both aspects, or whether seemingly competing accounts of causation such as counterfactual accounts (Lewis, 1973; see Bennett, 2003, for a review) and generative accounts (Dowe, 2000; Salmon, 1984) are in fact both in­ volved in human causal judgments, albeit to different ends (see also Walsh, Hahn, & De­ Gregorio, 2009; Illari & Russo, 2014). Similarly, it is unclear whether domains such as law posses notions of causality that differ from those of the physical or social sciences (see, e.g., Honore, 2010, for discussion and further literature; see also Lagnado & Gersten­ berg, Chapter 29 in this volume). Closer textual analysis of real-world arguments, in law and elsewhere, should be informative here.
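For contrast with this burden-of-proof treatment, the probabilistic modus ponens reading set aside above can be stated precisely; the following coherence interval is a standard result of probability logic, offered here as background rather than as a claim from the scheme-based literature. If the generalization is read as P(B|A) = β and the minor premise as P(A) = α, then the laws of probability alone constrain the conclusion to

\[ P(B) \in [\,\alpha\beta,\; \alpha\beta + (1 - \alpha)\,], \]

so a strong generalization combined with a firmly established antecedent forces a high lower bound on P(B), while uncertainty about either premise widens the interval.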

Argument from Consequences

Finally, there is a type of argument from cause to effect that seems both practically important and prevalent enough to be discussed separately: namely, the so-called argument from consequences. The argument from consequences is a type of what Hastings (1962; see earlier discussion) calls "hypothetical causal argument." Such an argument seeks to promote, or deter from, a particular course of action on the basis of that action's putative consequences. This type of argument forms the basis of much practical reasoning, that is, reasoning about what to do:

Argument from consequences
Premise: If A is brought about, good (bad) consequences will plausibly occur.
Conclusion: Therefore A should (should not) be brought about.


Within the scheme-based tradition, a number of closely related practical reasoning schemes are distinguished, such as a general scheme for "practical inference" (I have goal X, Y realizes X, I should carry out Y; see Walton et al., 2008, p. 323; or similarly, the "argument from goal," Walton et al., 2008, p. 325; Verheij, 2003), and a number of special cases of the argument from consequences feature prominently in the traditional catalogue of fallacies, such as the argumentum ad misericordiam (see, e.g., Hahn & Oaksford, 2006), which uses an appeal to pity or sympathy to support a conclusion, or the argumentum ad baculum, an argument from threat (see also Walton, 2000), or slippery slope arguments (more on these later). For the argumentation theorist, these different subtypes may each hold interest in their own right (or at least it will be of interest whether or not these subtypes do, in fact, merit conceptual distinction because they actually raise normative and empirical issues of their own; see also Hahn & Hornikx, 2015).

Common to consequentialist arguments is that valuations are central to their strength. Consequentialist arguments (see, e.g., Govier, 1982) about the desirability of a particular event, policy, or action rest not only on the probability with which a cause (a hypothetical action) will bring about a particular consequence (the action's effect), but also on the utilities of both the action and the desired/undesired future consequence. The strength of an argument from consequences, (p. 482) therefore, is properly determined by considerations of probability and utility. For example, when the relevant consequence of a given action under debate is (perceived to be) more or less neutral, this gives few grounds to in fact take that action, even if the consequence itself is almost certain to obtain. Likewise, if the consequence of an action is highly undesirable, that is, has a high negative utility, this will give few grounds to avoid the action if the probability that this consequence will obtain is almost zero.

Given that consequentialist arguments are about action, it is unsurprising that they align well with Bayesian decision theory (Ramsey, 1931; Savage, 1954), which identifies optimal courses of action through the multiplicative combination of probability and utility. A number of authors have pursued a decision-theoretic approach to consequentialist argument (e.g., Elqayam et al., 2015; Evans, Neilens, Handley & Over, 2008; Hahn & Oaksford, 2006, 2007a), including empirical examination of people's subjective valuations of argument strength for such arguments. This work will be described in more detail in our survey of experimental work.
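The multiplicative combination of probability and utility at the heart of this decision-theoretic reading can be made concrete with a minimal computational sketch (in Python). The function and the numbers below are illustrative assumptions of ours, not materials from any of the studies cited:

def consequence_strength(p_q_given_a, utility_q):
    """Expected (dis)utility that consequence q contributes to taking action A."""
    return p_q_given_a * utility_q

# A near-certain but neutral consequence lends an argument little force ...
print(consequence_strength(0.95, 0.0))     # 0.0
# ... as does a dreadful consequence that is almost certain not to occur ...
print(consequence_strength(0.01, -100.0))  # -1.0
# ... while a probable and clearly bad consequence argues strongly against A.
print(consequence_strength(0.80, -100.0))  # -80.0

On this reading, the two limiting cases discussed in the text (a neutral consequence, and a dire consequence with near-zero probability) both yield values near zero, that is, arguments with little persuasive force.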

From Effect to Cause

What then of arguments from effects to causes? As noted earlier, a number of schemes concerning inference from data to a hypothesis are pertinent here, whether the hypothesis be a causal generalization or an individual event. Most authors in the scheme-based tradition have included some form of "argument from sign" and/or from "evidence to hypothesis" (see Walton et al., 2008, for further references). At the most general level (where the hypothesis may be of any kind, causal or otherwise), such arguments seem well captured by Bayesian inference (see also Hahn & Hornikx, 2015), though there are also theoretical and empirical questions here about the relation of such inference to abduction and inference to the best explanation (e.g., Lombrozo, 2007; Schupbach & Sprenger, 2011; van Fraassen, 1989; Weisberg, 2009; Lombrozo & Vasilyeva, Chapter 22 in this volume). The fact that these inferences concern putative causes, however, puts them in the remit of causal learning (see, e.g., Rottman, Chapter 6 in this volume). An important project would thus be to square psychological results on causal learning with the types of arguments people entertain when seeking to identify putative causes. Chief among the "specialist" arguments seeking to identify causation is the classic fallacy of "inference from correlation to cause," which we discuss next.
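Before turning to that, the general Bayesian reading of evidence-to-hypothesis arguments can be made concrete: it amounts to nothing more than Bayes' rule. The following is a minimal sketch of ours, with invented numbers and a single pooled alternative explanation:

def posterior(prior, p_e_given_h, p_e_given_not_h):
    """P(H | E) by Bayes' rule, for hypothesis H against its complement."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e

# "The street is wet (E), so it rained (H)": the argument is strong only to
# the extent that H predicts E better than the alternatives (e.g., sprinklers).
print(posterior(prior=0.3, p_e_given_h=0.9, p_e_given_not_h=0.2))  # about 0.66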

Correlation and the Bayesian Approach to Argument Strength

Putative fallacies, or "arguments that seem correct but are not" (Hamblin, 1970), or that "seem to be better arguments of their kind than they in fact are" (e.g., Hansen, 2002; Walton, 2010), pervade everyday informal argument. Catalogs of fallacies (originating with Aristotelian scholarship) have been the focus of long-standing theoretical debate, and a full understanding of the fallacies has remained a central concern to philosophers, communication scholars, logicians, rhetoricians, and cognitive scientists interested in argument.

A staple of the traditional fallacies catalog is the inference from correlation to cause, also known as the post hoc fallacy, as in post hoc, ergo propter hoc (which roughly translates as "after this, hence because of this"), or, relatedly, the cum hoc ergo propter hoc fallacy (Latin for "with this, therefore because of this"). In more modern terms, this is normally stated as "there is a positive correlation between A and B (premise), so A causes B (conclusion)." As a (fallacious) example, one may consider the (now debunked) claim that the measles, mumps, and rubella (MMR) vaccination causes autism—a spurious link that could arise from the temporal coincidence of MMR vaccination and the emergence of overt signs of autism. The relevant argument scheme has been associated with the following critical questions (see Walton et al., 2008):

CQ1: Is there really a correlation between A and B?
CQ2: Is there any reason to think that the correlation is more than a coincidence?
CQ3: Could there be some third factor, C, that is causing both A and B?

These critical questions reflect the general appreciation that, although they are not deductively valid (i.e., their conclusions are not logically entailed by the premises), such arguments nevertheless can be, and often are, reasonable inductive inferences. Such arguments, then, need not be "fallacious" in any stronger sense than lack of logical validity (and logical validity itself is no guarantee that an argument is strong, as the case of circular arguments demonstrates; see, e.g., Hahn, 2011). Moreover, this lack of logical validity is a feature that they have in common with the overwhelming majority of everyday arguments, since informal argument typically (p. 483) involves uncertain inference. The argument scheme approach thus highlights a characteristic aspect of most fallacies within the catalog: depending on their specific content, instances of these arguments often seem quite strong in the sense that their premises lend inductive support to their conclusions.

Exceptions and content-specific variation have generally plagued theoretical attempts to provide a comprehensive formal treatment that explains why fallacies make for "bad" arguments of their kind on specific occasions of their use. The logical structure of these arguments cannot be the key, however, because versions of the same form of argument (and hence the same logical structure) differ in relative strength. Variations in strength must be due to content-specific variation, and so require a formal framework such as probability theory (as an intensional formal system; see Pearl, 1988), which makes reference to content. From a probabilistic perspective, then, the allegedly fallacious arguments that were historically tabulated as fallacies are typically not fallacious per se. Rather, specific instances are weak due to their specific content, and probabilistic Bayesian formalization brings this to the fore (see, e.g., Hahn & Oaksford, 2006, 2007a; Oaksford & Hahn, 2004; and, specifically in the context of logical reasoning fallacies, Korb, 2004; Oaksford & Chater, 1994).

For the argument from correlation to cause, a probabilistic perspective can go beyond the three simple critical questions offered in the scheme-based tradition. Specifically, a wealth of statistical techniques for inferring causation from essentially correlational data have been developed: learning algorithms for causal Bayesian belief networks and structural equation modeling (see, e.g., Pearl, 2000) provide salient examples (see also Rottman, Chapter 6 in this volume). These techniques seek to learn causal models from data and, as part of this, provide evaluation of how good a given causal model is as a description of available data, thus providing measures of how convincing a particular inference from correlation to cause actually is. An important, practical project for future work would be to try to distill insights from such techniques into "critical questions" that can be readily communicated in everyday settings.
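The force of CQ3 in particular can be illustrated with a toy simulation (ours, not drawn from the literature just cited): a hidden common cause C induces a robust association between A and B even though neither causes the other.

import random

random.seed(0)
n = 100_000
samples = []
for _ in range(n):
    c = random.random() < 0.5                   # hidden common cause C
    a = random.random() < (0.8 if c else 0.1)   # C raises the probability of A
    b = random.random() < (0.7 if c else 0.1)   # C raises the probability of B
    samples.append((a, b))

p_a = sum(a for a, _ in samples) / n
p_b = sum(b for _, b in samples) / n
p_ab = sum(a and b for a, b in samples) / n
print(p_ab - p_a * p_b)  # clearly positive: A and B covary with no causal link

Structure-learning algorithms of the kind just mentioned earn their keep precisely by distinguishing such a common-cause structure from a direct causal link, for example by exploiting the conditional independence of A and B given C.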

Summary

In concluding the overview of types of causal arguments, it is worth drawing attention to several aspects of the present typology. First, it is notable how varied and diverse causal argument is, reflecting the central role that considerations of causality take in human thinking. Notable also are the differences between attempts at systematization: the typology drawn from corpus analysis (see the section "Causal Argument Patterns from Corpus Analysis") is much richer than the scheme-based literature concerning evidence for causes themselves. At the same time, the scheme-based tradition makes clear how much arguing from causes there is, and that causal argument is not only often hypothetical or counterfactual, but also makes reference to the utilities of putative outcomes in the service of deciding on courses of action. This suggests that a predominant focus within the causal cognition literature on causal learning will arguably miss important aspects of causal cognition, and it sits well with recent attempts to move the focus of psychological research on causal cognition beyond some of the dichotomies that have dominated the field in the past (see also Oaksford & Chater, Chapter 19 in this volume, and Lagnado & Gerstenberg, Chapter 29 in this volume).

At the same time, it is clear that much work remains to be done at the theoretical level of typologies. For one, a single, integrated typology of causal argument would seem desirable. It is only when one has a clear overview of a target phenomenon that one can hope to build adequate theories of it. Typologies aim at both completeness and systematization. The former determines the scope of a theory; the latter goes hand in hand with theory development itself, because systematization is about discerning patterns or crucial dimensions of variation across cases. Consequently, the mere fact that there is no comprehensive, systematic typology of causal argument illustrates that causal argument is still poorly understood. Both traditions surveyed here still appear to underappreciate the variety of causal argumentation "out there": the fact that there is comparatively little overlap between Oestermeier and Hesse's (2000) analysis and the large, scheme-based compendium of Walton et al. (2008) raises the possibility that there are further types of causal argument that both have missed.

Finally, it is apparent that much of the research on types of causal argument also has explicitly normative concerns. Normative considerations are valuable, not only because they afford standards of comparison for rational, computational analysis of human behavior (see, e.g., Anderson, 1990; and in the context of causation specifically, e.g., Griffiths & Tenenbaum, 2005; Sloman & Lagnado, 2005), but also because the quality of people's everyday (p. 484) thinking and arguing is of immediate practical concern. The emphasis within the scheme-based tradition on "critical questions" and the explicit links to improving critical thinking and argument reflects a worthy goal. At the same time, however, the preceding survey makes clear the diversity of normative approaches that presently obtains, ranging from the informal to the formal, and spanning non-classical logics as well as probability theory. A unified perspective would clearly be desirable. Before returning to normative issues in the final section of this chapter, however, we next survey the extent of empirical work on causal argument.

Causal Argument and Cognition

While there is little empirical work under the header of "causal argument" per se, the breadth of causal argument types identified earlier and the importance of causality to our everyday reasoning suggest that there should nevertheless be considerable amounts of relevant psychological research. And, on closer inspection, there is a sizable body of research whose investigative topic is reasoning or argumentation that happens to involve causality. In particular, investigations of causal arguments and people's ability to deal with them are reported in the literature on the development of argumentation skills, on science arguments, and on consequentialist argument and reasoning. In the following we provide brief examples of each.


Causal Conditionals

Reasoning with conditionals, particularly logical reasoning with conditionals, is a topic of long-standing research within cognitive psychology (see, e.g., Oaksford & Chater, 2010a, for an introduction). Within this body of work one can find a number of studies investigating conditionals (i.e., "if … then" statements) and potential logical inferences from these for specifically causal materials.

Most of this research has centered on four argument forms: modus ponens (MP), modus tollens (MT), affirming the consequent (AC), and denying the antecedent (DA)—exemplified in Table 25.1. Only two of these—MP and MT—are logically valid, that is, when their premises are true, the truth of their conclusions follows by logical necessity. However, the long-standing finding is that people fail to distinguish appropriately between the different schemes when asked about logical validity (e.g., Marcus & Rips, 1979). Moreover, for conditional inference and other forms of logical reasoning such as syllogistic reasoning, there are countless demonstrations that people's inferences are affected not just by the formal (logical) structure of the inference, which is the only relevant aspect for their validity, but also by specific content (e.g., Evans, Barston, & Pollard, 1983; Oaksford, Chater & Larkin, 2000).

Much of the empirical and theoretical debate about conditional reasoning has focused on the issue of the appropriate normative standard against which participants' responses should be evaluated. Classical logic renders natural language "if … then" as the so-called material conditional of propositional logic, and much early research on logical reasoning within psychology adopted this normative perspective (e.g., Wason, 1968). Both philosophers (e.g., Edgington, 1995) and psychologists (e.g., Evans & Over, 2004; Oaksford & Chater, 1994), however, have argued that this is an inappropriate (p. 485) formalization of what people mean with natural language "if … then," both conceptually and empirically. This has given rise to alternative, in particular probabilistic, interpretations of the conditional (in which case, factors such as believability need no longer constitute an inappropriate bias).

Table 25.1 Key Forms of Inference Involving Conditionals

Form                            Informal                                 Classical Logic   Logically Valid?
Modus ponens (MP)               If p, then q; p; therefore, q            p→q, p ⊢ q        yes
Modus tollens (MT)              If p, then q; not q; therefore, not p    p→q, ¬q ⊢ ¬p      yes
Denying the antecedent (DA)     If p, then q; not p; therefore, not q    p→q, ¬p ⊢ ¬q      no
Affirming the consequent (AC)   If p, then q; q; therefore, p            p→q, q ⊢ p        no
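The validity column of Table 25.1 can be verified mechanically: a form is classically valid just in case no assignment of truth values makes all premises true and the conclusion false. The following short script (ours, for illustration only) checks all four forms:

from itertools import product

def implies(a, b):
    return (not a) or b  # the material conditional

# Each form: premises and conclusion, both as functions of (p, q).
forms = {
    "MP": (lambda p, q: implies(p, q) and p,     lambda p, q: q),
    "MT": (lambda p, q: implies(p, q) and not q, lambda p, q: not p),
    "DA": (lambda p, q: implies(p, q) and not p, lambda p, q: not q),
    "AC": (lambda p, q: implies(p, q) and q,     lambda p, q: p),
}

for name, (premises, conclusion) in forms.items():
    valid = all(conclusion(p, q)
                for p, q in product([True, False], repeat=2) if premises(p, q))
    print(name, "valid" if valid else "invalid")
# Prints: MP valid, MT valid, DA invalid, AC invalid — as in Table 25.1.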

One aspect that has figured here is that natural language conditionals often involve causal connections (see also Oaksford & Chater, Chapter 19 in this volume). Cummins et al. (1991) examined specifically how participants' judgments of the conclusion of a conditional argument differed systematically as a function of the number of alternative causes and disabling conditions that characterized the causal relationship (as benchmarked in a pretest with different participants), and did so in potentially different ways for each of the four classic forms: MP, MT, AC, and DA. For example, people were presented with arguments such as "If my finger is cut, then it bleeds. My finger is cut. Therefore, it bleeds." or "If I eat candy often, then I have cavities. I eat candy often. Therefore, I have cavities." Here, so Cummins et al. (1991) reasoned, it is presumably easier to think of disabling conditions for the second example (cavities) than it is for the first (bleeding finger), and one might expect people's judgments of conclusion strength to be sensitive to this. In keeping with this, participants' judgments were found to vary systematically with the number of alternative causes and disabling conditions. Conclusions of arguments based on conditionals with few alternative causes or few disabling conditions, in particular, were more acceptable than conclusions based on those with many. Moreover, Cummins et al. showed that both the number of alternative causes and possible disabling conditions affected the extent to which the conditional (if … then) was interpreted as a bi-conditional (if and only if), or not.

Subsequent work has also attempted to distinguish competing accounts of logical reasoning using specifically causal materials (e.g., Ali, Chater & Oaksford, 2011; Quinn & Markovits, 1998; Verschueren, Schaeken, & d'Ydewalle, 2005; for an overview, see also Oaksford & Chater, 2010b).

Recent work, particularly the sophisticated analyses provided by Singmann, Klauer, and Over (2014), finds robust evidence only for an interpretation of the conditional as a conditional probability, but finds no evidence that participants' judgments of conclusion probability are sensitive to "delta P" (P(q|p) − P(q|¬p)), a quantity that has figured prominently in accounts of causal learning (see, e.g., Over, Hadjichristidis, Evans, Handley & Sloman, 2007; Sloman, 2005; see also Over, Chapter 18 in this volume). Whether this result will prove robust in subsequent work remains to be seen, but it highlights the potential for work on causal argument to complement the results from other, more familiar, paradigms for investigating the psychology of human causal learning and causal understanding. Argument evaluation tasks may provide independent evidence in the context of rival accounts of laypeople's understanding of causation as found in decades of causal learning studies.6
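Both findings lend themselves to a compact probabilistic reading. In the sketch below (our illustration, using a standard noisy-OR parameterization with invented parameter values), disabling conditions lower P(q|p), weakening cause-to-effect inferences such as MP, while alternative causes raise P(q|¬p), weakening effect-to-cause inferences such as AC and shrinking delta P:

def p_effect(cause_present, p_disabler, p_alternative):
    """P(effect) under a noisy-OR with one disabler and one alternative cause."""
    via_cause = (1 - p_disabler) if cause_present else 0.0
    return 1 - (1 - via_cause) * (1 - p_alternative)

for label, p_dis, p_alt in [("cut finger -> bleeds", 0.01, 0.05),
                            ("eat candy -> cavities", 0.50, 0.30)]:
    p_q_p = p_effect(True, p_dis, p_alt)
    p_q_notp = p_effect(False, p_dis, p_alt)
    print(label, round(p_q_p, 2), round(p_q_notp, 2), round(p_q_p - p_q_notp, 2))
# cut finger -> bleeds:  P(q|p) = 0.99, P(q|not-p) = 0.05, delta P = 0.94
# eat candy -> cavities: P(q|p) = 0.65, P(q|not-p) = 0.30, delta P = 0.35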

Consequentialist Argument

Recent years have also seen increasing empirical interest in another form of conditional argument, namely consequentialist arguments. Evans, Neilens, Handley, and Over (2008) investigated a variety of conditionals expressing conditional tips, warnings, threats, and promises. For example, "If you go camping this weekend (p), then it will rain (q)" is a clear warning not to go camping. From the decision-theoretic perspective mentioned earlier, the higher P(q|p), and the more negative the utility associated with the consequent, U(q), that is, rain, the more persuasive should be a conditional warning to the conclusion that action p should not be taken (¬p), that is, you should not go camping. Evans et al. found that participants' judgments of persuasiveness varied in the predicted way as a function of costs and benefits of antecedent (p) and consequent (q), as well as the conditional probability (P(q|p)) linking the two. These effects held for both positive (tips, promises) and negative (warnings, threats) consequentialist arguments.

Corner, Hahn, and Oaksford (2011) provided an empirical examination of a particular type of consequentialist argument: slippery slope arguments such as "if voluntary euthanasia is legalized, then in the future there will be more cases of 'medical murder.'" Slippery slope arguments are a type of warning and are distinct merely in the type of (implied) mechanism that underlies P(q|p). In particular, for many slippery slope arguments a gradual shift of category boundaries is at play (on other forms of slippery slope arguments, see, e.g., Volokh, 2003): the act of categorizing some instance (say, voluntary euthanasia) under a more general predicate (here, legal medical intervention) is assumed to lead inevitably to other items (e.g., involuntary (p. 486) euthanasia or "medical murder") eventually falling under the same predicate.

Corner et al. (2011) examined not only the effects of utility on the perceived strength of slippery slope arguments, but also a specific mechanism underlying the conditional probability P(q|p): namely, the causal mechanism involved in "sorites"-type slippery slope arguments, "category boundary reappraisal." Current theories of conceptual structure typically agree that encountering instances of a category at the category boundary should extend that boundary for subsequent classifications, and there is a wealth of empirical evidence to support this (e.g., Nosofsky, 1986). Building on this, Corner et al. (2011) showed how people's confidence in classifications of various acts as instances of a particular category was directly related to their degree of endorsement for corresponding slippery slope arguments, and that this relationship is moderated by similarity. Slippery slope arguments from one instance to another were viewed as more compelling, the more similar the instances were perceived to be.

The Corner et al. studies demonstrate how the conditional probability P(q|p) that influences the strength of consequentialist arguments as a type of cause-to-effect argument may be further unpacked. The same is true for a recent study by Maio et al. (2014) that experimentally examined a particular type of hypothetical consequentialist causal argument that appeals to fundamental values. Specifically, Maio et al. investigated "co-value argumentation," which appeals to furthering one value because doing so will further another. The following quote by George W. Bush provides an example: "I will choose freedom because I think freedom leads to equality" (see Anderson, 1999, as cited in Maio et al., 2014). Numerous examples of this argument type are found, ranging from Plato ("equality leads to friendship") to Howard Greenspan, who argued that "honesty leads to success in life and business" (examples cited in Maio et al., 2014).

"Success," "freedom," "equality," and "honesty" are terms that exemplify what social psychologists consider to be instances of fundamental values that are universally used to guide and evaluate behavior (Schwartz, 1992; Verplanken & Holland, 2002). As consequentialist arguments, their strength should depend both on the strength of the causal connection between antecedent and conclusion value, and on the antecedent value's importance.

With respect to causal connections, psychological research on fundamental values has provided evidence that our value systems are structured, that is, they display internal ordering. This ordering is based on the fact that actions taken in pursuit of a particular value will have psychological, practical, and social consequences that may be either compatible or incompatible with the pursuit of another value (for empirical evidence concerning the psychological relevance of this structure, see, e.g., Schwartz, 1992; Schwartz & Boehnke, 2004; Maio et al., 2009).

From this, one can derive predictions about the strength of arguments that combine values. Opposing values are classed as such because the actions taken in their respective pursuit may conflict, that is, pursuing one of two opposing values likely impinges negatively on the pursuit of the other. Conversely, values that fulfill similar motives will be positively correlated in terms of the consequences of actions one might take in their pursuit. Finally, values that are orthogonal will be more or less independent.

These relations—comparative incompatibility, compatibility, and independence—translate directly into systematic differences in causal relatedness and hence conditional probabilities: two opposing values will be negatively correlated, similar values will be positively correlated, and orthogonal values will be independent. Expressed in probabilistic terms, a causal perspective can thus provide clear predictions about the relative convincingness of different consequentialist arguments that combine any given two values. Co-value argumentation involving opposing values should give rise to less convincing arguments than using orthogonal values, and similar values should be even more convincing. In keeping with the causal basis, Maio et al. (2014) found this pattern confirmed both in ratings of argument persuasiveness and in proclaimed intention to vote for a political party on the basis of a manifesto that had manipulated co-value argument. These results not only underscore the importance of causal considerations to people's evaluation of everyday arguments in the practical domain of values. They also provide an extreme example of the degree of abstraction in causal argument and the sparsity of the underlying causal models that people are willing to engage with in arguments about real-world relationships. (p. 487)
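Expressed probabilistically, the predicted ordering (opposing < orthogonal < similar) follows directly from the sign of the correlation between the two values. A toy calculation of ours, treating "value X is furthered" as a binary variable with invented marginals and correlations, makes the point:

import math

def p_q_given_p(p_p, p_q, rho):
    """P(q|p) implied by the marginals and a phi correlation rho between p and q."""
    cov = rho * math.sqrt(p_p * (1 - p_p) * p_q * (1 - p_q))
    return (p_p * p_q + cov) / p_p

for label, rho in [("opposing values", -0.5),
                   ("orthogonal values", 0.0),
                   ("similar values", 0.5)]:
    print(label, round(p_q_given_p(0.5, 0.5, rho), 2))
# opposing 0.25 < orthogonal 0.50 < similar 0.75: the more similar the values,
# the higher P(q|p), and hence the stronger the co-value argument.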

Causal Argument and Causal Thought: Kuhn (1991)

Finally, we discuss what is arguably the central study on causal argument to date, even though it is not explicitly billed as an investigation of causal argument. Kuhn's (1991) monograph The Skills of Argument is a landmark investigation of people's ability to engage in real-world argument, across the life span and across different levels of educational background. Kuhn's fundamental premise is that thinking ability is intrinsically tied to argumentation ability, to the extent that reasoning may just be seen as a kind of "arguing something through with oneself." Furthermore, Kuhn maintains that thinking (and with it argumentation) abilities are far less well understood than one might expect. This is because of the focus within most of cognitive psychology on lab-based experimentation, frequently involving highly artificial, stylized materials, which leaves unanswered questions about how people fare in actual everyday argument. These limitations are compounded by the fact that participants are typically drawn from undergraduate samples.

Kuhn's (1991) study of argument skills sought to redress this balance by getting people from a range of backgrounds (with both college- and non-college-level education) and a range of ages (teenagers, 19–29, 40–49, and 60–69 years of age; for studies of younger children, see Kuhn, Amsel, & O'Loughlin, 1988) to engage in argument generation and evaluation for a series of real-world topics that would actually matter to them, but for which they would also have varying levels of expertise. Crucially, from the perspective of the researcher interested in causation and causal arguments, all three topics concerned causes: what causes prisoners to return to crime after they are released; what causes children to fail school; and what causes unemployment.

In a series of structured interviews, Kuhn and colleagues asked participants to provide an initial causal explanation or theory of the phenomenon in question. They then asked participants to provide evidence for those theories, but also to consider alternative causes and what kinds of evidence would count for or against them. Responses were coded, with a modified version of the Toulmin framework described earlier, into "theories," "supporting evidence," and "opposing argumentation" (alternative theories, counterarguments, rebuttals).

Kuhn found considerable variation in argument skill across participants. In particular, a sizable number of participants were unable to generate genuine evidence for any of their theories across the three topics (29%) or to generate an alternative theory (8%). Furthermore, where there were failures to generate real evidence, evidence was poorly differentiated from the theory itself.

Where participants supplied causal arguments for their preferred theories, only a minority offered genuine evidence that makes reference to covariation. Other forms of genuine evidence found were evidence from analogy, causal generalizations, and discounting or elimination of alternative causes. At the other extreme (non-evidence), participants seemed willing to treat the effect itself as evidence of the cause, or even to deny the need for evidence altogether. This serves to underscore earlier points about the sparsity of causal models in (at least some contexts of) real-world argument, and Kuhn's findings seem reminiscent of Keil's work on the "illusion of explanatory depth" (see, e.g., Keil, 2003). At the same time, it is striking that comparatively few participants showed clear insight into the importance of manipulation in establishing causal relationships in the arguments they supplied.

Limitations were apparent also in participants' evaluation of evidence given to them. In a second session, the same group of participants was presented with evidence designed intentionally to be largely non-diagnostic. Examples are given in Table 25.2. The evidence in Table 25.2 contains descriptions with little information that could be used to infer causal relationships. Though this was recognized by some participants (Question: "What do you think is the cause of Peter's return to crime?" Response: "there's nothing in here that suggests a cause"; p. 207), sizable numbers of others did perceive the passages to be evidence for a particular causal mechanism, and possibly even more surprisingly, expressed great certainty concerning their preferred cause.

In all of this, participants showed variations across topics, seemingly as a function of familiarity with the domain, but there was greater consistency in performance than would be expected by chance. In particular, there was consistency also with what Kuhn and colleagues deemed "epistemological perspectives," that is, more general beliefs participants had about knowledge, and the extent to which different people might reasonably hold different views, without endorsing a relativism so total that all knowledge is seen as "mere opinion." Finally, for both epistemological perspectives and argument (p. 488) skills, there was statistical evidence of influence only from educational background, not gender or age.

Table 25.2 Underdetermined Evidence

Crime topic
Pete Johnson is someone who has spent a good portion of his adult life in prison. He was first convicted of a crime at age 14, when he took part in the theft of a newspaper stand. He began serving his first prison sentence at age 18, after being convicted on several charges of auto theft and robbery. He remained in a medium-security state prison until the age of 20. After he was released on parole, he returned to live with his mother in the same neighborhood where he had grown up and began to look for a job. After 3 months out of jail, he took part in the robbery of a grocery store. He was caught and convicted and returned to prison. Since then, Pete has served three more prison sentences for different crimes with only brief periods out of prison between sentences.

School topic
David Bell is a child who has shown a continuing failure to learn in school. David is 10 years old and repeating second grade. He also repeated first grade because of his poor work. David lives with his parents and younger sister in a medium-sized city. Since age 5, he has attended the elementary school in the family's neighborhood. David had great difficulty in learning to read and now reads only when he is required to and has trouble recognizing all but very simple words. David does no better in math than he does in reading. He dislikes schoolwork of any kind. Often he spends his time in the classroom daydreaming or talking to other children. David finds the work his teacher gives the children to do uninteresting. He says that he would rather spend his time doing other things.

Source: Sample "evidence" from Kuhn (1991, Table 8.1, p. 205).

Follow-up research by Sá, Kelley, Ho, and Stanovich (2005), which again examined self-generated causal theories for the crime and education topics, elicited in a similar structured interview, confirmed the sizable variation in participants' argument skills.

Causal Argument: A Research Agenda

From our survey of extant research it should be clear that causal argument presents a rich field of inquiry, but one that is presently still underdeveloped. In the final sections, we draw together what appear to us to be the main themes and strands for future research.

One strand, on which much else rests, is the issue of typologies of causal argument. The first thing to emerge from the survey of types of causal arguments identified in the literature is the extraordinary richness of causal argument. People argue daily both about causes and from causes, and these arguments concern not just the way things presently are, but also future possibilities and actions. This also gives many causal arguments an intrinsic link with utilities and valuations.

For the argumentation theorist, there is clearly more work to be done here: it is unclear whether even the extensive list drawn up in this chapter exhausts the range of different types of causal argument to be found in everyday and specialist discourse. This question of typology (and its completeness) matters because only with a full sense of the many different ways in which causes figure in argument, and thus in everyday life, can one hope to have a complete picture of the psychology of causal reasoning.


The case for an intimate connection between reasoning and argumentation has been well made (Kuhn, 1991; Mercier & Sperber, 2011), and argumentation not only provides a window into reasoning abilities, it may also be a crucial factor in shaping them. In the context of typologies, important theoretical questions remain about what should count as a distinct "type" of argument, both normatively and descriptively, and why (see Hahn & Hornikx, 2015).

Much work also remains to be done with respect to the normative question of what makes a given type of causal argument "good" or "strong." This matters to researchers who are interested in the development and improvement of skills—argumentation theorists, developmental and educational psychologists, or researchers interested in science communication, to name but a few. It matters also to anyone concerned with human rationality, whether from a philosophical or a psychological perspective. Finally, given the benefits of rational analysis and computational level explanation to understanding human behavior (e.g., Anderson, 1990), it should matter also to any psychologist simply interested in psychological processes (see also Hahn, 2014).

As seen in the section "Arguments for Causes and Causes for Arguments," there is presently no fully worked out, coherent normative picture of causal argument. The literature is fragmented in terms of both approach and preferred formalism (or even whether formal considerations are necessary at all). The notion of "critical questions" has educational (p. 489) and practical merit, but the present depth of such critical questions falls considerably behind what could be developed on the basis of extant formal frameworks such as causal Bayesian networks (e.g., Pearl, 2000).

A comprehensive normative treatment of causal argument seems a genuine cognitive science project that will need contributions from a number of fields, including artificial intelligence and philosophy. It offers also the possibility of a more comprehensive and more effective treatment of causal argument in computational argumentation systems (see, e.g., Rahwan & Simari, 2009; Rahwan, Zablith & Reed, 2007). At the same time, it must be stressed that there is a considerable amount of work required before anything like a comprehensive normative treatment of causal argument might be achieved. There are aspects, such as causal inference, for which there already exist formal approaches that have a reasonable claim to normative foundations, such as causal Bayesian networks. However, these presently still leave many factors relevant to causal argument largely unaddressed. In particular, only recently have researchers started to concern themselves with the normative questions involved in the transition between causal models, as becomes necessary, for example, on learning new conditionals (Hartmann, submitted).

Only with a clear normative understanding can questions of human competence in causal argument be fully addressed. At present, the evidence on human skill in dealing with causation is rather mixed. While humans may do very well in lab-based contingency learning tasks (but see also Kuhn, 2007), and do rather well in causal inference involving explicit verbal descriptions such as those presented in causal conditional reasoning tasks, thinking and arguing about complex real-world materials (even frequently encountered ones for which people possess considerable amounts of relevant knowledge) seem to be another matter, as Kuhn's (1991) study shows.

Both the sizable individual differences in Kuhn's (1991) study and her finding that, although there is consistency, the degree of competence expressed seems to vary with familiarity of topic and materials, suggest that skills are more or less readily expressed according to context. A fuller understanding of causal reasoning should therefore also identify what particular aspects contribute to success or failure.

Kuhn's (1991) study also highlights a distinction that has been drawn elsewhere in the literature on causal reasoning, namely that between causal structure and causal strength (Griffiths & Tenenbaum, 2005). Examination of competence in dealing with causality needs consideration not only of the circumstances under which people draw inferences about causal structure, but also of how strong they consider causal relations (and the supporting evidence) to be.

In generating and evaluating causal arguments, there are three distinct ways in which reasoners might go wrong, exemplified in Figure 25.3. Specifically, causal arguments should be evaluated with respect to how well they align the causal premises (of the argument) with a causal model and a subjective degree of support (Figure 25.3).

Kuhn's study suggests that, for some people at least, judgments of causal strength may be considerably exaggerated, in that individual aspects of what are clearly complex multifactorial problems (such as unemployment) are selectively picked out and treated as "the cause." Even more worrying, however, are the high levels of certainty that many of Kuhn's participants exhibit, even in the face of largely undiagnostic evidence, suggesting also that failures with respect to "degree of support" may be more prevalent than one would wish.

In all of this, it remains to be fully understood what features people attend to in both establishing and manipulating causal models, what role argument plays in this (given also that much of our knowledge derives from the testimony of others; Coady, 1994), and what weight people attach to these. It also seems important to understand how detailed or sparse people's causal models are in many real-world contexts, and how this interacts with argument evaluation. To achieve this, there is a need for integration of research on causal learning, causal inference, and causal argument.



Figure 25.3 Schematic representation of the alignment between causal premises, causal model, and a subjective degree of support. Perfect alignment of all three elements maps onto an equilateral triangle, as shown here, while imperfections give rise to deviations from 60° angles.

(p. 490)

At the same time, as we have stressed throughout this chapter, the study of causal argument offers unique opportunities for insight into causal cognition, and work on causal argument offers methodological approaches that complement lab-based studies in a number of important ways. Examination of real-world argument is possible, not just through structured interviews such as those of Kuhn and colleagues, but also through the direct analysis of corpora of extant speech and text.

Since argumentation is a pervasive linguistic activity and causal reasoning is frequent in arguing, linguistic data for the study of causal reasoning are readily available, for instance in a number of freely accessible linguistic text corpora (for an overview of such corpora, see Xiao, 2008, or Lee, 2010). As Oestermeier and Hesse's (2000) study illustrates, corpus analysis does not require a prior determination of variables, and is thus well suited for explorative research into natural language argumentation. In particular, corpora may be assumed to provide a fairly unbiased empirical basis for the specific research questions probed, because they have been recorded independently of those questions (see Schütze, 2010, p. 117). Thus, language corpora can be used to study linguistic behavior in more or less natural environments.

Linguistic text corpora also come in many shapes and sizes. Some are restricted to written language (and, as in the case of books and articles, may reflect repeated careful editing), while others contain spoken language. Corpora thus not only make available argumentation in more or less natural environments, but also contain argumentation more or less produced "on the fly."

Two general strategies for making use of text corpora can be distinguished. Corpus-driven research uses them in explorative fashion under minimal hypotheses as to the linguistic forms that may be relevant to a given research question (also known as "letting the data speak first"); corpus-based research, in contrast, uses them to verify or falsify specific hypotheses regarding the use of natural language on the basis of extant theories of linguistic forms (see Biber, 2010).

While it is fairly straightforward to use corpora to gain access to data that may well exhibit a larger slice of the real-world use of the language of causality, there are limits to studying causal argumentation in this way. First, an exhaustive list of the linguistic means to express causality is unlikely to be forthcoming, as there are several ways to express a causal connection through coordination with terms not specifically marked for expressing causality (exemplified in this very sentence through the use of "as," which may be read causally, or not). As Oestermeier and Hesse observe, "In verbal arguments, reasons and conclusions can be easily distinguished if connectives like 'because,' 'therefore,' 'so' or other explicit markers are used" (Oestermeier & Hesse, 2000, p. 65). However, the sheer number of such markers testifies to an astonishingly wide variety of linguistic forms that can be employed to express some sort of causal connection, and thus figure in causal reasoning. The perhaps most frequent linguistic form is the subordination of clauses by means of (linguistic rather than logical) conjunctions such as "because," "since," "as," and so on (see Altenberg, 1984; Diessel & Hetterle, 2011). But a variety of causative verbs, adverbs, adjectives, and prepositions can also be used to express causal connections (for an overview, see Khoo, Chan, & Niu, 2002).

Moreover, causality can also be encoded in text organization, and so pertains to organizational principles that are reconstructable only at levels of discourse higher than the sentence level (Achugar & Schleppegrell, 2005; Altenberg, 1984). Not only may causal premises thus be left implicit (see the section "Causal Argument Patterns from Corpus Analysis"), the relevant causal connection itself need not be evident on the linguistic surface. Accordingly, we lack sufficient criteria to trace all causal phenomena contained in a given corpus through software that queries corpora for linguistic forms or structures. So one retains an unknown number of false negatives among the results, and conversely, as is the case with "because" itself, some of the linguistic means for indicating causal relationships also have roles other than simple causal connection.

Nevertheless, corpora make for an excellent tool in studying natural language argumentation. Specifically, we think it worthwhile to examine further the communicative circumstances that are typical of different argument schemes. This would provide an entry point for investigation of the extent to which causal arguments in actual discourse meet the norms of the causal argument schemes they might be thought to manifest. It might, arguably, also provide information about the persuasiveness of different types of causal argument. In particular, the frequency of the causal mechanism argument type (see Oestermeier & Hesse, 2000, and the section "Causal Argument Patterns from Corpus Analysis" in this chapter) may well be indicative of the prevalence of a certain model of causality that (p. 491) laypeople endorse. Corpus data might then, for example, provide an empirical basis for establishing whether the mechanistic argument type is in fact associated with a specific model of causality, and for examining the association of other argument types with specific causal models.
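As a flavor of what a corpus-based query for explicit causal markers looks like, consider the following sketch (ours; a real study would use a tagged corpus and a far richer marker inventory, and, as just noted, would still miss unmarked causal relations and overcount polysemous markers such as "since"):

import re

MARKERS = re.compile(
    r"\b(because|since|therefore|so that|as a result|consequently)\b",
    re.IGNORECASE)

corpus = [
    "He returned to crime because he could not find work.",
    "The vaccine was blamed; signs of autism emerge at the same age.",
    "Unemployment rose, and consequently crime increased.",
]

for sentence in corpus:
    hits = MARKERS.findall(sentence)
    print(hits if hits else "no explicit marker", "->", sentence)
# The second sentence advances a causal claim with no connective at all:
# exactly the kind of false negative that surface queries produce.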


In short, corpus analysis may provide a valuable complement to both the experimental and interview-based methodologies that empirical studies of causal argument have seen.

Conclusions

Causal argument, we hope the reader is now convinced, represents an important topic in its own right: the practical relevance of argumentation to everyday life is enormous, and, theoretically, a good case has been made that argumentation skills are deeply interconnected with reasoning skills (e.g., Hahn & Oaksford, 2007a; Kuhn, 1991; Mercier & Sperber, 2011). Given further the breadth of argument involving causation that can be seen in everyday life (as surveyed in the discussion of typologies), causal argument deserves far more consideration than it has, to date, received. Last but not least, a greater understanding of causal argument, both theoretically and empirically, is likely to provide new impetus to the understanding of causal cognition.

References

Achugar, M., & Schleppegrell, M. J. (2005). Beyond connectors: The construction of cause in history textbooks. Linguistics and Education, 16(3), 298–318.
Ali, N., Chater, N., & Oaksford, M. (2011). The mental representation of causal conditional reasoning: Mental models or causal models. Cognition, 119(3), 403–418.
Altenberg, B. (1984). Causal linking in spoken and written English. Studia Linguistica, 38(1), 20–69.
Anderson, J. R. (1990). Cognitive psychology and its implications. San Francisco: W. H. Freeman.
Bennett, J. (2003). A philosophical guide to conditionals. New York: Oxford University Press.
Biber, D. (2010). Corpus-based and corpus-driven analyses of language variation and use. In B. Heine & H. Narrog (Eds.), The Oxford handbook of linguistic analysis (pp. 159–191). Oxford: Oxford University Press.
Coady, C. A. J. (1994). Testimony. Oxford: Oxford University Press.
Corner, A., Hahn, U., & Oaksford, M. (2011). The psychological mechanism of the slippery slope argument. Journal of Memory and Language, 64, 153–170.
Corner, A., & Hahn, U. (2013). Normative theories of argumentation: Are some norms better than others? Synthese, 190(16), 3579–3610.
Cummins, D. D., Lubart, T., Alksnis, O., & Rist, R. (1991). Conditional reasoning and causation. Memory & Cognition, 19(3), 274–282.
Diessel, H., & Hetterle, K. (2011). Causal clauses: A crosslinguistic investigation of their structure, meaning, and use. In P. Siemund (Ed.), Linguistic universals and language variation (pp. 23–54). Berlin: Mouton de Gruyter.
Dowe, P. (2000). Physical causation. New York: Cambridge University Press.
Edgington, D. (1995). On conditionals. Mind, 104, 235–329.
Elqayam, S., Thompson, V. A., Wilkinson, M. R., Evans, J. St. B. T., & Over, D. E. (2015). Deontic introduction: A theory of inference from is to ought. Journal of Experimental Psychology: Learning, Memory, and Cognition, 41(5), 1516–1532.
Evans, J. St. B. T., & Over, D. E. (2004). If. Oxford: Oxford University Press.
Evans, J. S. B., Barston, J. L., & Pollard, P. (1983). On the conflict between logic and belief in syllogistic reasoning. Memory & Cognition, 11(3), 295–306.
Evans, J., Neilens, H., Handley, S., & Over, D. (2008). When can we say "if"? Cognition, 108(1), 100–116.
Garssen, B. J. (2001). Argument schemes. In F. H. van Eemeren (Ed.), Crucial concepts in argumentation theory (pp. 81–99). Amsterdam: Amsterdam University Press. (p. 492)
Gordon, T. F., Prakken, H., & Walton, D. (2007). The Carneades model of argument and burden of proof. Artificial Intelligence, 171(10), 875–896.
Govier, T. (1982). What's wrong with slippery slope arguments? Canadian Journal of Philosophy, 12(2), 303–316.
Grennan, W. (1997). Informal logic. Kingston: McGill-Queen's University Press.
Griffiths, T. L., & Tenenbaum, J. B. (2005). Structure and strength in causal induction. Cognitive Psychology, 51(4), 334–384.
Hahn, U. (2011). The problem of circularity in evidence, argument, and explanation. Perspectives on Psychological Science, 6(2), 172–182.
Hahn, U. (2014). The Bayesian boom: Good thing or bad? Frontiers in Cognitive Science, 5, Article 765.
Hahn, U., & Hornikx, J. (2015). A normative framework for argument quality: Argumentation schemes with a Bayesian foundation. Synthese, 1–41.
Hahn, U., & Oaksford, M. (2006). A Bayesian approach to informal argument fallacies. Synthese, 152(2), 207–236.
Hahn, U., & Oaksford, M. (2007a). The rationality of informal argumentation: A Bayesian approach to reasoning fallacies. Psychological Review, 114, 704–732.
Hahn, U., & Oaksford, M. (2007b). The burden of proof and its role in argumentation. Argumentation, 21, 39–61.
Hahn, U., & Oaksford, M. (2012). Rational argument. In K. J. Holyoak & R. G. Morrison (Eds.), The Oxford handbook of thinking and reasoning (pp. 277–298). Oxford: Oxford University Press.
Hahn, U., Oaksford, M., & Harris, A. J. L. (2013). Testimony and argument: A Bayesian perspective. In F. Zenker (Ed.), Bayesian argumentation (pp. 15–38). Dordrecht: Springer.
Hamblin, C. L. (1970). Fallacies. London: Methuen.
Hamby, B. (2013). Libri ad nauseam: The critical thinking textbook glut. Paideusis, 21(1), 39–48.
Hansen, H. (2002). The straw thing of fallacy theory: The standard definition of "fallacy." Argumentation, 16(2), 133–155.
Harding, H. F. (1952). The age of danger: Major speeches on American problems. New York: Random House.
Hart, H. L. A., & Honoré, T. (1985). Causation in the law (2nd ed.). Oxford: Oxford University Press.
Hastings, A. C. (1962). A reformulation of the modes of reasoning in argumentation. Unpublished doctoral dissertation, Northwestern University.
Hitchcock, D., & Wagemans, J. (2011). The pragma-dialectical account of argument schemes. In B. Garssen & A. F. Snoeck Henkemans (Eds.), Keeping in touch with pragma-dialectics (pp. 185–205). Amsterdam: John Benjamins.
Honoré, A. (2010). Causation in the law. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy. http://plato.stanford.edu/archives/win2010/entries/causation-law/
Illari, P., & Russo, F. (2014). Causality: Philosophical theory meets scientific practice. Oxford: Oxford University Press.
Inch, E. S., & Warnick, B. H. (2009). Critical thinking and communication: The use of reason in argument (6th ed.). Upper Saddle River, NJ: Pearson.
Keil, F. C. (2003). Folkscience: Coarse interpretations of a complex reality. Trends in Cognitive Sciences, 7(8), 368–373.
Khoo, C., Chan, S., & Niu, Y. (2002). The many facets of the cause-effect relation. In R. Green, C. A. Bean, & S. H. Myaeng (Eds.), The semantics of relationships: An interdisciplinary perspective (pp. 51–70). Dordrecht: Springer Netherlands.
Kienpointner, M. (1992). Alltagslogik: Struktur und Funktion von Argumentationsmustern. Stuttgart-Bad Cannstatt: Friedrich Frommann.
Kuhn, D. (1991). The skills of argument. Cambridge: Cambridge University Press.
Kuhn, D. (2007). Jumping to conclusions: Can people be counted on to make sound judgments? Scientific American Mind (February/March), 44–51.
Kuhn, D., Amsel, E., & O'Loughlin, M. (1988). The development of scientific thinking skills. San Diego, CA: Academic Press.
Lee, D. Y. W. (2010). What corpora are available? In M. McCarthy & A. O'Keeffe (Eds.), Corpus linguistics (pp. 107–121). London: Routledge.
Lewis, D. (1973). Counterfactuals. Oxford: Blackwell.
Lombrozo, T. (2007). Simplicity and probability in causal explanation. Cognitive Psychology, 55(3), 232–257.
Mackie, J. L. (1965). Causes and conditions. American Philosophical Quarterly, 2(4), 245–264.
Maio, G. R., Hahn, U., Frost, J.-M., Kuppens, Rehman, T. N., & Kamble, S. (2014). Social values as arguments: Similar is convincing. Frontiers in Psychology.
Maio, G. R., Pakizeh, A., Cheung, W., & Rees, K. J. (2009). Changing, priming, and acting on values: Effects via motivational relations in a circular model. Journal of Personality and Social Psychology, 97, 699–715.
Marcus, S. L., & Rips, L. J. (1979). Conditional reasoning. Journal of Verbal Learning and Verbal Behavior, 18(2), 199–223.
Mercier, H., & Sperber, D. (2011). Why do humans reason? Arguments for an argumentative theory. Behavioral and Brain Sciences, 34(2), 57–74.
Nosofsky, R. M. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General, 115, 39–57.
Oaksford, M., & Chater, N. (1994). A rational analysis of the selection task as optimal data selection. Psychological Review, 101(4), 608–631.
Oaksford, M., & Chater, N. (2010a). Cognition and conditionals: An introduction. In M. Oaksford & N. Chater (Eds.), Cognition and conditionals: Probability and logic in human thinking (pp. 3–36). Oxford: Oxford University Press.
Oaksford, M., & Chater, N. (2010b). Causation and conditionals in the cognitive science of human reasoning. Open Psychology Journal, 3, 105–118.
Oaksford, M., Chater, N., & Larkin, J. (2000). Probabilities and polarity biases in conditional inference. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26(4), 883–899.
Oaksford, M., & Hahn, U. (2004). A Bayesian approach to the argument from ignorance. Canadian Journal of Experimental Psychology/Revue canadienne de psychologie expérimentale, 58(2), 75.
Oestermeier, U., & Hesse, F. W. (2000). Verbal and visual causal arguments. Cognition, 75(1), 65–104.
Over, D. E., Hadjichristidis, C., Evans, J. S. B. T., Handley, S. J., & Sloman, S. A. (2007). The probability of causal conditionals. Cognitive Psychology, 54(1), 62–97.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. San Mateo, CA: Morgan Kaufmann.
Pearl, J. (2000/2009). Causality: Models, reasoning, and inference (2nd ed.). Cambridge: Cambridge University Press.
Perelman, C., & Olbrechts-Tyteca, L. (1969). The new rhetoric (J. Wilkinson & P. Weaver, Trans.). Notre Dame, IN: University of Notre Dame Press. (p. 493)
Prakken, H., & Renooij, S. (2001). Reconstructing causal reasoning about evidence: A case study. In B. Verheij, A. R. Lodder, R. P. Loui, & A. J. Muntjewerff (Eds.), Legal knowledge and information systems. JURIX 2001: The fourteenth annual conference (pp. 131–142). Amsterdam: IOS Press.
Quinn, S., & Markovits, H. (1998). Conditional reasoning, causality, and the structure of semantic memory: Strength of association as a predictive factor for content effects. Cognition, 68(3), B93–B101.
Rahwan, I., & Simari, G. R. (2009). Argumentation in artificial intelligence. New York: Springer.
Rahwan, I., Zablith, F., & Reed, C. (2007). Laying the foundations for a world wide argument web. Artificial Intelligence, 171(10–15), 897–921.
Ramsey, F. P. (1931). Truth and probability. In The foundations of mathematics and other logical essays.
Sá, W. C., Kelley, C. N., Ho, C., & Stanovich, K. E. (2005). Thinking about personal theories: Individual differences in the coordination of theory and evidence. Personality and Individual Differences, 38(5), 1149–1161.
Salmon, W. (1984). Scientific explanation and the causal structure of the world. Princeton, NJ: Princeton University Press.
Savage, L. J. (1954). The foundations of statistics. New York: Wiley.
Schupbach, J. N., & Sprenger, J. (2011). The logic of explanatory power. Philosophy of Science, 78(1), 105–127.
Schütze, C. T. (2010). Data and evidence. In K. Brown, A. Barber, & R. J. Stainton (Eds.), Concise encyclopedia of philosophy of language and linguistics (pp. 117–123). Amsterdam: Elsevier.
Schwartz, S. H. (1992). Universals in the content and structure of values: Theoretical advances and empirical tests in 20 countries. Advances in Experimental Social Psychology, 25, 1–65.
Schwartz, S. H., & Boehnke, K. (2004). Evaluating the structure of human values with confirmatory factor analysis. Journal of Research in Personality, 38, 230–255.
Singmann, H., Klauer, K. C., & Over, D. (2014). New normative standards of conditional reasoning and the dual-source model. Frontiers in Psychology, 5.
Sloman, S., & Lagnado, D. (2005). Do we "do"? Cognitive Science, 29, 5–39.
Toulmin, S. E. (1958/2003). The uses of argument (updated ed.). Cambridge: Cambridge University Press.
van Eemeren, F. H., & Grootendorst, R. (2004). A systematic theory of argumentation: The pragma-dialectical approach. Cambridge: Cambridge University Press.
van Eemeren, F. H., & Kruiger, T. (1987). Identifying argumentation schemes. In Argumentation: Perspectives and approaches (pp. 70–81).
van Fraassen, B. (1989). Laws and symmetry. Oxford: Oxford University Press.
van Gelder, T. J., Bissett, M., & Cumming, G. (2004). Cultivating expertise in informal reasoning. Canadian Journal of Experimental Psychology, 58, 142–152.
Verheij, B. (2003). Artificial argument assistants for defeasible argumentation. Artificial Intelligence, 150(1), 291–324.
Verplanken, B., & Holland, R. W. (2002). Motivated decision making: Effects of activation and self-centrality of values on choices and behavior. Journal of Personality and Social Psychology, 82, 434–447.
Verschueren, N., Schaeken, W., & d'Ydewalle, G. (2005). A dual-process specification of causal conditional reasoning. Thinking & Reasoning, 11(3), 239–278.
Volokh, E. (2003). The mechanisms of the slippery slope. Harvard Law Review, 116, 1026–1137.
von Aufschaiter, C., Erduran, S., Osborne, J., & Simon, S. (2008). Arguing to learn and learning to argue: Case studies of how students' argumentation relates to their scientific knowledge. Journal of Research in Science Teaching, 45(1), 101–131.

Prakken, H., & Renooij, S. (2001). Reconstructing causal reasoning about evidence: A case study. In B. Verheij, A. R. Lodder, R. P. Loui, & A. J. Muntjewerff (Eds.), Legal knowledge and information systems. JURIX 2001: The fourteenth annual conference (pp. 131–142). Amsterdam: IOS Press.
Quinn, S., & Markovits, H. (1998). Conditional reasoning, causality, and the structure of semantic memory: Strength of association as a predictive factor for content effects. Cognition, 68(3), B93–B101.
Rahwan, I., & Simari, G. R. (2009). Argumentation in artificial intelligence. New York: Springer.
Rahwan, I., Zablith, F., & Reed, C. (2007). Laying the foundations for a world wide argument web. Artificial Intelligence, 171(10–15), 897–921.
Ramsey, F. P. (1931). Truth and probability. In The foundations of mathematics and other logical essays. London: Kegan Paul.
Sá, W. C., Kelley, C. N., Ho, C., & Stanovich, K. E. (2005). Thinking about personal theories: Individual differences in the coordination of theory and evidence. Personality and Individual Differences, 38(5), 1149–1161.
Salmon, W. (1984). Scientific explanation and the causal structure of the world. Princeton, NJ: Princeton University Press.
Savage, L. J. (1954). The foundations of statistics. New York: Wiley.
Schupbach, J. N., & Sprenger, J. (2011). The logic of explanatory power. Philosophy of Science, 78(1), 105–127.

Schütze, C. T. (2010). Data and evidence. In K. Brown, A. Barber, & R. J. Stainton (Eds.), Concise encyclopedia of philosophy of language and linguistics (pp. 117–123). Amsterdam: Elsevier.
Schwartz, S. H. (1992). Universals in the content and structure of values: Theoretical advances and empirical tests in 20 countries. Advances in Experimental Social Psychology, 25, 1–65.
Schwartz, S. H., & Boehnke, K. (2004). Evaluating the structure of human values with confirmatory factor analysis. Journal of Research in Personality, 38, 230–255.
Singmann, H., Klauer, K. C., & Over, D. (2014). New normative standards of conditional reasoning and the dual-source model. Frontiers in Psychology, 5.
Sloman, S., & Lagnado, D. (2005). Do we “do”? Cognitive Science, 29, 5–39.
Toulmin, S. E. (1958/2003). The uses of argument (updated ed.). Cambridge: Cambridge University Press.
van Eemeren, F. H., & Grootendorst, R. (2004). A systematic theory of argumentation: The pragma-dialectical approach. Cambridge: Cambridge University Press.
van Eemeren, F. H., & Kruiger, T. (1987). Identifying argumentation schemes. In Argumentation: Perspectives and approaches (pp. 70–81).
van Fraassen, B. (1989). Laws and symmetry. Oxford: Oxford University Press.
van Gelder, T. J., Bissett, M., & Cumming, G. (2004). Cultivating expertise in informal reasoning. Canadian Journal of Experimental Psychology, 58, 142–152.
Verheij, B. (2003). Artificial argument assistants for defeasible argumentation. Artificial Intelligence, 150(1), 291–324.
Verplanken, B., & Holland, R. W. (2002). Motivated decision making: Effects of activation and self-centrality of values on choices and behavior. Journal of Personality and Social Psychology, 82, 434–447.
Verschueren, N., Schaeken, W., & d’Ydewalle, G. (2005). A dual-process specification of causal conditional reasoning. Thinking & Reasoning, 11(3), 239–278.
Volokh, E. (2003). The mechanisms of the slippery slope. Harvard Law Review, 116, 1026–1137.
von Aufschnaiter, C., Erduran, S., Osborne, J., & Simon, S. (2008). Arguing to learn and learning to argue: Case studies of how students’ argumentation relates to their scientific knowledge. Journal of Research in Science Teaching, 45(1), 101–131.


Walsh, C. R., Hahn, U., & DeGregorio, L. (2009). Severe outcomes and their influence on judgments of causation. In Proceedings of the 31st annual meeting of the Cognitive Science Society (pp. 550–554).
Walton, D. N. (1988). Burden of proof. Argumentation, 2(2), 233–254.
Walton, D. N. (1996). Argument structure: A pragmatic theory. Toronto Studies in Philosophy. Toronto: University of Toronto Press.
Walton, D. N. (2000). Scare tactics: Arguments that appeal to fear and threats. Dordrecht: Kluwer.
Walton, D. N. (2010). Why fallacies appear to be better arguments than they are. Informal Logic, 30(2), 159–184.
Walton, D. N., Reed, C., & Macagno, F. (2008). Argumentation schemes. Cambridge: Cambridge University Press.
Wason, P. C. (1968). Reasoning about a rule. Quarterly Journal of Experimental Psychology, 20, 273–281.
Weisberg, J. (2009). Locating IBE in the Bayesian framework. Synthese, 167(1), 125–143.
Willingham, D. (2007). Critical thinking: Why is it so hard to teach? American Educator, 31(2), 8–19.
Xiao, R. (2008). Well-known and influential corpora. In A. Lüdeling & M. Kytö (Eds.), Corpus linguistics: An international handbook (Vol. 1, pp. 383–457). Berlin; New York: de Gruyter. (p. 494)

Notes:
(1.) This duality—of reasons as causes for belief, and causes as reasons in argumentation—is not limited to English, and can also be discerned in other languages such as German “weil,” Swedish “därför,” French “parce que,” Italian “perché,” Spanish “porque,” Polish “ponieważ,” Russian “potomú čto,” Chinese “yīn,” Japanese “kara,” and so on.
(2.) Mackie’s statement reflects a widely shared understanding of causes as partial conditions of their contingent effects, and seeks to convey—in a great hurry—that (i) causes bring their effects about only under suitable background conditions, so that causes alone are not sufficient for their effects; (ii) that an effect cannot occur without its cause, so that causes are nevertheless necessary; (iii) that cause plus background conditions (and perhaps other things) together constitute a condition, C, yielding the effect—thus making C a sufficient condition, since C cannot be a necessary condition, (iv) as contingent effects can be alternatively conditioned.


(3.) These three types, and with them the critical questions associated with each, may partially overlap. For instance, as Hitchcock and Wagemans (2011, p. 193) point out, a fever can be viewed both as an effect and as a symptom of the infection that causes it. Hence, an argument from effect to cause—here, from fever to infection—may instantiate the symptomatic or the causal argumentation scheme, and possibly even both at the same time.
(4.) For a critical evaluation of the import of legal concepts into argumentation theory, in particular the central notion of “burden of proof,” see Hahn and Oaksford (2007b).
(5.) Clause 4 gives three different kinds of conditionals, or rules of inference or warrant, each of which is hedged by a ceteris paribus clause. The necessary condition is: if Si would not occur, then Sj would not occur; the sufficient condition states: if Si would occur, then Sj would occur; and the INUS condition reads: if Si would occur within a set of conditions, each of which is necessary for the occurrence of Sj, then Sj would occur (see note 2 on Mackie’s INUS condition).
(6.) It should be noted in this context that Sloman and Lagnado (2005) have argued for qualitative differences between conditional and causal reasoning. However, Oaksford and Chater (2010b) argue that the seeming empirical differences observed by Sloman and Lagnado (2005) are due to inadvertent differences in causal strength. The results of Ali et al. (2011) are in keeping with that suggestion.
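The three warrants in note 5 can also be stated compactly. The following is a minimal LaTeX rendering using the note's own Si/Sj notation; the plain arrow glosses the note's subjunctive "would," which is an interpretive choice rather than the chapter's own symbolism, and each schema is still understood as holding only ceteris paribus:

\[
\text{necessary: } \neg S_i \rightarrow \neg S_j; \qquad
\text{sufficient: } S_i \rightarrow S_j; \qquad
\text{INUS: } (S_i \wedge C_1 \wedge \dots \wedge C_n) \rightarrow S_j,
\]
where each conjunct \(C_k\) is itself necessary for the occurrence of \(S_j\) (cf. note 2 on Mackie's INUS condition), and each arrow is read as a ceteris paribus subjunctive conditional.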

Ulrike Hahn, Department of Psychological Sciences, Birkbeck, University of London, London, England, UK
Roland Bluhm, Institute of Philosophy and Political Science, TU Dortmund University, Dortmund, Germany
Frank Zenker, Department of Philosophy & Cognitive Science, Lund University, Lund, Sweden


Causality in Decision-Making

Causality in Decision-Making   York Hagmayer and Philip Fernbach The Oxford Handbook of Causal Reasoning Edited by Michael R. Waldmann Print Publication Date: Jun 2017 Subject: Psychology, Cognitive Psychology Online Publication Date: May 2017 DOI: 10.1093/oxfordhb/9780199399550.013.27

Abstract and Keywords
Although causality is rarely discussed in texts on decision-making, decisions often depend on causal knowledge and causal reasoning. This chapter reviews what is known about how people integrate causal considerations into their choice processes. It first introduces causal decision theory, a normative theory of choice based on the idea that rational decision-making requires considering the causal structure underlying a decision problem. It then provides an overview of empirical studies that explore how causal assumptions influence choice and test predictions derived from causal decision theory. Next it reviews three descriptive theories that integrate causal thinking into decision-making, each in a different way: the causal model theory of choice, the story model of decision-making, and attribution theory. It discusses commonalities and differences between the theories and the role of causality in other decision-making theories. It concludes by noting challenges that lie ahead for research on the role of causal reasoning in decision-making.
Keywords: causal decision-making, causal model theory of choice, causal narratives, story model, attribution theory

Decisions generally concern potential actions (e.g., going on a diet) or objects that afford actions (e.g., choosing groceries to prepare for dinner). There are many ways these decisions are made (see Koehler & Harvey, 2007, for an overview). Some decisions are guided by social, moral, or legal norms (e.g., not to lie), some are guided by emotions (e.g., avoidance based on disgust), but many are based on the outcomes that we expect to result from actions afforded by the decision options. For example, we choose to diet because we expect to lose weight, and we decide on a set of ingredients at the grocery store because we believe they will combine to make a good meal. Which outcomes actually result depends on underlying causal mechanisms. Whether dieting results in weight loss depends on a complex network of physiological and psychological processes. For instance, dieting may cause cravings for sweet and fatty food, resulting in binge eating, disturbed eating patterns, and sometimes even in weight gain (Howard & Porzelius, 1999). Causal knowledge about the system in question can help us to make more adaptive decisions.


Knowing that dieting sometimes triggers binging allows a dieter to structure his or her behavior and environment to avoid this negative consequence.
In order to predict the consequences of different options, it is sometimes also necessary to analyze the causes of the current situation. Take a general practitioner who is consulted by an emaciated patient. Serious underweight can be caused by many factors, including anorexia, diabetes, and cancer. Depending on the cause, the same intervention can have different consequences. A high-calorie diet could be beneficial for a cancer patient but harmful for an untreated diabetic. Sometimes diagnosing the causes of a situation is sufficient to predict which decision option will yield the best consequences. Consider a physician who has to decide how to manage a patient's cold. Once the physician has determined (p. 496) whether the symptoms are due to a viral or a bacterial infection, the decision becomes obvious.
These examples show that causal knowledge supports decision-making in two ways: by allowing us to predict the consequences of different actions under the given circumstances, and by helping to make diagnoses that suggest which interventions will be effective. Thus causal knowledge, and inferences based on this knowledge, seem to be critical for good decision-making.
This reliance on causality in decision-making is potentially problematic, because research into causal explanations has shown that people often have only rough, skeletal knowledge about causal mechanisms (Keil, 2008). People often do not know how everyday objects like bicycles or can openers work (Lawson, 2006; Rozenblit & Keil, 2002), or how favored public policies will lead to beneficial consequences (Fernbach, Rogers, Sloman, & Fox, 2013). Although some people have rather elaborate lay theories about certain domains like medicine and biology, these theories are often inconsistent with scientific consensus and tend to be incomplete (Furnham, 1988; Murdock, 1980). Therefore people's causal knowledge allows only for very rough, and sometimes incorrect, predictions of consequences. Given that our causal knowledge is incomplete or sometimes wrong, it might be futile or even harmful to try to base decisions on causal considerations.
The remainder of this chapter is organized into four parts. First, we explore the normative question of whether the causal structure underlying a decision problem should be considered when making a decision, and we describe causal decision theory, which claims that causal considerations are necessary for rational decision-making. Second, we present evidence from empirical studies investigating how people use causality in decision-making. Third, we introduce three descriptive models from cognitive psychology, which assume that causal reasoning is central to decision-making. Finally, we end by discussing open questions and challenges that lie ahead for causal decision-making research.



Should Decisions Be Based on Causal Considerations?

Economists and psychologists have argued that rational decision-makers' choices must conform to the recommendations of expected utility theory (Savage, 1954; von Neumann & Morgenstern, 1944). The theory distinguishes between actions (Ai), outcomes (Oji), utilities of outcomes (U[Oji]), and the "state of the world" (Sj), which encompasses all the variables that affect the outcome besides the action. Figure 26.1 illustrates the basic structure (p. 497) of the decision situation and shows a table demonstrating an example. The theory assumes that every outcome can be assigned a utility, which represents the usefulness of the outcome for the decision-maker. For simplicity, assume that the costs of the action (i.e., the time and resources required to perform the action) are already included in the utility of the outcome. The theory also assumes that every outcome can be assigned a probability P(Oji), which represents the uncertainty that a particular outcome will result after the action has been taken. Outcomes can be uncertain for several reasons: (a) the state of the world, which determines the outcome, is not known for sure [P(Sj) < 1], and/or (b) it is not known for sure whether the state of the world, in conjunction with the action taken, is followed by the outcome [P(Oji|Sj, Ai) < 1]. Both probabilities reflect the current lack of knowledge or evidence concerning the presence of the outcome given a particular action. To make a decision, the expected utility (EU) of every action Ai needs to be calculated by multiplying the probability of each possible outcome by the utility of the respective outcome and summing the products. Formally:

EU(Ai) = Σj P(Oji|Ai) × U(Oji)    (1)

Figure 26.1 Decision-making from the perspective of expected utility theory. Action A and the state of the world S jointly determine the probability of potential outcomes O. Outcomes have a certain utility for the decision-maker.


The theory dictates that the action with the highest expected utility should be performed. That is, choice should be based on the principle of maximizing expected utility (MEU principle).
The table in Figure 26.1 provides an example of how expected utilities are calculated. Imagine that a 40-year-old woman has to decide whether to start breast cancer screening (Action 1) or not (Action 2). The state of the world (S) is either the presence of breast cancer (S1) or the absence of cancer (S2). The major outcomes are to die early from cancer (O1) or to live long (O2). Obviously, the utility of a long life is much higher than the utility of dying early. How big the difference is depends on the decision-maker's happiness in life. The screening procedure has some cost, because it takes time, creates anxiety, and so on. Therefore, the utilities of the outcomes after screening are lower than those of the corresponding outcomes without screening. The outcome with the lowest utility is usually assigned zero; all other outcomes are assigned utilities that reflect how beneficial they are for the decision-maker. In the example shown, we assume that a long life has a much higher utility than the costs (e.g., time, pain, anxiety) of cancer screening.
The benefit of screening results from the early detection of cancer. When cancer is detected in an early stage, it can be treated more successfully. Hence the probability of living long is higher for women who are screened. Research indicates that the lifetime probability of a woman getting breast cancer is roughly 12% (P[S1] = .12; www.cancer.gov), and roughly 1 in 4 women who contract breast cancer die from it, which means that overall about 3% of women die from breast cancer. Screening starting at age 40 reduces the risk of dying from cancer by about 20%, that is, from 3% to about 2.4%. Based on these probabilities and the utilities chosen to illustrate the example in the figure, screening has a slightly higher expected utility than no screening. Therefore, screening should be chosen according to the principle of maximizing expected utility. Note that the decision could change if the assigned utilities were changed.
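To make the arithmetic concrete, here is a minimal Python sketch of the screening calculation via Equation (1). The probabilities follow the chapter's example; the utility values and the screening cost are illustrative placeholders, since the figure's actual values are not reproduced here (the chapter folds action costs into the outcome utilities; the cost is subtracted separately below only for transparency):

# Expected utility of breast cancer screening, following Equation (1).
P_CANCER = 0.12             # lifetime probability of breast cancer, P(S1)
P_DIE_GIVEN_CANCER = 0.25   # about 1 in 4 women with breast cancer die from it

p_die_no_screen = P_CANCER * P_DIE_GIVEN_CANCER   # = 0.03
p_die_screen = p_die_no_screen * (1 - 0.20)       # screening cuts the risk by ~20% -> 0.024

U_LIVE_LONG = 1000.0   # assumed utility of a long life
U_DIE_EARLY = 0.0      # lowest outcome is assigned zero
SCREEN_COST = 1.0      # assumed utility cost of screening (time, anxiety, ...)

def expected_utility(p_die, cost=0.0):
    """EU = sum over outcomes of P(outcome) * U(outcome), minus any action cost."""
    return p_die * U_DIE_EARLY + (1 - p_die) * U_LIVE_LONG - cost

eu_screen = expected_utility(p_die_screen, SCREEN_COST)   # = 975.0
eu_no_screen = expected_utility(p_die_no_screen)          # = 970.0
print(f"EU(screening)    = {eu_screen:.1f}")
print(f"EU(no screening) = {eu_no_screen:.1f}")
# With these assumed utilities, screening has the slightly higher expected
# utility, but a larger screening cost or a smaller value of U_LIVE_LONG can
# flip the decision, as the chapter notes.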
Expected utility theory has intuitive appeal and is considered the gold standard for normative decision-making theory. Proponents of causal decision theory, however, criticize expected utility theory because it does not explicitly consider the causal basis of outcome probabilities (Joyce, 1999; Lewis, 2006; Maher, 1987; Nozick, 1993; Skyrms, 1982). Proponents of causal decision theory point out that a probabilistic relation between an action and an outcome sometimes reflects a spurious, non-causal relationship. According to causal decision theorists, this distinction is crucial for good decision-making because an action only increases the decision-maker's utility if the action causally affects the outcome, but not when it is spuriously related. Thus, decisions should not be based on observed statistical relationships, but rather on the underlying causal structure. Nozick (1993) elaborated this idea by proposing that decisions should be based on causal expected utilities (CEU) rather than evidential expected utilities (EEU).
To make this more concrete, consider the following example, which is related to the preceding breast cancer example. Women who worry a lot about breast cancer tend to have a greater likelihood of dying from breast cancer than women who do not. There is little evidence that worry causally relates to death. Instead, this correlation is spurious. Those who worry more are more likely to actually have cancer and hence have a higher probability of dying. (For instance, worry may be triggered by unusual signals from the body.) As a consequence, it does not make sense to stop worrying in order to increase the chances of survival.
Actions, the state of the world, and the resulting outcome may be related through various causal structures. Figure 26.2 shows four possible causal structures. First, actions may generate the outcome (Figure 26.2a). For example, dieting causes weight loss. Second, actions may not cause an outcome, but they may enable other variables, which are part of the state of the world, to generate the outcome (Figure 26.2b). For example, buying a lottery ticket does not cause us to win, but it enables us to win. Our purchase has no influence on which ticket is drawn as the winner. It is the draw that determines whether we win a lot of money. Third, actions may not directly affect the outcome, but instead affect it indirectly by influencing the state of the world (Figure 26.2c). For example, buying health insurance does not make us healthy. But having health coverage (the resulting state of the world) causes us to get better access to health care, which in turn (p. 498) results in better health. Finally, as in the preceding example, actions and outcomes may be spuriously related (Figure 26.2d). This is the case when the (p. 499) action and the outcome are caused by some other factor. Due to this common cause, the action and the outcome are statistically but not causally related to each other.1

Figure 26.2 Decision-making from the perspective of causal decision-making theories. Action A, the state of the world S, and the outcome O are causally related to each other. They may be related through different causal structures. Note that all structures (a)–(d) entail that there is a probabilistic relation between action and outcome, that is, the outcome is probabilistically dependent on the action.
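The difference between the four structures can be made explicit in code. The following Python sketch is a toy illustration, not anything from the chapter: each structure is encoded as a directed graph, and choosing A can make a causal difference to O exactly when a directed path from A to O exists. Only the common-cause structure (d) fails this test, even though A and O are correlated in all four:

# The four causal structures of Figure 26.2 as directed graphs.
STRUCTURES = {
    "a: direct cause":   {"A": ["O"]},
    "b: enabler":        {"A": ["O"], "S": ["O"]},   # A and S jointly produce O
    "c: indirect cause": {"A": ["S"], "S": ["O"]},
    "d: common cause":   {"C": ["A", "O"]},          # C drives both A and O
}

def has_directed_path(graph, source, target):
    """Depth-first search for a directed path from source to target."""
    stack, seen = [source], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return False

for name, graph in STRUCTURES.items():
    print(f"{name:18s} -> choosing A changes O: {has_directed_path(graph, 'A', 'O')}")
# Only in (a)-(c) does intervening on A change the probability of O;
# in (d) the observed A-O correlation is spurious.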


Figure 26.2 also outlines formalizations for calculating causal expected utilities (CEU) given the different causal structures. As the formalizations show, the causal expected utilities are calculated differently, depending on the underlying causal structure. In the first three cases, causal expected utility can be calculated by

CEU(Ai) = Σj P(Oji|Ai) × U(Oji)    (2)

hence by taking into account the statistical dependence between the action and all possible outcomes. However, it would be wrong to do so when the action and the outcome are spuriously related. In this case, the outcome and the action are independent of each other if the action is deliberately chosen [formally, P(Oji|Ai) = P(Oji)]. Therefore the causal expected utility has to be calculated as

CEU(Ai) = Σj P(Oji) × U(Oji)    (3)

This equation takes into account that taking the action will not change the probability of the outcome.
In order to predict the likelihood of the outcome correctly, it is important to distinguish between choosing the action and observing the action. When there is a spurious relation between an action and an outcome, observing the action is predictive of the outcome, but choosing the action does not increase the likelihood of the outcome. Consider again the example of worry and breast cancer. If you observe someone worrying about breast cancer, this should raise your subjective probability that she has breast cancer, and, as a consequence, your subjective probability that she will die from breast cancer. But this is not what is relevant for decision-making. For decision-making, the consequences of making a choice are relevant. If she were to decide to stop worrying, her chances of dying from breast cancer would not be affected. Figure 26.3 illustrates this point. The left panel shows what can be inferred about the likelihood of dying from cancer from observing worry. The right panel shows what will happen if the same person decides to worry or not to worry. Deciding entails that the action is now determined by choice. It is no longer determined by the presence or absence of breast cancer. By virtue of choosing the action, the action becomes independent of breast cancer. Worry is no longer predictive of breast cancer, and therefore worry is no longer predictive of the likelihood of dying from breast cancer.
The distinction between observing and choosing an action can be formally captured by a distinction made in causal Bayes net theories between observing the value of a variable and an intervention that sets the variable to a specific value (Pearl, 2000; Spirtes, Glymour, & Scheines, 1993/2000; Rottman, Chapter 6 in this volume). Interventions render the intervened-on variable independent of all other variables (except the variable's effects; Pearl, 2000). Thus the intervened-on variable becomes independent of its usual causes. Observed variables, by contrast, are probabilistically related to their causes. Deliberate choices are like interventions in that they exogenously determine the action and render it independent of other factors (Hagmayer & Sloman, 2009; Sloman, Fernbach, & Hagmayer, 2010). For instance, deliberately choosing not to worry renders worry independent of its normal causes (e.g., the presence of cancer).
To summarize, causal decision theory argues that knowledge of the statistical relationship between an action and a desired outcome is sometimes insufficient for making the best choice. In order to make an optimal decision, decision-makers have to understand the causal structure that relates actions, potential states of the world, and outcomes. Moreover, choice should be treated as an intervention, not an observation. Based on assumptions about the underlying causal structure, causal expected utilities can be calculated that allow the decision-maker to identify and choose the action that increases the probability of obtaining the desired outcomes the most. These points are not necessarily inconsistent with expected utility theory. Instead, they show that reasonable utilities can only be generated if one has knowledge of the causal structure relating the chosen action and the desired outcome. Thus, causal knowledge is central to rational decision-making.
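The observation/intervention contrast can be illustrated numerically with the worry example under a common-cause structure (cancer causes worry, and cancer causes death). The following Python sketch uses hypothetical parameter values chosen only to make the contrast visible; they are not figures from the chapter:

# Observing versus intervening on worry under a common-cause structure.
P_CANCER = 0.12
P_WORRY_GIVEN_CANCER = 0.8
P_WORRY_GIVEN_NO_CANCER = 0.2
P_DIE_GIVEN_CANCER = 0.25
P_DIE_GIVEN_NO_CANCER = 0.0   # dying *of breast cancer* requires having it

# Observation: seeing someone worry is evidence about cancer (Bayes' rule).
p_worry = (P_CANCER * P_WORRY_GIVEN_CANCER
           + (1 - P_CANCER) * P_WORRY_GIVEN_NO_CANCER)
p_cancer_given_worry = P_CANCER * P_WORRY_GIVEN_CANCER / p_worry
p_die_given_observed_worry = p_cancer_given_worry * P_DIE_GIVEN_CANCER

# Intervention, do(worry): choosing to worry (or not) cuts the link from
# cancer to worry, so the chosen action carries no evidence about cancer.
p_die_given_do_worry = P_CANCER * P_DIE_GIVEN_CANCER

print(f"P(die | observe worry) = {p_die_given_observed_worry:.3f}")  # ~0.088
print(f"P(die | do(worry))     = {p_die_given_do_worry:.3f}")        # 0.030
# Observing worry raises the probability of dying from breast cancer, but
# deciding to worry (or to stop worrying) leaves that probability unchanged.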

Do People Engage in Causal Decision-Making?

Causal decision theory is a normative theory that specifies how a rational decision-maker should determine the utility of decision options. From a psychological perspective, we can ask whether it describes how people actually make decisions. If causal decision theory is interpreted as a descriptive theory, it makes two basic predictions. First, decision-makers should analyze the causal structure (p. 500) underlying a decision problem in order to be able to predict the consequences resulting from the available options. Second, decision-makers should base their decisions on the causal expected utilities of the given options. This implies that, when predicting consequences, they should treat their choice as an intervention, which entails that actions become independent of other causal factors that also affect the outcome. We will explore whether these predictions are supported by empirical evidence in the following sections.



Figure 26.3 Difference between deliberately choosing an action and observing an action. Solid arrows indicate causal relations, dashed arrows indicate evidential, statistical relations.

Do People Analyze the Causal Structure Underlying a Decision Problem?

As we have explained, people should analyze the underlying causal structure to understand the consequences of different decision options. For example, when choosing between different treatments for depression, it is important to know whether the depression is a reaction to some other medical or mental problem. If it is a reaction, addressing the other problem will be more effective for ameliorating depression than addressing depression directly. Therefore, the causes of depression have to be analyzed before the consequences of the different treatment options can be predicted.
There is good evidence that people spontaneously search for causes when they are confronted with an unexpected, threatening, or norm-violating event (Kahneman, 2011; Kahneman & Miller, 1986; Weiner, 1985). In these cases they use the available information and background knowledge to infer the causes of what happened. Research on naturalistic decision-making reports similar findings. When decision-makers recognize a situation as a familiar decision problem, they know what to do. But when the situation is novel, they initiate a causal analysis (Klein, 1998; Zsambok & Klein, 1997).
Other research has explored whether decision-makers analyze the causal structure connecting actions and outcomes. Hagmayer and Sloman (2009, Experiment 2) presented participants with a statistical relation between an action and an outcome (e.g., "people who watch movies in their original language speak better English than people who do not"), and asked whether they would recommend the action to (p. 501) a friend who is interested in achieving the outcome. After participants made their recommendation, they were asked whether the relation was due to a direct causal relation between the action and the outcome or to a common cause influencing both the action and the outcome. Participants who believed in a common cause did not recommend taking the action, while people who believed in a direct causal relation did. The same pattern held for relations that participants had rated as plausible and as implausible in a pretest. This indicates that participants probably considered the underlying causal structure before making their choice and did not rely merely on the plausibility of the given relation.
Hagmayer and Meder (2013) investigated whether participants analyze causal structure when making repeated decisions. Participants were asked to maximize the vaccine produced by batches of genetically modified bacteria. To do so, they could choose between different trigger substances. In the "common cause" condition, the trigger with the highest payoff affected two relevant genes directly. In the other, "chain" condition, the same trigger affected only one gene, which in turn activated the second gene. Participants not only learned which trigger maximized the outcome; many also learned about the causal structure connecting actions to outcomes. Importantly, assumptions about causal structure strongly affected later decisions when new options became available. There is, however, conflicting evidence from other studies on repeated decision-making and the control of dynamic systems, which show that many participants do not learn about causal structure (Hagmayer et al., 2010; Osman, 2010; see also Osman, Chapter 16 in this volume).
In sum, the first prediction of causal decision theory seems to be supported. People analyze the causal structure underlying a decision problem when the situation is sufficiently novel to warrant a causal analysis and when they have sufficient background knowledge to infer the underlying causal structure.

Do People Base Their Decisions on Causal or Evidential Expected Utilities?

According to causal decision theory, people should base their decisions on causal expected utilities rather than evidential expected utilities. To do so, they have to consider the causal relation between an action and an outcome and not merely the statistical relation, which could be spurious. Several experiments have investigated this prediction (Hagmayer & Sloman, 2009; Robinson, Sloman, Hagmayer, & Hertzog, 2010). Robinson and colleagues (2010) presented participants with a number of economic games based on the Prisoner's Dilemma. In one scenario, participants were asked to imagine being a currency trader at the Bank of Japan having to decide whether to buy dollars or euros. A competitor would do the same. Depending on their choices, different profits would result. The expected payoffs of the two choices were presented as a table (see Table 26.1). Participants were also told that in the past, they and their competitor had made the same decision 90% of the time. Based on this information, the evidential expected value of buying dollars would be substantially higher ($905 million) than the expected value of buying euros ($210 million). Disregarding the information about statistical relations in the past, however, it would be better to buy euros regardless of the competitor's decision ($1.2 billion vs. $1 billion when the competitor buys dollars, $100 million vs. $50 million when the other person purchases euros). Participants were also told about the underlying causal structure. In one condition they were told that the relation between their own and their competitor's

decision was due to a common cause: both base their decisions on the same economic data, which has caused them to make the same decision in the past. In a second condition, they were told that the relation was due to the competitor waiting to see the participant's decision before deciding. Hence, the participant's decision directly causes the competitor's choice. In a third, control condition, no information about the underlying causal structure was provided; the respective assumptions were assessed after participants made their choice. The results showed an effect of causal beliefs upon decisions. Participants were more likely to buy dollars when they believed their choice was a direct cause of their competitor's decision than when there was a common cause. These findings and the results of (p. 502) other experiments (DeKwaadsteniet et al., 2010; Flores, Cobos, Lopez, & Godoy, 2014; Hagmayer & Sloman, 2009; Yopchick & Kim, 2009) show that decision-makers tend to maximize causal expected utility rather than evidential expected utility.

Table 26.1 Matrix of Options and Expected Payoffs in Experiment 1 of Robinson, Sloman, Hagmayer, & Hertzog (2010)

                     Your competitor buys dollars    Your competitor buys euros
You buy dollars      $1 billion / $1 billion         $50 million / $1.2 billion
You buy euros        $1.2 billion / $50 million      $100 million / $100 million

Note: The participant's expected payoff is listed first in each cell (set in bold print in the original). Participants were also told that they and their competitor had made the same decision in the past 90% of the time. Participants had to decide whether to purchase dollars or euros.
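A short computation reproduces both the evidential expected values and the dominance argument from the scenario. The following Python sketch is illustrative; the variable names are ours:

# Evidential expected values for the currency-trading game in Table 26.1
# (payoffs in millions of dollars; the participant's payoff is listed).
payoff = {                      # payoff[my_choice][competitor_choice]
    "dollars": {"dollars": 1000, "euros": 50},
    "euros":   {"dollars": 1200, "euros": 100},
}
P_SAME = 0.9                    # competitor matched my choice 90% of the time

def evidential_ev(choice):
    other = "euros" if choice == "dollars" else "dollars"
    return P_SAME * payoff[choice][choice] + (1 - P_SAME) * payoff[choice][other]

print(f"EEV(buy dollars) = {evidential_ev('dollars'):.0f}M")  # 905
print(f"EEV(buy euros)   = {evidential_ev('euros'):.0f}M")    # 210

# Dominance check: euros pay more whatever the competitor does, so if my
# choice cannot cause the competitor's choice, euros are the better option.
for comp in ("dollars", "euros"):
    print(f"competitor buys {comp}: euros {payoff['euros'][comp]}M "
          f"vs dollars {payoff['dollars'][comp]}M")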

Causal decision theory prescribes that decision-makers should differentiate between observing an action and choosing an action when predicting the consequences of their choice. They should consider their choice to be an intervention, which renders the chosen action independent of other causes of the desired outcome. The findings described in the preceding were consistent with this prediction, but did not directly test it. Other studies have investigated more directly how decision-makers conceptualize their choice. Studies on agency have shown that deliberate decision-makers perceive themselves as free agents responsible for their actions (Botti & McGill, 2011; DeCharms, 1968; Langer, 1975). Decision-makers often deny being forced or unconsciously influenced by other factors, even if they objectively are affected by them (Ariely & Norton, 2008; Bargh & Chartrand, 1999; Wegner, 2002). Hence, it seems that decision-makers conceive of their choice as independent of other factors.
Hagmayer and Sloman (2009) directly investigated whether people equate choices with interventions. In one experiment (Hagmayer & Sloman, 2009, Experiment 3), participants were asked to predict the likelihood of an outcome given that an action is chosen, enforced by another person or machine, or observed. Assumptions about the causal structure underlying the statistical relation between the action and the outcome were also manipulated. The relation was either introduced as being due to a direct causal relation or as being due to a common cause affecting both action and outcome. For example, participants were told that research has shown that of 100 men who help with the chores, 82 are in good health, while only 32 of 100 men not helping with the chores are. It was either explained that doing the chores provides additional exercise every day (direct cause condition) or that men who are concerned about equality issues are also concerned about health and therefore are likely to help with the chores and eat healthier food (common cause condition). Then they were asked to estimate the likelihood that a person will be in good health if he (1) decides to do the chores, (2) is forced by somebody else to do them, or (3) is observed to do the chores. Participants made the same predictions in the first two conditions (deliberate choice and external force). In these conditions, participants expected to find good health if they assumed a direct causal relation between doing the chores and health, but not when they assumed a common cause. By contrast, in condition 3, when the action was observed, participants expected good health regardless of causal structure. In the common cause condition, this finding indicates that participants inferred the presence of the common cause from the observed action and in turn expected the outcome. Experiment 4 of Hagmayer and Sloman (2009) showed that participants in fact derive different inferences about a common cause from chosen and observed actions.
There is, however, some conflicting evidence. There are studies showing that people sometimes violate the logic of causal decision theory by choosing options that reduce causal expected utilities. Self-deception is probably the most prominent example (Fernbach, Sloman, & Hagmayer, 2014; Mijovic-Prelec & Prelec, 2003; Quattrone & Tversky, 1984; Sloman, Fernbach, & Hagmayer, 2011). For example, in Quattrone and Tversky's (1984) study, participants were told that pain tolerance indicates whether someone has a strong or weak heart. In one condition, high tolerance was purportedly indicative of a good heart, while in the other condition the opposite was true. Participants told that high tolerance indicated a strong heart tolerated a painful task (holding one's hand in cold water) longer than those told the opposite. Since pain tolerance is a consequence, not a cause, of heart type, participants who increased their tolerance increased their pain without generating positive causal consequences. Thus they violated the principle of maximizing causal expected utility. Presumably, they did it to signal to themselves that they have a strong heart. However, since the behavior violates causal logic, it provides no true evidence about heart type. Thus cases like these indicate self-deception. Related findings come from research on self-handicapping (Urdan & Midgley, 2011).
Taken together, the empirical evidence supports the predictions derived from causal decision theory, although there are some cases where decision-makers violate these predictions.



Psychological Theories of Causal DecisionMaking Many psychological theories of decision-making allow for causal beliefs to affect deci­ sions, because (p. 503) these beliefs may influence the subjective likelihood of outcomes (see Box 26.1 for an overview of respective findings). For example, Kahneman (2011) claims that causal beliefs bias subjective likelihood, which in turn affects decisions (see also Tversky & Kahneman, 1980). We focus more narrowly on three theories in which causal reasoning plays a central role: The causal model theory of choice (Sloman & Hag­ mayer, 2006), the story model of decision-making (Pennington & Hastie, 1992), and attri­ bution theory (Weiner, 1985). Box 26.1 Does Causal Knowledge Affect Decisions? There are many ways in which causal beliefs may affect decisions. First, causal beliefs may alter expectancies of outcomes, that is, the subjective probability that an outcome will result from a certain action. For example, research on gambling has shown that many pathological gamblers have an “illusion of control” (Langer, 1975; Langer & Roth, 1975), a false belief that their actions have an influence on their chances of winning (Sylvain, Ladouceur, & Boisvert, 1997). Interestingly, there is good evidence that changing these faulty causal beliefs is an effective treatment for pathological gambling (Ladouceur, Fer­ land, & Fournier, 2003). In the health domain it has been shown that beliefs about control are the best predictors of patients’ choices (for reviews, see Baines & Wittkowski, 2013; Lobban, Barrowclough, & Jones, 2003). People who believe that professional treatments are able to control their illness are more likely to seek respective help and adhere to treatments. People who believe in personal control engage more in active coping and bet­ ter health behavior (Baines & Wittkowski, 2013; Lobban et al., 2003). Different causal be­ liefs also tend to result in different health outcomes. People with strong beliefs in person­ al and treatment control tend to have a better prognosis and experience less distress than people who believe that their illness cannot be controlled (Baines & Wittkowski, 2013; Lobban et al., 2003; Stockford, Turner, & Cooper, 2007). Second, causal beliefs may affect which information people search for before making a choice. For example, in research using the active information search paradigm, decisionmakers were asked to choose between different options in order to achieve a limited number of goals (Huber, Huber, & Bär, 2011; Huber, Wider & Huber, 1997). In this para­ digm, decision-makers have to actively search for the information they consider relevant by asking the researcher, who provides an answer in a standardized way. They are free to collect as much information as they like before making a final decision. The results show that decision-makers looked for the causal consequences resulting from the available op­ tions—first positive consequences and then negative ones. They were also interested in exploring additional actions that may counteract possible negative consequences arising from the different options (“risk diffusing operators”; Huber et al., 1997).

Page 12 of 30

Causality in Decision-Making Third, causal beliefs may affect how decision-makers weigh different pieces of informa­ tion when predicting outcomes before making a choice. Research on cue-based decisionmaking has shown that participants prefer to look up cues that are known causes of an outcome, and they weigh these cues more than other equally predictive, but not causally connected cues (Garcia-Retamero & Hoffrage, 2006; Garcia-Retamero, Wallin, & Dieck­ mann, 2007; Müller, Garcia-Retamero, Cokely, & Maldonado, 2011; Müller, Garcia-Re­ tamero, Galesic, & Maldonado, 2013). These findings are consistent with research show­ ing that causal beliefs can bias subjective probabilities (Chapman & Chapman, 1969; Koslowski, 1996; Tversky & Kahneman, 1980; but see Krynski & Tenenbaum, 2008, for evidence that causal beliefs may also de-bias probability estimates). Fourth, causal beliefs may affect which variables are targeted through an intervention when the outcome cannot be directly manipulated. For example, we cannot directly ma­ nipulate our body weight, but we can change its causes (e.g., through physical activity or diet). When people believe that many variables have a direct or indirect impact on the de­ sired outcome, they tend to judge interventions that address strong causes as more effec­ tive than interventions targeting other, weaker causes. For example, Ahn, Proctor, and Flanagan (2009) showed that mental health professionals consider interventions that tar­ get the most relevant cause of a mental disorder more effective than interventions that target weaker causes. Research also indicates that participants prefer interventions that target the first event within a causal chain (Flores, Cobos, Lopez, & Godoy, 2014; Flores, Cobos, López, Godoy, & González-Martín, 2014; Yopchik & Kim, 2009; see Ahn, Kim, & Lebowitz, Chapter 30 in this volume). The findings of Flores and colleagues (2014) indicate that this preference is probably due to people’s notion that by addressing the first cause, all variables within the chain could be changed in a desirable way.

(p. 504)

Causal Models and the Causal Model Theory of Choice

The causal model theory of choice (Hagmayer & Sloman, 2009; Sloman & Hagmayer, 2006) provides a descriptive model of decision-making based on causal models. A causal model is a mental representation of the causal structure of some set of objects or events in the world (Sloman, 2005; Waldmann, 1996). Causal models contain information about the direction and strength of causal relations connecting the entities. They allow a rea­ soner to predict the consequences of potential actions and to diagnose the likely causes of an observed outcome (Fernbach, Darlow & Sloman, 2011; Sloman & Lagnado, 2005; Waldmann & Hagmayer, 2005). Hence, causal models can support decision-making in the ways predicted by causal decision theory.2 According to the causal model theory of choice, decision-makers go through three phas­ es. First, the decision-maker represents the decision problem as a causal model. This model captures the relevant outcome variables, their potential causes, and the directions and strengths of the causal relations. For example, to decide on a treatment for depres­ sion, a causal model of the patient’s problems (i.e., outcomes) and their potential causes will be constructed. Some potential causes may be observed (e.g., loss of a loved one); Page 13 of 30

Causality in Decision-Making others may be inferred from the observed symptoms using the causal model (e.g., lack of coping skills). In the second phase, the available options (i.e., potential courses of action) are added to the model. Actions may target outcomes directly (e.g., mood-enhancing drugs reducing negative emotions) or causes of outcomes (e.g., training for coping skills). The theory assumes that decision-makers conceive of their choice as an intervention. Therefore, the actions to choose from are considered independent of all other variables in the model apart from their direct effects. Once the actions have been added, the model is used to infer the consequences that result from the different options. Inferences are made by using the causal model to run simulations. These simulations take into account uncertainties, including uncertainties with respect to the causal relations. For example, in the case of depression treatment, the consequences of various treatments and treatment combinations are envisioned, taking into account that a treatment may not work for a par­ ticular patient. In the final phase, a decision is made based on the results of the simula­ tions. If an option has no impact on the desired outcome (i.e., the outcome does not change regardless of whether the option is implemented or not), or if its costs outweigh its benefits, it is discarded right away. If there is only one option that increases the likeli­ hood of the desired outcome, the option is chosen. If several options make the desired outcome more likely, the theory proposes that decision-makers adapt their decision-mak­ ing strategy to the given circumstance. If there is time pressure, they may prefer to take the first option that increases the likelihood of desired consequences to a sufficient de­ gree. If there is enough time and the consequences of the decision are important, the de­ cision-maker may prefer to search for the option that maximizes causal expected utility. In the case of depression, one clinician may immediately decide on drug treatment be­ cause it directly improves the patient’s mood, while another clinician may prefer psy­ chotherapy to target deficits in coping that enable negative events to cause depression. Evidence for the causal model theory of choice comes from studies already described in the previous section (Hagmayer & Sloman, 2009; Robinson et al., 2010).

Causal Narratives and the Story Model of Decision-Making Causal models provide a generic framework for representing causal structure. They can be instantiated in various ways to represent a large set of actually observed and possible cases and to make inferences for many different conditions. For example, the same basic structure can be used represent a case in which depression is due to losing a loved one and a case in which depression is due to learned helplessness. Causal narratives, by contrast, represent a particular, complex causal chain of events (Pennington & Hastie, 1991, 1992). A narrative represents a sequence of events as it probably happened, regardless of whether the events in the chain were typical or not. Its primary function is to make sense of an observed sequence of events and to explain how the events are causally related to each other. Narratives can be used to inform decisionmaking. First, a narrative allows decision-makers to target the causes of a current situa­ tion with their actions. For example, if the narrative indicates that a patient’s depressive symptoms have been caused by a low level of thyroid hormones, hormone therapy is a reasonable treatment option. Second, a certain type of narrative may be linked with a cer­ Page 14 of 30

Causality in Decision-Making tain type of decision. (p. 505) For example, in medical decision-making, the narratives put forward by patients can be matched to illness scripts, which imply certain diagnoses and treatments (Charlin, Boshuizen, Custers, & Feltovich, 2007). The most prominent theoretical model of decision-making based on causal narratives is the story-model (Pennington & Hastie, 1991, 1992). The story model has been developed to account for juror decision-making in legal cases, but can also be used to model deci­ sion-making in other areas. The theory proposes that decision-makers spontaneously start to construct a narrative when presented with information on a case. By constructing a narrative, the temporal and causal order of the actual events is reconstructed. Events in­ clude actions (e.g., stabbing), physical events (e.g., hemorrhage), and mental states (e.g., hate) that are linked through physical and/or intentional/mental causation. Causal links are generally considered necessary and sufficient. When constructing a narrative, given information, knowledge about similar events in the past, and generic knowledge about the world are integrated. For example, decision-makers may use their knowledge that ha­ tred causes aggressive behavior. The resulting narrative is a complex causal chain with many side arms contributing to the main causal sequence of events. A narrative explains what happened and why it happened. Therefore, narrative-based decision-making is also called explanation-based decision-making in the literature (e.g., Pennington & Hastie, 1993). The acceptance of a narrative depends on how well it accounts for all case-specific evidence (“coverage”), the absence of alternative narratives (“uniqueness”), and its “co­ herence” or logical consistency, completeness, and plausibility. If there is only one story that has a high coverage and coherence, the person should accept the narrative and be highly confident that the narrative represents what actually happened. In order to derive a decision from a narrative, decision-makers have to know about op­ tions and how they relate to different types of narratives. In legal decision-making, jurors get this information from a judge who explains how different types of causal chains trans­ late into verdicts. For example, if a person’s intention to kill is the cause of another person’s death, it is murder. If there was no prior intention to kill, it is manslaughter. In medical decision-making, narratives can be matched to clinical guidelines that provide recommendations for treatment. Narrative-based decision-making consists of classifying the constructed narrative as being of a particular type. The type of narrative in turn sug­ gests which actions should be taken. Depending on how well the constructed narrative matches the different types of narratives, decisions become more or less difficult. For ex­ ample, in the medical field a patient’s narrative may match the development of different diseases. The simple narrative of stress → poor diet → gastrointestinal problems may point toward depression, psychosomatic disorder, ulcer, or dietary deficiency. In this case the narrative does not allow for a decision on treatment, but it guides decisions on fur­ ther examinations. The story model makes three critical predictions. First, it claims that observations or giv­ en evidence are spontaneously organized into a causal narrative. Evidence suggests this is the case. 
Pennington and Hastie (1986) asked participants to think aloud while reading

Page 15 of 30

Causality in Decision-Making through the evidence in a legal case. The thought-protocols showed that participants spontaneously reorganized the evidence into a narrative (cf. Kintsch, 1988). Second, decisions should be based on the constructed narrative and not purely on the giv­ en information. This implies that the same evidence may result in different decisions when different narratives are created. Evidence for this prediction comes from studies showing that participants who generate different narratives for the same evidence reach different verdicts in legal cases (Pennington & Hastie, 1986, 1988, 1992; Lagnado & Ger­ stenberg, Chapter 29 in this volume). For example, Pennigton and Hastie (1988) manipulated the order in which evidence was presented to mock jurors. When the order allowed participants to easily construct a story for the defense but not for the prosecu­ tion, only a minority found the defendant guilty. When the same information was re­ ordered so that it became easy to construct a narrative for the prosecution, but not for the defense, a majority found the defendant guilty. Finally, the model predicts that a person’s decision and confidence in the decision depend on coverage, coherence, and the uniqueness of the constructed narrative, as well as the narrative’s fit with the available options. Several experiments have shown that coherence of the evidence predicts decisions (Pennington & Hastie, 1988, 1992). Pennington and Hastie (1992) also manipulated the completeness of the presented information. Partici­ pants were either given explicit information about the causal relations among events or not. (p. 506) Importantly, the respective causal relations could easily be inferred from the presented events and everyday knowledge. Thus, participants in either condition should have been able to construct the same narratives. Nevertheless, participants favored the verdict associated with the narrative for which information about causal links was explic­ itly provided. Tenney, Cleary, and Spellman (2009) investigated the influence of unique­ ness. They showed that participants were highly sensitive to whether an alternative causal narrative could be constructed for the same evidence. In all conditions, the evi­ dence incriminated the defendant in a murder case. While in one condition, no alternative explanation was provided, it was pointed out in another set of conditions that other peo­ ple would have had the possibility, motive, and/or had no alibi for the time of the murder. Guilty verdicts dropped substantially when other possible suspects were pointed out. Causal narratives have also been investigated in medical decision-making. This research has focused on whether providing decision-makers with a causal narrative affects deci­ sions. A systematic review by Winterbottom, Bekker, Conner, and Montgomery (2008) found that narratives influenced decisions more than statistical or general medical infor­ mation. Participants were more likely to act in the same way as the person in the narra­ tive. The effect of first-person narratives tended to be stronger than third-person narra­ tives. This is what would be expected if participants used the given narrative to construct a narrative for themselves. However, an influence of narratives was found only in about one-third of the studies. Winterbottom and colleagues (2008) also pointed out that at present it is unclear whether such an influence results in better or worse decisions.

Page 16 of 30

Causality in Decision-Making Taken together, the evidence shows that causal narratives influence decisions. They seem to be used spontaneously to organize given information into coherent causal chains, which in turn influence decision-making.

Causal Attribution and Attribution Theory The construction of a causal narrative is not the only way by which an explanation can be provided. Often it may suffice to figure out the cause or the set of causes that led to a particular event or situation. These inferences about causes have been called causal attri­ bution (Kelley, 1972; Kelley & Michaela, 1980). Although there are usually many possible causes that may have contributed to the presence of a target event, people seem to have the tendency to select only one or a few factors as the cause (Kelley, 1972; Weiner, 1985). Attribution theory (Weiner, 1985, 1986) describes how people make causal attributions and how these attributions affect emotion, motivation, and subsequent behavior. Attribu­ tions can be classified with respect to three dimensions: locus of causality (internal vs. ex­ ternal), stability (stable vs. unstable), and controllability (controllable vs. uncontrollable). Thus potential causes can be grouped into eight possible categories. Table 26.2 shows an example for a classification of potential causes of illness (cf. Roesch & Weiner, 2001). It is important to note that the same causal factor may be classified differently depending on the (p. 507) specific circumstances under which the attribution is made. For example, skills are stable and cannot be modified (i.e., controlled) in the short run, but they can be changed in the long run. The same is true for addictive behaviors, which can be con­ trolled if enough time and external support are provided.

Table 26.2 Categories of Causal Attributions in the Medical Domain

Internal locus of causality:
- Stable, controllable: some physiological processes
- Stable, uncontrollable: heredity, personality, some physiological processes
- Unstable, controllable: own actions, own effort

External locus of causality:
- Stable, uncontrollable: fate, economic environment
- Unstable, controllable: some environmental stressors
- Unstable, uncontrollable: chance, stimuli controlling behavior, some environmental stressors

Adapted from Roesch & Weiner (2001).

According to attribution theory, these categories are meaningful because the respective attributions suggest different emotional, motivational, and behavioral reactions. More precisely, these categories of causes are assumed to be better predictors of emotion, motivation, and action than the individual causes themselves (Weiner, 1985). Stability of the attributed cause should be linked to expected outcomes, which in turn should influence the motivation for action. If the cause is considered stable, the same event or outcome will be expected in the future with increased certainty. By contrast, if the cause is perceived to be unstable, then expectancies should remain uncertain, or another outcome may be expected on future occasions. For example, attributing gastrointestinal problems to stress at work implies that the problems will go away when the stressful situation passes. Assumptions about controllability should also affect subsequent actions. If a negative event like illness is attributed to an uncontrollable cause, hopelessness should result, which in turn should decrease the tendency to act (cf. Seligman, 1972). By contrast, an assumption of control should result in a willingness to act. Finally, the assumed locus of causality is important because it implies whether changing one's own behavior or intervening on the environment should be preferred, given that there is control over the cause. Hence all three dimensions should affect decision-making and the motivation to implement any decision made. For example, when cardiovascular problems are attributed to an adopted lifestyle (internal, stable, controllable), motivation should rise to change the lifestyle, which in turn should result in behavioral changes. By contrast, when the same problem is attributed to hereditary factors (internal, stable, uncontrollable), there should be little inclination to change one's lifestyle.

Although its scope is much broader, including emotion and motivation, attribution theory can be used as a model of decision-making. The process of decision-making can be described as follows. First, the person encounters an unexpected event or problem that prompts a causal explanation and that requires the person to decide how to proceed. Second, the event or problem is attributed to a set of causes. Observed information and knowledge retrieved from memory provide the basis for these inferences. The inferred causes are classified as internal or external, stable or unstable, and controllable or uncontrollable. Depending on the attribution, expectations about the future state of the causes and the effectiveness of potential actions change. Only actions that target controllable causal factors should be judged effective. Therefore, actions controlling stable factors contributing to a problem should be preferred over actions that try to address factors that are uncontrollable or would change on their own without any intervention on behalf of the decision-maker.

There is considerable evidence showing that causal attribution affects decision-making. Roesch and Weiner (2001) looked at causal attribution in the medical domain and reviewed studies that investigated the relation between causal attribution, coping behavior, and psychological adjustment in patients suffering from illness. All three dimensions of attribution theory were related to coping decisions made by patients. An internal locus of causality was related to more problem-focused coping. Assumed control was related to actively approaching rather than avoiding the condition, as was stability, which also predicted more problem-focused coping. Moreover, patients who assumed control were better adjusted to their condition than patients who did not. These findings are consistent with findings from studies based on the self-regulatory model of illness (Leventhal, Diefenbach, & Leventhal, 1992), which also found that assumed control was related to being more active in dealing with illness and, in consequence, higher well-being and better recovery (cf. Lobban et al., 2003).

Gurevitch, Kliger, and Weiner (2011) investigated the impact of causal attribution on economic decisions (see also Gurevitch & Kliger, 2013). They asked participants to allocate money won in a trivia game to themselves and their partner. Participants were presented with eight different scenarios, which corresponded to the eight categories of causes generated by the three dimensions. For example, in one condition they were told that their teammate got lucky and received easier questions than the others (external, uncontrollable, unstable cause); in another they were told that they themselves had better ability than the others (internal, uncontrollable, (p. 508) stable cause). As expected, causal attributions had a substantial effect on how the prize money was divided up. Unsurprisingly, locus had the strongest effect, with the person responsible for winning the game receiving more. Control also had an effect: if a person had control over the causally responsible factor, she received more. Stability did not have an effect, which makes sense because expectancies about the future were irrelevant for the decision to be made.

In the context of economic decision-making, Onifade, Harrison, and Cafferty (1997) explored whether causal attribution was related to escalation of commitment (Staw, 1981). They investigated whether causal attribution predicted participants' decision to continue funding a poorly performing project. They found that the assumed stability of the causes of the poor performance was the best predictor of the continuation of the project. When participants received information that the causes were unstable (i.e., bad luck or lack of effort), they were more willing to provide additional funding. Locus of causality had only a minor effect: if causes were assumed to be internal, the tendency to continue the project was higher. Further analyses showed that stability and locus affected decisions by changing the expectancy of future success, as predicted by attribution theory.

In sum, the evidence suggests that causal attribution affects decision-making, as specified by attribution theory. Which of the causally relevant dimensions (locus, stability, and control) is most relevant depends on the specific decision problem. In general, control seems to be most important, because if there is no control, any action would be futile. Stability is also important because it determines whether taking action is required at all. Unstable causes may resolve themselves without intervention.
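The qualitative logic just described is compact enough to state explicitly. The following Python sketch is our own illustration, not part of attribution theory's formal apparatus: it encodes the three dimensions as Boolean features and returns the action tendency that the theory, as summarized above, predicts for each combination. All names and examples are invented.

from dataclasses import dataclass

@dataclass
class Attribution:
    internal: bool      # locus of causality: internal vs. external
    stable: bool        # stability: stable vs. unstable
    controllable: bool  # controllability

def predicted_response(a: Attribution) -> str:
    """Qualitative predictions of attribution theory (Weiner, 1985),
    as summarized in the text; a sketch, not a quantitative model."""
    if not a.stable:
        return "low urgency: the cause may resolve itself without intervention"
    if not a.controllable:
        return "hopelessness: action is judged futile, little motivation to act"
    # Stable and controllable: action is both needed and expected to work.
    if a.internal:
        return "change one's own behavior (e.g., adopt a different lifestyle)"
    return "intervene on the environment"

# The cardiovascular example from the text:
lifestyle = Attribution(internal=True, stable=True, controllable=True)
heredity = Attribution(internal=True, stable=True, controllable=False)
print(predicted_response(lifestyle))  # -> change one's own behavior
print(predicted_response(heredity))   # -> hopelessness: action is judged futile

Running the sketch on the two attributions of the same cardiovascular problem reproduces the contrast described in the text: the lifestyle attribution licenses action, the hereditary attribution does not.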

Discussion

Summary

Causal decision theory shows that rational decision-makers should take the causal structure underlying a decision problem into account, infer the causal consequences of choosing the available options, and maximize causal rather than evidential expected utility when making a choice. Research on decision-making has found that decision-makers tend to behave in line with these principles.

From a psychological perspective, causal decision theory requires decision-makers to engage in causal reasoning during decision-making. Most theories of decision-making, which assume that decisions are based on expected outcomes, do not assume that decision-makers use causal reasoning, although they would concede that causal beliefs may affect decisions by altering expectancies of outcomes (see Box 26.1 for evidence). This is true for subjective expected utility theory (Savage, 1954; von Neumann & Morgenstern, 1947), which is still the gold standard for rational decision-making, and prospect theory (Kahneman & Tversky, 1979), which is the most influential descriptive theory (cf. Kahneman, 2011). Theories claiming that decisions are based on cue-based heuristics (e.g., Gigerenzer & Goldstein, 1996) may assume that causal beliefs affect the search for and the weighing of cues (e.g., Garcia-Retamero & Hoffrage, 2006). Finally, theories suggesting that decisions are guided by scripts (e.g., Abelson, 1981; Schank & Abelson, 1977) assume that scripts include beliefs about typical causal sequences of events and action rules specifying what to do when a specific sequence unfolds.

We focused on three psychological theories that assign a central role to causal reasoning in decision-making: the causal model theory of choice (Sloman & Hagmayer, 2006; see the section "Causal Models and the Causal Model Theory of Choice"), the story model of decision-making (Pennington & Hastie, 1992; see the section "Causal Narratives and the Story Model of Decision-Making"), and attribution theory (Weiner, 1985; see the section "Causal Attribution and Attribution Theory"). There are some commonalities and differences between these theories, both in general and with respect to decision-making. All three theories assume that people consider the causal structure underlying a decision problem, but they disagree about its representation. Attribution theory assumes that it is represented as a simple model with one or a few causes generating the observed situation. Theories of narrative-based decision-making assume that a rather long and complex causal chain is constructed, which describes how the situation developed over time. The causal model theory of choice assumes that a more or less complex causal model is constructed, which does not directly represent how the situation develops over time. All three theories agree that causal background knowledge is involved in the construction process, but there seem to be differences in the degree to which the resulting model is specific to the individual case. Causal model theory conceptualizes a causal model of a particular problem as an instantiation of a generic model for (p. 509) the type of problem. Observed or given case-specific information is used to instantiate the model for the specific case. The story model and attribution theory assume that an individual, specific model or narrative is constructed for the particular case at hand, which is informed by generic knowledge and the specifics of the case. The narrative or the attributed causes may deviate strongly from a generic model of the type of problem. The three theories also make different assumptions about how a decision is reached. The causal model theory of choice assumes that decisions are based on the expected causal consequences derived from the causal model. The story model assumes that the constructed narrative is compared to other narratives, which are linked to decisions. Hence, the decision is based on a match between narratives. Attribution theory assumes that causal attributions affect expectations about future states and the effectiveness of actions, which in turn influence decisions.

Predictions of all three theories have support in the literature. However, decision-making does not always follow the predictions of causal theories. Decision-makers do not always consider the causal structure underlying a decision problem, nor do they always base their decisions on the causal consequences that would result from their choice (see the sections "Do People Analyze the Causal Structure Underlying a Decision Problem?" and "Do People Base Their Decisions on Causal or Evidential Expected Utilities?"). In the case of self-deception and self-handicapping, decision-makers choose in a way that does not maximize causal expected utilities. Moreover, decisions can be dominated by considerations independent of causality, such as moral rules (Liu & Ditto, 2013) and social norms (Ajzen, 1991). When decision-makers are faced with complex, dynamic systems, they may be unable to use causal reasoning effectively (Dörner, 1997; Osman, 2010; Sterman, 2000). Finally, as we pointed out at the beginning of this chapter, there might be good reasons not to engage in causal analysis. If the decision-maker's causal background knowledge is very rough or likely to be wrong, then it might be better not to consider causal knowledge. For example, a diabetic who does not recognize the causal influence of his lifestyle choices is likely to end up in poorer health than a diabetic who just adheres to his doctor's advice and ignores his own causal beliefs (Barnes, Moss-Morris, & Kaufusi, 2004). These findings show that the processes proposed in causal theories are only part of a rich set of strategies that people use to make decisions.
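To make the contrast between causal and evidential expected utility concrete, here is a minimal numerical sketch in Python. It sets up a Newcomb-style problem in which a common cause makes an action evidentially, but not causally, relevant to an outcome (cf. note 2: a causal Bayes net supports exactly this computation). All probabilities, utilities, and variable names are invented for illustration.

# Joint model: a latent disposition D is a common cause of the action A
# and the outcome O; A itself has no causal effect on O.
P_D = 0.5                                # P(disposition present)
P_A_given_D = {True: 0.8, False: 0.2}    # disposition makes the action likely
P_O_given_D = {True: 0.9, False: 0.1}    # disposition alone produces the outcome

U = {True: -100.0, False: 0.0}  # utility of the (bad) outcome being present/absent
u_act = 10.0                    # the action itself is mildly enjoyable

def p_outcome_given_do(action: bool) -> float:
    # Intervening on A cuts the link from D to A, so the value of `action`
    # drops out: P(O | do(A)) = sum_D P(D) * P(O | D).
    return P_D * P_O_given_D[True] + (1 - P_D) * P_O_given_D[False]

def p_outcome_given_observed(action: bool) -> float:
    # Observing A changes beliefs about D (Bayes' rule), and thereby about O.
    p_a_true = P_D * P_A_given_D[True] + (1 - P_D) * P_A_given_D[False]
    p_a = p_a_true if action else 1 - p_a_true
    post_D = P_D * (P_A_given_D[True] if action else 1 - P_A_given_D[True]) / p_a
    return post_D * P_O_given_D[True] + (1 - post_D) * P_O_given_D[False]

def expected_utility(p_o: float, action: bool) -> float:
    return p_o * U[True] + (1 - p_o) * U[False] + (u_act if action else 0.0)

ceu = expected_utility(p_outcome_given_do(True), True)        # causal EU of acting
eeu = expected_utility(p_outcome_given_observed(True), True)  # evidential EU
print(f"CEU(act) = {ceu:.1f}, EEU(act) = {eeu:.1f}")  # -> -40.0 vs. -64.0

Running the sketch shows the two quantities coming apart: conditioning on the action shifts beliefs about the common cause and makes acting look costly, whereas intervening on the action leaves the common cause untouched, so the causal expected utility favors acting.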

Open Questions

One challenge will be to integrate the different accounts into a more complete theory of causal decision-making. Each of the three theories discussed here focuses on a particular type of causal reasoning. Attribution theory focuses on the process of inferring the cause or causes of an event. Narrative theories focus on the process of inferring the complex sequence of events that preceded and generated the decision situation. Causal model theory focuses on generic causal models. It describes how generic models are instantiated for a specific case, how inferences are drawn with respect to unobserved variables, and how consequences of potential actions are inferred. Causal model theory (Sloman, 2005; Waldmann, 1996) seems to be the most promising candidate for an integrative account. Attribution can be conceptualized as a form of diagnostic reasoning based on a causal model (Fernbach, Darlow, & Sloman, 2010, 2011; Meder, Mayrhofer, & Waldmann, 2014). Thus attribution may be conceptualized as a step in the construction of a causal model of the decision problem. A causal narrative could be described as a complex causal model (Fenton, Neil, & Lagnado, 2013). By assigning a temporal index to the variables, the causal and temporal sequence could be represented within a causal model. The construction of a causal narrative might be an alternative to the construction of a structural causal model if the development of the situation over time is of interest. Thus, an integrative account of causal decision-making based on causal models should be possible.

Another challenge is to account for obvious violations of the causal logic of choice, as in self-deception. Bodner and Prelec (2003; Mijovic-Prelec & Prelec, 2010) suggested that people infer the causal expected utility of an action as well as its diagnostic utility. The diagnostic utility of an action arises because an action may signify (but not cause) features that have a particular value for the person. Participants in Quattrone and Tversky's (1984) experiment changed their tolerance for cold water because they assumed tolerance to indicate a strong heart. Fernbach, Sloman, and Hagmayer (2013; Sloman et al., 2011) used causal models and the idea of diagnostic utility to provide an account of self-deception. They propose that self-deceptive behavior arises when people are unsure about the causes of their actions. Quattrone and Tversky's subjects had uncertainty about the extent to which their endurance was determined by their true pain tolerance versus their desire to achieve a particular (p. 510) result. This uncertainty allows decision-makers to choose an action (e.g., high endurance), but to treat the action as diagnostic of a desired trait.

The third challenge will be to clarify the limits of causal decision-making. Causal decision theory recommends that decision-makers analyze the causal structure underlying a decision problem. When the same type of decision is made repeatedly, however, a causal analysis may not be necessary. The decision-maker may simply learn what the best option is (instrumental learning; e.g., Colwill & Rescorla, 1990) and what the best way to make a decision is (learning of decision strategies; e.g., Rieskamp & Otto, 2006). Danks (2014) provides a rational analysis of learning in decision-making and shows that causal learning should ensue only when the resulting causal knowledge enables better decisions in the future. This is the case when new options may become available whose consequences could be predicted from acquired causal knowledge but not from instrumental learning. Currently very little is known about whether people use causal learning adaptively when making repeated choices. Research by Steyvers, Tenenbaum, Wagenmakers, and Blum (2003), Hagmayer and Meder (2013), and Coenen and colleagues (2015) provides some evidence that people may do so (but see Fernbach & Sloman, 2009). We still have a long way to go before we fully understand the role of causality in decision-making, its interaction with learning, and—maybe most important—its limits.

References

Abelson, R. P. (1981). Psychological status of the script concept. American Psychologist, 36(7), 715–729. Ahn, W.-K., Proctor, C. C., & Flanagan, E. H. (2009). Mental health clinicians' beliefs about the biological, psychological, and environmental bases of mental disorders. Cognitive Science, 33, 147–182. Ajzen, I. (1991). The theory of planned behavior. Organizational Behavior and Human Decision Processes, 50, 179–211. Ariely, D., & Norton, M. I. (2008). How actions create—not just reveal—preferences. Trends in Cognitive Sciences, 12, 13–16.


Causality in Decision-Making Baines, T., & Wittkowski, A. (2013). A systematic review of the literature exploring illness perceptions in mental health utilising the self-regulation model. Journal of Clinical Psy­ chology in Medical Settings, 20, 263–274. Bargh, J. A., & Chartrand, T. L. (1999). The unbearable automaticity of being. American Psychologist, 54, 462–479. Barnes, L., Moss-Morris, R., & Kaufusi, M. (2004). Illness beliefs and adherence in dia­ betes mellitus: A comparison between Tongan and European patients. New Zealand Med­ ical Journal, 117, 743–751. Bodner, R., & Prelec, D. (2003). The diagnostic value of actions in a self-signaling model. The Psychology of Economic Decisions, 1, 105–123. Botti, S., & McGill, A. (2011). The locus of choice: Personal causality and satisfaction with hedonic and utilitarian decisions. Journal of Consumer Research, 37, 1065–1978. Chapman, L. J., & Chapman, J. P. (1969). Illusory correlation as an obstacle to the use of valid diagnostic signs. Journal of Abnormal Psychology, 74, 271–280. Charlin, B., Boshuizen, H. P., Custers, E. J., & Feltovich, P. J. (2007). Scripts and clinical reasoning. Medical Education, 41, 1178–1184. Coenen, A., Rehder, B., & Gurekis, T. (2015). Strategies to intervene on causal systems are adaptively selected. Cognitive Psychology, 79, 102–133. Colwill, R. M., & Rescorla, R. A. (1990). Evidence for the hierarchical structure of instru­ mental learning. Animal Learning and Behavior, 18, 71–82. Danks, D. (2014). Unifying the mind: Cognitive representations as graphical models. Cam­ bridge, MA: MIT Press. DeCharms, R. (1968). Personal causation: The internal affective determinants of behavior. New York:Academic Press. DeKwaadsteniet, L., Hagmayer, Y., Krol, N., & Wittman, C. (2010). Causal client models in selecting effective interventions: A cognitive mapping study. Psychological Assessment, 22, 581–592. Doerner, D. (1997). The logic of failure: Recognizing and avoiding error in complex situa­ tions. Cambridge, MA: Perseus Books. Fenton, N., Neil, M., & Lagnado, D.A (2013). A general structure for legal arguments about evidence using Bayesian networks. Cognitive Science, 37, 61–102. Fernbach, P. M., Darlow, A., & Sloman, S. A. (2010). Neglect of alternative causes in pre­ dictive but not diagnostic reasoning. Psychological Science, 21, 329–336.


Causality in Decision-Making Fernbach, P. M., Darlow, A., & Sloman, S. A. (2011). When good evidence goes bad: The weak evidence effect in judgment and decision-making. Cognition, 119, 459–467. Fernbach, P. M., Rogers, T., Fox, C. R., & Sloman, S. A. (2013). Political extremism is sup­ ported by an illusion of understanding. Psychological Science, 24(6), 939–946. Fernbach, P. M., & Sloman, S. A. (2009). Causal learning with local computations. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35, 678–693. Fernbach, P. M, Sloman, S. A., & Hagmayer, Y. (2014). Effort denial in self-deception. Or­ ganizational Behavior and Human Decision Processes, 123, 1–8. Fisher, R. A. (1958). Lung cancer and cigarettes. Nature, 182, 108. Flores, A., Cobos, P. L., López, F. J., & Godoy, A. (2014). Detecting fast, on-line reasoning processes in clinical decision making. Psychological Assessment, 26, 660–665. Flores, A., Cobos, P. L., López, F. J., Godoy, A., & González-Martín, E. (2014). The influence of causal connections between symptoms on the diagnosis of mental disorders: (p. 511)

Evidence from on-line and off-line measures. Journal of Experimental Psychology: Applied, 20, 175–190. Furnham, A. (1988). Lay theories: Everyday understanding of problems in the social sci­ ences. Oxford: Pergamon Press. Garcia-Retamero, R., & Hoffrage, U. (2006). How causal knowledge simplifies decisionmaking. Minds and Machines, 16, 365–380. Garcia-Retamero, R., Wallin, A., & Dieckmann, A. (2007). When one cue is not enough: Combining fast and frugal heuristics with compound cue processing. Quarterly Journal of Experimental Psychology, 60, 1197–1215. Gigerenzer, G., & Goldstein, D. G. (1996). Reasoning the fast and frugal way: Models of bounded rationality. Psychological Review, 103, 650–669. Gurevitch, G, & Kliger, D. (2013). The manipulation: Socio-economic decision making. Journal of Economic Psychology, 39, 171–184. Gurevitch, G., Kliger, D., & Weiner, B. (2011). The role of attribution of causality in eco­ nomic decision making. The Journal of Socio-Economics, 41, 439–444. Hagmayer, Y., & Meder, B. (2013). Repeated causal decision making. Journal of Experi­ mental Psychology: Learning, Memory, & Cognition, 39, 33–50. Hagmayer, Y., Meder, B., Osman, M., Mangold, S., & Lagnado, D. (2010). Spontaneous causal learning while controlling a dynamic system. The Open Psychology Journal, 3, 145– 162. http://www.benthamscience.com/open/topsyj/articles/V003/SI0088TOPSYJ/ 145TOPSYJ.pdf. Page 25 of 30

Causality in Decision-Making Hagmayer, Y., & Sloman, S. A. (2009). People conceive of their choices as intervention. Journal of Experimental Psychology: General, 138, 22–38. Howard, C. E., & Porzelius, L. K. (1999). The role of dieting in binge eating disorder: Eti­ ology and treatment implications. Clinical Psychology Review, 19, 25–44. Huber, O., Huber, O. W., & Bär, A. S. (2011). Information search and mental representa­ tion in risky decision making: The advantages first principle. Journal of Behavioral Deci­ sion Making, 24, 223–248. Huber, O., Wider, R., & Huber, O. W. (1997). Active information search and complete infor­ mation presentation in naturalistic decision tasks. Acta Psychologica, 95, 15–29. Joyce, J. M. (1999). The foundations of causal decision theory. Cambridge, UK: Cambridge University Press. Kahneman, D. (2011). Thinking fast and slow. London: Penguin Books. Kahneman, D., & Miller, D. T. (1986). Norm theory: Comparing reality to its alternatives. Psychological Review, 93, 136–153. Kahneman, D., & Tversky, A. (1979). Prospect theory: an analysis of decision under risk. Econometrica, 47, 263–291. Keil, F. C. (2008). Explanation and understanding. Annual Review of Psychology, 51, 227– 254. Kelley, H. H. (1972). Causal schemata and the attribution process. In E. E. Jones, D. E. Knaouse, H. H. Kelley, R. E. Nisbett, S. Valins, & B. Weiner (Eds.), Attribution: Perceiving the causes of behavior. Morristown, NJ: General Learning Press. Kelley, H. H., & Michaela, J. L. (1980). Attribution theory and research. Annual Review of Psychology, 31, 457–501. Kintsch, W. (1988). The role of knowledge in discourse comprehension: A construction-in­ tegration model. Psychological Review, 95(2), 163–182. Klein, G. (1998). Sources of power: How people make decisions. Cambridge, MA: MIT Press. Koehler, D., & Harvey, N. (2007). Blackwell handbook on decision making. Oxford: Black­ well Publishing. Koslowski, B. (1996). Theory and evidence: The development of scientific reasoning. Cam­ bridge, MA: MIT Press. Lawson, R. (2006). The science of cycology: Failures to understand how everyday objects work. Memory & Cognition, 34(8), 1667–1675.


Causality in Decision-Making Ladouceur, R., Ferland, F., & Fournier, P. (2003). Correction of erroneous perceptions among primary school students regarding the notions of chance and randomness in gam­ bling. American Journal of Health Education, 34, 272–277. Langer, E. J. (1975). The illusion of control. Journal of Personality and Social Psychology, 32, 311–328. Langer, E. J., & Roth, J. (1975). Heads I win, tails it’s chance: The illusion of control as a function of the sequence of outcomes in a purely chance task. Journal of Personality and Social Psychology, 32, 951–955. Leventhal, H., Diefenbach, M., & Leventhal, E. A. (1992). Illness cognition: using common sense to understand treatment adherence and affect cognition interactions. Cognitive Therapy and Research, 16, 123–163. Lewis, D. (2006). Causal decision theory. The Australasian Journal of Philosophy, 59, 5–30. Liu, B. S., & Ditto, P. H. (2013). What dilemma? Moral evaluation shapes factual beliefs. Social Psychological and Personality Sciences, 4, 316–323. Lobban, F., Barrowclough, C., & Jones, S. (2003). A review of the role of illness models in severe mental illness. Clinical Psychology Review, 23, 171–196. Maher, P. (1987). Causality in the logic of decision. Theory and Decision, 22, 155–172. Meder, B., Mayrhofer, R., & Waldmann, M. R. (2014). Structure induction in diagnostic causal reasoning. Psychological Review, 121, 277–301. Mosing, M. A., Madison, G., Pedersen, N. L., Kuja-Kalkola, R., & Ullen, F. (2014). Practice does not make perfect: No causal effect of music practice on music ability. Psychological Science, 25, 1795–1803. Murdock, G. P. (1980). Theories of illness: A world survey. Pittsburgh, PA: University of Pittsburgh Press. Müller, S. M., Garcia-Retamero, R., Cokely, E., & Maldonado, A. (2011). Causal beliefs and empirical evidence. Decision-making processes in two-alternative forced-choice tasks. Ex­ perimental Psychology, 58, 324–332. Müller, S. M., Garcia-Retamero, R., Galesic, M., & Maldonado, A. (2013). The impact of domain-specific beliefs on decisions and causal judgments. Acta Psychologica, 144, 472– 480. Mijovic-Prelec, D., & Prelec, D. (2010). Self-deception as self-signalling: A model and ex­ perimental evidence. Philosophical Transactions of the Royal Society B, 365, 227–240. Nozick, R. (1993). The nature of rationality. Princeton, NJ: Princeton University Press.


Onifade, E., Harrison, P. D., & Cafferty, T. P. (1997). Causal attributions for poorly performing projects: Their effect on project continuation decisions. Journal of Applied Social Psychology, 27, 439–452. Osman, M. (2010). Controlling uncertainty: A review of human behavior in complex dynamic environments. Psychological Bulletin, 136, 65–86. Pennington, N., & Hastie, R. (1986). Evidence evaluation in complex decision making. Journal of Personality and Social Psychology, 51(2), 242–258. (p. 512) Pennington, N., & Hastie, R. (1988). Explanation-based decision making: Effects of memory structure on judgment. Journal of Experimental Psychology: Learning, Memory and Cognition, 14(3), 521–533. Pennington, N., & Hastie, R. (1991). A cognitive theory of juror decision making: The story model. Cardozo Law Review, 13, 519–558. Pennington, N., & Hastie, R. (1992). Explaining the evidence: Tests of the Story Model for juror decision making. Journal of Personality and Social Psychology, 62(2), 189–206. Pennington, N., & Hastie, R. (1993). Reasoning in explanation-based decision making. Cognition, 49, 123–163. Quattrone, G., & Tversky, A. (1984). Causal versus diagnostic contingencies: On self-deception and on the voter's illusion. Journal of Personality and Social Psychology, 46, 237–248. Rieskamp, J., & Otto, P. E. (2006). SSL: A theory on how people learn to select strategies. Journal of Experimental Psychology: General, 135, 207–236. Robinson, E., Sloman, S. A., Hagmayer, Y., & Hertzog, C. (2010). Causality in solving economic problems. The Journal of Problem Solving, 3, 106–130. Roesch, S. C., & Weiner, B. (2001). A meta-analytic review of coping with illness: Do causal attributions matter? Journal of Psychosomatic Research, 50, 205–219. Rozenblit, L., & Keil, F. C. (2002). The misunderstood limits of folk science: An illusion of explanatory depth. Cognitive Science, 26, 521–562. Savage, L. J. (1954). The foundations of statistics. New York: Wiley. Schank, R. C., & Abelson, R. P. (1977). Scripts, plans, goals, and understanding. Hillsdale, NJ: Lawrence Erlbaum Associates. Seligman, M. E. (1972). Learned helplessness. Annual Review of Medicine, 23, 407–412. Skyrms, B. (1982). Causal decision theory. The Journal of Philosophy, 79, 695–711.


Causality in Decision-Making Sloman, S. A. (2005). Causal models: How we think about the world and its alternatives. Oxford: Oxford University Press. Sloman, S. A., & Hagmayer, Y. (2006). The causal psycho-logic of choice. Trends in Cogni­ tive Sciences, 10, 407–412. Sloman, S. A., & Lagnado, D. A. (2005). Do we “do”? Cognitive Science, 29, 5–39. Sloman, S. A., Fernbach, P. M., & Hagmayer, Y. (2010). Self deception requires vagueness. Cognition, 115, 268–281. Staw, B. M., (1981). The escalation of commitment to a course of action. Academy of Man­ agement, 6, 577–587. Sterman, J. D. (2000). Business dynamics: Systems thinking and modeling for a complex world. Boston: McGraw-Hill. Steyvers, M., Tenenbaum, J. B., Wagenmakers, E.-J., & Blum, B. (2003). Inferring causal networks from observations and interventions. Cognitive Science, 27, 453–489. Stockford, K., Turner, H., & Cooper, M. (2007). Illness perception and its relationship to readiness to change in the eating disorders: A preliminary investigation. British Journal of Clinical Psychology, 46, 139–54. Sylvain, C., Ladouceur, R., & Boisvert, J. M. (1997). Cognitive and behavioural treatment of pathological gambling: A controlled study. Journal of Consulting and Clinical Psycholo­ gy, 65, 727–732. Tenney, E. R., Cleary, M. D., & Spellman, B. A. (2009). Unpacking the doubt in “beyond a reasonable doubt”: Plausible alternative stories increase not guilty verdicts. Basic and Ap­ plied Social Psychology, 31, 1–8. Tversky, A., & Kahneman, D. (1980). Causal schemas in judgments under uncertainty. In M. Fishbein (Ed.), Progress in social psychology (pp. 49–72). Hillsdale, NJ: Lawrence Erl­ baum Associates. Von Neumann, J., & Morgenstern, O. (1947). Theory of games and behavior. Princeton, NJ: Princeton University Press. Waldmann, M. R. (1996). Knowledge-based causal induction. In D. R. Shanks, K. J. Holyoak & D. L. Medin (Eds.), The psychology of learning and motivation, Vol. 34: Causal learning (pp. 47–88). San Diego: Academic Press. Waldmann, M. R., & Hagmayer, Y. (2005). Seeing vs. doing: Two modes of accessing causal knowledge. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 216–227. Wegner, D. M. (2002). The illusion of conscious will. Cambridge, MA: MIT Press. Page 29 of 30

Causality in Decision-Making Weiner, B. (1985). An attributional theory of achievement motivation and emotion. Psycho­ logical Review, 92, 548–573. Weiner, B. (1986). An attributional theory of motivation and emotion. New York: Springer. Winterbottom, A., Bekker, H. L., Conner, M., & Montgomery, J. (2008). Does narrative in­ formation bias individual’s decision making? A systematic review. Social Science and Medicine, 67, 2079–2088. Yopchick, J. E., & Kim, N. S. (2009). The influence of causal information on judgments of treatment efficacy. Memory & Cognition, 37, 29–41. Zsambok, C. E., & Klein, G. (1997). Naturalistic decision making. Mahwah, NJ: Lawrence Erlbaum Associates.

Notes:

(1.) Figuring out whether an action and an outcome are causally or spuriously related may not be easy. The famous statistician Sir Ronald Fisher argued for many years that the reliable statistical relation between smoking and lung cancer might be due to genetic factors (e.g., Fisher, 1958). Recently, researchers showed that some of the reliable statistical relations between music practice and musical abilities are probably due to genetic factors (Mosing, Madison, Pedersen, Kuja-Kalkola, & Ullen, 2014).

(2.) Causal models and the causal model theory of choice can be formalized using causal Bayes nets (Sloman & Hagmayer, 2006; see Rottman, Chapter 6 in this volume, for an introduction). A respective causal Bayes net allows computing the causal expected utilities of different options.

York Hagmayer

Department of Psychology, University of Göttingen, Göttingen, Germany

Philip Fernbach

Leeds School of Business, University of Colorado, Boulder, Boulder, Colorado, USA


Intuitive Theories

Intuitive Theories   Tobias Gerstenberg and Joshua B. Tenenbaum The Oxford Handbook of Causal Reasoning Edited by Michael R. Waldmann Print Publication Date: Jun 2017 Subject: Psychology, Cognitive Psychology Online Publication Date: May 2017 DOI: 10.1093/oxfordhb/9780199399550.013.28

Abstract and Keywords

This chapter first explains what intuitive theories are, how they can be modeled as probabilistic, generative programs, and how intuitive theories support various cognitive functions such as prediction, counterfactual reasoning, and explanation. It focuses on two domains of knowledge: people's intuitive understanding of physics, and their intuitive understanding of psychology. It shows how causal judgments can be modeled as counterfactual contrasts operating over an intuitive theory of physics, and how explanations of an agent's behavior are grounded in a rational planning model that is inverted to infer the agent's beliefs, desires, and abilities. It concludes by highlighting some of the challenges that the intuitive theories framework faces, such as understanding how intuitive theories are learned and developed.

Keywords: intuitive theories, prediction, intuitive physics, intuitive psychology, counterfactual

Where do babies come from? Why is the sky blue? Why do some people not have enough to eat? Not unlike the most driven scientists, young children have an almost insatiable hunger to figure out how the world works (Frazier, Gelman, & Wellman, 2009). Being bombarded with a series of "why" questions by the little ones can be a humbling experience for parents who come to realize their limited understanding of how the world works (Keil, 2003; Mills & Keil, 2004). However, our lack of knowledge about some of the big questions stands in stark contrast to the proficiency and ease with which we navigate our everyday lives. We are remarkably good at filtering out what we really need to know from the vast ocean of facts about the world (Keil, 2012). For example, while most of us are pretty hopeless at explaining how helicopters (or even bicycles) work, we can catch baseballs, pot billiard balls, sink basketballs, or balance a pizza carton on an already overfull trash can, hoping that someone else will take out the garbage. We not only can do these things (Todorov, 2004), we can also make remarkably accurate judgments about these events (see, e.g., Battaglia, Hamrick, & Tenenbaum, 2013; Gerstenberg, Goodman, Lagnado, & Tenenbaum, 2012), and explain why they happened (Lombrozo, 2012; Lombrozo & Vasilyeva, Chapter 22 in this volume). The Jenga tower fell because you went for the wrong piece. The Red Sox beat the Yankees because their pitcher was tired.

Indeed, the ease with which we sometimes coast through the world can make us blind to the fact that there is something in need of explanation. One way to open our eyes is by learning that some people lack the abilities that we take for granted, such as individuals on the autism spectrum who have difficulty understanding the social world (Baron-Cohen, Leslie, & Frith, 1985). Another way is to look at the state of the art in artificial intelligence. In the not too distant future, we will presumably be cruising to work in a self-driving car while experiencing another decisive defeat against the chess application on our phone along the way. These advances are clearly impressive. However, a world like the one portrayed in the movie Her (Jonze, 2013), in which the operating system really understands us, (p. 516) will most likely remain science fiction for much longer. While Siri, the personal assistant on the iPhone, can tell us where the closest gym is, it cannot tell us who the slacker was in the following sentence: "Tom beat Bill in table tennis because he didn't try hard" (Hartshorne, 2013; Levesque, Davis, & Morgenstern, 2011; Sagi & Rips, 2014). For people, in contrast, the former question may be difficult, while the latter is trivially easy—of course, Bill is the one who didn't try hard, rather than Tom.

What explains the huge gap between human and machine intelligence? How can we begin to bridge it? In this chapter, we will argue that understanding common-sense reasoning requires at minimum two key insights: (1) human knowledge is organized in terms of intuitive theories, and (2) much of human cognition can be understood in terms of causal inferences operating over these intuitive theories. We will focus on two domains of knowledge: people's intuitive understanding of physics and psychology.

The rest of the chapter is organized as follows. We will first clarify what we mean by intuitive theories. We will then discuss how intuitive theories can be modeled within a computational framework and illustrate what intuitive theories are good for. Next, we will put these ideas to work. We will show how people's causal judgments in a physical domain can be explained in terms of counterfactual simulations operating over people's intuitive theory of physics, and how causal explanations of behavior can be understood as inferences over an intuitive theory of mind. We will conclude by discussing some of the key challenges that will need to be addressed to arrive at a more complete understanding of common-sense reasoning.

Intuitive Theories

What Are Intuitive Theories?

What do we mean when we say that people's knowledge is represented in the form of intuitive theories? The basic idea is that people possess intuitive theories of different domains, such as physics, psychology, and biology, that are in some way analogous to scientific theories (Carey, 2009; Muentener & Bonawitz, Chapter 33 in this volume; Wellman & Gelman, 1992). Like their scientific counterparts, intuitive theories are composed of an ontology of concepts and a system of (causal) laws that govern how the different concepts interrelate (Rehder, Chapters 20 and 21 in this volume). The vocabulary of a theory constitutes a coherent structure that expresses how one part of the theory influences and is influenced by other parts of the theory. A key characteristic of intuitive theories is that they do not simply describe what happened, but interpret the evidence through the vocabulary of the theory. A theory's vocabulary is more abstract than the vocabulary that would be necessary to simply describe what happened.

A vivid example of the abstractness of intuitive theories comes from Heider and Simmel's (1944) seminal study on apparent behavior. Participants who were asked to describe what happened in a movie clip that featured interactions of several geometrical shapes did not simply describe the shapes' movement patterns, but rather interpreted the evidence through their intuitive psychological theory and attributed dispositional mental states, such as beliefs and desires, to the different shapes. The fact that theories are formulated on a higher level of abstraction allows them to go beyond the particular evidence and make predictions for novel situations. For example, having identified the triangle as mean allows one to make predictions about how it is likely to behave in other situations. Predictions based on an intuitive theory are intimately linked to explanation (Lombrozo, 2012; Lombrozo & Vasilyeva, Chapter 22 in this volume). Two people who bring different intuitive theories to the same task will reach a different understanding of what happened and will make different predictions about the future (maybe the triangle just doesn't like squares, but he is generally a nice guy otherwise).

While the concepts and laws in a scientific theory are explicitly defined and known to the scientists in their respective fields, the operation of intuitive theories may be implicit and thus potentially unknown to its user (Borkenau, 1992; Uleman, Adil Saribay, & Gonzalez, 2008). Even though participants in Heider and Simmel's (1944) study described the clips by stipulating specific beliefs and desires, they may not have had complete insight into the workings of their intuitive psychological theory (Malle, 1999). The example further illustrates that our intuitive theories need to be able to cope with uncertainty. A particular action is usually compatible with a multitude of beliefs and desires. If the triangle "runs away" from the square, it might be afraid, or it might want to initiate playing catch. Intuitive theories need to embody uncertainty because many inferences are drawn based on limited, and potentially ambiguous, evidence.

The example of the moving geometrical shapes also illustrates that intuitive theories postulate latent entities and explain observables (motion patterns) in terms of unobservables (beliefs and desires). An (p. 517) intuitive theory of psychology features observable concepts, such as actions, and unobservables, such as mental states. Similarly, an intuitive theory of physics postulates unobservable concepts, such as forces, to explain the interaction of observable objects (Wolff & Thorstad, Chapter 9 in this volume). Importantly, the concepts in an intuitive theory are coherently structured. In the case of an intuitive theory of physics, concepts such as force and momentum are related through abstract laws such as the law of conservation of energy. In the case of an intuitive theory of psychology, beliefs, desires, and actions are linked by a principle of rational action—a person will try to achieve her desires in the most efficient way possible, given her beliefs about the world (Baker, Saxe, & Tenenbaum, 2009; Dennett, 1987; Hilton, Chapter 32 in this volume; Wellman, 2011). This principle allows us to make rich inferences based on very sparse data. From observing a person's action (Frank goes to the fridge), we can often infer both their desires and beliefs (Frank is hungry and believes that there is food in the fridge).

Another feature of intuitive theories concerns how they interact with evidence (Gelman & Legare, 2011; Henderson, Goodman, Tenenbaum, & Woodward, 2010). Intuitive theories are characterized by a certain degree of robustness, which manifests itself in different ways. Seeing the world through the lenses of an intuitive theory may lead one to simply ignore some aspects that wouldn't be expected based on the theory (Simons, 2000). Further, one's intuitive understanding may lead one to explain away evidence (Nickerson, 1998) or reinterpret what was observed in a way that is theory-consistent (Christensen-Szalanski & Willham, 1991). For example, rather than changing the abstract laws of one's intuitive theory, apparent counterevidence can often be explained by positing unobserved latent causes (Saxe, Tenenbaum, & Carey, 2005; Schulz, Goodman, Tenenbaum, & Jenkins, 2008). When the evidence against one's intuitive theory becomes too strong, one is forced into making sense of the evidence by adopting a new theory (Kuhn, 1996). Some have argued that conceptual changes in development are akin to qualitative paradigm shifts in science (e.g., Gopnik, 2012).

One of the strongest pieces of evidence for the existence of intuitive theories comes from children's development of a theory of mind. From infancy to preschool, a child's intuitive theory of mind traverses qualitatively distinct stages. Infants already have expectations about goal-directed actions that are guided by the principle of rational action—an agent is expected to achieve her goals in the most efficient way (Gergely & Csibra, 2003). However, infants form these expectations without yet attributing mental states to agents. Children below the age of four employ an intuitive theory that takes into account an agent's perceptual access and desires, but still fails to consider that an agent's beliefs about the world may be false (Butterfill & Apperly, 2013; Gopnik & Wellman, 1992). Children at this theory-of-mind stage make systematic errors (Saxe, 2005). Only at around four years of age do children start to realize the importance of beliefs for explaining behavior and that agents can have beliefs that conflict with reality (Perner, Leekam, & Wimmer, 1987; Wellman, Cross, & Watson, 2001).
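The principle of rational action can be read as a likelihood function that an observer inverts by Bayes' rule to recover an agent's goals from behavior, in the spirit of Baker, Saxe, and Tenenbaum's (2009) inverse-planning models. The Python sketch below is our own toy illustration of this inversion, not their implementation; the goals, actions, and costs are invented.

import math

# Candidate goals and the cost (effort) of each action under each goal.
# A rational agent prefers low-cost actions that achieve its goal; a
# softmax turns this preference into a noisy likelihood P(action | goal).
costs = {
    "get food": {"walk to fridge": 1.0, "walk to desk": 10.0},
    "do work": {"walk to fridge": 10.0, "walk to desk": 1.0},
}
prior = {"get food": 0.5, "do work": 0.5}
BETA = 1.0  # how close to perfectly rational the agent is assumed to be

def likelihood(action: str, goal: str) -> float:
    weights = {a: math.exp(-BETA * c) for a, c in costs[goal].items()}
    return weights[action] / sum(weights.values())

def posterior(action: str) -> dict:
    unnorm = {g: prior[g] * likelihood(action, g) for g in prior}
    z = sum(unnorm.values())
    return {g: p / z for g, p in unnorm.items()}

# Frank walks to the fridge: we infer he probably wants food.
print(posterior("walk to fridge"))  # -> 'get food' gets almost all the mass

Because the likelihood is graded rather than deterministic, the same machinery tolerates ambiguous or suboptimal behavior, which is exactly the uncertainty an intuitive theory of mind has to embody.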

How Can We Model Intuitive Theories?

We have seen that some of the key properties of intuitive theories are their abstract structure, their ability to deal with uncertainty, and their intimate relationship with causal explanation. How can we best model people's intuitive theories and the inferences they support? What representations and computational processes do we need to postulate in order to capture common-sense reasoning?

In the early days of cognitive science there were two very different traditions of modeling knowledge and inference. Symbolic approaches (e.g., Newell, Shaw, & Simon, 1958) represented knowledge in terms of logical constructs and inference as deduction from premises to conclusions. While these logical representations captured some important structural aspects of knowledge, they did not support inferences in the light of uncertainty. Statistical approaches such as neural networks (e.g., Rumelhart & McClelland, 1988) represented knowledge as statistical connections in the network architecture and inference as changes to these connections. These approaches dealt well with uncertainty, but were limited in their capacity to express complex structural relationships. The advent of probabilistic graphical models promised to combine the best of both worlds. Bayesian networks (BNs) integrate structured representations with probabilistic inference (Pearl, 1988). However, none of these approaches was yet capable of representing causality. Pearl (2000) remedied this limitation by developing causal Bayesian networks (CBNs). In contrast to BNs, where the links between variables merely express statistical dependence, the links in a CBN express autonomous causal mechanisms (Johnson & Ahn, Chapter 8 in this volume; Sloman, 2005). Whereas both BNs and CBNs (p. 518) support inferences about unknown variables based on observational evidence about the states of other variables, only CBNs support inferences about what would happen if one were to intervene and change the value of a variable, rather than simply observing it. The CBN framework provided a normative account of how people should update their beliefs based on observations versus interventions. Several empirical studies have since established that people are sensitive to this difference (Meder, Gerstenberg, Hagmayer, & Waldmann, 2010; Rottman & Hastie, 2013; Sloman & Hagmayer, 2006; Sloman & Lagnado, 2005; Waldmann & Hagmayer, 2005). Inspired by these successes, some have proposed that the CBN framework is a candidate for representing people's intuitive theories (Danks, 2014; Glymour, 2001; Gopnik & Wellman, 2012).

However, a key limitation of CBNs is their limited representational power for expressing abstract, general principles that organize knowledge (Buehner, Chapter 28 in this volume; Tenenbaum, Griffiths, & Niyogi, 2007). Some of these limitations have been overcome by developing richer frameworks, such as hierarchical CBNs that capture causal dependencies on multiple levels of abstraction (Griffiths & Tenenbaum, 2009; Tenenbaum, Griffiths, & Kemp, 2006; Tenenbaum, Kemp, Griffiths, & Goodman, 2011), or frameworks that combine CBNs with first-order logic (Goodman, Tenenbaum, Feldman, & Griffiths, 2008). Even these richer modeling frameworks, however, are insufficient to accommodate two core characteristics of human thought: compositionality and productivity (Fodor, 1975). Like words in language, concepts—the building blocks of thought—can be productively combined in infinitely many ways, whereby the meaning of more complex concepts is composed of the meaning of its simpler constituents. We can think and talk about a purple tiger flying through the sky in a small helicopter, even though we have never thought that thought before.

Goodman, Tenenbaum, and Gerstenberg (2015) have argued that the compositionality and productivity of human thought can be adequately captured within a framework of probabilistic programs (see also Chater & Oaksford, 2013). Within this framework, a program describes the step-by-step process of how worlds are generated by evaluating a series of functions. The input–output relations between the functions dictate the flow of the program. A function whose output serves as input to another function needs to be evaluated first. What makes a program probabilistic is the fact that randomness is injected into the functions. Thus, each time the program is run, the generative process might take a different route, depending on what random choices were made. As a result, the repeated execution of a probabilistic program generates a probability distribution over possible worlds (Goodman, Mansinghka, Roy, Bonawitz, & Tenenbaum, 2008). Gerstenberg and Goodman (2012) have shown how a compact probabilistic program that represents people's intuitive understanding of a simple domain (table tennis tournaments) accurately explains people's inferences based on a multitude of different pieces of evidence (such as how strong different players are, based on who beat whom in a series of games).

To make things more concrete, let us illustrate the difference between the representational power of a CBN and a probabilistic program with an example of modeling people's intuitive understanding of physics. Sanborn, Mansinghka, and Griffiths (2013) developed a CBN model of how people infer the masses of two colliding objects. Their model incorporated uncertainty about the relevant physical properties and demonstrated a close fit with people's judgments. With a few additions, the model also captured people's causal judgments about whether a particular collision looked causal or not. The model further explained some deviations of people's judgments from the normative predictions of Newtonian physics—which had traditionally been interpreted as evidence for the operation of heuristic biases—in terms of rational inference on a Newtonian physics model, assuming that people are uncertain about the relevant physical properties. These results are impressive. However, at the same time, the scope of the model is quite limited. For example, the model would need to be changed to license inferences about scenes that feature more than two objects. A much more substantial revision would be required if the model were to be used to make predictions about events in two dimensions rather than one. Not only are the speed with which objects move and the spatiotemporal aspects of the collision important for people's causal impression, but also the direction in which the objects move after the collision (White, 2012b; White, Chapter 14 in this volume).

An alternative account for capturing people's intuitive physical reasoning was proposed by Battaglia et al. (2013). They share Sanborn et al.'s (2013) assumption that people's intuitive understanding of physics approximates some aspects of Newtonian mechanics. However, rather than modeling this knowledge in terms of a CBN, Battaglia (p. 519) et al. (2013) stipulate that people's intuitive theory of physics is akin to a physics engine used to render physically realistic scenes. A physics engine is a program designed to efficiently simulate the interaction of physical objects in a way that approximately corresponds to the predictions of Newtonian mechanics. According to Battaglia et al.'s (2013) account, people make predictions and inferences about physical events by running mental simulations on their internal physics engine. Assuming that people have uncertainty about some of the relevant parameters then naturally generates a probabilistic program. This approach is not limited to making specific inferences (such as the mass of an object) in specific situations (such as collisions between two objects), but yields predictions about many kinds of questions we might want to ask about physical scenes.
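The following Python sketch conveys the flavor of this idea in a drastically simplified form. It is a stand-in for a physics engine, not Battaglia et al.'s implementation: a one-dimensional "simulation" with perceptual noise injected into the initial conditions. Repeated runs of this probabilistic program induce a distribution over outcomes, from which a graded prediction can be read off. The dynamics, the noise levels, and the gate region are all invented for illustration.

import random

GATE = (4.0, 5.0)  # region of the far wall that counts as the gate (made up)

def simulate_once(b_pos: float, b_heading: float) -> float:
    """One run of the 'intuitive physics engine': propagate ball B to the
    far wall under perceptual noise in its observed position and heading."""
    noisy_pos = random.gauss(b_pos, 0.3)          # uncertainty about where B is
    noisy_heading = random.gauss(b_heading, 0.1)  # and about its direction
    return noisy_pos + 6.0 * noisy_heading        # crossing point at the wall

def p_through_gate(b_pos: float, b_heading: float, n: int = 10_000) -> float:
    hits = sum(GATE[0] <= simulate_once(b_pos, b_heading) <= GATE[1]
               for _ in range(n))
    return hits / n

# A graded prediction from B's (noisily perceived) state after a collision:
print(p_through_gate(b_pos=1.0, b_heading=0.6))

Because the knowledge lives in the simulator rather than in a fixed set of conditional probability tables, the same program can in principle be queried about predictions, inferences, and counterfactuals alike, which is the point developed in the next section.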

Intuitive Theories the predictions of Newtonian mechanics. According to Battaglia et al.’s (2013) account, people make predictions and inferences about physical events by running mental simula­ tions on their internal physics engine. Assuming that people have uncertainty about some of the relevant parameters then naturally generates a probabilistic program. This ap­ proach is not limited to making specific inferences (such as the mass of an object) in spe­ cific situations (such as collisions between two objects), but yields predictions about many kinds of questions we might want to ask about physical scenes.

What Are Intuitive Theories Good For?

Now we have a sense of what intuitive theories are and some idea about how they might be modeled in terms of probabilistic, generative programs (for more details, see Goodman et al., 2015). But what are intuitive theories actually good for? Conceiving of intuitive theories in terms of probabilistic, generative models allows us to explain a diverse set of cognitive skills as computational operations defined over these programs. Let us illustrate the power of this approach by way of a simple example in the physical domain. Consider the schematic diagram of a collision event between two billiard balls, A and B, as depicted in Figure 27.1. Both balls enter the scene from the right at time point t1. At t2, the two balls collide. Ball B bounces off the wall shortly afterward, before it eventually enters through a gate in the walls at t4.

Figure 27.1 Schematic diagram of a collision event between two billiard balls, A and B. The solid lines indicate the balls' actual movement paths. The dashed line indicates how ball B would have moved if ball A had not been present in the scene.

First, intuitive theories support prediction. Imagine that time was stopped at t2 and we are wondering whether ball B will go through the gate. We can use the generative model to simulate what is likely going to happen in the future by conditioning on what we have observed, such as the state of the table and the trajectories on which A and B traveled until they collided. Uncertainty enters our predictions in different ways.

For example, we might have perceptual uncertainty about where exactly ball A struck ball B. We might also have uncertainty about how ball B is going to collide with the wall. Anyone who has tried a bank shot on a pool table knows that our calculations are sometimes off. However, with practice, accuracy improves dramatically, as can be witnessed by watching professional pool players.1 Finally, we may also have some general uncertainty about what will happen after this point in time. Will another ball enter the scene and potentially knock ball B off course? Will someone tilt the table? Will the gate suddenly close? All these factors will depend on our more specific understanding of this particular domain, such as whether we've seen other balls entering the scene before or how reliably the gate stays open.

Second, intuitive theories support inference (cf. Meder & Mayrhofer, Chapter 23 in this volume). Imagine you checked your phone as the clip started and you only began paying attention at t2. Having observed the motion paths of the balls from t2 onward allows us to infer where the balls likely came from. Again, we might be somewhat uncertain about the exact location, as there are in principle an infinite number of ways in which the balls could have ended up colliding exactly in the way that they did. However, out of all the possible ways, we will deem some more plausible than others (see Smith & Vul, 2014).

Third, we can use our intuitive understanding of the domain as a guide for action. Imagine that you are the "gatekeeper" and have to make sure that B doesn't go through. At t1, you might not see any reason to intervene. However, at t3 you might get seriously worried that B is actually going to make it this time. Now you might have different actions to prevent that from happening, such as throwing another ball, tilting or bouncing the table, or running around the table to catch the ball with your hand. You can use hypothetical reasoning to plan (p. 520) your action so as to minimize effort. If I were to bounce the table from this side, would that be sufficient to divert B so that it won't go through the gate? How hard would I need to bounce the table? Maybe it's safer to throw another ball? But what are the chances that I'm going to actually hit B and knock it off its course?

Fourth, generative models support counterfactual inferences. For example, having observed what actually happened, we might wonder afterward what would have happened if the balls hadn't collided. Would ball B have gone through the gate anyhow? Again, we can use our intuitive understanding of the domain to get an answer. We first need to take into account what we've actually observed (as shown by the solid lines in Figure 27.1). We then realize the truth of the counterfactual antecedent (i.e., that the balls did not collide) by means of a hypothetical intervention. For example, we could imagine that we picked up ball A from the table shortly before the collision would have happened. We then predict what would have happened in this counterfactual scenario. In our case, ball B would have continued on its straight path and missed the gate. Because we have observed the whole episode, we can be pretty certain about the counterfactual outcome. We know that no additional balls entered the scene and that no one tilted the table.

Finally, generative models can be used for explanation.
If someone asked you why ball B went through the gate, one sensible response would be to say it went through the gate because it collided with ball A. This notion of explanation is tightly linked to causality and, in particular, to a conception of causality which says that what it means to be a cause is to have made a difference in one way or another. In order to figure out whether a particular event made a difference in a given situation, we need to compare what actually happened with the outcome in a relevant counterfactual situation. We can cite the collision as a cause for the outcome because it made a difference. If the balls hadn't collided, then ball B wouldn't have gone through the gate. The same operation reveals which things didn't make a difference to the outcome and would thus not satisfy us as explanations of the outcome. For example, imagine someone said that ball B went through the gate because ball A was gray. This explanation strikes us as bad because ball A's color made no difference to the outcome whatsoever. Even if A's color had been yellow instead of gray, ball B would have gone through the gate in exactly the same way.

Explanations not only pick out events that actually made a difference, but they tend to pick out "the" cause among the multitude of factors that were each "a" cause of the outcome. While it is true that ball B wouldn't have gone through the gate if the top wall hadn't been there, we are less inclined to say that the wall caused the ball to go through the gate (Cheng & Novick, 1992; Hilton & Slugoski, 1986; see also Cheng & Lu, Chapter 5 in this volume). Explanations distinguish between causes and enabling conditions (Cheng & Novick, 1991; Kuhnmünch & Beller, 2005) and generally pick out events that we consider worth talking about (Hilton, 1990).

In the example that we have used, we employed our generative theory to give an explanation for a particular outcome. We can also use our intuitive theory to provide explanations on a more general level. Imagine that we observed several rounds and noticed that, generally, when both balls are present, ball B tends to miss the gate. In contrast, when ball A is absent, B goes through the gate almost all the time. We may thus say, on a general level, that A prevents B from going through the gate. However, even if A generally tends to prevent B from going through the gate, we would still say that B went through the gate because of ball A in the particular example shown in Figure 27.1. Thus, general causal statements that are based on repeated observations can dissociate from the particular causal statement that pertains to the situation at hand.

So far, we have focused on one particular setup: collision events between billiard balls. However, as illustrated earlier, the power of an intuitive theory that is represented on a sufficiently abstract level (such as an approximate physics simulation engine) is that it supports the same kinds of judgments, actions, and explanations we have described for this particular situation for any kind of situation within its domain of application. There are infinitely many ways in which two billiard balls collide with each other, and we ought to be able to say for each situation whether ball A caused ball B to go through the gate or prevented it from going through. And of course, we should also be able to make similar judgments when more than two balls are involved (Gerstenberg, Goodman, Lagnado, & Tenenbaum, 2015), or other obstacles are present in the scene, such as walls, rough patches, or even teleports (Gerstenberg, Goodman, Lagnado, & Tenenbaum, 2014).
Indeed, since we have defined the functions operating on intuitive theories simply in terms of conditioning and intervening on a generative model, the same functions can be applied to a completely different context.

(p. 521)
Consider the tower of blocks shown in Figure 27.2 b. Using the same intuitive theory of physics (see Figure 27.2 a), we can predict what will happen if the tower is struck from one side (prediction), can think about from what direction the wind must have blown after having seen the top block lying on the ground (inference), can put another block on the tower without making it fall (action), can consider what would happen if the bottom left block were removed (hypothetical), and can say that the top block doesn't fall because it is supported by the block underneath (explanation).
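To make concrete the idea that one generative model can serve all of these functions, here is a minimal sketch in Python. It uses a toy one-dimensional world rather than a real physics engine, and all of the names (simulate, predict, counterfactual) are our own illustrative choices rather than an implementation from the literature: prediction conditions on the observed state and samples forward, and hypothetical reasoning re-runs the same model after an intervention.

```python
import random

def simulate(pos, vel, steps, noise=0.05):
    """Toy 1-D generative model: a ball drifts with noisy velocity.
    Returns its final position."""
    for _ in range(steps):
        pos += vel + random.gauss(0, noise)
    return pos

def predict(pos, vel, gate=(9, 11), n=1000):
    """Prediction: probability that the ball ends up inside the gate,
    estimated by sampling forward from the observed state."""
    return sum(gate[0] < simulate(pos, vel, 10) < gate[1] for _ in range(n)) / n

def counterfactual(pos, vel, intervention, gate=(9, 11), n=1000):
    """Hypothetical/counterfactual reasoning: re-run the same model
    after an intervention (e.g., a push that changes the velocity)."""
    return predict(pos, intervention(vel), gate, n)

print("P(through gate):", predict(0.0, 1.0))
print("P(through gate | push):", counterfactual(0.0, 1.0, lambda v: v - 0.5))
```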

Intuitive Physics and Causal Judgments

So far, we have discussed what intuitive theories are, how to model them, and what we can do with them. We have illustrated some of these ideas by thinking about colliding billiard balls and towers of blocks. In this section we will use people's understanding of physics as a case study for exploring the ideas behind intuitive theories more thoroughly. We will compare different theoretical approaches to modeling people's intuitive understanding of physics and discuss some of the empirical studies that have motivated these accounts. We will then apply what we have learned about people's intuitive understanding of physics to explain how people make causal judgments.

Intuitive Physics

There is evidence for a foundational understanding of physics from very early in development. Infants already expect that two solid objects cannot occupy the same space and that objects don't just suddenly disappear and reappear but persist over time (Baillargeon, Spelke, & Wasserman, 1985; Spelke, 1990). They infer hidden causes of effects (Saxe et al., 2005) and integrate spatial and temporal information to make predictions about future events (Téglás et al., 2011). Over the course of childhood, our intuitive understanding of physics grows to become more and more sophisticated (for reviews, see Baillargeon, 2004; Spelke, Breinlinger, Macomber, & Jacobson, 1992).

Characterizing people's intuitive understanding of physics is a challenging task (Hayes, 1985). A complete account will have to explain how it is possible for humans to be very apt at interacting with the physical world, while at the same time, when probed more explicitly, some of our intuitive physical concepts appear fundamentally at odds with classical physics (cf. Kozhevnikov & Hegarty, 2001; Levillain & Bonatti, 2011; Shanon, 1976; Zago & Lacquaniti, 2005). In this section, we will first summarize theoretical accounts that have focused on explaining the systematic ways in which people's intuitive physical understanding diverges from the physical laws. We will then discuss more recent work which argues that people's intuitive understanding of physics may be modeled in analogy to physics engines that are used to create physically realistic animations.


Impetus Theory and Qualitative Reasoning

In the 1980s, empirical findings cast doubt on the accuracy of people's intuitive understanding of physics. McCloskey and colleagues revealed (p. 522) several ways in which people's predictions about physical events were off (McCloskey, Washburn, & Felch, 1983; see also DiSessa, 1982; Zago & Lacquaniti, 2005). In particular, people had difficulty reasoning about projectile motion, such as when a ball rolls off a cliff (Kaiser, Proffitt, & McCloskey, 1985), or circular motion, such as when a ball whirled at the end of a string is released (McCloskey, Caramazza, & Green, 1980). In the case of projectile motion, many participants tended to draw a path according to which the ball continues its horizontal motion beyond the cliff, and only begins to fall down sometime later. The correct response, however, is that the ball will fall down in a parabolic arc. In the case of circular motion, many participants believed that when the ball is released, it will continue to fly in a curvilinear way before its path eventually straightens out. Here, the correct response is that the ball will fly in a straight line as soon as it is released.

Figure 27.2 Using an intuitive theory of physics to understand a visual scene.

McCloskey (1983) explains people's systematic errors by appealing to a naïve theory of motion. Accordingly, people's intuitive understanding of how objects move is more similar to a medieval impetus theory than to what would be predicted by classical physics. Impetus theory is characterized by two key ideas. First, objects are set in motion by imparting an impetus to the object, which subsequently serves as an internal force generating the object's motion. Second, a moving object's impetus gradually dissipates until the object eventually comes to a halt. Impetus theory explains people's answers for the projectile motion and circular motion problems discussed in the preceding. By endowing the ball rolling off the cliff with an internal impetus, we can make sense of the ball's initial resistance to gravity. Only when its impetus has dissipated will gravity cause it to fall down. Similarly, if a ball that is whirled has acquired a circular impetus, it takes time for that circular impetus to dissipate before the ball will eventually continue to move along a straight path.

Evidence for people's naïve theory of motion was gathered by having participants predict the motion of objects in diagrams depicting physical scenes, and by having participants explain their responses in extended interview studies. The striking similarities between people's responses in these interviews and the writings of medieval impetus theorists suggest that our naïve theory of motion is likely to be the result of how we experience ourselves as agents interacting with objects (see also White, 2012a). For example, people also have the impression that when a moving ball A collides with a stationary ball B, A exerted more force on B than vice versa, even though the force transfer is actually symmetrical (White, 2006, 2009). This perceived force asymmetry in collision events may result from people experiencing themselves as agents who exert force on other objects. The experienced resistance by these objects may often be smaller than the experienced force exerted on the object. McCloskey (1983) also argued that people's core theory of motion is surprisingly consistent. Some of the individual differences in people's predictions can be explained as resulting from different beliefs about exactly how impetus dissipates, or how an object's impetus interacts with external forces such as gravity.

Impetus theory draws a qualitative distinction between objects at rest (no impetus) and moving objects that have impetus. In classical physics there is no such distinction. In the absence of external forces, a moving object remains in motion and does not slow down. Fully specifying a physical scene by using the laws of classical physics requires detailed information about the objects and forces at play. However, in many situations, we are able to make qualitative predictions about how a system may change over time without having access to information at a level of detail that would be required to derive predictions based on classical physics. For example, if we heat a pot of water on a stove, we know that the water will eventually boil—even though we don't know exactly when it will happen. Several accounts have been proposed that aim to capture people's intuitive understanding of physics in terms of qualitative reasoning principles (DiSessa, 1993; Forbus, 2010; de Kleer & Brown, 1984).

Forbus's (1984) qualitative process theory (QPT) states that people's intuitive physical theory is organized around physical processes that bring about qualitatively different states. Accordingly, people's intuitive domain knowledge is represented as a mental model that supports qualitative simulations about the different states a particular physical system may reach. A mental model is characterized by the entities in its domain, the qualitative relationships that hold between the different entities, the processes that bring about change, and the preconditions that must be met for the processes to unfold (Forbus, 1993, 2010; Gentner, 2002). According to QPT, people think about physical systems in terms of qualitative processes that lead the system from one state to another. For example, through increasing temperature, water is brought to boil. The differential equations that classical physics requires to (p. 523) model spatiotemporally continuous processes are replaced with a qualitative mathematics that yields predictions about how a system may behave, based on partial knowledge of the physical scene.

A number of principles guide how people's physical understanding is modeled according to QPT (Forbus, 2010). Rather than representing relevant quantities numerically (such as the amount of water in the pot, or the exact temperature of the stove), qualitative representations are discretized. Only those qualitatively different values that are of relevance are represented.
For water, its freezing point and boiling point are particularly relevant for understanding its behavior. Qualitative physical models are more abstract than their classical counterparts. Instead of precisely quantifying a physical process, qualitative models represent processes in terms of sign changes. For example, when modeling how the water level in a leaking bathtub changes over time when the shower is on, a qualitative model would simply capture whether the level is increasing, decreasing, or constant, without representing the exact rate of change. Finally, by abstracting away from more detailed information about the relevant physical variables, a qualitative model often makes ambiguous predictions. Qualitative models outline a space of possible states that a system may reach. For the bathtub example, a qualitative model would predict that the bathtub could be completely empty, overflowing with water, or anywhere in between. However, it would not allow us to make exact predictions about how the water level would change as a function of the size of the leak and of how much water comes out of the shower.

While classical physical equations such as F = ma are non-causal and symmetric (we could also have written it as m = F/a; cf. Mochon & Sloman, 2004), QPT provides an account of people's causal reasoning that is grounded in the notion of a directed physical process that leads from cause to effect. We will discuss process theories of causation in more detail in the following.
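As a concrete illustration of this sign-based style of reasoning, here is a minimal sketch for the bathtub example. The representation is a toy construction of ours, not Forbus's implementation: the model compares only the ordering of inflow and outflow, so it can predict the direction of change but not its rate or timing.

```python
def qualitative_level_change(inflow, outflow):
    """Return the sign of the change in water level, given only an
    ordinal comparison of the two flows (no exact magnitudes needed)."""
    if inflow > outflow:
        return "+"  # level increasing: the tub may eventually overflow
    if inflow < outflow:
        return "-"  # level decreasing: the tub may eventually run empty
    return "0"      # level constant

# "The shower adds more water than the leak drains" is all we need to know:
print(qualitative_level_change(inflow=2, outflow=1))  # "+"
```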

From Noisy Newtons to a Mental Physics Simulation Engine

Most of the research reviewed in the previous section has probed people's naïve understanding of physics by asking questions about diagrammatic displays of physical scenes. However, even when dynamic stimuli were used rather than static images, people's judgments in some situations were still more in line with impetus theory than with what would be predicted by classical physics (Kaiser, Proffitt, Whelan, & Hecht, 1992; Smith, Battaglia, & Vul, 2013).

More recently, research in intuitive physics has revisited the idea that people's understanding of physics may be best described in terms of some more fine-grained quantitative approximation to aspects of Newtonian physics. Importantly, this research assumes, in line with the qualitative reasoning work discussed earlier, that people have uncertainty about the properties of the physical scene. As briefly mentioned in the preceding, Sanborn et al. (2013) have shown how such a noisy Newtonian model adequately captures people's inferences about object masses as well as causal judgments in simple collision events. Their model further explained what was often interpreted to be a biased judgment as a consequence of rational inference over a noisy model that incorporates uncertainty about the relevant physical properties. Michotte (1946/1963) found that people have a stronger causal impression when the velocity of the initially stationary projectile object was slightly lower than the velocity of the initially moving motor object. Their causal impression was lower when the projectile object's velocity was higher than that of the motor object. Michotte (1946/1963) was puzzled by the fact that people's causal impression wasn't increasing with the magnitude of the effect that the motor object had on the projectile object. However, if we assume that people's intuitive understanding of physics and their causal judgments are closely linked, then this effect is to be expected. If both objects are inanimate, have the same mass, and are placed on an even surface, it is physically impossible for the projectile object's velocity to be greater than that of the motor object. In contrast, the reverse is possible, provided that there is some uncertainty about whether the collision was perfectly elastic. While the collisions of billiard balls are close to being elastic, collisions between most objects are not, and some of the kinetic energy is transformed into heat or deformation of the objects. Thus, the asymmetrical way in which deviations of the projectile object's velocity from the motor object's velocity affect people's causal impressions can be explained as a rational inference in a situation in which we are uncertain about some of the relevant physical properties.

Sanborn et al.'s (2013) Noisy Newton account models people's judgments as inferences over a probabilistic graphical model that includes variables which express people's uncertainty about some of the parameters. As discussed at the beginning (p. 524) of the chapter, this model does a very good job of capturing people's inferences about object mass as well as their causal judgments. However, the model is limited in its range of application. Probabilistic graphical models do not generalize well beyond the task that they were built for (cf. Gerstenberg & Goodman, 2012; Goodman et al., 2015; Tenenbaum et al., 2007).

Since then, researchers have explored the idea that people's intuitive understanding of physics may be best explained in analogy to a physics engine in a computer program that simulates realistic physical interactions. While the Noisy Newton model introduces random variables in the graphical model to capture people's uncertainty, a deterministic physics simulation model can be made probabilistic by introducing noise into the system. For example, when extrapolating a ball's motion, a deterministic physics engine says for each point in time exactly where the ball will be. However, when we try to predict what will happen, we have some uncertainty about exactly where the ball will go. By introducing noise into the physics simulation, we can capture this uncertainty. Rather than giving an exact value of where the ball will be at each point in time in the future, a noisy physics simulation model returns a probability distribution over possible positions. In order to get these probabilities, we generate many samples from our noisy physics engine. Because of the random noise that is injected into the system, each sample looks a little different. The whole set of samples then induces a probability distribution over possible future states. For example, in the near future, the ball tends to be roughly at the same point in each of our noisy samples since there simply weren't that many steps yet in the simulation to introduce noise. Thus, the noisy simulation model will make a strong prediction about where the ball will be in the near future. However, when asked to make a prediction about where the ball will be later, the model yields a much weaker prediction. Since there were more time steps at which noise was introduced into the system, the outcomes of the simulations are more varied.

Smith and Vul (2013) set out to investigate more closely what sorts of noise in people's mental physical simulations best explains their actions in a simple physics game.
In this game, participants saw a moving ball on a table similar to our billiard balls as shown in Figure 27.1. Part of the table was then occluded, and participants were asked to move a paddle up or down such that they would intercept the ball when it re-emerged from the occluded part. Smith and Vul (2013) tested different sources of uncertainty: (1) perceptual uncertainty about the ball's position and the direction of its velocity when the occluder appeared, and (2) dynamic uncertainty about how the ball bounces off the edges of the table and how it moves along the surface of the table. They found that people's actions were best explained by assuming that dynamic noise is a greater factor in people's mental simulations than perceptual noise. For example, the extent to which participants' paddle placement was off increased strongly with the number of bounces that happened behind the occluder and not so strongly with the mere distance traveled. More recently, Smith and Vul (2014) showed that the same simulation model also explains people's diagnostic inferences about what path a ball must have taken to arrive at its current position.

In a similar task, Smith, Dechter, Tenenbaum, and Vul (2013) had participants judge whether the ball was going to first hit a green or a red patch on tables with different configurations of obstacles. The earlier that participants correctly predicted which patch would be hit, the more reward they received. Hence, participants were encouraged to continuously update their predictions as the clip unfolded. Smith, Dechter, et al. (2013) found that the noisy Newtonian simulation model captured participants' predictions very accurately for most of the trials. However, there were also a number of trials in which it was physically impossible for the ball to get to one of the patches. Whereas participants tended to make their predictions very quickly on these trials, the simulation model took time to realize the impossibility of reaching a certain patch. This result suggests that people sometimes use more qualitative reasoning about what is possible and what is impossible to assist their physical predictions (Forbus, 2010).

While the previous studies focused on people's understanding of collisions in relatively simple two-dimensional worlds, Battaglia et al. (2013) ran a series of experiments to demonstrate different kinds of inferences people make about towers of blocks similar to the one shown in Figure 27.2 b. In one of their experiments, participants saw a configuration of blocks and time was paused. They were asked to judge whether the tower was stable. For trials in which participants received feedback, time was then switched on and participants saw whether the tower was in fact stable or whether some of the blocks fell. Battaglia et al. (2013) modeled judgments by assuming that people have access to an intuitive physics engine (similar to the actual physics engine (p. 525) that was used to generate the simulations), which they can use to mentally simulate what is going to happen. The model assumes that the gradedness in people's judgments stems from perceptual uncertainty about the exact location of the blocks, as well as dynamical uncertainty about how exactly the physical interactions are going to unfold.

Battaglia et al.'s (2013) account nicely illustrates some of the key differences between modeling people's intuitive understanding of physics as noisy, mental simulations versus qualitative physical reasoning.
The two approaches differ most strongly in the way in which they represent people's uncertainty about the physical scene. The qualitative reasoning approach uses discretization and abstraction to arrive at a symbolic representation that only captures some of the aspects of the physical situation. In contrast, the noisy simulation approach deals with uncertainty in a very different way. It maintains a richer physical model of the situation and captures people's uncertainty by putting noise on different parameters. In each (mental) simulation of the physical scene, the outcome is determined by the laws of physics as approximately implemented in the physics engine that was used to generate the scene. By running many simulations, we get a probability distribution over possible future scenes because each simulation is somewhat different due to the noise introduced to aspects of the situation that the observer is uncertain about. The more uncertain we are about the physical properties of the scene, the more varied the probability distribution over future outcomes will be.

By expressing uncertainty in this way, the noisy simulation model accounts not only for qualitative judgments, such as whether or not the tower is going to fall, but also for judgments that require quantitative precision, such as in which direction the tower is most likely to fall. Battaglia et al. (2013) further showed that neither people nor their model had difficulty making the same types of judgments when the weights or shapes of the blocks were varied, when physical obstacles were added to the scene, or when they were asked to reason about what would happen if the table that the tower rested on was bumped from different directions. The noisy simulation model also makes predictions on the level of cognitive processing that were recently confirmed. Hamrick, Smith, Griffiths, and Vul (2015) showed that people take longer to make judgments in situations in which the outcome is more uncertain—a finding that fits with the idea that people simulate more (i.e., draw more samples from their mental simulation model) when the outcome is uncertain.

We have now seen some empirical evidence for what an intuitive theory of physics is good for. By assuming that people's intuitive theory of physics is similar to a noisy physics engine, we can explain how people make predictions about future events (Battaglia et al., 2013; Sanborn et al., 2013; Smith & Vul, 2013), inferences about the past (Smith & Vul, 2014), and take actions to achieve their goals (Smith, Battaglia, & Vul, 2013; Smith, Dechter, et al., 2013). In the following we will show that people can also make use of their intuitive theory of physics to reason about counterfactuals and to give explanations for what happened.
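The sampling logic behind such noisy simulation models is easy to state in code. The following is a minimal sketch under simplifying assumptions of our own (a point ball on a 2-D table, Gaussian noise, hypothetical function names such as noisy_rollout): dynamic noise perturbs the motion at every time step, perceptual noise perturbs the observed starting state once per sample, and the proportion of samples in which the ball ends up inside the gate approximates the model's predicted probability.

```python
import random

def noisy_rollout(x, y, vx, vy, steps=200, dyn_noise=0.002, table=(0.0, 1.0)):
    """One sample from a noisy physics simulation: a small random perturbation
    (dynamic noise) is applied to the motion at every time step, and the ball
    bounces off the top and bottom walls of the table."""
    for _ in range(steps):
        vy += random.gauss(0, dyn_noise)       # dynamic uncertainty
        x, y = x + vx, y + vy
        if y < table[0] or y > table[1]:       # bounce off a wall
            y = min(max(y, table[0]), table[1])
            vy = -vy
    return x, y

def p_through_gate(x, y, vx, vy, gate=(0.4, 0.6), n=1000, perc_noise=0.01):
    """Estimate P(ball ends up in the gate) by averaging many noisy samples;
    perceptual noise perturbs the observed starting position once per sample."""
    hits = 0
    for _ in range(n):
        y0 = y + random.gauss(0, perc_noise)   # perceptual uncertainty
        _, y_end = noisy_rollout(x, y0, vx, vy)
        hits += gate[0] < y_end < gate[1]
    return hits / n

print(p_through_gate(x=0.0, y=0.5, vx=0.005, vy=0.0))
```

Because the dynamic noise accumulates over time steps, predictions made about the near future come out sharper than predictions about the distant future, exactly the pattern described in the text above.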

Causal Judgments

We will now apply what we have learned about people's intuitive understanding of physics to explain how people make causal judgments. Before doing so, let us briefly review some of the philosophical and psychological literature on causality to get a sense of what it is that we need to explain.

Process Versus Dependency Accounts of Causation

Page 16 of 57

Intuitive Theories Philosophical Background In philosophy, there are two broad classes of theories of causation (Beebee, Hitchcock, & Menzies, 2009). According to process theories of causation, what it means for an event C to cause another event E is for there to be some physical quantity that is transmitted along a spatiotemporally continuous process from C to E (Dowe, 2000; Salmon, 1984). The paradigm case is a collision of two billiard balls in which ball A transfers its momen­ tum to ball B via the collision event. According to dependency theories of causation, what it means for C to cause E is for there to be some kind of dependence between C and E. Some dependency theories propose a probabilistic criterion such that for C to be a cause of E, C must increase the probability that E happens (Suppes, 1970). Here, we will focus on the criterion of counterfactual dependence. According to a counterfactual theory, for C to have caused E, both C and E must have happened, and E would not have happened if C had not happened (Lewis, 1973, 1979). The CBN approach we have discussed earlier is an example of a dependency theory of causation. There, the notion of a counterfactual inter­ vention is important: C is a cause of E if E would change in response to an intervention on C (Pearl, 2000; Woodward, 2003). Note that philosophers of causation are not only con­ cerned with providing an account of causation that corresponds to people’s (p. 526) intu­ itive judgments, they care deeply about other aspects as well, such as the ontological plausibility of their account, and whether it is possible to reduce causation to counterfac­ tual dependence or certain types of physical processes. Let us get some intuition about the significance of these different theories by discussing two exemplary cases, each of which is easily dealt with by one theory but is problematic for the other one. Consider the schematic diagram shown in Figure 27.3 a. Both balls A and B enter the scene from the right. Ball E is stationary in front of the gate. Ball A hits ball E and E goes through the gate. Ball B doesn’t touch ball E. However, if ball A had been absent from the scene, ball B would have hit E and E would have still gone through the gate, albeit slightly differently. This is a case of pre-emption. The collision event be­ tween ball A and E pre-empts a collision event between B and E that would have hap­ pened just a moment later, and which would have resulted in the same outcome. The intu­ ition is that ball A caused ball E to go through the gate, whereas ball B did not. Cases of pre-emption are easily dealt with by process accounts but are problematic for dependen­ cy accounts. According to a process theory, A qualifies as the cause because there is a spatiotemporally continuous process through which A transfers momentum to E, which results in E going through the gate. In contrast, there is no actual process that connects B and E in any way. Simple dependency theories have no way of distinguishing between the two balls. Ball E would have gone through the gate even if either ball A or ball B had been absent from the scene.

Page 17 of 57

Intuitive Theories

Figure 27.3 Schematic diagrams of physical interac­ tions between billiard balls.

Now let’s consider the case shown in Figure 27.3 b. Here, all three balls enter the scene from the right. E travels along a straight path, and no ball ever interacts with it. However, something interesting happens in the background. Ball A’s trajectory is such that it’s about to intersect with E. Ball B, however, hits ball A, and neither of the balls end up in­ teracting with E. Cases like these are known as situations of double prevention. B pre­ vents A, which would have prevented E from going through the gate. Clearly, B played an important causal role. However, process accounts have difficulty accounting for this since there is no continuous process that connects B and E. Dependency accounts have no trou­ ble with this case. Since E would not have gone through the gate if B hadn’t collided with A, B is ruled in as a cause of E’s going through the gate.

Psychological Research Inspired by the different philosophical attempts of analyzing causality, psychologists have tested which type of theory better explains people’s causal learning, reasoning, and attri­ bution. In a typical causal learning experiment, a participant is presented with a number of variables, and the task is to figure out what the causal connections between the vari­ ables are by observing and actively intervening in the system (Meder et al., 2010; Rottman, Chapter 6 in this volume; Waldmann & Hagmayer, 2005; Steyvers, Tenenbaum, Wagenmakers, & Blum, 2003). Sometimes participants are already provided with the can­ Page 18 of 57

Intuitive Theories didate structure, but they are asked to estimate how strong the causal relationships be­ tween the variables are (Cheng, 1997; LePelley, Griffiths, & Beesley, Chapter 2 in this vol­ ume; Shanks & Dickinson, 1987; Waldmann & Holyoak, 1992). The CBN framework provides a unified account for explaining people’s judgments about causal strength (Griffiths & Tenenbaum, 2005) as well as their inferences about how dif­ ferent candidate variables are structurally related (for a review, see Rottman & Hastie, 2013). However, there is also evidence that people not only care about the dependency between events when making causal inferences but consider mechanistic information, too (p. 527) (e.g., Ahn & Kalish, 2000; Ahn, Kalish, Medin, & Gelman, 1995; Johnson & Ahn, Chapter 8 in this volume). Children, in particular, are more likely to draw conclusions about a causal relationship in the presence of a plausible mechanism (Muentener, Friel, & Schulz, 2012; Muentener & Bonawitz, Chapter 33 in this volume; Schlottmann, 1999). Guided by the normative CBN framework, research into causal learning has mostly fo­ cused on providing people with covariation information. However, we know that people use many more sources of information to figure out causal relationships (Lagnado, Wald­ mann, Hagmayer, & Sloman, 2007). Temporal information is a particularly important cue since causes precede their effects. More recently, research has begun to investigate how people combine the many different sources of evidence to make causal inferences (Bram­ ley, Gerstenberg, & Lagnado, 2014; Lagnado & Sloman, 2004, 2006; Rottman & Keil, 2012). Besides investigating how people learn about causal relationships, psychologists have al­ so studied how we use our general causal knowledge to make causal judgments about particular events, such as in the billiard ball cases shown in Figure 27.3 (cf. Danks, Chap­ ter 12 in this volume). Research has shown that people’s causal judgments are particular­ ly sensitive to information about causal processes. Several studies have looked into situa­ tions in which the predictions of process and dependency accounts are pitted against each other (Chang, 2009; Lombrozo, 2010; Mandel, 2003; Shultz, 1982; Walsh & Sloman, 2011). Based on a comprehensive series of experiments with both adults and children from different cultures, Shultz (1982) concluded that people’s causal judgments are more in line with the predictions of process rather than regularity theories—a particular type of dependency theories (see also, Bender, Beller, & Medin, Chapter 35 in this volume). In­ spired by the early work of Hume (1748/1975), regularity theories predict that people learn about causal relationships by information about covariation and spatiotemporal con­ tiguity. Shultz (1982) found that participants’ judgments were more strongly affected by the presence of a plausible mechanism, as opposed to other dependency information such as the timing of events. Similarly, the results of a series of vignette studies by Walsh and Sloman (2011) demonstrated that manipulating process information had a stronger effect on people’s cause and prevention judgments than manipulating dependency information. In one sce­ nario, Frank and Sam are playing ball. Frank accidentally kicks the ball toward a neighbor’s house. Sam is initially blocking the ball’s path but gets distracted and steps out of the way. The ball hits the neighbor’s window and smashes it. The majority of partic­ Page 19 of 57

Intuitive Theories ipants (87%) answered the question of whether Frank caused the window to shatter posi­ tively, whereas only a small proportion of participants (24%) agreed that Sam caused the window to smash. While dependency theories have difficulty marking a difference be­ tween Frank and Sam (since the outcome counterfactually depended on both of their ac­ tions), process theories correctly predict that Frank will be seen as a cause, but not Sam. In another series of vignette studies, Lombrozo (2010) found that the actors’ intentions had a significant influence on causal judgments in situations of double prevention and pre-emption. Intentions create a strong dependence relationship between actor and out­ come (Heider, 1958; Hommel, Chapter 15 in this volume; Malle, 2008). If Brian intends to kill Jack and his first shot misses, he is most likely going to shoot another time to achieve his goal. If Brian accidentally shot at Jack and missed, then he certainly won’t shoot again. Lombrozo (2010) found that when both the transference cause (equivalent to the unobserved cause of ball E’s motion in our example in Figure 27.3 b) and the dependence cause (equivalent to ball B) were intentional, participants tended to agree that each of them caused the outcome. However, manipulating intentions had a stronger effect on the dependence cause. While participants’ rating of the transference cause was high whether it was intentional or accidental, the dependence cause was seen as less causal when it was accidental rather than intentional. Research into causal judgments has suffered from a lack of formally specified theories that yield quantitative predictions. Researchers have mostly relied on comparing qualita­ tively whether causal judgments change between experimental manipulations. Within the class of dependency theories, the CBN framework has been employed to yield formal defi­ nitions of actual causation (Danks, Chapter 12 in this volume; Halpern & Pearl, 2005; Hitchcock, 2009) and, more recently, these accounts have been extended to give graded causal judgments by considering default states of variables (Halpern, 2008; Halpern & Hitchcock, 2015), or assign degrees of responsibility when multiple causes are at play (Chockler & Halpern, 2004; Lagnado, Gerstenberg, & Zultan, 2013; Lagnado & Gersten­ berg, Chapter 29 in this volume). Within the framework of process theories, Wolff (2007; Wolff & Thorstad, Chapter 9 in this volume) has developed a force dynamics account inspired by work in linguistics (Talmy, 1988). The core idea is that causal events involve the interaction of two parties, an agent and a patient. People’s use of different causal terms such as “caused,” “prevent­ (p. 528)

ed,” or “helped” is explained in terms of the configuration of forces that characterize the interaction between agent and patient. For example, the force dynamics model predicts that people will say that the agent “caused” the patient to reach an end state if the pa­ tient did not have a tendency toward the end state, the agent and patient forces com­ bined in such a way that the resulting force pointed toward the end state, and the patient actually reached the end state. People are predicted to say “helped” instead of “caused” when the patient already had a tendency toward the end state. In line with Forbus’s quali­ tative reasoning account discussed earlier, the force dynamic model yields qualitative predictions about what word people should use in a given situation. However, it does not

Page 20 of 57

Intuitive Theories make any quantitative predictions. It is silent, for example, about what makes a really good cause or at what point a “cause” becomes a “helper.” Overall, it is fair to say that existing empirical work on causal judgments doesn’t leave us with a very clear picture. Information about both dependence and processes affects people’s judgments, but the extent to which it does appears to vary between studies. In the case of double prevention, some studies find that participants treat double preventers as causes (Chang, 2009; Lombrozo, 2010; Sloman, Barbey, & Hotaling, 2009; Wolff, Bar­ bey, & Hausknecht, 2010) whereas others do not (Goldvarg & Johnson-Laird, 2001; Walsh & Sloman, 2011). The disparity of the empirical findings reflects the philosophical strug­ gles of finding a unified conception of cause (Paul & Hall, 2013; Strevens, 2013; White, 1990). Indeed, some have given up hope of finding a unified concept of causality and have consequently endorsed the idea that there are two (Hall, 2004) or several fundamentally different concepts of causality (Boddez, De Houwer, & Beckers, Chapter 4 in this volume; De Vreese, 2006; Godfrey-Smith, 2009; Lombrozo, 2010). Others, in contrast, hold on to the idea that the plurality of causal intuitions can be unified into a singular conception of causality (Schaffer, 2005; Williamson, 2006; Woodward, 2011). In the following, we will argue for the latter position: understanding causal judgments in terms of (different) coun­ terfactual contrasts defined over intuitive theories helps reconcile the different views.

Bridging Process and Dependency Accounts We believe that the notion of causes as difference-makers, as conceptualized in depen­ dency theories of causation, is primary and that we can capture the intuitions behind process theories of causation in terms of difference-making at the right level of analysis (cf. Schaffer, 2005; Woodward, 2011). Next, we propose an account that is inspired by Lewis’s (2000) response to criticisms of his earlier counterfactual theory of causation (Lewis, 1973, 1979). Consider again, the case of pre-emption as depicted in Figure 27.3 b. It is true that there is no counterfactual dependence between the presence of ball A and whether or not E ends up going through the gate. However, there is a counterfactual de­ pendence on a finer level of granularity—a level that doesn’t merely consider absence or presence of the balls, but is concerned with the exact way in which the outcome event came about, including temporal and spatial information. Lewis (2000) coined this finer no­ tion of counterfactual dependence causal influence. Ball A exerts a causal influence on ball E: if ball A had struck ball E slightly differently—at a different angle, with a different speed, or at a different point in time—the relevant outcome event of E going through the gate would have been slightly different, too. E would have gone through the gate at a dif­ ferent location, at a different speed, or at a different point in time. Ball B, in contrast, did not exert any causal influence on ball E on this level of granularity. Even if B’s position had been slightly different from what it actually was, E would still have gone through the gate exactly in the same way in which it did in the actual situation. While we take Lewis’s (2000) idea as a point of departure, our proposed account differs from his in two important ways: first, Lewis tried to provide an account that reduces cau­ sation to counterfactual dependence and a similarity ordering over possible worlds. In line with more recent work in philosophy of causation (e.g., Woodward, 2003), we believe Page 21 of 57

Intuitive Theories that causation cannot be reduced, but that the concept of actual causation is best under­ stood in terms of counterfactuals defined over an intuitive (causal) theory of the world (Halpern & Pearl, 2005; Pearl, 2000). Second, Lewis believed that conceptualizing causa­ tion as influence replaced the earlier idea of thinking about counterfactual dependence on the (p. 529) coarser level of absences and presences. However, we will show that both conceptions of counterfactual dependence are key to understanding people’s causal judg­ ments.

A Counterfactual Simulation Model of Causal Judgments We have developed a counterfactual simulation model (CSM) of causal judgments that aims to combine the key insights from process and dependency accounts of causation (Gerstenberg et al., 2012; Gerstenberg, Goodman, et al., 2014, 2015). The CSM starts off with the basic assumption that in order for a candidate cause (which could be an object or an agent) to have caused an outcome event, it must have made a difference to the out­ come. Consider a simple causal chain as shown in Figure 27.4 a. Ball E and ball A are ini­ tially at rest. Ball B then enters the scene from the right, hits ball A, which subsequently hits ball E, and E goes through the gate. To what extent did balls A and B cause ball E to go through the gate? Intuitively, both B and A made a difference to the outcome in this situation. The CSM cap­ tures this intuition in the following way. For each ball, we consider a counterfactual world in which we had removed the ball from the scene. We then evaluate, using our intuitive physical model of the domain, whether the outcome event would have been any different from what it actually was. More formally, we can express this criterion in the following way:

(1)

To determine our subjective degree of belief that a candidate cause (C) was a differencemaker (DM) for the outcome event (∆e), we first condition on what actually happened in the situation S (i.e., where the balls entered the scene, how they collided, that ball B went through the gate, the position of the walls, etc.). We then consider the counterfactual world in which we had removed the candidate cause C from the scene. Then, we evaluate whether the outcome in the counterfactual world (∆e′) would have been different from the outcome in the actual world (∆e). The ∆ sign means that we represent the outcome event of interest on a fine level of granularity—that is, we care about the exact way in which E went through the gate (or missed the gate), which includes spatiotemporal information. It is easy to show that both balls A and B were difference-makers according to this criteri­ on. If ball B had not been present in the scene, then E would not have gone through the gate at all. If we had removed ball A from the scene, E would have gone through the gate differently from how it actually did.

Page 22 of 57

Intuitive Theories This criterion of difference-making distinguishes candidate objects that were causes of the outcome from objects that were not. For example, a ball that (p. 530) is just lying in the corner of the room and never interacts with any of the other balls would be ruled out. Removing that ball from the scene would make no difference at all to when and where E went through the gate. If a candidate cause passed this strict criterion of difference-mak­ ing, then the CSM considers four different aspects of causation that jointly determine the degree to which the candidate is perceived to have caused the outcome of interest (see Figure 27.5). Let us illustrate these different aspects of causation by focusing on the ex­ ample of the causal chain.

Figure 27.4 Illustration of the different types of counterfactual contrasts that serve as tests to cap­ ture different aspects of causation.

Figure 27.5 Different aspects of causation that the counterfactual simulation model captures in terms of counterfactual contrasts. Note: The different aspects are defined in Equations 1–5.

Page 23 of 57

Intuitive Theories

Figure 27.6 Schematic diagrams of collision events. Solid lines show the ball’s actual trajectories; the dashed line shows the trajectory ball B would have moved on if it had not collided with ball A.

Whether-Cause To determine our subjective degree of belief that B was a whether-cause of E’s going through the gate (Figure 27.4 b), we consider a counterfactual situation in which B was removed from the scene, and evaluate whether the outcome would have been different from what it actually was:

(2)

Notice that when considering whether-causation, the outcome event (e) is defined at a coarser level of granularity. We are merely interested in whether or not E would have gone through the gate if the candidate cause would have been removed from the scene (remove(C))—we don’t care about the more detailed spatiotemporal information. For the causal chain, the answer is pretty simple. E would definitely not have gone through the gate if B had been removed from the scene. Thus, we are certain that B was a whethercause of E’s going through the gate. Ball A, in contrast, was not a whether-cause. E would have gone through the gate even if ball A had not been present. For the causal chain, determining whether each candidate was a whether-cause of E’s go­ ing through the gate was easy. However, this need not be the case. Consider the three clips shown in Figure 27.6. In Figure 27.6 a, it is pretty clear that ball B would have missed the gate if ball A had been removed from the scene. Thus, we are relatively cer­ tain that A was a whether-cause of B’s going through the gate in this case. In Figure 27.6 b, the situation is less clear. We don’t know for sure what the outcome would have been if ball A had been removed from the scene. Finally, in Figure 27.6 c, it is pretty obvious that B would have gone through the gate even if ball A had not been present in the scene. Ball A was not a whether-cause of B’s going through the gate in this case. How can we model people’s uncertainty about the outcome in the relevant counterfactual situation in which the candidate cause had been removed from the scene? In line with previous work discussed earlier, we assume that people’s intuitive understanding of this domain can be expressed in terms of a noisy model of Newtonian physics. With this as­ sumption, we can determine the counterfactual probability P (e′≠ e|S, remove(C)) in the following way: we generate a number of samples from the physics engine that was used to create the stimuli. Each sample exactly matches what actually happened up until the Page 24 of 57

Intuitive Theories point at which the two (p. 531) balls collide. At this point, we remove the candidate cause from the scene and let the counterfactual world unfold. For each sample, we introduce some noise into the underlying physics model by applying a small perturbation to B’s di­ rection of motion at each time step. By generating many noisy samples, we get a distribu­ tion over the outcome in the counterfactual world. In some of the noisy samples, B goes through the gate, in others it misses. We can then use the proportion of samples in which B ended up going through the gate to predict people’s subjective degree of belief that B would have gone through the gate if ball A hadn’t been present in the scene. We have shown that people’s causal judgments for clips like the ones shown in Figure 27.6 are well accounted for by our model of whether-causation (cf. Gerstenberg et al., 2012). In our experiment, participants viewed a number of clips that varied in terms of whether B went through the gate or missed it, and how clear it was what would have hap­ pened if ball A had not been present in the scene. One group of participants made coun­ terfactual judgments. They judged whether B would have gone through the gate if ball A had not been present in the scene. Another group of participants made causal judgments. They judged to what extent A caused B to go through the gate or prevented it from going through. We found that participants’ counterfactual judgments were very well accounted for by the noisy Newton model. Participants’ causal judgments, in turn, followed the pre­ dictions of our model of whether-causation very accurately. The more certain participants were that the outcome would have been different if ball A had been removed from the scene, the more they said that A caused B to go through the gate (or prevented it from going through in cases in which it missed). For the clips in Figure 27.6, participants’ causal judgments were high for (a), intermediate for (b), and low for (c). The tight coupling between causal and counterfactual judgments in Gerstenberg et al. (2012) provides strong evidence for the role of counterfactual thinking in causal judg­ ments. However, since each of the clips that participants saw was somewhat different, it could still be possible, in principle, to provide an account of people’s judgments solely in terms of what actually happened. In another experiment (Gerstenberg, Goodman, et al., 2014), we demonstrated that whether-causation is indeed a necessary aspect of people’s causal judgments. This time, we created pairs of clips that were identical in terms of what actually happened, but dif­ fered in what would have happened if ball A had been removed from the scene. Figure 27.7 shows two pairs of clips. In both clips (a) and (b), the collision and outcome events are identical. Both clips differ, however, in what would have happened if ball A had not been present in the scene. In (a), ball B would have been blocked by the brick. In (b), ball B would have gone through the gate even if ball A had not been present. Participants judged that A caused B to go through the gate for (a) but not for (b). Similarly, for the two clips shown in (c) and (d), participants judged that A prevented B from going through the gate in (c) but not in (d). B would have gone through the gate if A hadn’t been present in (c). In (d), B would have not gone through the gate even if A had not been present—it would have been blocked by the brick. 
The fact that participants’ judgments differ dra­ Page 25 of 57

Intuitive Theories matically for clips in which what actually happened was held constant demonstrates that whether-causation is a crucial aspect of people’s causal judgments.
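The counterfactual sampling procedure just described can be sketched compactly. This is a minimal illustration under assumptions of our own (a one-dimensional toy world stands in for the physics engine, and names such as rollout_without_A are hypothetical): each sample replays the clip up to the collision, removes the candidate cause, perturbs ball B's motion at every subsequent step, and the proportion of samples whose outcome differs from the actual one estimates P(e′ ≠ e | S, remove(C)).

```python
import random

def rollout_without_A(y, vy, steps=100, noise=0.002, gate=(0.4, 0.6)):
    """One counterfactual sample: from the moment of the (now removed)
    collision, ball B keeps its old heading, perturbed a little at each step."""
    for _ in range(steps):
        vy += random.gauss(0, noise)
        y += vy
    return gate[0] < y < gate[1]   # coarse outcome e': through the gate or not?

def p_whether_cause(actual_outcome, y_at_collision, vy_without_A, n=1000):
    """Estimate P(e' != e | S, remove(C)): the proportion of noisy
    counterfactual samples whose outcome differs from what actually happened."""
    differs = sum(rollout_without_A(y_at_collision, vy_without_A) != actual_outcome
                  for _ in range(n))
    return differs / n

# B actually went through the gate; without A, B would have kept heading upward.
print(p_whether_cause(actual_outcome=True, y_at_collision=0.5, vy_without_A=0.003))
```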

How-Cause

Some counterfactual theories of causation try to capture people's causal judgments simply in terms of what we have termed whether-causation. Indeed, much of the empirical work discussed earlier has equated counterfactual theories of causation with a model that merely considers whether-causation, and has contrasted this model with process models of causation that are more sensitive to the way in which the outcome actually came about. We believe that the strict dichotomy that is often drawn between counterfactual and process theories of causation is misguided. From the research reported in the preceding, it is evident that people care about how events actually came about. However, this does not speak against counterfactual theories of causation. It merely suggests that only considering whether-causation is not sufficient for fully expressing people's causal intuitions. Counterfactual theories are flexible—they can capture difference-making on different levels of analysis. So far, we have focused on whether-causation: the question of whether the presence of the cause made a difference to whether or not the effect of interest occurred. Here, we will show how the CSM captures the fact that people also care about how the outcome came about.

Consider again the example of the simple causal chain shown in Figure 27.4. Participants give a high causal rating to ball B in that case, and an intermediate rating to ball A (see (p. 532) Gerstenberg, Goodman, et al., 2015, for details). If people's causal judgments were solely determined by considering whether-causation, then the fact that A is seen as having caused the outcome to some degree is surprising. The presence of A made no difference as to whether or not E would have gone through the gate. While only B was a whether-cause, both A and B influenced how E ended up going through the gate. The CSM defines how-causation in the following way:


Figure 27.7 Schematic diagrams of collision events. The balls' actual trajectories are shown as solid arrows and the counterfactual trajectory of ball B is shown as a dashed arrow. In the top row, B goes through the gate. In the bottom row, B misses the gate. On the left side, A made a difference to the outcome (broadly construed). On the right side, A made no difference.

P(∆e′ ≠ ∆e | S, change(C))  (3)

We model our subjective degree of belief that a candidate cause (C) was a how-cause of the outcome by considering a counterfactual situation in which C was somewhat changed (change(C)), and checking whether the outcome (finely construed, that is, including information about when and where it happened) would have been any different in that situation (∆e′ ≠ ∆e).

Notice that in contrast to difference-making and whether-causation, where we considered what would have happened if the candidate cause had been removed from the scene, we need a different kind of counterfactual operation for how-causation. While the remove operation is pretty straightforward, the change operation is somewhat more flexible. For this particular domain, we can think of the change operation as a small perturbation to the candidate cause's spatial location before the causal event of interest happened (see Figure 27.4 c). A cause is a how-cause if we believe that this small perturbation would have made a difference in exactly how the outcome of interest happened.

By taking into account both whether-causation and how-causation, we can make sense of participants' causal judgments in the causal chain. Ball B receives a high judgment because it was both a whether-cause and a how-cause of E's going through the gate. Ball A receives a lower judgment because it was only a how-cause, but not a whether-cause, of E's going through the gate.


Considering how-causation also allows us to make sense of other empirical phenomena that are troubling for a simple counterfactual model that only relies on whether-causation. While whether-causation and how-causation often go together, (p. 533) they can be dissociated in both directions. In the case of the causal chain, we saw an example where a ball can be a how-cause but not a whether-cause. There are also situations in which a cause is a whether-cause but not a how-cause. In the double prevention clip shown in Figure 27.3 b, participants gave a relatively low causal rating to ball B even though they were sure that E would not have gone through the gate if ball B had not been present in the scene. In this situation, B was a whether-cause but not a how-cause. Ball E would have gone through the gate exactly in the way in which it did, even if ball B had been somewhat changed. If people care about both whether-causation and how-causation, then this explains why ball B gets a relatively low causal rating in this case.

Sufficient-Cause

Sufficiency is often discussed alongside necessity as one of the fundamental aspects of causation (e.g., Downing, Sternberg, & Ross, 1985; Hewstone & Jaspars, 1987; Jaspars, Hewstone, & Fincham, 1983; Mackie, 1974; Mandel, 2003; Pearl, 1999; Woodward, 2006). Our notion of whether-causation captures the necessity aspect. The CSM defines sufficiency in the following way:

(4) sufficient-cause(C, e) = P(e′ = e | S, remove(\C))

To evaluate whether a candidate cause (C) was sufficient for the outcome (e) to occur in the circumstances (S), we imagine whether the outcome (broadly construed) would still have happened (e′ = e) even if all other candidate causes had been removed from the scene (remove(\C); see Figure 27.4 d). Applying this definition to the causal chain, we see that ball B was sufficient for E's going through the gate. Ball E would have gone through the gate even if ball A had been removed from the scene. Ball A, in contrast, was not sufficient for the outcome. E would not have gone through the gate if ball B had been removed from the scene.

Taking sufficiency into account helps to account for the fact that participants give equally high causal judgments to each ball in situations of joint causation and overdetermination (see Figure 27.8). A model that only considers how-causation and whether-causation is forced to predict higher causal ratings in the case of joint causation than in the situation of overdetermination. In both situations, the two candidate causes are how-causes of the outcome. In the case of joint causation, both balls are whether-causes but not sufficient-causes of the outcome. In contrast, for overdetermination, both balls are sufficient-causes but not whether-causes of the outcome. If we assume that participants' causal judgments are equally strongly affected by whether-causation and sufficient-causation, we can make sense of the fact that their judgments are equally high in both cases.
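The same Monte Carlo strategy applies to the sufficiency contrast in (4), with remove(\C) in place of change(C). The sketch below reuses the hypothetical scene representation from above; here the simulator is assumed to return the coarse outcome (whether or not E went through the gate), and target names the effect ball so that it is not removed together with the alternative causes:

    def sufficient_cause(candidate, scene, simulate, target="E", n=1000):
        r"""Monte Carlo estimate of P(e' = e | S, remove(\C)): how probable
        it is that the coarse outcome (did E go through the gate?) would
        still have occurred with all other candidate causes removed."""
        actual = simulate(scene)  # observed coarse outcome (assumed Boolean)
        # remove(\C): keep only the candidate cause and the effect ball
        cf_scene = {name: ball for name, ball in scene.items()
                    if name in (candidate, target)}
        same = sum(simulate(cf_scene) == actual for _ in range(n))
        return same / n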


Robust-Cause

Some causal relationships are more robust than others. Causal relationships are robust to the extent that they would have continued to hold even if the conditions in this particular situation had been somewhat different (cf. Lewis, 1986; Woodward, 2006). The CSM defines robustness in the following way:

(5) robust-cause(C, e) = P(e′ = e | S, change(\C))

A candidate cause (C) is a robust cause of the outcome (broadly construed) in the situation (S) (p. 534) to the extent that we believe that the outcome would still have come about (e′ = e) even if all the other candidate causes had been somewhat different (change(\C)). Intuitively, the more factors the particular relationship between a candidate cause and the outcome depends on, the more sensitive (and hence less robust) the cause is.

Figure 27.8 Schematic diagrams of situations of joint causation (each ball is necessary, and both balls are jointly sufficient) and overdetermination (each ball is sufficient).

Taking robustness into account allows us to explain why ball A in the pre-emption scenario receives such a high rating (Figure 27.3 a). Not only was ball A a how-cause that was sufficient to bring about the outcome, it was also a very robust cause. If we changed the other (pre-empted) candidate cause (ball B), then ball E would still have gone through the gate exactly in the same way that it did. In contrast, in the causal chain scenario (Figure 27.4 a), the initial cause (ball B) was less robust for E's going through the gate. In a counterfactual situation in which ball A had been slightly different, there is a good chance that ball E would not have gone through the gate anymore (Figure 27.4 e).

We have tested the predictions of the CSM in a challenging experiment that included 32 different clips (Gerstenberg, Goodman, et al., 2015). A version of the model that combined how-causation, whether-causation, and sufficiency in an additive manner explained participants' causal judgments best. Including robustness as an additional factor in the model did not improve the model fit significantly.
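As a rough illustration of this additive combination, the snippet below folds the component contrasts into a single predicted judgment. The component values and the equal weights are placeholders chosen for exposition; in the reported model comparison, the weights were fit to participants' ratings:

    def csm_judgment(whether, how, sufficient, weights=(1/3, 1/3, 1/3)):
        """Additive combination of the counterfactual contrasts; each
        argument is a probability estimated by simulation, as above."""
        w_whether, w_how, w_sufficient = weights
        return w_whether * whether + w_how * how + w_sufficient * sufficient

    # Causal chain (Figure 27.4): B is a whether-, how-, and sufficient-cause;
    # A is only a how-cause, so B receives the higher predicted rating.
    print(csm_judgment(whether=0.9, how=0.9, sufficient=0.9))  # ball B: high
    print(csm_judgment(whether=0.1, how=0.9, sufficient=0.1))  # ball A: lower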



Discussion

How do people make causal judgments about physical events? What is the relationship between people's general intuitive understanding of physics and the specific causal judgments they make for a particular situation?

We have discussed attempts that aim to express people's intuitive understanding of physics qualitatively. More recently, successful attempts have been made to capture people's intuitive understanding of physics as approximately Newtonian. In particular, the assumption that people's intuitive theory of physics can be represented as a probabilistic, generative model has proven very powerful. It explains how people make predictions about the future by sampling from their generative model of the situation, infer what must have happened by conditioning on their observations, and make counterfactual judgments by simulating what the likely outcome would have been if some of the candidate causes had been removed or altered.

We have seen subsequently how people can use their intuitive understanding of physics to make causal judgments by contrasting what actually happened with the outcome in different counterfactual worlds. Much previous philosophical and psychological work has argued for multiple notions of causation and has explicitly contrasted process theories with dependency theories of causation. We have argued that both views can be reconciled. The counterfactual simulation model (CSM) adequately predicts people's causal judgments for simple collision events by assuming that people's judgments reflect their subjective degree of belief that the candidate cause made a difference to whether the outcome occurred. By contrasting situations that matched in what actually happened and only differed in what would have happened in the relevant counterfactual world, we established that people's causal judgments are intrinsically linked to counterfactual considerations. By looking at more complex scenes that featured several collisions, we showed that people care not only about whether the candidate cause made a difference to whether or not the outcome occurred, but also about how the outcome came about and whether the candidate cause was individually sufficient. Good causes are whether-causes, how-causes, and sufficient for bringing about the outcome in a robust way.

The CSM bridges process and dependency accounts in several ways. First, it assumes that people have an intuitive understanding of the physical domain that can be characterized as approximately Newtonian. This generative model specifies the causal laws that are required to simulate what would have happened in the relevant counterfactual world. Second, the CSM acknowledges that people's causal judgments are not simply determined by whether-dependence but are influenced by how-dependence, sufficiency, and robustness as well. Our model is thus closely in line with a proposal by Woodward (2011, p. 409), who argued that "geometrical/mechanical conceptions of causation cannot replace difference-making conceptions in characterizing the behavior of mechanisms, but that some of the intuitions behind the geometrical/mechanical approach can be captured by thinking in terms of spatio-temporally organized difference-making information." In contrast to previous work on causal judgment, the CSM yields quantitative predictions through defining graded concepts of counterfactual contrasts that jointly influence people's causal judgments.

The CSM can account for inter-individual (p. 535) differences by assuming that people may differ in their assessment of the counterfactual contrasts the model postulates, as well as in how much weight they assign to the different contrasts when judging causation.

The CSM also suggests a new angle for looking at the relationship between language and causation (cf. Solstad & Bott, Chapter 31 in this volume). Recall that Wolff's (2007) force dynamics model explains the use of different causal expressions in terms of differences in force configurations. The difference between "caused" and "helped" is that in the case of "caused" the patient's force was not directed toward the end state, whereas in the case of "helped" it was. The CSM suggests different ways in which "helped" (or "enabled") might differ from "caused." First, and similar to the idea in Wolff (2007), people might prefer "helped" to "caused" in situations in which they are unsure about whether the event actually made a difference to the outcome. Indeed, we have shown empirically that if participants believe that the outcome might have happened anyway, even if the causal event hadn't taken place, they prefer to say it "helped," rather than "caused," the outcome to occur (Gerstenberg et al., 2012). Second, an event might be seen as having "helped" rather than "caused" an outcome when it was deficient in one way or another. We have seen earlier that good causes are characterized by whether-dependence, how-dependence, sufficiency, and robustness. For causes for which only some of these factors hold, we might prefer to say "helped" rather than "caused." Consider, for example, the case of double prevention in which ball B is a whether-cause but not a how-cause (see Figure 27.3 b). In this case, people might be more willing to say that ball B "helped" ball E to go through the gate, rather than having caused it to go through. Similarly, ball A in the causal chain is only a how-cause but not a whether-cause. Again, it seems better to say that ball A "helped" ball E to go through the gate, rather than "caused" it to go through (cf. Wolff, 2003).

The CSM also suggests ways in which "helped" might differ from "enabled."2 Intuitively, "enabled" is more strongly tied to whether-dependence. If ball A moves an obstacle out of the way for ball B to go through the gate, it seems appropriate to say that A "enabled" B to go through the gate (A was a whether-cause but not a how-cause). In contrast, if B is already headed toward the gate and A bumps into B to slightly speed it up, it seems that A "helped" rather than "enabled" B to go through the gate (in this case, A was a how-cause but not a whether-cause).

Intuitive Psychology and Causal Explanations

In the previous sections, we have focused on people's intuitive understanding of physics and how it supports people's causal judgments about physical events. We will now turn our attention to people's intuitive understanding of other people. In an interview with Harvey, Ickes, and Kidd (1978), the psychologist Edward Jones was asked whether the future of attribution theory would see "a convincing integration of cognitive-experimental approaches, such as the Bayesian approach and attributional approaches."

Jones's answer was positive: he anticipated an "integration of attribution with information processing, a more mathematical or Bayesian approach" (1978, p. 385). However, this future had to wait. With Bayes's theorem seemingly discredited as a model of judgment and decision-making (e.g., Kahneman, Slovic, & Tversky, 1982; Slovic, Fischhoff, & Lichtenstein, 1977), early Bayesian approaches to attribution theory (Ajzen & Fishbein, 1975, 1983) were met with more criticism than approval (Fischhoff, 1976; Fischhoff & Lichtenstein, 1978; Jaspars et al., 1983). Yet, just as Bayesian approaches have had a remarkably successful revival as accounts of judgment and decision-making (see, e.g., Hagmayer & Fernbach, Chapter 26 in this volume; Hagmayer & Sloman, 2009; Krynski & Tenenbaum, 2007; Osman, Chapter 16 in this volume; Sloman & Hagmayer, 2006), they have been rediscovered as powerful accounts for explaining attribution (Hagmayer & Osman, 2012; Sloman, Fernbach, & Ewing, 2012). The anticipated rapprochement between Bayesian and attributional approaches is finally underway.

An Intuitive Theory of Mind

Heider and Simmel's (1944) experiment, in which participants were asked to describe animated clips of moving geometrical shapes, is one of the hallmarks of attribution research. Rather than describing the clips in terms of the shapes' physical movements, most participants explained what had happened by adopting an intentional stance (Dennett, 1987; Gergely & Csibra, 2003). As mentioned earlier, most participants perceived the shapes as intentional agents that acted according to their beliefs, desires, and goals. Indeed, many (p. 536) participants reported a rich, causally connected story and endowed the shapes with complex personalities. Developmental psychologists have provided strong empirical evidence that even infants perceive others as goal-directed agents who are guided by a principle of rational action, according to which goals are achieved via the most efficient means available (Gergely, Nádasdy, Csibra, & Bíró, 1995; Hommel, Chapter 15 in this volume; Scott & Baillargeon, 2013; Sodian, Schoeppner, & Metz, 2004; but see also Ojalehto, Waxman, & Medin, 2013).

How do adults (and infants) arrive at such a rich conception of other agents' behavior? Empirical evidence and theoretical developments suggest that people's inferences about others' behavior are guided both by bottom-up processes, such as visual cues to animacy and intentional action (Barrett, Todd, Miller, & Blythe, 2005; Premack, 1990; Tremoulet & Feldman, 2006; Tremoulet, Feldman, et al., 2000; Zacks, 2004), and by top-down processes that are dictated by intuitive theories (Uleman et al., 2008; Wellman & Gelman, 1992; Ybarra, 2002). While the claim that top-down processes are required to explain people's inferences has been defended convincingly (Tenenbaum et al., 2006, 2007, 2011), there is still a heated debate about how these top-down processes feature in people's understanding of other minds (e.g., Stich & Nichols, 1992). According to the theory theory (Gopnik & Wellman, 1992, 2012), we understand others by means of an intuitive theory of how mental states, such as desires, beliefs, and intentions, interact to bring about behavior (cf. Malle, 1999). For example, when we see a person walking toward a hot-dog stand, we might reason that she must be hungry and believes that the hot dog she intends to buy will satiate her desire for food.

In contrast, according to the simulation theory (Goldman, 2006; Gordon, 1986, 1992), we explain behavior by putting ourselves in the other person's shoes and simulating what mental states we would have had if we had acted in this way in the given situation. If I were to walk toward a hot-dog stand, I would probably be hungry and intend to get some food. While the last word in this debate certainly has not been spoken, recent empirical evidence favors the theory theory (Saxe, 2005).

Most of the empirical support for the Heiderian view of the adult (or child) as intuitive theorist comes from developmental psychology (e.g., Gopnik et al., 2004; Gweon & Schulz, 2011; Schulz, 2012; for reviews, see Flavell, 1999; Saxe, Carey, & Kanwisher, 2004). Much of the developmental research on theory of mind has focused on the false-belief task (Wimmer & Perner, 1983), in which participants are asked to anticipate how an actor will behave whose belief about the state of the world is incorrect (e.g., where Sally will look for a toy that has been moved from one location to another while she was away; see Wellman et al., 2001, for a meta-analysis). More recently, researchers have also begun to look at the inferences of adult participants in more challenging theory of mind tasks (Apperly, Warren, Andrews, Grant, & Todd, 2011; Birch & Bloom, 2004, 2007; Epley, Keysar, Van Boven, & Gilovich, 2004; Kovács, Téglás, & Endress, 2010).

Modeling an Intuitive Theory of Mind

A major advantage of the theory theory is that it lends itself to a precise computational implementation. In recent years, a number of accounts have been proposed that conceptualize people's inferences about an agent's goals or preferences in terms of an inverse decision-making approach (Baker et al., 2009; Goodman et al., 2006; Lucas, Griffiths, Xu, & Fawcett, 2009; Yoshida, Dolan, & Friston, 2008). Assuming that an intentional agent's actions are caused by her beliefs and desires and are guided by a principle of rationality, we can invert this process using Bayes's rule and infer an agent's likely mental states by observing her actions. Building a computational theory of mind is a challenging task because unobservable mental states interact in complex ways to bring about behavior. Any particular action is consistent with a large set of possible beliefs, desires, and intentions (see Kleiman-Weiner, Gerstenberg, Levine, & Tenenbaum, 2015).

These difficulties notwithstanding, Baker et al. (2009) have shown how the inverse planning approach can accurately capture people's inferences about an actor's goals. They use a simplified theory of mind according to which an agent's action is influenced by her beliefs about the environment and her goals. The goals an agent may have are constrained by the setup of the environment. They further make the simplifying assumption that the agent has complete knowledge of the environment (see Figure 27.9 a for a schematic of an intuitive theory of agents). The computational task is to infer an agent's goal from her actions in a known environment. In their experiments, participants observe the movements of an agent in a 2D scene that features three possible goal states (see Figure 27.9).


(p. 537) Participants are asked to indicate at different time points what they think the agent's goal is. We invite the reader to take a moment before continuing and to think about the agent's goals in Figures 27.9 b and 27.9 c at the different time points. In Figure 27.9 b, the agent's goal at t1 is completely ambiguous. We cannot be sure whether she is heading toward A, B, or C. However, at t2, we can be relatively confident that the agent is not heading toward C. This inference follows from the principle of rationality: if the agent's goal were C, then she would have taken a more direct path toward that goal. We are still unsure, however, about whether the agent is heading toward A or B. At t3 our uncertainty is resolved—as soon as the agent makes another step toward A, we are confident that this is the goal she is heading for.

Contrast this pattern of inferences with the situation in which the solid barrier is replaced with a barrier that has a gap (see Figure 27.9 c). In this situation, we rule out B as the agent's goal at t1 already. If B had been the agent's goal, then she would have walked through the gap in the barrier. This illustrates that our inferences about an actor's goals are not only a function of her actual actions (which are identical in both situations), but are also markedly influenced by the state of the environment, which determines what alternative actions the agent could have performed. In subsequent experiments, Baker et al. (2009) also showed how their account can handle cases in which the agent's trajectory contradicts a simple view of rational action (e.g., when the agent heads toward B at t2 in Figure 27.9 c) by assuming that the agent's goals might change over time or that the agent might have certain subgoals before reaching the final goal.

The important role that the state of the environment plays in people's attributions resonates well with a core distinction that Heider (1958) drew between what he called impersonal causation and personal causation (cf. Malle, 2011; Malle, Knobe, O'Laughlin, Pearce, & Nelson, 2000). The key difference between these two notions of causality is the concept of intentional action (see also Lombrozo, 2010; Woodward, 2006). Whereas an intentional actor adapts to the state of the environment in order to achieve his goal (an instance of personal causation), a person who reaches a certain state in the environment accidentally would not have reached the same state if the environment had been somewhat different (an instance of impersonal causation). While personal causation implies equifinality (the same goal is reached via potentially different routes), impersonal causation (involving physical events or accidental behavior) is characterized by multifinality: different environmental conditions lead to different effects.

Further experiments motivated by the inverse planning approach have shown that people are sensitive to configurations of the environment when inferring one agent's social goals of avoiding/approaching (Baker, Goodman, & Tenenbaum, 2008) or helping/hindering another agent (Ullman et al., 2009). While simple social heuristics, such as motion cues, go some way in predicting people's inferences (e.g., avoidance generally motivates an increase in physical distance; cf. Barrett et al., 2005; Zacks, 2004), such accounts lack the flexibility to capture the constraints that the environment imposes on behavior.
For example, it can sometimes be necessary to walk toward an agent one would like to avoid, in order to flee through a door in the middle of a corridor.

Furthermore, there are often multiple ways in which one agent can help (or hinder) (p. 538) another agent to achieve his goals (e.g., removing an obstacle or suggesting an alternative route). A rational actor will choose the most efficient action in a given situation to realize his (social) goal.
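The logic of inverse planning can be illustrated with a small sketch. A toy softmax-style likelihood stands in for Baker et al.'s full rational-planning model: each step an agent takes is more probable the more progress it makes toward a goal, and Bayes's rule then turns a likelihood over paths into a posterior over goals. The grid world, the goal locations, and the rationality parameter beta are all invented for illustration:

    import math

    def dist(a, b):
        # Manhattan distance on a grid
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    def likelihood(path, goal, beta=2.0):
        """Soft principle of rationality: a crude stand-in for full rational
        planning in which steps that reduce the distance to the goal are
        exponentially more probable."""
        progress = 0.0
        for pos, nxt in zip(path, path[1:]):
            progress += dist(pos, goal) - dist(nxt, goal)
        return math.exp(beta * progress)

    def infer_goal(path, goals):
        """Inverse planning: P(goal | path) is proportional to
        P(path | goal) * P(goal), with a uniform prior over goals."""
        scores = {name: likelihood(path, xy) for name, xy in goals.items()}
        z = sum(scores.values())
        return {name: s / z for name, s in scores.items()}

    # An agent moving up and to the right is almost certainly heading for A.
    goals = {"A": (5, 5), "B": (5, 0), "C": (0, 5)}
    path = [(0, 0), (1, 0), (1, 1), (2, 1)]
    print(infer_goal(path, goals))  # posterior mass concentrates on A

Barriers and gaps, as in Figure 27.9, would enter such a model through the set of actions available at each location, which is exactly why identical paths can license different goal inferences in different environments.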

Figure 27.9 (a) A simple causal model of the relationships between the environment, an agent's goal, and her actions. While the state of the environment and the agent's actions are observed, the value of the goal variable is unknown and needs to be inferred. (b)–(c) Two stimulus examples adapted from Baker et al. (2009). An agent starts at x and moves along the dotted path. Participants are asked about the agent's goal at different time points t1–t3.

In recent work, Jara-Ettinger, Gweon, Tenenbaum, and Schulz (2015) have extended the inverse planning approach into a framework they call the naïve utility calculus. In addition to inferring an agent's mental states from her actions, we also make inferences about the costs associated with the action, as well as about how rewarding the outcome must have been. In a series of experiments, Jara-Ettinger, Gweon, et al. (2015) have shown that children's inferences about the preferences of an agent are sensitive to considerations of agent-specific costs and rewards.

In one of their experiments, an agent chooses between a melon and a banana. On the first trial, the banana is more difficult to get to than the melon because it is placed on a higher pedestal. The agent chooses the melon. On the second trial, the difficulty of getting to each fruit is matched. This time, the agent chooses the banana. When five- to six-year-old children are subsequently asked which fruit the agent likes better, they correctly infer that the agent has a preference for the banana. Even though the agent chose each fruit exactly once, children took into account that getting the banana on the first trial would have been more difficult. The trial in which the costs for both options are equal is more diagnostic of the agent's preference. In another experiment, children made correct inferences about an agent's competence based on information about preferences. From observing an agent not taking the preferred treat when it is placed on the high pedestal, we can infer that the agent probably lacks the necessary skill to get it. Jara-Ettinger, Tenenbaum, and Schulz (2015) also showed that children's social evaluations are affected by information about how costly it would be for an agent to help. In situations in which two agents refused to help, children evaluated the less competent agent as nicer. Refusing to help when helping would have been easy reveals more about a person's lack of motivation than refusing when helping would have been difficult.
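The inferential logic of the naïve utility calculus can be sketched in a few lines: assuming that agents choose the option that maximizes utility = reward − cost, we can ask which reward assignments are consistent with the observed, cost-sensitive choices. The numerical costs and rewards below are invented for illustration:

    from itertools import permutations

    def consistent(rewards, observations):
        """A utility-maximizing agent picks the option with the highest
        reward minus cost; a candidate reward assignment is consistent
        if it explains every observed choice."""
        for costs, choice in observations:
            utilities = {o: rewards[o] - costs[o] for o in rewards}
            if max(utilities, key=utilities.get) != choice:
                return False
        return True

    # Trial 1: banana on a high pedestal (costly) -> the agent picks the melon.
    # Trial 2: equal costs -> the agent picks the banana.
    observations = [
        ({"melon": 1, "banana": 5}, "melon"),
        ({"melon": 1, "banana": 1}, "banana"),
    ]
    for r_melon, r_banana in permutations([2, 4], 2):
        rewards = {"melon": r_melon, "banana": r_banana}
        print(rewards, consistent(rewards, observations))
    # Only the assignment with banana > melon survives: the agent prefers
    # the banana, but the pedestal made it too costly on the first trial.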


While most of the previous work assumed that the agent has complete knowledge of the environment, some studies have looked into situations in which the agent can see only a part of her environment. For example, Baker, Saxe, and Tenenbaum (2011) have shown that participants have no difficulty in simultaneously inferring the beliefs and desires of an agent in a partially observable environment. Furthermore, Jara-Ettinger, Baker, and Tenenbaum (2012) demonstrated that by observing other people's actions, we not only can draw inferences about their mental states, but also can gain useful information about the state of the environment. If we notice how a man at the dinner table next to ours gets up before having finished his meal and walks upstairs, we can use this information to infer the likely location of the bathrooms in the restaurant. How confident we are in our inference will depend on whether or not we think the man has been to the restaurant before (and on whether there might be other reasons for going upstairs, such as making an important phone call). More generally, how much we can learn from others' behavior depends on our assumptions about the agent's knowledge state and his intentions (Shafto, Goodman, & Frank, 2012). While assuming that the observed agent has an intention to teach us about the state of the world speeds up learning (Csibra & Gergely, 2009; Goodman, Baker, & Tenenbaum, 2009), we have to remain cautious because intentions can be deceptive (Lagnado, Fenton, & Neil, 2013; Schächtele, Gerstenberg, & Lagnado, 2011).

Expressing Causal Explanations

In an insightful epilogue to Jaspars, Fincham, and Hewstone's (1983) volume on attribution research, Harold Kelley argued that the "common person's understanding of a particular event is based on the perceived location of that event within a temporally ordered network of interconnected causes and effects" (p. 333, emphasis in original). Kelley identified five key properties of perceived causal structures that he characterized in terms of the following dichotomies: (1) simple–complex: the complexity of the causal relationship between different events encompasses the full range of one-to-one to many-to-many mappings; (2) proximal–distal: causes differ in terms of their location on the perceived causal chain of events that connects causes and effects; (3) past–future: perceived causal structures are organized according to the temporal order of events and support reasoning about both the past and the future; (4) stable–unstable: the causal relationships between events differ in terms of their stability; and (5) actual–potential: perceived causal structures not only represent what actually happened, but also support the perceiver's imagination about what could have happened.

Thinking of people's intuitive understanding of the physical world and of other agents in terms of intuitive theories resonates well with Kelley's proposal (p. 539) of perceived causal structures. It highlights that there is no direct mapping between covariation and causal attribution, contrary to what early research in attribution theory suggested. Covariation is only one of the many cues that people use in order to construct a causally structured mental representation of what has happened (Einhorn & Hogarth, 1986; Lagnado, 2011; Lagnado et al., 2007). The perceived causal structure can subsequently be queried, for example by comparing what actually happened with what would have happened under certain counterfactual contingencies, to arrive at causal attributions (Kahneman & Tversky, 1982; Lipe, 1991).

Thinking about causal attributions in these terms shifts the focus of interest toward the factors that influence people's causal representations of the world, such as temporal information (Lagnado & Sloman, 2004, 2006) and domain knowledge (Abelson & Lalljee, 1988; Bowerman, 1978; Kelley, 1972; Mischel, 2004; Schank & Abelson, 1977; Tenenbaum et al., 2007).

We have shown earlier how the counterfactual simulation model (CSM) adequately captures people's causal judgments about collision events. The different aspects of causation in the CSM are defined at a sufficiently general level that they can be applied to any generative model of a domain—including people's intuitive theory of psychology (Mitchell, 2006; Wellman & Gelman, 1992). Consider the scenario depicted in Figure 27.10 (cf. Baker et al., 2011). An agent is about to grab some food for lunch from a food truck. The agent knows that there are three different food trucks: one with Mexican food, one with Lebanese food, and one with Korean food. However, there are only two parking spots for the food trucks, which are taken on a first-come, first-served basis.

Baker et al. (2011) have shown that people can infer the agent's preferences and beliefs merely based on the path that the agent walked. From the path in Figure 27.10, we can infer the agent's complete preference order for the three trucks: he likes the Korean truck best, and the Lebanese truck more than the Mexican truck. We can explain the agent's peeking around the corner by referring to his belief that the Korean truck might have been at parking spot 2. The principle of rational action implies that if the agent had known that the Korean truck wasn't parked at spot 2, he wouldn't have put in the effort to look around the corner. Instead, he would have gone directly to the Lebanese truck.



Figure 27.10 Food truck scenario in which an agent chooses which food truck to go to for lunch. The agent's view of which truck is parked at parking spot 2 is blocked by a wall. The dotted line indicates the actual path that the agent took. Note: Numbers 1 and 2 indicate the two possible parking spots. M = Mexican food truck; L = Lebanese food truck. Figure adapted from Baker et al. (2011).

Thus, in analogy to causal judgments in the physical domain, we can explain other people's behavior in terms of counterfactual contrasts over our intuitive theory of psychology. Within this framework, we have already shown that people's attributions of responsibility are closely linked to their causal understanding of the situation (Gerstenberg, Halpern, & Tenenbaum, 2015; Gerstenberg & Lagnado, 2010, 2012; Lagnado et al., 2013; Zultan, Gerstenberg, & Lagnado, 2012) and to their intuitive theory of how other people would (or should) have acted in a given situation (Allen, Jara-Ettinger, Gerstenberg, Kleiman-Weiner, & Tenenbaum, 2015; Gerstenberg, Ullman, Kleiman-Weiner, Lagnado, & Tenenbaum, 2014; Lagnado & Gerstenberg, Chapter 29 in this volume).

Conclusion and Future Directions

We started off this chapter with some of the big questions motivated by children's curiosity to figure out how the world works. Children rapidly develop an understanding of the world that is far beyond what can be captured by current approaches in artificial intelligence. Bridging the gap between human common-sense reasoning and machine intelligence requires acknowledging that people's knowledge of the world is structured in terms of intuitive theories (Forbus, 1984; Gopnik & Wellman, 1992; Saxe, 2005; Wellman & Gelman, 1992), and that many cognitive functions can be understood as inferences over these intuitive theories. We have argued that intuitive theories are best represented in terms of probabilistic, generative programs (Gerstenberg & Goodman, 2012; Goodman et al., 2015). We have provided (p. 540) empirical evidence for how understanding intuitive theories in terms of probabilistic, generative models allows us to make sense of a wide array of cognitive phenomena (Chater & Oaksford, 2013; Danks, 2014).

Because our intuitive theories are structured and generative, they support prediction, inference, action, counterfactual reasoning, and explanation for infinitely many possible situations. Focusing on people's intuitive theories of physics and psychology, we have shown how people's causal judgments can be understood in terms of counterfactual contrasts defined over their intuitive understanding of the domain. Conceptualizing causal judgments in this way provides a bridge between process and dependency accounts of causation. Our proposed counterfactual simulation model accurately captures people's causal judgments about colliding billiard balls for a host of different situations, including interactions between two and three billiard balls and with additional objects such as bricks. People's inferences about another agent's goals or intentions can be explained by assuming that we have an intuitive theory that others plan and make decisions in a rational manner.

The process of mental simulation plays a central role in this framework (Barsalou, 2009; Hegarty, 1992, 2004; Johnson-Laird & Khemlani, Chapter 10 in this volume; Kahneman & Tversky, 1982; Wells & Gavanski, 1989; Yates et al., 1988). It provides the glue between people's abstract intuitive theories and the concrete inferences that are supported in a given situation, through conditioning on what was observed (Battaglia et al., 2013) and imagining how things might have turned out differently (Gerstenberg et al., 2012; Gerstenberg, Goodman, et al., 2014, 2015). In the domain of intuitive physics, we have seen that people's predictions are consistent with a noisy Newtonian framework: our mental simulations are guided by the laws of physics, but we are often uncertain about some aspects of the situation. Future research needs to study the process of mental simulation more closely and investigate what determines the quality and resolution of people's mental simulations (Crespi et al., 2012; Hamrick & Griffiths, 2014; Hamrick et al., 2015; Marcus & Davis, 2013; Schwartz & Black, 1996; Smith, Dechter, et al., 2013).

One of the key challenges for the line of work discussed in this chapter is to understand how people come to develop their intuitive understanding of how the world works (Friedman, Taylor, & Forbus, 2009; Gershman, Chapter 17 in this volume). What are we initially endowed with, and how do our representations of the world change over time (Carey, 2009; Griffiths, Chapter 7 in this volume; Muentener & Bonawitz, Chapter 33 in this volume)? How can we best model the process of theory acquisition (Gopnik, 2010)? We have seen that the development of an intuitive theory of mind undergoes qualitatively different stages (Gopnik & Wellman, 1992), from an early theory that only considers goals and perceptual access (Gergely & Csibra, 2003) to a full-fledged theory of mind that integrates beliefs, desires, and intentions (Bratman, 1987; Malle, 1999).

We have suggested that people's intuitive domain theories are best understood in terms of probabilistic, generative programs (Goodman et al., 2015). This raises the question of how such representations are learned (Tenenbaum et al., 2011).
Computationally, this is known as the problem of program induction: learning a generative program based on data (e.g., Dechter, Malmaud, Adams, & Tenenbaum, 2013; Liang, Jordan, & Klein, 2010; Rule, Dechter, & Tenenbaum, 2015).

Program induction is difficult because it is severely underconstrained: an infinite number of programs are consistent with any given data set. Nevertheless, recent work has demonstrated that the problem can be tackled. Different empirical phenomena, such as number learning (Piantadosi, Tenenbaum, & Goodman, 2012), concept learning (Goodman, Tenenbaum, et al., 2008; Stuhlmüller, Tenenbaum, & Goodman, 2010), and acquiring a theory of causality (Goodman, Ullman, & Tenenbaum, 2011), have been cast as learning an intuitive theory by searching a hypothesis space of different possible programs that might have generated the data. This work has demonstrated how qualitative transitions in people's knowledge can be explained in terms of transitions between programs of different complexity (Piantadosi et al., 2012). While some first attempts have been made (Fragkiadaki, Agrawal, Levine, & Malik, 2015; Ullman, Stuhlmüller, Goodman, & Tenenbaum, 2014), further work is required to explain how people arrive at their rich intuitive theories of how the world works.
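As a toy illustration of program induction as Bayesian inference, the snippet below enumerates programs in a tiny invented language and scores each candidate by a simplicity prior and its fit to some observed input–output pairs; the shortest program consistent with the data wins. Everything here (the language, the data, the exponential prior) is a deliberately simplified stand-in for the hypothesis spaces used in the cited work:

    import itertools, math

    OPS = {"+1": lambda x: x + 1, "*2": lambda x: x * 2, "sq": lambda x: x * x}
    data = [(1, 4), (2, 6), (3, 8)]  # observations of an unknown function

    def run(program, x):
        for op in program:
            x = OPS[op](x)
        return x

    best = None
    for depth in range(1, 4):  # shorter programs are considered first
        for program in itertools.product(OPS, repeat=depth):
            prior = math.exp(-depth)  # simplicity prior: shorter = more probable
            fits = all(run(program, x) == y for x, y in data)
            # likelihood is 1 if the program reproduces the data, else 0
            if fits and (best is None or prior > best[1]):
                best = (program, prior)
    print(best)  # ('+1', '*2'): f(x) = 2(x + 1), the shortest consistent program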

References

Abelson, R. P., & Lalljee, M. (1988). Knowledge structures and causal explanation. New York: New York University Press.
Ahn, W.-K., & Kalish, C. W. (2000). The role of mechanism beliefs in causal reasoning. In F. Keil & R. Wilson (Eds.), Explanation and cognition (pp. 199–225). Cambridge, MA: Cambridge University Press.
Ahn, W.-K., Kalish, C. W., Medin, D. L., & Gelman, S. A. (1995). The role of covariation versus mechanism information in causal attribution. Cognition, 54(3), 299–352.
Ajzen, I., & Fishbein, M. (1975). A Bayesian analysis of attribution processes. Psychological Bulletin, 82(2), 261–277.
Ajzen, I., & Fishbein, M. (1983). Relevance and availability in the attribution process. In J. M. Jaspars, F. D. Fincham, & M. Hewstone (Eds.), Advances in experimental social psychology (pp. 63–89). New York: Academic Press.
Allen, K., Jara-Ettinger, J., Gerstenberg, T., Kleiman-Weiner, M., & Tenenbaum, J. B. (2015). Go fishing! Responsibility judgments when cooperation breaks down. In D. C. Noelle et al. (Eds.), Proceedings of the 37th Annual Conference of the Cognitive Science Society (pp. 84–89). Austin, TX: Cognitive Science Society.
Apperly, I. A., Warren, F., Andrews, B. J., Grant, J., & Todd, S. (2011). Developmental continuity in theory of mind: Speed and accuracy of belief–desire reasoning in children and adults. Child Development, 82(5), 1691–1703.
Baillargeon, R. (2004). Infants' physical world. Current Directions in Psychological Science, 13(3), 89–94.
Baillargeon, R., Spelke, E. S., & Wasserman, S. (1985). Object permanence in five-month-old infants. Cognition, 20(3), 191–208.

Baker, C. L., Goodman, N. D., & Tenenbaum, J. B. (2008). Theory-based social goal inference. In B. C. Love, K. McRae, & V. M. Sloutsky (Eds.), Proceedings of the 30th Annual Conference of the Cognitive Science Society (pp. 1447–1452). Austin, TX: Cognitive Science Society.
Baker, C. L., Saxe, R., & Tenenbaum, J. B. (2009). Action understanding as inverse planning. Cognition, 113(3), 329–349.
Baker, C. L., Saxe, R. R., & Tenenbaum, J. B. (2011). Bayesian theory of mind: Modeling joint belief-desire attribution. In L. Carlson, C. Hölscher, & T. Shipley (Eds.), Proceedings of the 33rd Annual Conference of the Cognitive Science Society (pp. 2469–2474). Austin, TX: Cognitive Science Society.
Baron-Cohen, S., Leslie, A. M., & Frith, U. (1985). Does the autistic child have a "theory of mind"? Cognition, 21(1), 37–46.
Barrett, H. C., Todd, P. M., Miller, G. F., & Blythe, P. W. (2005). Accurate judgments of intention from motion cues alone: A cross-cultural study. Evolution and Human Behavior, 26(4), 313–331.
Barsalou, L. W. (2009). Simulation, situated conceptualization, and prediction. Philosophical Transactions of the Royal Society B: Biological Sciences, 364(1521), 1281–1289.
Battaglia, P. W., Hamrick, J. B., & Tenenbaum, J. B. (2013). Simulation as an engine of physical scene understanding. Proceedings of the National Academy of Sciences, 110(45), 18327–18332.
Beebee, H., Hitchcock, C., & Menzies, P. (2009). The Oxford handbook of causation. New York: Oxford University Press.
Birch, S. A., & Bloom, P. (2004). Understanding children's and adults' limitations in mental state reasoning. Trends in Cognitive Sciences, 8(6), 255–260.
Birch, S. A., & Bloom, P. (2007). The curse of knowledge in reasoning about false beliefs. Psychological Science, 18(5), 382–386.
Borkenau, P. (1992). Implicit personality theory and the five-factor model. Journal of Personality, 60(2), 295–327.
Bowerman, W. R. (1978). Subjective competence: The structure, process and function of self-referent causal attributions. Journal for the Theory of Social Behaviour, 8(1), 45–75.
Bramley, N., Gerstenberg, T., & Lagnado, D. A. (2014). The order of things: Inferring causal structure from temporal patterns. In P. Bello, M. Guarini, M. McShane, & B. Scassellati (Eds.), Proceedings of the 36th Annual Conference of the Cognitive Science Society (pp. 236–241). Austin, TX: Cognitive Science Society.
Bratman, M. (1987). Intention, plans, and practical reason. Cambridge, MA: Harvard University Press.

Butterfill, S. A., & Apperly, I. A. (2013). How to construct a minimal theory of mind. Mind & Language, 28(5), 606–637. doi: 10.1111/mila.12036
Carey, S. (2009). The origin of concepts. Oxford: Oxford University Press.
Chang, W. (2009). Connecting counterfactual and physical causation. In Proceedings of the 31st Annual Conference of the Cognitive Science Society (pp. 1983–1987). Austin, TX: Cognitive Science Society.
Chater, N., & Oaksford, M. (2013). Programs as causal models: Speculations on mental programs and mental representation. Cognitive Science, 37(6), 1171–1191.
Cheng, P. W. (1997). From covariation to causation: A causal power theory. Psychological Review, 104(2), 367–405.
Cheng, P. W., & Novick, L. R. (1991). Causes versus enabling conditions. Cognition, 40, 83–120.
Cheng, P. W., & Novick, L. R. (1992). Covariation in natural causal induction. Psychological Review, 99(2), 365–382.
Chockler, H., & Halpern, J. Y. (2004). Responsibility and blame: A structural-model approach. Journal of Artificial Intelligence Research, 22(1), 93–115.
Christensen-Szalanski, J. J., & Willham, C. F. (1991). The hindsight bias: A meta-analysis. Organizational Behavior and Human Decision Processes, 48(1), 147–168.
Crespi, S., Robino, C., Silva, O., & de'Sperati, C. (2012). Spotting expertise in the eyes: Billiards knowledge as revealed by gaze shifts in a dynamic visual prediction task. Journal of Vision, 12(11), 1–19.
Csibra, G., & Gergely, G. (2009). Natural pedagogy. Trends in Cognitive Sciences, 13(4), 148–153.
Danks, D. (2014). Unifying the mind: Cognitive representations as graphical models. Cambridge, MA: MIT Press.
Dechter, E., Malmaud, J., Adams, R. P., & Tenenbaum, J. B. (2013). Bootstrap learning via modular concept discovery. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence (pp. 1302–1309).
Dennett, D. C. (1987). The intentional stance. Cambridge, MA: MIT Press.
De Vreese, L. (2006). Pluralism in the philosophy of causation: Desideratum or not? Philosophica, 77, 5–13.
(p. 542)
DiSessa, A. A. (1982). Unlearning Aristotelian physics: A study of knowledge-based learning. Cognitive Science, 6(1), 37–75.


DiSessa, A. A. (1993). Toward an epistemology of physics. Cognition and Instruction, 10(2–3), 105–225.
Dowe, P. (2000). Physical causation. Cambridge: Cambridge University Press.
Downing, C. J., Sternberg, R. J., & Ross, B. H. (1985). Multicausal inference: Evaluation of evidence in causally complex situations. Journal of Experimental Psychology: General, 114(2), 239–263.
Einhorn, H. J., & Hogarth, R. M. (1986). Judging probable cause. Psychological Bulletin, 99(1), 3–19.
Epley, N., Keysar, B., Van Boven, L., & Gilovich, T. (2004). Perspective taking as egocentric anchoring and adjustment. Journal of Personality and Social Psychology, 87(3), 327–339.
Fischhoff, B. (1976). Attribution theory and judgment under uncertainty. In J. H. Harvey, W. Ickes, & R. F. Kidd (Eds.), New directions in attribution research (Vol. 1, pp. 421–452). Hillsdale, NJ: Lawrence Erlbaum Associates.
Fischhoff, B., & Lichtenstein, S. (1978). Don't attribute this to Reverend Bayes. Psychological Bulletin, 85(2), 239–243.
Flavell, J. H. (1999). Cognitive development: Children's knowledge about the mind. Annual Review of Psychology, 50(1), 21–45.
Fodor, J. A. (1975). The language of thought. Cambridge, MA: Harvard University Press.
Forbus, K. D. (1984). Qualitative process theory. Artificial Intelligence, 24(1), 85–168.
Forbus, K. D. (1993). Qualitative process theory: Twelve years after. Artificial Intelligence, 59(1), 115–123.
Forbus, K. D. (2010). Qualitative modeling. Wiley Interdisciplinary Reviews: Cognitive Science, 2(4), 374–391.
Fragkiadaki, K., Agrawal, P., Levine, S., & Malik, J. (2015). Learning visual predictive models of physics for playing billiards. arXiv preprint arXiv:1511.07404.
Frazier, B. N., Gelman, S. A., & Wellman, H. M. (2009). Preschoolers' search for explanatory information within adult–child conversation. Child Development, 80(6), 1592–1611.
Friedman, S., Taylor, J., & Forbus, K. D. (2009). Learning naive physics models by analogical generalization. In B. Kokinov, K. Holyoak, & D. Gentner (Eds.), Proceedings of the Second International Analogy Conference (pp. 145–154). Sofia, Bulgaria: NBU Press.
Gelman, S. A., & Legare, C. H. (2011). Concepts and folk theories. Annual Review of Anthropology, 40(1), 379–398.


Gentner, D. (2002). Psychology of mental models. In N. J. Smelser & P. B. Bates (Eds.), International encyclopedia of the social and behavioral sciences (pp. 9683–9687). Amsterdam: Elsevier Science.
Gergely, G., & Csibra, G. (2003). Teleological reasoning in infancy: The naïve theory of rational action. Trends in Cognitive Sciences, 7(7), 287–292.
Gergely, G., Nádasdy, Z., Csibra, G., & Bíró, S. (1995). Taking the intentional stance at 12 months of age. Cognition, 56(2), 165–193.
Gerstenberg, T., & Goodman, N. D. (2012). Ping pong in church: Productive use of concepts in human probabilistic inference. In N. Miyake, D. Peebles, & R. P. Cooper (Eds.), Proceedings of the 34th Annual Conference of the Cognitive Science Society (pp. 1590–1595). Austin, TX: Cognitive Science Society.
Gerstenberg, T., Goodman, N. D., Lagnado, D. A., & Tenenbaum, J. B. (2012). Noisy Newtons: Unifying process and dependency accounts of causal attribution. In N. Miyake, D. Peebles, & R. P. Cooper (Eds.), Proceedings of the 34th Annual Conference of the Cognitive Science Society (pp. 378–383). Austin, TX: Cognitive Science Society.
Gerstenberg, T., Goodman, N. D., Lagnado, D. A., & Tenenbaum, J. B. (2014). From counterfactual simulation to causal judgment. In P. Bello, M. Guarini, M. McShane, & B. Scassellati (Eds.), Proceedings of the 36th Annual Conference of the Cognitive Science Society (pp. 523–528). Austin, TX: Cognitive Science Society.
Gerstenberg, T., Goodman, N. D., Lagnado, D. A., & Tenenbaum, J. B. (2015). How, whether, why: Causal judgments as counterfactual contrasts. In D. C. Noelle et al. (Eds.), Proceedings of the 37th Annual Conference of the Cognitive Science Society (pp. 782–787). Austin, TX: Cognitive Science Society.
Gerstenberg, T., Halpern, J. Y., & Tenenbaum, J. B. (2015). Responsibility judgments in voting scenarios. In D. C. Noelle et al. (Eds.), Proceedings of the 37th Annual Conference of the Cognitive Science Society (pp. 788–793). Austin, TX: Cognitive Science Society.
Gerstenberg, T., & Lagnado, D. A. (2010). Spreading the blame: The allocation of responsibility amongst multiple agents. Cognition, 115(1), 166–171.
Gerstenberg, T., & Lagnado, D. A. (2012). When contributions make a difference: Explaining order effects in responsibility attributions. Psychonomic Bulletin & Review, 19(4), 729–736.
Gerstenberg, T., Ullman, T. D., Kleiman-Weiner, M., Lagnado, D. A., & Tenenbaum, J. B. (2014). Wins above replacement: Responsibility attributions as counterfactual replacements. In P. Bello, M. Guarini, M. McShane, & B. Scassellati (Eds.), Proceedings of the 36th Annual Conference of the Cognitive Science Society (pp. 2263–2268). Austin, TX: Cognitive Science Society.


Glymour, C. N. (2001). The mind's arrow: Bayes nets and graphical causal models. Cambridge, MA: MIT Press.
Godfrey-Smith, P. (2009). Causal pluralism. In H. Beebee, C. Hitchcock, & P. Menzies (Eds.), Oxford handbook of causation (pp. 326–337). New York: Oxford University Press.
Goldman, A. I. (2006). Simulating minds. New York: Oxford University Press.
Goldvarg, E., & Johnson-Laird, P. N. (2001). Naive causality: A mental model theory of causal meaning and reasoning. Cognitive Science, 25(4), 565–610.
Goodman, N. D., Baker, C. L., Bonawitz, E. B., Mansinghka, V. K., Gopnik, A., Wellman, H., Schulz, L., & Tenenbaum, J. B. (2006). Intuitive theories of mind: A rational approach to false belief. In R. Sun & N. Miyake (Eds.), Proceedings of the 28th Annual Conference of the Cognitive Science Society (pp. 1382–1387). Austin, TX: Cognitive Science Society.
Goodman, N. D., Baker, C. L., & Tenenbaum, J. B. (2009). Cause and intent: Social reasoning in causal learning. In N. Taatgen & H. van Rijn (Eds.), Proceedings of the 31st Annual Conference of the Cognitive Science Society (pp. 2759–2764). Austin, TX: Cognitive Science Society.
Goodman, N. D., Mansinghka, V. K., Roy, D., Bonawitz, K., & Tenenbaum, J. B. (2008). Church: A language for generative models. In Uncertainty in Artificial Intelligence.
Goodman, N. D., Tenenbaum, J. B., Feldman, J., & Griffiths, T. L. (2008). A rational analysis of rule-based concept learning. Cognitive Science, 32(1), 108–154.
Goodman, N. D., Tenenbaum, J. B., & Gerstenberg, T. (2015). Concepts in a probabilistic language of thought. In E. Margolis & S. Lawrence (Eds.), The conceptual mind: New directions in the study of concepts (pp. 623–653). Cambridge, MA: MIT Press.
(p. 543)

Intuitive Theories Gordon, R. M. (1986). Folk psychology as simulation. Mind & Language, 1(2), 158–171. Gordon, R. M. (1992). The simulation theory: Objections and misconceptions. Mind & Language, 7(1–2), 11–34. Griffiths, T. L., & Tenenbaum, J. B. (2005). Structure and strength in causal induction. Cognitive Psychology, 51(4), 334–384. Griffiths, T. L., & Tenenbaum, J. B. (2009). Theory-based causal induction. Psychological Review, 116(4), 661–716. Gweon, H., & Schulz, L. E. (2011). 16-month-olds rationally infer causes of failed actions. Science, 332(6037), 1524. Hagmayer, Y., & Osman, M. (2012). From colliding billiard balls to colluding desperate housewives: Causal Bayes nets as rational models of everyday causal reasoning. Synthese, 189(1), 17–28. Hagmayer, Y., & Sloman, S. A. (2009). Decision makers conceive of their choices as inter­ ventions. Journal of Experimental Psychology: General, 138(1), 22–38. Hall, N. (2004). Two concepts of causation. In J. Collins, N. Hall, & L. A. Paul (Eds.), Cau­ sation and counterfactuals. Cambridge, MA: MIT Press. Halpern, J. Y. (2008). Defaults and normality in causal structures. In Proceedings of the 11th Conference on Principles of Knowledge Representation and Reasoning (pp. 198– 208). Halpern, J. Y., & Hitchcock, C. (2015). Graded Causation and Defaults. British Journal for the Philosophy of Science, 66, 413–457. Halpern, J. Y., & Pearl, J. (2005). Causes and explanations: A structural-model approach. Part I: Causes. The British Journal for the Philosophy of Science, 56(4), 843–887. Hamrick, J. B., & Griffiths, T. L. (2014). What to simulate? Inferring the direction of men­ tal rotation. In P. Bello, M. Guarini, M. McShane, & B. Scassellati (Eds.), Proceedings of the 36th Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society. Hamrick, J. B., Smith, K. A., Griffiths, T. L., & Vul, E. (2015). Think again? The amount of mental simulation tracks uncertainty in the outcome. In D. C. Noelle et al. (Eds.), Pro­ ceedings of the 37th Annual Conference of the Cognitive Science Society (pp. 866–871). Austin, TX: Cognitive Science Society. Hartshorne, J. K. (2013). What is implicit causality? Language, Cognition and Neuro­ science, 29(7), 804–824. Harvey, J. H., Ickes, W. J., & Kidd, R. F. (1978). New directions in attribution research (Vol. 2). Mahwah, NJ: Lawrence Erlbaum Associates. Page 46 of 57

Intuitive Theories Hayes, P. J. (1985). The second naive physics manifesto. In J. Hobbs & R. Moore (Eds.), Formal Theories of the Commonsense World. Norwood, NJ: Ablex. Hegarty, M. (1992). Mental animation: Inferring motion from static displays of mechani­ cal systems. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18(5), 1084–1102. Hegarty, M. (2004). Mechanical reasoning by mental simulation. Trends in Cognitive Sciences, 8(6), 280–285. Heider, F. (1958). The psychology of interpersonal relations. New York: John Wiley & Sons. Heider, F., & Simmel, M. (1944). An experimental study of apparent behavior. The Ameri­ can Journal of Psychology, 57(2), 243–259. Henderson, L., Goodman, N. D., Tenenbaum, J. B., & Woodward, J. F. (2010). The struc­ ture and dynamics of scientific theories: A hierarchical bayesian perspective. Philosophy of Science, 77(2), 172–200. Hewstone, M., & Jaspars, J. (1987). Covariation and causal attribution: A logical model of the intuitive analysis of variance. Journal of Personality and Social Psychology, 53(4), 663. Hilton, D. J. (1990). Conversational processes and causal explanation. Psychological Bul­ letin, 107(1), 65–81. Hilton, D. J., & Slugoski, B. R. (1986). Knowledge-based causal attribution: The abnormal conditions focus model. Psychological Review, 93(1), 75–88. Hitchcock, C. (2009). Structural equations and causation: Six counterexamples. Philo­ sophical Studies, 144(3), 391–401. Hume, D. (1748/1975). An enquiry concerning human understanding. Oxford: Oxford Uni­ versity Press. Jara-Ettinger, J., Baker, C. L., & Tenenbaum, J. B. (2012). Learning what is where from so­ cial observations. In N. Miyake, D. Peebles, & R. P. Cooper (Eds.), Proceedings of the 34th Annual Conference of the Cognitive Science Society (pp. 515–520). Austin, TX: Cognitive Science Society. Jara-Ettinger, J., Gweon, H., Tenenbaum, J. B., & Schulz, L. E. (2015). Children’s under­ standing of the costs and rewards underlying rational action. Cognition, 140, 14–23. Jara-Ettinger, J., Tenenbaum, J. B., & Schulz, L. E. (2015). Not so innocent: Toddlers’ infer­ ences about costs and culpability. Psychological Science, 26(5), 633–640. Jaspars, J., Hewstone, M., & Fincham, F. D. (1983). Attribution theory and research: The state of the art. In J. M. Jaspars, F. D. Fincham, & M. Hewstone (Eds.), Attribution theory Page 47 of 57

Intuitive Theories and research: Conceptual, developmental and social dimensions (pp. 343–369). New York: Academic Press. Jaspars, J. M., Fincham, F. D., & Hewstone, M. (1983). Attribution theory and research: Conceptual, developmental and social dimensions. New York: Academic Press. Jonze, S. (Director) (2013). Her. Annapurna Pictures. United States. Kahneman, D., Slovic, P., & Tversky, A. (1982). Judgment under uncertainty: Heuristics and biases. Cambridge: Cambridge University Press. Kahneman, D., & Tversky, A. (1982). The simulation heuristic. In D. Kahneman & A. Tver­ sky (Eds.), Judgment under uncertainty: Heuristics and biases (pp. 201–208). New York: Cambridge University Press. Kaiser, M. K., Proffitt, D. R., & McCloskey, M. (1985). The development of beliefs about falling objects. Attention, Perception, & Psychophysics, 38(6), 533–539. (p. 544)

Kaiser, M. K., Proffitt, D. R., Whelan, S. M., & Hecht, H. (1992). Influence of animation on dynamical judgments. Journal of Experimental Psychology: Human Perception and Performance, 18(3), 669–690.
Keil, F. C. (2003). Folkscience: Coarse interpretations of a complex reality. Trends in Cognitive Sciences, 7(8), 368–373.
Keil, F. C. (2012). Running on empty? How folk science gets by with less. Current Directions in Psychological Science, 21(5), 329–334.
Kelley, H. H. (1972). Causal schemata and the attribution process. New York: General Learning Press.
Kleer, J. D., & Brown, J. S. (1984). A qualitative physics based on confluences. In Qualitative reasoning about physical systems (pp. 7–83). Elsevier. doi: 10.1016/b978-0-444-87670-6.50005-4
Kleiman-Weiner, M., Gerstenberg, T., Levine, S., & Tenenbaum, J. B. (2015). Inference of intention and permissibility in moral decision making. In D. C. Noelle et al. (Eds.), Proceedings of the 37th Annual Conference of the Cognitive Science Society (pp. 1123–1128). Austin, TX: Cognitive Science Society.
Kovács, Á. M., Téglás, E., & Endress, A. D. (2010). The social sense: Susceptibility to others' beliefs in human infants and adults. Science, 330(6012), 1830–1834.
Kozhevnikov, M., & Hegarty, M. (2001). Impetus beliefs as default heuristics: Dissociation between explicit and implicit knowledge about motion. Psychonomic Bulletin & Review, 8(3), 439–453.
Krynski, T. R., & Tenenbaum, J. B. (2007). The role of causality in judgment under uncertainty. Journal of Experimental Psychology: General, 136(3), 430–450.

Kuhn, T. S. (1996). The structure of scientific revolutions. Chicago: University of Chicago Press.
Kuhnmünch, G., & Beller, S. (2005). Distinguishing between causes and enabling conditions—through mental models or linguistic cues? Cognitive Science, 29(6), 1077–1090.
Lagnado, D. A. (2011). Causal thinking. In P. M. Illari, F. Russo, & J. Williamson (Eds.), Causality in the sciences (pp. 129–149). Oxford: Oxford University Press.
Lagnado, D. A., Fenton, N., & Neil, M. (2013). Legal idioms: A framework for evidential reasoning. Argument & Computation, 4(1), 46–63.
Lagnado, D. A., Gerstenberg, T., & Zultan, R. (2013). Causal responsibility and counterfactuals. Cognitive Science, 37(6), 1036–1073.
Lagnado, D. A., & Sloman, S. (2004). The advantage of timely intervention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30(4), 856–876.
Lagnado, D. A., & Sloman, S. A. (2006). Time as a guide to cause. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32(3), 451–460.
Lagnado, D. A., Waldmann, M. R., Hagmayer, Y., & Sloman, S. A. (2007). Beyond covariation. In A. Gopnik & L. Schulz (Eds.), Causal learning: Psychology, philosophy, and computation (pp. 154–172). Oxford: Oxford University Press.
Levesque, H. J., Davis, E., & Morgenstern, L. (2011). The Winograd schema challenge. In AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning, 46, 47–52.
Levillain, F., & Bonatti, L. L. (2011). A dissociation between judged causality and imagined locations in simple dynamic scenes. Psychological Science, 22(5), 674–681.
Lewis, D. (1973). Causation. The Journal of Philosophy, 70(17), 556–567.
Lewis, D. (1979). Counterfactual dependence and time's arrow. Noûs, 13(4), 455–476.
Lewis, D. (1986). Postscript C to "Causation": (Insensitive causation). In Philosophical papers (Vol. 2). Oxford: Oxford University Press.
Lewis, D. (2000). Causation as influence. The Journal of Philosophy, 97(4), 182–197.
Liang, P., Jordan, M. I., & Klein, D. (2010). Learning programs: A hierarchical Bayesian approach. In Proceedings of the 27th International Conference on Machine Learning (ICML-10) (pp. 639–646).
Lipe, M. G. (1991). Counterfactual reasoning as a framework for attribution theories. Psychological Bulletin, 109(3), 456–471.
Lombrozo, T. (2010). Causal-explanatory pluralism: How intentions, functions, and mechanisms influence causal ascriptions. Cognitive Psychology, 61(4), 303–332.

Lombrozo, T. (2012). Explanation and abductive inference. Oxford: Oxford University Press.
Lucas, C., Griffiths, T. L., Xu, F., & Fawcett, C. (2009). A rational model of preference learning and choice prediction by children. Advances in Neural Information Processing Systems, 21, 985–992.
Mackie, J. L. (1974). The cement of the universe. Oxford: Clarendon Press.
Malle, B. F. (1999). How people explain behavior: A new theoretical framework. Personality and Social Psychology Review, 3(1), 23–48.
Malle, B. F. (2008). Fritz Heider's legacy: Celebrated insights, many of them misunderstood. Social Psychology, 39(3), 163–173.
Malle, B. F. (2011). Time to give up the dogmas of attribution: An alternative theory of behavior explanation. Advances in Experimental Social Psychology, 44, 297–352.
Malle, B. F., Knobe, J., O'Laughlin, M. J., Pearce, G. E., & Nelson, S. E. (2000). Conceptual structure and social functions of behavior explanations: Beyond person–situation attributions. Journal of Personality and Social Psychology, 79(3), 309–326.
Mandel, D. R. (2003). Judgment dissociation theory: An analysis of differences in causal, counterfactual and covariational reasoning. Journal of Experimental Psychology: General, 132(3), 419–434.
Marcus, G. F., & Davis, E. (2013). How robust are probabilistic models of higher-level cognition? Psychological Science, 24(12), 2351–2360.
McCloskey, M. (1983). Naive theories of motion. In D. Gentner & A. L. Stevens (Eds.), Mental models (pp. 299–324). Hillsdale, NJ: Erlbaum.
McCloskey, M., Caramazza, A., & Green, B. (1980). Curvilinear motion in the absence of external forces: Naïve beliefs about the motion of objects. Science, 210(4474), 1138–1141.
McCloskey, M., Washburn, A., & Felch, L. (1983). Intuitive physics: The straight-down belief and its origin. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9(4), 636–649.
Meder, B., Gerstenberg, T., Hagmayer, Y., & Waldmann, M. R. (2010). Observing and intervening: Rational and heuristic models of causal decision making. Open Psychology Journal, 3, 119–135.
Michotte, A. (1946/1963). The perception of causality. New York: Basic Books.
Mills, C. M., & Keil, F. C. (2004). Knowing the limits of one's understanding: The development of an awareness of an illusion of explanatory depth. Journal of Experimental Child Psychology, 87(1), 1–32. (p. 545)
Mischel, W. (2004). Toward an integrative science of the person. Annual Review of Psychology, 55, 1–22.
Mitchell, J. P. (2006). Mentalizing and Marr: An information processing approach to the study of social cognition. Brain Research, 1079(1), 66–75.
Mochon, D., & Sloman, S. A. (2004). Causal models frame interpretation of mathematical equations. Psychonomic Bulletin & Review, 11(6), 1099–1104.
Muentener, P., Friel, D., & Schulz, L. (2012). Giving the giggles: Prediction, intervention, and young children's representation of psychological events. PLoS ONE, 7(8), e42495.
Newell, A., Shaw, J. C., & Simon, H. A. (1958). Elements of a theory of human problem solving. Psychological Review, 65(3), 151–166.
Nickerson, R. S. (1998). Confirmation bias: A ubiquitous phenomenon in many guises. Review of General Psychology, 2(2), 175–220.
Ojalehto, B., Waxman, S. R., & Medin, D. L. (2013). Teleological reasoning about nature: Intentional design or relational perspectives? Trends in Cognitive Sciences, 17(4), 166–171.
Paul, L. A., & Hall, N. (2013). Causation: A user's guide. Oxford: Oxford University Press.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. San Francisco, CA: Morgan Kaufmann.
Pearl, J. (1999). Probabilities of causation: Three counterfactual interpretations and their identification. Synthese, 121(1–2), 93–149.
Pearl, J. (2000). Causality: Models, reasoning and inference. Cambridge: Cambridge University Press.
Perner, J., Leekam, S. R., & Wimmer, H. (1987). Three-year-olds' difficulty with false belief: The case for a conceptual deficit. British Journal of Developmental Psychology, 5(2), 125–137.
Piantadosi, S. T., Tenenbaum, J. B., & Goodman, N. D. (2012). Bootstrapping in a language of thought: A formal model of numerical concept learning. Cognition, 123(2), 199–217.
Premack, D. (1990). The infant's theory of self-propelled objects. Cognition, 36(1), 1–16.
Rottman, B. M., & Hastie, R. (2013). Reasoning about causal relationships: Inferences on causal networks. Psychological Bulletin.
Rottman, B. M., & Keil, F. C. (2012). Causal structure learning over time: Observations and interventions. Cognitive Psychology, 64(1), 93–125.
Rule, J., Dechter, E., & Tenenbaum, J. B. (2015). Representing and learning a large system of number concepts with latent predicate networks. In D. C. Noelle et al. (Eds.), Proceedings of the 37th Annual Conference of the Cognitive Science Society (pp. 2051–2056). Austin, TX: Cognitive Science Society.
Rumelhart, D. E., & McClelland, J. L. (1988). Parallel distributed processing. Cambridge, MA: MIT Press.
Sagi, E., & Rips, L. J. (2014). Identity, causality, and pronoun ambiguity. Topics in Cognitive Science, 6(4), 663–680.
Salmon, W. C. (1984). Scientific explanation and the causal structure of the world. Princeton, NJ: Princeton University Press.
Sanborn, A. N., Mansinghka, V. K., & Griffiths, T. L. (2013). Reconciling intuitive physics and Newtonian mechanics for colliding objects. Psychological Review, 120(2), 411–437.
Saxe, R. (2005). Against simulation: The argument from error. Trends in Cognitive Sciences, 9(4), 174–179.
Saxe, R., Carey, S., & Kanwisher, N. (2004). Understanding other minds: Linking developmental psychology and functional neuroimaging. Annual Review of Psychology, 55, 87–124.
Saxe, R., Tenenbaum, J., & Carey, S. (2005). Secret agents: Inferences about hidden causes by 10- and 12-month-old infants. Psychological Science, 16(12), 995–1001.
Schächtele, S., Gerstenberg, T., & Lagnado, D. A. (2011). Beyond outcomes: The influence of intentions and deception. In L. Carlson, C. Hölscher, & T. Shipley (Eds.), Proceedings of the 33rd Annual Conference of the Cognitive Science Society (pp. 1860–1865). Austin, TX: Cognitive Science Society.
Schaffer, J. (2005). Contrastive causation. The Philosophical Review, 114(3), 327–358.
Schank, R. C., & Abelson, R. P. (1977). Scripts, plans, goals and understanding: An inquiry into human knowledge structures. Hillsdale, NJ: Lawrence Erlbaum Associates.
Schlottmann, A. (1999). Seeing it happen and knowing how it works: How children understand the relation between perceptual causality and underlying mechanism. Developmental Psychology, 35, 303–317.
Schulz, L. (2012). The origins of inquiry: Inductive inference and exploration in early childhood. Trends in Cognitive Sciences, 16(7), 382–389.
Schulz, L. E., Goodman, N. D., Tenenbaum, J. B., & Jenkins, A. C. (2008). Going beyond the evidence: Abstract laws and preschoolers' responses to anomalous data. Cognition, 109(2), 211–223.

Schwartz, D. L., & Black, J. B. (1996). Analog imagery in mental model reasoning: Depictive models. Cognitive Psychology, 30(2), 154–219.
Scott, R. M., & Baillargeon, R. (2013). Do infants really expect agents to act efficiently? A critical test of the rationality principle. Psychological Science, 24(4), 466–474.
Shafto, P., Goodman, N. D., & Frank, M. C. (2012). Learning from others: The consequences of psychological reasoning for human learning. Perspectives on Psychological Science, 7(4), 341–351.
Shanks, D. R., & Dickinson, A. (1987). Associative accounts of causality judgment. In The psychology of learning and motivation (Vol. 21, pp. 229–261). San Diego, CA: Academic Press.
Shanon, B. (1976). Aristotelianism, Newtonianism and the physics of the layman. Perception, 5(2), 241–243.
Shultz, T. R. (1982). Rules of causal attribution. Monographs of the Society for Research in Child Development, 47(1), 1–51.
Simons, D. J. (2000). Attentional capture and inattentional blindness. Trends in Cognitive Sciences, 4(4), 147–155.
Sloman, S. A. (2005). Causal models: How people think about the world and its alternatives. New York: Oxford University Press.
Sloman, S. A., Barbey, A. K., & Hotaling, J. M. (2009). A causal model theory of the meaning of cause, enable, and prevent. Cognitive Science, 33(1), 21–50.
Sloman, S. A., Fernbach, P. M., & Ewing, S. (2012). A causal model of intentionality judgment. Mind and Language, 27(2), 154–180.
Sloman, S. A., & Hagmayer, Y. (2006). The causal psycho-logic of choice. Trends in Cognitive Sciences, 10(9), 407–412.
Sloman, S. A., & Lagnado, D. A. (2005). Do we "do"? Cognitive Science, 29(1), 5–39.
Slovic, P., Fischhoff, B., & Lichtenstein, S. (1977). Behavioral decision theory. Annual Review of Psychology, 28(1), 1–39. (p. 546)
Smith, K. A., Battaglia, P., & Vul, E. (2013). Consistent physics underlying ballistic motion prediction. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th Annual Conference of the Cognitive Science Society (pp. 3426–3431). Austin, TX: Cognitive Science Society.
Smith, K. A., Dechter, E., Tenenbaum, J., & Vul, E. (2013). Physical predictions over time. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th Annual Conference of the Cognitive Science Society (pp. 1342–1347). Austin, TX: Cognitive Science Society.
Smith, K. A., & Vul, E. (2013). Sources of uncertainty in intuitive physics. Topics in Cognitive Science, 5(1), 185–199.
Smith, K. A., & Vul, E. (2014). Looking forwards and backwards: Similarities and differences in prediction and retrodiction. In P. Bello, M. Guarini, M. McShane, & B. Scassellati (Eds.), Proceedings of the 36th Annual Conference of the Cognitive Science Society (pp. 1467–1472). Austin, TX: Cognitive Science Society.
Sodian, B., Schoeppner, B., & Metz, U. (2004). Do infants apply the principle of rational action to human agents? Infant Behavior and Development, 27(1), 31–41.
Spelke, E. S. (1990). Principles of object perception. Cognitive Science, 14(1), 29–56.
Spelke, E. S., Breinlinger, K., Macomber, J., & Jacobson, K. (1992). Origins of knowledge. Psychological Review, 99(4), 605–632.
Steyvers, M., Tenenbaum, J. B., Wagenmakers, E.-J., & Blum, B. (2003). Inferring causal networks from observations and interventions. Cognitive Science, 27(3), 453–489.
Stich, S., & Nichols, S. (1992). Folk psychology: Simulation or tacit theory? Mind & Language, 7(1–2), 35–71.
Strevens, M. (2013). Causality reunified. Erkenntnis, 78(S2), 299–320.
Stuhlmüller, A., Tenenbaum, J. B., & Goodman, N. D. (2010). Learning structured generative concepts. In S. Ohlsson & R. Catrambone (Eds.), Proceedings of the 32nd Annual Conference of the Cognitive Science Society (pp. 2296–2301). Austin, TX: Cognitive Science Society.
Suppes, P. (1970). A probabilistic theory of causality. Amsterdam: North-Holland.
Talmy, L. (1988). Force dynamics in language and cognition. Cognitive Science, 12(1), 49–100.
Téglás, E., Vul, E., Girotto, V., Gonzalez, M., Tenenbaum, J. B., & Bonatti, L. L. (2011). Pure reasoning in 12-month-old infants as probabilistic inference. Science, 332(6033), 1054–1059.
Tenenbaum, J. B., Griffiths, T. L., & Kemp, C. (2006). Theory-based Bayesian models of inductive learning and reasoning. Trends in Cognitive Sciences, 10(7), 309–318.
Tenenbaum, J. B., Griffiths, T. L., & Niyogi, S. (2007). Intuitive theories as grammars for causal inference. In A. Gopnik & L. E. Schulz (Eds.), Causal learning: Psychology, philosophy, and computation (pp. 301–322). Oxford: Oxford University Press.

Tenenbaum, J. B., Kemp, C., Griffiths, T. L., & Goodman, N. D. (2011). How to grow a mind: Statistics, structure, and abstraction. Science, 331(6022), 1279–1285.
Todorov, E. (2004). Optimality principles in sensorimotor control. Nature Neuroscience, 7(9), 907–915.
Tremoulet, P. D., & Feldman, J. (2006). The influence of spatial context and the role of intentionality in the interpretation of animacy from motion. Attention, Perception, & Psychophysics, 68(6), 1047–1058.
Tremoulet, P. D., & Feldman, J. (2000). Perception of animacy from the motion of a single object. Perception, 29(8), 943–952.
Uleman, J. S., Adil Saribay, S., & Gonzalez, C. M. (2008). Spontaneous inferences, implicit impressions, and implicit theories. Annual Review of Psychology, 59, 329–360.
Ullman, T. D., Stuhlmüller, A., Goodman, N. D., & Tenenbaum, J. B. (2014). Learning physics from dynamical scenes. In P. Bello, M. Guarini, M. McShane, & B. Scassellati (Eds.), Proceedings of the 36th Annual Conference of the Cognitive Science Society (pp. 1640–1645). Austin, TX: Cognitive Science Society.
Ullman, T. D., Tenenbaum, J. B., Baker, C. L., Macindoe, O., Evans, O. R., & Goodman, N. D. (2009). Help or hinder: Bayesian models of social goal inference. Advances in Neural Information Processing Systems, 22, 1874–1882.
Waldmann, M. R., & Hagmayer, Y. (2005). Seeing versus doing: Two modes of accessing causal knowledge. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31(2), 216–227.
Waldmann, M. R., & Holyoak, K. J. (1992). Predictive and diagnostic learning within causal models: Asymmetries in cue competition. Journal of Experimental Psychology: General, 121(2), 222–236.
Walsh, C. R., & Sloman, S. A. (2011). The meaning of cause and prevent: The role of causal mechanism. Mind & Language, 26(1), 21–52.
Wellman, H. M. (2011). The child's theory of mind. Cambridge, MA: MIT Press.
Wellman, H. M., Cross, D., & Watson, J. (2001). Meta-analysis of theory-of-mind development: The truth about false belief. Child Development, 72(3), 655–684.
Wellman, H. M., & Gelman, S. A. (1992). Cognitive development: Foundational theories of core domains. Annual Review of Psychology, 43(1), 337–375.
Wells, G. L., & Gavanski, I. (1989). Mental simulation of causality. Journal of Personality and Social Psychology, 56(2), 161–169.
White, P. A. (1990). Ideas about causation in philosophy and psychology. Psychological Bulletin, 108(1), 3–18.
White, P. A. (2006). The causal asymmetry. Psychological Review, 113(1), 132–147.
White, P. A. (2009). Perception of forces exerted by objects in collision events. Psychological Review, 116(3), 580–601.
White, P. A. (2012a). The impetus theory in judgments about object motion: A new perspective. Psychonomic Bulletin & Review, 19(6), 1007–1028.
White, P. A. (2012b). Visual impressions of causality: Effects of manipulating the direction of the target object's motion in a collision event. Visual Cognition, 20(2), 121–142.
Williamson, J. (2006). Causal pluralism versus epistemic causality. Philosophica, 77, 69–96.
Wimmer, H., & Perner, J. (1983). Beliefs about beliefs: Representation and constraining function of wrong beliefs in young children's understanding of deception. Cognition, 13(1), 103–128.
Wolff, P. (2003). Direct causation in the linguistic coding and individuation of causal events. Cognition, 88(1), 1–48.
Wolff, P. (2007). Representing causation. Journal of Experimental Psychology: General, 136(1), 82–111.
Wolff, P., Barbey, A. K., & Hausknecht, M. (2010). For want of a nail: How absences cause events. Journal of Experimental Psychology: General, 139(2), 191–221.
Woodward, J. (2003). Making things happen: A theory of causal explanation. Oxford: Oxford University Press.
Woodward, J. (2006). Sensitive and insensitive causation. The Philosophical Review, 115(1), 1–50. (p. 547)
Woodward, J. (2011). Mechanisms revisited. Synthese, 183(3), 409–427.
Yates, J., Bessman, M., Dunne, M., Jertson, D., Sly, K., & Wendelboe, B. (1988). Are conceptions of motion based on a naive theory or on prototypes? Cognition, 29(3), 251–275.
Ybarra, O. (2002). Naive causal understanding of valenced behaviors and its implications for social information processing. Psychological Bulletin, 128(3), 421–441.
Yoshida, W., Dolan, R. J., & Friston, K. J. (2008). Game theory of mind. PLoS Computational Biology, 4(12), e1000254.
Zacks, J. M. (2004). Using movement and intentions to understand simple events. Cognitive Science, 28(6), 979–1008.
Zago, M., & Lacquaniti, F. (2005). Cognitive, perceptual and action-oriented representations of falling objects. Neuropsychologia, 43(2), 178–188.
Zultan, R., Gerstenberg, T., & Lagnado, D. A. (2012). Finding fault: Counterfactuals and causality in group attributions. Cognition, 125(3), 429–440. (p. 548)

Notes:

(1.) There is also evidence that the way in which novices and experts utilize their intuitive understanding differs. In a recent eye-tracking study (Crespi, Robino, Silva, & de'Sperati, 2012), novices and experts saw video clips of a pool player making a shot. The clip was paused at some point, and participants were then asked to judge whether the ball was going to hit a skittle at the center of the table. Novices' eye movements kept following, in an analogous manner, the path that they predicted the ball would take. Experts' eyes, in contrast, saccaded quickly from one key spot (e.g., where the ball was struck) to another (e.g., where the ball hits the cushion).

(2.) Wolff's (2007) force dynamics model does not distinguish between "helped" and "enabled."

Tobias Gerstenberg
Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA

Joshua B. Tenenbaum
Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA


Space, Time, and Causality
Marc J. Buehner
The Oxford Handbook of Causal Reasoning
Edited by Michael R. Waldmann
Print Publication Date: Jun 2017 Subject: Psychology, Cognitive Psychology Online Publication Date: May 2017
DOI: 10.1093/oxfordhb/9780199399550.013.29

Abstract and Keywords

This chapter explores how the understanding of causality relates to the understanding of space and time. Traditionally, spatiotemporal contiguity is regarded as a cue toward causality. While concurring with this view, this chapter also reviews some boundary conditions of this approach. Moreover, temporal information goes beyond merely helping to identify causal relations; it also shapes the types of causal inferences that reasoners draw. Recent developments further show that the relation between time and causality is bi-directional: not only does temporal information shape and guide causal inferences, but once one holds a causal belief, one's perception of time and space is distorted such that cause and effect appear closer in space-time. Spatiotemporal contiguity thus supports causal beliefs, which in turn foster impressions of contiguity.

Keywords: contiguity, causality, time, spatiotemporal, temporal

The fundamental problem of causal induction is how we can know that one thing causes another in light of the fact that there is no direct evidence available to our sensory system that can assure us of the presence of causal relations. Many of the approaches reviewed in the section "Time as a Guide to Cause" in this chapter offer various solutions to this problem, and several of these explicitly or implicitly trace their heritage back to the great Scottish empiricist philosopher David Hume. Hume (1888) famously observed that three empirical cues give rise to causal impressions in the mind: (1) the candidate cause precedes its effect, (2) there is a constant conjunction between candidate cause and effect, and (3) the effect is contiguous in space and time to the candidate. Of these three cues, the first one, temporal priority, has largely been taken as self-evident, and has not been studied extensively (but see Rohde & Ernst, 2013; Rohde, Greiner, & Ernst, 2014; Stetson, Cui, Montague, & Eagleman, 2006). In contrast, the second cue—usually referred to as contingency—has inspired a great deal of research, much of which has been expertly introduced and reviewed earlier in this volume. In fact, the at times fierce debate over whether, when, and how exactly contingency gives rise to causal knowledge has meant that comparatively less effort has been expended to explore Hume's third conjecture: that causes are contiguous in space and time with their effects. This chapter will examine evidence pertaining to this notion, as well as relatively recent developments that suggest that the relationship between contiguity and causality is bi-directional: not only does contiguity in space-time frequently signal the existence of a causal relation, but once we hold the belief that two events are causally related, we perceive them to be close to one another in space-time.

Time as a Guide to Cause

Temporal Contiguity and Contingency

Intensive and systematic investigation of the rules governing how humans make the mental leap from empirical evidence to mental representations of causation began in the 1980s, and was inspired by (p. 550) the possibility that human causal learning operates according to the same principles that guide conditioning in non-human animals (Dickinson, 2001; Shanks, 1985, 1987; Shanks & Dickinson, 1988; Wasserman, Elek, Chatlosh, & Baker, 1993; see also Le Pelley, Griffiths, and Beesley, Chapter 2 in this volume). As hinted at earlier, the bulk of this work sought to establish the relationship between the degree of contingency and the strength of the resulting causal impression. Buoyed by the apparent correspondence between human causal judgments and the predictions of associative learning theories, Shanks and Dickinson began to explore whether causal learning is also sensitive to temporal contiguity, another principle of associative learning. Using a simple instrumental learning task, they found that introducing a delay of more than two seconds between a causal action and its consequence completely eliminated participants' ability to distinguish situations where their actions produced an outcome 75% of the time from situations where their actions had no control over the outcome. In a totally different domain—that of perceptual causality—Albert Michotte (1963) had already found that introducing a delay between collision and subsequent motion onset destroys the otherwise irresistible impression of causality associated with so-called launching displays (see White, Chapter 14 in this volume). In line with David Hume's conjecture (1888), it thus appeared that temporal contiguity indeed is an essential cue to causation.

Earlier work in developmental psychology had already tried to establish which of the two Humean cues—contingency or contiguity—is more fundamental for causal learning. Mendelson and Shultz (1976), for example, presented 4- to 7-year-old children with a simple causal system, in which one potential cause consistently but non-contiguously covaried with an effect, and contrasted this with another possible cause, which covaried inconsistently with the effect, but did so in a contiguous manner. More specifically, children observed an experimenter dropping marbles into two different holes of a box, and this was sometimes followed by a bell ringing. The evidence was structured such that whenever a marble was dropped in one hole (A), the bell rang five seconds later. On some trials, the experimenter first dropped a marble in A, waited, and then dropped another marble in the other hole (B), and then the bell rang immediately. Lastly, there were also trials where the experimenter just dropped a marble in B, and there was no bell ring. The children were told that only one of the actions (dropping a marble into hole A or B) makes the bell ring, and, using a series of different questions, were asked to indicate which action they thought causes the bell to ring. Results showed that children preferred to attribute causality to the contiguous, but non-consistent cause (B), thus suggesting that early in development, contiguity is a more important cue to causality than contingency. However, this result was reversed in a condition where the apparatus was modified to provide a rationale for the delay: when children saw that there were two boxes, separated from each other, so that the one with the holes was positioned higher up and connected with a piece of rubber tubing to the other, which emitted the bell sound, they preferred to attribute causality to the consistent, but non-contiguous cause. This suggests that temporal contiguity interacts with awareness of causal mechanism in determining causal judgment: in the presence of a plausible mechanism for why the bell ring might be delayed, the contingent cause was selected, but in the absence of such a mechanism, the immediate cause was preferred, even though it covaried with the effect only imperfectly.

The role of mechanism in causal inference is explored in detail by Johnson and Ahn (Chapter 8 in this volume). For our purposes here, we might ask whether an interaction between knowledge of causal mechanism and (the absence of) temporal contiguity (as observed by Mendelson & Shultz, 1976) also underlies the results Michotte (1963) and Shanks et al. (1989) obtained, which suggested that human reasoners cannot recognize delayed causal relations. The perceptual causality displays used by Michotte resembled collisions between physical objects (although the original stimuli used by Michotte were squares, subsequent research often compared his launching displays to colliding billiard balls, e.g., Blakemore et al., 2001). Because we have experience with objects colliding and launching each other (even if we do not play billiards), we would bring our (naïve) knowledge of the physical laws governing collision events to the task, and thus expect that a launch should follow a collision instantaneously. A delayed launch, in contrast, violates the laws of physics, and thus cannot be causal (instead, the second object must have moved of its own accord). In Shanks et al.'s study, adults had to press the space bar on a computer, which resulted in a triangle flashing on the screen according to a probabilistic schedule, and after the programmed delay. Computers are of course a lot faster now than they were 25 years ago when (p. 551) Shanks et al. conducted their research, but even then, people might have expected that if pressing a key on a computer keyboard has any effect, it would manifest itself right away, a fact that Shanks et al. explicitly acknowledge in their paper. Furthermore, their participants might have interpreted the rating instructions as asking to what extent the space bar makes the triangle flash immediately. Had participants known that—contrary to their expectations and assumptions about electronics—in some conditions key presses were not effective until a few seconds later, the results might well have shown greater tolerance for delay. My earlier research (Buehner & May, 2002, 2003, 2004) suggests that this line of reasoning is correct: when participants know that there might be a delay between cause and effect, the detrimental effect of delay is considerably reduced, or even abolished.
In essence, providing participants with knowledge of, or a rationale for, potential delays enables them to bridge temporal gaps (cf. Einhorn & Hogarth, 1986). Importantly, the default expectation is for causal relations to be contiguous, in line with Hume's (1888) conjectures, and contiguity (when paired with contingency) is a powerful cue to causality. We exposed participants to a computerized scenario based on Shanks et al. (1989), and told them to imagine that the causal mechanism involves a delay (clicking on a light switch may cause an energy-saving light bulb to illuminate1). As alluded to before, causal judgments from participants who received these instructions and experienced four-second cause–effect delays while interacting with the computer were just as high as judgments from another group of participants who were not instructed about delays and witnessed immediate cause–effect pairings: providing a rationale for the delay removed its adverse effect on causal learning.

However, in that paper (Buehner & May, 2004) we also found that participants who were told to expect delays (i.e., with energy-saving light bulbs) but experienced contiguous action–outcome pairings provided just as high ratings of causal efficacy as participants who expected (and experienced) immediacy. In other words, experiencing a consistent and contiguous pairing between cause and effect was sufficient to create a causal impression, regardless of whether participants were led to believe in a delayed or immediate causal mechanism. One interpretation of this is that the delay instructions were not credible enough, particularly given that the experiment was presented on a computer: it was evident to participants that the instructions were just that, and that the pairing between their button presses and the outcome on the computer screen was determined by the computer. Moreover, these participants experienced a direct and strong contingency between their actions and the effect, and it would have been irrational to deny that (by giving a low causal rating) in the context of interacting with a computer.

Does this mean that experienced contiguity always trumps prior belief in a particular (delayed) time frame? Not necessarily. Investigating this question in a credible manner requires a different experimental paradigm, one where belief in the necessity of cause–effect delays can be credibly instilled, perhaps not unlike the marble setup used in Mendelson and Shultz's (1976) developmental study. Buehner and McGregor (2006) presented participants with a physical apparatus inspired by this research: a marble-run in which a marble was inserted at the top, rolled along one of four possible paths, and triggered one of four switches before exiting at the bottom. The switches were connected to a light at the front of the apparatus, such that the marble rolling over them could turn the light on. Participants were told that in different conditions one, two, three, or all four of the switches might be live, thus creating a probabilistic causal relation between marbles being inserted into the apparatus and the light coming on. In addition, the experimenter could vary the tilt of the marble-run, so that the ball traversed it in either a shorter or a longer period of time. Unbeknownst to the participants, there was also a hidden switch on the top of the marble-run, which enabled the computer to schedule the timing of the light dependent on marble insertion. After a thorough exploration phase, the apparatus was covered with a felt cloth, so that participants could only see when the experimenter inserted the marble, and whether and when the light turned on, but not which switch was triggered by the marble, and when.
This, combined with the hidden switch on top, enabled us to provide all participants with standardized probabilistic evidence about the candidate cause (marble insertion) and the effect (light). Covering the apparatus ensured that participants could not base their answers on direct observation of the causal mechanism, but instead had to draw inferences (as they would when judging how effective pressing a key is in making something happen on the computer screen). Using this experimental setup, we found that short cause–effect intervals elicited high causal ratings only when participants had a belief that the causal mechanism in question involved a short time (p. 552) frame (i.e., the marble-run was tilted steeply); when paired with a shallow tilt, suggestive of a longer delay between marble insertion and the triggering of one of the potential switches, short cause–effect intervals did not give rise to strong impressions of causality. Instead, participants in this condition appeared to attribute the outcome to alternative causes (they were told that sometimes the computer activates the light independently of whether or when a marble is inserted). In sum, Buehner and McGregor (2006) could show that when reasoners assume or know the causal mechanism to be slow, experiencing temporal delays facilitates causal attribution, while contiguity impairs it. Whether temporal contiguity promotes or hinders causal learning thus depends on people's expectations about the causal mechanism in question.

Buehner and McGregor's (2006) results notwithstanding, it is probably fair to assume that in the absence of specific mechanism beliefs, contiguous cause–effect pairings are stronger indicators of causality than delayed pairings. This is true from a purely computational perspective: contiguous pairings are easier to identify than delayed ones for two reasons. First, delays mean that knowledge about events has to be held in memory for longer, introducing the possibility of memory decay. Second, and probably more important, the longer the cause–effect delay, the greater the probability that other (genuine or spurious) causes occur in the inter-event interval, thus increasing competition for explanatory power and obscuring the evidence for the relation. In other words, temporal contiguity makes it easy to discover causal contingencies, while delays hinder the process.

Any learning mechanism that tries to extract contingency information from the environment has to parse the evidence into discrete units of observation. Otherwise, it is not possible to compute contingencies in the first place: what constitutes a co-occurrence of a candidate cause and an effect of interest must be defined. Usually, learning theories adopt the notion of discrete learning trials—that continuous time is carved into discrete units of time during which stimuli (or responses) and outcomes do or do not co-occur (for example, see Pearce, 1987; Pearce & Hall, 1980; Rescorla & Wagner, 1972; see also Le Pelley, Griffiths, and Beesley, Chapter 2 in this volume). However, it is by no means certain that organisms will parse the continuous stream of evidence in the same way that an experimenter intended them to, and so we need to be careful when applying contingency-based approaches to causation to data extracted from continuous time, a point that I will explore in more depth in the next section in the context of described events (for a detailed discussion of this problem, and alternative approaches to conceptualizing learning, see Gallistel, 1990; Gallistel & Gibbon, 2000).
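To make the parsing problem concrete, here is a minimal sketch of how one and the same continuous record of events can yield different contingency estimates depending on how time is carved into trials. The code is illustrative only: the timestamps, the window lengths, and the helper name delta_p are invented for the example, not taken from any of the studies discussed. Delta-P is the standard contingency measure, P(effect | cause) minus P(effect | no cause).

# Hypothetical event record: timestamps (in seconds) of causes and effects.
cause_times = [1.0, 10.0, 20.0, 31.0]
effect_times = [4.5, 13.5, 23.5, 34.5]  # each effect follows its cause by 3.5 s

def delta_p(cause_times, effect_times, total_time, window):
    """Carve continuous time into fixed windows and compute
    deltaP = P(effect | cause) - P(effect | no cause)."""
    n_bins = int(total_time / window)
    cause_bins = {int(t / window) for t in cause_times}
    effect_bins = {int(t / window) for t in effect_times}
    c_and_e = sum(1 for b in range(n_bins) if b in cause_bins and b in effect_bins)
    c_only = sum(1 for b in range(n_bins) if b in cause_bins and b not in effect_bins)
    e_no_c = sum(1 for b in range(n_bins) if b not in cause_bins and b in effect_bins)
    neither = n_bins - c_and_e - c_only - e_no_c
    return c_and_e / (c_and_e + c_only) - e_no_c / (e_no_c + neither)

print(delta_p(cause_times, effect_times, total_time=40, window=5.0))  # 1.0
print(delta_p(cause_times, effect_times, total_time=40, window=2.0))  # -0.25

With 5-second windows, each cause–effect pair falls into a single trial and the contingency is perfect; with 2-second windows, every pair is split across trials and the measured contingency turns negative. A learner who parses the stream differently from the experimenter can thus reach the opposite causal conclusion from the very same evidence.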

Temporal Information in Described Situations

The impact of experienced temporal contiguity on causal inference is highly intuitive, especially when one considers the computational considerations drawn out in the previous paragraph. Sometimes, however, temporal information is available in non-experiential form, for example in the form of event descriptions. In such situations, the computational considerations described above need not hold, because the reasoner would not have to hold information in memory over extended periods of time, as all the relevant information is available concurrently. Would temporal information in such situations still impact causal inference? Two lines of evidence show that it does.

Hagmayer and Waldmann (2002) presented participants with a causal induction task that required them to decide whether interventions against mosquito plague in two communities were successful. Information about whether or not an intervention had taken place in a given year, as well as whether the community suffered from mosquito plague that year, was provided for a 20-year cycle in tabular format for two different communities. The data for the two communities were identical, but the authors had rearranged the columns in such a way that this was not obvious to participants. Furthermore, the data were structured in such a way that the contingency between treatment and mosquito plague was slightly positive when considered within the same year (there was a plague in 5 of 8 years when the treatment was applied, and when no treatment was applied, there was a plague in 6 of 12 years, yielding an overall contingency of .125). However, that same data, when considered over a slightly longer time frame (from one year to the next), yielded a moderately negative contingency (there was a plague in only 3 of 8 years following an intervention, and in 8 of 12 years following no intervention, yielding an overall contingency of -.292; see Figure 28.1 for a schematic). Importantly, Hagmayer and Waldmann told their participants that one community had decided to use an insecticide to combat the mosquitoes, while the other opted to plant a special flower known to support the breeding of a special kind of beetle that eats (p. 553) mosquito larvae. While time frames were not explicitly mentioned, Hagmayer and Waldmann expected that participants would assume that the insecticide would act immediately (i.e., its effect, if present, would occur within the same year as the intervention), whereas the beetle-promoting strategy would not show its effect until the following year. In line with these assumptions, participants judged the same data to indicate that the strategy was not successful when they thought they were evaluating the insecticide, and that it was effective when the same data represented the biological strategy. In other words, Hagmayer and Waldmann showed that assumptions about the time frame of a causal mechanism (immediate for insecticide, delayed for flower-beetle) determine how statistical evidence is parsed, even when this evidence is presented in tabular format and time is not part of a reasoner's direct experience.


Figure 28.1 Structure of experimental design in Hagmayer and Waldmann (2002, Experiment 1). Adapted with permission of the authors.
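The two contingencies that drive this dissociation can be verified in a couple of lines; the fractions below are simply the frequencies reported in the preceding paragraph (this is a check of the arithmetic, not of Hagmayer and Waldmann's actual materials):

# Same-year parsing, as assumed for the insecticide:
same_year = 5/8 - 6/12   # 0.625 - 0.500 =  0.125
# Next-year parsing of the identical data, as assumed for the flower-beetle strategy:
next_year = 3/8 - 8/12   # 0.375 - 0.667 = -0.292
print(round(same_year, 3), round(next_year, 3))

The same 20 years of observations thus support a slightly counterproductive-looking intervention under one temporal parsing and an effective one under the other.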

In a different but related line of work, Greville and Buehner (2007) showed that temporal information displayed in tabular formats mediates the interpretation of causal contingencies. Their study asked participants to decide how effective exposure to radiation is in killing off bacterial cultures. Participants were presented with tables displaying data concerning 40 bacterial cultures each in a treatment and a control group. Each bacterial culture was listed in one row per table, and different columns indicated the 5 days over which the hypothetical experiment lasted. Participants were told that all 40 cultures in the treatment group were exposed to radiation on day 1 of the study, and then were observed for a total of 5 days. The death of a particular culture was indicated by an X in the table. This meant that participants could very easily tally the contingency by counting the number of Xs in the treatment group and comparing it to the number of Xs in the control group. In addition, however, the distribution of Xs across the columns of the table also provided temporal information: Greville and Buehner manipulated the tables in such a way that contiguity was either strong (Xs in the table representing the treatment group peaked near day 1), weak (Xs peaked near day 5), or neutral (Xs were randomly distributed over the 5 days). Figure 28.2 shows a sample stimulus used in the experiment. Results showed that causal ratings were influenced by the overall contingency, as well as by the distribution of data across days. More specifically, conditions with identical overall contingencies gave rise to higher causal ratings when the contingency was implemented with an early peak than with a late peak. Moreover, when the overall contingency was zero (i.e., suggesting no causal influence of the rays on bacterial death), participants gave zero ratings only when the data distribution was random. When the data peaked early, they judged a zero contingency to indicate that the rays were effective in killing off the bacteria, but when the data peaked late, they judged the same zero contingency to indicate that the rays promoted the survival of the bacteria. This means that the mere advancing or postponing of an effect in time (relative to the randomly distributed control group) was imbued with causal significance, even when the overall contingency remained entirely unaffected by temporal variations.

Taken together, both lines of investigation (Greville & Buehner, 2007; Hagmayer & Waldmann, 2002) demonstrate that temporal information exerts a powerful influence on causal induction. Moreover, (p. 554) these studies have shown that the influence of time on causal inference and learning cannot simply be explained away by considerations of computational complexity, memory decay, or interference from subsequent information. Instead, temporal information genuinely influences causal induction in a way that models based purely on contingency assessment (and possible auxiliary assumptions concerning information processing) fail to capture.

Figure 28.2 Example of stimulus materials from Greville and Buehner (2007). Reproduced with publisher's permission.

Temporal Order and Causal Structure

Until now I have only discussed work pertaining to inferences of causal strength or power, which captures how much a cause produces (or prevents) an effect. In addition to considering power or strength, though, we also have to consider causal structure, that is, whether there exists a causal relation in the first place, and how confident we are about this (see also, in this volume, Cheng & Lu, Chapter 5; Griffiths, Chapter 7; and Rottman, Chapter 6). Time also impacts our assessment of causal structure. To appreciate this, let's take a step back and consider causal inference based purely on statistical information (i.e., without access to knowledge of temporal patterns). In the simple case of a single candidate cause and a single effect, the relevant information is the degree of contingency between these two events (and the relative frequencies that make up this contingency). While this information is sufficient to determine whether there exists an association between them, which is potentially indicative of a causal relation (see Cheng & Lu, Chapter 5 in this volume, for a discussion of the boundary conditions under which an association licenses a causal inference), it cannot unequivocally indicate in which direction a potential causal arrow might point. Frequently such considerations are resolved by a priori knowledge of plausible mechanisms (p. 555) (cf. Johnson & Ahn, Chapter 8 in this volume, and also White, Chapter 14 in this volume): when I ask myself whether the elevator in my building is once again broken (which, alas, it seems to be on a regular basis), I consider whether my pressing the call button succeeds in making the doors open, and not whether the approach of the elevator made me press the button. But many important causal considerations do not afford resolution via pre-existing mechanism knowledge. A case in point is Sir Ronald Fisher's insistence that the correlation between smoking and lung cancer does not suggest that the former causes the latter. Instead, Fisher argued, the association is equally explainable by a genotype that disposes an individual toward lung cancer and also simultaneously causes a craving for nicotine (Pearl, 2000). A similar problem occurs in the perennial debate over whether watching violence on television (or listening to aggressive music, etc.) causes violent behavior, or whether, in contrast, people with violent tendencies simply have a preference for media that portray violence. The reason that such situations are difficult to resolve is that the observed data patterns are compatible with more than one underlying causal structure (the technical term for this is Markov equivalence; see also Rottman, Chapter 6 in this volume). The royal road toward settling such debates, of course, is controlled experimentation, which allows for intervention on the variables of interest. Experimentation (or intervention) allows us to turn a variable on or off, and to observe whether the other event(s) still occur(s), thus producing unique statistical patterns that allow us to distinguish between different candidate structures. In addition to providing differential statistical information, though, intervention also provides us with temporal information: we intervene and then observe the outcome of our action.

Lagnado and Sloman (2004) showed that it is in fact this temporal information that drives how we reason about interventions. They did this by exposing learners to a simple three-variable system with one target cause C and two other variables A and B. The three variables were linked by a causal chain structure, in which A caused B, which in turn caused C, with both links having identical conditional probabilities P(B|A) = P(C|B) = .8. Participants were not told about this structure in advance, and were presented with covariational evidence generated by the system, either in an observational learning task or in an interventional learning task. During interventional learning, participants could set the value of either A or B on a given trial and then observe the overall outcome. At the end of each learning phase, participants were asked which structure, from a choice of five, would most likely have generated the data. Options included the correct A→B→C chain as well as an incorrect B→A→C chain, two (also incorrect) common cause structures (A←B→C and B←A→C), and one incorrect common effect model (A→C←B). Lagnado and Sloman found that interventional learning resulted in higher correct model selection than observational learning.
More important for our purposes here, they showed that if observational learning takes place in a way where the information is presented in a temporal order that is consistent with the underlying causal model (e.g., A, followed by B, followed by C), model selection is just as accurate as under interventional learning, which automatically affords such a temporal order. Moreover, when the data from interventional learning are presented in a temporal order that is inconsistent with the underlying causal model (e.g., the participant intervenes on A, and then after a short pause observes A and B simultaneously, followed by C), model selection is significantly impaired. This suggests that interventional learners rely on the temporal order cue to infer causal structures. It is important to note here that, from a computational perspective, temporal information is not required to identify the correct causal structure: the statistical information gleaned from intervening in the causal system is sufficient to distinguish between the different models, so learners would not need to rely on temporal information. In sum, Lagnado and Sloman have shown that the key component reasoners rely on when learning from intervention is the temporal information afforded by manipulating a causal system, and that the statistical information derived from the intervention is subordinate to it (see also Lagnado & Sloman, 2006; McCormack, Frosch, Patrick, & Lagnado, 2015).

The power that temporal order information exerts over our causal reasoning was further demonstrated by Burns and McCormack (2009). They presented adults and young children with observational data from a mechanistic three-variable system containing three devices, A, B, and C, mounted on top of a box. In contrast to Lagnado and Sloman's (2004, 2006) preparations, however, activation of the three devices was deterministically linked, such that whenever A moved, B and C also moved. In a synchronous condition, A activated, and one second later B and C activated concurrently for one second; in the sequential condition, A activated, one second later B activated for one second, and after a further (p. 556) second C activated (again for one second). Based on the statistical information, we can conclude that the three events are causally related, but beyond that no further inference is possible. It could be that A is the common cause of B and C, that A causes B and B causes C, or that A causes C and C causes B. Burns and McCormack showed that adults and 6- to 7-year-olds relied on the temporal information to endorse causal structures: both groups indicated that a causal chain model was operating in the sequential condition, and that a common cause model underpinned the synchronous condition (interestingly, 4-year-olds had no clear preferences between these models). Taken together, Lagnado and Sloman's and Burns and McCormack's evidence shows that temporal information is used to disambiguate between equally possible causal structures, and that when temporal information is at variance with statistical information, reasoners go with the temporal cues.

A different take on the question of how temporal information interacts with statistical data considers whether temporal dependencies exist or not. This is best exemplified by contrasting between- and within-subjects designs: in the former, each observation (or intervention) is independent of the others—the influence of having administered a treatment or having observed an effect in one participant does not carry over to the next participant. However, in a within-subject preparation, such carryover effects are likely to occur: if a patient suffers from anxiety on day 1 of testing (where no treatment was given), all else being equal, she is likely to also suffer from anxiety on day 2, reflecting a pattern of temporal autocorrelation.
This means that the same statistical evidence can, under certain conditions, give rise to different interpretations, depending on whether the reasoner assumes the data are gathered from an atemporal or a temporal context. Rottman, Kominsky, and Keil (2014) showed that young children are sensitive to this distinction and exploit the additional information provided by temporal autocorrelation when it is appropriate to do so.

Earlier in this chapter I argued that one reason that temporal delays hinder causal inferences is that they increase the likelihood that other events occur in the inter-event interval, and that such intervening events will then compete with the candidate cause for explanatory power. The examples I reviewed in this subsection—although pertaining to structural inferences rather than the interplay of contiguity and contingency in determining causal strength—provide simple and intuitive examples of this. When participants in both Lagnado and Sloman's (2004, 2006) and Burns and McCormack's (2009) studies observed A, pause, B, pause, C, the most natural inference that came to them was that of a causal chain A→B→C. B occurred in the interval between A and C, and thus rendered A less credible as a direct cause of C, even though (at least in Burns and McCormack's studies) a direct causal link from A→C was equally possible based on the statistical information. Time thus helps us to figure out which events go together.
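To see concretely why the Burns and McCormack patterns resist purely statistical analysis, the following sketch enumerates the observational distributions of two of the candidate structures. The deterministic links mirror the apparatus described above, but the code itself (the function names and the base rate p_a) is an invented illustration, not the authors' model:

from itertools import product

def p_chain(a, b, c, p_a=0.5):
    """A -> B -> C with deterministic links: B copies A, C copies B."""
    return (p_a if a else 1 - p_a) * (b == a) * (c == b)

def p_fork(a, b, c, p_a=0.5):
    """B <- A -> C: A is a common cause that both B and C copy."""
    return (p_a if a else 1 - p_a) * (b == a) * (c == a)

# Both structures assign identical probability to every observable pattern,
# so no amount of passive observation of outcomes can tell them apart ...
assert all(p_chain(a, b, c) == p_fork(a, b, c)
           for a, b, c in product([0, 1], repeat=3))
# ... whereas the temporal order of activation (B before C, or B together
# with C) immediately favors one structure over the other.

With deterministic links, the chain and the common cause model are observationally indistinguishable, an extreme case of the underdetermination discussed above; the timing of B and C relative to one another is the only evidence that separates the structures, and that is exactly the cue the participants used.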

Temporal Regularity

By now we have established that causal relations unfold over time, with some causes revealing their consequences sooner, while others take longer to manifest their effects, and we have learned that, all things being equal, the more contiguous a causal relation is, the easier it is to learn, or the stronger the causal inference that an observer draws. But these variations in cause–effect contiguity of course almost always occur in the context of contingency: as David Hume (1888) recognized, we draw causal inferences after observing the effect following a cause repeatedly. This opens up the possibility that the cause–effect interval itself may vary; sometimes it might be short, and other times it might be longer, or it might be exactly the same every single time. Let's illustrate this with an example: if I go to eat at a restaurant and consider the causal relations between eating the meal and my hunger ceasing, on the one hand, and having to foot the bill for the meal after having paid with my credit card, on the other, the difference in regularity of the experienced causal time frames is obvious: hunger always ceases after the same interval following the meal (i.e., constituting a regular causal time frame); the delay between the meal and the money for it coming out of my account, on the other hand, varies—if I visited the restaurant at the beginning of a month, I won't pay for the meal for another 4 weeks (I get my credit card bill at the end of each month), and if I ate near the end of the month, I might have to pay for it after just a few days. In this section, I will consider how the regularity of the time frame itself impacts our assessment of causal relations.

There are two possible ways in which regularity of cause–effect time frames might impact inferred causal strength. From a reinforcement learning perspective (see Gershman, Chapter 17 in this volume) one might expect—for any specific mean temporal delay—that it might be more advantageous to experience irregular compared to regular intervals. This (p. 557) is because in reinforcement learning, the value of a reinforcer diminishes over time according to a negatively accelerated curve, such that short intervals support greater learning than long intervals (this is also referred to as hyperbolic discounting; for an overview, see Green & Myerson, 2004). Crucially, the value of the reinforcer diminishes relatively rapidly at first, and then levels off. For example, the difference in potential reinforcement might be large when comparing intervals of 500 and 1,000 milliseconds (ms), but negligible when comparing intervals of 4,500 and 5,000 ms. A logical consequence of this is that the net strength that can be gained by experiencing a combination of A trials with a delay X and A trials with a delay Z is greater than the strength gained by 2 × A trials with a delay of (X + Z)/2, where X > Z (see Figure 28.3). To put it directly, the combined strength gained from experiencing a short and a long interval is higher than that gained from two intervals of medium length. On the other hand, from a computational or Bayesian perspective (cf. Rottman, Chapter 6 in this volume, and Griffiths, Chapter 7), one could argue that a regular time frame would support better causal learning than a combination of short and long intervals. The reason for this is twofold: first, a temporally regular co-occurrence will be easier to detect than a variable one. More important, temporal regularity between two events is much more likely if they are linked by a causal relation compared to if they are unrelated. Put differently, experiencing temporal regularity by chance is very unlikely, and consequently temporal regularity might be seen as a further cue towards causality.
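To make the reinforcement-learning side of this contrast concrete, here is a minimal sketch; the discount rate k and the interval values are illustrative assumptions, not parameters from the chapter:

    # Hyperbola-like discounting of a delayed reinforcer (cf. Green & Myerson, 2004):
    # the longer the cause-effect delay, the less strength a pairing can support.
    def discounted_value(delay_ms, k=0.001):
        """Value of a reinforcer after delay_ms; k is an assumed discount rate."""
        return 1.0 / (1.0 + k * delay_ms)

    short, long_ = 500, 4500           # illustrative delays (ms)
    medium = (short + long_) / 2       # 2,500 ms: the same mean delay

    variable = discounted_value(short) + discounted_value(long_)   # mixed delays
    fixed = 2 * discounted_value(medium)                           # constant delay

    print(round(variable, 3))  # 0.848
    print(round(fixed, 3))     # 0.571
    # Because the hyperbola is convex, mixing a short and a long delay accrues
    # more total strength than two medium delays with the same mean. This is the
    # associative prediction tested by Greville and Buehner (2010); as described
    # in the following, their data showed the opposite pattern.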

Figure 28.3 Potential differences in accrued associative strength between fixed interval and variable interval conditions according to a hyperbola-like discounting function of delayed events. Reproduced from Greville & Buehner (2010) with permission.

James Greville explored the impact of temporal regularity on causal induction for his PhD. In a series of experiments, he exposed participants to an instrumental learning scenario closely modeled on Shanks et al.'s (1989) seminal paradigm, where pressing a key made a shape on the computer screen illuminate according to a probabilistic schedule. He consistently found that experiencing a regular cause–effect delay supported greater causal learning than variable intervals (whose average length corresponded to the regular comparison interval). Moreover, the amount of learning (i.e., how long participants engaged with the task) had no impact on this relation: it is not the case that causal relations involving variable intervals simply take longer to learn (Greville & Buehner, 2010). This advantage of temporal regularity or predictability is true not only in instrumental learning contexts, but also when reasoners learn by purely observing covariational evidence unfold over time (Greville & Buehner, 2016).
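One way to see why regularity should count as evidence is with a toy likelihood-ratio computation. This is a minimal sketch under assumed distributions (the Normal jitter and Uniform baseline are illustrative choices, not a model from Greville's work):

    import math

    def normal_logpdf(x, mu, sigma):
        return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

    def log_likelihood_ratio(delays_ms):
        """log P(delays | causal link) - log P(delays | unrelated events), under
        assumed models: a causal link produces delays clustered around a
        characteristic lag (Normal, 100 ms jitter); unrelated events co-occur at
        arbitrary lags (Uniform over 0-10,000 ms)."""
        mu = sum(delays_ms) / len(delays_ms)
        ll_causal = sum(normal_logpdf(d, mu, 100.0) for d in delays_ms)
        ll_unrelated = len(delays_ms) * math.log(1.0 / 10000.0)
        return ll_causal - ll_unrelated

    regular = [2500, 2510, 2490, 2505, 2495]    # tight clustering around 2.5 s
    variable = [500, 4500, 1200, 3800, 2500]    # same mean, haphazard spacing

    print(log_likelihood_ratio(regular))    # strongly positive: evidence for a cause
    print(log_likelihood_ratio(variable))   # strongly negative: no regularity cue

On these assumptions, a set of regular intervals favors the causal hypothesis far more strongly than an equally long set of variable intervals with the same mean, which is in keeping with the pattern Greville observed.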

Cause as a Guide to Time

The first part of this chapter has explored the notion that contiguity promotes causal inference. To summarize and simplify, we might say that one of the heuristics that guides causal inference is "If there is contiguity and contingency, then there is causality." Contiguity, of course, is a relative concept. An inter-event interval of two seconds might seem perfectly normal and contiguous in a physical apparatus like the marble runs discussed in the earlier section (Buehner & McGregor, 2006; Mendelson & Shultz, 1976), but will appear unnatural when interacting with a computer (Buehner & May, 2003; Shanks et al., 1989) or when observing (simulations of) collision events (Michotte, 1963; see also White, Chapter 14 in this volume). Moreover, time, just like causality, is not subject to direct perception. We have no dedicated sensory organ specifically tuned toward time perception. Instead, we perceive time in the context of other events. In fact, one might argue that time perception is a misnomer, because we in fact infer how much time has elapsed, based on a number of cues, such as the number and nature of events that occurred during a to-be-estimated interval. In addition, temporal cognition is subject to many internal as well as environmental biases (arousal, mood, the nature of the events preceding the interval in question, the interval's content, etc.; (p. 558) for an overview, see Droit-Volet, Fayolle, & Gil, 2011; Fraisse, 1984; Meck, 2007). In this section I will consider the possibility that causality is one of the biases that influence time perception.

A Bayesian interpretation of the simple decision rule "if contiguity, then causality" suggests that out of all inter-event experiences we might have, those that involve a causal connection between the first and the second event are more likely to be contiguous (Eagleman & Holcombe, 2002). In other words, if we consider any inter-event interval, learning that the interval was a causal one increases the likelihood that this interval was also (relatively) short. Keeping in mind that time perception is indirect and noisy, it is therefore plausible to expect causality-induced biases in temporal cognition.

The first demonstrations of such a bias came from Patrick Haggard's laboratory (Haggard, Clark, & Kalogeras, 2002; Haggard & Clark, 2003) and used the Libet clock method (Libet, Gleason, Wright, & Pearl, 1983) as a proxy measure of action awareness. Participants watched a fast-moving single clock hand on a computer screen (one rotation every 2,560 ms) and pressed a key at a time of their choosing. They were instructed to avoid using the clock as an action prompt (e.g., pressing the key when the hand is at salient positions like 15, 30, 45, or 60) and to instead act whenever they felt the urge to do so. After they pressed the key, the clock continued to rotate for a short while before it disappeared, and participants had to indicate where the hand was when they pressed the key. Over a series of single action baseline trials like this, Haggard et al. established participants' mean judgment error of action awareness (participants were surprisingly accurate, with an overall mean error of only +6 ms). In a separate baseline block, Haggard et al. measured the same participants' awareness of external events (an auditory tone). Participants watched the clock until—at a random unpredictable time after the start of the trial—they heard a beep, and again they reported where the hand was when they heard the beep (here the mean error was slightly greater: +15 ms). To measure whether causal actions lead to shifts in time perception, Haggard et al. also exposed participants to operant conditions, where again they were instructed to press the key at a time of their choosing while watching the clock hand, but this time their key press triggered the tone 250 ms later. On such trials, participants were cued to report either the time of their action, or when they heard the resultant tone (never both; this would have been much too difficult). When they compared judgment errors from operant trials to the corresponding baseline trials, Haggard et al. found that awareness of causal actions was delayed by an additional 15 ms, whereas awareness of the tone was brought forward by 46 ms. In other words, the causal action and its outcome attracted each other in subjective time.

We have to be careful, though, before we interpret this as evidence for a causality bias on time perception. After all, Haggard et al. (2002) never measured time perception directly. Instead, they prompted participants' awareness of actions and their consequences. It is entirely possible to explain shifts in event perception via a realignment of perceptual-motor streams without drawing on any representation of subjective time, be it explicit or implicit (Kennedy, Buehner, & Rushton, 2009; Parsons, Novich, & Eagleman, 2013; Rohde & Ernst, 2013; Rohde et al., 2014; Stetson et al., 2006). It is beyond the scope of this chapter to explain the realignment perspective in detail. Suffice it to say that this perspective is rooted in the unity assumption of perception: different sensory signals originating from the same event (in this case the key press) belong together and should be perceived as such. However, sensory signals arrive in the brain at different times (e.g., due to differences in nerve conduction velocities and times). Yet, we typically perceive unity, because our brain binds the different input streams together. It would not be adaptive, however, for such binding to be hard-wired. Instead, the system ought to be flexible to account for changes in the degree of alignment that will inevitably occur over an organism's lifetime (e.g., conduction times will change as a function of limb growth). Thus, according to this perspective, if we perceive a short delay between our action and its outcome, our sensory system might gradually realign itself to bring the two events closer together in subjective time.
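Before moving on, the size of the binding effect reported by Haggard et al. can be summarized with simple arithmetic. This is an illustrative reconstruction from the shift values just cited, not a calculation from the original paper:

    # Haggard et al. (2002): operant trials with a 250 ms delay between
    # a voluntary key press and the tone it triggers.
    objective_interval = 250   # ms, key press -> tone

    action_shift = 15          # ms: awareness of the action shifted later, toward the tone
    tone_shift = 46            # ms: awareness of the tone shifted earlier, toward the action

    # Both shifts pull the two events toward each other in subjective time,
    # so the perceived interval shrinks by their combined magnitude.
    perceived_interval = objective_interval - action_shift - tone_shift
    print(perceived_interval)  # 189 ms

On this reading, roughly a quarter of the objective action–tone interval is compressed away in subjective experience.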
In order to ascertain whether causality truly biases time perception over and above how events are aligned in subjective experience, it is necessary to probe time perception directly. Various lines of research have now done this. First, several studies have shown that verbal estimates of temporal intervals are consistently shorter when the interval is between a causal action and its outcome, compared to when that same interval separates a passively induced, non-causal movement and a tone (Engbert, Wohlschläger, & Haggard, 2008) or two unrelated tones (Humphreys & Buehner, 2009). A similar effect has been obtained when participants (p. 559) were asked to reproduce the temporal interval they had experienced (Humphreys & Buehner, 2010). Finally, the effect also held up when measured with the method of constant stimuli. Here, participants had to compare the temporal interval between an active (causal) or passively induced (non-causal) key press and a flash against a tone of varying duration (Nolden, Haering, & Kiesel, 2012). Using this rigorous psychophysical method, Nolden et al. likewise found that active causal intervals appeared subjectively shorter than non-causal passive intervals. In sum, the temporal binding of cause to effect is a robust result that has now been validated across different methodologies, and holds not only in event perception, but also when time perception is probed directly.

Causality, Intentionality, or Both?

The astute reader will have noticed that the previous section introduced an additional dichotomy beyond the causal versus non-causal distinction. Several of the studies I reviewed in that section implemented this distinction by contrasting active causal movements to passively induced movements. This is because the original focus of this work indeed was on the awareness of voluntary action. Inspired by a motor-learning perspective similar to the one reviewed briefly in the preceding section, Haggard et al. (2002) were interested in exploring the predictions of a forward model of motor control. A key feature of this model is that action control takes place by comparing a prediction of the outcome of a given movement to the current experience (for a discussion on goal-directed action, see Hommel, Chapter 15 in this volume). This is what affords corrections in mid-movement (e.g., when we are catching a ball). However, because the laws of association (see the first part of this chapter) state that contiguous events afford better learning than non-contiguous events, it would also be adaptive, so the argument goes, for the motor system to perceive the consequences of its actions immediately, which then gives rise to binding. Be that as it may, the crucial point to note here is that according to this perspective, the critical distinction is not so much between causal and non-causal events, but between voluntary, intended actions and other events.

Let's take a closer look at this. In Engbert et al.'s (2008) studies, the control events consisted of passive finger movements. The participant's finger was strapped to a lever, which participants were asked to press in the active causal condition to generate the effect. But in the involuntary, passive condition, the lever was pulled down by a solenoid, moving the attached finger with it, and this event was then followed by the same outcome. In Nolden et al.'s (2012) studies, active key presses were contrasted with conditions in which participants rested their fingers on the keys, and the keys then popped upward at an unpredictable moment; both events were followed by the same outcome. A careful analysis of the causal structure involved in these setups reveals that the active–passive (or intentional and voluntary vs. non-intentional and involuntary) distinction is confounded with a causal versus non-causal distinction: on active voluntary trials, the action is a direct cause of the outcome, and the outcome in fact does not occur unless the action takes place; on passive involuntary trials, the computer schedules both the time of the passive action and the subsequent outcome. This means that on involuntary trials, there is no direct causal link between the passive action and the outcome, but instead the computer is a common cause for both. Because of that, one would not expect any binding or biases to take place on such trials. The same holds true for a seemingly elegant control condition implemented in the original Haggard et al. (2002) paper, in which the motor cortex was briefly activated via transcranial magnetic stimulation (TMS), which resulted in a muscle twitch on the same finger that participants used to press the key in the active conditions. Haggard et al. found that, relative to baseline single event intervals, awareness of TMS-induced muscle twitches was relatively early when they were followed by an outcome, whereas awareness of the outcome was delayed: the two repelled each other, rather than resulting in binding. Here as in the other cases, though, there is no causal connection between the muscle twitch and the subsequent beep, so a Bayesian causal perspective makes the same predictions as an approach based on awareness of intentional action.

What is needed, then, to shed light on this situation is to disentangle causality and intentionality. There are two ways to address this. First, we can ask whether any intentional action has the capacity to bind subsequent events to it, or whether the action also needs to be causal. Second, we can ask whether only intentional action causality leads to binding, or whether any form of causality, even if it does not contain intentional action, can also lead to binding. My research pursued both angles. First, we compared whether intentional action in the absence of causality is sufficient to lead to binding (Buehner & Humphreys, 2009). To this end, (p. 560) we trained participants to synchronize a key press to a GO signal; once they had mastered this, we taught them that their key press generates a tone after a fixed interval; in a final, experimental phase, we asked them to keep pressing the key in synchrony with the GO signal, but to now also press a second key at the time when they anticipate the tone they have generated. We compared this to a baseline condition, where participants learned that the GO signal itself is the cue for the subsequent tone; we still instructed them to learn to press a key in synchrony to the GO signal, but crucially, they learned that the subsequent tone occurs regardless of whether or when they pressed the key. A final experimental phase likewise asked them to still press one key in synchrony with the GO signal, and to now also press a second key when they expect the second tone. Crucially, the experimental phases in the causal and baseline conditions were identical on the surface: in both cases, the participant performed two actions, both in synchrony to external events—critically, though, in the causal condition, the first action caused the second event, while there was no such link at baseline.

Comparing reaction times across conditions, we found, first, that a caused tone was expected earlier than a non-caused (but equally predictable) one: participants made anticipatory key presses earlier in the causal compared to the baseline condition. We also found that awareness of the causal action was delayed.
This revealed itself via relatively earlier execution of the first key press in the causal compared to the baseline condition. In other words, because perception of the causal action was delayed (Haggard et al., 2002), participants had to execute the action slightly earlier so that it subjectively appeared to be on time with the GO signal. In sum, this study showed that temporal binding of action to effects only occurs when the action is causal. Non-causal actions do not afford binding. Thus, intentionality alone is not sufficient to result in binding.

In order to ask whether intentional action is necessary to produce binding, or whether causality on its own can produce binding, I compared voluntary action to mechanistic action (Buehner, 2012). In a setup similar to the one described above, participants had to press a key to indicate when they anticipated an outcome (in this case an LED flash). The outcome was either generated by an earlier key press, or merely signaled by another earlier LED flash. Critically, though, in some conditions participants pressed the key, while in others an autonomous robot pressed the key. Analyses of reaction times showed that awareness of the target flash was earlier for both the intentional causal and the robot-causal conditions, relative to the non-causal baseline. At very short intervals (500 ms), however, the binding effect was more pronounced for intentional than machine actions. In sum, this work suggests that intentionality is not necessary to achieve causal binding, and that the presence of a causal relation, even a merely mechanical one, is sufficient to achieve temporal causal binding. At the same time, there appears to be a privileged role for intentional action, such that causal binding is enhanced when the cause happens to be an intentional action. Future research will have to further clarify the interplay of intention and causality in temporal causal binding.

In my discussion of the causal binding literature, I have taken for granted that participants correctly inferred the objective causal structures deployed in the various experimental situations. However, to date, not a single study has actually tested whether participants appropriately appraised the causal relations. The Bayesian, bi-directional account of temporal binding, however, necessitates appropriate causal appraisal (otherwise, there would be no causal knowledge that could influence the perception of time). Research from Moore, Lagnado, Deal, and Haggard (2009), however, allays these concerns: using a standard Libet clock binding paradigm, they exposed their participants to probabilistic relations between action and outcome. Importantly, they varied both the probability of an outcome given an action—expressed as P(e|c) in the probabilistic framework of causation (see Perales, Catena, Cándido, and Maldonado, Chapter 3 in this volume)—as well as the probability of the outcome in the absence of the action, P(e|¬c). Both probabilities determine the contingency between action and outcome, which, in this situation, serves as a proxy for the strength of the causal relation between them. Moore et al. found that the extent of temporal binding varied as a function of action–outcome contingency, corroborating the notion that temporal binding between action and outcome is rooted in the perceived causal relation between them.
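Contingency in this paradigm is standardly quantified as ΔP, the difference between the two probabilities just mentioned. A minimal sketch (the trial counts are invented for illustration):

    def delta_p(e_and_c, c_total, e_and_not_c, not_c_total):
        """Contingency Delta-P = P(e|c) - P(e|not c), from simple trial counts."""
        return e_and_c / c_total - e_and_not_c / not_c_total

    # Hypothetical schedule: the tone follows 30 of 40 key presses,
    # but also occurs in 10 of 40 probe intervals without a key press.
    print(delta_p(30, 40, 10, 40))   # 0.75 - 0.25 = 0.5

    # Moore et al. (2009): the higher this contingency, the stronger the
    # temporal binding between action and outcome.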

From Temporal Order to Causality and Vice Versa

In the first part of the chapter, we have learned how experienced temporal order can shape causal impressions over and above statistical regularity. In the second section, we so far have considered (p. 561) how knowledge of causality can warp our impression of time to pull events linked by a causal relation closer together in subjective experience. Would the causality bias even go as far as to reverse temporal order to align our experience with our causal beliefs? Recent research suggests that this is also the case (Bechlivanidis & Lagnado, 2013). Bechlivanidis and Lagnado presented participants with a simulated physics world in a computerized puzzle game context. Participants dragged objects around on the screen before hitting a "play" button, which triggered the objects into motion according to predefined rules that simulated attraction or repulsion between various of the objects. The goal of the game was to position the objects in such a way that once triggered, the event sequence ended in a particular goal state (e.g., the red rectangle ends up inside the purple square). Through a series of self-paced puzzle trials, participants learned the causal rules that underpinned the puzzle world. For example, the red rectangle had to be transformed into a star before it could be admitted to the purple square (otherwise it would just bounce back off), and in order to transform it, two specific other objects had to collide. Having completed a series of such puzzle trials (and having acquired the requisite causal knowledge), participants then viewed a crucial test clip, which depicted the same events they had become familiar with in the puzzle phase (e.g., the red rectangle transforming into a star), and had to report the temporal order in which they saw the events by arranging descriptive statements on the computer screen into the correct order. Crucially, however, the temporal order of the test clip was at variance with the causal order participants had acquired through the puzzle phase (the red rectangle entered the purple square, then it transformed into a star, and after this the collision between the two other objects took place). When prompted to report the temporal order of the events they experienced in this final clip, the majority response (38.7%) corresponded to the earlier learned causal order. In contrast, a control group of participants who did not undergo the causal puzzle pre-training reported this temporal order only 2.9% of the time, but reported the correct temporal order 42.9% of the time, more than twice as frequently as participants who had undergone the puzzle training (19.3%). A subsequent experiment extended this finding by showing that the same objective order of events is reported differently by two groups of participants, depending on whether their prior experience of causality corresponded to the objective order, or was of the opposite order. Thus, causal beliefs bias our perception of temporal order such that when we experience an ordering that violates existing causal knowledge, our perception of order reverses to be in line with our beliefs. An interesting question for future research is how stable this effect is in light of repeated exposure to novel orders (which suggests that the existing causal knowledge may need to be updated).

Cause as a Guide to Space

David Hume (1888) referred to contiguity in time and space as a cue toward causality. I have not discussed spatial contiguity at all in the first section of the chapter. This is because there is a paucity of research on this topic. While there is a solid body of work considering the role of spatial contiguity in perceptual causality (Michotte, 1963; cf. also White, Chapter 14 in this volume), I am not aware of systematic investigations of spatial contiguity on causal judgments from contingency data. However, the evidence from perceptual causality research certainly aligns with Hume's conjecture and shows that spatial contiguity is an important cue to visual impressions of causality. This means that if the Bayesian interpretation of Hume's conjecture is correct, we should find similar causality biases in space perception as we did in time perception.

Only one study so far (Buehner & Humphreys, 2010) has investigated this possibility. We presented participants with modifications of the original perceptual causality stimulus. Participants observed one object (A) move toward and collide with a stationary rectangle (B). On the side opposite to where A collided with B was a stationary third object (C), which started moving following A's collision with B. The visual impression was similar to that of a Newton's cradle, in which momentum is transferred from one object to another via an intermediary stationary object. Critically, the length of the intermediate object B varied randomly from trial to trial. Following each observation, the screen cleared and participants had to reproduce the size of object B by adjusting a new bar on the screen until they felt it matched the size of object B seen in the previous animation. We found that participants consistently underestimated the size of B. Importantly, when the animation they witnessed did not give rise to a causal impression (e.g., C only moved off after a delay, or moved off before A collided with B), the underestimation was considerably reduced, or even reversed (p. 562) (i.e., overestimation). In a nutshell, we found evidence for causal binding in space: when two events are linked by a causal relation, we perceive them to have occurred closer in space than those same events when they are not causally related.

Conclusions

This chapter has considered David Hume's (1888) conjectures about the role that time and space play in causal inference. We have seen that temporal contiguity plays an important role in causal inference such that—all else being equal—short delays support causal learning, while longer delays tend to hinder it. There are exceptions from this default position, however, such that when prior expectations (or experience) suggest that the candidate causal relation takes time to unfold, temporal delays not only can be bridged, but also—if the delay expectations are realistic—can give rise to stronger impressions of causality than contiguous cause–effect pairings. The extent of the delay itself need not be constant over repeated experience of cause–effect intervals. However, a constant interval gives rise to stronger impressions of causality than a variable one.

Temporal order is a fundamental cue toward causality, and specifically to causal structure. The cause has to precede its effect. Often, different causal structures give rise to identical statistical dependencies. In the absence of controlled experimentation (intervention), temporal order can help to disambiguate between different structures. In fact, the credibility of temporal order is so high that we are swayed by it even when it is at variance with statistical information. This can lead to problems in situations where we might learn about the presence of an effect before learning about its cause, and thus could conclude that the former caused the latter.

Temporal information enters causal considerations not only from firsthand experience, but also where events are merely described or summarized in tabular format. How a particular statistical pattern is interpreted depends on the temporal expectations we bring to the task. In other words, our temporal assumptions shape whether we group an instance of cause–delay–effect as an instance of cause present and effect present, or as two separate instances of cause and no effect, and no cause and effect. And even when the groupings of cause and effect are unambiguously clear, additional temporal information concerning advancement or postponement of the effect enters into the consideration and influences our judgments.

Finally, the relationship between time and causality is bi-directional in ways that transcend Hume's conjectures. Not only do we use the hard empirical facts of temporal and spatial contiguity to infer the existence of causal relations, but once we have formed such impressions, they in turn influence how we perceive these so-called hard facts, in a way reminiscent of constraint-satisfaction models of cognition (Holyoak & Simon, 1999).

References

Bechlivanidis, C., & Lagnado, D. A. (2013). Does the "why" tell us the "when"? Psychological Science, 24(8), 1563–1572. doi:10.1177/0956797613476046

Blakemore, S. J., Fonlupt, P., Pachot-Clouard, M., Darmon, C., Boyer, P., Meltzoff, A. N., et al. (2001). How the brain perceives causality: An event-related fMRI study. NeuroReport, 12(17), 3741–3746.

Buehner, M. J. (2012). Understanding the past, predicting the future: Causation, not intentional action, is the root of temporal binding. Psychological Science, 23(12), 1490–1497. doi:10.1177/0956797612444612

Buehner, M. J., & Humphreys, G. R. (2009). Causal binding of actions to their effects. Psychological Science, 20(10), 1221–1228. doi:10.1111/j.1467-9280.2009.02435.x

Buehner, M. J., & Humphreys, G. R. (2010). Causal contraction: Spatial binding in the perception of collision events. Psychological Science, 21(1), 44–48. doi:10.1177/0956797609354735

Buehner, M. J., & May, J. (2002). Knowledge mediates the timeframe of covariation assessment in human causal induction. Thinking and Reasoning, 8(4), 269–295.

Buehner, M. J., & May, J. (2003). Rethinking temporal contiguity and the judgement of causality: Effects of prior knowledge, experience, and reinforcement procedure. Quarterly Journal of Experimental Psychology A: Human Experimental Psychology, 56(5), 865–890. doi:10.1080/02724980244000675

Buehner, M. J., & May, J. (2004). Abolishing the effect of reinforcement delay on human causal learning. Quarterly Journal of Experimental Psychology B, 57(2), 179–191. doi:10.1080/02724990344000123

Buehner, M. J., & McGregor, S. (2006). Temporal delays can facilitate causal attribution: Towards a general timeframe bias in causal induction. Thinking and Reasoning, 12(4), 353–378.

Burns, P., & McCormack, T. (2009). Temporal information and children's and adults' causal inferences. Thinking and Reasoning, 15(2), 167–196.

Dickinson, A. (2001). Causal learning: Association versus computation. Current Directions in Psychological Science, 10(4), 127–132.

(p. 563) Droit-Volet, S., Fayolle, S. L., & Gil, S. (2011). Emotion and time perception: Effects of film-induced mood. Frontiers in Integrative Neuroscience, 5, 33. doi:10.3389/fnint.2011.00033

Droit-Volet, S., & Meck, W. H. (2007). How emotions colour our perception of time. Trends in Cognitive Sciences, 11(12), 504–513.

Eagleman, D. M., & Holcombe, A. O. (2002). Causality and the perception of time. Trends in Cognitive Sciences, 6(8), 323–325. doi:10.1016/S1364-6613(02)01945-9

Einhorn, H. J., & Hogarth, R. M. (1986). Judging probable cause. Psychological Bulletin, 99(1), 3–19.

Engbert, K., Wohlschläger, A., & Haggard, P. (2008). Who is causing what? The sense of agency is relational and efferent-triggered. Cognition, 107(2), 693–704. doi:10.1016/j.cognition.2007.07.021

Fraisse, P. (1984). Perception and estimation of time. Annual Review of Psychology, 35(1), 1–36. doi:10.1146/annurev.ps.35.020184.000245

Gallistel, C. R. (1990). The organization of learning. Cambridge, MA: MIT Press.

Gallistel, C. R., & Gibbon, J. (2000). Time, rate and conditioning. Psychological Review, 107(2), 289–344.

Green, L., & Myerson, J. (2004). A discounting framework for choice with delayed and probabilistic rewards. Psychological Bulletin, 130, 769–792. doi:10.1037/0033-2909.130.5.769

Greville, W. J., & Buehner, M. J. (2007). The influence of temporal distributions on causal induction from tabular data. Memory and Cognition, 35(3), 444–453.

Greville, W. J., & Buehner, M. J. (2010). Temporal predictability facilitates causal learning. Journal of Experimental Psychology: General, 139(4), 756–771. doi:10.1037/a0020976

Greville, W. J., & Buehner, M. J. (2016). Temporal predictability enhances judgments of causality in elemental causal induction from both intervention and observation. Quarterly Journal of Experimental Psychology, 69(4), 678–697.


Haggard, P., & Clark, S. (2003). Intentional action: Conscious experience and neural prediction. Consciousness and Cognition, 12(4), 695–707.

Haggard, P., Clark, S., & Kalogeras, J. (2002). Voluntary action and conscious awareness. Nature Neuroscience, 5(4), 382–385. doi:10.1038/nn827

Hagmayer, Y., & Waldmann, M. R. (2002). How temporal assumptions influence causal judgments. Memory and Cognition, 30(7), 1128–1137.

Holyoak, K. J., & Simon, D. (1999). Bidirectional reasoning in decision making by constraint satisfaction. Journal of Experimental Psychology: General, 128(1), 3–31.

Hume, D. (1888). A treatise of human nature (L. A. Selby-Bigge, Ed.). Oxford: Clarendon Press.

Humphreys, G. R., & Buehner, M. J. (2009). Magnitude estimation reveals temporal binding at super-second intervals. Journal of Experimental Psychology: Human Perception and Performance, 35(5), 1542–1549. doi:10.1037/a0014492

Humphreys, G. R., & Buehner, M. J. (2010). Temporal binding of action and effect in interval reproduction. Experimental Brain Research, 203(2), 465–470. doi:10.1007/s00221-010-2199-1

Kennedy, J. S., Buehner, M. J., & Rushton, S. K. (2009). Adaptation to sensory-motor temporal misalignment: Instrumental or perceptual learning? Quarterly Journal of Experimental Psychology, 62(3), 453–469. doi:10.1080/17470210801985235

Lagnado, D. A., & Sloman, S. A. (2004). The advantage of timely intervention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30(4), 856–876. doi:10.1037/0278-7393.30.4.856

Lagnado, D. A., & Sloman, S. A. (2006). Time as a guide to cause. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32(3), 451–460. doi:10.1037/0278-7393.32.3.451

Libet, B., Gleason, C. A., Wright, E. W., & Pearl, D. K. (1983). Time of conscious intention to act in relation to onset of cerebral activity (readiness-potential): The unconscious initiation of a freely voluntary act. Brain: A Journal of Neurology, 106(Pt 3), 623–642.

McCormack, T., Frosch, C. A., Patrick, F., & Lagnado, D. A. (2015). Temporal and statistical information in causal structure learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 41(2), 395–416. doi:10.1037/a0038385

Mendelson, R., & Shultz, T. R. (1976). Covariation and temporal contiguity as principles of causal inference in young children. Journal of Experimental Child Psychology, 22, 408–412.

Michotte, A. (1963). The perception of causality. New York: Basic Books.

Moore, J. W., Lagnado, D. A., Deal, D. C., & Haggard, P. (2009). Feelings of control: Contingency determines experience of action. Cognition, 110(2), 279–283. doi:10.1016/j.cognition.2008.11.006

Nolden, S., Haering, C., & Kiesel, A. (2012). Assessing intentional binding with the method of constant stimuli. Consciousness and Cognition, 21(3), 1176–1185. doi:10.1016/j.concog.2012.05.003

Parsons, B. D., Novich, S. D., & Eagleman, D. M. (2013). Motor-sensory recalibration modulates perceived simultaneity of cross-modal events at different distances. Frontiers in Psychology, 4, 1–12. doi:10.3389/fpsyg.2013.00046/abstract

Pearce, J. M. (1987). A model for stimulus generalization in Pavlovian conditioning. Psychological Review, 94(1), 61–73.

Pearce, J. M., & Hall, G. (1980). A model for Pavlovian learning: Variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychological Review, 87(6), 532–552.

Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge, UK: Cambridge University Press.

Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current theory and research (pp. 64–99). New York: Appleton-Century-Crofts.

Rohde, M., & Ernst, M. O. (2013). To lead and to lag: Forward and backward recalibration of perceived visuo-motor simultaneity. Frontiers in Psychology, 3, 599. doi:10.3389/fpsyg.2012.00599/abstract

Rohde, M., Greiner, L., & Ernst, M. O. (2014). Asymmetries in visuomotor recalibration of time perception: Does causal binding distort the window of integration? Acta Psychologica, 147, 127–135. doi:10.1016/j.actpsy.2013.07.011

Rottman, B. M., Kominsky, J. F., & Keil, F. C. (2014). Children use temporal cues to learn causal directionality. Cognitive Science, 38(3), 489–513. doi:10.1111/cogs.12070

Shanks, D. R. (1985). Continuous monitoring of human contingency judgment across trials. Memory and Cognition, 13(2), 158–167.

(p. 564) Shanks, D. R. (1987). Acquisition functions in contingency judgment. Learning and Motivation, 18(2), 147–166. doi:10.1016/0023-9690(87)90008-7

Shanks, D. R., & Dickinson, A. (1988). Associative accounts of causality judgment. Psychology of Learning and Motivation, 21, 229–261.


Shanks, D. R., Pearson, S. M., & Dickinson, A. (1989). Temporal contiguity and the judgement of causality by human subjects. Quarterly Journal of Experimental Psychology B, 41B(2), 139–159. doi:10.1080/14640748908401189

Stetson, C., Cui, X., Montague, P. R., & Eagleman, D. M. (2006). Motor-sensory recalibration leads to an illusory reversal of action and sensation. Neuron, 51(5), 651–659. doi:10.1016/j.neuron.2006.08.006

Wasserman, E. A., Elek, S., Chatlosh, D., & Baker, A. (1993). Rating causal relations: Role of probability in judgments of response-outcome contingency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19(1), 174.

Notes:

(1.) In the first decade of the 2000s, energy-saving light bulbs came with a noticeable gap of a few seconds between being switched on and illuminating.

Marc J. Buehner

School of Psychology, Cardiff University, Cardiff, Wales, UK


Causation in Legal and Moral Reasoning

Causation in Legal and Moral Reasoning   David A. Lagnado and Tobias Gerstenberg The Oxford Handbook of Causal Reasoning Edited by Michael R. Waldmann Print Publication Date: Jun 2017 Subject: Psychology, Cognitive Psychology Online Publication Date: May 2017 DOI: 10.1093/oxfordhb/9780199399550.013.30

Abstract and Keywords

Causation looms large in legal and moral reasoning. People construct causal models of the social and physical world to understand what has happened, how and why, and to allocate responsibility and blame. This chapter explores people's common-sense notion of causation, and shows how it underpins moral and legal judgments. As a guiding framework it uses the causal model framework (Pearl, 2000) rooted in structural models and counterfactuals, and shows how it can resolve many of the problems that beset standard but-for analyses. It argues that legal concepts of causation are closely related to everyday causal reasoning, and both are tailored to the practical concerns of responsibility attribution. Causal models are also critical when people evaluate evidence, both in terms of the stories they tell to make sense of evidence, and the methods they use to assess its credibility and reliability.

Keywords: causation, counterfactual, legal reasoning, attribution, moral, causal model

Introduction

What or who caused a certain event to occur is essentially a practical question of fact which can best be answered by ordinary common sense rather than by abstract metaphysical theory.

Lord Salmon, Alphacell Ltd v. Woodward [1972] A.C. 824, 847

A young child is admitted to a hospital suffering from croup. He is placed under the care of a doctor. The child is settled at first, but then has two episodes of breathing difficulties. The nurse calls the doctor, but she is at a clinic and does not attend. The child recovers from these episodes, but soon after his breathing is severely blocked, and he suffers a cardiac arrest. The child subsequently dies. The child's mother brings a case against the doctor. Medical experts claim that had the child been intubated during these episodes, his death could have been avoided.1


We assign responsibility to someone rapidly, often in a matter of seconds, in a process that is internal to us and largely automatic. It seems natural to blame the doctor for the child's death—if she had attended to the child, she could have saved him. Assigning legal responsibility takes longer, often many months. It is an external process with several explicit stages: charging someone, taking her to court, an investigation, and subsequent trial. Crucial to both processes is the construction of causal models of what happened: who did what, how, and why. These models include assumptions about people's actions, beliefs, and motivations, about what actually happened and also what would have happened had people acted differently. They also encapsulate assumptions about what should have happened: what actions a reasonable person would have taken.

As given, the story about the child's death is underspecified. We do not know exactly what would have happened if the doctor had intubated (p. 566) the child; perhaps the child would have died anyway. Nevertheless, we still blame the doctor for not attending when called, and indeed the legal finding was that she breached her duty of care to the child. But the question of whether the doctor caused the child's death is still unclear. We do not yet know what would have happened if the doctor had attended—would she have intubated? In court the doctor claimed that even if she had attended, she would not have intubated. That was her practice, and indeed some medical experts supported this decision (although they were in a minority). Now we might be less sure that the doctor caused the death. After all, had she attended and not intubated the child, she would have been following an acceptable line of medical practice, but the child would still have died. Thus her failure to attend to the child made no difference to what actually happened.

This scenario illustrates the dependence of both legal and moral reasoning on causal understanding and a relatively sophisticated use of counterfactual reasoning. Indeed, the notion of causation is embedded in legal doctrine, and also implicit in our moral reasoning. But it is not clear exactly what this notion of causation amounts to, how it relates to scientific or everyday conceptions of causality, and how it underpins our legal and moral decisions. Our opening quote from a judge, which is representative of legal opinion, states that the notion of causation used in the law corresponds to our common-sense notion of causality. Moreover, it is also argued (Moore, 2009, 2012) that this is how it should be, because ultimately legal decisions should fit with our moral judgments, and the latter are themselves based on common-sense principles.

But what exactly is our common-sense notion of causation, and how does it underpin our everyday moral and legal judgments? This chapter will explore these questions, building on recent work in philosophy, psychology, and cognitive science that develops a rich picture of how people construct and reason with causal models (Halpern & Pearl, 2005; Pearl, 2000; Sloman, 2005; Sloman & Lagnado, 2015; see also other chapters in this volume).



Philosophical Theories of Causation

Before exploring the role of causation in legal and moral reasoning, it is useful to highlight some key distinctions from the philosophical literature (for more details see Danks, Chapter 12 in this volume).

First, we must distinguish between general and singular causal claims. The former involve claims about causal laws or propensities: that exposure to asbestos causes lung cancer; that reckless driving causes accidents; that poisoning causes death. The latter involve claims about a specific event or state of affairs: that Jim contracted lung cancer due to asbestos exposure; that Joe's speeding on this occasion caused the accident; that Jane died from arsenic in her tea. In legal or moral contexts, the focus is often on singular claims: we want to know the specifics of what happened and blame or praise people accordingly. But general claims also play a crucial role, encapsulating our knowledge of what makes people (and things) tick, and helping us to infer what happened on particular occasions.

Second, philosophical theories of causation divide into two main camps (Paul & Hall, 2013):

1. Dependency accounts: where causation is defined in terms of whether causes "make a difference" to their effects. For singular events, this is often cashed out in terms of counterfactual dependency between events (Lewis, 1986). For example, the arsenic in Jane's tea caused her death because it made the crucial difference; without the arsenic she would not have died.

2. Process or production accounts: where causation is defined in terms of a physical process that transfers energy or momentum from causes to effects (Dowe, 2000). For example, there is a complex physical process from the drinking of arsenic to Jane's eventual death from poisoning.

While these two approaches are not exclusive—indeed, both apply in the case of Jane's death—they differ on some critical cases, such as their treatment of omissions. An omission, for instance the doctor's failure to intubate, is a perfectly legitimate cause of the child's death according to the difference-making account, whereas on the process view omissions are ruled out because they do not involve a physical process from the omission to the putative effect. What is the physical process that connects the doctor's failure to intubate to the child's death? On the other hand, difference-making accounts notoriously struggle with a different class of cases, such as overdetermination or pre-emption (for examples, see later discussion in this chapter), which seem readily captured in terms of process accounts. Recent research in philosophy and cognitive psychology focuses mainly on counterfactual theories and difference making, but we shall see that legal causation seems to draw on aspects of both dependency and process.

Page 3 of 64

Causation in Legal and Moral Reasoning event, where there might be various causes or contributing causes for any specific effect. Thus, the discarded cigarette, the dry wood, and the presence of oxygen in the air are all causes of the subsequent fire in the shed. In contrast, causal selection involves picking out one (or several) of these as “the” cause, and relegating the others to mere back­ ground conditions. For example, the dry wood and the presence of oxygen will typically be seen as mere conditions, with the discarded cigarette selected as “the” cause of the fire. But this process of selection depends on contextual or pragmatic issues. In the con­ text of a shed that is usually damp, the dryness of the wood might also be singled out. This highlights another distinction commonly made in the literature on singular causal judgments, between normal versus abnormal events (Halpern & Hitchcock, 2014; Hart & Honore, 1983; Hilton, Chapter 32 in this volume). Actions or events that are atypical or abnormal relative to the commonplace are often singled out as causes. Whether this re­ flects a deep feature of causal claims themselves, rather than a pragmatic feature of how we use them, is a controversial question (Blanchard & Schaffer, 2016). But the focus on transgressions from normality is clearly very germane to legal and moral issues, and seems to play a correspondingly large role in human judgment (Kahneman & Miller, 1986; Knobe, 2009).
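Returning to the omission case above, the contrast between the two camps can be made concrete. Under a difference-making analysis, the doctor's failure to intubate qualifies as a cause of the death because negating it changes the outcome, whereas a process account finds no physical chain to trace. A minimal sketch of the difference-making side, using an invented toy model of the croup case:

    def child_outcome(doctor_intubates):
        """Toy model of the croup case: the child dies unless intubated."""
        child_dies = not doctor_intubates
        return child_dies

    # Actual world: the doctor did not intubate, and the child died.
    actual = child_outcome(doctor_intubates=False)
    # Counterfactual world: negate the omission.
    counterfactual = child_outcome(doctor_intubates=True)

    # Difference-making verdict: the omission counts as a cause,
    # because the outcome flips when the omission is negated.
    print(actual and not counterfactual)  # True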

Legal Reasoning

Causation in the Law

Causal reasoning is ubiquitous in the law. This holds true in both criminal and civil proceedings. Thus, criminal offenses are analyzed in terms of two key elements: the defendant's action (actus reus) and the defendant's mental state at the time of the action (mens rea). For example, murder requires both that the defendant caused the death of the victim, and intended to kill or seriously harm the victim. Both elements invoke questions of causality. This is most straightforward with the actus reus element, which amounts to the explicit claim that the defendant's conduct caused the result in question. However, it is also implicit in the mens rea condition. The mental element, too, is assumed to play a causal role in the offense. For example, an intention is taken as a causal precursor to the action.2 Without the assumption that the intention has some causal efficacy, the rationale for establishing mens rea for murder would be undermined. Likewise, in civil proceedings, a central concern is to establish that the defendant's action or omission caused the harm suffered by the claimant.

Consequently, proof of causation is often a key issue in court, with the prosecution seeking to prove that the relevant causal link existed, and the defense typically opposing this claim. But even when the defense accepts that the defendant did, in fact, cause the result, they will often argue for countervailing causal factors that justify or excuse the defendant's actions: for example, a claim that the defendant acted in self-defence or under duress. In addition, causal claims infuse the network of hypotheses and evidence that surround the central issues of a crime. Thus, a defendant's motive, opportunity, and means are often causal pre-conditions of a crime, and evidence is offered to support these causal hypotheses. Moreover, evidence is itself a causal consequence of the pattern of events that make up a crime. Fingerprints left at the crime scene are caused by the perpetrator's presence; an eyewitness report is caused by their sighting of the defendant; a defendant's confession is caused by his feelings of guilt (or by the strong arm of the law).

Causality also pervades how we evaluate the process of decision-making and justice. The decisions made by juries and judges are determined by a multitude of causal factors, and these admit of analysis, especially when a trial is appealed. For example, a key question in an appeal is often whether the jury would have reached a different verdict if some aspect of the actual trial had been different—for instance, if a certain piece of evidence had been excluded, or if the judge had given different instructions. These questions often hinge on our causal understanding of how jurors think—and the answers to these questions can determine the result of the appeal. Finally, when a judge decides on an appropriate punishment, the causal impact of these sanctions needs to be considered—what would be the effect of imprisoning or imposing a fine on the guilty party? Here a prospective notion of causation is in play—anticipating the likely consequences of a particular sentence. In short, the law is shot through with causal claims and judgments, and crucial decisions are made based on the various decision-makers' understanding of causality. (p. 568)

Common-sense, Legal, and Philosophical Notions of Causation

Jurists3 often claim that the law relies on the same basic concepts of causation as those used in everyday thought (Hart & Honore, 1985). Indeed, juries are typically told to rely on their common-sense notion of causation. In complex situations, a judge might instruct the jury on some finer points, but by and large jurists maintain that legal notions of causation appeal to, and elucidate, our ordinary understanding of causation.4 In contrast, the relation between the legal notion and philosophical analyses of causation is openly debated. Some argue that the law operates with its own notion of causation, and metaphysical analyses are often irrelevant (Green, 2015; Hoffmann, 2011; Stapleton, 2008), while others argue for a general theory of causation, suitable for law, science, and metaphysics (Moore, 2009; Schaffer, 2010).

Often missing from this debate is a precise description of everyday causal thought, or any discussion of relevant psychological work, and how this relates to the philosophical theories or the legal accounts that assume it. In this chapter we argue that everyday causal reasoning is indeed closely allied to legal concepts of causation, but also that both are relatively well modeled (but not yet perfectly) by current philosophical theories. This is not to say that we have a single satisfactory theory that encompasses legal and everyday concerns, but recent progress suggests that convergence is possible. And irrespective of whether we attain a unified theory of causation, psychology and law have a lot to learn from each other. One theme that we will develop is that the way in which the law deals with causation, in the face of practical questions and needs, is often mirrored by our everyday causal reasoning.

Causation in Legal and Moral Reasoning with causation, in the face of practical questions and needs, is often mirrored by our everyday causal reasoning.

Legal Analyses of Causation The Shoot-out The TV series Breaking Bad raises many moral issues. Here we use one of its pivotal scenes to illustrate some issues for legal causation. To cut a long story short,5 Walt is a chemistry teacher with terminal cancer who is cooking and selling crystal meth with his ex-pupil Jesse. Attempting to widen their sales, they get mixed up with Tuco, a psycho­ pathic drug baron. Tuco is on the run from the police, and is holding Walt and Jesse cap­ tive at his father’s isolated shack. Armed with a machine gun and a handgun, he takes them both outside, and starts beating Jesse. In the struggle, Jesse manages to grab Tuco’s handgun, and shoots him in the stomach. Tuco lies dying, and Walt and Jesse leave him for dead. But Tuco gets up and staggers toward Jesse’s car, where they have left his ma­ chine gun. Suddenly, Hank, a federal drug agent (also Walt’s brother-in-law) turns up. He shouts at Tuco to stop and raise his hands. Instead, Tuco seizes the machine gun, and fires repeatedly at Hank. Hank hides behind a car and shoots back. After a battery of fir­ ing from both sides, Tuco stops to reload, and Hank shoots Tuco dead. Who caused Tuco’s death? The answer seems straightforward: it was Hank.6 However, even though common sense and the law might agree, it is not trivial to give a principled rationale for this an­ swer. Indeed, as we show next, the standard but-for test used in legal contexts cannot de­ liver this simple answer.

Factual and Legal Causation

Legal analyses of causation operate with two notions—factual and legal (or proximate) causation. These are supposed to work in two stages: initially, the factual causes in a case are identified; then one (or several) of these is selected as the legal cause. The separability of these steps has been contested, both on theoretical grounds (Green, 2015; Tadros, 2005) and in terms of the actual practice of trial judges (Hoffmann, 2011), but the conceptual distinction is standard in legal texts.

Factual causation is assumed to correspond to what actually happened in the case, irrespective of any evaluation or legal judgment.7 Our knowledge of this depends on the details of the case, the evidence and arguments presented, and our everyday assumptions about how people and things work, sometimes supplemented by expert knowledge (e.g., in medical or scientific contexts). The standard test for causation is the but-for test: the defendant's action is a factual cause of the result if, but for the defendant's action, the result would not have occurred (Herring, 2010, p. 90).

The but-for test is appealing in its simplicity, and it has a strong philosophical pedigree in counterfactual theories of causation (Lewis, 1986). In many cases it delivers a clear-cut judgment: when the defendant shoots his victim, it is usually clear that had he not shot, the victim would not have died. However, it suffers from various problems, both as a theoretical and practical principle. These problems will be discussed in the following.

Legal causation lacks a crisp definition or test, and is often seen as the juncture where legal and policy issues are introduced. In UK law, a legal (p. 569) cause is defined as "an operating and substantial cause" (e.g., Herring, 2010, p. 91). The question of what counts as a substantial cause is open-ended, but it aims to rule out but-for factors that are only remotely linked to the result in question. In many cases it is intuitively obvious whether or not a factor is substantial, but there will be tricky cases where the status of a candidate cause is unclear. Without a precise definition, the decision will rest on the judge's interpretation, and thus allow for non-causal factors to influence the judgment. The notion of an operating cause also lacks a precise definition or rule for application. It is often invoked when the defendant's action is "interrupted" by another person's action or an act of nature. For example, consider the case where the defendant and his two friends beat up the victim, but the defendant then left the scene, after which his friends drowned the victim.8 The defendant was ruled not to have caused the death, because he was not an operating cause: the subsequent actions of his friends constituted a novel intervention that broke the chain of causation between his actions and the death. Here again, the lack of a definitive test permits leeway in the interpretation of an operating cause, and can allow other non-causal factors to influence the final judgment. Both of these concepts will be illustrated further in the following examples.

So far we have looked mainly at the legal treatment of actions (actus reus), but mental states (mens rea) can also play a key role in judgments of legal causation. The requirement of an intention in serious offenses like murder is a straightforward example. But a defendant's mental states are also relevant with respect to whether or not they foresaw the adverse result of their actions. For example, consider a defendant who sets light to an adversary's house, intending to frighten them, but a child dies in the fire. Despite not intending to kill the child, the defendant can be convicted of murder because he could have foreseen that someone might be killed.9 Note how issues of foreseeability are crucial for purposes of legal judgment, but are not clearly tied to causation as normally understood on a scientific view. Why should someone's knowledge or expectations affect the extent to which an action is judged to have caused an outcome? However, the influence of mental factors such as foreseeability does seem to tie in with our everyday conception of causation, as we shall show in the following (e.g., see Knobe, 2009; Lagnado & Channon, 2008).

Causation in Legal and Moral Reasoning the victim would not have died. However, it suffers from various problems, both as a theo­ retical and practical principle. These problems will be discussed in the following. Legal causation lacks a crisp definition or test, and is often seen as the juncture where le­ gal and policy issues are introduced. In UK law, a legal (p. 569) cause is defined as “an op­ erating and substantial cause” (e.g., Herring, 2010, p. 91). The question of what counts as a substantial cause is open-ended, but it aims to rule out but-for factors that are only re­ motely linked to the result in question. In many cases it is intuitively obvious whether or not a factor is substantial, but there will be tricky cases where the status of a candidate cause is unclear. Without a precise definition, the decision will rest on the judge’s inter­ pretation, and thus allow for non-causal factors to influence the judgment. The notion of an operating cause also lacks a precise definition or rule for application. It is often in­ voked when the defendant’s action is “interrupted” by another person’s action or an act of nature. For example, consider the case where the defendant and his two friends beat up the victim, but the defendant then left the scene, after which his friends drowned the victim.8 The defendant was ruled not to have caused the death, because he was not an op­ erating cause: the subsequent actions of his friends constituted a novel intervention that broke the chain of causation between his actions and the death. Here again, the lack of a definitive test permits leeway in the interpretation of an operating cause, and can allow other non-causal factors to influence the final judgment. Both of these concepts will be il­ lustrated further in the following examples. So far we have looked mainly at the legal treatment of actions (actus reus), but mental states (mens rea) can also play a key role in judgments of legal causation. The require­ ment of an intention in serious offenses like murder is a straightforward example. But a defendant’s mental states are also relevant with respect to whether or not they foresaw the adverse result of their actions. For example, consider a defendant who sets light to an adversary’s house, intending to frighten them, but a child dies in the fire. Despite not in­ tending to kill the child, the defendant can be convicted of murder because he could have foreseen that someone might be killed.9 Note how issues of foreseeability are crucial for purpose of legal judgments, but are not clearly tied to causation as normally understood on a scientific view. Why should someone’s knowledge or expectations affect the extent to which an action is judged to have caused an outcome? However, the influence of mental factors such as foreseeability does seem to tie in with our everyday conception of causa­ tion, as we shall show in the following (e.g., see Knobe, 2009; Lagnado & Channon, 2008).

Problems with the But-For Test

Despite its central role in causal judgments in the law, the but-for test suffers from various well-documented problems: it can be imprecise and difficult to prove; it is over-inclusive in what it counts as a cause, but in certain cases it is also too restrictive, ruling out genuine causes. We will discuss these problems in turn, with the ultimate aim of showing how they can be resolved by an extended account of the but-for concept (cf. Halpern & Pearl, 2005; Stapleton, 2008).


The Problem of Imprecision and Proof

One difficulty is that in certain contexts the but-for test is imprecise and thus hard to prove (Moore, 2009). The but-for is essentially a comparative test: one compares the actual world, in which the defendant acted and the result occurred, with a hypothetical (counterfactual) world in which the defendant did not act, and asks whether or not the result would still have occurred. But this leaves unspecified exactly what takes the place of the defendant’s action. Sometimes the contrast case is obvious. For instance, when considering what would have happened if the defendant had not shot the victim, one imagines a world where he did not shoot. One does not consider a world in which he tries to kill the victim in some other way (cf. Schaffer, 2010).

However, sometimes the appropriate counterfactual supposition is less clear. For example, in the preceding medical negligence example, one key question was what would have happened if the doctor had not breached her duty of care, and had attended the child when called. Would the child have survived? The answer to this question depends both on (a) what the doctor would have done if she had attended, and (b) what would have happened as a consequence of this action. Both of these issues are uncertain, and thus require evidence and argument. In this case it was agreed, based on medical opinion, that the child would have had a greater chance of survival if the doctor had intubated. Therefore, if one imagines “doctor intubates” as the contrast case, then the but-for test would rule the doctor’s negligence as a cause of the death. However, the doctor argued that even if she had attended, she would not have intubated. On this counterfactual supposition, the child would still have died, and therefore by the but-for test the doctor did not cause the death. The court accepted this argument, and ruled that the doctor’s failure to attend, despite being a breach of her duty of care, did not cause the child’s death.10 The legal question thus shifted to whether the doctor’s practice of not intubating was itself reasonable.

Although in this case the two components (a) and (b) of the but-for test were resolved to the court’s satisfaction, in other situations these issues might be hard to establish. It might be difficult to agree on what the defendant would have done; and, even if this were established, to agree on what would have happened contingent on this action. Thus, although in this case the medical experts agreed that had the doctor intubated, the child would probably have survived, in other medical cases this might not be so clear (and might even go beyond current medical knowledge).

In sum, despite its apparent simplicity, the but-for test requires various cognitive operations when applied to real cases. The fact-finder needs to select the appropriate contrast, and to judge how this counterfactual world would have unfolded. These demands are not trivial: the relevant counterfactual claims might be hard to prove or to support with substantial evidence. Nevertheless, this problem is part and parcel of legal inquiry. Legal cases are often hard to decide, and the but-for test, properly analyzed, clarifies what needs to be shown for proof of causation. It seems appropriate that different sides to the dispute might argue for different contrasts, and even make different claims about what would have happened in the relevant counterfactual worlds.

The Problem of Promiscuity

Another problem with the but-for test is that it generates too many causes. Thus in most cases there will be innumerable but-for causes of a specific result. For example, when a defendant shoots his victim, there are all kinds of factors but for which the victim’s death would not have occurred: if the defendant’s parents hadn’t met, if he hadn’t been born, if he hadn’t moved to the city, if he hadn’t been introduced to the victim, and so on. Most of these factors are clearly irrelevant to the legal issue in question. But the but-for test by itself is too coarse a tool to demarcate the relevant from the irrelevant factors.

This is where the concept of legal causation is supposed to earn its keep, by pruning away those factors that are deemed irrelevant to the legal question at issue. The notion of legal cause—cashed out in terms of substantial or operating cause—seems to work well in clear-cut cases. It rules out factors that are clearly insignificant, such as distant or coincidental precursors of the defendant’s behavior, thus excluding his parents and other factors that were incidental to his behavior on this occasion, and it also limits the extent to which the defendant is deemed a cause of the more distant or coincidental consequences of his actions. But there will be hard cases, where the imprecision of the concept of legal cause means that questions of significance or remoteness require a judgment call by the fact-finder, rather than following explicitly from the causal definition itself.11

Pre-emption

Pre-emption occurs when an action or event brings about an outcome, but if this action had not occurred, an alternative or back-up action would have brought about that same result. The latter action is “pre-empted” by the former. For example, suppose the driver of a rented car fails to brake and injures a pedestrian. Unknown to the driver, the brakes were faulty—the car rental company had not checked or maintained the car’s brakes. So even if the driver had applied the brakes, they would not have worked, and the same injuries would have been incurred.12 Intuitively it is the negligent action (or inaction) of the driver that caused the harm, and not the faulty brakes. But sensible as this claim sounds, the but-for test delivers the wrong answer, because but for the driver’s action the same harm would still have occurred. Such examples are commonplace in legal and everyday settings, and challenge the but-for test as an adequate criterion. Once again, legal causation, in particular the notion of operative cause, needs to be invoked. The operative cause of the accident was the driver’s failure to use the brakes, not the faulty brakes. The brakes never got the chance to malfunction, and thus were not an operative cause of the accident.

In pre-emption cases, the pre-empted action either fails to occur, or acts but is beaten to the punch by the “actual” cause. Either way there is a clear causal path from the operative cause to the result, but not from the pre-empted action. This asymmetry makes such cases easier to deal with. Certainly our intuitions seem sharper, even if the notion of operative cause is still fuzzy. However, there are related cases in which one action activates a chain of events that would have led to a harmful result, but instead another action intervenes to cause the harm. For example, let us return to the earlier shoot-out example. Jesse shot Tuco and left him to die. The damage from Jesse’s shot was slowly killing Tuco, and Tuco would have died in a few hours. However, Hank intervened and shot Tuco dead immediately. Clearly Hank caused Tuco’s death. Jesse’s causal role in the death is less clear, but given the independent and unforeseeable nature of Hank’s intervention, our intuition (and the law?) is that Jesse did not cause Tuco’s death. The but-for test, however, appears to rule out both Hank and Jesse. If Hank had not shot Tuco, he would still have died from Jesse’s bullet. If Jesse had not shot Tuco, he would still have died from Hank’s shot. But it would be crazy to argue that neither man caused Tuco’s death. Someone definitely killed him!

The law offers one possible answer to this problem, stipulating that the but-for test individuates the result in terms of its timing and manner: “The test for factual causation requires the jury to consider whether, but for the defendant’s unlawful actions, the harm would have occurred at the same time and in the same way that it did” (Herring, 2010, p. 90). Thus Hank’s shot passes the but-for test, because the exact timing and manner of Tuco’s death would have been different had Hank not shot.13 What about Jesse’s shot? Whether or not it passes the but-for test depends on further details about the actual situation (and relevant counterfactuals). The key question is whether Tuco would have died at the same time and in the same way if Jesse had not shot. This is a tricky question. Perhaps Tuco would have died a few minutes later if he hadn’t already been wounded by Jesse’s shot. More complicated still, if Jesse had not shot Tuco at all, the course of events might have been very different—Tuco would not have been left to die, he might not have been shot by Hank, and so on. Here again our judgments also depend on what contrast case we use—what we substitute for Jesse’s shot, and how we play out the counterfactual world subsequent to that change.

The speculation about whether Jesse’s shot is a but-for cause of Tuco’s death can be curtailed if we move to the question of legal causation. Here we can argue that Hank’s shot was a substantial and operating cause of Tuco’s death, whereas Jesse’s shot was not. Hank’s action was an intervening cause—independent and voluntary—that broke the chain of causation between Jesse’s action and the death. Thus the law has the means to deal with these problem cases, by individuating the outcome at a suitably fine level of grain or by invoking the notion of legal causation (and operative cause). Both approaches have been used in actual legal cases (Herring, 2010, p. 90). Here again, the lack of a precise definition of legal causation allows cases to be dealt with in a flexible manner. This is a practical bonus, but it opens the door for non-causal factors to influence judgment, and can lead to inconsistency and controversy across legal rulings.


Overdetermination

The textbook case of overdetermination is when two people (A and B) independently and simultaneously shoot the victim, and either shot alone was sufficient to kill the victim. On the but-for test, neither shooter is a cause of the victim’s death, because if A had not shot, the victim would still have died from B’s shot, and the same is true for B. But it is counterintuitive to conclude that neither shooter caused the death. What makes this different from pre-emption cases is that each shooter does exactly the same thing, and we want both to be judged as causes of the death.

A more complex example of overdetermination is as follows: “A company produced a leather-spray to be used by consumers on their leather clothing. The company discovered that the spray was extremely toxic for certain elderly people and others with respiratory conditions. The relevant group of executives voted unanimously to market the product (the voting rule required only a majority of votes). Subsequently the product killed a number of consumers” (Stapleton, 2013, p. 43).14 Each of the executives was prosecuted separately as a cause of the deaths. In their defense, each member argued that his individual vote was not a cause of the deaths, so he should not be held responsible. The court rejected this argument, and each executive was held legally responsible for the deaths incurred. Here again, although everyday judgments and the law converge on the same answer (Gerstenberg, Halpern, & Tenenbaum, 2015), the but-for test rules that none of the executives is a cause, because for each member it is true that the motion would still have passed even if he had voted against it. Note that this latter example cannot be dealt with by describing the outcome in more fine-grained detail. The timing and manner of the harmful outcomes of the company’s action depend only on whether or not the motion was passed, and not on the exact majority. So a fine-grained but-for test still excludes any executive as a cause of the harm.
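To make the failure concrete, here is a minimal sketch (ours, not part of the legal literature; all names are illustrative) of the standard but-for test applied to the two-shooter case:

    # Standard but-for test on the two-shooter overdetermination case.
    # Illustrative sketch: either shot alone is sufficient for death.

    def dies(shot_a, shot_b):
        return shot_a or shot_b

    def but_for(outcome, candidate, world):
        # candidate is a but-for cause iff removing it undoes the outcome
        actual = outcome(**world)
        counterfactual = outcome(**{**world, candidate: False})
        return actual and not counterfactual

    world = {"shot_a": True, "shot_b": True}
    print(but_for(dies, "shot_a", world))  # False
    print(but_for(dies, "shot_b", world))  # False

The test returns False for both shooters, contrary to the intuitive (and legal) verdict that each caused the death.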

Extensions to the But-For Analysis

The problems of pre-emption and overdetermination are well known both in philosophy and law, and are often seen as fatal to counterfactual accounts of causation (Paul & Hall, 2013). However, two recent approaches to causation, developed independently in law (Stapleton, 2008, 2009) and in computer science (Chockler & Halpern, 2004; Halpern & Pearl, 2005), offer very similar solutions to these problems.

Involvement and Contribution

Stapleton (2008, 2009) argues against a general-purpose concept of causation on the grounds that causal claims depend on the nature of the interrogation, and that different questions can demand and yield different answers.15 She identifies a specific notion of causation that fits the wide-ranging purposes of legal inquiry—what she terms “involvement” or “contribution.” At the heart of her account is an extension of the but-for analysis that allows the counterfactual test to be computed over a wider range of contrasts, thus ruling actions (and omissions) that cause or contribute to an outcome as genuine causes, even if their contribution is neither necessary nor sufficient for the outcome.

Stapleton identifies three forms of “involvement” central to legal inquiries. First, a factor is involved in an outcome if it satisfies the standard but-for test and thus is a necessary condition. One compares the actual world—in which the factor and the outcome both occurred—with a hypothetical world in which the factor is removed. If the outcome would no longer have occurred, then the factor is deemed to be involved in the outcome. Second, a factor is also involved in an outcome if it satisfies an amended but-for test where one compares the actual world with a hypothetical world in which the factor is removed along with any other factor that “duplicates” the outcome in question. If the outcome would not have occurred in this hypothetical world, then the factor is judged to be involved in the outcome. For example, when two hunters (A & B) independently and simultaneously shoot a hiker (overdetermination), to assess whether hunter A is involved in the outcome, one imagines a world where neither hunter shoots. In this hypothetical world the hiker would not have died, so hunter A is involved in the death; a similar argument rules in hunter B, too. The third form of involvement is the relation of contribution. This involves two steps: (1) transform the actual world by removing any factors that are not needed for the result still to occur; (2) compare this world to the hypothetical world where the target factor is also removed. If the result would not have occurred in this latter world, then the target factor contributes to the result.

For example, consider a slight variation of the previous voting example (Stapleton, 2008, 2009). Suppose that the vote is 9–0 in favor of marketing the product, and a majority of only 6 is required to pass the motion. Take one particular voter (Bob). Did Bob contribute to the harm? First, imagine a world where the motion still passes, but all excess factors are removed, for example by removing three voters, such that the vote is 6–0. Second, establish whether, but for Bob’s vote, the motion would have passed. If the answer is no, then Bob contributed to the result. The same argument can be applied to each voter. Therefore, on Stapleton’s account, each voter contributes to the motion being passed, and thus to the subsequent harm.

Essentially, Stapleton’s account generalizes the but-for test to allow for comparisons with a broader range of hypothetical worlds, and thus avoids problems of overdetermination. Her account leaves various issues unresolved, for instance, how we specify the exact nature of the hypothetical worlds, how we establish what happens in these worlds, and how we decide which factors to remove to establish contribution. She does refer to the necessary element of a sufficient set (NESS) test as a formal test for involvement (Wright, 1988), but it is unclear that this test can deliver the needed judgments, and it is problematic for other reasons (Fumerton & Kress, 2001). Nevertheless, her proposals are a step in the right direction, and have been adopted in some legal rulings.
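The two-step contribution test can be made concrete with a small sketch of the 9–0 vote (our illustrative encoding of Stapleton’s proposal, with hypothetical names):

    # Contribution test on the 9-0 vote, where a majority of 6 is needed.

    MAJORITY = 6

    def motion_passes(votes):
        return sum(votes.values()) >= MAJORITY

    votes = {f"voter_{i}": 1 for i in range(9)}  # the actual 9-0 vote

    # Step 1: remove excess factors while the motion still passes,
    # leaving a bare 6-0 majority.
    pared = {**votes, "voter_6": 0, "voter_7": 0, "voter_8": 0}

    # Step 2: run the standard but-for test in the pared-down world.
    without_bob = {**pared, "voter_0": 0}
    print(motion_passes(pared) and not motion_passes(without_bob))  # True

By symmetry the same computation rules in every voter, so each contributes to the motion being passed.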

Structural Model Approach

Recent work in philosophy and computer science (Chockler & Halpern, 2004; Halpern & Pearl, 2005) has spawned a novel approach to these problems, which bears notable parallels to Stapleton’s proposals, albeit couched in a more formal framework. Like Stapleton, the starting point for the structural model is the but-for test, where causation depends on a counterfactual relation between putative cause and effect. However, on the structural approach this counterfactual relation is explored in the context of a specific causal model, defined in terms of a set of variables (corresponding to the events of interest) and a set of structural equations (which capture the causal relations between variables). A counterfactual is cashed out in terms of interventions on the causal model, and obeys a specific logic that allows one to update the model and derive the consequences of this intervention (see also Danks, Chapter 12 in this volume). Thus, in a straightforward case where a single hunter (A) shot a hiker (E), A is deemed a cause of E if an intervention that had stopped A would have undone E. This corresponds to the but-for test and (1) in Stapleton’s account. Overdetermination cases are dealt with by allowing the counterfactual query to be extended to include additional interventions. For instance, in the overdetermination case where hunter A and hunter B both shoot E, one considers a but-for test for each hunter conditional on an intervention in which the other hunter is stopped from shooting. On this extended test, both hunters are ruled in as causes of the hiker’s death. This corresponds to (2) in Stapleton’s account. Note that this also captures the notion of contribution (3), because it allows for multiple counterfactual interventions—for instance, in the company voting example, intervening by removing the votes of three executives would make an individual member’s vote a but-for cause of the motion being passed. Finally, the structural model approach has been extended to include degrees of causal responsibility (Chockler & Halpern, 2004). This allows us to assess how far a factor is from being a but-for cause, by counting the number of changes (interventions) required to make the outcome counterfactually dependent on that factor (see later discussion for more details). This takes us further than Stapleton’s account, because we can distinguish situations where the vote is 7–0 rather than 9–0, with each voter receiving a higher degree of causal responsibility in the former case, because each is closer to being a but-for cause (see Gerstenberg et al., 2015; Lagnado et al., 2013).
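On our reading of Chockler and Halpern (2004), this graded measure assigns a factor responsibility 1/(k + 1), where k is the minimal number of changes needed to make the outcome counterfactually dependent on that factor. A sketch for the voting scenario (illustrative names, not the original formalism):

    # Degree of responsibility for a single voter, on the 1/(k+1) gloss
    # of Chockler & Halpern (2004). With n_for votes cast in favor and a
    # required majority, k = n_for - majority other votes must be flipped
    # before one voter's own vote becomes pivotal.

    def responsibility(n_for, majority):
        k = n_for - majority
        return 1.0 / (k + 1)

    print(responsibility(n_for=7, majority=6))  # 0.5  (one change needed)
    print(responsibility(n_for=9, majority=6))  # 0.25 (three changes needed)

Each voter thus receives greater responsibility in the 7–0 case than in the 9–0 case, matching the ordering described above.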



Figure 29.1 Causal diagram of the shoot-out. Nodes correspond to binary variables, and arrows represent direct causal relations. The arrow with the minus sign represents a preventive relation from bullets in head (from Hank’s shots) to the excessive bleeding from stomach. Green variables are those that are true; red variables are false.

By and large, there is a neat mapping from Stapleton’s notion of contribution to the structural model approach. This suggests that a unified framework for causation, applying to both legal and everyday causation, is possible. There are, however, various issues still to be resolved by the structural model approach. One important question is whether the structural account can adequately capture the notion of an active causal process (see Gerstenberg et al., 2015, and Gerstenberg & Tenenbaum, Chapter 27 in this volume), which seems closely related to the legal notion of operative cause. A hint of how this might be achieved can be seen in how the structural model handles difficult pre-emption cases such as the shoot-out example. The problem is that a but-for test rules out both pre-empted and non-pre-empted causes—for example, neither Jesse nor Hank are but-for causes of Tuco’s death. To deal with this, the structural approach introduces additional variables into the causal model, effectively capturing the notion of an active causal pathway from one cause that pre-empts other potential causes. For example, in the case of Tuco’s death, a fine-grained causal model represents the separate damage caused by Jesse’s and Hank’s gunshots, and the fact that the damage caused by Jesse’s bullet is overridden by the damage caused by Hank’s bullet (see Figure 29.1). There is an active causal path (operating cause) from Hank’s gunshot to Tuco’s death that pre-empts (breaks the chain of causation from) Jesse’s shot to Tuco’s death. The extent to which structural accounts (or difference-making approaches more generally) can adequately model such cases is still an open question. Crucial here is whether such approaches can do full justice to our intuitive sense that there is a process connecting cause and effect.

Another key set of questions, yet to be fully incorporated into a structural model approach, is the role of mental states such as intentions and foreseeability. Chockler and Halpern (2004) extend the account to include an agent’s uncertainty about the outcomes of their actions (formalizing this in terms of a notion of blame), but there is not yet an extension that takes intentionality into account. Modeling an agent’s internal mental states, including their beliefs and intentions, should prove a fruitful avenue for further investigation.
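Before turning to the psychological evidence, here is a minimal sketch of the fine-grained shoot-out model of Figure 29.1, written as structural equations (our illustrative encoding):

    # Fine-grained shoot-out model: Hank's bullets to the head block the
    # slower fatal-bleeding process from Jesse's shot (the minus arrow).

    def tuco_dies(jesse_shoots, hank_shoots):
        bullets_in_head = hank_shoots
        fatal_bleeding = jesse_shoots and not bullets_in_head
        return bullets_in_head or fatal_bleeding

    print(tuco_dies(True, True))    # True: the actual world
    print(tuco_dies(True, False))   # True: without Hank, Jesse's path kills
    print(tuco_dies(False, False))  # False

    # Extended test: holding fatal_bleeding at its actual value (False),
    # death counterfactually depends on Hank's shot but not on Jesse's,
    # capturing the active path from Hank's gunshot to the death.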

Psychological Research

We have outlined the notion of causation as applied in the law, and argued that despite various problems and complexities, a coherent picture emerges with causal judgments often assessable in terms of counterfactual analyses.16 We have also seen that a common claim is that the notion of causation in law corresponds to our everyday notion of causation. However, legal theorists usually support this claim only on the basis of intuitions, rather than empirical research. Does it still hold up when we look to empirical studies of causal reasoning in legal contexts?

In the following sections we provide a selective but, we hope, representative look at the psychological research. We believe that this research supports the claimed similarity between legal and psychological conceptions of causation—and also suggests that both legal conceptions and everyday notions of causation serve similar overarching functions and draw on similar abstract conceptions. The details still need to be worked out, but we hold that convergence is not just a claim but also a goal—we should aim for a conception of causation that fits both our everyday understanding and its usage in law (cf. Moore, 2009).

Legal Inquiry

Legal inquiry can be divided into three distinct but interrelated phases:

1. Explanatory: What happened?
2. Evidential: What is the evidential support/proof?
3. Attributive: Who or what is responsible?

All three phases are geared toward the common aim of identifying whether a crime or transgression occurred, apprehending the perpetrators and building an evidential case against them, and deciding guilt or liability on the basis of the evidence. Causal reasoning is involved in all three phases—constructing a causal story or explanation of what happened, using evidence to support this story, and attributing responsibility based on a causal understanding of how and why the guilty parties did what they did.


Given its overarching goals of maintaining justice, legal inquiry has several distinctive features that separate it from a typical scientific inquiry. To start with, the law is concerned with transgressions—disruptions to the normal course of events that violate societal rules and demand correction or punishment. This concern sets the framework for inquiry, and also determines the nature and level of explanation that is sought—predominantly causal explanations about human actions, intentions, beliefs, and desires—explanations that can justify assigning responsibility and blame. The law seeks to identify, punish, and prevent legal violations, and its conception and uses of causal reasoning are geared directly to these aims. This marks a substantial difference from the goals of scientific inquiry, but perhaps not from our everyday concerns and inquiries. Indeed, legal and investigative reasoning seems to provide a more apt metaphor for everyday social reasoning (Fincham & Jaspers, 1980; Tetlock et al., 2007) than the scientific one; consider, for example, Fincham’s metaphor of people as “intuitive lawyers” (see Alicke et al., 2016, for discussion of various metaphors in causal attribution research). This holds both with respect to the practical role of causal concepts, and the close interrelation between judgments of causality and responsibility.

Explanatory and Evidential Reasoning

A primary goal for legal inquiry is to figure out what happened: Was a crime committed, who was the perpetrator, and why did they do it? To achieve this goal, the fact-finders (judge or jury) must use the evidence presented to them about the specific case, in tandem with their general common-sense knowledge about how the world works, especially their knowledge and assumptions about human behavior. An equally important goal is for the fact-finders to assess how well the evidence and arguments given by the prosecution and defense teams support their respective claims. For example, is there sufficient evidence to uphold a charge of murder, so that you are sure (beyond reasonable doubt)? Thus the fact-finder has two interlocking tasks—to figure out the best version of what happened, often choosing between competing stories offered by prosecution and defense, and to assess how well the evidence supports either story. Both are required before the fact-finder can decide on guilt.

In serious cases, where a case is tried by a jury, the ultimate fact-finders are ordinary members of the public, typically untrained in law or evidential reasoning. They are explicitly asked to use their common-sense understanding of the physical and social world, along with the evidence, to make their decisions. Most psychological research has focused on laypeople (or students) as their subjects of study—how does a layperson reach a decision based on the evidence and arguments presented in court? The dominant answer to this question is given by the story model of juror decision-making (Pennington & Hastie, 1986, 1988, 1992).

The Story Model

According to the story model, jurors construct narratives to organize and interpret the mass of evidence presented in court. These stories draw on causal knowledge and assumptions, including scripts about how people typically think and behave. This knowledge is combined with case-specific information to construct causal “situation” models of what happened, typically based around human agency and social interactions. Jurors select the best story—one that explains the evidence, fits with their ideas about stereotypical stories, and satisfies various criteria such as coherence, plausibility, and completeness. This story is then matched against the possible verdict categories to yield the juror’s pre-deliberation decision.

Story Structures

One of the key claims of the story model is that people develop rich narrative-based explanations of the evidence. This goes beyond simple evidence-integration accounts (e.g., Hogarth & Einhorn, 1992) in which people compute a weighted sum of the evidence for or against the crime hypothesis. Instead, jurors are assumed to construct a story that makes sense of the evidence and supports a verdict in a more holistic fashion. These narrative structures are usually based around the actions of human protagonists, and are generated from abstract templates known as episode schemas. These schemas represent event sequences that occur in real-world contexts as well as fictional stories, and can be used iteratively to produce complex actions and narratives (Bennett & Feldman, 1981; Schank & Abelson, 1977). An archetypal episode schema is depicted in Figure 29.2. This episode schema is centered on the thoughts and actions of a human protagonist. At the top level are a set of initiating events and background physical states; these events cause specific psychological states in the protagonist (e.g., particular beliefs, desires, and emotions), and lead him or her to formulate goals and intentions, which, in turn, motivate subsequent actions; these actions, in combination with other physical states, generate consequences. This schema can be embedded in a larger episode, and a story structure is often constructed from multiple embedded episodes. We will give a concrete illustration of the schema in the following (see Figure 29.3).

Figure 29.2 An abstract episode schema. Adapted from Pennington & Hastie (1986).
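As a minimal illustration (ours, not Pennington and Hastie’s notation), the schema in Figure 29.2 can be rendered as a nested data structure, mirroring the claim that episodes embed within larger episodes:

    # Episode schema as a nested data structure (illustrative only).

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Episode:
        initiating_events: List[str]
        physical_states: List[str]
        psychological_states: List[str]  # beliefs, desires, emotions
        goals: List[str]
        actions: List[str]
        consequences: List[str]
        sub_episodes: List["Episode"] = field(default_factory=list)

    quarrel = Episode(
        initiating_events=["victim threatens protagonist"],
        physical_states=["both are in the bar"],
        psychological_states=["protagonist is angry"],
        goals=["protagonist intends to confront victim"],
        actions=["protagonist returns to the bar"],
        consequences=["a fight breaks out"],
    )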

Pennington and Hastie used a variety of materials and methods to test the story model (Pennington & Hastie, 1986, 1988, 1992). These included simulated videos of real trials, interviews, and think-aloud protocols for eliciting people’s mental representations and reasoning processes. To give a flavor of these studies, and some of their key findings, let us illustrate with the study in Pennington and Hastie (1986).

Participants were sampled from a jury pool and watched a three-hour video of a simulated criminal trial, based on a real American case: Commonwealth of Massachusetts v. Johnson. The defendant, Frank Johnson, was charged with killing Alan Cardwell with “deliberate premeditation and malice aforethought.” In the trial, both prosecution and defense accepted that Johnson and Cardwell had argued in their local bar on the day of the incident, and that Cardwell had threatened Johnson with a razor. Later that evening, Johnson returned to the bar. He went outside with Cardwell, and they fought, leading to Johnson stabbing Cardwell with a fishing knife. Cardwell died from the wound. The key facts under dispute included the following: whether or not Johnson intentionally returned home to get his knife; whether Johnson returned to the bar specifically to find Cardwell; whether Cardwell drew out his razor during the fight; and whether Johnson actively stabbed Cardwell or held out his knife in self-defense.

After viewing the trial, participants had to decide between four possible verdicts: not guilty, manslaughter, second-degree murder, and first-degree murder (these categories were explained in the judge’s instructions at the end of the trial). Crucially, participants were asked to think aloud as they considered the case and made their individual decisions. These think-aloud protocols were transcribed and analyzed in terms of content (e.g., story comments versus verdict comments). Story content was encoded into graphs both at the individual level and at a group level classified by verdict.17

Figure 29.3 Central story structure for first-degree murder verdict. Adapted from Pennington & Hastie (1986).

Three key empirical findings emerged from these analyses: that people used story structures steeped in causal claims (indeed, 85% of events described in their protocols were causally linked); that people drew numerous inferences beyond the given evidence (only 55% of protocols referred to events in actual testimony; 45% referred to inferred events such as mental states, goals, and actions); and that people constructed different stories from the same evidence, with these differences reflected by correspondingly different verdicts.18 For example, participants who gave a “first-degree murder” verdict tended to provide a story that elaborated on the events prior to the stabbing, emphasizing Johnson’s anger or humiliation, and his intention to confront and kill Cardwell (see Figure 29.3). In contrast, those who gave a “not guilty” verdict focused on the altercation itself, spelling out details of how Johnson acted in self-defense. In this story the stabbing was portrayed as a consequence (of Cardwell’s behavior) rather than a goal-directed action initiated by Johnson.

Overall, the story model has garnered strong empirical support, and is widely accepted by legal theorists. It encapsulates the core claim that people use causal explanations to draw inferences from evidence. It also highlights the constructive nature of people’s explanations, using their causal knowledge to fill in gaps in the evidence and tell a compelling story. The power of a story to summarize and rationalize a mixed body of evidence is also a potential weakness—the most compelling story is not always the one most likely to be true. Nevertheless, there is strong experimental evidence that people use story structures to organize their evidence and make decisions.

Extending the Story Model

The story model marks a huge advance in our understanding of juror decision-making. However, several issues remain unresolved. One problem is that the notion of a causal situation model, so central to the account, is not explicitly formalized or defined; this makes it harder to elicit and test people’s causal models, or to compare their causal reasoning against normative standards. What makes one situation model better than another? Given that the fact-finder believes a specific causal model, what inferences are licensed? Pennington and Hastie propose several criteria for evaluating stories—such as coherence, plausibility, and completeness—but these notions also lack formal definition and it is unclear how they might trade off against each other.19 Without a formal framework for causal representation and inference, it is difficult to explain how people construct models based on background knowledge, and unclear how these causal models relate to counterfactual analyses and judgments of factual and legal causation.

We believe that the causal model framework (Pearl, 2000), suitably developed, can address some of these concerns. The framework provides a formal theory of causal representation, learning, and inference, and has been successfully used in numerous areas of causal cognition (Sloman, 2009; see, in this volume, Griffiths, Chapter 7; Oaksford & Chater, Chapter 19; Rehder, Chapter 20). Even though people’s actual causal representations and inferences can depart from the formal theory (Sloman & Lagnado, 2015), the causal framework provides a crucial guide to modeling causal cognition. It also suggests how an account of everyday and legal causation can be developed, allowing for appropriate counterfactual reasoning and judgments of causal responsibility (Lagnado et al., 2013).

For instance, the story structures for the Johnson murder case are readily transformed into formal causal networks. We have translated the “first-degree murder” story structure into a formal causal graph (see Figure 29.4). The nodes correspond to events (or propositions), the directed links to causal relations between these events. We have not specified the exact functional relations between events, but it is relatively straightforward to use the formal apparatus to capture the intended combination functions. For example, in this network, Johnson stabbing Cardwell depends on three causes: Johnson is with Cardwell, Johnson is armed with a knife, and he intends to kill Cardwell. If we use an “AND” function, then this states that all three causes are needed for Johnson to stab Cardwell (which seems appropriate to capture this specific story structure).

Figure 29.4 A formal causal network for the first-degree murder story. Derived from the central story model shown in Figure 29.3.
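A sketch of the “AND” combination function described above (our encoding; the chapter does not give code):

    # The stabbing node as an AND of its three parent causes.

    def stabbing(with_cardwell, armed_with_knife, intends_to_kill):
        return with_cardwell and armed_with_knife and intends_to_kill

    print(stabbing(True, True, True))   # True: all three causes present
    print(stabbing(True, True, False))  # False: intervening on any one
                                        # parent undoes the stabbing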

Furthermore, the story model’s claim that people use abstract episode schemas to construct specific story structures anticipates recent computational work on how people use intuitive theories of a domain to generate appropriate causal models (Griffiths & Tenenbaum, 2009; also see Gerstenberg & Tenenbaum, Chapter 27 in this volume). Applied to the legal domain, the idea would be that people’s intuitive theories and knowledge (e.g., about criminal behavior and social interactions), combined with case-specific information, allow them to generate specific causal situation models. This is a promising avenue for future research, and would be facilitated by formalizing story structures in terms of causal networks.

Modeling Evidential Reasoning

Another underdeveloped area for the story model is the issue of evidence and proof. As well as constructing plausible stories, fact-finders must evaluate how well the evidence supports these stories and assess the strength, credibility, and reliability of the evidence (Schum, 1994). Although the story model supplies some criteria for story evaluation, it does not attempt to model how different items of evidence (e.g., witness testimony, forensic evidence, etc.) relate to different elements in a story, nor does it consider how one captures the credibility or reliability of this evidence.20 To address this question, Kuhn et al. (1994) proposed that fact-finders lie on a spectrum, from satisficers who maintain a single plausible story, to more adept reasoners who engage in “theory-evidence coordination,” assessing how well the evidence supports different hypotheses and stories, as well as the reliability of the evidence. But Kuhn gives no definite explication of this process, nor a normative benchmark for how it should be done.

What we need is a fine-grained analysis of how fact-finders represent and reason about the strength and reliability of the evidence, and how this affects their story evaluation. Just as when they reason about what actually happened, “the story of the crime,” fact-finders will draw on causal knowledge and assumptions, here directed to the “story of the trial,” where they must reason about people’s motivations and beliefs when they give testimony, and use these inferences to update their version of what happened. For example, when a witness gives evidence, the fact-finder must judge whether this testimony is accurate, mistaken, or intentionally deceptive (Schum, 1994). These inferences will modulate the fact-finder’s beliefs about what happened, sometimes in complex ways. Thus, when a defendant’s alibi is discredited, this can undermine his innocence in two ways—by making it more likely that he was at the crime scene and by showing that he is lying (Lagnado, 2011; Lagnado et al., 2013; Lagnado & Harvey, 2008). Similarly, when a victim is shown to be inconsistent in her testimony, even when it concerns a matter peripheral to the crime in question, this can undermine the victim’s credibility and thus have the knock-on effect of undermining her testimony about the actual crime (Connor Desai, Reimers, & Lagnado, 2016). In short, fact-finders draw inferences from the credibility or reliability of the evidence, and these inferences permeate through the fact-finder’s network of beliefs about the crime as well. The “story of the crime” and the “story of the trial” are closely intertwined.

Here again, we believe that a careful formal analysis of the relations between hypotheses, evidence, and reliability is crucial to understand how fact-finders actually reason. Again, this is not to claim that fact-finders follow the normative theory; but without a framework to capture reasonable inference, we cannot uncover or appraise how people actually do it. Moreover, a normative framework can also suggest how the evidential reasoning of fact-finders might be improved. One possible approach, closely related to causal model theory, is the Bayesian network framework (Pearl, 1988; Taroni et al., 2006). Fenton et al. (2014) apply Bayesian network analysis to legal arguments, in a framework that allows for the systematic modeling of interrelations between hypotheses, evidence, and reliability. They argue that fact-finders can use legal idioms—small-scale causal building blocks tailored to the legal context. These idioms can be combined and reused to represent and reason about large-scale legal cases involving complex and interrelated bodies of evidence. Some recent empirical studies suggest that people follow the qualitative prescripts of this formal account (Lagnado, 2011; Lagnado et al., 2013), even though it is unlikely that they engage in full-fledged Bayesian computations.


To show how the idiom-based approach can be applied, let us return to the Cardwell murder case. One critical issue was whether or not Cardwell pulled out a razor shortly before Johnson stabbed him. Johnson’s plea of self-defense would be bolstered if this was true, and mock jurors who found Johnson not guilty tended to include this as an element in their stories. But how do people decide this fact? In the trial they are presented with conflicting testimonies. On the one hand, Johnson claims that Cardwell pulled out the razor, and this is reaffirmed by another eyewitness, Clemens (who is a friend of Johnson’s!). On the other hand, a policeman and the bar owner both testify that they did not see Cardwell holding a razor. In addition, the reliability of all witnesses is open to question. Johnson has a clear motivation to lie, as does his friend Clemens. And both the policeman and bar owner admit under cross-examination that their views of Cardwell’s right hand were partially obscured, so he might have been holding a razor. To complicate matters further, the pathologist who examined Cardwell’s body reports that he found a razor in his back pocket. This seems unlikely if Cardwell had indeed pulled out the razor. Could he have put it back into his pocket while dying from a stab wound? If not, might someone else have replaced it? Somehow fact-finders must negotiate these competing claims, and decide how to incorporate the reliabilities of the various witnesses. This requires going beyond story construction.

Page 22 of 64

Causation in Legal and Moral Reasoning

Figure 29.5 Idiom-based Bayesian network for wit­ ness testimonies about whether or not Cardwell pulled out a razor in the fight. Gray nodes represent testimonies presented in court, white nodes repre­ sent hypotheses.

The idiom-based approach represents a formal development of ideas from Schum (1994) and Wigmore (1913), capturing the interrelations between hypotheses and evidence in a systematic and probabilistically coherent fashion (see also Dawid & Evett, 1997; Fenton & Neil, 2012; Hepler, Dawid, & Leucari, 2007). The extent to which people actually produce such representations (and computations) is still an open question. While there is some ev­ idence that people’s judgments can be captured by network models at a qualitive level (i.e., participants’ posterior judgments correlate with the outputs of Bayesian networks constructed from their causal beliefs, priors, and conditonal probabilty judgments; see Connor De Sai, Reimers, & Lagnado, 2016) it seems unlikely that they perform exact Bayesian computations (given the computational demands with mutiple interrelated vari­ ables).

Coherence-Based Reasoning As noted earlier, although the story model captures many aspects of juror decision-mak­ ing, it does not provide a formal or computational framework to underpin people’s repre­ sentations or inferential processes. Coherence-based approaches to reasoning and deci­ sion-making (Simon & Holyoak, 2002; Simon, Snow, & Read, 2004; Thagard, 2000) aim to provide such a framework, and give a more general account of the kind of complex deci­ sion-making faced in legal contexts. Such models were inspired by earlier cognitive con­ sistency theories (Heider, 1958) and were revitalized by advances in connectionism (Mc­ Clelland & Rumelhart, 1986). The key idea is that people strive for coherent representa­ tions of the world, and the decision-making process is driven by the search for a maximal­ ly coherent final state, one that best satisfies the multiple constraints faced by the deci­ sion-maker. This approach appears well suited to the legal domain, where decision-mak­ ers are faced with complex bodies of probabilistic evidence, often ambiguous or contra­ dictory, and need to reach categorical verdicts.

Page 23 of 64

Causation in Legal and Moral Reasoning On this view, people represent evidential and decision variables in terms of units in an as­ sociative network. These units are connected with excitatory or inhibitory links, depend­ ing on whether they are mutually consistent or inconsistent.23 Units have an initial level of activation that depends on their prior degree of acceptability, with the receipt of new evidence boosting the activation of the corresponding (p. 580) units in the network. Infer­ ence or belief updating then involves the spread of activation through the network. Through an iterative process of parallel constraint satisfaction, the network settles into a state that maximizes coherence between units, with the final decision being determined by the units activating above some threshold (for details, see Thagard, 2000). A key feature of this interactive process is that it can lead to bi-directional reasoning— whereby evidence is distorted to fit with emerging decisions and judgments. The decision maker continually readjusts his assessment of hypotheses and evidence until a coherent position emerges, leading to high confidence in a final decision even in the face of initial ambiguity and uncertainty. Advocates of coherence-based approaches maintain that this bi-directional reasoning distinguishes it from Bayesian accounts, arguing that the latter only allow for unidirectional reasoning from evidence to conclusions.24 Thagard (2000) applies coherence-based modeling to legal cases, but does not test these empirically. Subsequent work (e.g., Simon & Holyoak, 2002; Simon et al., 2004; Glockner & Engel, 2013) aims to model actual legal reasoning using coherence models. We will il­ lustrate their approach with one key study. Simon et al. (2004) use a legal decision-making task to show that people engage in bi-di­ rectional reasoning and evidence distortion. They use a two-stage paradigm. In the first stage, participants make judgments about a set of social vignettes, including evaluations of various kinds of evidence. For example, in one scenario a mystery man leaves flowers for a woman in an office, and her colleague states that she recognized the man as Dale Brown—whom she has only seen a couple of times before. Participants answer various questions, including one about the value of this identification evidence: “Does the office worker’s identification make it more likely that it was Dale Brown who delivered the flow­ ers?” After this initial task, participants complete a distractor task in which they solve analogies. They then move on to stage two of the experiment. Their main task is to decide a legal case, which involves an employee Jason Wells, who is accused of stealing a large sum of money from the company safe. They are presented with a mixed body of evidence, with various pieces for and against the suspect—for exam­ ple, a technician claimed to see Jason rushing from the crime scene, a car like Jason’s was caught on camera leaving the parking lot around the time of the crime, and Jason had made several large payments shortly after the crime; however, in his defense, another witness claimed to see Jason far away from the crime scene at that time, and Jason claimed his payments were legitimate family transactions. The key evidential manipula­ tion is whether or not participants are told that Jason’s DNA was found on the safe. Un­ surprisingly, those told that DNA on the safe matched Jason’s DNA tended to convict, and those told it did not match tended to acquit. 
However, participants also assessed the oth­ Page 24 of 64

Causation in Legal and Moral Reasoning er pieces of evidence in the case. Crucially, they were asked to evaluate the same kind of evidence claims as had been requested in the prior social vignettes, for example, the val­ ue of an eyewitness identification. The major finding was that people distorted the value of evidence to fit with their verdicts. Thus, convictors tended to inflate the value of the eyewitness testimony, whereas acquittors tended to deflate it. Simon et al. take these findings to show that people distort evidence to cohere with their decisions. And further experiments suggest this distortion takes place during the deci­ sion-making process, rather than being a post hoc attempt to maintain consistency with their decision. They also contend that bi-directional reasoning does not fit with Bayesian prescripts, and thus undermines a Bayesian updating model. In particular, they argue that the evaluation of one piece of evidence (e.g., the eyewitness identification) should be treated independently from the DNA evidence, but that people violate this prescription. While these findings appear to support coherence-based effects, we think they can also be explained within a Bayesian framework, if it is extended to include a richer represen­ tation of the evidence and its reliability. Applying the idiom-based approach to the Jason Wells case, a Bayesian network that captures the reliability of the eyewitness using a reli­ ability node (see Figure 29.6) can account for the observed change in evidence evalua­ tion. On this network, the presence of a DNA match raises the probability that Jason Wells is (p. 581) guilty, which also (via explaining away) raises the probability that the eyewit­ ness is reliable. In contrast, the same network shows that if the DNA evidence is false, this lowers the probability that Jason Wells is guilty, and in turn (via explaining away) low­ ers the probability that the eyewitness is reliable. (For more details about explaining away, and the evidence-reliability idiom, see Fenton et al., 2013; Lagnado et al., 2013). Therefore it is perfectly legitimate for participants to modulate their judgments about the reliability of the eyewitness identification according to whether the DNA evidence is posi­ tive or negative.
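The explaining-away pattern can be verified with a small enumeration over a fragment of the network in Figure 29.6 (guilt, eyewitness reliability, eyewitness report, DNA evidence); the numbers are invented and serve only to show the direction of the effect:

    # Explaining away in a fragment of the Jason Wells network.

    from itertools import product

    P_GUILTY = 0.5
    P_RELIABLE = 0.5

    def p_eyewitness(report, guilty, reliable):
        # A reliable eyewitness tracks guilt; an unreliable one guesses.
        if reliable:
            return 1.0 if report == guilty else 0.0
        return 0.5

    def p_dna(match, guilty):
        return (0.95 if match else 0.05) if guilty else (0.05 if match else 0.95)

    def posterior_reliable(dna_match):
        # P(eyewitness reliable | eyewitness says "guilty", DNA evidence)
        num = den = 0.0
        for guilty, reliable in product([True, False], repeat=2):
            p = ((P_GUILTY if guilty else 1 - P_GUILTY)
                 * (P_RELIABLE if reliable else 1 - P_RELIABLE)
                 * p_eyewitness(True, guilty, reliable)
                 * p_dna(dna_match, guilty))
            den += p
            if reliable:
                num += p
        return num / den

    print(round(posterior_reliable(True), 2))   # 0.64: match boosts reliability
    print(round(posterior_reliable(False), 2))  # 0.05: mismatch undercuts it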

Figure 29.6 Bayesian network of the Jason Wells case, using the evidence-reliability idiom to capture the bi-directional reasoning observed in Simon et al. (2004).

Thus, in this case, bi-directional reasoning need not violate Bayesian updating, given a suitably rich representation of evidence and reliability (see also Jern et al., 2015, for a similar argument about a different legal case used in Simon & Holyoak, 2002). This does not mean that people’s reasoning can always be recast in rational terms—especially given the computational demands of even simple Bayesian networks. However, it is useful to see that alleged irrational reasoning is rational relative to a richer representational framework. The exact psychological mechanisms that achieve this are still an open question. Moreover, there are other aspects of human reasoning—such as susceptibility to order effects—where Bayesian prescripts do seem to be violated (Lagnado & Harvey, 2008). But even here, one might argue for heuristic approximations to Bayesian reasoning, rather than throwing out the Bayesian framework altogether (Griffiths, Lieder, & Goodman, 2015).

Another problem with the coherence-based approach is that the cognitive representations that it posits are based solely on associative links. This lack of directionality means that they cannot fully support causal or counterfactual reasoning (Sloman, 2009; Sloman & Lagnado, 2015; Waldmann et al., 2006). For example, recall the Cardwell murder case from Pennington and Hastie’s studies. The critical first-degree murder story claims that Johnson was humiliated by Cardwell, and therefore formulated a plan to kill him. Johnson returned to the bar, and stabbed Cardwell. This is a causal sequence of events, not just an associated set of events. Johnson’s plan for revenge is not merely associated with his intimidation by Cardwell, and his stabbing of Cardwell. His plan for revenge is a consequence of the intimidation and a cause of the stabbing. Capturing this with a causal representation enables counterfactual inference. If someone had stopped Johnson from returning to the bar they might have prevented the stabbing, but they would not have prevented the earlier intimidation. As we argue throughout this chapter, causal representations are critical to legal and moral reasoning. Mere association is not enough.

Attributing Causality

Let us move from the explanatory and evidential phases to the attributive phase. There has been a wealth of empirical research into causal attribution (for reviews, see Alicke et al., 2016; Hilton, Chapter 32 in this volume). We will focus on four key areas: (1) issues of but-for, necessity, and sufficiency; (2) intention and foresight; (3) abnormality versus normality; and (4) group attributions.

But-For, Necessity, and Sufficiency

Despite its problems, the but-for test occupies a central position in legal causation. It also plays a dominant role in everyday judgments of causation. Numerous studies show that people's causal judgments are sensitive to counterfactual contrasts, and that people are more likely to judge something as a cause of an outcome when they believe that the outcome would not have occurred without the putative cause (Hilton, Chapter 32 in this volume). Moreover, in a set of studies looking specifically at legal reasoning in civil liability cases, Hastie (1999) showed that the majority of mock jurors were concerned with causal aspects such as necessity, sufficiency, and but-for reasoning (even though the judge made no explicit mention of causation).

The but-for test also allows for omissions as causes. This is illustrated by the Bolitho case, where the doctor's failure to attend the child, although a breach of duty of care, was not judged to have caused the child's death, because even had she attended she would not have intubated, and thus the child would still have died. Even in a complex case like this, people use but-for reasoning and track the legal judgment. Uustalu (2013) asked participants to give causal judgments on the Bolitho case, but varied the counterfactual contrast—whether or not the doctor would have intubated. Participants (recruited from the general public) judged the doctor significantly more causal (and blameworthy) if they were told that she would have intubated. Moreover, in the absence of any information about what the doctor would have done, participants assumed that she would have intubated, and thus judged her to have caused the death.

Despite the presence of but-for reasoning in many studies, it has also been shown that it is neither necessary nor sufficient for judgments of causation. As with legal judgments, there are contexts in which people still judge something as a cause despite it not being a but-for condition, or fail to judge something a cause even though it is a but-for condition. Thus, Spellman and Kincannon (2001) compared people's causal judgments in legal scenarios with either multiple sufficient (MS) or multiple necessary (MN) causes. For example, two independent gunmen shoot the victim at the same time, and he dies. In the MS condition the coroner rules that either shot would have been sufficient to kill the victim; in the MN condition he rules that both shots were needed. The MS condition is a classic overdetermination case, and the but-for test fails for both gunshots. Nevertheless, people judged both gunshots as causes of the victim's death. More surprisingly, when asked to rate the strengths of the causes, people rated each gunshot higher in the MS than the MN condition, even though the gunshots are but-for causes in the latter and not the former. Similar results were obtained using a scenario in which two inanimate factors, lightning or fierce winds blowing down an electrical pole, led to fires that burned down a building. In the MS condition participants were told that either fire alone was sufficient to burn down the entire building; in the MN condition, that each fire alone would only have burned down half the building. Spellman and Kincannon concluded that people do not use but-for reasoning to assign causality in these cases.

One possible confound here is that the strength of the causes seems different between the two conditions. The causes in the MS conditions appear stronger, because either alone would have been sufficient to bring about the effect, in contrast to the MN conditions, where neither would have been sufficient. Thus it is possible that the difference in causal ratings reflects differences in the perceived causal strengths of the causes, rather than being due to the contrast between sufficiency and necessity. Nevertheless, the studies clearly show that people assign causality even when the standard but-for test does not apply.

Gerstenberg and colleagues (Gerstenberg & Lagnado, 2010; Lagnado et al., 2013; Zultan et al., 2012) also compared MS and MN causes (sometimes in the same scenario), but in contexts where the causal agents were identical across conditions, and with the combination rule (MN or MS) pre-established by the rules of the game. They too found causal assignments even to non-necessary causes, but they also showed that MN causes were judged as more causal than MS causes, in line with the Chockler and Halpern model that allows for graded causal responsibility based on an extended but-for rule (for more details, see the discussion later in this chapter).
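To make the logic of these scenarios concrete, the following sketch (our own illustration, not code from any of the studies cited) applies the but-for test to the two-gunmen scenario; the function names and the Boolean encoding are assumptions made for the example.

```python
# A minimal sketch of the but-for test in the two-gunmen scenario from
# Spellman and Kincannon (2001). The outcome is modeled as a Boolean
# function of the two shots; the encoding is our own simplification.

def victim_dies(shot_a, shot_b, sufficient_alone):
    """Return True if the victim dies, given which gunmen shot.

    sufficient_alone=True models the multiple-sufficient (MS) condition:
    either shot alone kills. False models the multiple-necessary (MN)
    condition: both shots are needed.
    """
    if sufficient_alone:
        return shot_a or shot_b
    return shot_a and shot_b

def but_for(outcome_fn, actual, candidate):
    """But-for test: would the outcome have failed to occur without the candidate?"""
    counterfactual = dict(actual, **{candidate: False})
    return outcome_fn(**actual) and not outcome_fn(**counterfactual)

actual = {"shot_a": True, "shot_b": True}

for ms in (True, False):
    fn = lambda shot_a, shot_b: victim_dies(shot_a, shot_b, sufficient_alone=ms)
    label = "MS (overdetermination)" if ms else "MN (joint necessity)"
    print(label, "- shot_a is a but-for cause:", but_for(fn, actual, "shot_a"))
```

In the MS (overdetermination) condition the but-for test fails for each shot even though both are intuitively causes, which is exactly the dissociation the studies above report; in the MN condition the test holds.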


Another challenge to but-for reasoning arises in situations where there are sequences of actions and events, as in the pre-emption cases discussed earlier. Greene and Darley (1998) studied people's liability judgments in criminal scenarios involving a chain of events between a perpetrator's actions and a final outcome. All scenarios started with the same core setup: Harold intends to kill his colleague Joe and inserts a poisonous pill in his vitamin bottle. Numerous variants were constructed by adding different sequences of actions and events to this initial segment. This allowed the experimenters to vary both the necessity and sufficiency25 of the perpetrator's actions for the final outcome. For example, in one scenario Joe dies from ingesting the poisonous pill, and thus Harold's action is both a necessary and sufficient cause of his death. In some other scenarios, Joe ingests the poison but is killed by someone/something else before the poison works, so Harold's action is not a necessary condition of Joe's death; whereas in other scenarios, the poison alone is not strong enough, so Harold's action is not sufficient (and needs to combine with other drugs to kill him). Participants judged Harold's liability for the final outcome, measured in terms of the sentence imposed, and were also explicitly asked to judge the necessity, sufficiency, and contribution of Harold's action. Overall, Harold's perceived contribution to the outcome was the best predictor of the liability judgments, but perceived necessity and sufficiency were also good predictors.

Greene and Darley draw several conclusions from this research: that people favor a graded notion of causation (contribution) rather than a dichotomous yes/no, and that sufficiency, not just necessity, plays a role in liability judgments. This is nicely illustrated by the fact that people assign greater liability for attempted murder when Harold's actions would have been sufficient to kill Joe (e.g., the level of poison was high enough) than when they were insufficient (e.g., the level was too low). They also found a "proximity effect," such that the closer the perpetrator's actions were to bringing about the harm, the more liable he was judged. Finally, they conclude that "while the theory of causation our respondents seem to use is not easy to specify, it has components of sturdy rationality" (Greene & Darley, 1998, p. 467). Building on this conclusion, future work could investigate whether Greene and Darley's pattern of results can be captured by recent extensions of the counterfactual models (e.g., Gerstenberg et al., 2015) with graded causal responsibility and incorporating notions of sufficiency and robustness. It should be noted that Greene and Darley elicited liability judgments, not causal judgments, but the research discussed in the next section suggests a close link between these two kinds of judgment in lay attributions.

Some of Greene and Darley's scenarios involved pre-emption cases, in which an intervening action or event breaks the chain of causation from the perpetrator's actions to the final outcome. Such situations, as we discussed earlier, can raise interesting issues because the initial action sets in motion a sequence of events, and then the intervening action of a third party (or inanimate process) interrupts this sequence to determine the final outcome.
So the initiating action might still be a necessary condition for the final outcome, even though it is not typically judged as the cause. Complications can arise when the initiator and intervening party are connected in some way—for example, as part of a gang attacking a victim. At one extreme, in UK law, if the initiator could have reasonably foreseen the actions of the third party, then he too can be convicted of murder, along with the intervening actor who deals the fatal blow. At the other extreme, the intervening actor is judged to break the chain of causation and create a novel causal path, thus vindicating the initiator of murder. Assessing cases as to which action path is "operative" can be a tough judgment call.

These kinds of scenarios have been explored in psychological studies, and people's judgments do seem to fit with some notion of "operative" cause, whereby a suitably independent intervening party does absolve the initiator of having caused the death (Mandel, 2011), although no studies (to our knowledge) have looked at borderline cases where the foresight of the initiator (as to what the intervening party might do) is varied. However, various studies have explored the more general issue of how people's judgments of a perpetrator are influenced by his foresight of the harmful consequences of his actions.

Intentions and Foresight In legal contexts the mental elements of a crime, such as intention or foresight, are incor­ porated in judgments of causation. Empirical studies show that this also holds in laypeople’s causal attributions (Alicke et al., 2016; Cushman, 2008; Hilton et al., 2010; Lagnado & Channon, 2008). Thus, Lagnado and Channon (2008) explored how people at­ tribute causality and blame in event chains with multiple agents, varying the agents’ in­ tentions and foresight. For example, in one scenario, a wife put poison in her husband’s medication (either intentionally or accidentally), and then the ambulance was severely de­ layed (either it got lost or ignored the call), resulting in the husband’s death. In more complex scenarios, the foreseeability of the adverse outcome was also manipulated, both in terms of what the agent actually foresaw and what was foreseeable (probable) from an objective viewpoint. For example, a woman makes a self-assembly chair, and she either thinks that it will not break or that it will (subjective foreseeability). The truth state of the world is also varied: the chair is either made properly and is unlikely to break, or made poorly and likely to break (objective foreseeability). These two factors were crossed in a factorial design. The findings from these studies were systematic across many different scenarios. People assigned more causality and blame to intentional versus unintentional actions, and for outcomes that were foreseeable versus unforeseeable (this applied to both subjective and objective foreseeability). The blame ratings are relatively straightforward to interpret, be­ cause most accounts of blame (Shaver, 1985) agree that agents are more blameworthy for intentional and foreseeable consequences of their actions. The causal ratings are harder to explain, even though they seem to fit with the legal notion of causation.26 According to a counterfactual notion of causation they are puzzling, because the target action, whether intentional or unintentional, is still a but-for condition of the outcome. Several explanations of these findings are possible. A common response is that in situa­ tions where human actions lead to adverse outcomes, people are primarily concerned with attributing blame, even when they are ostensibly judging cause. This mirrors the in­ fluence of policy factors on causation judgments in legal contexts. This response can di­ vide into distinct psychological accounts (not mutually exclusive). On one view, people’s Page 29 of 64

Causation in Legal and Moral Reasoning desire to blame someone for an adverse outcome distorts their causal model, exaggerat­ ing the degree of causation in the morally reprehensible cases, where the action is inten­ tional and the outcome foreseeable (cf. Alicke, 2000). On an alternative view, people’s causal models of the situation (legitimately) incorporate factors that mediate blame, al­ though these factors are not specific just to morality or blame. For example, as well as judging the necessity of a (p. 584) causal relation, people might also be concerned with the robustness of the causal relation. Roughly speaking, a causal relation is robust when it would have held even if there had been perturbations to the background conditons, whereas it is sensitive if it relies on a fragile and improbable set of background conditions (cf. Kominsky et al., 2015; Lombrozo, 2010; Woodward, 2006). Thus intentional actions are typically judged more robust than unintentional ones. For example, the wife poisoning her husband is less sensitive to background conditions than her unintentionally doing the same thing; the latter depends on the wife not having her glasses on, misreading the la­ bel, not checking the drink, and so on; similarly for foreseeability, which will usually be inversely related to the predictability of the background conditions. These empircial findings present a challenge for psychological models that rely purely on counterfactual analyses. However, more recent advances are starting to address these is­ sues. One thing that is needed is a more fine-grained modeling of agents’ mental states, to incoporate factors such as intentions and foresight (for formal approaches that include foreseeability, see Chockler & Halpern, 2004; for psychological approaches that include intentions, see Kleiman-Weiner, Gerstenberg, Levine & Tenenbaum, 2015; Sloman 2009).
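As a concrete rendering of this robustness idea, the sketch below is our own toy formalization (not a model from the cited papers) of the poisoning example; the condition names, and the assumption that the unintentional route requires all three background conditions, are illustrative.

```python
# A minimal sketch of robustness in roughly Woodward's (2006) sense: a causal
# relation is robust to the extent that the action still yields the outcome
# under perturbations of the background conditions.
from itertools import product

def poisoning_occurs(action_taken, background):
    # Assumed model: unintentional poisoning requires a fragile conjunction
    # of background conditions; intentional poisoning succeeds regardless.
    if not action_taken:
        return False
    if background["intentional"]:
        return True
    return (background["no_glasses"] and background["misreads_label"]
            and background["drink_unchecked"])

def robustness(outcome_fn, intentional):
    """Fraction of background perturbations under which the outcome still holds."""
    conditions = list(product([True, False], repeat=3))
    hold = sum(
        outcome_fn(True, {"intentional": intentional,
                          "no_glasses": a, "misreads_label": b,
                          "drink_unchecked": c})
        for a, b, c in conditions)
    return hold / len(conditions)

print(robustness(poisoning_occurs, intentional=True))   # 1.0: robust
print(robustness(poisoning_occurs, intentional=False))  # 0.125: sensitive
```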

Norms

Another systematic finding in the psychological research is that norms play a role in people's causal attributions. Thus, an action or event that violates a norm is often accorded greater causality, or is preferred as "the" cause, of a subsequent outcome (Hart & Honore, 1959/1985; see Hilton, Chapter 32 in this volume). Here norms are a broad category, including moral prescriptions, social rules or conventions, and statistical norms. One much discussed example is the pen vignette (Knobe & Fraser, 2008): A receptionist in an academic department keeps a stock of pens on her desk. Administrative assistants are allowed to take these pens, but academic staff are not (although they often do). One morning both a professor and an assistant each take a pen, leaving the receptionist with none. Participants are asked who caused the problem (the lack of pens). Overall they assign causality to the professor, not the assistant.

Hitchcock and Knobe (2009) argue that the professor is preferred as the cause because he has violated the prescriptive norm of who is allowed to take pens. They explain this in terms of a more general account of how people select singular causes. When judging causation, people consider what actions (or events) made the difference to the outcome in question; and this involves counterfactual reasoning about what would have happened if certain things had been different. Moreover, this reasoning is slanted toward considering typical rather than atypical possible worlds (Kahneman & Miller, 1986), which means that people will focus on abnormal actions rather than normal ones as the relevant difference-makers. For example, in the pen vignette, the professor is rated as more causal than the assistant because the possible world in which he does not take a pen (the norm-conforming world) is considered more relevant than the world in which the assistant does not take the pen.

Hitchcock and Knobe see this selective preference for abnormal events as an effective strategy for future intervention or prevention (cf. Hitchcock, 2012; Lombrozo, 2010). For instance, it is relatively straightforward to address the pen problem by enforcing more stringent measures on the professor. This account links with typical situations encountered in legal cases, where something has gone wrong (a transgression of the normal course of events) and one seeks to assign causality in order to redress the balance. However, it is unclear whether this means that the notion of abnormality should be built into a definition of causal judgment (Halpern & Hitchcock, 2014), rather than being seen as part of the pragmatics of how we use these judgments.

The finding that norm-violating actions receive greater causal ratings is robust, but it admits of alternative explanations. Alicke (2000) argues that it is people's desire to blame agents for adverse outcomes that drives this effect, with people distorting their causal claims in order to justify assigning blame. For example, the professor is clearly more blameworthy than the innocent assistant. However, on its own this account cannot explain situations where a positive rather than a negative outcome occurs, and norm-violation effects persist.

Another possible explanation is that the term "cause" is ambiguous in such scenarios, and that people see the causal question as a request to assign blame (or praise) to the responsible agents (cf. Hart & Honore, 1959/1985; Lagnado & Channon, 2008). For instance, it seems clear that the professor is most deserving of blame in the pen vignette, and a similar analysis applies to other scenarios. In support of this claim, Samland and Waldmann (2015) replicate the standard causal preference for the norm-violating agent, but show that on an indirect measure of causal strength, people do not differentiate between norm-violating and norm-conforming agents. For example, both professor and assistant are assigned the same causal strength ratings. Samland and Waldmann argue that in such scenarios, when people are asked to assign causality to agents, they tend to make judgments of accountability rather than causality. This accountability hypothesis resonates with the legal practice of including non-causal factors when judging "legal cause" as opposed to "factual cause." It is too early to rule between these alternative explanations. Indeed, it is possible that each account has some validity depending on the circumstances, for example, whether the norms are moral or statistical, or whether the outcomes are bad or good.

Most discussions of norms focus on the causality assigned to the norm violator. But norms can also have a less direct influence on causal ratings. Thus, Kominsky et al. (2015) investigate situations in which two agents' individual actions combine to bring about an outcome, but where one of the agents' actions is marked out because it violates a norm. They illustrate the issue with a classic legal case (Carter v. Town, 1870) in which a child buys gunpowder from the defendant, and the child's mother and aunt hide it from the child, but in a place where they know he will find it. The child retrieves the gunpowder and suffers an injury. The court did not find against the defendant because his action was "superseded" by the negligent action of the mother and aunt. Effectively their negligent action (which constituted a norm violation) reduced the causality attributed to the defendant.

Kominsky et al. (2015) explore this notion of supersession in everyday causal reasoning problems, some involving moral norm violations (e.g., stealing something or breaking a rule), others involving statistical norm violations (e.g., throwing double six with a pair of dice). They show that when one agent breaks a norm, the causality attributed to the other agent is reduced. For example, when Sue buys one bookend, and her husband Bob completes the pair by stealing a matching bookend from a friend, Sue is rated as less of a cause (of the couple having a matching pair) than when Bob buys the bookend from his friend. Moreover, they also show that this "supersession" effect only occurs when both agents are necessary for the outcome, not when either agent is sufficient. Kominsky et al. explain these findings in terms of robustness: someone's action (e.g., Sue buys the left-side bookend) is judged as less of a cause (of the matching pair) if it required that someone else acted atypically (e.g., Bob steals the matching bookend). Sue's action is less robust because it relies on someone else breaking a norm. They link this to the counterfactual availability of norm-conforming versus norm-violating behavior. When one assesses the extent to which an agent caused an outcome, one takes into account how readily he would have achieved it under alternative possible worlds, and those worlds in which the other agent conforms with, rather than violates, a norm come more readily to mind (cf. Kahneman & Miller, 1986).

Responsibility Attributions in Groups

A common finding is that a person's individual responsibility is reduced when several people contributed to the outcome (Alicke, 2000; Kerr, 1996). For example, individuals have a reduced sense of responsibility in situations where multiple people would be capable of helping another person who finds herself in an emergency (Darley & Latané, 1968; Latané, 1981).

A series of studies (Gerstenberg & Lagnado, 2010, 2012; Lagnado et al., 2013; Zultan et al., 2012) shows that attributions to individuals in a group context are sensitive to the causal structure that dictates how individual contributions combine to bring about the outcome. The authors have developed a model of responsibility attribution, the criticality-pivotality model (CP model, hereafter), which predicts that people's responsibility attributions are influenced by two key considerations: (1) criticality—how important a person's contribution is expected to be for bringing about a positive group outcome (ex ante), and (2) pivotality—how close a person's contribution was to actually having made a difference to the outcome (ex post).

Let us illustrate the different notions via a simple example. Consider a situation in which a two-person company is voting whether to market a product (cf. Stapleton's voting example, discussed earlier). For the vote to pass, both members must vote in favor of the motion. In such a situation, each member's action is critical for the outcome—the motion will not pass unless both vote in favor. Contrast this with a situation in which the motion is passed if at least one of the members votes in favor. Here, the criticality of each member's action is reduced. Thus, we say that a member's action is more critical when all of the members have to succeed than when the success of one of the members is sufficient for the outcome. The CP model predicts that people's responsibility judgments increase the more critical a person's action was perceived to be for the outcome.

The second component of the model is concerned with how close a person's action was to having made a difference to the outcome. Consider a slightly more complicated situation in which five members vote, and a majority rule is used to determine whether the motion passes (cf. Goldman, 1999). Three people vote in favor and two vote against the motion. In this situation, each of the three people who voted in favor was pivotal for the outcome. Had any of them changed his or her mind, the motion would have failed. Contrast this with a situation in which the outcome of the vote is four to one in favor of the motion. Here, none of the members who voted in favor was pivotal. If one of them had changed her mind, the motion would still have passed. Expressed in the legal terminology that we introduced earlier, none of the individual voters' actions was a but-for cause of the outcome. However, intuitively each of the voters should still receive some responsibility for the outcome.

Based on Halpern and Pearl's (2005) model of actual causation discussed earlier, Chockler and Halpern (2004) proposed a structural model of responsibility that captures this intuition. Their model predicts that the further away a person's action was from having made a difference to the outcome, the less responsible that person's action will be viewed. In the case where the vote is 3–2, each voter's responsibility is high because his or her action was pivotal in the actual situation. In the case where it is 4–1, each voter's responsibility is reduced because none of their individual actions made a difference to the outcome in the actual situation. The responsibility of a person's action for an outcome is predicted to be equal to 1/(N+1), where N is the minimal number of changes that are required to render the person's action pivotal. Let's consider how much responsibility Joe, who voted for the motion, is predicted to receive in a situation in which the outcome of the vote was 4–1 in favor. Joe's vote didn't make a difference in the actual situation. However, if we changed the vote of one of the other people who was in favor of the motion, then Joe's vote would have made a difference. Since one change is required to make Joe pivotal, Joe's responsibility is predicted to be equal to 1/2.

Lagnado et al. (2013) tested the CP model by manipulating both the criticality and pivotality of a person's action in a variety of group situations (including group competitions and public goods games). They showed that people's responsibility attributions were sensitive to both criticality and pivotality, and were well predicted by the CP model (see also Gerstenberg, Halpern, & Tenenbaum, 2015). A natural extension would be to apply this model directly to legal and moral contexts.
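The pivotality component is easy to state in code. The sketch below (our illustration, not the authors' implementation) computes the 1/(N+1) responsibility measure for the voting example; the function name and the threshold parameter are our own framing.

```python
# A minimal sketch of the Chockler and Halpern (2004) responsibility measure
# for threshold voting: responsibility = 1 / (N + 1), where N is the minimal
# number of other 'for' votes that must change before the target voter
# becomes pivotal.

def responsibility(votes_for, threshold):
    """Responsibility of one 'for' voter, given that the motion passed.

    threshold: number of 'for' votes needed for the motion to pass
    (3 under majority rule with five voters).
    """
    if votes_for < threshold:
        return 0.0  # the motion failed; no responsibility for passing it
    # A voter is pivotal once the margin shrinks to exactly the threshold;
    # N counts the other 'for' voters who must switch to reach that point.
    n_changes = votes_for - threshold
    return 1.0 / (n_changes + 1)

# Five voters, majority rule (threshold = 3):
print(responsibility(3, 3))  # 3-2 vote: each 'for' voter is pivotal -> 1.0
print(responsibility(4, 3))  # 4-1 vote: one change makes Joe pivotal -> 0.5
print(responsibility(5, 3))  # 5-0 vote: two changes needed -> 1/3
```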


Causal Simulation

Our look at the psychological research on attribution has focused mainly on empirical findings rather than well-worked-out psychological theory. This is partly due to the lack of any comprehensive theory that explains how people reach their causal judgments. As noted in the section on explanatory and evidential reasoning, the story model presents an attractive approach to juror decision-making, especially if extended to include a more rigorous notion of causal explanation and a framework for reasoning about evidence and its reliability. Another promising feature of the story model is the idea that people use their causal knowledge and situation models to simulate possible ways (narratives) in which people's actions might have led to the crime in question.

The idea that people use mental simulations to make judgments of causality and probability was introduced by Kahneman and Tversky (1982). Although a fertile idea, and developed in various areas including legal decision-making (Feigenson, 1996; Heller, 2006), causal simulation has mainly been cast as a mental heuristic that avoids complex computation and can yield biased inferences. We endorse Kahneman and Tversky's original insight that simulation is a crucial aspect of psychological thinking—especially in causal reasoning—but would argue that it is more sophisticated than a mere heuristic (although there might be heuristic ways of achieving it in complicated situations) and should be mapped onto a richer framework of causal representation and inference (cf. Gerstenberg, Goodman, Lagnado, & Tenenbaum, 2015; Goodman, Tenenbaum, & Gerstenberg, 2015). In short, we should take mental simulation seriously as a complex feature of causal cognition, a capability that involves richly structured representations of the physical and social world, and engages inferential machinery that can often deliver sound inferences, both about actual and counterfactual eventualities. This thesis can be developed in various ways (Gerstenberg & Tenenbaum, Chapter 27 in this volume; Sloman & Lagnado, 2015), and is a promising area for future research.
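In this spirit, causal simulation can be thought of as running a causal model forward many times, under actual and counterfactual conditions. The sketch below is a minimal illustration with invented probabilities, not a model from the cited work.

```python
# A minimal sketch of simulation-based causal inference: estimate actual and
# counterfactual outcome probabilities by repeatedly sampling a toy causal
# model. All probabilities are made up for the example.
import random

def simulate(action, p_backup=0.1, p_success=0.8, n=100_000, seed=0):
    """Estimate P(outcome) under an action by forward simulation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        # The outcome occurs if the action succeeds, or if an independent
        # backup cause fires anyway.
        success = action and rng.random() < p_success
        backup = rng.random() < p_backup
        hits += success or backup
    return hits / n

p_actual = simulate(action=True)          # ~0.82
p_counterfactual = simulate(action=False) # ~0.10
print(p_actual, p_counterfactual)
# The contrast between the two simulated worlds supports a causal judgment.
```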

Summary

We have seen that laypeople's causal attributions accord well with legal judgments, and operate with a very similar notion of causation. This notion is more sophisticated than a simple but-for test, but can often be captured by an extended counterfactual analysis. At the heart of everyday and legal reasoning is the reliance on causal models, and the focus on human agency and social interactions. Judgments of causation also take factors such as intentions and foresight into account, and thus overlap with issues of responsibility and blame. This could be taken as a departure from rational scientific inquiry, but could also be cast as a consequence of the goals and aims of causal judgments—which in both everyday and legal contexts often serve to identify wrong-doers or deviant behavior.


Moral Reasoning

Research in moral psychology has focused on a number of different research topics, such as the distinction between moral norms and conventions (Chakroff et al., 2013; Sripada & Stich, 2006), the role of intuition and emotion, such as disgust or empathy, in moral judgment (Greene, 2001; Greene & Haidt, 2002; Haidt, 2001; Haidt et al., 1997), and whether some things, such as a person's life, have sacred values that make them incommensurable with other things, such as money (Tetlock et al., 2000). Here, we will focus on a topic that we believe best illustrates the role of causal and counterfactual thinking in moral cognition: people's evaluative judgments in moral dilemmas.

Work on moral psychology has drawn heavily from philosophical work on normative ethics. In ethics, there are three dominant approaches to analyzing moral behavior. First, according to deontological theories, the morality of a person's action derives from its accordance with a set of moral norms or duties (Darwall, 2003b; Kant, 2002). An action is good if it adheres to a set of moral principles or rules. Second, according to consequentialist theories, the morality of a person's action is determined by the consequences it brings about (Darwall, 2003a; Smart & Williams, 1973). An action is good if it leads to good outcomes. A third approach, virtue theories, focuses on what the action says about the person's character (Darwall, 2003c). An action is good to the extent that it indicates good or virtuous character.

These normative theories emphasize the three elements that are part of any moral analysis of a situation: persons (virtue theories), actions (deontological theories), and consequences (consequentialist theories; Sloman et al., 2009). We will argue that people's moral evaluations of another person's behavior are best understood if we assume that they consider both the causal role that a person's action plays in bringing about the outcome, as well as what the action says about the person's character. Further, we will argue that both action-focused and character-focused considerations are best captured in terms of counterfactual contrasts over people's intuitive causal theories of the domain (cf. Gerstenberg et al., 2015; Goodman et al., 2015).

This section has two parts. In the first part, we look at how representing moral dilemmas in terms of causal models that support counterfactual reasoning helps us understand how people make moral judgments. We will see that in order to analyze the causal status of a person's action, we need to have a causal representation of the situation that dictates how the person's action relates to the outcome under consideration. To draw inferences about a person from her action, we need a theory of mind—a causal model of how people plan and choose their actions. We can then invert that model to reason from an observed action to aspects of the person's character (Baker et al., 2009; Gopnik & Wellman, 1992; Kleiman-Weiner et al., 2015; Malle & Knobe, 1997; Wellman & Gelman, 1992; Yoshida et al., 2008). In the second part, we will argue that in order to arrive at a more complete picture of how people make moral evaluations, we will need to shift focus from merely considering the moral permissibility of an action to considering more fully what the action reveals about the person's character.

Causality and Counterfactuals in Moral Dilemmas

Any moral evaluation has to start with a (rudimentary) causal analysis of the situation (Cushman & Young, 2011; Driver, 2008; Guglielmo, Monroe, & Malle, 2009; Mikhail, 2007, 2009; Sloman et al., 2009). Clearly, we would not blame someone whose action played no causal role whatsoever in how the outcome came about. The counterfactual but-for test mentioned earlier provides a first pass for evaluating whether a person's action made a difference to the outcome. As we will see, the but-for test can also help us to draw a distinction between intended outcomes of an action, and outcomes that were merely foreseen but not intended.

In a typical moral dilemma, the agent faces a decision between several actions, each of which is expected to lead to a different negative outcome. One of the most-studied moral dilemmas is the trolley problem (Foot, 1967; Thomson, 1976, 1985). In a typical trolley scenario, a trolley is out of control and headed toward five people standing on a railroad track. A person, let's call him Hank, observes this. If Hank doesn't do anything, the trolley will kill the five people on the track. However, Hank is close to the control room and he can throw a switch that will change the course of the train onto a side track. As it turns out, there is one person standing on the side track. If Hank throws the switch, the five people on the main track will survive, but the one person on the side track will die. If Hank doesn't throw the switch, the five people on the main track will die, but the person on the side track will survive. Is it morally permissible for Hank to throw the switch?

When faced with this side-track scenario, most participants tend to think that it is permissible for Hank to throw the switch (for a review, see Waldmann et al., 2012). Clearly, the consequentialist is on Hank's side: if Hank throws the switch only one person will die, whereas if he doesn't throw the switch five people will die.

Now let's consider another variant of the trolley scenario in which, again, an out-of-control trolley is threatening to kill five people. This time, Hank finds himself on a bridge that crosses the railroad track, and the only option he has for stopping the trolley is to push a large man off the bridge onto the track. This will stop the train but kill the large man. Is it morally permissible for Hank to push the large man off the bridge? Most participants don't think so.

For the consequentialist, this is puzzling: in both the side-track and the push scenario, the person faces a choice between two outcomes: either five people die, if he doesn't act, or only one person dies, if he does act. If all that mattered was the number of deaths, then participants should clearly consider it permissible for Hank to push the large man off the bridge.


Much of the research in moral psychology has been devoted to explaining what factors account for the difference in people's intuitions about the moral status of a person's actions between different moral dilemmas. Even though the side-track and push scenarios are similar on a superficial level—there is the same contingency between acting or not acting and the number of people who die as a consequence—they are also different in important respects. So while the consequentialist is somewhat at a loss, the deontologist can attempt to find a principled rule that distinguishes between these cases.27

One such rule is the doctrine of double effect (DDE; Foot, 1967; Kamm, 2007; Quinn, 1989). The DDE draws a distinction between two types of effects that can result from a person's action: first, an effect that is desired and intended, and second, an effect that is undesired but foreseen. For example, in the side-track scenario, there are two effects when Hank throws the switch. The five people on the main track are saved, and the one person on the side track is killed. The DDE states that an action that would normally be prohibited, such as homicide, may be morally permissible when (1) the (negative) action itself is not directly intended, (2) the good effect of the action is intended but not the bad effect, (3) the good effect outweighs the bad effect, and (4) the actor had no morally better alternative (see Mikhail, 2009).

Thus, throwing the switch is morally permissible according to the DDE, if (1) Hank didn't just throw the switch because he likes throwing switches, (2) he intended to save the five people but didn't intend to kill the person on the side track (even though he foresaw that outcome), (3) the positive effect of saving the five outweighs the negative effect of killing the one, and (4) there was nothing else that Hank could have done which would have led to a morally better overall outcome.

Now what does the DDE say about the push scenario? Again, let's assume that Hank is not the kind of guy who enjoys pushing people off bridges just for kicks. What about Hank's intention? Did he intend to kill the man on the bridge? Note that the causal structure between action and outcome in the push scenario is different from the side-track scenario. Let's assume again that Hank intends to save the five people. However, in order to realize his primary intention, he has to kill the large man first. Pushing the large man is not merely a foreseeable side effect of his action, but it features as a causal means in a chain of events that culminates in bringing about the intended outcome. Using the large man as a means implies that Hank intended for the large man to die. Since killing the man was an essential part of Hank's plan to save the people on the track, the DDE rules that this action is impermissible.

How can we tell apart whether a particular outcome was only a side effect of a person's action, or a means for bringing about another outcome? In many situations, the but-for test provides a simple procedure to determine whether a particular effect was a means versus a side effect of an action (see Figure 29.7). In the push scenario (Figure 29.7 b), if the large man hadn't been shoved off the bridge, then the five people would not have been saved. The survival of the five depends counterfactually on the death of the one. Thus, pushing the large man off the bridge was a means for saving the five. In contrast, in the side-track scenario (Figure 29.7 a), the five on the main track would have been saved even if there had been no person on the side track. Given that Hank threw the switch, the survival of the five was not counterfactually dependent on the death of the one (they would have survived even if the person on the side track had managed to jump off the track). Thus, the death of the person on the side track was a side effect, rather than a means for saving the five.

Figure 29.7 Highlighted causal structures of the side-track and push scenarios as a function of the order of presentation. Highlighted causal paths are represented as solid arrows. Figure adapted from Wiegmann & Waldmann (2014).
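This but-for procedure for separating means from side effects can be made concrete in a few lines. The sketch below is our own illustration of the two causal structures in Figure 29.7 under deterministic simplifying assumptions; the function names are invented.

```python
# A minimal sketch of using the but-for test to separate means from side
# effects in the two trolley structures. Each structure is a deterministic
# function from the intervention and the victim's fate to whether the five
# are saved.

def five_saved_side_track(throw_switch, victim_dies):
    # Side-track: throwing the switch saves the five regardless of whether
    # the one on the side track dies (separate causal paths).
    return throw_switch

def five_saved_push(push, victim_dies):
    # Push: the five are saved only if the large man is pushed AND his body
    # stops the trolley (a single causal chain through the victim).
    return push and victim_dies

# Counterfactual test: holding the action fixed, does saving the five
# depend on the victim's death?
print(five_saved_side_track(True, True), five_saved_side_track(True, False))
# True True  -> no dependence: the death is a side effect
print(five_saved_push(True, True), five_saved_push(True, False))
# True False -> dependence: the death is a means to saving the five
```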

As discussed earlier, the doctrine of double effect draws a distinction between effects that were intended and effects that were merely foreseen (see also Dupoux & Jacob, 2007; Mikhail, 2007, 2009). Again, counterfactuals can help us to capture this difference. Rather than considering counterfactuals over events that happened in the world, we consider counterfactual contrasts over our model of how the agent made her decision. Assuming that an agent's decisions are determined by her mental states, such as her beliefs, desires, and intentions (Dennett, 1987), we can invert the causal process from decision to action and infer the agent's mental states from her actions (Baker et al., 2009; Yoshida et al., 2008). How can we explain that saving the five on the main track was an intended consequence of throwing the switch, whereas killing the one on the side track was a foreseen but unintended effect? We can do so by considering whether a particular effect influenced the agent's decision (Nanay, 2010; Sloman et al., 2012; Uttich & Lombrozo, 2010) or plan (Kleiman-Weiner et al., 2015).

Let us assume that Hank threw the switch in the side-track scenario. This action is consistent with different intentions that may have driven Hank's action. Hank may have intended to save the five people on the main track. He may have intended to kill that one person on the side track. Or he may have intended both. Let's first assume that Hank actually intended to save the five people on the main track, and he didn't intend to kill the person on the side track. If that's the case, then he would have also thrown the switch if there had been no person on the side track. In other words, the person on the side track made no difference to Hank's throwing the switch. In contrast, if the five people on the main track hadn't been there, then Hank would not have thrown the switch (assuming that people are generally lazy and don't just throw switches for no reason). Thus, the people on the main track, but not the person on the side track, made a difference to Hank's decision-making. By considering a causal model of the decision-maker, we can distinguish intended from foreseen effects: intended effects make a difference to the decision, whereas merely foreseen but not intended effects make no difference (Guglielmo & Malle, 2010).

Of course, Hank's throwing the switch in the side-track scenario is also consistent with the possibility that Hank intended to kill the person on the side track. If that was his intention, then he might not have thrown the switch if the person on the side track hadn't been present (unless he also intended to save the five). As long as we don't have reasons to the contrary, we are generally inclined to assume that a person is more likely to have good intentions (Mikhail, 2007, 2009)—thus, we would consider it more likely that when Hank threw the switch, he intended to save the five, rather than kill the one.

In a recent study, Kleiman-Weiner et al. (2015) showed a close correspondence between the inferences that people make about an actor's intentions and the extent to which they deem the person's action morally permissible. In their experiments, they varied the number of people on the main track and the number of people on the side track. In some of the situations, participants were informed that the person on the track was the decision-maker's brother. In situations in which the decision-maker threw the switch, participants generally judged that the decision-maker didn't intend to kill the people on the side track, and that he had no intention for those on the main track to be killed. However, participants' judgments were sensitive to the number of people on the different tracks and to whether the decision-maker's brother was involved. For example, participants were more inclined to say that the decision-maker actually intended to kill the people on the side track when there was only one anonymous person on the main track but five anonymous people on the side track. In that case, participants were also slightly less willing to say that the decision-maker intended to save the person on the main track. The decision-maker's action is consistent with a desire to kill as many people as possible.

Now consider a situation in which the decision-maker's brother is on the main track, there are five people on the side track, and the decision-maker throws the switch. In this case, participants are less inclined to believe that the decision-maker intended to kill the five people on the side track, and more likely to believe that the decision-maker's intention was to save his brother. While the decision-maker's action is still consistent with a desire to kill as many people as possible, we have a viable alternative explanation for why he acted the way he did. He may simply value his brother more than anonymous strangers. Kleiman-Weiner et al. (2015) show that a model of moral permissibility that combines inferences about a person's intention with a consideration of how many lives were saved and killed explains people's judgments very accurately.
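The counterfactual contrast over the decision model can also be sketched in code. The example below is our own illustration (not Kleiman-Weiner et al.'s model): it assumes a simple decision rule for Hank and treats an effect as intended if removing it from the scenario would flip the decision.

```python
# A minimal sketch of reading off intentions from counterfactual contrasts
# over a toy decision model: an effect counts as intended if its removal
# changes the agent's choice. The decision rule is an assumption made for
# the example.

def hank_throws_switch(five_on_main, one_on_side):
    # Assumed rule: Hank throws the switch only if doing so saves the five;
    # the person on the side track plays no role in the decision.
    return five_on_main

def intended_effects(decide, scenario):
    """Return the scenario features whose removal flips the decision."""
    actual = decide(**scenario)
    intended = []
    for feature in scenario:
        counterfactual = dict(scenario, **{feature: False})
        if decide(**counterfactual) != actual:
            intended.append(feature)
    return intended

scenario = {"five_on_main": True, "one_on_side": True}
print(intended_effects(hank_throws_switch, scenario))
# ['five_on_main'] -> saving the five was intended; the death of the one
# made no difference to the decision, so it was merely foreseen.
```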

What Counterfactual Contrasts Do People Consider?

So far, we have used counterfactual contrasts to help us make morally relevant distinctions. By defining counterfactual contrasts over the causal structure of the situation, we were able to tell apart outcomes that were side effects of actions from outcomes that were means for bringing about another outcome. By considering counterfactual contrasts over people's plans, we were able to tease apart outcomes that were intended from outcomes that were merely foreseen but not intended. But what kinds of counterfactual contrasts do people actually consider when they make moral judgments? A host of research on counterfactual thinking has demonstrated how some counterfactuals come to mind more easily than others (e.g., Kahneman & Miller, 1986; Phillips, Luguri, & Knobe, 2015; Roese, 1997). Maybe the notion of counterfactual availability can also help us make sense of people's moral judgments?

Waldmann and Dieterich (2007) have shown that participants find an intervention on the threat more permissible than an intervention on the victim. They contrasted the side-track scenario (threat intervention) with a scenario in which Hank can intervene by redirecting a bus containing the victim onto the train track and thereby stopping the train (victim intervention). In order to explain the pattern of people's judgments, Waldmann and Dieterich (2007) suggest that, depending on the type of intervention, participants selectively focus on different counterfactual contrasts (cf. Schaffer, 2010). When intervening on the threat, people compare the causal path the trolley would have taken with the path that it actually took. This counterfactual contrast highlights the difference between the five on the main track versus the one on the side track. In contrast, when intervening on the bus with the victim, the counterfactual contrast highlights what would have happened to the victim if Hank hadn't intervened and redirected the bus onto the train track. Here, the contrast is between the victim surviving and the victim dying. Waldmann and Dieterich (2007) further argue that the attentional focus triggered by the victim intervention leads to a neglect of other potential victims in the background (i.e., the five on the track)—a phenomenon they call intervention myopia (see also Waldmann & Wiegmann, 2010).

Based on these selective attention effects, Wiegmann and Waldmann (2014) have recently developed an account that explains transfer effects between moral dilemmas. Several studies have shown that the order in which different moral dilemmas are presented affects participants' judgments (e.g., Schwitzgebel & Cushman, 2012). For example, participants judge intervening in the switch scenario less permissible if they were first asked to make a judgment about the push scenario than when the order is reversed. Wiegmann and Waldmann (2014) explain this order effect by assuming that different scenarios make different causal paths and the associated counterfactual contrasts more salient. Their account focuses on an analysis of the causal structure that underlies the different moral dilemmas as well as people's default evaluations for the different cases (cf. Halpern & Hitchcock, 2015; Hall, 2007). In the switch scenario (Figure 29.7 a), the action has two effects via separate causal paths: saving the five and killing the one. When participants see the switch scenario in isolation, they tend to judge the person's action to be permissible. In line with the good intention prior (Mikhail, 2009), Wiegmann and Waldmann (2014) propose that participants selectively focus on the causal path from intervention to saving rather than the connection between intervening and killing (Figure 29.7 a). In the push scenario (Figure 29.7 b), there is a single causal path from intervening to saving via killing. Here, it is not possible to selectively focus on the relationship between intervening and saving, since the causal path is mediated via the killing of the large man. In contrast, the relationship between intervention and killing is salient (Figure 29.7 b).

The key idea is now that participants have a tendency to map salient aspects of the causal structure from one situation to another if such a mapping is possible (cf. Gentner, 1983; Holyoak et al., 2010). Consider a participant who judged the switch scenario before the push scenario (top pair in Figure 29.7). In the switch scenario, the causal path from intervention to saving is highlighted. However, it is not possible to map this path onto the causal structure of the push case, since there is no direct causal path between intervention and saving. In contrast, consider a participant who saw the push scenario before the switch scenario (bottom pair in Figure 29.7). The push scenario highlights the link between intervention and killing (what Waldmann & Dieterich, 2007, termed intervention myopia). Now, it is possible to map this highlighted causal path onto the causal structure in the switch case. Having judged the push scenario first highlights the relationship between intervention and killing in the switch scenario. Wiegmann and Waldmann (2014) argue that the reliable order effect arises from the asymmetric way in which selectively attended parts of the causal structure can be mapped from one situation to another.

In the trolley problems discussed earlier, the counterfactual analysis was fairly straightforward, since the vignettes explicitly stipulated the action–outcome contingency. In the real world, however, we normally cannot be certain about what would have happened in the relevant counterfactual world. We have to rely on our causal understanding of the situation to simulate how the world would have unfolded if the person had acted differently (see Gerstenberg & Tenenbaum, Chapter 27 in this volume). Generally, we cannot be sure that actions always bring about their intended effects (Cushman et al., 2009; Gerstenberg et al., 2010; Schächtele et al., 2011). Imagine that Hank threw the large man off the bridge, but it turned out that this didn't suffice to stop the trolley. In that case, Hank not only failed to save the five people on the track, he also killed an innocent man for no good effect. Studies have shown that people take the uncertainty associated with different actions into account when making moral evaluations (Fleischhut, 2013; Kortenkamp & Moore, 2014).

From Action Permissibility to Character Evaluation

In the previous section, we have seen how the way a person's action features in the causal structure that ultimately led to the outcome affects people's moral evaluations. We have also seen that people's judgments are not solely determined by the causal role that the action played in bringing about the outcome. The same action is evaluated differently based on the context of the situation and the inferences we can draw about the person's intentions from his actions. The majority of work on judgments in moral dilemmas has focused on explaining how people judge the moral permissibility of different actions.


Causation in Legal and Moral Reasoning Recently, scholars in moral psychology have argued that this focus on actions as the unit for moral evaluation is misguided. What people mostly care about is what the action re­ veals about the person (Goodwin, Piazza, & Rozin, 2014; Kelley & Stahelski, 1970; Malle, Guglielmo, & Monroe, 2014; Pizarro & Tannenbaum, 2011; Sripada, 2012; Uhlmann et al., 2013; Uhlmann et al., 2015; Wojciszke, Bazinska, & Jaworksi, 1999; Woolfolk et al., 2006). Rather than putting the action at the center of analysis, the person-centered approach sees the person as the key target for moral evaluation (cf. Uhlmann et al., 2015). People are evaluating creatures—upon meeting someone for the first time, we try to fig­ ure out what (p. 592) makes them tick (Alicke et al., 2015). Moral evaluations of a person’s character, such as whether he cares for others (Hamlin et al., 2007; Ullman et al., 2009), and whether he can be trusted (Charness & Dufwenberg, 2006; Rezlescu et al., 2012), are particularly important (Goodwin et al., 2014; Todorov et al., 2008). In contrast to the ac­ tion-centered view, the person-centered view focuses on people’s motivation for engaging in moral evaluations and is concerned with explaining what function these evaluations serve (Gintis et al., 2001, 2008). One such proposed function of moral evaluation is rela­ tionship regulation (Rai & Fiske, 2011; Scanlon, 2009). As “intuitive prosecutors,” our moral judgments serve to shape the world as we would like to see it (Fincham & Jaspars, 1980; Hamilton, 1980; Lloyd-Bostock, 1979; Tetlock et al., 2007). For example, we want that our friends care for us and we want to be able to rely on them when in need. By blaming them for not having helped us when we moved house, we send a signal to change their behavior in the future (Bottom et al., 2002; Scanlon, 2009). However, one might ar­ gue, why do we care to engage in moral evaluation of people with whom we have no di­ rect connection? There is a lot of blaming going on in sports bars! When our favorite foot­ ball team loses because one of the players slacked off, then we expect that player to put in more effort in the future. Even though he is not a close friend, our utility depends on him, and by blaming him we contribute to a public evaluation of the player that may in­ deed have an influence on his future behavior (McCullough et al., 2013). There is also a lot of blaming going on when people watch soap operas together (Hag­ mayer & Osman, 2012). Here, we know that the characters are fictional and our moral evaluations won’t influence the plot. However, we can again make sense of this behavior from a functional perspective: moral judgments in these contexts may serve to coordinate one’s expectations and norms with one’s friends. By blaming John for cheating on his girl­ friend, we demonstrate to our friends that we value faithfulness and expect our friends not to follow John’s example. With a focus on the person, rather than the action, the person-centered view needs to an­ swer the question of how we infer a person’s character traits from her actions—in partic­ ular, her moral traits. Actions differ in the extent to which they are diagnostic about a person’s dispositions (Reeder & Spores, 1983; Reeder et al., 2004; Snyder et al., 1979). 
We learn most about a person’s character from behaviors that differ from how we would have expected others to behave in the same situation (Ditto & Jemmott, 1989; Fiske, 1980; Jones & Harris, 1967; McKenzie & Mikkelsen, 2007; Reeder et al., 2004; Reeder & Brewer, 1979). But what shapes this expectation? One factor is the cost or effort it takes for someone to perform a certain action. We generally assume that others behave rationally and act in ways that achieve their desired outcomes efficiently (Dennett, 1987). Thus, the more costly or effortful a particular action was for the agent, the more certain we can be that he really valued the outcome (Jara-Ettinger et al., 2014; Ohtsubo & Watanabe, 2009; Ullman et al., 2009). Similarly, we learn that a person values something when he chooses it despite having an attractive alternative (Ben-Porath & Dekel, 1992). We know that a friend really cares for us when he comes over to help console us after a breakup even though he had been invited to the party of the year.

The time it took someone to make a decision is another factor that influences what we can learn about her motivation for acting (Crockett, 2013; Cushman, 2013; Pizarro et al., 2003). Critcher et al. (2012) found that actors who acted quickly were evaluated more positively for good outcomes and more negatively for bad ones, compared to actors who reached the same decision more slowly. Fast decisions signal that the actor was sure about her action and did not need to resolve any conflicting motives. A person who immediately rushes to help someone in need is likely to be more strongly motivated by another person’s needs than someone who first considers how much effort it would take to help (cf. Hoffman et al., 2015) and checks whether anyone else might be in a better position to help out.

Additional evidence in favor of the person-centered view of moral judgment comes from research showing that information about a person’s general character influences our moral evaluation of a particular action (Alicke, 1992; Kliemann et al., 2008; Nadler, 2012; Nadler & McDonnell, 2011). We find ways of blaming bad people (Alicke, 2000) and excusing people whom we like (Turri & Blouw, 2014). Often, there is considerable uncertainty about the motives behind a person’s action. Thus, interpreting the same action differently, depending on who performed it, need not reflect a biased evaluation. It may be reasonable to use character information to fill in the gaps (Gerstenberg et al., 2014; Uhlmann et al., 2015).

The person-centered view also provides a natural way of handling the expectations that come with occupying a certain role (p. 593) (Hamilton, 1978; Schlenker et al., 1994; Trope, 1986; Woolfolk et al., 2006). For example, if a swimmer is about to drown, then it is foremost the lifeguard’s responsibility to try to save him. If the swimmer drowned without anyone having helped, then we would blame the lifeguard more than any of the other people who were around and who could also have helped. Part of what it means to be a lifeguard is to have the (prospective) responsibility of making sure that everyone is safe in the water.

Direct empirical support for the person-centered approach comes from work showing dissociations between person and act evaluations (e.g., Tannenbaum et al., 2011; Uhlmann & Zhu, 2013). Such situations arise, for example, when someone takes the “right” action (for example, from a consequentialist perspective), but taking this action indicates a bad moral character (Bartels & Pizarro, 2011; Koenigs et al., 2007; Uhlmann et al., 2013). The act of throwing an injured person overboard in order to save the boat from sinking is evaluated more positively than not doing so. However, the passenger who decided to throw the injured person overboard was evaluated more negatively than a passenger who decided not to do so (Uhlmann et al., 2013). Similarly, the person-centered view helps to shed light on people’s moral evaluations of harmless-but-offensive transgressions (Haidt, 2001; Uhlmann et al., 2013). For example, most people consider it morally wrong to eat a dead dog, but often find themselves at a loss when trying to explain why (a phenomenon termed moral dumbfounding; Haidt et al., 1993). Uhlmann et al. (2013) showed that while the act of eating a dead dog is not evaluated more negatively than stealing food, the person who ate the dog is judged to be a worse person than the person who stole. Even though eating a dead dog didn’t harm anyone (as long as the dog wasn’t killed to be eaten), it is plausible that a person who commits such an act is also likely to engage in other dubious behavior that might actually be harmful. Interestingly, while Uhlmann et al. (2013) replicated the moral dumbfounding effect for judgments about actions, participants had much less difficulty justifying the character inferences they had drawn.

What the person-centered view highlights is the need for a rich model of how people make (moral) decisions. We need such a model in order to make inferences about the person’s mental states and preferences from his or her actions (Ajzen & Fishbein, 1975; Baker et al., 2009; Bratman, 1987; Malle & Knobe, 1997). For example, when Hank didn’t help a drowning swimmer, we need to infer what Hank’s beliefs were (maybe Hank thought the person was just pretending; Young & Saxe, 2011), and what Hank would have been capable of doing (maybe Hank wasn’t able to swim; Clarke, 1994; Jara-Ettinger et al., 2013; Kant, 2002; Morse, 2003; van Inwagen, 1978). Not only do we need a causal model that explains Hank’s actions in terms of his mental states, but we also want to be able to simulate how someone else (maybe with the same beliefs and capabilities as Hank, but with different desires) would have acted in the same situation.

Recently, Gerstenberg et al. (2014) have suggested an account that directly links inferences about a person’s character to evaluations of his or her behavior (cf. Johnson & Rips, 2015). In their studies, participants judged to what extent actors whose action was either expected or surprising were responsible for a positive or negative outcome. For example, in one scenario, participants evaluated goalkeepers in a penalty shoot-out. The goalkeepers knew about each striker’s tendency to shoot in one corner of the goal or the other. However, the strikers didn’t know that the goalkeepers knew about their tendency. Participants then saw situations in which the striker either shot in the expected corner or in the unexpected corner, and the goalkeeper either jumped in the correct corner and saved the ball, or jumped in the wrong direction.

The results showed that participants blamed the goalkeeper more for not saving the shot when he jumped in the unexpected direction and the striker shot in the expected direction. Participants also considered the goalkeeper more creditworthy overall when he saved a shot that was placed in the unexpected direction.

In another condition of the experiment, the goalkeeper scenario was replaced with a scenario in which the decision-maker had to predict on what color a spinner would land. This scenario was structurally equivalent to the goalkeeper scenario, and the probabilistic information was matched. Again, participants blamed decision-makers more for negative outcomes that resulted from unexpected predictions (e.g., predicting that the spinner would land on blue when the chance of its landing on yellow was greater, and it actually landed on yellow). This time, however, decision-makers were praised more for positive outcomes that resulted from expected rather than unexpected actions (i.e., correctly predicting that the spinner would land on the more probable color, rather than correctly predicting the less probable outcome).

How can we explain this pattern of results? Gerstenberg et al.’s (2014) account assumes that people’s responsibility judgments are mediated by an inference about the agent. Accordingly, we start off with some assumptions about how a reasonable person (or goalkeeper) is expected to act. After having observed what happened in this particular situation, we update our belief about the person. Gerstenberg et al. (2014) propose that responsibility judgments are closely related to how we change our expectations about people (cf. Ajzen & Fishbein, 1975). Intuitively, we credit people more if our expectation about a person improved after having observed their action. We blame people more for a negative outcome if our expectation about their future behavior is lowered. (p. 594)
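To make this expectation-change idea concrete, consider the following minimal sketch in Python. It is not the authors’ implementation; the agent types, prior probabilities, and success rates are invented for illustration. An observer holds a prior over what kind of agent she is facing, updates it after observing the action, and assigns credit in proportion to how much the expected future performance of the agent has changed:

def posterior(prior, likelihood):
    """Bayes' rule over a dict of type -> prior, given type -> P(observation | type)."""
    unnorm = {t: prior[t] * likelihood[t] for t in prior}
    z = sum(unnorm.values())
    return {t: p / z for t, p in unnorm.items()}

def expected_success(belief, p_success):
    """Expected probability that the agent succeeds next time."""
    return sum(belief[t] * p_success[t] for t in belief)

# Goalkeeper: a skilled keeper can genuinely anticipate an unexpected shot,
# so saving one is diagnostic of skill and raises our expectation (credit).
prior = {"skilled": 0.3, "unskilled": 0.7}
p_save = {"skilled": 0.8, "unskilled": 0.2}  # P(saves an unexpected shot | type)
credit = expected_success(posterior(prior, p_save), p_save) - expected_success(prior, p_save)
print(f"goalkeeper credit: {credit:+.3f}")   # clearly positive

# Spinner: virtually nobody can reliably call the less likely color, so the same
# observation barely moves the expectation. (The full model additionally evaluates
# the reasonableness of the choice itself, which can push the change negative.)
prior = {"clairvoyant": 0.01, "guessing": 0.99}
p_correct = {"clairvoyant": 0.9, "guessing": 0.25}
credit = expected_success(posterior(prior, p_correct), p_correct) - expected_success(prior, p_correct)
print(f"spinner credit:    {credit:+.3f}")   # close to zero

The asymmetry between the two scenarios falls out of the different priors, which is the core of the account described next.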

In their model, Gerstenberg et al. (2014) represent people’s intuitive theories about the situation as distributions over agents with different character traits or skills. The key idea is then that people have different prior assumptions for the goalkeeper and the spinner scenarios. A skilled goalkeeper may anticipate an unexpected shot and save it. For spinners, however, it is much less likely that a person can reliably predict a landing on the less likely color. Hence, if we observe a goalkeeper saving an unexpected ball, we may either think that he acted unreasonably and was just lucky, or that he in fact correctly anticipated the shot. If we deem the chances of skill being present to be reasonably high, then our expectations about this goalkeeper’s behavior increase after having seen him save the unexpected ball (and more so compared to a situation in which he saved an expected shot). In contrast, in the spinner scenario, the most likely explanation for a person correctly predicting that the spinner would land on the unexpected outcome is that he was just lucky. Observing such an action actually leads us to lower our expectation about that person’s behavior, and thus he deserves less credit.

Gerstenberg et al. (2014) applied their model to an achievement domain in which skill is a critical factor. However, their model could be extended and applied more directly to the moral domain. The key idea of linking people’s moral evaluations to a difference in expectations is flexible enough to accommodate different aspects of the situation, such as immoral desires or ulterior motives.

One natural way to think about the role of expectations in judgments of responsibility is again in terms of a counterfactual contrast. Rather than thinking about how the outcome would have been different if the person had acted differently, we may think about what would have happened if we had replaced the person in the situation with someone else (cf. Fincham & Jaspars, 1983). In the law, this idea is referred to as the reasonable-man test (Green, 1967). In cases of negligence, for example, we may have expectations about what sort of precautions a reasonable person would have taken that might have prevented the harm from happening. Relatedly, there is a statistic in baseball called wins above replacement that captures the difference that a player makes to the number of games that a team wins over the course of the season (Jensen, 2013). It tries to quantify how many more games the team won over the season compared to a counterfactual team in which the player under consideration had been replaced with another player.

Thinking about moral evaluations in terms of counterfactual replacements provides a rich framework that links up normative expectations, action evaluation, and character inferences. The richness of this framework comes with great theoretical demands. Not only do we need a causal representation of the situation that allows us to reason about the relationship between the person’s action and the outcome, but we also require an intuitive theory of the different factors that influence how people make decisions and plans, and of how these personal characteristics translate into morally relevant behavior.
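A counterfactual replacement test can be sketched just as compactly. In the toy version below (all competence values are hypothetical, not taken from the chapter), responsibility is the gap between what actually happened and the outcome we would expect had the agent been swapped for a random member of the relevant reference class. Choosing lifeguards rather than ordinary bystanders as the reference class raises the replacement standard, and hence the blame:

import random

def replacement_baseline(population, n=10_000, seed=0):
    """Expected outcome (1 = harm prevented) had a randomly drawn member of
    `population` been in the agent's place; entries are success probabilities."""
    rng = random.Random(seed)
    return sum(rng.random() < rng.choice(population) for _ in range(n)) / n

def responsibility(actual_outcome, population):
    """Positive -> credit (did better than a replacement); negative -> blame."""
    return actual_outcome - replacement_baseline(population)

bystanders = [0.20, 0.25, 0.30]  # hypothetical competence at saving a swimmer
lifeguards = [0.80, 0.85, 0.90]

print(responsibility(0, bystanders))  # ~ -0.25: mild blame for a bystander
print(responsibility(0, lifeguards))  # ~ -0.85: strong blame for the lifeguard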

Discussion

In this section on moral reasoning, we have seen that people’s moral judgments are strongly influenced by their causal representation of the situation, as well as by their intuitive theory of how people make decisions. To evaluate the causal role that the agent’s action played, we need a causal model of the situation that supports the consideration of counterfactual contrasts. Our discussion of the literature on trolley problems (Waldmann et al., 2012) showed that we can distinguish means from side effects in terms of counterfactuals on actions, and intended from merely foreseen outcomes in terms of counterfactuals on plans (cf. Kleiman-Weiner et al., 2015). If we additionally assume that some counterfactual contrasts are more salient than others, we can also make sense of why people’s moral intuitions differ depending on whether the action targets the threat or the victim (Iliev et al., 2012; Waldmann & Dieterich, 2007), and we can use the idea of salient causal paths to explain transfer effects between trolley problems that differ in their causal structure (Wiegmann & Waldmann, 2014).

To evaluate what we can learn about a person from his or her action, we need a causal model of how people make decisions and plans (Baker et al., 2009; Wellman & Gelman, 1992). Once we have a generative model of how people’s mental states determine (p. 595) their actions, we can invert this process and reason about a person’s mental states from having observed her actions. Making person inferences comes naturally to us, and we have argued that evaluating others along moral character dimensions serves important functions, such as regulating relationships and coordinating normative expectations (Rai & Fiske, 2011; Uhlmann et al., 2015). Finally, we have briefly sketched one way of going from character evaluations to attributions of responsibility via considering counterfactuals over persons: blame and credit vary as a function of our expectations about how a reasonable person should have acted in the given situation (Gerstenberg et al., 2014).
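The inversion described here can be illustrated with a deliberately simple rational-choice model (a sketch in the spirit of Baker et al., 2009, not their model; the noise parameter and candidate values are arbitrary assumptions). If agents help whenever the value they place on helping sufficiently exceeds its cost, then observing someone help at a high cost licenses a stronger inference that they genuinely care:

from math import exp

def p_help(value, cost, noise=0.5):
    """Soft rational choice: helping is more likely the more value exceeds cost."""
    return 1 / (1 + exp(-(value - cost) / noise))

def inferred_value(helped, cost, values=(0.0, 1.0, 2.0, 3.0)):
    """Posterior mean of the agent's valuation, uniform prior over `values`."""
    like = [p_help(v, cost) if helped else 1 - p_help(v, cost) for v in values]
    z = sum(like)
    return sum(v * l for v, l in zip(values, like)) / z

print(inferred_value(True, cost=0.1))   # ~1.8: helping when easy is weak evidence
print(inferred_value(True, cost=2.5))   # ~2.6: helping at high cost signals caring
print(inferred_value(False, cost=2.5))  # ~1.1: not helping when costly lowers the estimate only mildly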


Considering both the person-centered and action-centered views suggests that differences in people’s moral judgments can arise from different sources: (a) Two people might disagree about the causal status of the person’s action in bringing about the outcome. One person, for example, might believe that the outcome would have happened anyway, whereas the other might believe that the negative outcome would have been prevented but for the person’s action. (b) Two people might disagree about what the action reveals about the person. Whereas the same action might look like an expression of genuine altruism to one person, another person might infer ulterior motives behind the action. We suggest that rather than artificially providing people with all the relevant information, as is often done in psychological studies on legal and moral judgments, it will be fruitful to design experiments that reflect the uncertainty inherent in our everyday lives. New empirical investigations into how people make person inferences and causal judgments under uncertainty will have to go hand in hand with the development of a coherent formal framework that encompasses both the action-centered and the person-centered views.

Conclusions

Causality is at the core of people’s understanding of the physical and social world. In this chapter, we have shown that causation is key to legal and moral reasoning. We have seen that legal scholars and psychologists struggle with very similar issues. Legal scholars seek a principled account of causation that can be applied to complex scenarios and that fits with our common-sense intuitions. Here, formal work on causal models and structural equations (e.g., Halpern & Pearl, 2005) has helped to sharpen our theories of causation. The idea of thinking about actual causes as difference-makers under possible contingencies resonates well with how the law has extended the but-for test of causation (e.g., Stapleton, 2008). Legal decision-making also requires people to assess the evidence presented to support their conclusions. Causal reasoning is again critical, both in terms of the stories people tell to make sense of the evidence, and in terms of the methods they use to assess its credibility and reliability. Moreover, causal attributions seem to be shaped by pragmatic goals. People’s everyday causal judgments often serve to attribute responsibility and blame, mirroring the way in which legal judgments of causation are geared toward ultimate judgments of legal responsibility.

Psychologists also try to come up with a principled account of how people make moral judgments. We have shown that representing moral dilemmas in terms of causal models and counterfactuals helps us understand people’s judgments. Moreover, we argue that a fuller picture of how people make moral evaluations requires a shift of focus from the moral permissibility of actions to the broader issue of what actions reveal about the person’s character. Whereas the law has traditionally put less emphasis on the perpetrator’s character (Bayles, 1982; Duff, 1993), work in psychology has shown that our moral evaluations are heavily influenced by our inferences about what the person is like (Uhlmann et al., 2015). Thus, there is a possible tension between the factors on which people base their intuitive moral judgments and the factors the law deems relevant in determining legal liability.

For both legal and moral reasoning, then, it is crucial to understand the causal models that people construct, and the rich factual and counterfactual inferences they draw on this basis. Pragmatic and emotive concerns might shape and possibly distort these models, but without them we cannot get started on the route to blame, praise, or indifference.
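The difference-making idea invoked here can also be stated in a few lines of code. The sketch below is a simplified reading of the structural-model approach (Halpern & Pearl, 2005), not their full definition: an action counts as an actual cause if flipping it changes the outcome, possibly after freezing some other variables at stipulated contingency values. The two-assassin scenario is a standard overdetermination example, not one from this chapter:

from itertools import combinations

def outcome(a_poisons, b_poisons):
    return int(a_poisons or b_poisons)  # victim dies if either assassin acts

def but_for(target, setting, f):
    """Plain but-for test: does flipping `target` alone change the outcome?"""
    flipped = dict(setting, **{target: 1 - setting[target]})
    return f(**setting) != f(**flipped)

def but_for_under_contingency(target, actual, f, others):
    """Retry the but-for test while freezing subsets of other variables at 0."""
    for k in range(len(others) + 1):
        for frozen in combinations(others, k):
            setting = dict(actual, **{v: 0 for v in frozen})
            if but_for(target, setting, f):
                return True
    return False

actual = {"a_poisons": 1, "b_poisons": 1}
print(but_for("a_poisons", actual, outcome))                                   # False
print(but_for_under_contingency("a_poisons", actual, outcome, ["b_poisons"]))  # True

The plain but-for test fails because the other assassin’s act overdetermines the outcome; under the contingency in which that act is held absent, each act makes a difference, matching both the legal extension of the but-for test and the structural definitions.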

References

Ajzen, I., & Fishbein, M. (1975). A Bayesian analysis of attribution processes. Psychological Bulletin, 82(2), 261–277.
Alicke, M. D. (1992). Culpable causation. Journal of Personality and Social Psychology, 63(3), 368–378.
Alicke, M. D. (2000). Culpable control and the psychology of blame. Psychological Bulletin, 126(4), 556–574.
Alicke, M. D., Mandel, D. R., Hilton, D., Gerstenberg, T., & Lagnado, D. A. (2015). Causal conceptions in social explanation and moral evaluation: A historical tour. Perspectives on Psychological Science, 10(6), 790–812.
Baker, C. L., Saxe, R., & Tenenbaum, J. B. (2009). Action understanding as inverse planning. Cognition, 113(3), 329–349.
Bartels, D. M., & Pizarro, D. A. (2011). The mismeasure of morals: Antisocial personality traits predict utilitarian responses to moral dilemmas. Cognition, 121(1), 154–161.
Ben-Porath, E., & Dekel, E. (1992). Signaling future actions and the potential for sacrifice. Journal of Economic Theory, 57(1), 36–51.
Bennett, W. L., & Feldman, M. (1981). Reconstructing reality in the courtroom. New Brunswick, NJ: Rutgers University Press. (p. 597)

Blanchard, T., & Schaffer, J. (2016). Cause without default. In H. Beebee, C. Hitchcock, & H. Price (Eds.), Making a difference. Oxford: Oxford University Press.
Bottom, W. P., Gibson, K., Daniels, S. E., & Murnighan, J. K. (2002). When talk is not cheap: Substantive penance and expressions of intent in rebuilding cooperation. Organization Science, 13(5), 497–513.
Bratman, M. (1987). Intention, plans, and practical reason. Center for the Study of Language and Information.
Chakroff, A., Dungan, J., & Young, L. (2013). Harming ourselves and defiling others: What determines a moral domain? PLoS ONE, 8(9), e74434.
Charness, G., & Dufwenberg, M. (2006). Promises and partnership. Econometrica, 74(6), 1579–1601.

Chockler, H., & Halpern, J. Y. (2004). Responsibility and blame: A structural-model approach. Journal of Artificial Intelligence Research, 22(1), 93–115.
Clarke, R. (1994). Ability and responsibility for omissions. Philosophical Studies, 73(2), 195–208.
Connor Desai, S., Reimers, S., & Lagnado, D. A. (2016). Consistency and credibility in legal reasoning: A Bayesian network approach. In A. Papafragou, D. Grodner, D. Mirman, & J. C. Trueswell (Eds.), Proceedings of the 38th Annual Conference of the Cognitive Science Society (pp. 626–631). Austin, TX: Cognitive Science Society.
Critcher, C. R., Inbar, Y., & Pizarro, D. A. (2012). How quick decisions illuminate moral character. Social Psychological and Personality Science, 4(3), 308–315.
Crockett, M. J. (2013). Models of morality. Trends in Cognitive Sciences, 17(8), 363–366.
Cushman, F. A. (2008). Crime and punishment: Distinguishing the roles of causal and intentional analyses in moral judgment. Cognition, 108(2), 353–380.
Cushman, F. (2013). Action, outcome, and value: A dual-system framework for morality. Personality and Social Psychology Review, 17(3), 273–292.
Cushman, F., Dreber, A., Wang, Y., & Costa, J. (2009). Accidental outcomes guide punishment in a “trembling hand” game. PLoS ONE, 4(8), e6699.
Cushman, F., & Young, L. (2011). Patterns of moral judgment derive from nonmoral psychological representations. Cognitive Science, 35(6), 1052–1075.
Darley, J. M., & Latané, B. (1968). Bystander intervention in emergencies: Diffusion of responsibility. Journal of Personality and Social Psychology, 8(4), 377–383.
Darwall, S. L. (Ed.) (2003a). Consequentialism. Oxford: Blackwell.
Darwall, S. L. (Ed.) (2003b). Deontology. Oxford: Blackwell.
Darwall, S. L. (Ed.) (2003c). Virtue ethics. Oxford: Blackwell.
Dawid, A. P., & Evett, I. W. (1997). Using a graphical method to assist the evaluation of complicated patterns of evidence. Journal of Forensic Science, 42, 226–231.
Dennett, D. C. (1987). The intentional stance. Cambridge, MA: MIT Press.
Ditto, P. H., & Jemmott, J. B. (1989). From rarity to evaluative extremity: Effects of prevalence information on evaluations of positive and negative characteristics. Journal of Personality and Social Psychology, 57, 16–26.
Dowe, P. (2000). Physical causation. Cambridge: Cambridge University Press.


Driver, J. (2008). Attributions of causation and moral responsibility. In W. Sinnott-Armstrong (Ed.), Moral psychology: The cognitive science of morality: Intuition and diversity (Vol. 2). Cambridge, MA: MIT Press.
Dupoux, E., & Jacob, P. (2007). Universal moral grammar: A critical appraisal. Trends in Cognitive Sciences, 11(9), 373–378.
Feigenson, N. R. (1996). The rhetoric of torts: How advocates help jurors think about causation, reasonableness and responsibility. Hastings Law Journal, 47, 61–165.
Fenton, N., & Neil, M. (2012). Risk assessment and decision analysis with Bayesian networks. Boca Raton, FL: CRC Press.
Fenton, N., Neil, M., & Lagnado, D. A. (2013). A general structure for legal arguments about evidence using Bayesian networks. Cognitive Science, 37, 61–102.
Fincham, F. D., & Jaspars, J. M. (1980). Attribution of responsibility: From man the scientist to man as lawyer. In L. Berkowitz (Ed.), Advances in experimental social psychology (Vol. 13, pp. 81–138). New York: Academic Press.
Fincham, F. D., & Jaspars, J. M. (1983). A subjective probability approach to responsibility attribution. British Journal of Social Psychology, 22(2), 145–161.
Fiske, S. T. (1980). Attention and weight in person perception: The impact of negative and extreme behavior. Journal of Personality and Social Psychology, 38(6), 889–906.
Fleischhut, N. (2013). Moral judgment and decision making under uncertainty. Unpublished PhD thesis, Humboldt University, Berlin, Germany.
Foot, P. (1967). The problem of abortion and the doctrine of the double effect. Oxford Review, 5, 4–15.
Fumerton, R., & Kress, K. (2001). Causation and the law: Preemption, lawful sufficiency and causal sufficiency. Law and Contemporary Problems, 64, 83–105.
Gentner, D. (1983). Structure-mapping: A theoretical framework for analogy. Cognitive Science, 7(2), 155–170.
Gerstenberg, T., Goodman, N. D., Lagnado, D. A., & Tenenbaum, J. B. (2015). How, whether, why: Causal judgments as counterfactual contrasts. In D. C. Noelle et al. (Eds.), Proceedings of the 37th Annual Conference of the Cognitive Science Society (pp. 782–787). Austin, TX: Cognitive Science Society.
Gerstenberg, T., & Lagnado, D. A. (2010). Spreading the blame: The allocation of responsibility amongst multiple agents. Cognition, 115(1), 166–171.
Gerstenberg, T., Halpern, J. Y., & Tenenbaum, J. B. (2015). Responsibility judgments in voting scenarios. In D. C. Noelle et al. (Eds.), Proceedings of the 37th Annual Conference of the Cognitive Science Society (pp. 788–793). Austin, TX: Cognitive Science Society.

Gerstenberg, T., & Lagnado, D. A. (2012). When contributions make a difference: Explaining order effects in responsibility attributions. Psychonomic Bulletin & Review, 19(4), 729–736.
Gerstenberg, T., Lagnado, D. A., & Kareev, Y. (2010). The dice are cast: The role of intended versus actual contributions in responsibility attribution. In S. Ohlsson & R. Catrambone (Eds.), Proceedings of the 32nd Annual Conference of the Cognitive Science Society (pp. 1697–1702). Austin, TX: Cognitive Science Society.
Gerstenberg, T., Ullman, T. D., Kleiman-Weiner, M., Lagnado, D. A., & Tenenbaum, J. B. (2014). Wins above replacement: Responsibility attributions as counterfactual replacements. In P. Bello, M. Guarini, M. McShane, & B. Scassellati (Eds.), Proceedings of the 36th Annual Conference of the Cognitive Science Society (pp. 2263–2268). Austin, TX: Cognitive Science Society.
Gintis, H., Henrich, J., Bowles, S., Boyd, R., & Fehr, E. (2008). Strong reciprocity and the roots of human morality. Social Justice Research, 21(2), 241–253. (p. 598)

Gintis, H., Smith, E. A., & Bowles, S. (2001). Costly signaling and cooperation. Journal of Theoretical Biology, 213(1), 103–119.
Glöckner, A., & Engel, C. (2013). Can we trust intuitive jurors? Standards of proof and the probative value of evidence in coherence-based reasoning. Journal of Empirical Legal Studies, 10, 230–252.
Goldman, A. I. (1999). Why citizens should vote: A causal responsibility approach. Social Philosophy and Policy, 16(2), 201–217.
Goodman, N. D., Tenenbaum, J. B., & Gerstenberg, T. (2015). Concepts in a probabilistic language of thought. In E. Margolis & S. Laurence (Eds.), The conceptual mind: New directions in the study of concepts (pp. 623–653). Cambridge, MA: MIT Press.
Goodwin, G. P., Piazza, J., & Rozin, P. (2014). Moral character predominates in person perception and evaluation. Journal of Personality and Social Psychology, 106(1), 148–168.
Gopnik, A., & Wellman, H. M. (1992). Why the child’s theory of mind really is a theory. Mind & Language, 7(1–2), 145–171.
Green, E. (1967). The reasonable man: Legal fiction or psychosocial reality? Law & Society Review, 2, 241–258.
Green, S. (2015). Causation in negligence. Oxford: Hart.
Greene, E. J., & Darley, J. M. (1998). Effects of necessary, sufficient, and indirect causation on judgments of criminal liability. Law and Human Behavior, 22(4), 429–451.
Greene, J., & Haidt, J. (2002). How (and where) does moral judgment work? Trends in Cognitive Sciences, 6(12), 517–523.

Griffiths, T. L., & Tenenbaum, J. B. (2009). Theory-based causal induction. Psychological Review, 116(4), 661–716.
Guglielmo, S., & Malle, B. F. (2010). Can unintended side effects be intentional? Resolving a controversy over intentionality and morality. Personality and Social Psychology Bulletin, 36(12), 1635–1647.
Guglielmo, S., Monroe, A. E., & Malle, B. F. (2009). At the heart of morality lies folk psychology. Inquiry: An Interdisciplinary Journal of Philosophy, 52(5), 449–466.
Hagmayer, Y., & Osman, M. (2012). From colliding billiard balls to colluding desperate housewives: Causal Bayes nets as rational models of everyday causal reasoning. Synthese, 189, 17–28.
Haidt, J. (2001). The emotional dog and its rational tail: A social intuitionist approach to moral judgment. Psychological Review, 108(4), 814–834.
Haidt, J., Koller, S. H., & Dias, M. G. (1993). Affect, culture, and morality, or is it wrong to eat your dog? Journal of Personality and Social Psychology, 65(4), 613–628.
Haidt, J., Rozin, P., McCauley, C., & Imada, S. (1997). Body, psyche, and culture: The relationship between disgust and morality. Psychology & Developing Societies, 9(1), 107–131.
Hall, N. (2007). Structural equations and causation. Philosophical Studies, 132, 109–136.
Halpern, J. Y., & Hitchcock, C. (2015). Graded causation and defaults. British Journal for the Philosophy of Science, 66, 413–457.
Halpern, J. Y., & Pearl, J. (2005). Causes and explanations: A structural-model approach. Part I: Causes. The British Journal for the Philosophy of Science, 56(4), 843–887.
Hamilton, V. L. (1978). Who is responsible? Toward a social psychology of responsibility attribution. Social Psychology, 41(4), 316–328.
Hamilton, V. L. (1980). Intuitive psychologist or intuitive lawyer? Alternative models of the attribution process. Journal of Personality and Social Psychology, 39(5), 767–772.
Hamlin, J. K., Wynn, K., & Bloom, P. (2007). Social evaluation by preverbal infants. Nature, 450(7169), 557–559.
Hart, H. L. A., & Honoré, T. (1959/1985). Causation in the law. Oxford: Oxford University Press.
Hastie, R. (1999). The role of stories in civil jury judgments. University of Michigan Journal of Law Reform, 32, 227–239.
Heider, F. (1958). The psychology of interpersonal relations. New York: John Wiley & Sons.


Heller, K. (2006). The cognitive psychology of circumstantial evidence. Michigan Law Review, 105, 243–305.
Hepler, A. B., Dawid, A. P., & Leucari, V. (2007). Object-oriented graphical representations of complex patterns of evidence. Law, Probability & Risk, 6, 275–293.
Herring, J. (2010). Criminal law: Texts, cases, and materials (4th ed.). Oxford: Oxford University Press.
Hilton, D. J., McClure, J., & Sutton, R. M. (2010). Selecting explanations from causal chains: Do statistical principles explain preferences for voluntary causes? European Journal of Social Psychology, 40(3), 383–400.
Hitchcock, C. (2012). Portable causal dependence: A tale of consilience. Philosophy of Science, 79(5), 942–951.
Hitchcock, C., & Knobe, J. (2009). Cause and norm. The Journal of Philosophy, 106(11), 587–612.
Hoffmann, Rt. Hon. Lord (2011). Causation. In R. Goldberg (Ed.), Perspectives on causation (pp. 3–9). Oxford; Portland, OR: Hart.
Hoffman, M., Yoeli, E., & Nowak, M. A. (2015). Cooperate without looking: Why we care what people think and not just what they do. Proceedings of the National Academy of Sciences, 112(6), 1727–1732.
Hogarth, R. M., & Einhorn, H. J. (1992). Order effects in belief updating: The belief-adjustment model. Cognitive Psychology, 24, 1–55.
Holyoak, K. J., Lee, H. S., & Lu, H. (2010). Analogical and category-based inference: A theoretical integration with Bayesian causal models. Journal of Experimental Psychology: General, 139(4), 702–727.
Iliev, R. I., Sachdeva, S., & Medin, D. L. (2012). Moral kinematics: The role of physical factors in moral judgments. Memory & Cognition, 40(8), 1387–1401.
Jara-Ettinger, J., Kim, N., Muentener, P., & Schulz, L. E. (2014). Running to do evil: Costs incurred by perpetrators affect moral judgment. In P. Bello, M. Guarini, M. McShane, & B. Scassellati (Eds.), Proceedings of the 36th Annual Conference of the Cognitive Science Society (pp. 684–688). Austin, TX: Cognitive Science Society.
Jara-Ettinger, J., Tenenbaum, J. B., & Schulz, L. E. (2013). Not so innocent: Reasoning about costs, competence, and culpability in very early childhood. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th Annual Conference of the Cognitive Science Society (pp. 663–668). Austin, TX: Cognitive Science Society.
Jensen, S. (2013). A statistician reads the sports pages: Salaries and wins in baseball. CHANCE, 26(1), 47–52.

Jern, A., Chang, K. K., & Kemp, C. (2014). Belief polarization is not always irrational. Psychological Review, 121(2), 206–224.
Johnson, S. G., & Rips, L. J. (2015). Do the right thing: The assumption of optimality in lay decision theory and causal judgment. Cognitive Psychology, 77, 42–76.
Jones, E. E., & Harris, V. A. (1967). The attribution of attitudes. Journal of Experimental Social Psychology, 3(1), 1–24.
Kahneman, D., & Miller, D. T. (1986). Norm theory: Comparing reality to its alternatives. Psychological Review, 93(2), 136–153.
Kahneman, D., & Tversky, A. (1982). The simulation heuristic. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases (pp. 201–208). Cambridge: Cambridge University Press. (p. 599)

Kamm, F. M. (2007). Intricate ethics. Oxford: Oxford University Press.
Kant, I. (1785/2002). Groundwork for the metaphysics of morals. New Haven, CT; London: Yale University Press.
Kelley, H. H., & Stahelski, A. J. (1970). The inference of intentions from moves in the prisoner’s dilemma game. Journal of Experimental Social Psychology, 6(4), 401–419.
Kerr, N. L. (1996). “Does my contribution really matter?”: Efficacy in social dilemmas. European Review of Social Psychology, 7(1), 209–240.
Kleiman-Weiner, M., Gerstenberg, T., Levine, S., & Tenenbaum, J. B. (2015). Inference of intention and permissibility in moral decision making. In D. C. Noelle et al. (Eds.), Proceedings of the 37th Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society.
Kliemann, D., Young, L., Scholz, J., & Saxe, R. (2008). The influence of prior record on moral judgment. Neuropsychologia, 46(12), 2949–2957.
Knobe, J. (2009). Folk judgments of causation. Studies in the History and Philosophy of Science, 40(2), 238–242.
Knobe, J., & Fraser, B. (2008). Causal judgment and moral judgment: Two experiments. In W. Sinnott-Armstrong (Ed.), Moral psychology (pp. 441–447). Cambridge, MA: MIT Press.
Koenigs, M., Young, L., Adolphs, R., Tranel, D., Cushman, F., Hauser, M., & Damasio, A. (2007). Damage to the prefrontal cortex increases utilitarian moral judgements. Nature, 446(7138), 908–911.
Kominsky, J. F., Phillips, J., Gerstenberg, T., Lagnado, D., & Knobe, J. (2015). Causal superseding. Cognition, 137, 196–209.


Kortenkamp, K. V., & Moore, C. F. (2014). Ethics under uncertainty: The morality and appropriateness of utilitarianism when outcomes are uncertain. The American Journal of Psychology, 127(3), 367–382.
Kuhn, D., Weinstock, M., & Flaton, R. (1994). How well do jurors reason? Competence dimensions of individual variation in a juror reasoning task. Psychological Science, 5, 289–296.
Lagnado, D. A. (2011). Causal thinking. In P. M. Illari, F. Russo, & J. Williamson (Eds.), Causality in the sciences (pp. 129–149). Oxford: Oxford University Press.
Lagnado, D. A., & Channon, S. (2008). Judgments of cause and blame: The effects of intentionality and foreseeability. Cognition, 108, 754–770.
Lagnado, D. A., Gerstenberg, T., & Zultan, R. (2013). Causal responsibility and counterfactuals. Cognitive Science, 37, 1036–1073.
Lagnado, D. A., Fenton, N., & Neil, M. (2013). Legal idioms: A framework for evidential reasoning. Argument and Computation, 4, 46–53.
Lagnado, D. A., & Harvey, N. (2008). The impact of discredited evidence. Psychonomic Bulletin & Review, 15(6), 1166–1173.
Latané, B. (1981). The psychology of social impact. American Psychologist, 36(4), 343–356.
Lewis, D. (1986). Causal explanation. In Philosophical papers (Vol. 2, pp. 214–240). New York: Oxford University Press.
Griffiths, T. L., Lieder, F., & Goodman, N. D. (2015). Rational use of cognitive resources: Levels of analysis between the computational and the algorithmic. Topics in Cognitive Science, 7, 217–229.
Lloyd-Bostock, S. (1979). The ordinary man, and the psychology of attributing causes and responsibility. The Modern Law Review, 42(2), 143–168.
Lombrozo, T. (2010). Causal-explanatory pluralism: How intentions, functions, and mechanisms influence causal ascriptions. Cognitive Psychology, 61(4), 303–332.
Malle, B. F., Guglielmo, S., & Monroe, A. E. (2014). A theory of blame. Psychological Inquiry, 25(2), 147–186.
Malle, B. F., & Knobe, J. (1997). The folk concept of intentionality. Journal of Experimental Social Psychology, 33, 101–121.
Mandel, D. R. (2011). Mental simulation and the nexus of causal and counterfactual explanation. In C. Hoerl, T. McCormack, & S. R. Beck (Eds.), Understanding counterfactuals, understanding causation: Issues in philosophy and psychology (pp. 146–170). Oxford: Oxford University Press.

McCullough, M. E., Kurzban, R., & Tabak, B. A. (2013). Cognitive systems for revenge and forgiveness. Behavioral and Brain Sciences, 36(1), 1–15.
McKenzie, C. R., & Mikkelsen, L. A. (2007). A Bayesian view of covariation assessment. Cognitive Psychology, 54(1), 33–61.
Mikhail, J. (2007). Universal moral grammar: Theory, evidence and the future. Trends in Cognitive Sciences, 11(4), 143–152.
Mikhail, J. (2009). Moral grammar and intuitive jurisprudence: A formal model of unconscious moral and legal knowledge. Psychology of Learning and Motivation, 50, 27–100.
Moore, M. S. (2009). Causation and responsibility: An essay in law, morals, and metaphysics. Oxford: Oxford University Press.
Morse, S. J. (2003). Diminished rationality, diminished responsibility. Ohio State Journal of Criminal Law, 1, 289–308.
Nadler, J. (2012). Blaming as a social process: The influence of character and moral emotion on blame. Law & Contemporary Problems, 2, 1–31.
Nadler, J., & McDonnell, M.-H. (2011). Moral character, motive, and the psychology of blame. Cornell Law Review, 97.
Nanay, B. (2010). Morality or modality? What does the attribution of intentionality depend on? Canadian Journal of Philosophy, 40(1), 25–39.
Ohtsubo, Y., & Watanabe, E. (2009). Do sincere apologies need to be costly? Test of a costly signaling model of apology. Evolution and Human Behavior, 30(2), 114–123.
Paul, L. A., & Hall, N. (2013). Causation: A user’s guide. Oxford: Oxford University Press.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems. San Francisco: Morgan Kaufmann.
Pearl, J. (2000). Causality: Models, reasoning and inference. New York: Cambridge University Press.
Pennington, N., & Hastie, R. (1981). Juror decision making models: The generalization gap. Psychological Bulletin, 89, 246–287.
Pennington, N., & Hastie, R. (1986). Evidence evaluation in complex decision making. Journal of Personality and Social Psychology, 51(2), 242–258.
Pennington, N., & Hastie, R. (1988). Explanation-based decision making: The effects of memory structure on judgment. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 521–533.


Pennington, N., & Hastie, R. (1992). Explaining the evidence: Tests of the story model for juror decision making. Journal of Personality and Social Psychology, 62(2), 189–206.
Phillips, J., Luguri, J., & Knobe, J. (2015). Unifying morality’s influence on non-moral judgments: The relevance of alternative possibilities. Cognition, 145, 30–42.
Pizarro, D. A., & Tannenbaum, D. (2011). Bringing character back: How the motivation to evaluate character influences judgments of moral blame. In M. Mikulincer & P. R. Shaver (Eds.), The social psychology of morality: Exploring the causes of good and evil (pp. 91–108). Washington, DC: APA Press. (p. 600)
Pizarro, D. A., Uhlmann, E., & Salovey, P. (2003). Asymmetry in judgments of moral blame and praise: The role of perceived metadesires. Psychological Science, 14(3), 267–272.
Quinn, W. S. (1989). Actions, intentions, and consequences: The doctrine of double effect. Philosophy & Public Affairs, 18, 334–351.
Rai, T. S., & Fiske, A. P. (2011). Moral psychology is relationship regulation: Moral motives for unity, hierarchy, equality, and proportionality. Psychological Review, 118(1), 57–75.
Reeder, G. D., & Brewer, M. B. (1979). A schematic model of dispositional attribution in interpersonal perception. Psychological Review, 86(1), 61–79.
Reeder, G. D., & Spores, J. M. (1983). The attribution of morality. Journal of Personality and Social Psychology, 44(4), 736–745.
Reeder, G. D., Vonk, R., Ronk, M. J., Ham, J., & Lawrence, M. (2004). Dispositional attribution: Multiple inferences about motive-related traits. Journal of Personality and Social Psychology, 86(4), 530–544.
Rezlescu, C., Duchaine, B., Olivola, C. Y., & Chater, N. (2012). Unfakeable facial configurations affect strategic choices in trust games with or without information about past behavior. PLoS ONE, 7(3), e34293.
Roese, N. J. (1997). Counterfactual thinking. Psychological Bulletin, 121(1), 133–148.
Rumelhart, D. E., & McClelland, J. L. (1986). Parallel distributed processing: Explorations in the microstructure of cognition, Vol. 1: Foundations. Cambridge, MA: MIT Press.
Samland, J., & Waldmann, M. R. (2015). Highlighting the causal meaning of causal test questions in contexts of norm violations. In D. C. Noelle et al. (Eds.), Proceedings of the 37th Annual Conference of the Cognitive Science Society (pp. 2092–2097). Austin, TX: Cognitive Science Society.
Scanlon, T. M. (2009). Moral dimensions. Cambridge, MA: Harvard University Press.


Schächtele, S., Gerstenberg, T., & Lagnado, D. A. (2011). Beyond outcomes: The influence of intentions and deception. In L. Carlson, C. Hölscher, & T. Shipley (Eds.), Proceedings of the 33rd Annual Conference of the Cognitive Science Society (pp. 1860–1865). Austin, TX: Cognitive Science Society.
Schaffer, J. (2010). Contrastive causation in the law. Legal Theory, 16(4), 259–297.
Schank, R. C., & Abelson, R. P. (1977). Scripts, plans, goals, and understanding. Hillsdale, NJ: Lawrence Erlbaum Associates.
Schlenker, B. R., Britt, T. W., Pennington, J., Murphy, R., & Doherty, K. (1994). The triangle model of responsibility. Psychological Review, 101(4), 632–652.
Schum, D. A. (1994). The evidential foundations of probabilistic reasoning. Evanston, IL: Northwestern University Press.
Schwitzgebel, E., & Cushman, F. (2012). Expertise in moral reasoning? Order effects on moral judgment in professional philosophers and non-philosophers. Mind & Language, 27(2), 135–153.
Shaver, K. G. (1985). The attribution of blame: Causality, responsibility, and blameworthiness. New York: Springer-Verlag.
Simon, D., Snow, C., & Read, S. J. (2004). The redux of cognitive consistency theories: Evidence judgments by constraint satisfaction. Journal of Personality and Social Psychology, 86, 814–837.
Simon, D., & Holyoak, K. J. (2002). Structural dynamics of cognition: From consistency theories to constraint satisfaction. Personality and Social Psychology Review, 6, 283–294.
Sloman, S. A. (2009). Causal models: How people think about the world and its alternatives. New York: Oxford University Press.
Sloman, S. A., Fernbach, P. M., & Ewing, S. (2009). Causal models: The representational infrastructure for moral judgment. In D. M. Bartels, C. Bauman, L. J. Skitka, & D. L. Medin (Eds.), Moral judgment and decision making. The psychology of learning and motivation: Advances in research and theory (pp. 1–26). Amsterdam: Elsevier.
Sloman, S. A., Fernbach, P. M., & Ewing, S. (2012). A causal model of intentionality judgment. Mind and Language, 27(2), 154–180.
Sloman, S. A., & Lagnado, D. (2015). Causality in thought. Annual Review of Psychology, 66(1), 223–247.
Smart, J. J. C., & Williams, B. (1973). Utilitarianism: For and against. Cambridge: Cambridge University Press.


Snyder, M. L., Kleck, R. E., Strenta, A., & Mentzer, S. J. (1979). Avoidance of the handicapped: An attributional ambiguity analysis. Journal of Personality and Social Psychology, 37(12), 2297–2306.
Spellman, B. A., & Kincannon, A. (2001). The relation between counterfactual (“but for”) and causal reasoning: Experimental findings and implications for jurors’ decisions. Law and Contemporary Problems, 64(4), 241–264.
Sripada, C. S. (2012). Mental state attributions and the side-effect effect. Journal of Experimental Social Psychology, 48(1), 232–238.
Sripada, C. S., & Stich, S. (2006). A framework for the psychology of norms. In P. Carruthers, S. Laurence, & S. Stich (Eds.), The innate mind (pp. 280–301). New York: Oxford University Press.
Stapleton, J. (2008). Choosing what we mean by “causation” in the law. Missouri Law Review, 73(2), 433–480.
Stapleton, J. (2009). Causation in the law. In H. Beebee, P. Menzies, & C. Hitchcock (Eds.), The Oxford handbook of causation (pp. 744–769). Oxford: Oxford University Press.
Tadros, V. (2007). Criminal responsibility. Oxford: Oxford University Press.
Tannenbaum, D., Uhlmann, E. L., & Diermeier, D. (2011). Moral signals, public outrage, and immaterial harms. Journal of Experimental Social Psychology, 47(6), 1249–1254.
Taroni, F., Aitken, C., Garbolino, P., & Biedermann, A. (2006). Bayesian networks and probabilistic inference in forensic science. Chichester, UK: John Wiley & Sons.
Tetlock, P. E., Kristel, O. V., Elson, S. B., Green, M. C., & Lerner, J. S. (2000). The psychology of the unthinkable: Taboo trade-offs, forbidden base rates, and heretical counterfactuals. Journal of Personality and Social Psychology, 78(5), 853–870.
Tetlock, P. E., Visser, P. S., Singh, R., Polifroni, M., Scott, A., Elson, S. B., Mazzocco, P., & Rescober, P. (2007). People as intuitive prosecutors: The impact of social-control goals on attributions of responsibility. Journal of Experimental Social Psychology, 43(2), 195–209.
Thagard, P. (2000). Coherence in thought and action. Cambridge, MA: MIT Press.
Thomson, J. J. (1976). Killing, letting die, and the trolley problem. The Monist, 59(2), 204–217.
Thomson, J. J. (1985). The trolley problem. The Yale Law Journal, 94(6), 1395–1415.
Todorov, A., Said, C. P., Engell, A. D., & Oosterhof, N. N. (2008). Understanding evaluation of faces on social dimensions. Trends in Cognitive Sciences, 12(12), 455–460.
Trope, Y. (1986). Identification and inferential processes in dispositional attribution. Psychological Review, 93(3), 239–257.

Turri, J., & Blouw, P. (2014). Excuse validation: A study in rule-breaking. Philosophical Studies, 172(3), 615–634. (p. 601)

Uhlmann, E. L., Pizarro, D. A., & Diermeier, D. (2015). A person-centered approach to moral judgment. Perspectives on Psychological Science, 10(1), 72–81.
Uhlmann, E. L., & Zhu, L. L. (2013). Acts, persons, and intuitions: Person-centered cues and gut reactions to harmless transgressions. Social Psychological and Personality Science, 5(3), 279–285.
Uhlmann, E. L., Zhu, L. L., & Tannenbaum, D. (2013). When it takes a bad person to do the right thing. Cognition, 126(2), 326–334.
Ullman, T. D., Tenenbaum, J. B., Baker, C. L., Macindoe, O., Evans, O. R., & Goodman, N. D. (2009). Help or hinder: Bayesian models of social goal inference. In Advances in neural information processing systems (Vol. 22, pp. 1874–1882).
Uttich, K., & Lombrozo, T. (2010). Norms inform mental state ascriptions: A rational explanation for the side-effect effect. Cognition, 116(1), 87–100.
Uustalu, O. (2013). The role of spontaneous evaluations, counterfactual thinking, and expert testimony in causal and blame judgments. Unpublished MSc thesis, University College London.
van Inwagen, P. (1978). Ability and responsibility. The Philosophical Review, 87(2), 201–224.
Waldmann, M. R., Hagmayer, Y., & Blaisdell, A. P. (2006). Beyond the information given: Causal models in learning and reasoning. Current Directions in Psychological Science, 15(6), 307–311.
Waldmann, M. R., & Dieterich, J. H. (2007). Throwing a bomb on a person versus throwing a person on a bomb: Intervention myopia in moral intuitions. Psychological Science, 18(3), 247–253.
Waldmann, M. R., Nagel, J., & Wiegmann, A. (2012). Moral judgment. In K. J. Holyoak & R. G. Morrison (Eds.), The Oxford handbook of thinking and reasoning (pp. 364–389). New York: Oxford University Press.
Waldmann, M. R., & Wiegmann, A. (2010). A double causal contrast theory of moral intuitions in trolley dilemmas. In S. Ohlsson & R. Catrambone (Eds.), Proceedings of the 32nd Annual Conference of the Cognitive Science Society (pp. 2589–2594). Austin, TX: Cognitive Science Society.
Wallez, C., & Hilton, D. (2015). Unpublished data.
Wellman, H. M., & Gelman, S. A. (1992). Cognitive development: Foundational theories of core domains. Annual Review of Psychology, 43(1), 337–375.


Wiegmann, A., & Waldmann, M. R. (2014). Transfer effects between moral dilemmas: A causal model theory. Cognition, 131(1), 28–43.
Wigmore, J. H. (1913). The problem of proof. Illinois Law Review, 8(2), 77–103.
Wojciszke, B., Bazinska, R., & Jaworski, M. (1998). On the dominance of moral categories in impression formation. Personality and Social Psychology Bulletin, 24(12), 1251–1263.
Woolfolk, R. L., Doris, J. M., & Darley, J. M. (2006). Identification, situational constraint, and social cognition: Studies in the attribution of moral responsibility. Cognition, 100(2), 283–301.
Yoshida, W., Dolan, R. J., & Friston, K. J. (2008). Game theory of mind. PLoS Computational Biology, 4(12), e1000254.
Young, L., & Saxe, R. (2011). When ignorance is no excuse: Different roles for intent across moral domains. Cognition, 120(2), 202–214.
Zultan, R., Gerstenberg, T., & Lagnado, D. A. (2012). Finding fault: Counterfactuals and causality in group attributions. Cognition, 125(3), 429–440.

(p. 602)

Notes:

(1.) Adapted from Bolitho v. City and Hackney Health Authority [1997] UKHL 46. This case is also discussed in Schaffer (2010).

(2.) This holds for issues of foreseeability, too; thus UK law maintains that an action is intentional if the result was almost certain to occur given the defendant’s actions and the defendant was aware of this (Herring, 2010).

(3.) The term applies to both legal theorists and judges.

(4.) See, e.g., Kennedy 2007 UKHL 38.

(5.) Readers who have not seen Breaking Bad should skip this example until they have!

(6.) Empirical research backs this up insofar as Hank is the most commonly cited cause; but it also shows that people attribute some causality to Tuco as well (Wallez & Hilton, 2015).

(7.) This is a simplification, because the legal issue to be decided will dictate the focus of the causal inquiry about what happened. Thus, for a murder charge, the focus will be on the action of the defendant. Nevertheless, legal judgment is not supposed to enter at this stage, and an “objective” notion of causation is usually assumed (Hart & Honoré, 1985; Moore, 2009), but see Green (2015) and Tadros (2007) for alternative views.

(8.) R v. Rafferty, 2007, EWCA Crim 1846.

(9.) R v. Nedrick [1986] 3 All ER 1 Court of Appeal.

(10.) Despite the subtlety of this case, people’s intuitive judgments seem to fit with the legal reasoning here (see later discussion).

(11.) This need not mean it is not principled, but that the principles are difficult to articulate, and might include legal/policy considerations.

(12.) Saunders v. Adams, 117 So. 72 (Ala. 1928).

(13.) Note that this response is not always available—sometimes the pre-empted cause would have brought about the effect in exactly the same way (see the voting example later in the chapter). Moreover, the level of description of the outcome required by law is matched to the offense, and is not excessively fine-grained (for discussion, see Stapleton, 2008, and Lewis, 2000).

(14.) Adapted from German Court case: (37 BGHSt 106, 6 July 1990).

(15.) Stapleton’s point that causal claims are relative to a frame of inquiry is well taken. And legal frames help filter and narrow the focus of inquiry onto a manageable set of putative causes: the defendant’s actions or breach of duty, intervening actions of other parties, and so on. Indeed, inquiry-based filtering also holds in our everyday causal inquiries, where we usually only care about a limited pool of candidate causes, and safely ignore a multitude of but-for conditions. However, her claim that the inquiry-relative nature of legal causation militates against a general-purpose theory is less convincing (cf. Schaffer, 2010), especially since, as we shall argue later, the legal account she proposes is in many respects an informal version of the definition of actual causation defended in several current philosophical theories.

(16.) This is not intended as a reductive analysis of causation—because counterfactuals themselves depend on prior causal knowledge, and work as a test of causation, not a constitutive definition. Moreover, this is not to claim that issues of process and production are fully accommodated by counterfactual analyses.

(17.) Although not classified as such, these graphs bear strong resemblances to formal causal networks.

(18.) Later studies showed that people’s stories were mediators in their decision-making, and not merely post hoc rationalizations of their verdicts (e.g., Pennington & Hastie, 1992).

(19.) Pennington and Hastie (1988) acknowledge some of these shortcomings and propose a computational model based on connectionist models of explanatory coherence (Thagard, 2000). We discuss coherence-based models later; while such models address how multiple constraints can be satisfied, it is unclear how purely associative representations can capture certain aspects of causal reasoning (Waldmann, Hagmayer, & Blaisdell, 2006).


(20.) Again, Pennington and Hastie (1988) acknowledge this shortcoming—and indeed in an early paper (Pennington & Hastie, 1981) discuss various aspects of witness credibility and reliability.

(21.) For introductions to Bayesian networks, see Fenton and Neil (2012) and Taroni et al. (2006).

(22.) Note that Pennington and Hastie (1981) present a Wigmore chart of this issue that maps closely onto our Bayesian network analysis. However, their subsequent work does not develop this approach, but concentrates on the story structures themselves.

(23.) It’s not always clear how the valence of these links is determined—presumably through prior knowledge or learning.

(24.) This argument is not conclusive: it depends on how rich the Bayesian modeling is. Coherence theorists refer to a simplistic version restricted to Bayes’ rule, but broader Bayesian approaches can accommodate some form of bi-directional reasoning (see later in this chapter and Jern et al., 2014).

(25.) The authors state that they are operating with an informal notion of sufficiency—as used in everyday discourse—rather than a logical or technical notion. On this reading, an action (e.g., getting Joe to ingest the poisonous pill) can be sufficient for an outcome (e.g., killing Joe), even if that exact outcome does not occur (e.g., because Joe dies in an accident before the pill poisons him). Although they do not express it in such terms, this is akin to a counterfactual notion of sufficiency (i.e., the poison would have killed Joe if he hadn’t died in an accident first).

(26.) Note that participants made both cause and blame judgments, and were explicitly instructed that these two might dissociate. This was illustrated with the example of a child who accidentally shoots one of his parents (unfortunately not as far-fetched an example as it seems).

(27.) Alternatively, one can also try to combine the best of both worlds. There are several different dual-systems frameworks which state that moral judgments are produced by qualitatively different, and potentially conflicting, cognitive systems (e.g., an emotional and a deliberate system, Greene et al., 2001; or systems that assign value directly to actions versus the outcomes that ultimately result from these actions, Crockett, 2013; Cushman, 2013).

David A. Lagnado

Department of Experimental Psychology, University College London, London, England, UK

Tobias Gerstenberg


Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA


The Role of Causal Knowledge in Reasoning About Mental Disorders

The Role of Causal Knowledge in Reasoning About Mental Disorders   Woo-kyoung Ahn, Nancy S. Kim, and Matthew S. Lebowitz The Oxford Handbook of Causal Reasoning Edited by Michael R. Waldmann Print Publication Date: Jun 2017 Subject: Psychology, Cognitive Psychology Online Publication Date: May 2017 DOI: 10.1093/oxfordhb/9780199399550.013.31

Abstract and Keywords

Despite the lack of scientific consensus about the etiologies of mental disorders, practicing clinicians and laypeople alike hold beliefs about the causes of mental disorders, and about the causal relations among symptoms and associated characteristics of mental disorders. This chapter summarizes research on how such causal knowledge systematically affects judgments about the diagnosis, prognosis, and treatment of mental disorders. During diagnosis, causal knowledge affects weighting of symptoms, perception of normality of behaviors, ascriptions of blame, and adherence to the DSM-based diagnostic categories. Regarding prognosis, attributing mental disorders to genetic or neurobiological abnormalities in particular engenders prognostic pessimism. Finally, both clinicians and laypeople endorse medication more strongly as an effective treatment if they believe mental disorders are biologically caused rather than psychologically caused. They also do so when considering disorders in the abstract versus equivalent concrete cases. The chapter discusses the rationality, potential mechanisms, and universality of these phenomena.

Keywords: causal, diagnosis, prognosis, treatment, mental disorder

Lifetime prevalence rates for mental disorders are surprisingly high—in the double digits in every country examined by the World Health Organization (WHO; Kessler et al., 2007). In the United States, the country with the highest lifetime prevalence studied by the WHO, about half of the population has had a diagnosable mental disorder at least once throughout the life span. In any given year, approximately one-quarter of adults in the United States meet diagnostic criteria for one or more disorders (Kessler et al., 2005). In addition, mental disorders have been identified as one of the three most costly categories of medical conditions (Keyes, 2007); Insel (2008) estimated the direct and indirect economic burden of serious mental illness at around $317 billion annually in the United States alone.


Unfortunately, the causes of mental disorders are still unclear and controversial, and consensus about the etiologies of mental disorders has eluded researchers. Thus, the Introduction of the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5; American Psychiatric Association, 2013), which serves as the official nosology of mental disorders in the United States and a number of other countries, states that “a complete description of the underlying pathological processes is not possible for most mental disorders” (p. xi). Despite rapid gains made by research in clinical neuroscience, “scientists have not identified a biological cause of, or even a reliable biomarker for, any mental disorder” (Deacon, 2013, p. 847). Although many mental disorders have a heritable component, few specific genes have been associated with any mental disorder; namely, “we do not have and are not likely to ever discover ‘genes for’ psychiatric illness” (Kendler, 2005, p. 1250).

Given this lack of scientific consensus, an important question is how practicing mental health clinicians and laypeople conceptualize the etiologies of (p. 604) disorders, and how such conceptualizations subsequently affect their diagnostic and clinical reasoning. We begin by discussing causality in clinicians’ and laypeople’s concepts of mental disorders. After reviewing evidence suggesting an abundance of different causal theories about mental disorders, the bulk of this chapter is devoted to reviewing the consequences of such causal explanations. These reviews are organized based on three critical aspects of reasoning when dealing with mental disorders: diagnosis, judgments of prognosis, and decisions about treatment.1 We review both clinicians’ and laypeople’s reasoning whenever possible. In addition, most of the effects of causal knowledge demonstrated in this domain may not be restricted to reasoning involving mental disorders. Instead, they may stem from more domain-general cognitive biases, which we include in our discussion.

Causality in Mental Disorder Concepts
Despite the lack of scientific consensus about the etiologies of mental disorders, practicing clinicians as well as laypeople appear to have beliefs or hunches about the causes of mental disorders, and about the causal relations among symptoms of mental disorders. For example, when Kim and Ahn (2002b) asked practicing clinicians to specify any relations at all among the symptoms within mental disorders, 97% of all relations that the clinicians drew and labeled were causal relations or relations that imply causality (Carey, 1985; Wellman, 1990). This finding suggests that causality is important in mental health clinicians’ concepts of mental disorders. For familiar disorders such as major depression, anorexia nervosa, and borderline personality disorder, clinicians were also reasonably in agreement with each other regarding the causal structure of the symptom-to-symptom relations in the disorder. Laypeople also agreed with the general causal structure of clinicians’ theories, suggesting that these theories (at a broad level) are understandable in common-sense terms (Ahn & Kim, 2005; Kim & Ahn, 2002a).
The causality inherent in people’s concepts of mental disorders suggests that people may essentialize them (Medin & Ortony, 1989). That is, do people believe that mental disorder categories have underlying, fundamental essences that cause the surface symptoms, such that within each mental disorder, a single essence shared by all instances of the disorder serves as a common cause for the surface symptoms (e.g., Ahn et al., 2001; Waldmann & Hagmayer, 2006; see the top panel of Figure 30.1 for an illustration)? Borsboom and Cramer (2013) argued that, unlike medical diseases, mental disorders do not have common-cause structures in reality. To understand this contrast, consider a brain tumor that causes headaches, forgetfulness, and foggy eyesight. Here, the tumor is the root cause of the symptoms and is separate from the symptoms occurring as a consequence of that cause: one can have headaches without a brain tumor, and a brain tumor can conversely exist without headaches. However, consider a mental disorder, such as major depression (MD), and its core symptoms (e.g., feeling sad or disinterested). It is highly unlikely that one can be depressed without feeling sad or disinterested; rather, depression has been defined by its symptoms (Borsboom & Cramer, 2013; but see Bhugra & Mastrogianni, 2004, for a consideration of cross-cultural differences). Similarly, in the case of a substance use disorder, the presence of the symptoms (e.g., using a substance) is necessary to say that the disorder is present. Thus, Borsboom and Cramer (2013) argued that, rather than being represented in a common-cause structure wherein symptoms are only directly causally connected to the essence, (p. 605) mental disorders have symptom-to-symptom causal relations in reality (as can be seen in many Bayesian network representations discussed elsewhere in this volume: Griffiths, Chapter 7; Rehder, Chapter 20; Rottman, Chapter 6; see the bottom panel of Figure 30.1 for an illustration).

Figure 30.1 Illustration of essentialized and non-essentialized causal structures.
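To make the contrast between the two panels concrete, the two candidate representations can be written down as directed causal graphs. The following sketch is our illustration, not material from the chapter or the studies it reviews; the disorder essence, symptom names, and edges are hypothetical placeholders.

```python
# Two candidate representations of a mental disorder category,
# written as directed causal graphs (parent -> list of children).
# All node names and edges are hypothetical placeholders.

# Essentialized, common-cause structure (cf. top panel of Figure 30.1):
# a single latent essence directly causes every surface symptom.
common_cause = {
    "essence": ["sadness", "insomnia", "fatigue"],
    "sadness": [],
    "insomnia": [],
    "fatigue": [],
}

# Non-essentialized symptom network (cf. bottom panel of Figure 30.1):
# no latent root; symptoms cause one another directly.
symptom_network = {
    "insomnia": ["fatigue"],
    "fatigue": ["sadness"],
    "sadness": ["insomnia"],  # feedback loop among symptoms
}

def roots(graph):
    """Return nodes that no other node points to (candidate root causes)."""
    children = {child for kids in graph.values() for child in kids}
    return [node for node in graph if node not in children]

print(roots(common_cause))    # ['essence'] -- a single root cause exists
print(roots(symptom_network)) # []          -- no root; only symptom-to-symptom links
```

On this rendering, Borsboom and Cramer’s claim is that the second graph, with no privileged root node, is the better metaphysical picture of most mental disorders, whereas the laypeople studied by Ahn et al. (2006) appear to reason as though the first graph were true.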

Note that Borsboom and Cramer’s (2013) discussion is concerned with the metaphysics of mental disorders, rather than with people’s beliefs about mental disorders per se. Do people represent mental disorders as causal networks of symptoms or as separate sets of common-cause structures? Ahn, Flanagan, Marsh, and Sanislow (2006) found that laypeople (i.e., undergraduate students) generally essentialize mental disorders, seemingly adhering to the notion of underlying common-cause structures. For example, laypeople in their study believed that for a given mental disorder, there must be an underlying cause that is necessary and sufficient for the disorder and that also causes its surface symptoms. Practicing clinicians, however, were more ambivalent about such statements, neither agreeing nor disagreeing with them. The reasons for such ambivalence are unclear; Ahn et al. (2006) used DSM-IV disorders that were judged to be familiar to clinicians and laypeople, but it is possible that the DSM-IV taxonomy itself does not correspond to clinicians’ own taxonomy. For instance, rather than being construed as a single essentialized disorder, major depression may be construed by clinicians as having several distinct subtypes, each of which may be essentialized as a relatively distinct “disorder.”

Effects of Causal Knowledge on Diagnosis
How does causal knowledge affect the diagnosis of mental disorders? In the United States, formal diagnoses of mental disorders are made based on the DSM. As mentioned earlier, etiologies in psychopathology are controversial. As a result, the modern editions of the DSM (i.e., DSM-III, APA, 1980; DSM-III-R, APA, 1988; DSM-IV, APA, 1994; DSM-IV-TR, APA, 2000; DSM-5, APA, 2013) have all adopted a descriptive approach intended “to be neutral with respect to theories of etiology” (APA, 1994, pp. xvii–xviii). Thus, most mental disorders in the DSM-5 (APA, 2013) are currently defined in terms of a set of surface symptoms or conditions the patient must meet for diagnosis (in addition to functional impairment). For example, schizophrenia is defined as having two or more of the following five symptoms (along with an impaired level of functioning): hallucinations, delusions, disorganized speech, grossly disorganized or catatonic behavior, and negative symptoms. Thus, if clinicians follow the prescribed diagnostic approach of the DSM, they will search for symptoms in their patients that match the DSM diagnostic criteria and make diagnoses accordingly, without incorporating any additional notions they may have of how these symptoms may affect each other or, for many disorders, of what caused these symptoms in the first place. Furthermore, in most cases, all symptoms are weighted equally in the DSM.
Nevertheless, causal knowledge affects various aspects of the diagnosis of mental disorders. In this section, we review how causal knowledge determines which symptoms are seen as especially important, which can eventually affect how mental disorders are diagnosed. In addition, we review how causal knowledge affects judgments of how abnormal a person’s behaviors are. While these topics concern the effects of existing causal knowledge, we also review recent studies on factors affecting the way that laypeople and clinicians infer biological or psychological bases of mental disorders in the first place.
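The purely descriptive, equal-weight logic of such a criterion is easy to state exactly. The following sketch is our illustration of a DSM-style “two or more of five symptoms” rule as described in the text; the threshold check and function names are our own, not an official algorithm.

```python
# A DSM-style descriptive criterion: diagnose when at least 2 of 5
# listed symptoms are present (plus functional impairment).
# Symptom labels follow the schizophrenia example discussed in the text;
# the check itself is an illustrative sketch, not an official procedure.

CRITERIA = [
    "hallucinations",
    "delusions",
    "disorganized speech",
    "grossly disorganized or catatonic behavior",
    "negative symptoms",
]

def meets_criteria(present_symptoms, impaired_functioning, threshold=2):
    # Every criterion symptom is weighted equally: we only count matches.
    count = sum(1 for symptom in CRITERIA if symptom in present_symptoms)
    return impaired_functioning and count >= threshold

print(meets_criteria({"delusions", "negative symptoms"}, True))  # True
print(meets_criteria({"delusions"}, True))                       # False: only 1 of 5
```

Causal knowledge plays no role in such a rule; the findings reviewed next show that clinicians’ actual judgments nonetheless systematically depart from it.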


Effects of Causal Knowledge on Feature Weighting: Causal Status Effect
Kim and Ahn (2002a, 2002b) found from both clinicians and laypeople that when symptoms are causally related, the ones that are seen as causing other symptoms are regarded as more important and essential to the disorder concept than their effects. We have called this general phenomenon the causal status effect (Ahn, 1998). Namely, a feature serving as a cause in a causal relationship is considered more central than a feature serving as an effect, when all else is held equal (see Ahn & Kim, 2000; Sloman, Love, & Ahn, 1998, for a discussion of additional factors that should be expected to interact with the causal status effect in determining feature centrality). For instance, according to DSM-IV, “distorted body image” and “absence of period (in women) for more than 3 menstrual cycles” were both required in order to warrant a diagnosis of anorexia nervosa, making these two symptoms equally important for classification.2 However, according to data collected from clinicians by Kim and Ahn (2002a), “distorted body image” was the most causally central of the criteria in the clinicians’ theories, whereas “absence of the period (in women) for more than 3 menstrual cycles” was rated the most causally peripheral. Furthermore, “distorted body image” was considered to be the most diagnostically important of the criteria, and (p. 606) “absence of the period (in women) for more than 3 menstrual cycles,” though also a DSM-IV diagnostic criterion for anorexia nervosa, was considered to be the least diagnostically important.
The causal status hypothesis can readily be understood intuitively, and real-life examples of it are abundant. We tend to form an illness concept based on the virus (e.g., Ebola; influenza) that causes the symptoms (e.g., fever; coughing) rather than on the symptoms per se. DNA structure causes many other properties of plants and animals (e.g., appearance; mechanism of reproduction), and hence is considered the most important feature in their classification. Indeed, after DNA sequencing became available, some species were reclassified in the Linnaean taxonomy. For example, the domestic dog was once considered its own separate species, but in 1993 the Smithsonian Institution and the American Society of Mammalogists reclassified it as a subspecies of the gray wolf, largely based on genetic information that had become newly available. In law, the severity of crimes often depends more on the suspects’ intentions than on their surface behaviors (e.g., conspiracy to commit murder is typically a much more serious offense than involuntary manslaughter, even if the victim died only in the second case). In judging whether people are nice, we tend to place more weight on their intentions (i.e., what motivated or caused their actions) than on what they did.
Indeed, the causal status effect has been demonstrated not only in the domain of mental disorders, but also with controlled artificial stimuli (e.g., Ahn, Kim, Lassaline, & Dennis, 2000) and across various types of categories (see Ahn & Kim, 2000, for a more detailed overview of the causal status effect). For instance, different features are central for natural kinds and artifacts: in natural kinds, internal or molecular features are more conceptually central than functional features, but in artifacts, functional features are more conceptually central than internal or molecular features (e.g., Barton & Komatsu, 1989; Gelman, 1988; Keil, 1989; Rips, 1989). Ahn (1998) showed that the causal status effect can explain this phenomenon: in natural kinds, internal/molecular features tend to cause functional features (e.g., cow DNA determines whether or not cows give milk), but for artifacts, functional features determine compositional structure (e.g., chairs are used for sitting, and for that reason they are made of a hard substance).
The causal status effect appears to be fairly cognitively primitive, as shown by two sets of studies. First, the effect has been demonstrated with young children (Ahn, Gelman, Amsterlaw, Hohenstein, & Kalish, 2000), although it might not be innate, as Meunier and Cordier (2008) found the causal status effect with 5-year-olds but not with 4-year-olds in biological categorization. Second, although a number of studies had supported the notion that theory-based categorization is a fundamentally slower, more deliberately executed process than similarity-based categorization (e.g., Smith & Sloman, 1994), Luhmann, Ahn, and Palmeri (2006) found the causal status effect even under speeded categorization conditions. They showed that the time required to make judgments using causal knowledge or theory was equivalent to that using base-rate information about features, which is traditionally considered to reflect a more rapid, associative reasoning process.
In the domain of mental disorder diagnosis, Flores, Cobos, López, Godoy, and González-Martín (2014) examined whether the use of causal theories is the result of fast and automatic processes that take place very early on in clinical reasoning to comprehend cases, or of slow, deliberate processes triggered only when clinicians are asked to make a diagnostic judgment. To test these possibilities, they presented clinicians with information that was either consistent or inconsistent with widely accepted causal theories and measured clinicians’ reading times. For instance, according to a relatively well-established causal theory in clinical science regarding eating disorders, a strong fear of gaining weight (X) causes a refusal to maintain a minimal body weight (Y), which in turn causes overt behaviors such as a strict diet, vomiting, and laxative abuse (Z). Therefore, a clinical report stating that X is followed by Y, which is followed by Z, would be consistent with this causal theory, whereas Z followed by Y, which is followed by X, would be inconsistent. The idea is that inconsistent ordering, as in the latter case, would slow down the clinicians’ reading times if their causal theories are spontaneously activated while reading the report. The authors found that temporal order manipulated based on the causal theories affected clinicians’ reading times but not the reading times of students (who presumably do not have pre-existing causal theories).

Rationale for the Causal Status Effect
Why would causally central features be conceptually central as well, as repeatedly shown in demonstrations of the causal status effect within and outside of the mental disorder domain? One of the most (p. 607) important functions of concepts is to allow us to make inductive inferences about unknown features (e.g., Anderson, 1990). People appear to believe that, given X causing Y, hypothetical features associated with the cause (X) (e.g., other effects besides Y that might be caused by or associated with X) are more likely to be present than unknown features associated with the effect (Y), even when the associative strengths are equal (Proctor & Ahn, 2007).
Suppose a clinician learned that his patient has chronic feelings of emptiness (feature X), and she devotes herself to work to the exclusion of friendships and leisure (feature Y). The clinician might spontaneously infer that this patient has chronic feelings of emptiness because she is excessively devoted to work (Y→X). The idea is that given this causal knowledge, the clinician would believe a potential symptom associated with the cause feature (e.g., paying extraordinary attention to checking for possible mistakes, which might be associated with excessive devotion to work) is more likely to be true than an unknown symptom associated with the effect feature (e.g., impulsive harmful actions, which might be associated with chronic feelings of emptiness). A different clinician might infer a different causal relation, for example, that this patient is excessively devoted to work because she chronically feels emptiness (X→Y). Given the opposite causal direction, the opposite inferences would be made: this clinician would believe that impulsive harmful actions would be more likely to be present than paying extraordinary attention to checking for possible mistakes.
Proctor and Ahn (2007) tested this idea by experimentally manipulating the causal relations between identical features. Participants—students and practicing clinicians—were first presented with a pair of mental disorder symptoms for a patient. In one condition, they were told that symptom X causes symptom Y. In the other condition, participants learned that Y causes X. Then participants were asked about the likelihood that the target person would have other symptoms. One of the questions asked about the likelihood of a feature (X' henceforth) that is judged to be more strongly associated with X than with Y. Similarly, Y' was judged to be more strongly associated with Y than with X. When they were told that X causes Y, participants judged that X' would be more likely to be present, whereas when told that Y causes X, they judged that Y' would be more likely to be present. That is, their inductive judgments about an unknown feature were guided by the causal status of the features with which the unknown feature was associated. Note that these results are difficult to explain by a purely associationist account, because the associative strengths between X and X' and between Y and Y' were equated in the study (see Le Pelley, Griffiths, & Beesley, Chapter 2 in this volume, for a similar discussion). Because people appear to believe that cause features allow for more inductive inferences (as demonstrated in the preceding study), they may weigh cause features more heavily than effect features.
Another possible reason for weighing causes more strongly than effects is based on psychological essentialism (Medin & Ortony, 1989; see Ahn, 1998; Ahn & Kim, 2000, for discussion of psychological essentialism with respect to the causal status effect).

If a mental disorder is believed to be essentialized, with a common-cause structure involving multiple levels of causal relations as illustrated in the top panel of Figure 30.1, then the closer a feature is to the disorder’s fundamental common cause, the more useful it tends to be as an indicator of the presence or absence of this common cause or causal essence. For instance, in the top panel of Figure 30.1, feature B would be more indicative of feature A than feature C would be. To explain, note that in most causal relations an event can be brought about by more than one possible cause, so that if one believes that A causes B, knowing that B is present does not guarantee that A must have been present; B could have been caused by an event other than A. Furthermore, if one believes that A causes B, which in turn causes C, knowing that C is present is even less of a guarantee that A was present: C could have been caused by an event other than B, which, in turn, could have been caused by an event other than A. That is, the further away an event is from the root cause, the worse the event is (p. 608) as an indicator of the root cause’s presence. Thus, the deeper a cause feature sits in a causal chain, the more informative it can be about the root cause of the chain. For that reason, people may place more weight on cause features than on their effect features.
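This distance argument can be checked with a worked example. The sketch below is our illustration, with arbitrary hypothetical probabilities rather than values from any study: it enumerates a three-node chain A→B→C in which each event can also arise from alternative causes, and computes how diagnostic B and C each are of the root cause A.

```python
from itertools import product

# Hypothetical chain A -> B -> C with alternative causes ("leaks").
# All numbers are arbitrary choices made only for illustration.
P_A = 0.3                      # prior probability of the root cause A
P_B_GIVEN = {1: 0.8, 0: 0.2}   # P(B=1 | A); 0.2 reflects alternative causes of B
P_C_GIVEN = {1: 0.8, 0: 0.2}   # P(C=1 | B); 0.2 reflects alternative causes of C

def joint(a, b, c):
    """Joint probability of one configuration of the chain."""
    pa = P_A if a else 1 - P_A
    pb = P_B_GIVEN[a] if b else 1 - P_B_GIVEN[a]
    pc = P_C_GIVEN[b] if c else 1 - P_C_GIVEN[b]
    return pa * pb * pc

def prob_A_given(evidence):
    """P(A=1 | evidence) by exhaustive enumeration; evidence may fix b and/or c."""
    num = sum(joint(1, b, c) for b, c in product([0, 1], repeat=2)
              if evidence.get("b", b) == b and evidence.get("c", c) == c)
    den = sum(joint(a, b, c) for a, b, c in product([0, 1], repeat=3)
              if evidence.get("b", b) == b and evidence.get("c", c) == c)
    return num / den

print(round(prob_A_given({"b": 1}), 3))  # ~0.632: B present -> A fairly likely
print(round(prob_A_given({"c": 1}), 3))  # ~0.477: C present -> weaker evidence for A
```

Under these assumptions, P(A | B) exceeds P(A | C), matching the claim that features nearer the root cause are better indicators of the underlying essence than features further downstream.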

Effects of Causal Knowledge on Judgments of Abnormality Causal knowledge also influences the degree to which clinicians perceive people’s behav­ iors to be psychologically abnormal in a global sense (e.g., abnormal; psychologically un­ healthy; in need of treatment). In an informal observation of clinicians’ reasoning tenden­ cies, Meehl (1973) noted that when clinicians felt that they understood a patient, the pa­ tient seemed less abnormal to them; that is, they seemed to apply an “understanding it makes it normal” heuristic in gauging abnormality. Ahn, Novick, and Kim (2003) empirically examined whether causal explanations indeed influence clinicians’ overall perceptions of how abnormal a person is. We developed descriptions of hypothetical pa­ tients with artificial disorders in which three behavioral symptoms were described as be­ ing linked in a causal chain (e.g., “because Penny frequently suffers from insomnia and is in a habitual state of sleep deprivation, she has trouble remembering the names of ob­ jects. This memory problem, in turn, leads her to suffer from episodes of extreme anxiety, because she fears that it will cause her to embarrass herself in front of others”; Ahn et al., 2003, p. 747). In one experiment, one group of practicing clinical psychologists and clini­ cal graduate students received these descriptions. A second group also received deeper causal explanations for the root symptom in each of these causal chains. For instance, the phrase “because she is very stressed out due to her workload” was added as a causal ex­ planation for why “Penny frequently suffers from insomnia.” As predicted by Meehl (1973), clinicians who received the additional life-event root cause explanations judged that these people were less abnormal (i.e., more psychologically healthy) than those who did not receive such explanations. Page 8 of 26

The Role of Causal Knowledge in Reasoning About Mental Disorders A parallel phenomenon in judgments of the need for psychological treatment—a concrete practical measure of judgments of abnormality—has also been documented (Kim & LoSavio, 2009). Vignettes of artificial disorder cases similar to those described in the pre­ ceding were judged less in need of psychological treatment when externally controlled precipitating events (e.g., “ever since he was drafted into the army”) were described as launching the causal chain of symptoms than when internally controlled events (e.g., “ever since he enlisted in the army”) were described, despite the fact that both explana­ tions were rated as equally satisfactory—that is, “understanding makes it normal” when deeper causal explanations for the root symptoms are precipitated by factors outside the person’s control. Thus, not only do reasoners seem to adhere to the notion that “under­ standing it makes it normal,” but their judgments also align with the more radical infer­ ence that “understanding it solves the problem.” Yet in reality, the person would be no less in need of subsequent support or intervention than if a causal explanation had not presented itself to the reasoner. Kim, Paulus, Gonzalez, and Khalife (2012) reported evidence for a proportionate-response effect, such that judgments of abnormality are predicted by the proportionality (in terms of valence and magnitude) of the behavioral response to the precipitating event. Propor­ tionality between cause and effect has long been observed to serve as a cue to causality (Einhorn & Hogarth, 1986); when an effect is disproportionate to its cause, people are less likely to perceive the causal connection. For example, people found it difficult to ac­ cept the germ theory of disease when it was first proposed, in part because germs were said to be microscopically small, whereas the effects of these germs could be staggering­ ly large, wiping out huge swaths of the human population (Medin, 1989). Thus, Kim, Paulus, Gonzalez, et al. (2012) hypothesized that the proportionality between precipitating events and subsequent behaviors would predict clinicians’ judgments of ab­ normality. They presented practicing clinical psychologists with descriptions of hypotheti­ cal people who were either exhibiting behaviors characteristic of post-traumatic stress disorder (PTSD), behaviors characteristic of depression, or mildly distressed (sub-clinical) behaviors. These behaviors were prefaced by descriptions of either severely negative (traumatic) events or mildly negative (everyday) events. Clinicians’ judgments of the psy­ chological abnormality of these hypothetical behaviors were strongly influenced by the proportionality between the severity of the event relative to the behaviors. In fact, ex­ hibiting only mildly distressed behaviors after a traumatic event was judged to be just as abnormal as exhibiting full-blown PTSD or depression symptoms after the same event, de­ spite the fact that only the presence of actual disorder symptoms is supposed to be treat­ ed as evidence of disorder in the DSM. (p. 609) In keeping with Meehl’s (1973) assertion that clinicians behave as though “understanding it makes it normal,” clinicians’ judg­ ments of the ease of understanding the behaviors also correlated with their abnormality judgments (Kim, Paulus, Gonzalez, et al., 2012). 
That is, the more proportionate the event was to the behaviors, the easier clinicians found it to understand the behaviors, and the less clinicians rated the behaviors to be psychologically abnormal.

Page 9 of 26

The Role of Causal Knowledge in Reasoning About Mental Disorders

Judgments of Psychological Versus Biological Bases of Mental Disorders
Another broad classification of mental illnesses that people may make is whether they are biologically or psychologically based. That is, how do people decide to categorize a given illness as a brain disease or as a disorder of the mind? Entertaining this question involves some degree of acceptance, even temporarily, of mind–body dualism, given that all disorders of mind must ultimately be brain diseases. Yet this judgment has strong downstream influences on choices of treatments, ascriptions of blame, and judgments about prognosis, as will be discussed later in this chapter.
For physicalists, biological explanations are lower-level counterparts of psychosocial explanations of behavior. For instance, an overactive amygdala is experienced as anxiety; in reality, either could be invoked as an explanation for poor performance in a swim meet. Instead of treating these two as different levels of explanation, however, laypeople as well as clinicians appear to treat these two causes as being in competition; that is, laypeople tend to treat the relationship between biological and psychosocial explanations similarly to that between internal and external attributions. For instance, upon learning that an exam was easy, one may automatically discount the possibility that a student who got an A on the exam worked hard. Similarly, people’s endorsement of biologically construed bases of behaviors (e.g., genes, brain structures, neurotransmitters) appears to be inversely related to their endorsement of psychologically construed bases of behaviors (e.g., intentionality, desire, motivations).

Page 10 of 26

The Role of Causal Knowledge in Reasoning About Mental Disorders ders (e.g., autistic disorder) to highly non-biological disorders (e.g., adjustment disor­ ders). Miresco and Kirmayer (2006) concluded that these kinds of results indicate that “mental health professionals continue to employ a mind-brain dichotomy when reasoning about clinical cases” (p. 913). Yet, rather than believing that mind and body are separate enti­ ties, clinicians’ conceptualizations may reflect explanatory dualism (or functionalism; Put­ nam, 1975), wherein biological and psychological explanations are viewed as complemen­ tary ways of describing human behaviors (Davidson, 1970; Fodor, 1974; Kendler, 2001). Explanatory dualism does not make any ontological assumptions about mind and brain; it simply states that any given mental activities may sometimes be better explained using biological constructs and may at other times be better explained using psychological con­ structs. Explanatory dualism is therefore not necessarily irrational (Dennett, 1995; Put­ nam, 1975). For instance, although a person’s feelings of depression can be explained in terms of the activity of neurons, neurotransmitters, neuromodulators, and hormones, the same feelings can also be explained in terms of experiencing interpersonal conflict and stress. This latter explanation can (p. 610) be undertaken without necessarily denying depression’s biological basis. Thus, the formal disciplines of biology and psychology can coexist without denying each other’s validity. However, recent studies have demonstrated that the tension between biological and psy­ chological explanations is strong enough to lead to an irrational bias in both clinicians and laypeople (Kim, Ahn, Johnson, & Knobe, 2016; Kim, Johnson, Ahn, & Knobe, 2016). People’s judgments appeared to reflect the belief that psychological causes and biological causes have an inverse relationship, such that when psychological causes become more salient or plausible, biological causes are automatically discounted (and vice versa). Thus, psychological and biological causes share a common effect (e.g., surface symptoms), where the two types of causes compete with each other (e.g., Waldmann, 2000) in a man­ ner inconsistent with the notion of explanatory dualism. Specifically, Kim, Ahn, et al. (2016) and Kim, Johnson, et al. (2016) asked clinicians and laypeople to judge the psychological and biological bases of everyday and disordered be­ haviors. Behaviors were either described in the context of a person (e.g., Sarah’s repeti­ tive behaviors, in which she checks her window locks three times upon leaving the house) or in the abstract (e.g., repetitive behaviors, which are generally characteristic of a par­ ticular disorder). As shown in past work, concretely described behaviors (i.e., in the con­ text of a specific person) increased endorsement of psychological bases of those behav­ iors. More importantly, in line with an inverse dualism account, people correspondingly reduced their endorsement of the biological bases of the same behavior, relative to their judgments when behaviors were described abstractly. Thus, inverse dualism could lead to irrational judgments; mere changes in the framing of behavior descriptions changed judg­ ments of the psychological and biological bases of behaviors. The preceding results are consistent with well-documented findings in causal reasoning. Causal model theory predicts that there will be competition among causes in common-ef­ Page 11 of 26

The Role of Causal Knowledge in Reasoning About Mental Disorders fect structures, but not among effects in common-cause structures in causal learning (Waldmann, 2000; see Le Pelley et al., Chapter 2 in this volume). In the case of inverse du­ alism, biological and psychological causes potentially give rise to the same behavior (in a common-effect structure), and people accordingly behave as though accepting one cause automatically denies the other.
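This discounting pattern is, in Bayesian network terms, “explaining away,” and it can be reproduced with a toy common-effect model. The sketch below is our illustration with arbitrary made-up numbers: a behavior has a biological cause and a psychological cause combined by a noisy-OR, and learning that the psychological cause is present lowers the probability of the biological cause.

```python
from itertools import product

# Toy common-effect ("explaining away") model. All numbers are
# arbitrary assumptions chosen only for illustration.
P_BIO, P_PSY = 0.3, 0.3              # independent priors for the two causes
S_BIO, S_PSY, LEAK = 0.8, 0.8, 0.05  # noisy-OR causal strengths and a small leak

def p_symptom(bio, psy):
    """Probability the behavior occurs given each cause's state (noisy-OR)."""
    return 1 - (1 - S_BIO * bio) * (1 - S_PSY * psy) * (1 - LEAK)

def joint(bio, psy):
    """Joint probability of the two causes together with the symptom present."""
    p_causes = (P_BIO if bio else 1 - P_BIO) * (P_PSY if psy else 1 - P_PSY)
    return p_causes * p_symptom(bio, psy)

# P(bio | symptom): marginalize over the psychological cause.
p_bio_given_s = (sum(joint(1, psy) for psy in (0, 1)) /
                 sum(joint(b, p) for b, p in product((0, 1), repeat=2)))

# P(bio | symptom, psychological cause present): the rival cause is known.
p_bio_given_s_psy = joint(1, 1) / (joint(0, 1) + joint(1, 1))

print(round(p_bio_given_s, 3))      # ~0.566
print(round(p_bio_given_s_psy, 3))  # ~0.328 -- biological cause is discounted
```

Explaining away is normatively correct within a genuine common-effect structure; the bias described in this section is that people impose this competitive structure on explanations that explanatory dualism treats as compatible levels of description.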

Effects of Causal Knowledge on DSM-Based Diagnosis
As we have discussed, the modern versions of the DSM (i.e., from 1980 to the present) have deliberately excluded information about causal etiology from the diagnostic criteria for nearly all disorders. One of the few exceptions to this general rule was, for many years, the bereavement exclusion in the diagnosis of major depression (DSM-III, 1980; DSM-III-R, 1988; DSM-IV, 1994; DSM-IV-TR, 2000). Specifically, even if the requisite behavioral symptoms for major depression were clearly present, a diagnosis of depression was not to be made if the person was in a period of bereavement due to the loss of a loved one. A number of observers argued that the bereavement exclusion should be expanded to include all major loss events (e.g., divorce, job loss, or the sudden loss of all possessions, as in the case of a natural disaster; Horwitz & Wakefield, 2007; Wakefield, Schmitz, First, & Horwitz, 2007), to avoid falsely pathologizing normal coping responses to loss. The DSM-5 (2013) Task Force, however, opted instead to remove the bereavement exclusion criterion from the list of formal criteria in the DSM-5 (2013); one justification given for this decision was to avoid missing opportunities to provide mental health care to those in distress.
The case of depression therefore provided a particularly strong test of the “understanding it makes it normal” hypothesis as applied to DSM-based diagnosis. Judgments of abnormality do not necessarily translate into the diagnostic judgments that would be made while considering whether to follow the recommendations of the DSM. Notably, the prediction of the “understanding it makes it normal” hypothesis, as applied to diagnosis, is that a bereavement or other loss life event—anything severely negative—would lead to reduced endorsement of a diagnosis of depression. Kim, Paulus, Nguyen, and Gonzalez (2012) presented practicing clinical psychologists with vignette case descriptions of people exhibiting severe depression symptoms. These were prefaced either by a bereavement-related loss event (e.g., the death of the person’s spouse), a non-bereavement loss event (e.g., the end of the person’s marriage), an everyday event (e.g., going about everyday life with one’s spouse), or no event. Clinical psychologists reduced their endorsement of a diagnosis of depression when the case was prefaced by either type of loss event, compared to cases prefaced by the everyday event or no event—that is, they did not strictly adhere to the diagnostic criteria listed in any version of the (p. 611) modern DSM system in making their judgments. Rather, they appeared to subscribe to the notion that “understanding it makes it normal,” and hence less worthy of a depression diagnosis, in concert with Horwitz and Wakefield’s (2007) arguments.
Interestingly, while some may argue that the DSM-5 (APA, 2013) unnecessarily pathologized normal reactions to loss by removing the bereavement exclusion criterion for major depression, for other disorders such as PTSD, the DSM-5 (APA, 2013) may conversely have overlooked some abnormal cases by explicitly specifying the causes required for a diagnosis. In order for a diagnosis of PTSD to be made, the patient must have experienced a traumatic event involving exposure to actual or threatened death, injury, or sexual violence—not just anything the person finds traumatizing. According to the “understanding makes it normal” effect, however, a person who has PTSD symptoms after experiencing the type of trauma designated by the DSM-5 (APA, 2013) would be judged less abnormal than someone who has PTSD symptoms without such a trauma. Yet the former would be the one eligible for the diagnosis, and therefore more abnormal in an official sense. Indeed, Weine and Kim (2014) found that practicing clinicians, clinical psychology graduate students, and laypeople alike judged people with PTSD symptoms following a DSM-5-qualifying traumatic event to be less psychologically abnormal, yet simultaneously more likely to warrant a PTSD diagnosis, than people with identical symptoms following an upsetting event that does not meet DSM-5 criteria.


Prognosis
Psychosocial explanations for mental disorders (e.g., those based on such factors as childhood trauma and current or past stress) are increasingly being replaced by biological explanations (e.g., genes or brain abnormalities; Pescosolido, Martin, Long, Medina, Phelan, & Link, 2010). This shift has important consequences in that the two types of explanations yield different inductive inferences. One type of clinical judgment that has been consistently shown to differ, given biological versus psychosocial explanations for psychopathology, is that regarding prognosis. In particular, biological explanations, such as those that attribute mental disorders to genetic or neurobiological abnormalities, have been shown to engender prognostic pessimism—a belief that the disorders are relatively unlikely to remit or be treated successfully (Kvaale, Haslam, & Gottdiener, 2013). A possible mechanism underlying this effect is people’s adherence to neuro- and genetic essentialism, in which the brain and DNA, respectively, are seen as the fundamental “essences” underlying mental disorders (Dar-Nimrod & Heine, 2011; Haslam, 2011). Because these essences are perceived as deeply rooted and relatively immutable causes of symptoms, neuro- and genetic essentialism can lead to the belief that mental disorders are difficult to overcome or to treat effectively.
For example, Phelan (2005) showed that attributing an individual’s mental disorder to genetic causes increased laypeople’s tendency to view the disorder as likely to persist throughout the individual’s life. Bennett, Thirlaway, and Murray (2008) similarly found that when participants read a vignette describing a person with schizophrenia, they rated the individual as less likely to recover from the disorder when they were told it was genetically caused. Furthermore, Deacon and Baird (2009) found that when undergraduates were asked to imagine themselves as suffering from depression, they were more pessimistic about “their” prognoses when they were told that depression was caused by a chemical imbalance than when they were told that it was caused by a combination of biological, psychosocial, and environmental causes. Lebowitz, Rosenthal, and Ahn (2016) also found that a case of attention-deficit/hyperactivity disorder (ADHD) was rated as less treatable when symptoms were explained biologically than when they were explained psychosocially.
More recently, such findings documenting an association between biological explanations for psychopathology and prognostic pessimism have been shown to extend to people considering their own real psychiatric symptoms. In the first such study, Lebowitz, Ahn, and Nolen-Hoeksema (2013) found that the more people attributed their own depressive symptoms to genetic and biochemical causes, the longer they expected to remain depressed. Other work has shown that people with and without symptoms of generalized anxiety disorder predict that the disorder will have a longer duration when they are presented with a biological explanation (versus no explanation) of its etiology (Lebowitz, Pyun, & Ahn, 2014). Kemp, Lickel, and Deacon (2014) provided the “results” of a fake biochemical test to individuals who screened positive for a past or current depressive episode. They found that participants who were told that they had a “serotonin deficiency” (suggesting their depression had been caused by a “chemical imbalance”) experienced more prognostic pessimism than participants (p. 612) in a control condition who were told their neurotransmitter levels fell within normal limits.

Page 14 of 26

Notably, the association between biological causal explanations and prognostic pessimism does not appear to be limited to mental disorders. For example, among overweight and obese Americans, attributing one’s own weight status to biological causes appears to be associated with the belief that body weight is unchangeable (Pearl & Lebowitz, 2014). In line with the aforementioned findings regarding reduced feelings of agency among people with essentialist views of their symptoms, Burnette (2010) found that (mostly overweight or obese) dieters who viewed body weight as fixed reported being less persistent after weight-loss setbacks. Genetic explanations for inactivity have also been shown to decrease self-efficacy and intentions to exercise (Beauchamp, Rhodes, Kreutzer, & Rupert, 2011) and to increase consumption of unhealthy food (Dar-Nimrod, Cheung, Ruby, & Heine, 2014). Related research has found that information about the American Medical Association’s recent decision to classify obesity as a disease reduced obese individuals’ concerns about weight and the importance they placed on health-focused dieting, ultimately predicting higher-calorie food choices (Hoyt, Burnette, & Auster-Gussman, 2014). In general, it has been predicted that the increasing emphasis on understanding the genetic bases of health and illness could increase fatalism about health outcomes (Dar-Nimrod & Heine, 2011).

Treatment
Among both clinicians and laypeople, different causal attributions for mental disorders also appear to consistently affect judgments about the likely effectiveness of treatment, and about which types of treatment are seen as appropriate. In light of the ongoing shift toward increased emphasis on the biological factors involved in psychopathology (e.g., Insel & Wang, 2010), it is particularly important to understand how biological conceptualizations of mental disorders affect beliefs about treatment.
Data from nationally representative US samples indicate that attributing mental disorders to biological causes is associated with an increased perception that treatment is advisable (Pescosolido et al., 2010; Phelan, Yang, & Cruz-Rojas, 2006). However, there is also evidence that views about different types of treatment for mental disorders can be differentially affected by biological explanations. For example, as Americans’ belief that depression stems from biological causes increased significantly from 1996 to 2006, so did their preferences for biologically focused treatment (Blumner & Marcus, 2009). Indeed, among laypeople, biological explanations for mental disorders appear to be associated with increased belief in the effectiveness of medication but decreased belief in the effectiveness of psychotherapy (Deacon & Baird, 2009; Iselin & Addis, 2003; Lebowitz et al., 2012). This effect has also been documented among individuals affected by depression (Kemp et al., 2014).
These findings suggest that mind–body dualism is prevalent in people’s thinking about mental disorders—that is, people may conceive of the mind and the body/brain as separate entities, with the former construed as having psychological properties while the latter is construed as having biological (p. 613) properties. Thus, “non-biological” treatments that are seen as acting through psychological or psychosocial mechanisms (e.g., psychotherapy) would be viewed as less likely to be effective when psychopathology is attributed to biological causes (and thus perceived as residing in the brain, genes, or body), while biomedical treatments (such as pharmacotherapy) are seen as more likely to be effective in such cases.
Notably, despite their psychopathology-related expertise, mental health clinicians do not appear to be immune to such dualist beliefs. Ahn et al. (2009) asked practicing mental health clinicians to consider a variety of mental disorders, list their cause(s), and rate each cause with respect to the degree to which it was psychological, environmental, and biological in nature. At least 30 days later, a subset of these clinicians was also asked to judge the degree to which they felt that psychotherapy versus medication would be effective in treating each mental disorder. In general, clinicians rated medication as more effective for disorders that had been perceived as more biologically and less psychologically/environmentally based, while they rated psychotherapy as more effective for disorders that were seen as more psychologically/environmentally and less biologically based.
Causal beliefs have been shown to influence clinicians’ treatment decisions even at the level of the individual patient. A study by de Kwaadsteniet, Hagmayer, Krol, and Witteman (2010) presented practicing clinicians with two realistic, complex case descriptions of mental health patients, along with information about each patient’s environment, recent events, and results of psychometric tests. Clinicians were asked to create a “causal map” for each patient by diagramming the respective roles of the various psychosocial factors in causing each patient’s symptoms. Results indicated that although the resulting individual causal maps differed between clinicians, each map was predictive of which interventions the clinician rated as most effective for that case. This finding underscores the notion that clinicians’ causal beliefs play an important role in their clinical reasoning.

Issues for Future Research
This section considers three issues for future research: (1) Are the effects of causal background knowledge on reasoning about mental disorders rational? (2) What are the mechanisms underlying competition between psychological and biological explanations for mental disorders? (3) Are the effects of causal background knowledge on reasoning about mental disorders universal or culture-specific?

Rationality
While the official nosology of mental disorders (namely, DSM-5, 2013) avoids causal interpretations and differential weightings of symptoms in most cases, both clinicians’ and laypeople’s causal knowledge affects how symptoms of mental disorders are weighed, as shown by the findings reviewed in this chapter. We also showed that clinicians’ diagnoses of mental disorders and judgments of normality are affected by their causal knowledge. Is being affected by causal knowledge when reasoning about mental disorders rational? Next, we discuss this issue in terms of the effects of causal knowledge on diagnosis and treatment judgments, followed by the effects on normality judgments.
Needless to say, if the available causal background knowledge is valid, it would be rational to use it for diagnoses or treatment decisions. However, because the etiologies of mental disorders are still controversial, it may seem problematic that clinicians and laypeople use their idiosyncratic causal beliefs in clinical reasoning. Thus, one might argue that both clinicians and laypeople should be encouraged or reminded to utilize official diagnostic norms (i.e., the DSM) without relying on their own causal theories, in order to avoid idiosyncratic weighting of features or subjective judgments of treatment efficacy. After all, one obvious and well-documented benefit of using the DSM in clinical practice is improved reliability of diagnoses, potentially providing a rational basis for discouraging the use of personal causal theories.
Yet there are reasons to take causal knowledge seriously. Even though there is little agreement about the root causes of mental disorders, certain kinds of causal relations among symptoms are well accepted, and the use of such causal knowledge can be beneficial. For instance, if the causal status effect stems from a rational reasoning bias (e.g., giving more weight to features that are more indicative of an essence or a core feature), then making diagnoses based on causal reasoning, or suggesting treatment options targeted at controlling deeper causes, might facilitate more accurate diagnoses or the discovery of better treatment plans.
Furthermore, enforcing a nosology based only or mostly on surface symptoms can hinder significant progress in psychopathology research (p. 614) by preventing discoveries of better classification schemes based on hidden, underlying causes. For instance, major depressive disorder, which is considered a single disorder in the DSM-5, may consist of several subtypes, each involving different causal mechanisms and requiring different kinds of treatment. Furthermore, some of these subtypes may be closely tied to generalized anxiety disorder. For these reasons, the field has been moving away from symptom-based descriptions of mental disorders, and many researchers have begun focusing on understanding the biological mechanisms underlying mental disorders. In that sense, returning to the prototype approach to mental disorder classification may not be rational.
The issue of rationality is also controversial when considering judgments of normality based on causal history. Meehl (1973) presented the “understanding makes it normal” phenomenon as a reasoning fallacy: just because one now sees how the symptoms developed, the symptoms themselves are no less abnormal. This is one of the reasons that the DSM-5 (2013) removed the bereavement exclusion criterion; severe depression should receive clinicians’ attention and treatment, regardless of its causes. On the other hand, it does appear rational for judgments of normality and diagnoses of mental disorders to be affected by information about precipitating conditions, as such information can be informative for prognosis and treatment. For example, clinicians clearly need to be aware of a patient’s trauma history in order to determine whether or not a diagnosis of PTSD is appropriate, and to plan a course of treatment in such a case.


Mechanisms Underlying the Effects of Biological Explanations
Another important research area, which recently has been receiving a great deal of attention, is uncovering the effects of biological explanations on reasoning about mental disorders. In general, biological and psychosocial explanations appear to have an inverse relationship, such that if one type of explanation is dominant, the other is discounted. The two types of explanations also bring about different kinds of inductive inferences. Biological explanations tend to result in more pessimistic prognostic assumptions about mental disorders than psychosocial explanations do, although biological explanations also cast symptomatic individuals as less responsible for their symptoms than psychosocial explanations do (Kvaale, Haslam, & Gottdiener, 2013).
From cognitive psychologists’ perspectives, one intriguing research agenda is to identify the mechanism underlying this inverse relationship. As we have discussed, one possibility is dualism. It is possible that people can see a person either as a psychological agent or as a biological mechanism, and that it is difficult to conceive of a person as both, because people have difficulty envisioning how mental activities cause or are caused by material processes. Thus, they might believe that biological dysfunctions should be treated with medications, whereas psychological impairments should be treated with psychotherapy. Another related, though not necessarily alternative, possibility is the perception of free will. Biological explanations may decrease the extent to which free will is seen as relevant. In the absence of free will, prognostic pessimism may increase because there is not much one can do about the symptoms, and treatment based on psychotherapy, which would require agency on the part of the patient, may appear less promising. Future research should attempt to identify the degree to which each of these two specific mechanisms can, separately or conjointly, account for the negative consequences of biological explanations.
Understanding the mechanisms underlying the effects of biological explanations on mental disorder concepts would also be crucial for devising any intervention programs. It is generally accepted that the best line of treatment for mental disorders is often a combination of medication and psychotherapy. To make such a combined approach more acceptable to clinicians as well as patients, an educational intervention explaining how psychotherapy can cause changes in patients’ brains may be effective if reluctance to receive psychotherapy for biologically construed mental disorders stems from dualism. Alternatively, perhaps one reason that laypeople are reluctant to take medications for mental disorders is the fear that doing so will decrease their free will. Thus, future research in these areas could have beneficial impacts on public health.

Universality of the Phenomena
Another important future research question is whether the processing biases discussed in this chapter are universal phenomena or are restricted to Western cultures, in which all of the previously reviewed studies were conducted. Different cultures do have somewhat different causal beliefs about mental disorders. For instance, Hagmayer and Engelmann (2014) provide a systematic review of causal beliefs about depression among both (p. 615) non-Western and Western cultural groups, documenting some cultural differences (e.g., magic, evil spirits, and the devil being listed as the top causes of depression among the African Yoruba; Lavender et al., 2006). The question relevant to the current chapter is whether such differences in causal beliefs also lead to differences in the way this causal knowledge affects reasoning about mental disorders. For instance, the causal theories generated by clinicians in Kim and Ahn (2002b) were only moderately consistent across clinicians, but the clinicians nonetheless consistently demonstrated the causal status effect based on their own causal theories. Likewise, even though different cultural groups exhibit different causal beliefs, they may still exhibit the same causal status effect, as well as similar effects of causal explanation on judgments of normality and treatment efficacy, and so on. For instance, if the Yoruba group believes that counteracting evil magic is the best treatment for depression, then they would be exhibiting the same kind of processing biases as the Western groups, even though they hold different causal beliefs. Such a possibility remains to be more fully documented. For instance, Lavender et al. (2006) merely note that “in the Yoruba and Bangladeshi groups people believed that the root causes of depression should be addressed,” without any further details, such as quantitative estimates of how strong this association between root causes and treatment-efficacy judgments is.
In general, few studies have directly examined the universality of processing biases in the use of causal beliefs about mental disorders. Hagmayer and Engelmann (2014), for instance, note that none of the studies they reviewed about depression examined the use of causal beliefs for diagnosis. Furthermore, tracking whether choices of treatment correlate with beliefs about the causes of mental disorders is further muddled by practical issues such as the availability of treatments in a given culture. For instance, even though a patient believes in psychosocial causes for depression, she may indicate a preference for biomedical treatments because counseling or psychotherapy is not available in her area (Hagmayer & Engelmann, 2014).
We speculate that because the processing biases discussed in the current chapter are rooted in deeper beliefs and biases, such as essentialism (e.g., Sousa, Atran, & Medin, 2002) and dualism (e.g., Bering, 2006), which have been demonstrated to appear in various cultures, these processing biases would be present across a wide variety of cultures. Thus, future investigations into the universality of these phenomena can also shed further light on the origins of these phenomena.
References

Ahn, W. (1998). Why are different features central for natural kinds and artifacts?: The role of causal status in determining feature centrality. Cognition, 69(2), 135–178.

Ahn, W., Flanagan, E. H., Marsh, J. K., & Sanislow, C. A. (2006). Beliefs about essences and the reality of mental disorders. Psychological Science, 17(9), 759–766.

Ahn, W., Gelman, S. A., Amsterlaw, J. A., Hohenstein, J., & Kalish, C. W. (2000). Causal status effect in children’s categorization. Cognition, 76(2), B35–B43.

Ahn, W., Kalish, C., Gelman, S. A., Medin, D. L., Luhmann, C., Atran, S., Coley, J. D., & Shafto, P. (2001). Why essences are essential in the psychology of concepts. Cognition, 82, 59–69.

Ahn, W., Kim, N. S., Lassaline, M. E., & Dennis, M. J. (2000). Causal status as a determinant of feature centrality. Cognitive Psychology, 41, 361–416.

Ahn, W., & Kim, N. S. (2000). The causal status effect in categorization: An overview. In D. L. Medin (Ed.), Psychology of learning and motivation, 40, 23–65.

Ahn, W., & Kim, N. (2005). The effect of causal theories on mental disorder diagnosis. In W. Ahn, R. Goldstone, B. Love, A. Markman, & P. Wolff (Eds.), Categorization inside and outside the laboratory: Essays in honor of Douglas L. Medin (pp. 273–288). Washington, DC: American Psychological Association.

Ahn, W., Novick, L., & Kim, N. S. (2003). Understanding behavior makes it more normal. Psychonomic Bulletin and Review, 10, 746–752.

Ahn, W., Proctor, C. C., & Flanagan, E. H. (2009). Mental health clinicians’ beliefs about the biological, psychological, and environmental bases of mental disorders. Cognitive Science, 33(2), 147–182.

American Psychiatric Association. (1980). Diagnostic and statistical manual of mental disorders (3rd ed.). Washington, DC: APA.

American Psychiatric Association. (1988). Diagnostic and statistical manual of mental disorders (3rd ed., revised). Washington, DC: APA.

American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: APA.

American Psychiatric Association. (2000). Diagnostic and statistical manual of mental disorders (4th ed., text revision). Washington, DC: APA.

American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (5th ed.). Arlington, VA: American Psychiatric Publishing. (p. 616)

Anderson, J. R. (1990). The adaptive character of thought. Hillsdale, NJ; London: Lawrence Erlbaum Associates.

Barton, M. E., & Komatsu, L. K. (1989). Defining features of natural kinds and artifacts. Journal of Psycholinguistic Research, 18(5), 433–447.

Beauchamp, M. R., Rhodes, R. E., Kreutzer, C., & Rupert, J. L. (2011). Experiential versus genetic accounts of inactivity: Implications for inactive individuals’ self-efficacy beliefs and intentions to exercise. Behavioral Medicine, 37(1), 8–14.

Bennett, L., Thirlaway, K., & Murray, A. J. (2008). The stigmatising implications of presenting schizophrenia as a genetic disease. Journal of Genetic Counseling, 17(6), 550–559.

Bering, J. M. (2006). The folk psychology of souls. Behavioral and Brain Sciences, 29, 453–498.

Bhugra, D., & Mastrogianni, A. (2004). Globalisation and mental disorders: Overview with relation to depression. British Journal of Psychiatry, 184, 10–20.

Blumner, K., & Marcus, S. (2009). Changing perceptions of depression: Ten-year trends from the general social survey. Psychiatric Services, 60(3), 306–312.

Borsboom, D., & Cramer, A. O. (2013). Network analysis: An integrative approach to the structure of psychopathology. Annual Review of Clinical Psychology, 9, 91–121.

Burnette, J. L. (2010). Implicit theories of body weight: Entity beliefs can weigh you down. Personality and Social Psychology Bulletin, 36(3), 410–422.

Carey, S. (1985). Conceptual change in childhood. Cambridge, MA: MIT Press.

Dar-Nimrod, I., Cheung, B. Y., Ruby, M. B., & Heine, S. J. (2014). Can merely learning about obesity genes affect eating behavior? Appetite, 81, 269–276.

Dar-Nimrod, I., & Heine, S. J. (2011). Genetic essentialism: On the deceptive determinism of DNA. Psychological Bulletin, 137, 800–818.

Dar-Nimrod, I., Zuckerman, M., & Duberstein, P. R. (2013). The effects of learning about one’s own genetic susceptibility to alcoholism: A randomized experiment. Genetics in Medicine, 15(2), 132–138.

Davidson, D. (1970). Mental events. In L. Foster and J. W. Swanson (Eds.), Experience and theory (pp. 79–101). London: Duckworth.

de Kwaadsteniet, L., Hagmayer, Y., Krol, N. P., & Witteman, C. L. (2010). Causal client models in selecting effective interventions: A cognitive mapping study. Psychological Assessment, 22(3), 581–592.

Deacon, B. J. (2013). The biomedical model of mental disorder: A critical analysis of its validity, utility, and effects on psychotherapy research. Clinical Psychology Review, 33(7), 846–861.

Deacon, B. J., & Baird, G. L. (2009). The chemical imbalance explanation of depression: Reducing blame at what cost? Journal of Social and Clinical Psychology, 28(4), 415–435.

Dennett, D. C. (1995). Darwin’s dangerous idea: Evolution and the meanings of life. New York: Simon & Schuster.

Einhorn, H. J., & Hogarth, R. M. (1986). Judging probable cause. Psychological Bulletin, 99(1), 3–19.

Flores, A., Cobos, P. L., López, F. J., Godoy, A., & González-Martín, E. (2014). The influence of causal connections between symptoms on the diagnosis of mental disorders: Evidence from online and offline measures. Journal of Experimental Psychology: Applied, 20(3), 175–190.

Fodor, J. (1974). Special sciences (or: The disunity of science as a working hypothesis). Synthese, 28, 97–115.

Gelman, S. A. (1988). The development of induction within natural kind and artifact categories. Cognitive Psychology, 20, 65–95.

Hagmayer, Y., & Engelmann, N. (2014). Causal beliefs about depression in different cultural groups: What do cognitive psychological theories of causal learning and reasoning predict? Frontiers in Psychology, 5, 1–17.

Haslam, N. (2011). Genetic essentialism, neuroessentialism, and stigma: Commentary on Dar-Nimrod and Heine (2011). Psychological Bulletin, 137, 819–824.

Horwitz, A. V., & Wakefield, J. C. (2007). The loss of sadness: How psychiatry transformed normal sorrow into depressive disorder. Oxford: Oxford University Press.

Hoyt, C. L., Burnette, J. L., & Auster-Gussman, L. (2014). “Obesity is a disease”: Examining the self-regulatory impact of this public-health message. Psychological Science, 25(4), 997–1002.

Illes, J., Lombera, S., Rosenberg, J., & Arnow, B. (2008). In the mind’s eye: Provider and patient attitudes on functional brain imaging. Journal of Psychiatric Research, 43(2), 107–114.

Insel, T. (2008). Assessing the economic costs of serious mental illness. American Journal of Psychiatry, 165(6), 663–665.

Insel, T. R., & Wang, P. S. (2010). Rethinking mental illness. JAMA, 303(19), 1970–1971.

Iselin, M. G., & Addis, M. E. (2003). Effects of etiology on perceived helpfulness of treatments for depression. Cognitive Therapy and Research, 27(2), 205–222.

Keil, F. C. (1989). Concepts, kinds, and conceptual development. Cambridge, MA: MIT Press.

Kemp, J. J., Lickel, J. J., & Deacon, B. J. (2014). Effects of a chemical imbalance causal explanation on individuals’ perceptions of their depressive symptoms. Behaviour Research and Therapy, 56, 47–52.

Kendler, K. S. (2001). Twin studies of psychiatric illness: An update. Archives of General Psychiatry, 58(11), 1005–1014.

Kendler, K. S. (2005). “A Gene for …”: The nature of gene action in psychiatric disorders. American Journal of Psychiatry, 162, 1243–1252.

Kessler, R. C., Angermeyer, M., Anthony, J. C., de Graaf, R., Demyttenaere, K., Gasquet, I., … Uestuen, T. B. (2007). Lifetime prevalence and age-of-onset distributions of mental disorders in the World Health Organization’s World Mental Health Survey Initiative. World Psychiatry, 6(3), 168.

Kessler, R. C., Chiu, W. T., Demler, O., & Walters, E. E. (2005). Prevalence, severity, and comorbidity of 12-month DSM-IV disorders in the National Comorbidity Survey Replication. Archives of General Psychiatry, 62(6), 617–627.

Keyes, C. L. (2007). Promoting and protecting mental health as flourishing: A complementary strategy for improving national mental health. American Psychologist, 62(2), 95.

Kim, N. S., & Ahn, W. (2002a). Clinical psychologists’ theory-based representations of mental disorders predict their diagnostic reasoning and memory. Journal of Experimental Psychology: General, 131, 451–476.

Kim, N. S., & Ahn, W. (2002b). The influence of naïve causal theories on lay concepts of mental illness. American Journal of Psychology, 115, 33–65.

Kim, N. S., Ahn, W., Johnson, S. G. B., & Knobe, J. (2016). The influence of framing on clinicians’ judgments of the biological basis of behaviors. Journal of Experimental Psychology: Applied, 22(1), 39–47.

Kim, N. S., Johnson, S. G. B., Ahn, W., & Knobe, J. (2016). When do people endorse biological explanations for behavior? The (p. 617) effect of abstract versus concrete framing. Manuscript submitted for publication.

Kim, N. S., & LoSavio, S. T. (2009). Causal explanations affect judgments of the need for psychological treatment. Judgment and Decision Making, 4, 82–91.

Kim, N. S., Paulus, D. J., Gonzalez, J. S., & Khalife, D. (2012). Proportionate responses to life events influence clinicians’ judgments of psychological abnormality. Psychological Assessment, 24, 581–591.

Kim, N. S., Paulus, D. J., Nguyen, T. P., & Gonzalez, J. S. (2012). Do clinical psychologists extend the bereavement exclusion for major depression to other stressful life events? Medical Decision Making, 32, 820–830.

Kvaale, E. P., Gottdiener, W. H., & Haslam, N. (2013). Biogenetic explanations and stigma: A meta-analytic review of associations among laypeople. Social Science & Medicine, 96, 95–103.

Kvaale, E. P., Haslam, N., & Gottdiener, W. H. (2013). The ‘side effects’ of medicalization: A meta-analytic review of how biogenetic explanations affect stigma. Clinical Psychology Review, 33(6), 782–794.

Laegsgaard, M. M., Kristensen, A. S., & Mors, O. (2009). Potential consumers’ attitudes toward psychiatric genetic research and testing and factors influencing their intentions to test. Genetic Testing and Molecular Biomarkers, 13(1), 57–65.

Lavender, H., Khondoker, A. H., & Jones, R. (2006). Understandings of depression: An interview study of Yoruba, Bangladeshi and White British people. Family Practice, 23, 651–658.

Lebowitz, M. S. (2014). Biological conceptualizations of mental disorders among affected individuals: A review of correlates and consequences. Clinical Psychology: Science and Practice, 21(1), 67–83.

Lebowitz, M. S., Ahn, W., & Nolen-Hoeksema, S. (2013). Fixable or fate? Perceptions of the biology of depression. Journal of Consulting and Clinical Psychology, 81(3), 518–527.

Lebowitz, M. S., Pyun, J. J., & Ahn, W. (2014). Biological explanations of generalized anxiety disorder: Effects on beliefs about prognosis and responsibility. Psychiatric Services, 65(4), 498–503.

Lebowitz, M. S., Rosenthal, J. E., & Ahn, W. (2016). Effects of biological versus psychosocial explanations on stigmatization of children with ADHD. Journal of Attention Disorders, 20(3), 240–250.

Lincoln, T. M., Arens, E., Berger, C., & Rief, W. (2008). Can antistigma campaigns be improved? A test of the impact of biogenetic vs psychosocial causal explanations on implicit and explicit attitudes to schizophrenia. Schizophrenia Bulletin, 34(5), 984–994.

Luhmann, C. C., Ahn, W., & Palmeri, T. J. (2006). Theory-based categorization under speeded conditions. Memory & Cognition, 34, 1102–1111.

Medin, D. L. (1989). Concepts and conceptual structure. American Psychologist, 44, 1469–1481.

Medin, D. L., & Ortony, A. (1989). Psychological essentialism. In S. Vosniadou & A. Ortony (Eds.), Similarity and analogical reasoning (pp. 179–195). New York: Cambridge University Press.

Meehl, P. E. (1973). Why I do not attend case conferences. In Psychodiagnosis: Selected papers (pp. 225–302). Minneapolis: University of Minnesota Press.

Meiser, B., Kasparian, N. A., Mitchell, P. B., Strong, K., Simpson, J. M., Tabassum, L., et al. (2008). Attitudes to genetic testing in families with multiple cases of bipolar disorder. Genetic Testing, 12(2), 233–243.

Meunier, B., & Cordier, F. (2008). The role of feature type and causal status in 4–5-year-old children’s biological categorizations. Cognitive Development, 1, 34–48.

Miresco, M., & Kirmayer, L. (2006). The persistence of mind-brain dualism in psychiatric reasoning about clinical scenarios. American Journal of Psychiatry, 163(5), 913–918.

Pearl, R. L., & Lebowitz, M. S. (2014). Beyond personal responsibility: Effects of causal attributions for overweight and obesity on weight-related beliefs, stigma, and policy support. Psychology & Health, 29, 1176–1191.

Pescosolido, B. A., Martin, J. K., Long, J. S., Medina, T. R., Phelan, J. C., & Link, B. G. (2010). “A disease like any other”? A decade of change in public reactions to schizophrenia, depression, and alcohol dependence. The American Journal of Psychiatry, 167(11), 1321–1330.

Phelan, J. C. (2005). Geneticization of deviant behavior and consequences for stigma: The case of mental illness. Journal of Health and Social Behavior, 46(4), 307–322.

Phelan, J. C., Yang, L. H., & Cruz-Rojas, R. (2006). Effects of attributing serious mental illnesses to genetic causes on orientations to treatment. Psychiatric Services, 57(3), 382–387.

Proctor, C. C., & Ahn, W. (2007). The effect of causal knowledge on judgments of the likelihood of unknown features. Psychonomic Bulletin & Review, 14, 635–639.

Putnam, H. (1975). Philosophy and our mental life. In H. Putnam, Mind, language, and reality: Philosophical papers (Vol. 2, pp. 291–303). Cambridge: Cambridge University Press.

Rips, L. J. (1989). Similarity, typicality, and categorization. In S. Vosniadou & A. Ortony (Eds.), Similarity and analogical reasoning (pp. 21–59). New York: Cambridge University Press.

Sloman, S., Love, B., & Ahn, W. (1998). Feature centrality and conceptual coherence. Cognitive Science, 22, 189–228.

Smith, E. E., & Sloman, S. A. (1994). Similarity- versus rule-based categorization. Memory & Cognition, 22(4), 377–386.

Sousa, P., Atran, S., & Medin, D. (2002). Essentialism and folk biology: Evidence from Brazil. Journal of Cognition and Culture, 2, 195–223.

Wakefield, J. C., Schmitz, M. F., First, M. B., & Horwitz, A. V. (2007). Extending the bereavement exclusion for major depression to other losses: Evidence from the National Comorbidity Survey. Archives of General Psychiatry, 64, 433–440.

Waldmann, M. R. (2000). Competition among causes but not effects in predictive and diagnostic learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26(1), 53.

Waldmann, M. R., & Hagmayer, Y. (2006). Categories and causality: The neglected direction. Cognitive Psychology, 53, 27–58.

Weine, E. R., & Kim, N. S. (2014, November). Events, reactions, and behaviors: Clinical assessment of PTSD. Poster presented at the 35th annual meeting of the Society for Judgment and Decision Making, Long Beach, CA.

Wellman, H. M. (1990). The child’s theory of mind. Cambridge, MA: MIT Press.

(p. 618)

Notes:

(1.) This chapter does not cover the effect of causal explanations on social stigma associated with mental disorders (e.g., Haslam, 2011; Kvaale, Gottdiener, & Haslam, 2013; Lincoln, Arens, Berger, & Rief, 2008; Phelan, 2005).

(2.) Interestingly, the amenorrhea criterion (absence of the period) for anorexia nervosa was removed from DSM-5 (APA, 2013) because there were no clinically relevant differences between women who did and did not meet that criterion, as long as they met all the other criteria. This recent change seems to lend some credence to the conceptualizations of the clinician participants from Kim and Ahn (2002a), who also judged that amenorrhea is not an important part of anorexia nervosa.

Woo-kyoung Ahn
Department of Psychology, Yale University, New Haven, Connecticut, USA

Nancy S. Kim
Department of Psychology, Northeastern University, Boston, Massachusetts, USA

Matthew S. Lebowitz
Department of Psychiatry, Columbia University, New York, New York, USA



Causality and Causal Reasoning in Natural Language   Torgrim Solstad and Oliver Bott The Oxford Handbook of Causal Reasoning Edited by Michael R. Waldmann Print Publication Date: Jun 2017 Subject: Psychology, Cognitive Psychology Online Publication Date: May 2017 DOI: 10.1093/oxfordhb/9780199399550.013.32

Abstract and Keywords

This chapter provides a combined overview of theoretical and psycholinguistic approaches to causality in language. The chapter’s main phenomenological focus is on causal relations as expressed intra-clausally by verbs (e.g., break, open) and between sentences by discourse markers (e.g., because, therefore). Special attention is given to implicit causality verbs that are argued to trigger expectations of explanations to occur in subsequent discourse. The chapter also discusses linguistic expressions that do not encode causation as such, but that seem to be dependent on a causal model for their adequate evaluation, such as counterfactual conditionals. The discussion of the phenomena is complemented by an overview of important aspects of their cognitive processing as revealed by psycholinguistic experimentation.

Keywords: causality, language, causative verbs, discourse, implicit causality, psycholinguistics

Natural languages display a great range of devices that allow us to speak of causal relations, ranging from verbs such as break or open, and prepositions like due to, to sentence connectives such as because and since. These expressions may relate to very different aspects of causal relations. In the case of verbs, for instance, they allow us to specify whether some caused state resulted directly from a causing event with no other intervening events, or whether the causing event initiated some other event or action that caused the effect. Verbs may also be differentiated according to the degree of volitionality on the part of an agent. At the discourse level (i.e., in multi-sentence sequences), languages may differentiate between markers that introduce reasons as causes of actions and ones that introduce simple, unmediated causes.

In this chapter, we will discuss a number of properties of linguistic expressions relating to causation. For each type of expression, the discussion will be complemented by an overview of important aspects of its cognitive processing as revealed by psycholinguistic experimentation. Phenomenologically, we will focus on two domains that have received much attention, both in linguistic and in psychological studies of natural language: causal relations as introduced intra-clausally in the verbal domain, and relations that are introduced inter-clausally as a relation between different units of discourse (e.g., sentences). We will also discuss linguistic expressions that do not express causation as such, but that seem to be dependent on a causal model for their adequate evaluation, counterfactual conditionals being the most notable example. The discussion is intended to shed some light on what these different linguistic expressions can tell us about how we conceptualize causal relations.

The causal relations involved will vary greatly, and a number of distinctions need to be made. Although notions will be clarified as they occur, a short note seems to be in order at the outset. In general, we will treat the causal relation as a two-place relation, relating a causing entity and a caused effect. Furthermore, we will refer to causes that do not cause intentions or intentional actions as causes proper, whereas causes of intentions or attitudes (p. 620) will be characterized as reasons (for discussion, cf. Anscombe, 1963; Davidson, 2001).

The chapter is structured as follows. In the section “Causality in the Verbal Domain,” we will focus on aspects of causality that may be encoded by verbs. In the section “Causality at the Discourse Level,” we will turn to causality in the realm of discourse, that is, as a relation between propositions (or sentences, if you like), as for instance encoded by discourse markers such as because or since. In the section “From Verbs to Discourse: Implicit Causality,” we will turn to the phenomenon of implicit causality, which concerns linguistically triggered expectations for specific explanations in discourse. In the section “Causal Constructions Without Causative Lexical Triggers,” we will look at phenomena such as counterfactual conditionals (if … then clauses) that may be argued not to encode causality as such, that is, as part of their semantics, but rather seem to be parasitic on the (mental) construal of causal models. The final section offers a brief review and suggests questions for further research.

Causality in the Verbal Domain

Turning first to expressions of causality in the verbal domain, consider the examples in (1) involving the causative verbs open and break:

(1) a. Mary opened the door.
b. John broke the vase.

While sentence (1a) is used appropriately in situations where Mary performs some action that causes the door to become open, (1b) refers to situations in which John does something with the effect of the vase becoming broken. In both cases, we preferably interpret the actions of the subject argument as involving direct manipulation of the object argument. We believe that a precise analysis of such verbs and other verbal expressions in the domain of causality may offer a window into how we conceptualize causal relations. The linguistic distinctions that can be made in the realm of causative verbs (e.g., distinguishing between events that must be caused by volitionally acting human beings and events that may be caused by non-controlled events) offer important insights into how we conceive of causal relations and what ontological distinctions are made when we reason about causal relations.

We will mostly be looking at the different aspects of causal relations that are associated with the semantics of the verbs themselves. Occasionally, however, we will have something to say about how those aspects interact with the interpretation of other constituents of the sentences in which the verbs occur. Where relevant, we will discuss psycholinguistic research on the processing of such verbs.

To better see how we describe situations of causation using causative verbs, we will make clear some of the assumptions that are often made in linguistic research on these verbs. Although these assumptions may not be shared by all linguists, they can be taken to represent a common core of most research in this domain.

First, it is assumed that the semantics of a predicate such as open in (1a) may be represented by a two-place causal relation CAUSE between a causing entity or eventuality (event1) and a caused event (event2).

(2) [[event1] CAUSE [event2]]

The decomposition of the meaning of verbs such as open or break is commonplace in linguistics (see, e.g., Dowty, 1979; Levin & Hovav, 1995; Parsons, 1990; Pustejovsky, 1995). Decomposed representations are typically taken to be part of a speaker’s mental lexicon; that is, although open in (1a) only relates the person Mary and the object door, it is assumed that the fact that Mary is engaged in some event that causes the door to become open is represented linguistically at some level (such representations, however, should not be taken to be representations of our cognitive conceptualization of causation, cf. e.g., Croft, 2012; Warglien et al., 2012). In Levin and Hovav (1995, p. 94) a verb such as open would be represented as in (3a). A more general representation for all causative verbs is given in (3b).

(3) a. [[x DO-SOMETHING]event1 CAUSE [y BECOME OPEN]event2]
b. [[x DO-SOMETHING]event1 CAUSE [y BECOME STATE]event2]

The representation in (3a) should be read as follows. An individual x is engaged in some action (DO-SOMETHING) that causes some entity y to change its state (BECOME) into being open, presupposing that it wasn’t already open before event1 occurred. In (1a), the action of the individual x has been left unspecified (see later discussion). The representation in (3b) constitutes a generalization of (3a) to causative verbs irrespective of the caused state. Linguistic evidence for the presence of such a state in the semantic representation of the verb comes from the availability of two readings when (p. 621) modifying open by again (Dowty, 1979; von Stechow, 1996):

(4) The wind opened the door again.
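Anticipating the discussion just below, the two readings of (4) can be rendered schematically in the decomposition notation of (3), with again marking the part of the representation it attaches to (this is our expository gloss of the standard scope analysis, not a notation taken from the works cited):

again([[x DO-SOMETHING] CAUSE [y BECOME OPEN]])   (repetitive reading: a whole event of causing the door to open has occurred before)
[[x DO-SOMETHING] CAUSE [y BECOME again(OPEN)]]   (restitutive reading: only the state of the door being open has held before)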


With (4), one may either refer to the repetition of the whole event of causing the door to open, or only to the restitution of the state of the door being open. On the first reading there are at least two open-events, while the second reading is consistent with only one event of opening. This may be accounted for elegantly if again may either modify the whole representation in (3a) or only the OPEN part.

We hinted earlier that there is no universal agreement among linguists regarding the representations in (2) and (3b) (see, e.g., the discussion in Martin & Schäfer, 2014; Neeleman & van de Koot, 2012). In particular, we would like to point to research within the force-dynamic framework (e.g., Talmy, 2000; Wolff, 2007; Wolff & Song, 2003), in which it has been argued that we need to assume further abstract causal predicates such as enable or prevent. For reasons of space and exposition, we will not go into further detail here, but refer the reader to the previously cited references for a semantic analysis of such abstract predicates.

Nevertheless, the representation in (3b) is a useful first approximation of the causal relations expressed by causative predicates. Causative verbs can target different aspects of this schema. While verbs such as open and break specify the resultant state of the object as being open or in pieces, respectively, they do not specify the action performed by the agent x of the causing event (expressed by DO-SOMETHING; cf., e.g., Pinker, 1989). In other words, although we know that the subject was engaged in some action causing the resulting event, nothing more is said about this action. For these predicates, the causing event may, for instance, be specified by a prepositional phrase:

(5) John broke the vase by hitting it with a stick.

In (5), the by phrase targets the DO-SOMETHING part of the representation in (3b). John’s action is specified as being an act of hitting involving a stick as an instrument (for a discussion of such constructions, see, e.g., Bennett, 1994; Solstad, 2007). Yet other predicates may come with a specification of the manner of the causing event “built in,” such as German zertreten (destroy by treading/kicking):

(6) Maria hat die Tür zertreten.
Mary destroyed the door by kicking it.

The predicate zertreten in (6) not only specifies the resultant state (something being broken); it also specifies the action performed by the subject, the causing event, as involving treading or kicking movements (for discussion on the relationship between manner and result specification, see, e.g., Beavers & Koontz-Garboden, 2012; Goldberg, 2010; Rappaport Hovav & Levin, 2010; Warglien et al., 2012).

A further question of central importance concerns the ontological status of the arguments of the causative predicate. In the preceding representations the causing event was specified as involving the action (DO-SOMETHING) of an agent x. Thus, in (1) the subject arguments were animates, with a preference for interpreting the actions as being performed volitionally. However, as seen from (4), the causing event need not be an action, and no animate (agent) need be involved in the causing event. It may, for instance, be an object (7a), an abstract property (7b), or an event (7c) (for an excellent overview, see, e.g., Wolff et al., 2010):

(7) a. A stone broke the window.
b. The pressure broke the window.
c. A stray shot killed the burglar.

Some languages have very elaborate and systematic means of specifying causing events. Talmy (1985) discusses Atsugewi (a Hokan language spoken in northern California), which has specific morphological means to differentiate a number of different causes, such as, for instance, natural forces, objects (in action), or body parts (in action). Among the natural forces, for instance, a further differentiation is made between wind and rain.

Causal predicates may also vary with regard to the aspect of volitionality or intentionality involved in the causing event or the degree of control with which the caused effect is brought about. Mostly, two classes are recognized, one that requires the causing event to be performed by an (intentionally acting) agent, and one in which the causing event may be non-intentional or may not involve an agent at all. The difference may be illustrated by the semantically related predicates kill and murder. Kill, belonging to the second class, allows animates and events as subjects (8), whereas murder allows only animates as its subject (9) (# indicates an infelicitous utterance).

(8) a. A policeman killed Ohnesorg.
b. A shot killed Ohnesorg.

(9) a. A policeman murdered Ohnesorg.
b. #A shot murdered Ohnesorg. (p. 622)

Correspondingly, only kill allows modification by unintentionally:

(10) a. A policeman unintentionally killed Ohnesorg.
b. #A policeman unintentionally murdered Ohnesorg.
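The contrast in (8)–(10) can be stated as a restriction on the x slot of the schema in (3b) (again our expository gloss, not a representation taken from the literature cited here):

kill:   [[x DO-SOMETHING] CAUSE [y BECOME DEAD]]   (x may be an agent, an event, or a natural force)
murder: [[x DO-SOMETHING] CAUSE [y BECOME DEAD]]   (x must be a volitionally acting agent)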


Levin and Hovav (1995) and Neeleman and van de Koot (2012) argue that this relationship is asymmetric: whereas a number of predicates require their subject argument to be a volitionally acting human being, no predicates seem to disallow human subjects. However, Talmy (1985, p. 79) and Solstad (2009, 2016) argue that there do exist causative predicates for which human agents are disallowed. Consider the following examples from Solstad (2009), involving the German verb anschwemmen (wash ashore):

(11) a. Starke Strömungen haben die über Bord gegangenen Schiffsfrachten angeschwemmt.
Strong currents washed ashore the cargo that fell overboard.
b. #Die Taucher haben die Holzkiste angeschwemmt.
The divers washed the wooden box ashore.

The verb anschwemmen allows natural forces (“currents” in (11a)), but disallows human agents (“divers” in (11b)). However, it should be noted that apart from disallowing human agents, anschwemmen and similar predicates typically have a very specific meaning, accepting only a limited number of subject arguments, which can be argued to represent natural forces in a wider sense (cf. the earlier discussion of Atsugewi).

For many predicates such as open, there also exists an intransitive variant in which the subject argument takes on a similar role in the situation as the object does in the transitive version (cf. (12)):

(12) a. Intransitive: The door opened.
b. Transitive: Mary opened the door.

The sentence in (12a) describes a change-of-state without specifying a cause. However, one intuitively assumes some cause to be involved. Given that the representation for causative predicates provided earlier includes a causal relation and a change-of-state event, two obvious questions are the following: First, is the causative predicate in cases such as open ultimately derived from the change-of-state variant or vice versa (for discussion, see, e.g., Levin & Hovav, 1995; Rappaport Hovav, 2014; Schäfer, 2009)? Second, does the change-of-state predicate involve a causal relation in some sense (Chierchia, 2004), or is it a mere change-of-state predicate? Typological data offer no clear answer as to a general pattern of conceptualization. Some languages dominantly derive the causative predicate from a more basic (stative or change-of-state) predicate; other languages derive the change-of-state predicate from a causative one, and many languages display both types (for an overview, see Haspelmath, 1993).

From a semantic, and, as we will see, a processing point of view, it is interesting that one may make (partial) predictions as to the verbs that participate in such alternations as in (12) by categorizing the change-of-state predicate according to whether it is internally or externally caused (Levin & Hovav, 1995; Rappaport Hovav, 2014). Only change-of-state predicates that are conceived of as externally caused (i.e., the influence is external to the object that changes its state) can enter the alternation. Thus, open (12) and break do alternate, whereas the internally caused bloom in (13) does not; note the ungrammaticality of (13b). The alternation may thus tell us something about how we conceptualize changes of state such as with break and bloom. With bloom, it is hypothesized, the change is conceptualized as being solely dependent on qualities internal to the plant.

(13) a. The flowers bloomed.
b. *The gardener bloomed the flowers (by fertilizing the soil).

Finally, we turn to the aspect of direct and indirect causality. Shibatani (1976) speaks of directive versus manipulative causation; other terms include mechanistic and teleological causation, dating back to Aristotle. A classic example involves the difference between kill and cause to die (for discussion, see Fodor, 1970; McCawley, 1978); note the examples in (14):

(14) a. John killed Peter.
b. John caused Peter to die.

The verb kill typically refers to situations of direct, unmediated causation, whereas cause to die typically refers to cases of indirect causation. A possible “indirectness scenario” (see McCawley, 1978) could be one in which John manipulates Peter’s gun by stuffing it with cotton, causing it to backfire and thus kill Peter. Another case of indirectness consists of situations in which one agent initiates the action of another agent, thus not fully controlling the outcome (Wunderlich, 1997): (p. 623)

(15) Paul made John kill Peter.

In general, what is meant by the (in)directness of causal relations is mostly related to a difference in the length of the causal chain (i.e., the number of causes leading to an effect). Chains where only one cause is present, or where the final element in a causal chain is in focus, represent cases of direct causation, and elements in a causal chain that do not immediately precede the effect are taken to be indirect causes. Wolff (2003) proposes a theory of direct versus indirect causation according to which we are dealing with direct causation when there are no intervening causes between the initial cause and the final cause, that is, no causal chains with more than two elements. Alternatively, one might define (in)directness in terms of control of the outcome. For indirect causation, one would control, at most, intermediate or initiating stages of the causal chain (Wunderlich, 1997), but not the one immediately preceding the effect. In linguistics, these notions are often used rather informally, without much elaboration. However, see, for example, Kratzer (2005) and Bittner (1999) for two different attempts at formalizing the (in)directness of causal relations.

It may also be noted that another construction, namely resultatives consisting of an activity verb (hammer) and a state description (flat) as in hammer the metal flat, is only associated with direct causation (for discussion, see Bittner, 1999; Goldberg, 1995; Goldberg & Jackendoff, 2004):

(16) John hammered the metal flat.

It seems to be a generally accepted assumption (Dixon, 2000) that the more complex forms in a language are used to describe causal relations that deviate from the simple schema that was drawn for open earlier. According to Shibatani (2002, p. 8), “The more difficult it is to bring about the caused event, the more explicitly the causative meaning must be indicated. […] Lexical causatives represent simpler, routine causative situations, whereas productive counterparts in either regular morphological form or periphrastic construction express situations requiring unusual, elaborate, and more involved efforts.” As Wolff (2003) makes particularly clear, these are only tendencies (see also Blutner, 1998). If the whole chain of events is seen as intended (and controlled) by the subject argument, speakers judge kill to be adequate also in situations where the first and the last parts of the chain of events are separated by intermediate events. According to Shibatani and Pardeshi (2002), the generalization that less complex (synthetic) constructions are more direct than more analytical constructions does not hold in general. In some languages, similar morphological means (i.e., affixation) may be used for both types of causation. Rather, it seems that productivity, that is, the availability of applying the causal morpheme or verb, is a better predictor: “[…] morphological transparency correlates with the difficulty in bringing about the caused event” (Shibatani, 2002, p. 8).
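The length-of-chain idea behind direct and indirect causation can be summarized schematically in the notation of (2) (our gloss, not a formalization from the works cited above):

Direct causation:   [event1 CAUSE event2]   (no intervening cause between cause and effect)
Indirect causation: [event1 CAUSE [event2 CAUSE event3]]   (event1, e.g., stuffing the gun with cotton, is only an indirect cause of the final effect event3, Peter’s death, with event2, the backfiring, intervening)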

Experimental Findings on Verb Causality

Psycholinguistic evidence for decompositional analyses of causative verbs comes from processing studies that have looked into the relative difficulty of processing causative compared to change-of-state or non-causative verbs during real-time comprehension. To our knowledge, McKoon and MacFarland (2000) was the first paper comparing the processing complexity of internally and externally caused change-of-state verbs as discussed earlier (see Table 31.1). They investigated whether externally caused change-of-state verbs such as break are inherently more complex than internally caused change-of-state verbs such as bloom. A decomposition analysis would lead us to expect that the additional lexical structure associated with externally caused changes increases processing difficulty. Controlling for a number of factors known to influence processing, such as lexical frequency, McKoon and MacFarland (2000) measured both lexical decision times for the two verb types and judgment times for sensicality judgments of simple sentences containing these verbs. In all experiments they found faster reactions for internally caused than for externally caused change-of-state verbs. This (p. 624) finding is supported by work by Gennari and Poeppel (2003), who compared three types of eventive verbs, accomplishments (break), achievements (arrive), and activities (dance), to stative predicates (be dead) (cf. the verb classification of Vendler, 1957). Lexical decision times for verbs, as well as reading times for these verbs in sentence contexts, revealed that the eventive verbs were more complex (i.e., took longer to process) than the simpler stative verbs. A similar result is reported by McKoon and Love (2011), who compared externally caused change-of-state verbs of the break type to verbs of the hit type, which lack a change-of-state component. Table 31.1 presents an overview of the tested verb types.

Table 31.1 Verb Types Tested in the Reviewed Studies on Verbal Complexity

Study | Complex Verb | Simple Verb
McKoon and MacFarland (2000) | [[x DO-SOMETHING] CAUSE [y BECOME STATE]] | [y BECOME STATE]
Gennari and Poeppel (2003) | [[x DO-SOMETHING] CAUSE [y BECOME STATE]], [y BECOME STATE], [x DO-SOMETHING] | [STATE]
McKoon and Love (2011) | [[x DO-SOMETHING] CAUSE [y BECOME STATE]] | [x DO-SOMETHING]

It must, however, be noted that these issues are controversial. See, for instance, de Almeida (1999) and Fodor and Lepore (2007) for arguments against decomposition and Mobayyen and de Almeida (2005) for somewhat different experimental results than those reviewed in the preceding.

Turning next to the case of direct versus indirect causation, Wolff (2003) provided experimental evidence for the hypothesis that lexical causatives such as break or open are restricted to causal models of direct causation without intervening causes. Participants were shown 3D animations and were asked to choose a linguistic description that either used a lexical causative verb (e.g., move a marble) or a periphrastic causative construction (e.g., cause the marble to move). Furthermore, they had to judge whether the video consisted of a single event or of multiple events. The first of a series of experiments showed that participants almost never chose lexical causatives for a sequence of two causing events (e.g., one marble launching a second one, which in turn launched a third marble) but used periphrastic causals instead. Unmediated launching events (e.g., one marble launching a second one), by contrast, were preferably described by means of lexical causatives. The general pattern was different when the agent of the first element in a causal chain consisting of two events was sentient. Accordingly, Experiment 2 manipulated whether the first “object” was a marble or an agent’s hand flicking a second marble. Here, lexical causatives also occurred in mediated sequences of events, and the mediated chains were more often perceived as single events. In another experiment, it was manipulated whether an action such as switching on a television set was done intentionally by operating a remote control with hands, or unintentionally, by accidentally sitting down on it. Only intended actions were described by lexical causatives, whereas unintended actions were mostly described by periphrastic causative sentences. In addition, participants were asked for a judgment of whether the intermediary (the remote control) caused or enabled/allowed the agent to perform the causal action. Participants were far more likely to view an intermediary as an enabler in intended than in unintended actions. Taken together, the reviewed experiments support Wolff’s previously mentioned no-intervening-cause hypothesis for understanding direct and indirect causality as expressed by lexical causative verbs in opposition to periphrastic causative statements.

Causality at the Discourse Level

Causal relations such as explanations are considered to be crucial to our understanding of discourse (i.e., in texts spanning more than one sentence; Asher & Lascarides, 2005; Hobbs, 1979; Kehler, 2002). In discourse, we may find a number of different kinds of causal relations. On the one hand, they may be expressible by linguistic devices such as the connectives because (17a) and therefore (17b). On the other hand, they may be left implicit and hence must be inferred by the reader (17c).

(17) a. Mary disturbed Peter because she sang loudly.
b. Mary annoyed Peter. Therefore, he left the party.
c. Mary disturbed Peter. She sang loudly.

One of the key findings in the literature on text processing is that causal relationships play a distinguished role in structuring the meaning of a text and its integration into a mental model of the discourse (Fletcher & Bloom, 1988; Trabasso et al., 1989; van den Broek, 1990). A large number of studies support the conclusion that a text’s causal structure is an important determinant of how it will be understood and remembered (see the review in van den Broek & Gustafson, 1999). First, the more causal connections a given proposition has to the rest of the text, the better it will be recalled (see, e.g., Trabasso & van den Broek, 1985). Second, propositions along a causal chain that connects a text’s opening to its final outcome are recalled better than propositions that are not in the chain (e.g., Trabasso & van den Broek, 1985).

In this section, we will begin by discussing a number of different causal relations and their realization, looking first at different connectives with a causal meaning (e.g., because and therefore) before turning to cases where no marking is present, (p. 625) that is, where the causal relation has to be inferred based on the semantic and pragmatic aspects of the clauses involved.

Page 10 of 48


Explicit and Implicit Causal Relations in Discourse

Natural languages typically offer numerous discourse markers (broadly speaking) such as because, therefore, and since that allow us to single out various causal relations. These markers can be categorized according to a number of criteria. First, discourse markers may mark their host sentence (the sentence in which they occur) either as a cause (18a) or as a consequence (18b):

(18) a. Peter left the party because he was bored.
b. Peter was bored. Therefore, he left the party.

In (18a), the connective because marks the state of affairs designated by the clause [Peter] was bored as the cause of the state of affairs described in the main clause Peter left the party. In (18b), on the other hand, therefore marks its host sentence as the consequence or effect of the preceding clause.

Regarding the linear presentation of cause and effect, there may exist certain correlations between markers and linearity. Thus, whereas a therefore sentence always follows the material introducing the cause, because clauses may either follow (effect–cause order, as in (18a)) or precede the effect clause (cause–effect order; compare (18a) to Because Peter was bored, he left the party). It has been argued that the relative discourse order of effect and cause is of crucial importance for reasoning, since we are most likely to infer or predict causes as opposed to effects (cf., e.g., de Blijzer & Noordman, 2000; Magliano et al., 1993).

Turning to the kinds of causal relations expressed in discourse, we would first like to point out that the causal relations are different in nature from the ones we described in the section “Causality in the Verbal Domain,” where we were mostly dealing with causes proper between events. At the discourse level, we find reasons and explanations (e.g., regarding necessary preconditions) in addition to causes proper. For the discourse level, we will assume that the causal relation pertains to entities that are propositional in nature:

(19) [[proposition1] CAUSE [proposition2]]

We should hasten to add that the label “proposition” doesn’t do justice to the range of ontological entities we observe in the realm of discourse since we are not only dealing with propositions in a classic sense. Among the entities entering the causal relation in discourse we also find, for example, (propositional) attitudes.

It is an interesting question whether the causal relation CAUSE is the same for propositions and events or whether they instantiate two different relations. Based on Lewis (1973), Dowty (1979) provides an analysis of the CAUSE assumed for causative predicates in terms of two propositional entities being related. However, it should be noted that Dowty (1979) does not make use of events as an ontological category whatsoever. In principle, a unified analysis would be desirable (for arguments against such a view, see Wolff et al., 2005), but we will not be able to discuss this issue here as it also depends to a large extent on one’s theory of causality (for discussion, see Copley & Wolff, 2014).

The interpretational variation of connectives makes a strong argument for the necessity of a more fine-grained differentiation of the causal relations involved. In her seminal work, Sweetser discusses the different causal relations that because clauses can express (Sweetser, 1990, p. 77):

(20) a. John came back, because he loved her.
b. John loved her, because he came back.
c. What are you doing tonight, because there’s a good movie on.

Sweetser assumes that because may express three different causal relations, depending on the domain it applies to: real-world (20a), epistemic (20b), and speech act (20c) causation. In (20a), the because clause (containing proposition1 in the representation in (19)) expresses John’s “real-world cause of his coming back” (proposition2 in (19)) (see Sweetser, 1990, p. 77). In (20b), the coming back does not cause John’s love, but rather the speaker’s reason—the cause of an epistemic state—for concluding that John loved the person in question. Finally, in (20c), the because clause provides justification, or explanation, for uttering the question “What are you doing tonight?” (i.e., a cause for performing the speech act of asking). In this case, the because clause seems deprived of causal meaning in the narrower sense. (p. 626) Rather, it is similar to Austin’s (1961) famous biscuit conditionals (“If you want some cookies, there are some in the cupboard”), where the consequent does not conditionally depend on the antecedent. Sweetser’s point is that these relations are thought to correspond to different linguistic levels, acting below (20a) or above (20b) modal operators or on speech act operators (20c). Given that because can express all of the preceding causal relations, one might argue that the differentiation in causal relations is dependent on the ontological categories involved. Thus, because has a different meaning if something of the type of a speech act (20c) enters the causal relation as its effect/consequence than when an event description does so (20a). It should be noted that English has connectives that seem to be specialized for only one of these domains. For instance, since may be argued to be a marker related exclusively to the epistemic domain.

While nicely illustrating some of the different causal relations we need to distinguish, the distinctions in (20) are not fine-grained enough, as argued by Pander Maat and Sanders (2000). First, the category of real-world causation is too coarse: there are connectives that are more specialized in meaning. In (20a), the because clause may be argued to introduce a reason of John’s, that is, a cause for John’s intention to come back (and, ultimately, his action of doing so) (see, e.g., Davidson, 2001, and the discussion in Malle, 1999, and references therein). However, there are real-world causative connectives that cannot introduce reasons. Pander Maat and Sanders (2000) argue that the Dutch connective daardoor can only express real-world causation that does not involve any intentions (causes proper in our terminology); that is, it can be used in a context where some real-world event causes another real-world event without any intentional involvement, but it cannot be used in contexts such as (20a). Furthermore, Pander Maat and Sanders (2000) argue that the Dutch connective daarom can express both real-world causation and epistemic causal relations, but, in opposition to daardoor, it can only express reasons among the real-world causative relations assumed by Sweetser (1990). In sum, these Dutch causal connectives seem to divide up the space of causal relations quite differently from the dimensions discussed by Sweetser (1990). Pander Maat and Sanders (2000) propose that what governs the use of the different causal connectives, and consequently what distinguishes the different causal relations, is speaker involvement, or rather subjectivity. Subjectivity is a gradable, scalar notion where non-volitional real-world causation is maximally objective and epistemic relations are highly subjective. Reasons (“volitional causal relations”) are on an intermediate level of subjectivity.

In the section “From Verbs to Discourse: Implicit Causality,” we will discuss further phenomena that call for more fine-grained distinctions than those in Sweetser (1990). However, we will divide up the space of causal relations in yet a different way. It should be added that the varying distinctions that have been made are not in sharp opposition, but rather complement each other. A shared assumption is that causal relations need to be finely differentiated and that, in particular, the nature of what is caused (is it an event, a mental state, or something else?) plays a crucial role in the conceptualization of the causal relation.

Let us now turn briefly to implicit causal relations in discourse. They are discussed here, rather than in the section “Causal Constructions Without Causative Lexical Triggers,” because they are often analyzed as being the unmarked counterparts of constructions involving because, since, and therefore, as discussed earlier. They are characterized as being inferred from a sequence of (independent) juxtaposed sentences. Thus, (21a) is assumed to be interpreted in a parallel fashion to (21b):

(21) a. Ben broke his leg. He tripped over a wire (Blakemore, 2002, p. 172).
b. Ben broke his leg because he tripped over a wire.

The second sentence is understood to be an explanation if the first sentence is seen to raise a question that the second sentence answers (see the discussion in Hobbs, 1979; Blakemore, 2002; Mann & Thompson, 1986). In the section “From Verbs to Discourse: Implicit Causality,” we will see a rather detailed case study of how such questions may be raised.

For the interpretation of all causal explanations/relations in discourse, the evaluation of the domain knowledge or world knowledge involved is of crucial importance (Asher & Lascarides, 2005; Noordman et al., 2015). However, it is particularly important in the absence of explicit marking. Noordman et al. (2015) show that world knowledge facilitates causal processing, but only in expert readers (in their study, economists were reading texts on economics). This result could be taken to indicate that causal processing may be rather shallow. Regarding implicit causal discourse relations, two observations may be made that considerably enrich the picture presented in the section “Causality in the Verbal Domain.”

Causality and Causal Reasoning in Natural Language where some real-world event causes another real-world event without any intentional in­ volvement, but it cannot be used in contexts such as (20a). Furthermore, Pander Maat and Sanders (2000) argue that the Dutch connective daarom can express both real-world causation and epistemic causal relations, but, in opposition to daardoor, it can only express reasons among the real-world causative relations assumed by Sweetser (1990). In sum, these Dutch causal connectives seem to divide up the space of causal relations quite differently from the dimensions discussed by Sweetser (1990). Pander Maat and Sanders (2000) propose that what governs the use of the different causal connectives, and consequently what distinguishes the different causal relations, is speaker involvement, or rather subjectivity. Subjectivity is a gradable, scalar notion where non-volitional real-world causation is maximally objective and epistemic relations are highly subjective. Reasons (“volitional causal relations”) are on an intermediate level of subjectivity. In the section “From Verbs to Discourse: Implicit Causality,” we will discuss further phe­ nomena that call for more fine-grained distinctions than those in Sweetser (1990). Howev­ er, we will divide up the space of causal relations in yet a different way. It should be added that the varying distinctions that have been made are not in sharp opposition, but rather complement each other. A shared assumption is that causal relations need to be finely differentiated and that, in particular, the nature of what is caused (is it an event, a mental state, or something else?) plays a crucial role in the conceptualization of the causal relation. Let us now turn briefly to implicit causal relations in discourse. They are discussed here, rather than in the section “Causal Constructions Without Causative Lexical Triggers,” because they are often analyzed as being the unmarked counterparts of constructions in­ volving because, since, and therefore, as discussed earlier. They are characterized as be­ ing inferred from a sequence of (independent) juxtaposed sentences. Thus, (21a) is as­ sumed to be interpreted in a parallel fashion to (21b): (21) a. Ben broke his leg. He tripped over a wire (Blakemore, 2002, p. 172). b. Ben broke his leg because he tripped over a wire. The second sentence is understood to be an explanation if the first sentence is seen to raise a question that the second sentence answers (see the discussion in Hobbs, 1979; Blakemore, 2002; Mann & Thompson, 1986). In the section “From Verbs to Discourse: Im­ plicit Causality,” we will see a rather detailed case study of how such questions may be raised. For the interpretation of all causal explanations/relations in discourse, the evaluation of the domain knowledge or world knowledge involved is of crucial importance (Asher & Lascarides, 2005; Noordman et al., 2015). However, it is particularly important in the ab­ sence of explicit marking. Noordman et al. (2015) show that world knowledge facilitates Page 13 of 48

Causality and Causal Reasoning in Natural Language causal processing, but only in expert readers (in their study, economists were reading texts on economics). This result could be taken to indicate that causal processing may be rather shallow. Regarding implicit causal discourse relations, two observations may be made that considerably enrich the picture presented in the section “Causality in the Verbal Domain.” The first one concerns manner specification. In the section “Causality in the Verbal Do­ main,” we presented examples where the causing event was specified by means of prepo­ sitional phrases (5) or via the manner information encoded in the verb itself ((6) “destroy by kicking”). However, as discussed in detail by Asher and Lascarides (2005), the cause of a change of state (see the discussion in the section “Causality in the Verbal Domain”) may be specified in a separate sentence in subsequent discourse. Consider the examples in (22): (p. 627)

(22) a. The boat sank. The enemy bombed it. (adapted from Asher & Lascarides, 2005, p. 276) b. The enemy sank the boat by bombing it. (Asher & Lascarides, 2005, p. 270) Asher and Lascarides (2005) observe that the second sentence in (22a) is interpreted in a parallel fashion to the by phrase in (22b). They furthermore contend that there is a pref­ erence for interpreting unmarked sentences following sentences with intransitive sink as explanations or causes of the sinking. Asher and Lascarides (2005) assume that this pref­ erence is due to the semantic nature of sink, which is assumed to introduce an underspec­ ified causal relation in which only the effect is specified. Subsequent material will prefer­ ably be interpreted as specifying the cause. We will return to this issue in considerable detail in the section “From Verbs to Discourse: Implicit Causality.” A second issue concerns the distinction between direct and indirect causation in the sec­ tion “Causality in the Verbal Domain.” It has been suggested by Wolff (see the discussion in 2003, p. 32) that interclausal causation is always indirect. However, the example in (22a) may be taken as evidence that this conclusion may be too strong (as acknowledged by Wolff, 2003, p. 33). After all, it seems natural to assume that the sequence in (22a) is interpreted in parallel to the direct, single-clause causation in (22b) (see the discussion in Asher & Lascarides, 2005, and in the section “From Verbs to Discourse: Implicit Causali­ ty” in this chapter). In the section “From Verbs to Discourse: Implicit Causality,” we will show how causality in the verbal and discourse domains may be unified for certain phe­ nomena.

Experimental Findings on Discourse Connectives

There is a rich body of work on the influence of causal connectives on sentence and text comprehension. Several studies suggest a processing difference between causal and non-causal relations in narrative and expository text (e.g., Trabasso & van den Broek, 1985; Sanders & Noordman, 2000). These studies show that causally related events are recalled better and more easily than events that are not causally related. Furthermore, causally related sentences are read faster (Haberlandt & Bingham, 1978; Sanders & Noordman, 2000), and reading time is negatively correlated with the degree of causal connection between propositions in a text (Keenan et al., 1984).

These results from behavioral studies have been corroborated in a recent event-related brain potentials study by Kuperberg et al. (2011), who show that domain knowledge has an immediate influence on accessing a word that is causally highly related (23a), intermediately related (23b), or completely unrelated (23c) to the preceding discourse. They observed that the causal relation was directly reflected in the N400 amplitude of the critical (underlined) word, an electroencephalographic (EEG) signal known to be related to the lexical integration of a new word. The study demonstrates that even without a connective explicitly marking a causal relation, comprehenders try to establish causal coherence from the earliest stages of processing an incoming word.

(23) a. She forgot to put sunscreen on. She had sunburn on Monday.
b. She usually remembered to wear sunscreen. She had sunburn on Monday.
c. She always put on sunscreen. She had sunburn on Monday.

Interestingly, (23c) is fully acceptable once we explicitly change the discourse relation by inserting a discourse marker such as nevertheless or but. Thus, Kuperberg et al.'s (2011) results may likewise be taken as evidence for Sanders's causality-by-default hypothesis:

Because experienced readers aim at building the most informative representation, they start out assuming the relation between two consecutive sentences is a causal relation. (Sanders, 2005)

The causality-by-default hypothesis, however, is not undisputed. Millis et al. (1995, Experiments 2 and 3) conducted experiments in which they presented two consecutive statements that could be related by any of a number of coherence relations. They manipulated whether the statements were connected by a full stop (no explicit marking), or by because, and, or after, indicating causal, additive, and temporal relations, respectively. Millis et al. (1995) found that causal inferences, as probed by why questions after reading the sentence pairs, were only reliably drawn in the condition where the causal relation was explicitly marked, but not in the full stop condition. Additional evidence against the causality-by-default hypothesis comes from sentence continuation studies by Kehler et al. (2008): even though they found that for some verbs, so-called implicit causality verbs (see the section "From Verbs to Discourse: Implicit Causality"), the causal relation of an explanation is in fact the preferred discourse relation chosen to establish a link between sentences, other verbs give rise to non-causal discourse relations. Turning to natural text, however, Asr and Demberg (2012) conducted a corpus analysis of the Penn Discourse Treebank, a treebank annotated for implicit and explicit discourse relations, and provided evidence for the causality-by-default hypothesis. Their corpus analysis revealed that causal relations are more often left implicit than explicitly marked. In this respect they differ markedly from other relations, such as the concessive relation in (24), marked by although (with Penn Discourse Treebank labels), whose connective can hardly be left out.

(24) [Oil prices haven't declined]neg-consequence, although [supply has been increasing]cause
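The core of such a corpus analysis is a simple count of how often each relation sense is realized implicitly versus with an overt connective. The sketch below illustrates the logic on invented toy records; the two-field representation and the sense labels are simplifications for illustration, not the Penn Discourse Treebank's actual file format.

```python
# Toy illustration of the implicit-vs-explicit count reported by
# Asr and Demberg (2012). The records are invented; real PDTB data
# is far richer and must be read from the treebank's own files.
from collections import Counter

relations = [
    ("Contingency.Cause", "Implicit"),
    ("Contingency.Cause", "Implicit"),
    ("Contingency.Cause", "Explicit"),
    ("Comparison.Concession", "Explicit"),
    ("Expansion.Conjunction", "Explicit"),
]

counts = Counter(relations)
for sense in sorted({s for s, _ in relations}):
    implicit = counts[(sense, "Implicit")]
    explicit = counts[(sense, "Explicit")]
    print(f"{sense}: {implicit / (implicit + explicit):.0%} implicit")
```

A markedly higher implicit share for causal senses than for, say, concessive senses is the corpus-level signature of causality-by-default: writers can leave causal links unmarked because readers will infer them anyway.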

Psycholinguists have also turned to the different relations that can be expressed by a single causal connective, and to the online effects of this ambiguity in language comprehension. Traxler et al. (1997) compared causal (25a) to evidential (25b) uses of because in an eye-tracking-during-reading experiment, building on theoretical work by Sweetser (1990), who makes a similar distinction between real-world and epistemic causality, respectively (see the earlier discussion).

(25) a. Heidi felt very proud and happy because she won the first prize at the art show.
b. Heidi could imagine and create things because she won the first prize at the art show.
c. Heidi must be able to imagine and create things because she won the first prize at the art show.

Immediately when readers' eyes entered the critical region (e.g., she won the first prize), the authors observed a slowdown in the evidential as compared to the causal conditions. The study thus provides evidence that because in evidential contexts leads to immediate processing difficulty. This finding has been confirmed and extended by de Blijzer and Noordman (2000) and Mohamed and Clifton (2007), who show that because clauses that require pragmatic inferences about the possible belief states of the speaker are generally more difficult to process than ordinary factual causals that do not. Existing work also shows that the difficulty disappears once there are expressions that indicate that an inference about a belief state is required, as for instance the modal verb must in (25c).

Work on the acquisition of causal connectives shows that causal connectives appear relatively early: the first causal connectives appear roughly between two and three years of age. Studying naturalistic discourse in a longitudinal study, Bloom et al. (1980) found that children acquiring English first learn the connective and, which they initially use to encode pure conjunction of two events. Later, they extend the meaning of and to an "all-purpose" connective, which they use to encode relations other than conjunction (temporal, causal, and adversative). Although and always appeared first in child language, the relative order in which the other discourse relations were acquired varied between children. Evers-Vermeul (2005), Spooren and Sanders (2008), and Evers-Vermeul and Sanders (2009) present a pragmatically motivated multidimensional theory of complexity that can account for both the uniformity and the variation in the order of acquisition.

From Verbs to Discourse: Implicit Causality

In treating causality in verbs and causal relations at the discourse level, we have implicitly accepted a common division between the two domains. However, what we present in this section shows that such a dividing line is not reasonable for all cases. In the preceding discussion of implicit and explicit explanatory discourse relations, we discussed one case in point from Asher and Lascarides (2005), who claim that in some cases lexical knowledge may facilitate the understanding of two propositions as being related via causation. In this section, we will discuss the phenomenon of implicit causality, which is related to cases discussed by Asher and Lascarides (2005), although these authors only discuss a limited set of verbs and do not refer to implicit causality. Implicit causality verbs have received significantly more attention in psycholinguistics than in theoretical linguistics. One of our main motivations for including a rather long section on these verbs is that we believe that both psychologists and linguists might benefit greatly from tackling this phenomenon.

Implicit causality (henceforth, IC) verbs have been studied extensively in psycholinguistics and social psychology (see, e.g., Au, 1986; Brown & Fish, 1983; Garvey & Caramazza, 1974; Pickering & Majid, 2007; Rudolph & Försterling, 1997). In general, IC verbs are interpersonal predicates (i.e., they denote a relation between two animate arguments). What makes them particularly interesting is that they trigger explanations which focus on one of the two arguments when followed by a because clause. The preference for such specific because explanations is standardly elicited in production studies where participants are asked to freely continue the sequence "ARGUMENT VERB ARGUMENT because" (see, e.g., Ferstl et al., 2011; Garvey & Caramazza, 1974), as in (26), with sample continuations in parentheses:

(26) a. Jane impressed Pete because … (she was always in control of the situation).
b. Jane admired Pete because … (he was always in control of the situation).

For impress, there is a strong preference for providing an explanation referring primarily to the subject argument (26a). For admire, on the other hand, participants preferably produce continuations referring to the object argument (26b). Based on the linear order of subject and object, impress is characterized as an NP1 bias verb (Jane, the subject argument, is the first argument one encounters when processing the sentence from left to right), whereas admire is referred to as an NP2 bias verb. The proportion of subject or object continuations is referred to as the IC bias.

Obviously, in addition to the bias-congruent continuations in (26), participants may produce non-bias-compliant continuations (27b):

(27) a. Jane admired Pete because he was always in control of the situation. (= (26b))
b. Jane admired Pete because she was easily impressed.

Three aspects of IC are particularly relevant with respect to causality in language and causal reasoning. First, it has been shown by Kehler et al. (2008) that IC verbs trigger a significantly higher number of explanations than other verbs when prompted for continuation with a full stop following the "ARGUMENT VERB ARGUMENT" sequence. Thus, in the case of impress, participants tend to produce continuations that explain how the impression comes about (28a), and not what follows from it (28b). Understanding the phenomenon of IC therefore helps in understanding the connections between causality expressed in verbs and causality in discourse.

(28) a. Jane impressed Pete. She was always in control of the situation. (= Explanation)
b. Jane impressed Pete. He decided to ask her for advice. (= Result)

Another important aspect concerns the relationship between co-reference and explanation. It may be observed that incongruent continuations such as (27b) are explanations of a very different kind than bias-congruent continuations. Consider again (27). Whereas the bias-compliant continuation in (27a) explains Jane's admiration for Pete with reference to a property of Pete, the non-bias-compliant continuation in (27b) refers to a property of Jane. The phenomenon of IC bias thus allows us to systematically study the range of explanation types and their interrelations.

Finally, recent online processing studies of IC suggest that the bias takes the form of a proactive discourse expectation. The sequence "NAME IC-VERB NAME" immediately seems to give rise to an expectation about an upcoming explanation of a fairly specific type. Studying the time course and nature of IC bias effects thus opens a window into fairly automatic processes of causal reasoning as they happen online.
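Because the IC bias is just a proportion over coded continuations, the measurement itself is easy to make concrete. The following sketch assumes a toy coding of continuations as re-mentions of NP1 or NP2; the codings are invented for illustration and do not reproduce any published norms.

```python
# Sketch of how an IC bias score is computed from continuation data.
# Codings are invented; published norming studies (e.g., Ferstl et al.,
# 2011) code many continuations per verb from many participants.

def ic_bias(continuations):
    """Proportion of continuations whose first reference is to NP1."""
    return sum(1 for ref in continuations if ref == "NP1") / len(continuations)

# Hypothetical codings of "Jane impressed Pete because ..." and
# "Jane admired Pete because ..." continuations:
impress = ["NP1", "NP1", "NP1", "NP2", "NP1", "NP1", "NP2", "NP1"]
admire = ["NP2", "NP2", "NP1", "NP2", "NP2", "NP2", "NP2", "NP1"]

print(f"impress: {ic_bias(impress):.0%} NP1 continuations (NP1-bias verb)")
print(f"admire: {ic_bias(admire):.0%} NP1 continuations (NP2-bias verb)")
```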


In this section, we will discuss the interconnection between IC predicates and particular types of explanations. We will make use of some ideas suggested by Asher and Lascarides (2005) for other classes of verbs and explain why IC verbs tend to trigger explanations in production, as observed by Kehler et al. (2008), and what the specific characteristics of these explanations are. The upshot is that IC verbs trigger explanations because they are semantically predetermined to do so as part of their lexical properties. IC verbs, we contend, are underspecified with respect to certain properties that are (causally) contingent on one of the two participants in the situation they describe. It is this missing information that triggers explanations in full stop continuations and that also triggers primary reference to one of the two participants. Based on this explanation of the phenomenon, we will outline two modes of processing commonly observed in online studies of discourse processing: fast and anticipatory processing of still missing information, often referred to as "focusing," and integration of discourse units, which can only happen after both units have been fully processed.

Empirical Research on Implicit Causality

We have already mentioned the finding from the discourse continuation study by Kehler et al. (2008) that IC verbs triggered explanations per default (> 60% explanations), whereas no such preference could be observed with "non-IC verbs" (< 25% explanations). The first studies to employ the sentence completion task mentioned earlier were Garvey and Caramazza (1974) and Garvey et al. (1974). Since then, congruency effects of IC have been shown to be highly reliable, and a number of psycholinguistic methods have been applied across different languages, both in adults and in children (cf. the review in Rudolph & Försterling, 1997).

Garvey and Caramazza attribute congruency effects to a feature of causal directionality encoded in the lexical semantics of interpersonal verbs. In later work, Brown and Fish (1983) argue for linking IC to systematic differences between certain classes of verbs and the thematic properties of their arguments. They observe systematic differences between actions and states, corresponding essentially to the distinction between agent–patient verbs and two kinds of psychological verbs: stimulus–experiencer and experiencer–stimulus verbs. They suggest that agent–patient verbs attribute causes to the agent role, while psychological verbs attribute them to the stimulus argument. This line of research has been followed by other researchers and has led to a more fine-grained taxonomy, termed the revised action–state distinction (e.g., Rudolph & Försterling, 1997). Au (1986) notes that agent–patient verbs do not constitute a homogeneous class, but should be further split into agent–patient verbs proper and a verb class that she terms agent–evocator verbs, that is, verbs that denote interpersonal actions in which the patient/theme presumably evokes the intention of the agent to act (e.g., thank, criticize, and congratulate). The studies by Rudolph (1997), Goikoetxea et al. (2008), and Ferstl et al. (2011) have provided norms for different languages and a large number of verbs, lending support to the four-way classification into agent–patient, agent–evocator, stimulus–experiencer, and experiencer–stimulus verbs.

In a framework based on folk psychology, Malle (2002) has pointed out that an explanation is needed concerning what constitutes the connection between thematic properties and causal dependencies (see also the discussion in Crinean & Garnham, 2006; Garnham, 2001; Pickering & Majid, 2007). A recent proposal in this direction has been made by Hartshorne and Snedeker (2013), grounding IC in the semantics of VerbNet (Kipper-Schuler, 2006).

It is also important to note that verb semantics is only one of the factors contributing to IC bias. It has been shown that perceived causality also takes into account the semantic connotations of the nouns denoting the participants (Corrigan, 2001; Garvey & Caramazza, 1974). Corrigan (2001) provides evidence that some nouns are rated as more agentive than others (e.g., mugger vs. passerby) and are thus more likely to be perceived as instigators of events. Similarly, Ferstl et al. (2011) demonstrate that even the gender information of proper names (John fascinates Mary vs. Mary fascinates John) affects IC bias. These observations have led some researchers to the conclusion that IC is not a genuinely linguistic phenomenon, but primarily reflects encyclopedic knowledge about specific events in a given social or cultural setting (for more general considerations regarding social interaction and causal attribution, see Hilton, Chapter 32 in this volume).

On the other hand, it has often been claimed that IC constitutes a cognitive universal; that is, the same verb will elicit the same bias across languages (e.g., Ferstl et al., 2011; Goikoetxea et al., 2008; Rudolph & Försterling, 1997). To date, however, only very few studies have systematically investigated IC cross-linguistically using the same set of verbs and employing the same methods across languages (see Hartshorne et al., 2013). IC biases in English, Japanese, Mandarin, and Russian emotion verbs turn out to be highly correlated across languages and cultures (see also Bott & Solstad, 2014, for a strong cross-linguistic parallelism in IC bias between German and Norwegian).

Besides the verbs and their arguments, the connective is a crucial determinant of IC bias, too (Ehrlich, 1980; Stevenson et al., 1994, 2000). Testing connectives other than because (such as and, but, and so), as well as a full stop condition, yielded continuations with different anaphoric preferences. Extending this line of research, Kehler et al. (2008) have provided evidence that IC bias hinges on the discourse relation of explanation. They demonstrate that the phenomenon is stable across different kinds of discourse connectives (explicit explanations with because vs. implicit explanations after a full stop), once productions are conditioned on the explanation relation.

The preferences for specific patterns of co-reference that are due to IC have been shown to elicit effects in online comprehension tasks as well. Caramazza et al. (1977) observed longer response times when judging co-reference in incongruent continuations (27b) than in congruent ones (27a). However, it is still an open question whether IC has an early focusing effect (e.g., Koornneef & van Berkum, 2006; Pyykkönen & Järvikivi, 2010; van Berkum et al., 2007), that is, whether the congruence effect shows up right at the pronoun, or constitutes a later effect on clausal integration (e.g., Garnham et al., 1996; Guerry et al., 2006; Stewart et al., 2000).

Recent studies show that IC verbs yield early congruency effects immediately at the pronoun. This has been demonstrated in eye-tracking studies during reading by Koornneef and van Berkum (2006) and Featherstone and Sturt (2010), and in event-related potentials (ERPs) time-locked to the pronoun (van Berkum et al., 2007). The visual world eye-tracking studies by Pyykkönen and Järvikivi (2010) provide evidence for the activation of IC information even before participants encounter the causal connective (see also Cozijn et al., 2011, for immediate focusing effects). This is strong evidence that the lexical semantics of the verb is responsible for the anticipation of an explanation that has to be determined in subsequent discourse. In order to emerge so quickly and automatically, these discourse expectations have to be calculated without putting too much computational burden on the comprehender (e.g., not relying on costly inferences in the style of Hobbs, 1979).

Although there is a large number of experimental studies on IC bias, the phenomenon has been largely neglected in linguistic theory. By and large, existing theories have either tried to predict the bias on the basis of the argument structures that a given verb licenses (see Hartshorne & Snedeker, 2013; Hartshorne, 2014, for discussion and the most fine-grained version of such a "verb-based account") or have refuted the claim that IC bias is a linguistic phenomenon at all, explaining it as resulting from world knowledge instead. Recently, we have proposed a theory of IC bias that grounds the phenomenon in the compositional semantics and the pragmatics of explanatory discourse (Bott & Solstad, 2014). In the following, we will briefly sketch our account and present some evidence from language production experiments.

The Semantics of the Implicit Causality Bias

IC verbs, we propose, trigger expectations for specific explanation types. They do so because they are underspecified with respect to certain properties of the situation described which are (causally) contingent on one of the two participants. Put differently, IC verbs carry an empty "slot" for specific explanatory content. It is this missing information that triggers explanations in full stop continuations and that also triggers primary reference to one of the two participants.

IC thus reflects a general processing preference for not leaving missing content unspecified, that is, a tendency to avoid accommodation (Altmann & Steedman, 1988; van der Sandt, 1992). In our analysis, IC bias as a measure of co-reference preferences is an epiphenomenon of specific explanatory preferences derived from verb semantics and the particular realization of the verbal arguments.

In order to capture the relation between explanations and co-reference, we need to distinguish several types of explanations. Based on Solstad (2010) and Bott and Solstad (2014), we distinguish three main types of explanations: (1) simple causes, (2) external reasons, and (3) internal reasons (see the examples in (29)):

(29) a. Simple cause: John disturbed Mary because he was making lots of noise.
b. External reason: John disturbed Mary because she had damaged his bike.
c. Internal reason: John disturbed Mary because he was very angry at her.

Simple causes are causes of events or (mental) states. They never involve volition or intention. In (29a), Mary's feeling disturbed is understood to be a byproduct, as it were, of John's noise-emitting activity. External and internal reasons are causes of attitudinal states (cf., e.g., Davidson, 2001). Thus, the because clauses in (29b) and (29c) specify causes for John's intention to disturb Mary. External reasons (29b) are states of affairs external to the attitude-bearer's mind, whereas internal reasons (29c) are attitudes or mental states internal to the attitude-bearer's mind. The interdependency between explanation type and reference resolution may be seen in (29b) and (29c): the external reason is associated with, and thus makes primary reference to, the object (NP2) argument by mentioning it first, whereas the internal reason is associated with the subject (NP1) argument. We also assume backgrounds as a fourth type of explanation. Backgrounds provide information that makes possible, or "facilitates," the situation described by the verb (see example (30)):

(30) a. Felix frightened Vanessa because he suddenly screamed.
b. Felix frightened Vanessa because she didn't hear him coming.

Whereas the because clause in (30a) specifies the simple, direct cause of Vanessa's state of being frightened, the because clause in (30b) specifies the background (or preconditions) for Vanessa's being frightened, saying nothing about the actual cause of Vanessa's fear. Note that all four types of explanations fall into the category of real-world causal relations in the sense of Sweetser (1990) (see the section "Explicit and Implicit Causal Relations in Discourse"). Thus, our typology further divides up the typological space of causal relations in language.

Finally, it is of importance to our approach that because clauses introduce entities propositional in nature (cf. Solstad, 2010). Thus, if because clauses are to suitably specify underspecified causal entities, these must likewise be of the semantic type of propositions. This explains why the eventive, non-propositional cause of the causative verb kill cannot be specified by a because clause:

(31) Mary killed Peter…
by stabbing him in the back.
#because she stabbed…

Two kinds of underspecified content trigger explanatory expectations. The one we will focus on here involves arguments that are underlyingly propositional in nature (the other involves projective/presuppositional content; for details, see Bott & Solstad, 2014). Consider the stimulus argument of the stimulus–experiencer verb annoy. In the sentence Mary annoyed John, Mary may be seen as a mere placeholder for a semantic entity more complex in nature: it is actually a specific property or action of Mary's that is the cause of John's being annoyed. Support for this analysis derives from the fact that stimuli in general may be realized as either noun phrases or that clauses: Mary annoyed John / It annoyed John that Mary … Stimuli are simple causes, contributing to an NP1 bias for stimulus–experiencer verbs, and to an NP2 bias with experiencer–stimulus verbs. This causal nature of stimulus arguments is determined by the lexical semantic structure of the verb (for details, see Bott & Solstad, 2014, pp. 223–224).
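One way to make the "placeholder" intuition explicit is a schematic decompositional entry in which the stimulus argument contributes an underspecified propositional cause. The rendering below is our simplified sketch in the event-decomposition notation used earlier in this chapter, not Bott and Solstad's (2014) exact formalism:

```latex
% Schematic entry for a stimulus-experiencer verb such as "annoy".
% P stands for an underspecified property or action of the stimulus
% argument x: the propositional slot that a because-clause can fill.
\[
  \textit{annoy}: \quad
  \lambda y \, \lambda x \, \exists s \,
  \bigl[\, \textsc{annoyed}(y, s) \;\wedge\; \textsc{cause}(P(x), s) \,\bigr],
  \qquad P \text{ underspecified}
\]
```

Since P ranges over propositional entities, a because clause, which itself denotes a proposition, is of the right semantic type to specify it; the eventive DO component of a causative verb like murder, by contrast, is not of propositional type, which is why continuations like (35) below are deviant.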

Implicit Causality as Explanatory Gaps: Experimental Findings

Empirical evidence for our proposal comes from sentence continuation experiments reported in Bott and Solstad (2014) and Solstad and Bott (2013). In these experiments (originally conducted in German and Norwegian), participants were asked to continue prompts with stimulus–experiencer verbs (32), experiencer–stimulus verbs (33), and causative agent–patient verbs (34). Importantly, causative agent–patient verbs are also considered to involve a CAUSE in their lexical semantics (see the section "Causality in the Verbal Domain"), but of a different nature than stimulus–experiencer verbs, as we will see. Conditions with a full stop were included in order to determine how often participants continued with an explanation or chose some other discourse relation (percentage of explanations reported in parentheses). All explanation continuations were then annotated with respect to their explanation type and co-reference (percentages of simple causes and re-mentions of NP1 in parentheses).

(32) [stimulus–experiencer verb prompt; the original example materials and percentages are not reproduced here]

(33) [experiencer–stimulus verb prompt; the original example materials and percentages are not reproduced here]

(34) [causative agent–patient verb prompt; the original example materials and percentages are not reproduced here]

The results showed that IC verbs of both the stimulus–experiencer type and the experiencer–stimulus type received far more explanations than causative agent–patient verbs. This confirms our claim that the former two kinds of verbs have an explanatory gap that can be filled in the subsequent discourse unit. This is different for causative agent–patient verbs. We attribute this to the fact that, although they involve a CAUSE as part of their semantic representation, the [x DO-SOMETHING] component cannot be specified by a because clause:

(35) #Mary murdered John because she stabbed him in the back.

As can be seen from the figures in (32)–(34), the distribution of explanation types was as expected. For both stimulus–experiencer and experiencer–stimulus verbs, simple causes were the dominant explanation type. However, causative agent–patient verbs did not allow for because clauses expressing simple causes, in line with our intuitions about (35). Rather, prompts involving causative agent–patient verbs were continued with explanations of the (internal or external) reason type. Since simple causes of psychological verbs are associated with the stimulus argument, we observed a strong preference for re-mention of the stimulus argument (i.e., the subject argument with stimulus–experiencer verbs and the object argument with experiencer–stimulus verbs). For ordinary agent–patient verbs, our experiments showed that the bias was highly predictable from the ratio of NP1-related internal reasons relative to NP2-related external reasons. This ratio accounted for 75% of the variance.
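The predictive relation between explanation types and co-reference bias amounts to a simple regression of per-verb NP1 bias on the per-verb proportion of internal (NP1-related) reasons. The sketch below shows that computation on invented numbers; only the reported figure of roughly 75% explained variance comes from our data.

```python
# Sketch of the regression relating reason-type ratios to IC bias for
# agent-patient verbs. All data points are invented for illustration.
import numpy as np

# Per-verb proportion of internal-reason (NP1-related) explanations
# among all reason explanations, and the observed NP1 bias:
reason_ratio = np.array([0.20, 0.35, 0.50, 0.60, 0.80])
np1_bias = np.array([0.25, 0.30, 0.55, 0.70, 0.75])

slope, intercept = np.polyfit(reason_ratio, np1_bias, 1)
predicted = slope * reason_ratio + intercept
ss_res = np.sum((np1_bias - predicted) ** 2)
ss_tot = np.sum((np1_bias - np1_bias.mean()) ** 2)
print(f"R^2 = {1 - ss_res / ss_tot:.2f}")
```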


Another piece of evidence stems from a further continuation study directly comparing stimulus–experiencer with causative agent–patient verbs. This time, we manipulated whether a simple cause was already given in the prompt via an appropriate by or with phrase, which, in our account, should pre-empt the bias, leading to a change in strategies for continuation. Sample materials and descriptive statistics are summarized in (36).

(36) [sample prompts with and without a cause-denoting by/with phrase, together with descriptive statistics; the original materials are not reproduced here]

The experiment showed that the presence of a prepositional phrase explicitly introducing a simple cause has only a very minor effect for causative agent–patient verbs. In connection with stimulus–experiencer verbs, however, the explanation profile changes entirely. We observed a strong drop in simple causes relative to the unmodified conditions. Instead, participants mainly produced background and internal reason explanations. The IC bias, which was quite strongly NP1 in the unmodified conditions, was fully balanced after modification. Thus, stimulus arguments of psychological verbs do seem to be placeholders for an underspecified property.

The experimental findings show that causal language strongly depends on the semantic, ontological properties of the natural language expressions chosen to express causal content.

Relating our analysis of IC bias to causal reasoning, we can distinguish two modes of establishing causal relations in the processing of natural language discourse. As mentioned earlier, numerous studies have shown that at least in some cases the implicit causality bias can affect pronoun interpretation almost immediately (Cozijn et al., 2011; Featherstone & Sturt, 2010; Koornneef & Sanders, 2013; Koornneef & van Berkum, 2006; Pyykkönen & Järvikivi, 2010; van Berkum et al., 2007). This raises the question of whether this is sufficient time for complex inferences based on unconstrained amounts of world knowledge. In contrast, psycholinguistic research has established that comprehenders use the lexical semantics of verbs to predict upcoming words within a few hundred milliseconds of encountering the verb (see the review in Kamide, 2008). Our analysis of IC verbs predicts these fast, anticipatory effects of IC bias. On the other hand, the reviewed literature on establishing causal connections in discourse has also provided evidence for time-consuming inferences of discourse relations in unmarked discourse. Again, this is fully expected in our account, namely in those cases where no explanatory slot is conveyed by the lexical semantics of the verb, or in those cases where an existing explanatory slot has to be filled with a cause of a different type than expected.

Causal Constructions Without Causative Lexical Triggers

In the preceding sections, we have reviewed work on causality in language with a special focus on lexical devices—causal expressions—dedicated to linguistically encoding causal relations. However, causality in language goes well beyond lexical triggers of causal interpretations. Often, causal models are themselves an integral part of language interpretation, and linguistic analyses of various constructions have therefore been grounded in causality. In this section we will look at a selection of such constructions whose analysis crucially involves the concept of causality even though they may not contain any causal expressions, that is, lexical items referring to causes or effects. Most prominent among these constructions are conditionals, but they also include non-culminating accomplishments and the resultative construction. We will first review existing work on counterfactual conditionals and then briefly comment on other construction types.

Conditionals

Conditionals do not express a causal relation explicitly, yet they involve causal models in their evaluation. In the analytic philosophical tradition following Stalnaker and Lewis, conditionals have been extensively studied, and philosophers have even tried to ground the conceptual analysis of causality in their semantics. Recent developments in semantics, however, show a trend in the opposite direction: semanticists have proposed analyses of conditionals treating causality as a primitive notion. In the following, we will provide a short summary of these developments.

Counterfactual definitions of causality can be traced back to Hume:

We may define a cause to be an object followed by another, and where all the objects, similar to the first, are followed by objects similar to the second. Or, in other words, where, if the first object had not been, the second never had existed. (Hume, 1748, Section VII, our emphasis)

The quote from Hume exemplifies that natural language uses specific constructions in order to express a causal relation. In this example, we find a subjunctive conditional (henceforth counterfactual) used to define the notion of a cause.

Ever since Hume, counterfactual dependency has been a central aspect and often even a defining condition of causality. Counterfactuals, such as the would and might conditionals with subjunctive mood in (37), may thus be taken to exemplify causal reasoning par excellence.

(37) a. If John had extinguished the fire, the forest would not have burned down.
b. If John had thrown away the cigarette, the forest might have burned down.

The consensus view among philosophers is that even though there is a close connection between counterfactuals and causality, it is not trivial to explain the meaning of causal expressions exhaustively by the semantics of counterfactual conditionals (see the discussions in Menzies, 2011, and in this volume, Gerstenberg & Tenenbaum, Chapter 27, and Lagnado & Gerstenberg, Chapter 29). This is due to the fact that not every counterfactual statement involves causation, and not every causal statement may be expressed by a counterfactual. For instance, the sentence If I had not been born in 1973, I would not have been 41 in 2014 is intuitively true, but being born in 1973 is not a cause of being 41 in 2014 (Kim, 1973). Similarly, pre-emption examples like the famous scenario illustrated in Figure 31.1, where two assassins are hired to murder a prominent victim, show that causal relationships sometimes resist transformation into a counterfactual conditional.

Figure 31.1 Simplified causal models of the pre-emption example. The model on the left shows the causal model of the scenario without an intervention. On the right, the counterfactual is modeled as an intervention on the propositional variable assassin 1 shoots, which is set to 0 ("false") and disconnected from its parent node. Even though assassin 1 does not shoot, the prominent victim will be dead, because assassin 2 shoots as a backup. Thus, even though the first assassin's shot is a cause in the model, the counterfactual had the first assassin not shot, the prominent victim would not be dead is predicted to be false, in line with semantic intuitions (for details, see Halpern & Pearl, 2005; Pearl, 2000).

The example works as follows: the first assassin carries out the execution, pre-empting the action of the second. In this case we would clearly judge assassin 1 caused the victim to die to be true, but the counterfactual had assassin 1 not shot, the victim would still be alive to be false. Psychological experiments also show that causal judgments differ from preferred counterfactual descriptions of one and the same situation. In a classic experiment (Mandel & Lehman, 1996), stories were presented to participants that included both an enabling condition and a direct cause. One of these causal scenarios described a situation in which a man took a route other than his usual one to drive home. On his way home, he was killed in an accident when his car was hit by a truck whose driver was drunk. When participants were asked for the continuation that first came to their mind for a counterfactual statement beginning if only …, they tended to continue by mutating the enabling condition (if only he had taken his usual route home). By contrast, when asked for the actual cause of the man's death, they generally mentioned the drunk truck driver. Thus, although both types of causes can in principle be described with a counterfactual conditional, it seems that at least the if only type of counterfactuals is preferably used to report causal reasoning about a state of affairs in which the first proposition in the causal chain is different (for a comprehensive review of these findings, see the articles in Hoerl et al., 2011).

Even though there is no perfect match, counterfactuals and causality seem to be close enough to think that understanding one may shed light on the other (see, e.g., the review in Menzies, 2014). Counterfactual theories of causality became popular with the publication of theories of natural language counterfactual conditionals by Stalnaker (1968), Stalnaker and Thomason (1970), and Lewis (1973). Stalnaker suggests analyzing examples such as (38) in a possible world semantics.

(38) If the Chinese enter the Vietnam conflict, the United States will use nuclear weapons.

According to this account, we take our beliefs about the actual world and then add to these beliefs the proposition expressed by the antecedent of the conditional, making "whatever adjustments are required to maintain consistency" (Stalnaker, 1968, p. 102). To evaluate the counterfactual, we then have to consider whether the consequent is true given the resulting set of beliefs. In example (38) this means that the sentence is evaluated against a state of affairs in which the United States is still engaged in the Vietnam conflict. To this state of affairs the counterfactual assumption that China has entered the conflict has to be added. The counterfactual conditional is then true just in case the consequent the United States will use nuclear weapons is entailed by the resulting hypothetical set of beliefs.

Stalnaker formalizes this idea in possible world semantics by defining a selection function f that takes a proposition A and a possible world α (here, the actual world) as arguments and returns another possible world α' = f(A, α) that differs minimally from α. To select this world, we need an ordering relation between worlds with respect to their resemblance to a given world α. The truth conditions of a counterfactual conditional are as follows:

(39) If A were the case, then B is true in α if and only if B is true in α'.

Stalnaker notes that a selection function that delivers a single "closest" world is not adequate, since often there seems to be a whole sphere of closest worlds. This is illustrated by the two counterfactuals in (40) from Lewis (1973, p. 80).

(40) a. If Bizet and Verdi were compatriots, Bizet would be Italian.
b. If Bizet and Verdi were compatriots, Verdi would be French.

Solving this problem, Lewis (1973) proposes his famous possible world semantic analysis of counterfactuals, in which possible worlds are ordered in spheres around the ordering source α. According to this analysis, a counterfactual if A were the case, then B is true if and only if B is true in all closest A-worlds (i.e., the ordinary conditional A → B holds in all closest worlds). Building on this analysis of counterfactual conditionals, Lewis (1973) proposes a reductionist account of causation by defining the causal relation in terms of two counterfactual conditionals that both have to be true for C causes E to be true: if C were the case, E would be the case and if C were not the case, E would not be the case.
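In compact notation, writing □→ for the counterfactual conditional, the two truth conditions and Lewis's counterfactual definition of causal dependence can be stated as follows; this is a standard textbook rendering of the analyses just described, not a quotation from the original works:

```latex
% Stalnaker: truth at the unique closest antecedent world.
\[
  \alpha \models A \mathbin{\Box\!\!\rightarrow} B
  \quad\text{iff}\quad
  f(A, \alpha) \models B
\]
% Lewis: truth at all closest antecedent worlds.
\[
  \alpha \models A \mathbin{\Box\!\!\rightarrow} B
  \quad\text{iff}\quad
  \forall \alpha' \in \min\nolimits_{\leq_{\alpha}} \{\, \alpha'' : \alpha'' \models A \,\} :\;
  \alpha' \models B
\]
% Lewis: causal dependence between distinct actual events C and E.
\[
  C \text{ causes } E
  \quad\text{iff}\quad
  (C \mathbin{\Box\!\!\rightarrow} E) \;\wedge\; (\neg C \mathbin{\Box\!\!\rightarrow} \neg E)
\]
```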

The first conditional is trivially true once we assume that the antecedent is true in the actual world, as it is when C actually occurred, and that the ordering source (i.e., the actual world) is closest to itself (for discussion, see Menzies, 2011). However, a number of counterexamples to Lewis's original theory of counterfactuals have been put forward. The following example by Fine (1975) shows that simple similarity of appearance cannot adequately serve as a basis for the required similarity relation between worlds.

(41) If Nixon had pressed the button, there would have been a nuclear holocaust.

Fine notes that this counterfactual is actually judged true. However, a possible world after a nuclear war is apparently far less similar to the world during the Cold War than a world in which the button did not work properly and nothing happened. Thus, if the similarity relation simply equaled similarity of appearance, Fine's example should come out false in Lewis's analysis.

Lewis reacts to counterexamples such as these by trying to bring the required concept of similarity closer to causal intuitions (Lewis, 1979). However, we think it is fair to say that the similarity relation between worlds remains a vague concept, defined only on an intuitive basis. This has changed with the work of Pearl (2000), who proposes a theory of counterfactuals on the basis of the notion of interventions in a causal model (see Over, Chapter 18 in this volume, and Oaksford & Chater, Chapter 19 in this volume, for comparisons with probabilistic approaches to causality and applications to causal reasoning tasks). For illustration purposes, let us consider the pre-emption example once again (Figure 31.1).

The central idea of Pearl's theory is that "the semantics of counterfactual conditionals relies on a causal notion of entailment" (Schulz, 2011). A counterfactual is true if and only if an intervention on the node in the causal model corresponding to the antecedent entails the truth of the consequent. For instance, if we want to check whether the earlier pre-emption statement concerning the two assassins is true in the causal model shown on the left in Figure 31.1, the semantics works as follows. First, we intervene on the variable representing the action of the first assassin and set it to a value of zero, corresponding to a state of affairs in which assassin 1 did not shoot, as illustrated in the model on the right in Figure 31.1. The intervention cuts this variable off from all the variables on which it depends (here, the fact that assassin 1 has been hired). The rest of the causal Bayesian network, including the background variables representing the facts in the world or the enabling conditions that are not directly causally relevant, simply stays the same. Interventions are thus strictly local.1

Interventionist accounts of counterfactuals provide us with easy solutions to the problems raised earlier for the possible world analyses of Stalnaker (1968) and Lewis (1973). Within an interventionist framework, Fine's example (41) is fully expected to be evaluated as true. This is due to the fact that the underlying causal model plausibly includes a simple causal connection between the propositional node press the button and the node nuclear holocaust; hence, intervening on press the button will result in nuclear holocaust. Pearl's theory has been adopted in linguistics by Schulz (2007), who transfers it back to possible world semantics.
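The intervention procedure just described can be made concrete with a few lines of code. The sketch below encodes the model in Figure 31.1 as structural equations and implements do-interventions by severing a variable from its parents; the variable names and the encoding are our illustration of Pearl's (2000) do-operation, not code from the cited works.

```python
# A minimal structural causal model of the pre-emption scenario in
# Figure 31.1, evaluated with and without a Pearl-style intervention.

def make_model():
    # Structural equations: each variable is a function of its parents.
    return {
        "hired1": lambda v: True,
        "hired2": lambda v: True,
        "shoots1": lambda v: v["hired1"],
        # Assassin 2 is a backup and only shoots if assassin 1 does not.
        "shoots2": lambda v: v["hired2"] and not v["shoots1"],
        "dead": lambda v: v["shoots1"] or v["shoots2"],
    }

ORDER = ["hired1", "hired2", "shoots1", "shoots2", "dead"]

def evaluate(model, do=None):
    """Compute all variables; do-interventions cut a node off from its parents."""
    do = do or {}
    values = {}
    for var in ORDER:
        values[var] = do[var] if var in do else model[var](values)
    return values

actual = evaluate(make_model())
counterfactual = evaluate(make_model(), do={"shoots1": False})

print(actual["dead"])          # True: assassin 1 shoots and the victim dies
print(counterfactual["dead"])  # True: under do(shoots1 = 0) the backup fires,
                               # so "had assassin 1 not shot, the victim would
                               # not be dead" comes out false, as desired
```

Note that the intervention leaves hired1 and all the other equations untouched: exactly the strict locality described in the text.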
Importantly, within these frameworks causality is the basic notion underlying the meaning of counterfactuals, and not vice versa, as the accounts by Stalnaker (1968) and Lewis (1973) might suggest.

However, it has been pointed out by various researchers that the causal model account of counterfactuals can only account for a subset of the counterfactuals and counterfactual interpretations that we actually find in natural language. Examples include not only the non-causal counterfactuals described earlier, but also the examples in (40), back-tracking counterfactuals such as (42) (see Rips, 2010, for an empirical study of this kind of counterfactual, and Hiddleston, 2005, for a theory aimed at dealing with back-tracking counterfactuals), and counterfactuals under epistemic interpretations (see, e.g., Schulz, 2007).

(42) If the targets were dead, would either of the assassins have shot?

Although Pearl's interventionist account of counterfactuals solves a number of problems of its predecessors, it remains unclear how the underlying causal models can be built generatively using the information provided in the linguistic and extralinguistic contexts. As the theory stands, the causal models corresponding to a given scenario have to be constructed on purely intuitive grounds. Promising steps toward the induction of causal models have, however, been made by Goodman et al. (2011) (see also Rottman, Chapter 6 in this volume, for the acquisition of causal structure knowledge).

We cannot do justice to the existing linguistic literature on counterfactuals here, but would like to mention that alternative accounts have been developed in formal semantics, in particular within premise semantics. The interested reader is referred to the work of Kratzer (1989) (but see Kanazawa et al., 2005, for a critical evaluation of Kratzer's semantics). To summarize, we think it is fair to say that counterfactuals are closely related to causality in that causal models are central to providing a semantic analysis of them.

Experimental Findings on Counterfactual Conditionals

Turning to psychological work on the interpretation of conditionals, we note that conditionals have received a great deal of attention in the psychology of reasoning. We cannot do justice to this research here, but refer readers to Johnson-Laird and Byrne (2002) and Byrne (2002) (see also Johnson-Laird & Khemlani, Chapter 10 in this volume). In the following, we will briefly review existing psycholinguistic work on the acquisition and processing of counterfactuals in order to shed light on the cognitive foundations of interpreting counterfactual conditionals.

Not surprisingly, a correct understanding of counterfactuals only emerges relatively late in language development, much later than the first appearance of causal expressions in language, such as causative verbs and causal connectives. Beck et al. (2011) have identified several stages in an ordered series of acquisition steps leading to adult-like counterfactual interpretations. The first step is that children become able to think about alternative possible worlds, but only if these do not contradict what they know to be true. This includes the understanding of statements about the future. At around four years of age, children become able to speculate about things they know to be false, resisting interference from the real world. Only after this stage are children able to relate the counterfactual worlds to the actual state of affairs, establishing the required similarity relation between the counterfactual and the actual situation.

Of course, Beck et al.'s (2011) results could also be described in interventionist terms. These studies suggest that the adult-like understanding of counterfactuals involves as necessary ingredients some of the evaluation steps proposed in the reviewed semantic accounts of counterfactual conditionals.

Only relatively recently have psycholinguists started to investigate the online comprehension of counterfactual conditionals and hypothetical sentences (Claus, 2005). Stewart et al. (2009) found increased reading times when the antecedent of a counterfactual such as had Darren been athletic … is actually true (i.e., following a discourse context stating that he is). This supports the view that during the online interpretation of counterfactual conditionals, the antecedent is automatically interpreted relative to contextually given information in order to come up with a hypothetical counterfactual situation. Once this counterfactual hypothetical situation is constructed (i.e., after the antecedent has been processed), the question arises whether interpreters completely switch to the counterfactual situation or whether they still keep the actual state of affairs active. This question is addressed in a study by Nieuwland and Martin (2012). The authors used highly constraining materials such as Spanish counterparts of if NASA had not developed its Apollo program, the first country to land on the moon would have been Russia. They found no evidence in event-related potentials (see, e.g., Luck, 2014) that real-world knowledge entered into the evaluation of the critical word Russia in their counterfactual condition. By contrast, measurements of eye movements during reading (Ferguson & Sanford, 2008) and in the visual world paradigm (Ferguson et al., 2010), as well as event-related potentials during reading (Ferguson et al., 2008), have provided evidence for intrusion effects of real-world knowledge in less constraining consequent sentences such as (43).

(43) If cats were not carnivores, families would feed their cat a bowl of carrots.

Taken together, readers thus seem to be able to predict consequences directly in the counterfactual world without relating it to the actual situation, but only in highly constraining contexts in which world knowledge encompasses comprehensive causal models.
Other Linguistic Constructions

In the section "Causality in the Verbal Domain," we discussed causal event decompositional analyses of causative verbs according to the scheme [[event1] CAUSE [event2]]. In linguistic analyses, this complex type of event is often referred to as an accomplishment (Vendler, 1957). The analysis thus involves a causal chain of events, and sentences with accomplishment verbs are standardly taken to express a causal relation of the form ∃e1∃e2(e1 CAUSE e2) (see, e.g., Dowty, 1979). The problem with this account, already noted by Dowty himself, is that the second event need not actually occur, so that the existential quantification over e2 is too strong. This is the so-called imperfective paradox arising with imperfectives and progressives (44a). An even stronger case comes from non-culminating accomplishments, as illustrated in the German example (44b).

(44) a. The architect was building a house, but he didn't finish.
b. Der Architekt errichtete ein Haus. Er tat dies zwei Jahre lang, dann gab er auf.
'The architect built a house. He did so for two years, then he gave up on it.'

Due to space limitations, we cannot go into details here, but we would like to point out that recently proposed analyses of the imperfective paradox rely heavily on certain logical properties inherent to causal reasoning. Causal inferences always hold ceteris paribus; that is, they are necessarily defeasible and can be abolished by intervening factors.
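Schematically, and in our own simplified rendering rather than Dowty's exact formulation, the decomposition and its defeasible repair look as follows:

```latex
% Dowty-style decomposition of an accomplishment such as "build a house".
% The existential commitment to the culminating event e2 is what the
% imperfective paradox shows to be too strong:
\[
  \exists e_1 \exists e_2 \,
  \bigl[\, \textsc{do}(x, e_1) \;\wedge\; e_1 \,\textsc{cause}\, e_2 \;\wedge\;
        \textsc{become}(\textit{built}(y), e_2) \,\bigr]
\]
% A ceteris paribus reading replaces the entailment of e2 with a
% defeasible inference (written here with a non-monotonic arrow):
% the causing activity normally yields the culmination, unless a
% disabling condition or opposing force intervenes.
\[
  \textsc{do}(x, e_1) \;\wedge\; \neg\,\textsc{disabled}(e_1)
  \;\mathrel{\mid\!\sim}\;
  \exists e_2 \, (e_1 \,\textsc{cause}\, e_2)
\]
```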

dox (Baggio et al., 2008) and on non-culminating accomplishments (Bott & Hamm, 2014) have provided evidence for processing costs in both kinds of examples, and the authors have attributed these costs to the computation of a minimal causal model and its recom­ putation by adding defeaters. Another construction where causality comes into play at a supralexical level is the resultative construction already mentioned in section “Causality in the Verbal Domain.” Consider also the following examples: (45) a. John sneezed the handkerchief over the table. b. Mary cried herself into hysteria.

Page 33 of 48

Note that neither sneeze nor cry is a causative verb. Nevertheless, both examples in (45) are clearly causal. This raises an issue regarding compositionality: if the causative interpretation cannot be linked to any particular word, what is its source in these examples? It seems that cause and effect are only loosely associated here. We cannot go into further details, but refer the interested reader to Goldberg and Jackendoff (2004) for an analysis in terms of a distinguished construction meaning. We would also like to mention again that resultatives consisting of an activity verb and a state description, such as hammer the metal flat, are only compatible with direct causation (see Bittner, 1999; see also the discussion of direct vs. indirect causation in the section "Causality in the Verbal Domain").

To conclude this section, we must note that the constructions mentioned here in no way constitute an exhaustive list of causal constructions in natural language; much needs to be added. However, the discussion of the English progressive reveals that causal analyses have been proposed for linguistic phenomena like tense and aspect that at first sight may not seem to be related to causality. It is interesting to see that the linguistic representation of time seems to be deeply interwoven with underlying causal models. This could be further substantiated by considering the semantics of the future tenses (e.g., will, be going to) and futurates, that is, sentences that receive a future interpretation without containing a future tense (they are playing tomorrow). In this domain, the hypothesis space has also expanded from temporal and modal analyses of the different future forms to theories that make explicit reference to plans and goals: abstract objects that can only be evaluated relative to presupposed causal models specifying what should count as a normal sequence of events (cf. Copley, 2008; Hamm & van Lambalgen, 2005).

Conclusions

We have reviewed linguistic and psycholinguistic work on a selection of causal expressions in natural language. The first class were causative verbs such as open or kill, which are commonly analyzed with a decompositional structure involving a CAUSE predicate in their underlying semantic representation. Linguistic analyses highlight the importance of several dimensions constraining what can be expressed lexically and what requires more complex, analytic forms or causal expressions at the discourse level. First, we have seen that not all aspects of the decompositional analysis must be specified in the lexical entry of a given verb. Break, for instance, leaves open what kind of action was carried out to cause the resulting state. Thus, linguistic analysis demonstrates a fair amount of underspecification. Second, we have seen that the linguistic system is sensitive to external (break) versus intrinsic (bloom) causation and uses argument structure as a guide to this distinction. In general, we find interesting correspondences between causation and syntactic structure. Another dimension with clear reflexes in syntactic structure is direct versus indirect causation, mediating between causality that is lexically encoded in verbs and causality that has to be encoded structurally. Finally, we have shown that intentionality, or rather agentivity, is an important factor in expressing causality and that causative verbs often carry selectional restrictions of animacy, but can also be restricted to natural forces.

The lexical semantics of causative verbs thus seems to closely mirror what has been observed in work in cognitive psychology on the acquisition of causality, both in the physical and in the social domain (see, e.g., Saxe & Carey, 2006, and Muentener & Bonawitz, Chapter 33 in this volume). Due to space limitations, we restricted the discussion to causative verbs. However, we would like to emphasize the existence of a rather flexible division of labor between causality expressed by verbs and causality expressed by other means, for instance, prepositional phrases, as witnessed by the example involving a by phrase in (5).

We also reviewed work on the expression of causality in discourse. Again, there exist a number of linguistic means, providing a rich inventory to express various aspects of causality. Causal discourse connectives are expressions devoted to expressing causal relations between sentences. We have seen that the connective because can specify explanations of rather different types. The reviewed work revealed remarkable differences between causal connectives such as since and because, which provides evidence for a rich system of different explanation types to be found in natural language. Furthermore, the discussion of explicitly and implicitly marked discourse relations revealed that in natural language discourse, causality is often not overtly expressed, but instead has to be inferred by the comprehender. Psycholinguistic work on text processing demonstrates that these causal inferences are automatically generated during online interpretation, yet also sometimes depend on overt marking of discourse relations. The section ended with the open question of whether causality expressed within the sentence interacts with causality at the discourse level (i.e., between sentences). If so, this raises the further questions of which form this interaction takes and whether there are differences between the two.

The following section introduced the phenomenon of implicit causality. We used implicit causality to demonstrate the intricate interactions between causality within the sentence, on the one hand, and the larger discourse context, on the other. Based on a rich body of empirical work, we sketched an analysis of the phenomenon that takes implicit causality to be a genuine discourse phenomenon with clear interactions with the lexical semantics of verbs. Even though it is driven by the lexical semantics of verbs, the complex interplay with the semantics of the connective and the compositional interpretation of other sentence ingredients is required to fully understand discourse biases such as implicit causality. What is more, we were able to explain why some of the causal inferences observed in language processing can happen so extremely fast and effortlessly. We have argued that, besides the anticipation of causal content driven by lexical knowledge, there is still a major role to be played by inferences relying on world or domain knowledge. The linguistic information provided in natural language discourse is on its own almost never fully sufficient, but requires pragmatic enrichment of all sorts, among them inferences of discourse relations and co-reference. Often the required inferences can only be computed after two discourse units have been fully processed.
In our view, it is thus not very surprising to find substantial variation in the timing of the establishing of causal connections in discourse processing. Page 35 of 48

We finished our discussion by looking at various constructions whose proper linguistic analysis crucially involves the concept of causality and requires reference to underlying causal models. Most prominent are (counterfactual) conditionals, which have figured as the prototypical construction for the expression of causal relations between sentences (i.e., causal connections between propositions). Linguistic analyses reveal that even though there is a close connection, the two cannot simply be equated, since not every counterfactual statement is causal and vice versa. Besides conditionals, the semantics of tense and aspect seems to be closely related to causality as well.

We would like to close by pointing to a number of remaining questions to which we do not have a definite answer, but the joint study of which would benefit language science and cognitive science alike.

First, it may be noted that it is debatable to what extent the presented phenomena involve processes of causal reasoning in the narrow sense of inferring the presence of a causal relation. After all, if the verbs and connectives themselves encode causality, no reasoning is involved and no explanation needs to be inferred abductively (see, e.g., Lombrozo & Vasilyeva, Chapter 22 in this volume, on the concept of causal explanation). However, it may be argued that more fine-grained aspects of causal reasoning are involved: Was the causing event performed with the intention of causing another event to occur? Were there any intermediary steps involved in the (p. 640) causal chain not explicitly provided in the linguistic expression? Causal reasoning in the narrow sense, we believe, is crucial when morphology and syntax do not provide exact cues as to which linguistic units should be interpreted as standing in a causal relation.

Second, if we assume all expressions of causation to involve the abstract predicate CAUSE at some level of representation, it is an interesting question to what extent we need to differentiate between various CAUSE predicates, or whether we are dealing with the same basic relation in all of these cases. Our attitude is that we should try to stick to one unified notion (for a different view, see Wolff et al., 2005). Thus, the causal relation assumed in semantics must be underspecified in order to allow for the different kinds of ontological categories with which it occurs (states, events, facts, objects, abstract objects).

Finally, as discussed by Copley and Wolff (2014), models of causality and linguistic analyses of expressions of causation could, in our opinion, gain much from one another. Language offers a rich and varied picture of causal relations, but their analysis can only be made precise if they are related to precise models of causality. On the other hand, as we have tried to argue, expressions of causality are highly informative with regard to the conceptual distinctions relevant for causal reasoning.
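To make the decompositional format at issue concrete, a schematic entry for causative break in the general style of classic decompositional analyses (e.g., Dowty, 1979) can be sketched as follows; the notation is illustrative only and is not a formalism proposed in this chapter:

```latex
% Schematic lexical decomposition of causative "break" (illustrative sketch
% in the spirit of Dowty, 1979): an unspecified activity of x causes y to
% come to be in the broken state.
[\![\mathrm{break}_{\mathrm{caus}}]\!](x, y) \;\approx\;
    \big[\, [x\ \mathrm{ACT}] \ \mathrm{CAUSE}\ [\mathrm{BECOME}\ [\mathrm{broken}(y)]] \,\big]
```

Because the ACT subevent carries no manner specification, such an entry is equally compatible with breaking by hitting, dropping, or kicking, which is the underspecification noted above; an internally caused verb such as bloom would, on analyses of this kind, lack the external ACT-CAUSE layer altogether.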



Acknowledgments
We would like to thank the editor, Michael Waldmann, as well as Fritz Hamm, Keith Stenning, and an anonymous reviewer for insightful and helpful comments on an earlier version of this chapter. Furthermore, we are grateful to Hans Kamp and Antje Roßdeutscher for discussing these issues with us on a number of occasions, and last, but not least, to Bridget Copley and the audience at the CNRS in Paris for discussion following an invited talk there. Needless to say, we alone are responsible for any mistakes or shortcomings in this chapter. This research was made possible by grants from the Deutsche Forschungsgemeinschaft to project B1 of Collaborative Research Centre (SFB) 833, "The Construction of Meaning: The Dynamics and Adaptivity of Linguistic Structures", and the project "Composition in Context" as part of the Priority Programme 1727 "Xprag.de," both at the University of Tübingen, as well as to the projects B4 and D1 of SFB 732, "Incremental Specification in Context," at the University of Stuttgart. We also gratefully acknowledge the financial support from the Centre for Advanced Study (CAS) in Oslo, the Research Council of Norway (NFR project IS-DAAD 216850), the German Academic Exchange Service (DAAD), and the German Federal Ministry of Education and Research (BMBF; Grant No. 01UG1411).

References
Altmann, G., & Steedman, M. (1988). Interaction with context during human sentence processing. Cognition, 30(3), 191–238.
Anscombe, G. E. M. (1963). Intention (2nd ed.). Oxford: Blackwell.
Asher, N., & Lascarides, A. (2005). Logics of conversation. Cambridge: Cambridge University Press.
Asr, F. T., & Demberg, V. (2012). Implicitness of discourse relations. In Proceedings of COLING 2012, pp. 2669–2684.
Au, T. K.-F. (1986). A verb is worth a thousand words: The causes and consequences of interpersonal events implicit in language. Journal of Memory and Language, 25(1), 104–122.
Austin, J. L. (1961). Ifs and cans. In J. L. Austin (Ed.), Philosophical papers (pp. 153–180). Oxford: Oxford University Press.
Baggio, G., van Lambalgen, M., & Hagoort, P. (2008). Computing and recomputing discourse models: An ERP study. Journal of Memory and Language, 59(1), 36–53.
Beavers, J., & Koontz-Garboden, A. (2012). Manner and result in the roots of verbal meaning. Linguistic Inquiry, 43(3), 331–369.


Beck, S. R., Riggs, K. J., & Burns, P. (2011). Multiple developments in counterfactual thinking. In C. Hoerl, T. McCormack, & S. R. Beck (Eds.), Understanding counterfactuals, understanding causation (pp. 110–122). Oxford: Oxford University Press.
Bennett, J. (1994). The "namely" analysis of the "by"-locution. Linguistics and Philosophy, 17(1), 29–51.
Bittner, M. (1999). Concealed causatives. Natural Language Semantics, 7(1), 1–78.
Blakemore, D. (2002). Relevance and linguistic meaning: The semantics and pragmatics of discourse markers. Cambridge: Cambridge University Press.
Bloom, L., Lahey, M., Hood, L., Lifter, K., & Fiess, K. (1980). Complex sentences: Acquisition of syntactic connectives and the semantic relations they encode. Journal of Child Language, 7(2), 235–261.
Blutner, R. (1998). Lexical pragmatics. Journal of Semantics, 15(2), 115–162.
Bott, O., & Hamm, F. (2014). Cross-linguistic variation in the processing of aspect. In B. Hemforth, B. Schmiedtová, & C. Fabricius-Hansen (Eds.), Psycholinguistic approaches to meaning and understanding across languages (pp. 83–109). Studies in Theoretical Psycholinguistics 44. New York: Springer.
Bott, O., & Solstad, T. (2014). From verbs to discourse: A novel account of implicit causality. In B. Hemforth, B. Mertins, & C. Fabricius-Hansen (Eds.), Psycholinguistic approaches to meaning and understanding across languages (pp. 213–251). Studies in Theoretical Psycholinguistics 44. New York: Springer. (p. 641)
Brown, R., & Fish, D. (1983). The psychological causality implicit in language. Cognition, 14(3), 237–273.
Byrne, R. M. (2002). Mental models and counterfactual thoughts about what might have been. Trends in Cognitive Sciences, 6(10), 426–431.
Caramazza, A., Grober, E., Garvey, C., & Yates, J. (1977). Comprehension of anaphoric pronouns. Journal of Verbal Learning and Verbal Behavior, 16(5), 601–609.
Chierchia, G. (2004). A semantics for unaccusatives and its syntactic consequences. In A. Alexiadou, E. Anagnostopoulou, & M. Everaert (Eds.), The unaccusativity puzzle: Explorations of the syntax-lexicon interface (pp. 22–59). Oxford: Oxford University Press.
Claus, B. (2005). Hypothetische Situationen beim Textverstehen: Mentale Repräsentation beschriebener Wunschwelten. PhD thesis, TU Berlin, Berlin.
Copley, B. (2008). The plan's the thing: Deconstructing futurate meanings. Linguistic Inquiry, 39(2), 261–274.
Copley, B., & Harley, H. (2015). A force-theoretic framework for event structure. Linguistics and Philosophy, 38(2), 103–158.

Copley, B., & Wolff, P. (2014). Theories of causation should inform linguistic theory and vice versa. In B. Copley & F. Martin (Eds.), Causation in grammatical structures (pp. 1–57). Oxford: Oxford University Press.
Corrigan, R. (2001). Implicit causality in language: Event participants and their interactions. Journal of Language and Social Psychology, 20(3), 285–320.
Cozijn, R., Commandeur, E., Vonk, W., & Noordman, L. G. (2011). The time course of the use of implicit causality information in the processing of pronouns: A visual world paradigm study. Journal of Memory and Language, 64(4), 381–403.
Crinean, M., & Garnham, A. (2006). Implicit causality, implicit consequentiality and semantic roles. Language and Cognitive Processes, 21(5), 636–648.
Croft, W. (2012). Verbs: Aspect and causal structure. Oxford: Oxford University Press.
Davidson, D. (2001). Actions, reasons, and causes. In D. Davidson (Ed.), Essays on actions and events (pp. 3–19). Oxford: Oxford University Press.
de Almeida, R. G. (1999). The representation of lexical concepts: A psycholinguistic inquiry. PhD thesis, Rutgers University, New Brunswick, NJ.
de Blijzer, F., & Noordman, L. G. (2000). On the processing of causal relations. In E. Couper-Kuhlen & B. Kortmann (Eds.), Cause, condition, concession, contrast: Cognitive and discourse perspectives (pp. 35–56). Berlin: Walter de Gruyter.
Dixon, R. M. W. (2000). A typology of causatives: Form, syntax and meaning. In R. M. W. Dixon & A. Y. Aikhenvald (Eds.), Changing valency: Case studies in transitivity (pp. 30–83). Cambridge: Cambridge University Press.
Dowty, D. (1979). Word meaning and Montague grammar. Dordrecht: Reidel.
Ehrlich, K. (1980). Comprehension of pronouns. The Quarterly Journal of Experimental Psychology, 32(2), 247–255.
Evers-Vermeul, J. (2005). Connections between form and function of Dutch connectives: Change and acquisition as windows on form-function relations. PhD thesis, UiL OTS, Universiteit Utrecht.
Evers-Vermeul, J., & Sanders, T. (2009). The emergence of Dutch connectives: How cumulative cognitive complexity explains the order of acquisition. Journal of Child Language, 36(4), 829–854.
Featherstone, C. R., & Sturt, P. (2010). Because there was a cause for concern: An investigation into a word-specific prediction account of the implicit-causality effect. The Quarterly Journal of Experimental Psychology, 63(1), 3–15.
Ferguson, H. J., & Sanford, A. J. (2008). Anomalies in real and counterfactual worlds: An eye-movement investigation. Journal of Memory and Language, 58(3), 609–626.

Ferguson, H. J., Sanford, A. J., & Leuthold, H. (2008). Eye-movements and ERPs reveal the time course of processing negation and remitting counterfactual worlds. Brain Research, 1236, 113–125.
Ferguson, H. J., Scheepers, C., & Sanford, A. J. (2010). Expectations in counterfactual and theory of mind reasoning. Language and Cognitive Processes, 25(3), 297–346.
Ferstl, E. C., Garnham, A., & Manouilidou, C. (2011). Implicit causality bias in English: A corpus of 300 verbs. Behavior Research Methods, 43(1), 124–135.
Fine, K. (1975). Critical notice. Mind, 84(1), 451–458.
Fletcher, C. R., & Bloom, C. P. (1988). Causal reasoning in the comprehension of simple narrative texts. Journal of Memory and Language, 27(3), 235–244.
Fodor, J. A. (1970). Three reasons for not deriving "kill" from "cause to die." Linguistic Inquiry, 1(4), 429–438.
Fodor, J. A., & Lepore, E. (2007). The emptiness of the lexicon: Reflections on James Pustejovsky's The Generative Lexicon. Linguistic Inquiry, 29(2), 269–288.
Garnham, A. (2001). Mental models and the interpretation of anaphora. Hove, UK: Psychology Press.
Garnham, A., Traxler, M., Oakhill, J., & Gernsbacher, M. A. (1996). The locus of implicit causality effects in comprehension. Journal of Memory and Language, 35(4), 517–543.
Garvey, C., & Caramazza, A. (1974). Implicit causality in verbs. Linguistic Inquiry, 5(3), 459–464.
Garvey, C., Caramazza, A., & Yates, J. (1974). Factors influencing assignment of pronoun antecedents. Cognition, 3(3), 227–243.
Gennari, S., & Poeppel, D. (2003). Processing correlates of lexical semantic complexity. Cognition, 89(1), B27–B41.
Goikoetxea, E., Pascual, G., & Acha, J. (2008). Normative study of the implicit causality of 100 interpersonal verbs in Spanish. Behavior Research Methods, 40(3), 760–772.
Goldberg, A. E. (1995). Constructions: A construction grammar approach to argument structure. Chicago: University of Chicago Press.
Goldberg, A. E. (2010). Verbs, constructions, and semantic frames. In M. Rappaport Hovav, E. Doron, & I. Sichel (Eds.), Lexical semantics, syntax, and event structure (pp. 39–58). Oxford: Oxford University Press.
Goldberg, A. E., & Jackendoff, R. (2004). The English resultative as a family of constructions. Language, 80(3), 532–568.


Goodman, N. D., Ullman, T. D., & Tenenbaum, J. B. (2011). Learning a theory of causality. Psychological Review, 118(1), 110.
Guerry, M., Gimenes, M., Caplan, D., & Rigalleau, F. (2006). How long does it take to find a cause? An online investigation of implicit causality in sentence production. The Quarterly Journal of Experimental Psychology, 59(9), 1535–1555.
Haberlandt, K., & Bingham, G. (1978). Verbs contribute to the coherence of brief narratives: Reading related and unrelated sentence triples. Journal of Verbal Learning and Verbal Behavior, 17(4), 419–425.
Halpern, J. Y., & Pearl, J. (2005). Causes and explanations: A structural-model approach. Part II: Explanations. The British Journal for the Philosophy of Science, 56(4), 889–911. (p. 642)

Hamm, F., & van Lambalgen, M. (2005). The proper treatment of events. Malden, MA: Blackwell.
Hartshorne, J. K. (2014). What is implicit causality? Language and Cognitive Processes, 29(7), 761–898.
Hartshorne, J. K., & Snedeker, J. (2013). Verb argument structure predicts implicit causality: The advantages of finer-grained semantics. Language and Cognitive Processes, 28(10), 1474–1508.
Hartshorne, J. K., Sudo, Y., & Uruwashi, M. (2013). Are implicit causality pronoun resolution biases consistent across languages and cultures? Experimental Psychology, 60(3), 179–196.
Haspelmath, M. (1993). More on the typology of inchoative/causative verb alternations. In B. Comrie & M. Polinsky (Eds.), Causatives and transitivity, Vol. 23 of Studies in Language Companion Series (pp. 87–120). Amsterdam: John Benjamins.
Hiddleston, E. (2005). A causal theory of counterfactuals. Noûs, 39(4), 632–657.
Hobbs, J. R. (1979). Coherence and coreference. Cognitive Science, 3(1), 67–90.
Hoerl, C., McCormack, T., & Beck, S. R. (Eds.) (2011). Understanding counterfactuals, understanding causation. Oxford: Oxford University Press.
Hume, D. (1748/2007). An enquiry concerning human understanding. Cambridge: Cambridge University Press.
Johnson-Laird, P. N., & Byrne, R. M. (2002). Conditionals: A theory of meaning, pragmatics, and inference. Psychological Review, 109(4), 646.
Kamide, Y. (2008). Anticipatory processes in sentence processing. Language and Linguistics Compass, 2(4), 647–670.

Kanazawa, M., Kaufmann, S., & Peters, S. (2005). On the lumping semantics of counterfactuals. Journal of Semantics, 22(2), 129–151.
Keenan, J. M., Baillet, S. D., & Brown, P. (1984). The effects of causal cohesion on comprehension and memory. Journal of Verbal Learning and Verbal Behavior, 23(2), 115–126.
Kehler, A. (2002). Coherence, reference, and the theory of grammar. Stanford, CA: CSLI Publications.
Kehler, A., Kertz, L., Rohde, H., & Elman, J. L. (2008). Coherence and coreference revisited. Journal of Semantics, 25(1), 1–44.
Kim, J. (1973). Causes and counterfactuals. The Journal of Philosophy, 70(17), 570–572.
Kipper-Schuler, K. (2006). VerbNet: A broad-coverage, comprehensive verb lexicon. PhD thesis, University of Pennsylvania.
Koornneef, A., & Sanders, T. (2013). Establishing coherence relations in discourse: The influence of implicit causality and connectives on pronoun resolution. Language and Cognitive Processes, 28(8), 1169–1206.
Koornneef, A. W., & van Berkum, J. J. A. (2006). On the use of verb-based implicit causality in sentence comprehension: Evidence from self-paced reading and eye tracking. Journal of Memory and Language, 54(4), 445–465.
Kratzer, A. (1989). An investigation of the lumps of thought. Linguistics and Philosophy, 12(5), 607–653.
Kratzer, A. (2005). Building resultatives. In C. Maienborn & A. Wöllstein-Leisten (Eds.), Events in syntax, semantics, and discourse (pp. 177–212). Tübingen: Niemeyer.
Kuperberg, G. R., Paczynski, M., & Ditman, T. (2011). Establishing causal coherence across sentences: An ERP study. Journal of Cognitive Neuroscience, 23(5), 1230–1246.
Levin, B., & Hovav, M. R. (1995). Unaccusativity: At the syntax-lexical semantics interface. Cambridge, MA: MIT Press.
Lewis, D. (1973). Counterfactuals. Cambridge, MA: Harvard University Press.
Lewis, D. (1979). Counterfactual dependence and time's arrow. Noûs, 13(4), 418–446.
Luck, S. J. (2014). An introduction to the Event-Related Potential Technique. Cambridge, MA: MIT Press.
Magliano, J. P., Baggett, W. B., Johnson, B. K., & Graesser, A. C. (1993). The time course of generating causal antecedent and causal consequence inferences. Discourse Processes, 16(1–2), 35–53.


Malle, B. F. (1999). How people explain behavior: A new theoretical framework. Personality and Social Psychology Review, 3(1), 23–48.
Malle, B. F. (2002). Verbs of interpersonal causality and the folk theory of mind and behavior. In M. Shibatani (Ed.), The grammar of causation and interpersonal manipulation (pp. 57–83). Amsterdam; Philadelphia: John Benjamins.
Mandel, D. R., & Lehman, D. R. (1996). Counterfactual thinking and ascriptions of cause and preventability. Journal of Personality and Social Psychology, 71(3), 450.
Mann, W. C., & Thompson, S. A. (1986). Relational propositions in discourse. Discourse Processes, 9(1), 57–90.
Martin, F., & Schäfer, F. (2014). Causation at the syntax-semantics interface. In B. Copley & F. Martin (Eds.), Causation in grammatical structures (pp. 209–244). Oxford: Oxford University Press.
McCawley, J. D. (1978). Conversational implicature and the lexicon. In P. Cole (Ed.), Pragmatics, Vol. 9 of Syntax and semantics (pp. 245–259). New York: Academic Press.
McKoon, G., & Love, J. (2011). Verbs in the lexicon: Why is hitting easier than breaking? Language and Cognition, 3(2), 313–330.
McKoon, G., & MacFarland, T. (2000). Externally and internally caused change of state verbs. Language, 76(4), 833–858.
Menzies, P. (2011). The role of counterfactual dependence in causal judgements. In C. Hoerl, T. McCormack, & S. R. Beck (Eds.), Understanding counterfactuals, understanding causation (pp. 186–207). Oxford: Oxford University Press.
Menzies, P. (2014). Counterfactual theories of causation. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy. Stanford, CA: CSLI.
Millis, K. K., Golding, J. M., & Barker, G. (1995). Causal connectives increase inference generation. Discourse Processes, 20(1), 29–49.
Mobayyen, F., & de Almeida, R. G. (2005). The influence of semantic and morphological complexity of verbs on sentence recall: Implications for the nature of conceptual representation and category-specific deficits. Brain and Cognition, 57(2), 168–171.
Mohamed, M. T., & Clifton, C. J. (2007). Processing inferential causal statements: Theoretical refinements and the role of verb type. Discourse Processes, 45(1), 24–51.
Neeleman, A., & van de Koot, H. (2012). The linguistic expression of causation. In M. Everaert, M. Marelj, & T. Siloni (Eds.), The theta system (pp. 20–51). Oxford: Oxford University Press.


Nieuwland, M. S., & Martin, A. E. (2012). If the real world were irrelevant, so to speak: The role of propositional truth-value in counterfactual sentence comprehension. Cognition, 122(1), 102–109.
Noordman, L. G., Vonk, W., Cozijn, R., & Frank, S. (2015). Causal inferences and world knowledge. In E. O'Brien, A. Cook, & R. Lorch (Eds.), Inferences during reading (pp. 260–289). Cambridge: Cambridge University Press. (p. 643)
Pander Maat, H., & Sanders, T. (2000). Domains of use or subjectivity? The distribution of three Dutch causal connectives explained. In E. Couper-Kuhlen & B. Kortmann (Eds.), Cause, condition, concession, contrast: Cognitive and discourse perspectives (pp. 57–82). Berlin; New York: Mouton de Gruyter.
Parsons, T. (1990). Events in the semantics of English: A study in subatomic semantics. Cambridge, MA: MIT Press.
Pearl, J. (2000). Causality: Models, reasoning and inference. New York: Cambridge University Press.
Pickering, M., & Majid, A. (2007). What are implicit causality and implicit consequentiality? Language and Cognitive Processes, 22(5), 780–788.
Pinker, S. (1989). Learnability and cognition: The acquisition of argument structure. Cambridge, MA: MIT Press.
Pustejovsky, J. (1995). The generative lexicon. Cambridge, MA: MIT Press.
Pyykkönen, P., & Järvikivi, J. (2010). Activation and persistence of implicit causality information in spoken language comprehension. Experimental Psychology, 57(1), 5–16.
Rappaport Hovav, M. (2014). Lexical content and context: The causative alternation in English revisited. Lingua, 141, 8–29.
Rappaport Hovav, M., & Levin, B. (2010). Reflections on manner/result complementarity. In M. Rappaport Hovav, E. Doron, & L. Sichel (Eds.), Lexical semantics, syntax, and event structure (pp. 21–38). Oxford: Oxford University Press.
Rips, L. J. (2010). Two causal theories of counterfactual conditionals. Cognitive Science, 34(2), 175–221.
Rudolph, U. (1997). Implicit verb causality: Verbal schemas and covariation information. Journal of Language and Social Psychology, 16(2), 132–158.
Rudolph, U., & Försterling, F. (1997). The psychological causality implicit in verbs: A review. Psychological Bulletin, 121(2), 192–218.
Sanders, T. J. (2005). Coherence, causality and cognitive complexity in discourse. In Proceedings/Actes SEM-05: First international symposium on the exploration and modelling of meaning (pp. 105–114).

Sanders, T. J., & Noordman, L. G. (2000). The role of coherence relations and their linguistic markers in text processing. Discourse Processes, 29(1), 37–60.
Saxe, R., & Carey, S. (2006). The perception of causality in infancy. Acta Psychologica, 123(1–2), 144–165.
Schäfer, F. (2009). The causative alternation. Language and Linguistics Compass, 3(2), 641–681.
Schulz, K. (2007). Minimal models in semantics and pragmatics: Free choice, exhaustivity, and conditionals. PhD thesis, ILLC Dissertation Series.
Schulz, K. (2011). If you'd wiggled a, then b would've changed. Synthese, 179(2), 239–251.
Shibatani, M. (1976). The grammar of causative constructions: A conspectus. In M. Shibatani (Ed.), The grammar of causative constructions, Vol. 6 of Syntax and semantics (pp. 1–40). New York: Academic Press.
Shibatani, M. (2002). Introduction: Some basic issues in the grammar of causation. In M. Shibatani (Ed.), The grammar of causation and interpersonal manipulation (pp. 1–22). Amsterdam: John Benjamins.
Shibatani, M., & Pardeshi, P. (2002). The causative continuum. In M. Shibatani (Ed.), The grammar of causation and interpersonal manipulation (pp. 85–126). Amsterdam: John Benjamins.
Solstad, T. (2007). Lexical pragmatics and unification: The semantics of German causal "durch" ("through"). Research on Language and Computation, 5(4), 481–502.
Solstad, T. (2009). On the implicitness of arguments in event passives. In A. Schardl, M. Walkow, & M. Abdurrahman (Eds.), Proceedings of the 38th annual meeting of the North East Linguistic Society (Vol. 2, pp. 365–374). Amherst: GLSA Publications.
Solstad, T. (2010). Some new observations on "because (of)." In M. Aloni, H. Bastiaanse, T. de Jager, & K. Schulz (Eds.), Amsterdam Colloquium 2009, Lecture Notes in Computer Science (pp. 436–445). Berlin; Heidelberg: Springer.
Solstad, T. (2016). Lexikalische Semantik im Kontext: Die Spezifikation kausaler Relationen am Beispiel von "durch." Tübingen: Stauffenburg.
Solstad, T., & Bott, O. (2013). Towards a formal theory of explanatory biases in discourse. In M. Aloni, M. Franke, & F. Roelofsen (Eds.), Proceedings of the Amsterdam Colloquium 2013 (pp. 203–210). Amsterdam: ILLC, University of Amsterdam.
Spooren, W. P., & Sanders, T. J. (2008). The acquisition order of coherence relations: On cognitive complexity in discourse. Journal of Pragmatics, 40(12), 2003–2026.


Stalnaker, R. (1968). A theory of conditionals. American Philosophical Quarterly, 2, 98–112.
Stalnaker, R., & Thomason, R. (1970). A semantic analysis of conditional logic. Linguistic Inquiry, 4(2), 23–42.
Stevenson, R., Knott, A., Oberlander, J., & McDonald, S. (2000). Interpreting pronouns and connectives: Interactions among focusing, thematic roles and coherence relations. Language and Cognitive Processes, 15(3), 225–262.
Stevenson, R. J., Crawley, R. A., & Kleinman, D. (1994). Thematic roles, focus and the representation of events. Language and Cognitive Processes, 9(4), 519–548.
Stewart, A. J., Haigh, M., & Kidd, E. (2009). An investigation into the online processing of counterfactual and indicative conditionals. The Quarterly Journal of Experimental Psychology, 62(11), 2113–2125.
Stewart, A. J., Pickering, M. J., & Sanford, A. J. (2000). The time course of the influence of implicit causality information: Focusing versus integration accounts. Journal of Memory and Language, 42(3), 423–443.
Sweetser, E. (1990). From etymology to pragmatics: The mind-body metaphor in semantic structure and semantic change. Cambridge: Cambridge University Press.
Talmy, L. (1985). Lexicalization patterns: Semantic structure in lexical forms. In T. Shopen (Ed.), Language typology and syntactic description III: Grammatical categories and the lexicon (pp. 57–149). Cambridge: Cambridge University Press.
Talmy, L. (2000). Towards a cognitive semantics, Vol. 1: Concept structuring systems. Cambridge, MA: MIT Press.
Trabasso, T., & van den Broek, P. (1985). Causal thinking and the representation of narrative events. Journal of Memory and Language, 24(5), 612–630.
Trabasso, T., van den Broek, P., & Suh, S. Y. (1989). Logical necessity and transitivity of causal relations in stories. Discourse Processes, 12(1), 1–25.
Traxler, M. J., Bybee, M. D., & Pickering, M. J. (1997). Influence of connectives on language comprehension: Eye tracking evidence for incremental interpretation. The Quarterly Journal of Experimental Psychology: Section A, 50(3), 481–497.
van Berkum, J. J. A., Koornneef, A. W., Otten, M., & Nieuwland, M. S. (2007). Establishing reference in language comprehension: An electrophysiological perspective. Brain Research, 1146, 158–171. (p. 644)
van den Broek, P. (1990). The causal inference maker: Towards a process model of inference generation in text comprehension. In D. A. Balota, G. B. Flores d'Arcais, & K. Rayner (Eds.), Comprehension processes in reading (pp. 423–445). Hillsdale, NJ: Lawrence Erlbaum Associates.
van den Broek, P., & Gustafson, M. (1999). Comprehension and memory for texts: Three generations of reading research. In S. R. Goldman, A. C. Graesser, & P. van den Broek (Eds.), Narrative comprehension, causality, and coherence: Essays in honor of Tom Trabasso (pp. 15–34). Mahwah, NJ: Lawrence Erlbaum Associates.
van der Sandt, R. A. (1992). Presupposition projection as anaphora resolution. Journal of Semantics, 9(4), 333–377.
Vendler, Z. (1957). Verbs and times. The Philosophical Review, 66(2), 143–160.
von Stechow, A. (1996). The different readings of wieder 'again': A structural account. Journal of Semantics, 13(2), 87–138.
Warglien, M., Gärdenfors, P., & Westera, M. (2012). Event structure, conceptual spaces and the semantics of verbs. Theoretical Linguistics, 38(3–4), 159–193.
Wolff, P. (2003). Direct causation in the linguistic coding and individuation of causal events. Cognition, 88(1), 1–48.
Wolff, P. (2007). Representing causation. Journal of Experimental Psychology: General, 136(1), 82–111.
Wolff, P., Jeon, G.-H., Klettke, B., & Yu, L. (2010). Force creation and possible causers across languages. In B. C. Malt & P. Wolff (Eds.), Words and the mind: How words capture human experience (pp. 93–110). Oxford: Oxford University Press.
Wolff, P., Klettke, B., Ventura, T., & Song, G. (2005). Expressing causation in English and other languages. In W.-K. Ahn, R. L. Goldstone, B. C. Love, A. B. Markman, & P. Wolff (Eds.), Categorization inside and outside of the lab: Festschrift in honor of Douglas L. Medin (pp. 29–48). Washington, DC: APA. http://psycnet.apa.org/psycinfo/2005-03411-003
Wolff, P., & Song, G. (2003). Models of causation and the semantics of causal verbs. Cognitive Psychology, 47(3), 276–332.
Wunderlich, D. (1997). Cause and the structure of verbs. Linguistic Inquiry, 28(1), 27–68.

Notes:
(1.) The interventionist treatment of the pre-emption example just described has to be amended by some mechanism of temporal interpretation. The reason for this is that intuitions regarding the truth of the counterfactual are quite different depending on whether we consider a situation immediately after the pre-empted shot of assassin 1 or a situation with some time between, sufficiently long for assassin 2 to act (see also Pearl, 2000, p. 325 ff.). We are grateful to Fritz Hamm (personal communication) for pointing this out to us.
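To illustrate how such an interventionist treatment is standardly set up, the following is a minimal structural-model sketch of the two-assassin pre-emption case (the variable names are illustrative, not taken from the chapter; the model is deliberately atemporal, which is precisely the limitation the note points out):

```python
# A minimal structural-equation sketch of two-assassin pre-emption
# (illustrative only; names and encoding are hypothetical).

def solve(intent1, intent2, do=None):
    """Evaluate the structural equations, optionally forcing variables via do()."""
    do = do or {}
    v = {}
    # Assassin 1 shoots if so inclined.
    v["shot1"] = do.get("shot1", intent1)
    # Assassin 2 is the back-up: he shoots only if assassin 1 did not.
    v["shot2"] = do.get("shot2", intent2 and not v["shot1"])
    # The victim dies if either assassin shoots.
    v["dead"] = do.get("dead", v["shot1"] or v["shot2"])
    return v

actual = solve(intent1=True, intent2=True)
assert actual["dead"] and not actual["shot2"]    # assassin 2 is pre-empted

# Interventionist counterfactual: had assassin 1 not shot, the victim would
# still have died -- but only on the reading that leaves assassin 2 enough
# time to act, which this timeless model silently presupposes.
counterfactual = solve(intent1=True, intent2=True, do={"shot1": False})
assert counterfactual["dead"] and counterfactual["shot2"]
```

A temporally refined model would index the variables to times, yielding different verdicts for the two situations distinguished in the note.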

Torgrim Solstad

Centre for General Linguistics (ZAS), Berlin, Germany

Oliver Bott

University of Tübingen, Tübingen, Germany


Social Attribution and Explanation

Social Attribution and Explanation   Denis Hilton The Oxford Handbook of Causal Reasoning Edited by Michael R. Waldmann Print Publication Date: Jun 2017 Subject: Psychology, Cognitive Psychology Online Publication Date: May 2017 DOI: 10.1093/oxfordhb/9780199399550.013.33

Abstract and Keywords
Attribution processes appear to be an integral part of human visual perception, as low-level inferences of causality and intentionality appear to be automatic and are supported by specific brain systems. However, higher-order attribution processes use information held in memory or made present at the time of judgment. While attribution processes about social objects are sometimes biased, there is scope for partial correction. This chapter reviews work on the generation, communication, and interpretation of complex explanations, with reference to explanation-based models of text understanding that result in situation models of narratives. It distinguishes between causal connection and causal selection, and suggests that a factor will be discounted if it is not perceived to be connected to the event and backgrounded if it is perceived to be causally connected to that event, but is not selected as relevant to an explanation. The final section focuses on how interpersonal explanation processes constrain causal selection.

Keywords: attribution, causality, intentionality, memory, communication

Social perception is pervasive in social animals. Charles Darwin told a touching story about his dog (in his view also an intelligent and social animal) that illustrates the point:

I once noticed my dog, a full-grown and very sensible1 animal, was lying on the lawn during a hot and still day; but at a little distance a slight breeze occasionally moved an open parasol, which would have been wholly disregarded by the dog, had anyone stood near it. As it was, every time that the parasol slightly moved, the dog growled fiercely and barked. He must, I think, have reasoned to himself in a rapid and unconscious manner, that movement without any apparent cause indicated the presence of some strange living agent, and that no stranger had a right to be on his territory. (1879/2004, p. 118)

It seems that Darwin's dog passed in the blink of an indignant eye from an attribution of intentionality ("a strange living agent") to a deontic judgment ("no stranger had a right to be on his territory"). It is this move from "is" to "ought" that led Darwin to include this incident in his chapter on moral sense in The Descent of Man, and his point was of course to point out the continuities between animal and human cognition.

As we shall see in the following, human beings share a similar propensity to make inferences of intentionality from certain visual patterns of movement. And when we attribute intentionality to another being (whether human, animal, or robot), we are also making a social categorization—that person, animal, or robot belongs to the same class of sentient beings as ourselves. As we shall see, we may be biased in our perceptions of others' intentions according to whether we see them as a member of our ingroup, or a negatively evaluated outgroup. Intersubjectivity also becomes possible, and Darwin's dog indeed attempted to communicate with this strange agent by barking and trying (p. 646) to make it go away. This strategy is often successful: aggressive barks often do give people or other animals reason to change their behavior by warily retiring from the scene. In this sense, dogs show some of the same capacity for perceiving social objects as we do, which lays the foundations for taking (and communicating) either a friendly or a hostile attitude to that social object.

The fact that we ourselves are social animals means that understanding another person's behavior is different from understanding how physical processes work. If a safety system fails to send an alarm when a dangerous situation arises (which it should normally do), we might consider physical causes, such as whether there is a fault in the electrical circuit or energy supply and so on. But if we learn that someone deliberately disabled the safety system, we might also consider "intentional" causes that have to do with the agent's beliefs, abilities, attitudes, desires, and so on (Heider, 1958; Malle, 2004). Did this agent disable the safety system by mistake? If not, what was her reason for disabling the system? Was it friendly (e.g., to stop us from being interrupted incessantly by a disturbing noise), ambiguous (to save energy), or hostile (to put us in harm's way)? Knowing that the signal system was disabled by a voluntary action is not enough; social perceivers want to know the goal served by that action (e.g., to bring about a train accident). For this reason, actions that deliberately produced the result in question (e.g., a train accident) are seen as more causally important than the same voluntary action performed without that intention or a physical event (e.g., an electrical fault) that produces the same result (Hilton, McClure, & Moir, 2016). Reasons citing human agency are not just different from physical causes (Buss, 1978); reasons are evaluated as more important causes.

As Darwin recognized, humans and dogs are synergetic species who collaborate with each other, and who can—to a significant extent—communicate with each other and recognize the other's intentions. But the differences between the species are also illuminating. What perhaps most clearly differentiates dogs from humans is the capacity to represent entities and events that are physically absent (Gärdenfors, 2003), coupled with the existence of a cultural system (language, pictures, writing systems, and other forms of symbolic communication) to communicate about those entities and events.
These symbolic systems provide a form of intellectual support that allows humans to pass their discoveries from one generation to the next, rendering each new generation the smartest one yet. In particular, following Grice's (1957) distinction between natural and non-natural

Social Attribution and Explanation meaning, we will distinguish cases where information has not been pre-structured for the causal inquirer (e.g., in visual perception of movement, medical diagnosis) from cases where it is intentionally presented using social conventions (as when we use verbal de­ scriptions in the form of stories, newspaper articles, experimental scenarios). We will also draw attention to ways in which social conventions (e.g., Grice’s rules of conversation) may govern the formulation and interpretation of verbally given causal explanations. These rules of conversation apply to various social media that rely on social conventions for their interpretation: subtle changes in the way information is rendered salient or backgrounded in a story or a graph (e.g., Sedlmeier & Hilton, 2012) can have a signifi­ cant impact on the causal inferences drawn from the information given.

Plan of This Chapter The first part of the chapter focuses on the cognitive psychology of social objects with which we can—in principle—communicate. It is organized in terms of a hierarchy of so­ cial inferences that move from low-level understanding of action in response to sensory input to higher-level attribution of goals and dispositions to agents, often based on verbal communications. We highlight research in social psychology and neuroscience which sug­ gests that initial phases of social perception on the basis of sensory input appear to be au­ tomatic and “hardwired,” in contrast to higher-level “attributional” inferences that re­ quire more information than is directly available in the visual field. We go on to examine what information is required for major kinds of social inference, and how certain biases in the way people go beyond the information given may be corrected by appropriate inter­ ventions. In the second part of the chapter, we turn to the study of higher mental processes, such as constructing a complex mental representation of how an event came about, telling a story based on that representation, and understanding that narrative. While the first stage of constructing a complex representation may be a purely cognitive process (al­ though it may rely on socially provided information from witnesses, etc.), storytelling and understanding are likely to be constrained by socially shared conventions about coopera­ tive social interaction, such as expressed in Grice’s (1975) rules of conversation. Hearers of stories are entitled to expect (unless told (p. 647) otherwise) that what they are being told is true (or probably true), relevant to understanding the point of the story (e.g., how a war came about, how an invention was discovered, etc.), and so on. In addition, as we shall see, stories describing human behavior have been shown to be understood using general knowledge about scripts, plans, goals, and behavior (Schank & Abelson, 1977), which yield situation models whose backbone incorporates information about causal de­ pendencies between states and events (Graesser, Singer, & Trabasso, 1994). In the third part of the chapter, we explore ways in which interpersonally given explana­ tions may be constrained by intersubjective processes. The communication and reception of explanations in a conversation about causes requires a careful attention to conversa­ tional pragmatics, as a “good” explanation will not only have to be true or probably true, Page 3 of 49

Social Attribution and Explanation it will have to be relevant, informative, and clear to the inquirer (Hilton, 1990, 1991; Hilton & Slugoski, 2001). Using this perspective, we go on to examine how people select “the” cause of an event from a complex causal chain as the most relevant response to give to a causal question about an event posed by another person in conversation. We show how contrast cases determine the focus of a causal question, which in turn con­ strains the relevance of a causal explanation, and consider how causal questions (e.g., why vs. how) are sensitive to goal hierarchies in narrative structure. All three parts of this chapter have their roots in research paradigms that existed well be­ fore the current wave of interest in causal reasoning in cognitive science that has moti­ vated much of the research reported in this volume. One aim of this chapter will be to sit­ uate research in social psychology and discourse processing with respect to current pre­ occupations in cognitive science with a view to highlighting ways in the research para­ digms may mutually inform each other. For related (and partly overlapping) treatments of these issues with a social psychological audience in mind, see Hilton (2007, 2012).

Social Perception People seem to have a pre-disposition to perceiving events as motivated by intentions (see White, Chapter 14 in this volume). Early work in both anthropology and psychology con­ firmed the natural inclination toward animistic thinking in non-Western societies and in children (Evans-Pritchard, 1965; Piaget, 1954), and it is not hard to find similar examples in Western adults. For example, early researchers in artificial intelligence (e.g., Colby, 1973; Weizenbaum, 1966) were struck by how readily users would attribute deeper mean­ ings to sequences of computer-generated dialogue that were generated by simple syntac­ tic rules of dialogue. The day-to-day movement of stockmarkets seems to be quite unpre­ dictable (Taleb, 2001), yet analysts seem to quite readily come up with post hoc explana­ tions with the benefit of hindsight (e.g., “the market reacted to the central bank’s an­ nouncement”) if there is a substantial movement in the index following an abnormal event (Fischhoff, 1982). Conversely, if there is no movement in the market in response to impor­ tant news, this can still be explained (e.g., “the market had already priced in the central bank’s announcement”). Other work confirms that adults still have a tendency to at­ tribute events to intentional actions, even in the absence of a plausible causal mechanism linking cause and effect (e.g., Callan et al., 2014; Lalljee, Brown, & Hilton, 1990). This prompts the question: Are people predisposed to make animistic inferences of this kind?

The Structure and Neurological Bases of Social Perception Heider and Simmel (1944) performed a remarkable demonstration of the social nature of everyday motion perception. They created a cartoon in which geometric figures (like a large triangle, a small triangle and a circle) moved somewhat erratically around a black frame, before the black frame disintegrated at the end of the sequence. The forms (trian­ gles, circle) they used bore a considerable similarity to those posited by Gestalt psycholo­ gists and artists in the Bauhaus school (with whom they were in touch) to be basic to vi­ Page 4 of 49

Social Attribution and Explanation sual perception. Artists and cartoonists have long known how to use forms to convey movement; the Heider and Simmel film can be compared to a Kandinsky composition put into motion (see Figure 32.1). People who watched this cartoon were asked to describe the movements. A remarkable 49 out of 50 observers described the dots’ movements in the language of description for human behavior: the dots were described as chasing, fol­ lowing, avoiding, repelling each other, and so on. Oatley and Yuill (1984) showed a ver­ sion of the cartoon to participants with titles that expressed higher-order themes such as “jealous lover.” The observers had no difficulty in attributing emotions such as anger to the figures that “explained” why they then went on to destroy the “house” (the frame that disintegrated). (p. 648)

Figure 32.1 Above: Basic visual forms of the Bauhaus group, from Nina Kandinsky's private collection and as presented in their 1923 exhibition. Below: Kandinsky's Komposition VIII (1923) and Heider and Simmel's (1944) Figure 1 (reproduced from Hilton, 2012).

Premack and Premack (1995) argued that development of a "theory of mind" from infancy to early childhood will go through three stages. These are the following: automatic activation by movement of the perception that an agent is intentional; interpretation of interactions of moving objects as "helping" or "hindering" each other; and the development of a "theory of mind" similar to adults at about the age of four years with the predicates see, desire, and believe.

Research in social cognitive neuroscience suggests that human perception of goals is indeed hierarchically organized and that this faculty is distributed in two distinct and partially independent brain circuits, the mirror neurone system (MNS) and the mentalizing (ToM or "theory-of-mind") system (van Overwalle & Baetens, 2009). According to these authors (see Figure 32.2), "we can discriminate motions (e.g., opening the hand), actions (often the conjunction of a motion sequence with an object, e.g., reaching or grasping a cookie), immediate goals (e.g., take a cookie), and task goals (e.g., prepare a snack). In a


social context, goals often imply longer perspectives and are often termed intentions (e.g., stay good friends, live a happy life)."

Van Overwalle and Baetens conclude from their meta-analysis that the mirror neurone system (MNS) is involved in lower-level action understanding from directly observable sensory input and primarily recruits the anterior intraparietal sulcus (aIPS) and the premotor cortex (PMC). On the other hand, the mentalizing system primarily recruits the precuneus (PC), temporal-parietal junction (TPJ), and the medial prefrontal cortex (mPFC) (Figure 32.3). Moreover, based on another meta-analysis of the theory-of-mind (ToM) system, van Overwalle (2009) suggested that the TPJ is primarily involved in transient inferences about other people (such as their goals, desires, and beliefs), whereas the mPFC subserves the attribution of more enduring qualities about the self and others, such as traits and dispositions. Importantly, the ToM system is not dependent on sensory (e.g., visual) input and can be activated by verbal materials in the form of narratives, stories, scenario descriptions, and so on. Another illustration of (p. 649) the partial independence of these systems is given by Spunt and Lieberman (2013), who show that a cognitive load manipulation does not affect the mirror-neurone system (MNS) involved in action understanding (the "how"), but does affect the mentalizing system involved in inference of higher-order goals (the "why").

Figure 32.2 Hierarchical organization of goals. A prospective social intention (anticipating a social interaction) may involve several (private) task goals, each of which may be composed of immediate goals, each of which requires a sequence of basic actions, and each of which is associated with an action that is composed of several movements. Adapted with permission from Figure 1 of Hamilton and Grafton (2006), copyright 2006 by Society for Neuroscience, with examples from Ciaramidaro et al. (2007, pp. 3106–3107).
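As a way of fixing ideas, the nesting that Figure 32.2 describes can be rendered as a toy data structure (purely illustrative; the labels follow the examples quoted above):

```python
# A toy rendering of the goal hierarchy in Figure 32.2 (illustrative only).
# Each level unpacks into the level below it, down to raw movements.
goal_hierarchy = {
    "social intention": "stay good friends",
    "task goals": [{
        "task goal": "prepare a snack",
        "immediate goals": [{
            "immediate goal": "take a cookie",
            "actions": [{
                "action": "grasp the cookie",
                "movements": ["open the hand", "reach", "close the hand"],
            }],
        }],
    }],
}

# Action understanding (mirror system) works bottom-up from movements;
# mentalizing (ToM system) targets the upper levels of this structure.
```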



Figure 32.3 The regions of interest involved in the mirror and mentalizing system placed in an x–y–z Talairach atlas. The regions are drawn based on the recent literature, and, in particular, on Keysers and Gazzola (2006, Figure 1b) for the mirror system and on Van Overwalle (2009, Figure 1b) for the mentalizing system.

While both the mirror and the mentalizing systems may be expected to be universal in humans, there is some evidence that the lower-level faculties of the mirror system may be less modifiable by learning and cultural experience. Morris, Nisbett, and Peng (1995) report no difference between Americans and Chinese in perceiving internal force, external force, and animacy of moving dots in Michotte-style entraining and launching movements. In contrast, cultural differences emerge in inferences that may be expected to recruit the mentalizing system. For example, when Morris et al. presented similar displays using fish (either alone or in groups), they did observe differences between Americans and Chinese; Americans tended to perceive that the fish moved more under internal influence, whereas the Chinese were more likely to perceive that movement was due to external influence. Cultural differences also emerge when Americans and Chinese are asked to give verbal explanations of launching movements (Peng & Knowles, 2004): Americans were more likely to attribute the dot's movements to "dispositional explanations" (e.g., weight and mass, composition, inertia, shape), whereas Chinese were more likely to produce "contextual explanations" (p. 650) (e.g., the other object, gravity, friction, wind). Bender, Beller, and Medin (Chapter 35 in this volume) review work suggesting that while all cultures learn "higher-order" forms of mentalizing (as measured by success on a false-belief task), children seem to show this ability earlier in some cultures than others. Finally, Miller (1984) reported that while young American and Indian children do not differ in their propensity to produce dispositional or situational explanations of social behavior, older American children and adults progressively favor dispositional explanations, while Indians progressively favor situational ones.

Social Attribution and Explanation descriptions in the form of narratives, and so on, and may be more susceptible to modifi­ cation by experience and cultural influences. In the following, we review research from social cognition that suggests that there is indeed a hierarchy of social inferences, and that the higher-level inferences are more likely to be deliberate.

Social Attribution
Malle and Holbrook (2012, Experiment 1) noted that previous studies had presented participants with target behaviors that tended either to elicit goal inferences (e.g., Hassin et al., 2005) or trait inferences (e.g., Winter & Uleman, 1984). They therefore generated a more representative range of verbally described social behaviors to their participants, which elicited an intermediate number of trait and goal inferences. They found that almost all of this set of behaviors elicited the judgment that they were intentional ("Did the person INTENTIONALLY perform the behavior?": 84%) and revealed something about the actor's desire ("Did the behavior reveal a certain GOAL the actor has?": 82%), but that fewer were judged as revealing something about the actor's beliefs ("Did the behavior reveal what the main actor was THINKING in this situation? Consider THINKING very broadly—what the actor was thinking, seeing, hearing, what s/he believed, knew, was aware of, etc.": 69%) and personality ("Did the behavior reveal a certain PERSONALITY characteristic the actor has?": 46%). Analysis of response times suggested that judgments of the intentionality of a behavior are generally made faster (M = 1.49 seconds) than judgments about whether it reveals something about the desires (M = 1.59), beliefs (M = 1.75), or personality (M = 1.94) of the actor. Malle and Holbrook also used the same procedure on the sets of behaviors originally used to study goal inferences (Hassin et al., 2005) and trait inferences (Winter & Uleman, 1984). Importantly, the pattern of response times varies as a function of stimulus type (Malle & Holbrook, Experiment 1; see also Figure 32.4 based on their Experiment 3 using visual stimuli). Thus goal-tailored stimuli seem to facilitate intentionality inferences, and trait-tailored stimuli personality inferences.

Malle and Holbrook's results aligned with those found in an earlier study by Smith and Miller (1983), (p. 651) which used verbally described behaviors. Thus judgments about the intentionality of the behavior ("Did the person INTEND to perform the action described in the sentence?") were made faster (M = 2.41 seconds) than questions about whether the protagonist would repeat the action ("Will the person REPEAT the action described in the sentence?": M = 2.76 seconds). These in turn were judged faster than questions about personal causality ("Did something about the PERSON cause the action described in the sentence?": M = 3.42 seconds) or situational causality ("Did something about the SITUATION cause the action described in the sentence?": M = 3.80 seconds). Responses to a question about whether a trait described the actor were relatively fast if they were judged as true (M = 2.48 seconds) but slower if they were judged as false (M = 3.02 seconds). The finding that the questions about personal and situational causality take the longest to answer supports Malle's (2011) claim that the internal-external distinction has


been unduly emphasized in social psychological work on attribution theory (see also Alicke et al., 2015; Hilton, 2012).

Figure 32.4 Speed of inferring intentionality, belief, desire, personality in Malle and Holbrook (2012, Experiment 3, using video stimuli).

Various models have addressed the question of different stages of social inference. For example, dual-process models such as that proposed by Trope (1986; Trope & Alfieri, 1997; Trope & Gaunt, 1999) differentiate behavioral identification (i.e., seeing an action as helping or hindering another, or a facial expression as a smile or a grimace) from dispositional inference (e.g., that a person is helpful or anxious) and argue that the second stage consumes more cognitive resources. Others have addressed the question of whether personality trait attributions from behavior descriptions are automatic (e.g., Uleman, 1999) and need cognitive resources to be corrected. An important finding is that dispositional inferences to the person or the situation seem to be triggered by the focus of the question posed to the participant (Krull, 1995; Krull et al., 1993). These results align with other findings that over-attribution to dispositional factors in the person or situation is equally likely to result if the relevant person- or situation-oriented question is posed but cognitive resources are not available (see Trope & Gaunt, 1999, for a related analysis).

In the following discussion, we focus on the information patterns and rules of inference necessary to make three major kinds of social judgment. These are behavioral identifications (e.g., seeing a behavior as racist or not), causal attributions (e.g., seeing an aggressive behavior as motivated by a bad mood or a situational provocation or a combination of both), and dispositional inferences (e.g., inferring that a person is racist or a situation is anxiety-provoking). We shall see that for all three types of inference, difference-based causal reasoning appears to be involved (e.g., in discriminating a racist from a non-racist intention for shoving someone; in verifying whether being in a bad mood made a difference to someone's acting aggressively; in comparing whether a certain kind of situation is more likely to induce stress than another). In all cases, causal inference is arrived at by analyzing whether factors make a difference to the occurrence of an effect (whether in the form of an analysis of non-common effects, counterfactual reasoning, or covariation analysis).
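As a minimal sketch of what such difference-based (covariational) reasoning computes, one can contrast the probability of the effect in the presence and in the absence of a candidate factor, as in the delta-P rule familiar from the causal learning literature (the code and the toy data below are purely illustrative, not taken from any study reviewed here):

```python
# A toy delta-P computation: a factor is judged causal to the extent that
# the effect is more likely in its presence than in its absence.

def delta_p(observations, factor, effect):
    """P(effect | factor) - P(effect | not factor), over a list of dicts."""
    with_f = [o for o in observations if o[factor]]
    without_f = [o for o in observations if not o[factor]]
    p_with = sum(o[effect] for o in with_f) / len(with_f)
    p_without = sum(o[effect] for o in without_f) / len(without_f)
    return p_with - p_without

# Did being in a bad mood make a difference to acting aggressively?
data = [
    {"bad_mood": True,  "aggressive": True},
    {"bad_mood": True,  "aggressive": True},
    {"bad_mood": True,  "aggressive": False},
    {"bad_mood": False, "aggressive": True},
    {"bad_mood": False, "aggressive": False},
    {"bad_mood": False, "aggressive": False},
]
print(delta_p(data, "bad_mood", "aggressive"))  # +0.33: mood makes a difference
```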


Note, however, that these causal inferences serve different functions. For example, some causal inferences serve to explain an event (e.g., by arriving at a conclusion of the type: "he pushed the man off the train because he doesn't like blacks" or "she insulted her neighbor because she was in a bad mood"), whereas others involve dispositional attributions of characteristics to involved persons or entities (e.g., "he is racist," "taking exams is stressful"). While dispositions can be used to support causal explanations (e.g., "she insulted her neighbor because she was stressed by the exam"), we shall see that they have distinctive properties (e.g., they rely on causal generalizations of the form exams induce stress). Conversely, causal explanations of events (e.g., that enable characterization of an action as, say, racist or not) are often an important way station for dispositional attributions about a person's character.

Action Identification by Inferring the Motive

Early work in attribution theory followed a period of research devoted to impression formation (Hilton, 2012), and Heider's (1958) book inspired considerations of how event explanation might in turn inform the impressions we form of others. Thus the front end of Jones and Davis's (1965) model of dispositional inference is composed of an analysis of non-common effects that identifies the goal served by an action, and thus clarifies its nature. As an illustration, consider the photograph in Figure 32.5, which shows a group of English football fans in Paris on a metro train bound for a match between Paris St. Germain and Chelsea on the evening of February 17, 2015. One of them (later identified as Richard Barklie) was using his outstretched arm to bar a local (black) Parisian from entering the underground train, while the others were singing (p. 652) racist chants. The film (posted on YouTube by a spectator, himself an Englishman living in Paris; see Figure 32.5) caused a media storm in the United Kingdom and France, not least when it transpired that Richard Barklie was a retired policeman who had since become a director of an international charity. For many, it was surprising that such a pillar of society should present a physical appearance (wearing dark jeans and shirt) that corresponded to a stereotype associated with football hooligans and apparently behave in a racist manner.

Figure 32.5 Video picture of Chelsea fans pushing Souleymane S.


In terms of Premack and Premack's (1995) first two stages of action perception, there is little doubt that Barklie's action was intentional, exhibiting the classic pattern of equifinality as he deliberately and knowingly used his arm to block Souleymane S.'s repeated attempts to board the train. It is also easy to describe the scene in the kind of language that participants used to describe the behavior of the geometric figures in Heider and Simmel's film; for example, he was repelling the man on the platform, who was repeatedly trying to enter the train (thus identifying Barklie's behavior as "hindering" Souleymane S.). But what reason did Barklie have for hindering Souleymane S.? While it is apparent to the naked eye that Barklie intentionally obstructed Souleymane S. in his attempts to get on the train, it is less clear why he did so (i.e., what was the higher-order goal that motivated this action?). Was it in order not to have a black man next to him in the train? Or was it in order to prevent the car from becoming too overcrowded? Clearly, the fact that the other similarly dressed fans were singing racist chants at the same time predisposes the perceiver to believe that the intention was racist, particularly if we perceive Barklie to be a member of this group. Here verbal communication and symbolic records can be crucial, as they can allow another story to be told, other than the one that immediately meets the eye.

According to Jones and Davis's (1965) analysis of non-common effects, in order to know why someone made the choice he or she did, we need to know the unique consequences (or non-common effects) of that choice. For example, had there been more space on board, Barklie's obstruction of Souleymane S. would have had a uniquely racist consequence—just to keep a black man away, thus justifying the characterization of the behavior as racist. But because the train was full, there is attributional ambiguity about the effects intended by the action, as the same action would not only keep a black person off the train, but would also preserve enough space for those already on board. Tellingly, in a statement to the press issued in the immediate aftermath of the incident, Barklie's lawyer was careful to avoid descriptions of what happened that would be informative about why Barklie acted as he did, stating simply that Mr. Barklie had been involved in an "incident" in which a man—identified as Souleymane S.—was "unable to enter part of the train." However, more than a week after the incident, Barklie himself was more specific about what motivated him to act—he claimed to the British press that he had pushed Souleymane S. off the train simply because it was too full. If we accept this explanation, we cannot characterize the behavior as racist. In this way, the analysis of non-common effects can tell us what motive made the difference between an action's being performed and not being performed. For example, had a white man tried to board the train at the same time as Souleymane S. and had Barklie pushed him off, too, Barklie could not be described as racist. But if he only pushed the black man off and let the white man board, then his action could indeed be characterized as racist, because in this case there would be a non-common effect—allowing the white man to board while preventing the black man from doing so.
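The logic of non-common effects lends itself to a simple set comparison. The following sketch is our own illustration, not Jones and Davis's formal apparatus, and the effect labels are invented encodings of the train scenario:

```python
# A minimal sketch of the analysis of non-common effects: the consequences
# unique to the chosen action, relative to a forgone alternative, are the
# best clue to the motive. Effect labels are illustrative assumptions.

def non_common_effects(chosen: set, alternative: set) -> set:
    """Consequences the chosen action has that the alternative lacks."""
    return chosen - alternative

# Full train: blocking Souleymane S. both keeps a black man off and
# preserves space, while letting him board achieves neither.
print(non_common_effects({"black man kept off", "space preserved"}, set()))
# -> two non-common effects, so the intended effect is ambiguous

# Near-empty train: only one consequence now separates the two actions.
print(non_common_effects({"black man kept off"}, set()))
# -> a single, uniquely racist consequence, licensing the racist reading
```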



Kelley's Dimensions of Covariation Information and Social Inference

In addition to his interventions described in the preceding, Barklie's lawyer used another line of argument—very familiar to social psychologists—to defend his client in the days immediately following the posting of the film on YouTube. In particular, he defended his client's character using principles from Heider's (1958) work on unit formation and Kelley's (1967) ANOVA model of (p. 653) causal attribution, both inspired by Mill's (1872/1973) work on methods of causal induction. For example, the lawyer claimed that Barklie was not part of a unit formed by the group of English football fans, went to the match alone, did not know anyone shown in the video, had not taken part in racist singing, and "condemns any behaviour supporting that." In addition, he emphasized the high distinctiveness of the Paris incident compared to other domains in Barklie's life, pointing to his résumé, which detailed his charity work in Africa and India. The lawyer also insisted on the inconsistency of the present incident with other occasions where Barklie had traveled to support his favorite team, Chelsea, as Barklie had traveled to their games for more than 20 years "without incident." There was also an implicit reference to Barklie's lack of consensus with troublemaking fans in general, as it was claimed that he had never been part of any "group or faction" of Chelsea fans.

Barklie's lawyer was in fact diffusing consensus, distinctiveness, and consistency information, which form the bedrock of Kelley's (1967) ANOVA model of causal attribution.2 Simplifying the example slightly by describing the target event as "Richard Barklie (the person) abused a black person (the event) while traveling to a football match (the situation) in Paris in February (the occasion)," Barklie's lawyer appeared to be asserting that:

Richard Barklie abused a black person while traveling to the match in Paris.

Many others abused this black person while traveling to the match in Paris (high consensus).

Richard Barklie never abuses other black people in other situations (high distinctiveness).

In the past, Richard Barklie has never abused black people while traveling to football matches (low consistency).

Such high consensus, high distinctiveness, and low consistency (HHL) configurations typically lead to attributions to a combination of the situation and the occasion (e.g., "something about traveling to the football match in Paris on that day"; see Cheng & Novick, 1990; Försterling, 1989; Hilton & Jaspars, 1987; Hilton & Slugoski, 1986; Jaspars, 1983). However, had further investigation revealed a different background of covariation, a very different conclusion would have been drawn. For example, imagine that no one else in the train had abused the black passenger, and that the fan in question had behaved abusively to black people in other situations (e.g., at work) as well as on other occasions when traveling to support his team. Covariation analysis of a low consensus, low distinctiveness, high consistency (LLH) configuration of this kind would unequivocally suggest that it was "something about the person" that made him do it (e.g., that he is racist). Of course, this was the inference that Barklie and his lawyer were trying to avert by presenting an HHL configuration, and by Barklie's offering his "sincerest apologies for the trauma and stress suffered by Mr. Souleymane."
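The covariation logic in play here amounts to a lookup from a (consensus, distinctiveness, consistency) configuration to a preferred causal locus. The sketch below is our own summary of the findings just cited, with the H/L encodings as assumptions; it is not a full implementation of Kelley's ANOVA model:

```python
# A minimal lookup-table sketch mapping covariation configurations
# (consensus, distinctiveness, consistency; H = high, L = low) to the
# attributions reported in the studies cited in the text. Illustrative only.

ATTRIBUTION = {
    ("H", "H", "L"): "situation + occasion",  # HHL: that trip to Paris that day
    ("L", "L", "H"): "person",                # LLH: something about Barklie
}

def attribute(consensus, distinctiveness, consistency):
    key = (consensus, distinctiveness, consistency)
    return ATTRIBUTION.get(key, "configuration not covered in this sketch")

print(attribute("H", "H", "L"))  # the lawyer's preferred reading
print(attribute("L", "L", "H"))  # the reading the lawyer sought to avert
```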

Causal Explanation of Events Using Mill's Method of Difference

We now turn to an important distinction between the causal explanation of events and dispositional attributions to the underlying characteristics of involved persons or entities. While both of these activities involve causal inference about social objects, they address different causal questions and use different rules of inference. We will show how the attribution of an event to person, situation, or occasion causes in response to a why-question (or "what-caused" question) uses difference-based causal inference (e.g., Mill's method of difference), whereas dispositional attribution in response to a question about what kind of entities were involved in the production of that event also uses generalization information analyzed by Mill's joint method of agreement. Here, covariation information is used to answer a classic question posed by attribution researchers to their experimental participants (e.g., McArthur, 1972): Is the cause of a person's behavior due to something that is internal or external to that person? Intentionality and internality have often been confounded in attribution theory (see Malle, 1999, 2004, 2011, for relevant discussions). Suffice it to say here that we would characterize the action depicted in Figure 32.5 of pushing the man off the train as intentional, whether we later learned that he did so because he wanted to or because he was under duress (e.g., he was being given instructions to do so by a man holding a knife to his back). But if asked what caused him to do it, in the first case we might attribute the cause of the event to "something about the person" (e.g., he is racist), whereas in the second case, we would surely be more likely to say that "something about the situation" (e.g., he was being menaced) was the cause of his action. In the following, we will use a stylized example to illustrate these different principles of causal inference, noting that our discussion is informed by extensive empirical research (for relevant experiments, see Hilton, Smith, & Kim, 1995; van Overwalle, 1997).3

(p. 654) When answering a question of the type "Why did Barklie push Souleymane S. off the train?" empirical research (Hilton, Smith, & Kim, 1995; van Overwalle, 1997) suggests that ordinary people, like scientists, use Mill's method of difference when making causal attributions. Following Heider (1958), Kelley defined a cause as "that condition which is present when the effect is present and which is absent when the effect is absent" (1967, p. 154). Given an explanandum (event-to-be-explained) described as Richard abused another passenger on the train while he was traveling to the match in Paris, the lay scientist might begin by identifying three putative causes: the person (e.g., something about Richard); the type of situation (e.g., something about traveling to a football match); or the particular occasion (e.g., the match in Paris). In order to test these three causal hypotheses, three "experimental control" conditions are needed: consensus (Did other people behave the same way in the same situation on the same occasion?); distinctiveness (Does Richard behave this way in other kinds of situations?); and consistency (Does Richard behave the same way when traveling to support his team on other occasions?). Imagine that our inquiries reveal an LHH configuration of covariation information, namely that this behavior (abusing another passenger) did not occur with other fans traveling in this train to this match (low consensus), does not occur when Richard is present in other situations (high distinctiveness), but has occurred on other occasions when he has traveled to support his team (high consistency). This means that the effect (abusing other passengers) almost always occurs when Richard is going to a football match, but not when other people go to the football match, or when Richard is not traveling to support his team. In this kind of low consensus, high distinctiveness, high consistency (LHH) configuration, we would infer that it is something about the conjunction of Richard and football matches that always causes this to happen (Hilton, Smith, & Kim, 1995; van Overwalle, 1997). Perhaps Richard is normally a very calm, unaggressive, sober person (as attested by the lack of incidents in other situations), but always drinks too much and gets carried away when traveling to support his favorite team.
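Kelley's definition of a cause—present when the effect is present, absent when it is absent—can be run mechanically over a set of observations. The sketch below is our illustration of that difference-based test on a toy encoding of the LHH example; the observation encodings are assumptions, and everyday attribution is of course not this tidy:

```python
# A minimal sketch of the difference-based test behind Kelley's (1967)
# definition of a cause: a condition present in every case where the effect
# occurs and absent from every case where it does not. The observations are
# a toy encoding of the LHH example in the text, not experimental data.

from itertools import combinations

def difference_based_causes(observations):
    """observations: list of (conditions_present: set, effect_occurred: bool)."""
    effect = [conds for conds, e in observations if e]
    no_effect = [conds for conds, e in observations if not e]
    candidates = set.intersection(*effect) if effect else set()
    return {c for c in candidates if all(c not in conds for conds in no_effect)}

def with_conjunctions(observations):
    """Also treat pairs of co-present conditions as candidate causes."""
    expanded = []
    for conds, e in observations:
        pairs = {frozenset(p) for p in combinations(sorted(conds), 2)}
        expanded.append((set(conds) | pairs, e))
    return difference_based_causes(expanded)

obs = [
    ({"Richard", "football trip"}, True),     # high consistency
    ({"Richard", "other situation"}, False),  # high distinctiveness
    ({"other fan", "football trip"}, False),  # low consensus
]
print(difference_based_causes(obs))  # -> set(): no single factor qualifies
print(with_conjunctions(obs))
# -> {frozenset({'Richard', 'football trip'})}: only the person-x-situation
#    conjunction is present when the effect is and absent when it is not
```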

Dispositional Attribution Using Mill's Method of Agreement

Note that in the example of an LHH configuration used in the previous section, we have made a dispositional attribution to Richard ("he is normally calm, unaggressive, and sober") which is causally relevant to aggressive behavior (it should normally inhibit his tendency to abuse other people), but which does not figure in our explanation of why he abused this particular person at this particular place and time. However, we explain the incident by reference to another "disposition," the explosive cocktail created by Richard's always getting carried away when he travels to support his team. This illustrates the fact that dispositional inferences and causal explanations are not always the same thing. If we were to ask the question, "To what extent do you think Richard is an aggressive person?" empirical research suggests that this judgment would be very strongly influenced by distinctiveness information about whether or not Richard abuses people in other situations (Hilton, Smith, & Kim, 1995; van Overwalle, 1997).

To give another example of the way in which causal explanations and dispositional attributions from the same covariation configuration diverge, consider the HLL information configuration. Here we can see where covariation information justifies dispositional attributions that are causally relevant to the behavior in question but nevertheless do not explain it. So imagine that, upon investigation of the incident of Richard's abusing another passenger in the Paris metro, we discover that other passengers in the train also did so (high consensus), that Richard frequently abuses people in other situations (low distinctiveness), and that in the past he has not abused passengers in trains when traveling to his favorite team's football matches (low consistency). Here, research shows that in HLL information configurations, people tend to identify something about the particular occasion (traveling to the match in Paris) as the cause of Richard's abusive behavior (e.g., the presence in that train of an aggressive passenger who made not only Richard but also other passengers respond in kind) in response to a question about what caused the event. Importantly, HLL configurations tend to lead to dispositional attributions to the person (e.g., that he is an aggressive person) and the situation (e.g., that it tends to provoke aggression) that are not invoked as causes of this particular event, even though they are causally relevant to it. Dispositional attributions to persons rely on evidence that someone has not only behaved in a certain way, but that they have done so in many situations (low distinctiveness) and on many occasions (high consistency). In other words, if the cases (p. 655) of aggression under consideration all agree in having Richard's presence (Richard on the train this week and last week, Richard in the pub this week and last week, Richard at work this week and last week), then we may conclude that Richard is aggressive using Mill's method of agreement. Similarly, dispositional attributions to situations rely on evidence that many people behaved the same way in that situation (high consensus), as well as on high-consistency information.
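Mill's method of agreement is equally mechanical: inspect the positive cases for the condition they all share. A minimal sketch, with our own toy case encodings echoing the example in the text:

```python
# A minimal sketch of Mill's method of agreement as used for dispositional
# attribution: cases in which the effect occurred are checked for a common
# condition. The case encodings are illustrative assumptions.

def method_of_agreement(positive_cases):
    """Conditions shared by every case in which the effect occurred."""
    return set.intersection(*positive_cases) if positive_cases else set()

aggression_cases = [
    {"Richard", "train", "this week"},
    {"Richard", "train", "last week"},
    {"Richard", "pub", "this week"},
    {"Richard", "work", "last week"},
]
print(method_of_agreement(aggression_cases))
# -> {'Richard'}: many situations (low distinctiveness) and many occasions
#    (high consistency) agree in his presence, supporting "Richard is aggressive"
```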

Causal Explanations Versus Dispositional Attributions for Moral Behavior

The distinction between explaining an event and making a dispositional attribution to an involved person or entity can help clarify the meaning of people's responses to a number of judgment tasks. For example, in the case of actions that cause undesirable side effects (e.g., saving a mother's life by aborting her baby), the occurrence of the side effect does not explain why the action was performed (the goal was to save the mother's life) but may license dispositional attributions about the agents concerned. For example, people with certain religious beliefs who believe that a fetus's life is sacrosanct might consider the mother (and the doctors involved) to be "un-Christian" or "immoral." In addition to proposing the analysis of non-common effects discussed earlier, Jones and Davis's (1965) model predicts that norm-violating behaviors will trigger stronger dispositional attributions to the actor. It is also easy to derive similar predictions from neo-Kelleyan models, as norm violation implies that a behavior is low consensus, which will favor dispositional attributions when associated with low-distinctiveness information (Hilton, Smith, & Kim, 1995; van Overwalle, 1997). In line with these predictions, Uttich and Lombrozo (2010, Experiments 2 & 3) obtained responses suggesting that actions which produced highly undesirable side effects triggered stronger negative trait attributions to the target actor ("Bob is a bad person"), especially when the actor was described as an evil henchman whose job "is to do maximum evil at every opportunity" (conveying low-distinctiveness information with reference to the target behavior, a planned bank raid that involves the release of neurotoxins into a town's water supply). Examples such as these indicate the potential of well-known models of the attribution process to illuminate research into moral judgment processes (for more examples, see Alicke et al., 2015; Hilton, 2007).

Implicit Covariation Assumptions and Causal Attribution

Of course, covariation information is not routinely available to the naked eye, and has to be either recruited from long-term memory or furnished verbally (e.g., by an experimenter or a lawyer). These assumptions about covariation associated with kinds of event, personality, stereotypes, or scripts are sometimes referred to as causal schemata (Kelley, 1973).4 For example, verbs describing emotional states show a strong tendency to be attributed to the object eliciting the state (e.g., Paul in Ted likes Paul) and, consistent with this, are associated with assumptions of high consensus and high distinctiveness, as people spontaneously tend to assume that many people like Paul and that Ted likes few other people. In contrast, action verbs such as Ted charms Paul tend to be associated with lower consensus and lower distinctiveness (Brown & Fish, 1983; McArthur, 1972; Rudolph & Försterling, 1997) and evoke more attributions to the agent performing the action.

Stereotypes and norms are, of course, an important source of implicit covariation information for causal attributions. For example, although state verbs typically trigger attributions to the verb object consistent with assumed high consensus and high distinctiveness (e.g., Ted admires Paul triggers attributions to "something about Paul," consistent with the assumption that many people admire Paul and that Ted admires few other people), a counternormative event such as Ted admires the rapist triggers more causal attributions to the person (something about Ted), consistent with the low consensus assumed for this event—most people consider that few people admire the rapist (Burguet & Hilton, 2004). However, more work remains to be done to establish exactly what kind of implicit covariation information is communicated by stereotypes. For example, McCauley and Stitt (1978; see also McCauley, Stitt, & Segal, 1980) suggest that stereotypic features such as Germans are scientifically minded should not be interpreted as overgeneralizations (e.g., most Germans are scientifically minded) but in terms of Bayesian diagnosticity (e.g., Germans are more likely than most nationalities to be scientifically minded). They report data showing that American students consider a minority of Germans to be scientifically minded, but also consider that this is nevertheless a distinctive feature of Germans (Germans are more scientifically minded than the average national group).

Some events are assumed to happen regularly (e.g., most people buy something on most visits to most supermarkets), whereas others are rare (e.g., not leaving a tip in a restaurant), and these assumptions about norms have been shown to influence causal attributions (e.g., Hilton & Slugoski, 1986). More generally, people tend to assume that negative events are rarer than positive events (e.g., Reeder & Brewer, 1979), which in turn can explain why people are more likely to attribute a negative disposition from observing a negative behavior (e.g., telling a lie) than from a positive behavior (e.g., giving help when asked). The assumption of low frequency for negative behaviors favors the assumption of low consensus (few people do it), and thus can explain the judgments of people (p. 656) who consider that the participants in Milgram's (1974) experiment who went to the limit in administering potentially lethal electric shocks were "brutal" (Bierbrauer, 1979). Given that Milgram's studies showed that most neutral observers who had the experimental scenario described to them assumed low consensus (i.e., they expected that few participants would go to the limit), the attribution of brutality to the participants in the Milgram experiment seems explicable in terms of the application of Mill's methods of induction. If there is an error here, it seems to be primarily due to an error in the premises used by the causal inference process, not in the causal inference process per se. More generally, recent reviews have suggested that earlier work in social psychology (e.g., Nisbett & Ross, 1980) may have overestimated the inaccuracy of everyday social inference. For example, properly designed experiments reveal little evidence for underuse of consensus information in making causal attributions, and people do discount causes appropriately, though perhaps not as much as they should (for further discussion, see Hastie, 2016; Hilton, 2007).
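McCauley and Stitt's diagnosticity point reduces to a likelihood ratio, which a two-line computation makes concrete. The probabilities below are invented for illustration only; they are not the values McCauley and Stitt reported:

```python
# A minimal sketch of the Bayesian diagnosticity reading of stereotypes
# (McCauley & Stitt, 1978): a feature can be stereotypic of a group
# (ratio > 1) even though most group members lack it. Numbers are made up.

def diagnosticity(p_feature_given_group, p_feature_baseline):
    """Likelihood ratio: how much more typical the feature is of the group."""
    return p_feature_given_group / p_feature_baseline

p_sci_given_german = 0.30  # assumed: a minority of Germans judged scientifically minded
p_sci_in_general   = 0.15  # assumed: baseline rate across national groups

ratio = diagnosticity(p_sci_given_german, p_sci_in_general)
print(f"diagnosticity ratio = {ratio:.1f}")
# -> 2.0 > 1: the feature is distinctive of the group despite being a minority trait
```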

Stereotyping and Attribution: Automatic Versus Controlled?

The existence of intergroup discrimination in attributions of responsibility and guilt has been termed the "ultimate attribution error," and has been observed in various studies (e.g., Taylor & Jaggi, 1974; but see Khan & Liu, 2008, for a more nuanced position). A classic demonstration of the psychology of rumor (Allport & Postman, 1947) showed that a scene showing a dispute on a metro train tends to be distorted in the direction of negative stereotypes about blacks by white Americans (e.g., the black protagonist tends to be described as being armed, while there is no suggestion of this in the original picture). The presence of such prejudice is still evident in the United States, even leading President Obama to lament, some 70 years later, on June 22, 2015, that racism continues to be "part of our DNA" in the United States. Examples range from clear differences between whites and blacks in their judgments of O. J. Simpson's guilt for the murder of his wife in 1994 to the responsibility of white police officers and security agents for killing blacks whom they believed to be dangerous during the years of the Obama presidency. Most white participants in US universities affirm that they are not racist, yet show evidence of possessing racist associations (e.g., associating black people with negative characteristics) when administered the Implicit Association Test (IAT; Greenwald et al., 2009). Do these associations influence their judgments of blame and guilt in response to visual scenes involving a member of another group? As we shall see, the answer is yes: under certain circumstances.

Duncan (1976) showed participants videotapes of an interaction between two other people engaged in a heated exchange, which resulted in one of the protagonists shoving the other. Participants were led to believe that what they were watching were television images of an interaction between two other experimental participants actually taking place elsewhere in the laboratory at the University of California, Irvine. This impression was encouraged by the experimenter, who hurriedly turned off the television monitor after the shove, and went down the corridor muttering to himself to see what was wrong. There were actually four versions of the film, created by crossing the color (black or white) of the protagonist with that of the victim of the shove. The white observers were far more likely to say that the protagonist was "playing around" and "dramatizing" rather than "being violent" if the protagonist was white. For example, 75% labeled a black man shoving a white man as violent, but only 17% did so when a white man shoved a black man. When the harm-doer was black, person attributions were favored (regardless of the color of the victim), and when the harm-doer was white, situational attributions were favored, regardless of the color of the victim.


Are such effects of stereotypes on attribution automatic? An informative study was conducted by Bodenhausen (1990), who reasoned that if stereotypic judgments are automatic, it should be harder for people to suppress their effects on judgment (p. 657) when they lack cognitive resources. Accordingly, he presented a story to undergraduates that purported to come from their campus newspaper concerning a person apparently involved in drug trafficking, with ambiguous information about whether the protagonist was guilty or not. The protagonist had either a recognizably Anglo-Saxon name (Mark Washburn) or a recognizably black name (Marcus Washington), and the students were asked to judge his likely guilt. What was varied was the time of day at which the undergraduates were asked to do this task, and whether they had circadian rhythms that made them at their best in the morning ("day people") or in the evening ("night people"). The undergraduates were asked to make this judgment either when they were expected to be most alert (i.e., day people in the morning, night people in the evening) or least alert (day people in the evening, night people in the morning). Consistent with the hypothesis that people's judgments would be most influenced by the stereotype when they were least alert, "day" people were more likely to judge Marcus Washington guilty in the evening, and "night" people to do so in the morning.

In another experiment, Bodenhausen (1988) reasoned that stereotypes will be more likely to color our judgment if they are active at the moment that we consider evidence. In his study, students were asked to imagine that they were jurors in a trial involving a case of aggression. They were presented with 12 pieces of information, of which three were neutral, and four or five were either favorable or unfavorable to the defendant. As might be expected, the judgments of the likely guilt of the defendant were affected by the number of pieces of favorable or unfavorable information. In addition, they were affected by introducing the name of the defendant before the students read the information: the students were more likely to consider that the defendant was guilty when he had a Hispanic name (Robert Ramirez) rather than an Anglo-Saxon one (Robert Johnson5). The name of the defendant had no effect if it was introduced after the students read the items of information, consistent with the hypothesis that the stereotype colored their interpretation of the information (Asch, 1946). Finally, the effect of the stereotype in the presentation-before condition disappeared if the students were required to evaluate the favorable or unfavorable nature of each item of information while they were reading it. Such attentional focalization manipulations have also been known to eliminate primacy effects in impression formation (Stewart, 1965).

Can We Eliminate Biases in Social Judgment?

A major program of research in causal and responsibility attribution was driven by the "man-the-lawyer" model of responsibility attribution, which proposed that people should consider the motives and mental states in the explanation and punishment of behavior (Fincham & Jaspars, 1980). It has been found that people sometimes take motive into account when—according to legal perspectives—they should not do so, and at other times they neglect it when they should. As an illustration of the first phenomenon, Alicke (1991) showed that information about the pro- or anti-social nature of the agent's motives can influence judgments of blame, even when the motives are only tangentially related to the offense in question (e.g., speeding in order to hide a birthday present vs. to hide drugs at home). As an illustration of the second, Goldberg, Lerner, and Tetlock (1999) reported that participants whose anger had been aroused by seeing a prior offense go unpunished were likely to use simpler judgment strategies when asked to evaluate a second, unrelated offense. In particular, these individuals were less likely to take the intentional nature of the second act into account when deciding punishment. The prevalence of "blame validation" or "intuitive prosecutor" phenomena such as these (Alicke, 2000; Tetlock, 2002) poses the question of whether there are ways of eliminating, or at least attenuating, the effect of such biased processing on causal attribution and legal or moral blame.

A first method is to draw participants' attention to the fact that an attempt is being made to influence their judgment. For example, the effect of leading questions on judgments of a vehicle's speed during a collision (Loftus & Palmer, 1974) is eliminated when the question is presented as coming from a prosecutor in a law court seeking to obtain a conviction (Dodd & Bradshaw, 1980). A second method is to make the participants accountable for their judgments. Accountability, or knowing that one will have to explain one's judgments to another person, often leads to the elimination of judgmental biases. For example, Tetlock (1985) found that primacy bias (the tendency to be most influenced by initial information when making judgments) was eliminated by accountability instructions. Thus when presented with a dossier containing positive and negative information concerning a defendant's guilt, students placed in the role of jurors paid more attention to information presented in the dossier when they had been told before reading this (p. 658) information that they would have to explain their judgments to a third party. This led to the elimination of primacy effects on judgments of guilt. However, timing is important: if the accountability manipulation is introduced after a judgment has been formed, it risks leading the decision-maker to polarize that judgment as he seeks to entrench it with further justifications (Tetlock, 1992).

Summary

Social attribution in humans uses information available in the perceptual field, which contains specific kinds of stimulus patterns that automatically activate brain circuits connected to the visual system and thereby contribute to the perception that an action is intentional. However, this output can also be integrated with higher-order information obtained from communication (e.g., about the agent's reasons for her action), from memory (e.g., about what others would have done in the same situation, or the target's behavior in other situations and on other occasions), or from media (the press, conversation with friends, information presented by an experimenter, etc.). We have seen how such information can be analyzed by some variant of Mill's method of difference to clarify three major kinds of attributional judgment: the characterization of behavior; causal explanation of events; and dispositional attribution to involved persons and entities (a form of causal learning). As well as requiring use of a difference-based strategy, the attribution of a disposition (e.g., aggressive) to a person implies that he or she will produce the effect in question across situations and occasions more frequently than the average person.

Whereas anyone watching the film of the black man being barred from entering the train in Paris would automatically "see" the white football fan's action as intentional and obstructive, further elaboration of one's perception of the event depends on knowing more about the "backstory." We have already seen that this backstory may take the form of a résumé of an agent's character (e.g., the fact that the fan who pushed the black man away was a former police officer, working in a senior position in an international charity). But it may also (and often does) take the form of a complex narrative given by the media, conversations, or experimental scenarios. These reports are produced by humans for consumption by other humans, and are thus pre-formatted to be digestible for the human mind. They follow social conventions (Grice, 1957) and thus respect fundamental rules of human communication (Graesser, Singer, & Trabasso, 1994; Sedlmeier & Hilton, 2012). We next consider how such stories are generated, communicated, and interpreted.

Explanation-Based Understanding of Stories

Humans are capable of generating and communicating complex explanations in the form of stories. For example, Winston Churchill's six-volume history of the Second World War can be considered a full explanation of his view of why and how this event happened. Some things "need" explanation, such as catastrophes and surprising events, and causal queries arise when there is some mismatch between what happened and what should have happened (Böhner et al., 1988; Hastie, 1984; Kanazawa, 1992; Weiner, 1985). Explanation can be thought of as a process of puzzle resolution (Turnbull, 1986), and it seems natural to speak of something "needing explanation," leading a doctor to puzzle about a patient's painful symptoms, or a newspaper editor to wonder why a plane has crashed. Causal attribution may resolve these puzzles by tracing an effect to its source. So in medical diagnosis a doctor may seek to attribute a symptom to an underlying disease, an art curator to attribute a painting to an old master, a newspaper editor to attribute a foreign policy disaster to its architect.

Sometimes, elaborate causal attribution processes (e.g., medical diagnosis, historical research) are necessary to construct a "situation model" of how an event came about. Situation models are often the result of explanatory inquiries designed to resolve some puzzle (e.g., Why did the plane crash? Why is there political instability in the Middle East?). These situation models represent a chain of events that resulted in the particular event-to-be-explained (e.g., an engine failure, a war). Situation models represent the culmination of a program of research into discourse comprehension processes inspired by Schank and Abelson's (1977) seminal book Scripts, Plans, Goals and Understanding: An Inquiry into Human Knowledge Structures, itself inspired by Heider's 1958 analysis. A descendant is the "story model" approach to understanding judgments of criminal guilt (see Lagnado & Gerstenberg, Chapter 29 in this volume). Imposing one "story" rather than another is often crucial to deciding verdicts in the courtroom (cf. Bennett, 1992; Hastie & Pennington, 2000), and may equally influence what version of events we accept in deciding blame in more informal settings (such as a romantic separation). (p. 659)

Generating Explanations of Particular Events

Most everyday explanation is typically concerned with why a particular event occurred when and how it did, and thus more closely resembles legal and historical explanation than it does scientific explanation. As legal theorists Hart and Honoré (1985) write:

The lawyer and the historian are both primarily concerned to make causal statements about particulars, to establish that on some particular occasion some particular occurrence was the effect or consequence of some other particular occurrence. The causal statements characteristic of these disciplines are of the form "This man's death was caused by this blow." Their characteristic concern with causation is not to discover connexions between types of events, and so not to formulate laws or generalizations, but is often to apply generalizations, which are already known or accepted as true and even platitudinous, to particular concrete cases.

Accordingly, Hart and Honoré (1959, p. 40) suggest that when we ask "What was the cause of death?" what we typically mean is "What caused this man's death at this time?" Consequently, explanations that answer the question "Why did Michael Jackson die?" by answering that "blood stopped flowing to his brain" might be stating the truth, and indeed referring to a scientific criterion of death, but they are not giving a relevant answer to the intended meaning of the question. Consistent with the claim that everyday explanation focuses on particular cases, a survey of reports in German newspapers in 1903, 1992, 1993, and 1996 found that 72% of causal explanations dealt with particular events (Oestermeier & Hesse, 2001). In contrast, the explananda of science are typically quite different, consisting of questions like "Why do apples fall downward?" and "Why do things catch fire?" and receiving answers in the form of universal causal generalizations that explain such types of occurrences, such as "gravity causes things to fall," "oxygen causes fires," and so on.6

In order to take a "slow motion" picture of the generation of a causal explanation from beginning to end, Hilton, Mathes, and Trabasso (1992) analyzed six months of the New York Times' reports of the Challenger disaster, and identified three distinct stages in the explanation process (Figure 32.6). These were (1) assessment of consequences (e.g., the death of the astronauts, the nation's shock, implications for space travel); (2) causal understanding (of the physical mechanisms that led to the crash); and (3) responsibility attribution (for deciding to launch despite warnings, for the design errors in the booster rockets). The latter two phases appear to correspond to Hart and Honoré's (1959) distinction between explanatory inquiries that attempt to understand how an event happened, and attributive inquiries that attempt to apportion responsibility.

Hilton, Mathes, and Trabasso observed that the New York Times' investigation into the Challenger disaster began by describing the consequences of the disaster (the death of the astronauts, the distress of (p. 660) their families, the cancellation of the planned lesson from space, the reactions of American schoolchildren, the delays to the launch program, etc.). In parallel, a causal inquiry began by outlining the three major parts of the shuttle system (booster rockets, external launch tank, shuttle) and enumerating the possible sources of failure in each. As further evidence arrived each day, these provisional scenarios were "fleshed out." The process resembled in many ways the process of medical diagnosis: general practitioners who have to understand a presenting problem from scratch generate quite general hypotheses based on their knowledge of body systems ("Something to do with the heart system? The stomach system? The brain system?" etc.) before activating more specific hypotheses based on incoming information and building up confidence in their putative explanations (Weber et al., 1993, 2000).

Figure 32.6 Frequency of references to causal explanations, responsibility attributions, and consequences in the first 30 days of New York Times reports on the Challenger disaster. From Hilton, Mathes, & Trabasso (1992).

We may assume that the result of the explanatory inquiry is a situation model that represents the inquirer's causal understanding of how the particular event in question (e.g., the Challenger crash, the patient's problem) came about. Note that the same element may be identified as part of the causal story in one situation model but not in another, depending on whether a causal connection between its occurrence and the event-to-be-explained is proved or not. For example, an initial hypothesis for the Challenger space-shuttle disaster was that it was due to damage to the orbiter's skin, caused when the lift-off blasted icicles that the unusually cold weather had formed on the launch platform (Hilton, Mathes, & Trabasso, 1992). This hypothesis was then discarded when no evidence was found for damage to the orbiter's skin. However, the hypothesis that the cold weather played a role was resurrected when it was established that the cold may have affected the performance of the rubber seals between the booster sections, in turn leading to leakage of explosive hydrogen fuel. This latter account indeed became the accepted explanation of how the disaster occurred (e.g., as described in the Rogers Commission's report on the disaster), and the new situation model re-established the causal connection between the cold weather and the disaster. Examples such as this underscore the importance of coherence considerations when constructing a causal "story" for the Challenger disaster (see Lagnado & Gerstenberg, Chapter 29 in this volume): the cold weather fit into both a "faulty seals" and a "damaged orbiter skin" story, as it could potentially explain both rigidity in the seals due to freezing and damage to the skin due to icicles hitting it during blast-off. However, once the causal mechanisms that produced the disaster had been established, attention turned to questions of responsibility—for example, should the shuttle have been launched despite the concerns about the performance of the booster rockets in cold weather flagged by the Morton Thiokol engineers who built them? Should the contract to build the booster rockets (with their defective design) have been given to Morton Thiokol in the first place? (see Figure 32.6).

Media reactions to disasters often follow the pattern of assessing causal consequences first, exploring (physical) causal mechanisms second, and assigning responsibility to human agents third. One such example occurred on the morning of March 24, 2015, when, while flying from Barcelona to Düsseldorf under a clear blue sky, a Germanwings A320 Airbus flew directly into the lower Alps near Digne in France, killing all 150 passengers and crew on board. First, the media reported the consequences of the crash (whether there were survivors, the dispatch of search-and-rescue teams, the distress of relatives waiting for their loved ones at Düsseldorf airport, etc.). Second, and in parallel, they established core hypotheses as to the possible causes of the accident (e.g., Abelson & Lalljee, 1988). Was it due to a mechanical fault? Terrorism? Some other cause? Clues were used to discriminate between possible explanations and to flesh them out. For example, the perfect weather seemed to exclude the role of meteorological factors, while the peaceful nature of the region and intense radar coverage seemed to exclude the possibility that the aircraft had been shot down by a missile.

Features of the accident then emerged that pointed to a factor internal to the plane as a cause (e.g., a mechanical fault, pilot error, or some combination of such factors), such as the fact that the aircraft had begun an eight-minute descent which took it straight into the mountainside. On March 26, attention dramatically focused on copilot Andreas Lubitz, as analysis of one of the black boxes indicated that he had locked himself in the cockpit while the captain was in the toilet, and had refused to open the door despite the insistent knocking and demands of his colleague. In the following days, investigations by the German press then proceeded to a "progressive localization of cause" (Mackie, 1974) by revealing what specifically could explain Lubitz's action. It turned out that he had a history of depression, had recently become separated from his girlfriend, was being treated for eye problems that threatened his career as a pilot, and had torn up a sick note that forbade him from flying that day. There was even a suggestion that the reason (p. 661) he may have chosen this area for his suicide was that from 1997 to 2003 he had spent holidays flying gliders with his parents in this very region. Having achieved an understanding of how the crash happened, attention then turned to questions of responsibility and prevention. Had Lufthansa done enough to screen its pilots for mental and emotional stability? Did it put too much pressure on aircrews? The causal analysis prompted airlines worldwide to move quickly to put measures in place to ensure that two aircrew were in the cockpit at all times, so as to prevent a similar event from occurring again.



What Makes a Good Explanation? Causal Connection Versus Causal Selection

The situation model approach allows us to see that causal explanations can be challenged for quite different reasons. Following Hesslow (1988), we suggest that there are two fundamental causal questions answered by each singular causal statement. The first is whether two events are causally connected to each other. This question can be settled by a counterfactual test: Would the second event have happened if the first event had not occurred? In the Challenger example, we have seen that the same event (the abnormally cold weather) was rejected as being part of the explanation of the disaster when the proposed connection to the disaster (ice blown off the launch platform) was shown to be implausible, but was reinstated when another plausible connection was found (faulty performance of the booster rocket seals). The second question, of causal selection, is settled by the contrast implied in a causal question. For example, the question "Why did the Challenger blow up during launch when the other shuttles didn't?" would be aptly answered by the explanation "the cold weather," focusing on the difference between the abnormally cold weather and normal launch-time weather. But if the question is "Why did the Challenger blow up during launch when it was designed not to?" this would be better answered by the explanation "the failure of the booster rocket seals" (focusing on the difference between the actual vs. ideal performance of the seals).

The distinction between questions of causal connection and causal selection also helps clarify the difference between causal discounting (choosing which of two or more competing situation models accurately describes a target outcome) and causal backgrounding (selecting causes from conditions within a given situation model). In causal discounting, where there are two competing situation models that give different accounts of how the target effect might have come about, the causal question concerns which of the two models is probably (or actually) responsible for the target outcome happening the way it did. Here we have a case of multiple sufficient causes (MSC), where each situation model is sufficient to produce the causal outcome in question. The judge has to choose which situation model is most likely to have been the one that actually produced the event in question, and adding information that makes one situation model more likely will normally decrease belief that the explanation based on the competing situation model is true. For example, learning that a cancer sufferer has been a heavy smoker may decrease belief in the probability that this person got cancer because he worked in an asbestos factory (Hilton & Erb, 1996), leading it to be discounted (McClure, 1998; Morris & Larrick, 1995) or (in causal model parlance) explained away (see Rehder, Chapter 20 in this volume).

In contrast, a factor that is causally backgrounded is not "explained away" and is still believed to be true. It remains in the situation model believed to have resulted in the target event, but is presupposed as supporting background for the focal cause. In an experimental demonstration, Hilton and Erb (1996) showed that learning that a hammer broke a watch leads most people to evaluate the explanation that the watch broke because the hammer broke it as a good one. However, if they then learn that the watch broke as part of a routine test in a manufacturing process, they consider their original explanation to be less good, and worse than the alternative explanation "the watch broke because it had a fault." But when asked to reason counterfactually, people still consider that the watch would not have broken if the hammer had not struck it. In other words, they still see a causal connection between the hammer strike and the watch breaking, but no longer consider this causal connection to furnish such a good explanation of why the watch broke in the context of a routine factory testing procedure. Causal backgrounding is thus the reverse side of causal selection. Causal selection dignifies some events as "causes" (e.g., the watch had a fault) because they are relevant to the implicit causal question at hand ("Why did this watch break when others did not?"), leading other events that are causally connected to the event in question (e.g., the hammer strike) to be backgrounded as "mere conditions" (Mackie, 1974) in the context of a routine factory testing procedure. (p. 662)
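Discounting of this kind falls out of Bayesian updating over competing sufficient causes. The following is a minimal sketch with a noisy-OR parameterization; all probabilities are made up to make the effect visible, and the example mirrors the smoking/asbestos case only in spirit:

```python
# A minimal sketch of causal discounting as Bayesian "explaining away":
# two independent causes of cancer under a noisy-OR model. Numbers invented.

from itertools import product

P_SMOKE, P_ASBESTOS = 0.3, 0.2              # assumed independent priors
W_SMOKE, W_ASBESTOS, LEAK = 0.3, 0.3, 0.01  # assumed noisy-OR strengths

def p_cancer(smoke, asbestos):
    p_no = 1 - LEAK
    if smoke:
        p_no *= 1 - W_SMOKE
    if asbestos:
        p_no *= 1 - W_ASBESTOS
    return 1 - p_no

def posterior_asbestos(observed_smoke=None):
    """P(asbestos | cancer), optionally also conditioning on smoking."""
    num = den = 0.0
    for smoke, asb in product([0, 1], repeat=2):
        if observed_smoke is not None and smoke != observed_smoke:
            continue
        prior = (P_SMOKE if smoke else 1 - P_SMOKE) * \
                (P_ASBESTOS if asb else 1 - P_ASBESTOS)
        joint = prior * p_cancer(smoke, asb)  # weight by evidence likelihood
        den += joint
        if asb:
            num += joint
    return num / den

print(f"P(asbestos | cancer)         = {posterior_asbestos():.2f}")   # ~0.48
print(f"P(asbestos | cancer, smoker) = {posterior_asbestos(1):.2f}")  # ~0.30
# Learning the patient smoked heavily lowers the asbestos explanation's
# probability: the competing sufficient cause is discounted / explained away.
```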

Evidence for Situation Models in Story Understanding

The core of a situation model represents events that are "necessary in the circumstances" (Mackie, 1974) for the consequent's occurrence, such that if they had not occurred, the consequent would not have occurred. Situation models can thus be characterized as networks of counterfactual dependencies between remembered events (e.g., Graesser, Singer, & Trabasso, 1994). The causal connections between events in the situation model are determined by online counterfactual reasoning during the comprehension process, and work on children's stories has shown that events that have more causal connections to other events in the narrative are more likely to be recalled and rated as central and important (Trabasso, Secco, & van den Broek, 1984; Trabasso & Sperry, 1985; Trabasso & van den Broek, 1985; Trabasso, van den Broek, & Suh, 1989). Situation models thus have some empirical support; in the following we illustrate how they can clarify key questions in the study of causal and counterfactual reasoning.

While much work on the situation model approach has used children's stories, Trabasso and Bartalone (2003) applied the approach to two experimental scenarios read by adults in research on counterfactual if-only reasoning, originally generated by Kahneman and Tversky (1982). Both experimental narratives end with the same outcome, an accident in which the focal actor, Mr. Jones, died:

The accident occurred at a major intersection. The light turned amber as Mr. Jones approached. Witnesses noted that he braked hard to stop at the crossing, although he could easily have gone through. His family recognized this as a common occurrence in Mr. Jones' driving. As he began to cross after the light changed, a light truck charged into the intersection at top speed, and rammed Mr. Jones' car from the left. Mr. Jones was killed instantly. It was later ascertained that the truck was driven by a teenage boy, who was under the influence of drugs. As commonly happens in such situations, the Jones family and their friends often thought and often said "If only …," during the days that followed the accident. How did they continue this thought? Please write one or more likely completions.


The two versions of the story were identical except for one paragraph. In the route version, the critical paragraph read as follows:

On the day of the accident, Mr. Jones left his office at the regular time. He sometimes left early to take care of home chores at his wife's request, but this was not necessary on that day. Mr. Jones did not drive home by his regular route. The day was exceptionally clear and Mr. Jones told his friends at the office that he would drive along the shore to enjoy the view.

The time version of this paragraph was as follows:

On the day of the accident Mr. Jones left the office earlier than usual, to attend to some household chores at his wife's request. He drove home along his regular route. Mr. Jones occasionally chose to drive along the shore to enjoy the view on exceptionally clear days, but that day was just average.

Kahneman and Tversky found that there was a very strong tendency for participants placed in the position of Mr. Jones's relatives to generate if-only completions that mentioned the abnormal event (leaving the office earlier than usual in the time version, taking an unusual route home in the route version).

Although Kahneman and Tversky's seminal study generated considerable interest in the generation of if-only thoughts (for a review, see Alicke et al., 2015), Trabasso and Bartalone's (2003) research introduces an important clarification. Using Mackie's (1974) test of "necessity in the circumstances," they generated situation models that corresponded to the time and route versions of this narrative (see Figure 32.7).7 Inspection of Figure 32.7 shows the situation models generated to represent each scenario, and shows that the outcome—when described as Mr. Jones being killed in the way that his death came about—depends on a complex causal chain. Each situation model can be seen as the output of a process of online but-for counterfactual tests in the comprehension process, which identify events that are causally relevant in some (weak) sense to the event-to-be-explained (here, Mr. Jones's death as it came about). In fact, very few events in Kahneman and Tversky's narrative can be said to be causally irrelevant to this outcome, in the sense that if they had been otherwise, Mr. Jones would not have died in the accident. The situation model thus incorporates a causal chain composed of a complex set of causally relevant conditions that are all necessary for Mr. Jones's death as it happened in the story.

To verify this, Hilton, Schmeltzer, Wallez, and Mandel (2016) created a route version of the Mr. Jones story which included a detail that was clearly causally irrelevant to the crash (the blue color of Mr. Jones's car). Participants' evaluation of but-for counterfactuals showed that they judged the color of Mr. Jones's car (p. 663) to be causally irrelevant, as they judged it unlikely to have changed the outcome (described as Mr. Jones's death that day in that manner) had it been different. However, most respondents accepted that the other elements (e.g., time of leaving the office, choice of route, the teenager's dangerous driving, Mr. Jones arriving at the crossroads at the same time as the teenager) would all have prevented the outcome had they been different. Importantly, perspective (being asked to take the point of view of the family of Mr. Jones vs. that of the teenage driver) did not have a significant effect on any of the but-for counterfactual evaluations. This suggests that participants who take different perspectives (Mr. Jones's family vs. the teenage boy's family) still construct the same situation model of the narrative.
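The but-for test over such a network is easy to make explicit. Below is a toy encoding of the accident chain (our own simplification, not Trabasso and Bartalone's published network), in which a non-root event occurs only if all of its preconditions occur; deleting an antecedent and re-propagating answers the but-for question:

```python
# A minimal but-for sketch over a toy situation model of the Mr. Jones story.
# Each non-root event occurs iff all of its listed preconditions occur; the
# collision needs both parties to arrive at once. Event names and structure
# are illustrative assumptions.

PARENTS = {
    "takes route":           ["leaves office"],
    "Jones at intersection": ["takes route"],
    "truck at intersection": ["teen drives while drugged"],
    "collision":             ["Jones at intersection", "truck at intersection"],
    "Mr. Jones dies":        ["collision"],
}
ROOTS = {"leaves office", "teen drives while drugged", "car is blue"}

def occurs(event, deleted, memo=None):
    """Does `event` still occur once `deleted` is counterfactually removed?"""
    memo = {} if memo is None else memo
    if event not in memo:
        if event == deleted:
            memo[event] = False
        elif event in ROOTS:
            memo[event] = True
        else:
            parents = PARENTS.get(event, [])
            memo[event] = bool(parents) and all(
                occurs(p, deleted, memo) for p in parents)
    return memo[event]

for antecedent in ["leaves office", "teen drives while drugged", "car is blue"]:
    prevented = not occurs("Mr. Jones dies", deleted=antecedent)
    print(f"but-for {antecedent!r}: outcome prevented = {prevented}")
# The departure and the teenager's driving are "necessary in the
# circumstances"; the car's color is not (cf. Hilton et al., 2016).
```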

Figure 32.7 Route and time version causal network for the Mr. Jones stories derived by Trabasso and Bartolone (2003). Arrows indicate a causal relation. Events are of six kinds: S = setting; E = event; A = attempt; G = goal; G/A = goal/attempt; O = outcome.

In contrast, there were strong perspective effects in responses to the if-only questions, which tend to focus on those necessary conditions that are controllable from the point of view of the focal actor (Girotto, Legrenzi, & Rizzo, 1991). When placed in the perspective of Mr. Jones's family, most participants selected a factor controllable by Mr. Jones (e.g., his time of departure, his braking as he came into the intersection), but they selected a factor controllable by the teenager (his dangerous driving) if they had been cued into the perspective of the teenager's family (see also Mandel & Lehman, 1996). Interestingly, a significant number of respondents cued into Mr. Jones's perspective evaluated the abnormal event (e.g., leaving the office earlier than usual) as a cause of Mr. Jones's dying as and when he did, even if most still generated the teenager's dangerous driving as the principal cause of "the accident" in a free response task. This last result accords with the prediction that people select as "the" cause the factor that covaries strongly with the outcome and makes it predictable (Mandel, 2010; Mandel & Lehman, 1996). Finally, Hilton et al. (2016) wrote a "Mark Smith" version of the accident, which explained the combination of circumstances that led Mark Smith to be at that intersection at that moment in time, and was correspondingly elliptic about Mr. Jones's role. Here, being placed in one family's perspective still led respondents answering if-only counterfactual questions to focus more on actions by the focal family member (Jones vs. Smith) that could have avoided the accident, but it led to no difference in evaluations of but-for counterfactuals or in judgments of "the" cause (still Mark Smith's dangerous driving).

These results support the view that the but-for counterfactual reasoning that helps identify the causal structure of a situation model built up in understanding a narrative is distinct from the post hoc counterfactual reasoning used to generate responses to an if-only question. Consequently, it is important to distinguish the but-for counterfactuals used to identify situation models of how the event-to-be-explained came about (e.g., Hart & Honoré, 1959; Mackie, 1974; Woodward, 2003) from the if-only counterfactuals used to probe which condition people will undo when ruminating about a negative event from the focal actor's perspective (see Kahneman, 1995; Mandel, 2011; Woodward, 2011). The if-only probe, like a request for a cause or for the principal cause, selects one of the complex set of causal conditions represented in the situation model, and may best be thought of as a post hoc causal query with specific focusing properties.

These results may have important implications for theories of causal selection, understood as dignifying some causal factors as "causes" while relegating others to the status of "mere conditions" (Mackie, 1974; discussed in more detail later). For example, when discussing the Mr. Jones example, Hitchcock and Knobe (2009) proposed that responses to an if-only question would be a good guide to identifying which conditions are relevant enough to be considered causes. Our results indicate that this is not the case. While but-for counterfactual questions do seem to systematically identify the chain of causation that produced the event in question, which Hitchcock and Knobe refer to as "causal structure," if-only questions do not systematically orient people to causes, but to factors that the focal actor could have undone to prevent the event in question from happening (see also Mandel & Lehman, 1996; N'gbala & Branscombe, 1995).
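The logic of these online but-for ("necessity in the circumstances") tests can be made concrete with a small sketch. The following Python fragment is only an illustration of the test under simplifying assumptions—the condition names and the outcome rule are hypothetical, and it is not the procedure used in any of the studies just cited:

    # Hypothetical conditions holding in the route version of the story.
    actual_circumstances = {
        "jones_left_office_at_that_time",
        "jones_took_shore_route",
        "teenager_drove_dangerously",
        "both_reached_crossroads_together",
        "jones_car_was_blue",  # the causally irrelevant detail
    }

    # The outcome, described specifically as "Mr. Jones's death that day,
    # in that manner," occurs only if the whole causal chain is intact.
    causal_chain = {
        "jones_left_office_at_that_time",
        "jones_took_shore_route",
        "teenager_drove_dangerously",
        "both_reached_crossroads_together",
    }

    def outcome_occurs(circumstances):
        return causal_chain <= circumstances  # the chain is a subset of what holds

    def necessary_in_the_circumstances(condition):
        # But-for test: undo one condition, hold the rest fixed, and ask
        # whether the outcome (as it actually came about) still occurs.
        return not outcome_occurs(actual_circumstances - {condition})

    for condition in sorted(actual_circumstances):
        print(condition, "->", necessary_in_the_circumstances(condition))

On this toy representation, every link in the chain tests as necessary while the color of the car does not, mirroring the pattern of participants' but-for judgments described above.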

Applying World Knowledge to the Explanation of Story Events

An outstanding challenge for research is to better specify how general knowledge can be applied to the explanation of particular events that have richly specified historical backgrounds (as is the case with story events). Recent work in cognitive science has brought considerable advances in the understanding of causal models that specify timeless relations between types of events (e.g., Hastie, 2016; Rehder, Chapter 20 in this volume), whereas stories describe situations that involve particular people acting and reacting at particular places and times. One can imagine, as Hart and Honoré suggest, that causal generalizations between types of events (e.g., smoking causes cancer, hitting someone often hurts them, speeding causes accidents) will be used in understanding stories and explaining events, but this is not the same as saying that everyday causal explanation can be reduced to the application of causal generalizations. For example, particular causal relations may not instantiate general ones. The Hilton et al. (2016) study described earlier can be used to illustrate this point. It is true that the principal cause in the Mr. Jones story (speeding) is not only causally connected to this accident, but is also a typical cause of car accidents (speeding covaries with car accidents in general). But consider two other events or properties that do not typically covary with car accidents, such as Mr. Jones's choice of route and the blue color of his car (neither is the type of thing that is associated with car accidents in general). However, in the particular circumstances of the story, one event (going home by an unusual route) is perceived to be causally connected to the outcome, as participants agree that the accident would not have happened in the way it did if Mr. Jones had chosen to go home by his normal route. In contrast, the other non-covariate of accidents (the color of Mr. Jones's car) is not considered to be connected to the accident. In other words, general assumptions about covariation do not seem to determine perceptions of causal relations between particular events in a narrative.

As Rehder (Chapter 20 in this volume) emphasizes, causal relations in Bayes nets are probabilistic and uncertain in nature. But many stories are recounted as if the speaker believes that he is certain about what happened in the story and the relations between the recounted events. This, of course, follows from the fact that many stories come guaranteed by Grice's (1975) assumptions about cooperativeness in conversation. So the hearer is entitled to expect that the maxims of quality are respected—the storyteller undertakes only to say things that he considers to be true, or has reason to believe are true. The maxims of relevance are also important: if surprising or unexpected events ("miracles" in the terminology of Lewis, 1986, 2000) that could have impacted the flow of events have not been mentioned, the listener can reasonably surmise that they did not in fact happen (Levinson, 2000). In this way, everyday expectations about communicative relevance will guide the construction of situation models (Graesser, Singer, & Trabasso, 1994).

A final challenge for integrating research on causal models with the situation model approach concerns the nature of the events modeled. Many of the examples used in the causal model approach involve physical or biological events (e.g., diseases and symptoms) that have probabilistic relations (e.g., having a certain disease will increase the chances of showing certain symptoms). However, stories report not only physical events, but also events involving beliefs, intentions, and goals, and the kinds of causal relations that exist between actions and goals are not the same as those that exist between actions and outcomes. An action that is done to achieve a goal has an "in order to" relation to that goal, whereas an action that causes another action will only have a simple consequence relation. For example, a request for an explanation of "why" an action was performed will traverse an "in order to" arc to the higher-order goal, whereas a request to explain "how" the action was performed will traverse an arc to a lower-level action that enabled that action to be performed (Graesser, Robertson, & Anderson, 1981). Graesser and Hemphill (1991) show that these different kinds of arc support different question-answering processes in different ontological domains (e.g., physical and biological). It is hard to see how this kind of qualitative distinction in links between event nodes could emerge from the probabilistic links that connect nodes in causal Bayesian networks. An important question for future research therefore seems to be how best to merge insights from the causal model and situation model approaches.
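The representational mismatch discussed in the last two paragraphs can be sketched in a few lines. In the toy Python fragment below, all names and numbers are invented for illustration: a causal-model link carries graded strength, whereas the links a story asserts can be represented as definite facts, and unmentioned interfering events are presumed absent:

    # A causal-model edge is quantitative: it states how much a cause
    # raises the probability of its effect (an invented value).
    causal_model = {("speeding", "accident"): 0.3}  # P(accident | speeding)

    # A narrative asserts particular, definite connections: by the maxim
    # of quality the storyteller vouches for them, so the hearer can
    # represent them as facts rather than as probabilities.
    story_assertions = [
        ("jones_took_shore_route", "jones_reached_crossroads"),
        ("teenager_sped", "cars_collided"),
    ]

    # By the maxim of relevance, interfering events that were not
    # mentioned are presumed not to have happened (cf. Levinson, 2000).
    mentioned = {event for link in story_assertions for event in link}

    def presumed_absent(event):
        return event not in mentioned

    print(presumed_absent("brakes_failed"))  # True: unmentioned, so presumed absent

Nothing in the numeric edge weights distinguishes the kinds of link (goal vs. consequence) that situation models mark explicitly, which is the integration problem noted above.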

Conversational Processes in Causal Explanation

Causal explanation is a speech act, whereby someone explains something to someone else. Accordingly, the proper unit of analysis for the study of causal explanation is the question-explanation pair, as explanations are selected by questions meant to resolve puzzles (Turnbull, 1986; Turnbull & Slugoski, 1988). The thing-to-be-explained constitutes the explanandum, which is resolved by an explanans that does the explaining (Harré, 1988). The explanans may range from an answer to a causal question ("Because you forgot to go to the supermarket yesterday" in response to "Why have we run out of coffee?") to a formal hypothesis that resolves a scientific question (e.g., Galileo's heliocentric theory to explain patterns of planetary motion). Whereas attribution is a cognitive process involving tracing an object back to its source, explanation is interpersonal in nature. For example, attributing the heliocentric theory of planetary motion to Galileo would not be the same thing as explaining it to him.

More generally, causal explanation is a form of conversation and as such must follow the rules of conversation. Conversation is essentially cooperative (Grice, 1975), and so good causal explanations must generally satisfy four sets of maxims (with brief explanations of their import in parentheses): the maxims of quality (say what you believe to be true); the maxims of quantity (be informative); the maxims of relation (be relevant); and the maxims of manner (avoid obscure expressions). Languages other than English explicitly mark the process of posing a question to oneself as a form of internalized conversation; thus "I wonder" in English is translated as Je me demande in French and Ich frage mich in German (literally, "I ask myself"). Whether the explanation is given in conversation to someone else, or is generated for internal consumption, a good explanation must not only be true, it must be relevant to the causal question posed.

Specifying the Contrast Case: The Selection of Causes from Conditions

Situation models incorporate a plethora of necessary conditions that are causally connected to the outcome of interest; yet we typically mention only one or two factors as "the" cause in answering a conversational request for an explanation. To do this we need selection criteria, and here the role of specific contrasts becomes important. Contrastive explanation (e.g., Lipton, 1990; van Fraassen, 1980) takes the view that every causal question has an implicit "rather than" built into it, and causal selection is thus determined by the implicit contrast made in the causal question. Thus we do not seek to explain why an event happened per se; rather, we explain why this event happened rather than some other event that could have happened (or did happen). Following Hesslow (1983), Hilton (1988, 1990) proposed a typology of common types of contrast cases (see Table 32.1; see Hitchcock & Knobe, 2009, for a related analysis). One of the original aims of this typology was to show that different models of the attribution process (e.g., Hilton & Slugoski, 1986; Jones & Davis, 1965; Kelley, 1967) could be seen as using the same difference-based logic to answer different causal questions. A second aim was to show how this approach could be extended to understanding causal explanation using contrast cases that might never have occurred (e.g., moral, legal, or design standards; ideal goal states). Experimental work has revealed considerable support for the core proposal that the explanations given in response to a causal question will depend on the presupposed contrast case (for relevant reviews, see Hilton, 1991, 2007; Hilton & Slugoski, 2001), as we shall see in the following.

Table 32.1 A Typology of Contrast Cases, Causal Questions, and Explanations

Type of Cause | Type of Contrast Case | Type of Implied Question
Millian sum of necessary conditions | Non-occurrence of effect | "Why X rather than not X?"
Abnormal condition | The normal case | "Why X rather than the default value for X?"
Differentiating factor | Non-common effect | "Why X rather than Y?"
Moral or legal fault | Prescribed or statutory case | "Why X rather than what ought to be the case?"
Design fault or bug | Ideal case | "Why X rather than the ideal value for X?"

Source: Hilton (1990).

A first illustration was given by McGill (1989). She proposed that actor–observer asymmetries in explanation could be due to the activation of different presupposed contrast cases. Thus when asked why you chose chemistry, you tend to assume that the questioner wants to know why you chose chemistry rather than some other subject. Similarly, when asked why your roommate chose chemistry, you assume that the questioner wants to know about some characteristic that differentiates your roommate from other people who did not choose chemistry. McGill (1989) therefore disambiguated the question by clarifying its focus in both the actor and observer versions. She asked people to explain "why you/your roommate in particular chose chemistry" and "why you/your roommate chose chemistry in particular." As predicted, the addition of the focus adjunct in particular removed the actor–observer asymmetry in explanations. McGill went on to apply her analysis of contrast cases to the success/failure asymmetry in causal attribution. She suggested that a person who expects to succeed at a task and does so will attribute her success to internal factors when asked to explain it, because she assumes that few other people will have succeeded. Conversely, people who expected to fail but instead succeeded will attribute their success to the task, because they expected to fail on tasks of this type. McGill obtained support for her analysis through manipulations of subjects' expectancies on an anagram task about which they received false feedback concerning their success or failure. Other research has observed conceptually similar effects of implicit reference class (Majid et al., 2007) and norms (Reuter et al., 2014) on causal judgment in verbal scenarios, as well as in scenarios using visual displays of moving objects (Gerstenberg et al., 2014).

Slugoski, Lalljee, Lamb, and Ginsburg (1993) made an important demonstration that people are sensitive to the questioner's perspective when giving explanations (cf. Grice, 1975). They had subjects read a detailed case history about a youth who had committed a crime. The case included personality information about the youth, and situational information about the circumstances in which the crime occurred. Subjects were then asked to explain the crime to an interlocutor whom they believed either knew a lot about the boy's personal history, but not the circumstances of the crime, or who knew little about the boy's personal history but knew a lot about the circumstances of the crime. Slugoski et al. (1993) found that explainers varied their explanations so as to complement the questioner's point of view: that is, interlocutors who already had information about the boy's personal history received explanations that put the emphasis on situational factors, and vice versa for the interlocutors who already knew of the situational circumstances surrounding the crime. Slugoski et al.'s results are consistent with the idea that the explainers are identifying as the cause the condition that is abnormal from the interlocutor's point of view (cf. Hilton & Slugoski, 1986).

Conversational constraints may also govern the formulation of conjunctive explanations due to audience effects. Kashima, McKintyre, and Clifford (1998) found that Australian participants who had to imagine explaining mundane behavior (e.g., going to the Melbourne Cricket Ground [MCG] to watch a football match) to other Australians referred primarily to desires (e.g., wanting to watch the match) in their explanations. However, when asked to explain the event to a tourist, they referred to both desires and beliefs (e.g., "Michael knows that teams play football at the MCG"). This result suggests that participants recognized that strangers could not be expected to share the relevant beliefs and thus formulated conjunctive explanations. Kashima et al. (1998) obtained this result only with open-ended explanations, not with rating scales, indicating that only verbally given explanations are sensitive to Gricean constraints regarding informativeness (for similar differences in results obtained with verbal explanations and rating procedures, see Cheng & Novick, 1991; McGill, 1991).
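The difference-based logic that runs through Table 32.1 and McGill's (1989) findings can be rendered as a simple selection rule. The sketch below is a deliberately schematic illustration—the features and names ("Anne," the lab-related conditions) are hypothetical, not a model any of the cited authors implemented:

    # Conditions true of the fact to be explained: "Anne chose chemistry."
    fact = {"anne_likes_labs", "chemistry_has_labs", "anne_is_a_student"}

    # Two contrast cases the same question can presuppose.
    anne_choosing_another_subject = {"anne_likes_labs", "anne_is_a_student"}
    someone_else_choosing_chemistry = {"chemistry_has_labs", "anne_is_a_student"}

    def select_cause(fact_conditions, contrast_conditions):
        # Difference-based selection: cite what holds in the fact but
        # not in the presupposed contrast case.
        return fact_conditions - contrast_conditions

    # "Why did Anne choose chemistry rather than another subject?"
    print(select_cause(fact, anne_choosing_another_subject))
    # -> {'chemistry_has_labs'}: an answer about the subject chosen

    # "Why did Anne, rather than someone else, choose chemistry?"
    print(select_cause(fact, someone_else_choosing_chemistry))
    # -> {'anne_likes_labs'}: an answer about the chooser

Changing the presupposed contrast case changes which condition is selected as "the" cause, even though the fact being explained is identical.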

Two Kinds of Causal Question: Explanatory Versus Attributive Inquiries

The conversational approach allows us to recast a question already posed in the previous section. For example, we have already noted that different phases of media reporting of disasters overlap with the distinct questions posed in the legal process. In particular, the different phases of the search for causal understanding (how the event actually came about) and the attribution of responsibility correspond to the distinction made between explanatory and attributive inquiries by Hart and Honoré (1959). Explanatory inquiries are geared to understanding how the event in question (e.g., a death, a theft, a fraud) came about, whereas attributive inquiries are geared to establishing responsibility with a view to deciding reparations and/or punishment.

So the same causal question might receive quite different answers when posed in the context of different inquiries, due to the different contrasts implied by explanatory and attributive contexts. In an autopsy, the principal aim of the causal inquiry is to establish the physical cause of death, whereas in a court of law it is to establish whether the (in)action of some human agent was at least a contributory cause, and to impose a punishment as a result. For example, if we ask "Why did Michael Jackson die?" the answer given in the context of an explanatory inquiry (the pop star's medical autopsy in June 2009) was "acute propofol intoxication," which explains why he was alive one day and dead the next. But when posed to a California court in 2011, the question effectively received the answer "because of the incompetence of his doctor," when his personal physician, Dr. Conrad Murray, was sentenced to four years of imprisonment for involuntary manslaughter. This explains why Jackson died when he should have been kept alive through appropriate medical care. Note that both explanations can claim to be true if we believe that Jackson would not have died if he had not been injected with excessive propofol, or if he had had a more competent doctor; but they are relevant to different causal questions.

Samland and Waldmann (2014, Experiment 2) provide an experimental demonstration of the importance of attending to the exact meaning of a causal question. These researchers adapted the pen scenario of Knobe and Fraser (2008), in which a philosophy department runs out of pens after an administrative assistant (who has the right to take pens) and a professor (who does not) both take pens from the departmental stock. They suggest that participants' tendency to consider the professor more as "the cause" of the lack of pens is due to their interpreting the question "How strongly did the professor/the administrative assistant cause the department to run out of pens?" as a query about institutional responsibility and blame. In contrast, if participants interpreted the question as a query about the degree of causal connectedness between the protagonists' actions and the department running out of pens, then this should be reflected in standard measures of causal strength (e.g., the probability of the outcome occurring in the presence vs. absence of a protagonist's action). However, they observed no difference between the protagonists' actions on the causal strength measure.

Taken together with the finding that probabilistic measures of causal strength fail to predict the preference to judge voluntary actions as causes (Hilton, McClure, & Moir, 2016; Hilton, McClure, & Sutton, 2010; Lagnado & Channon, 2008; McClure, Hilton, & Sutton, 2007), Samland and Waldmann's results suggest that in certain contexts, the word "cause" is taken to mean something like "morally or institutionally responsible." This possibility is confirmed by the high correlations observed in some studies between ratings of cause, responsibility, and blame (e.g., Fincham & Bradbury, 1987; Fincham & Jaspars, 1983; Lagnado & Channon, 2008). Given that "responsibility" can be given at least four different interpretations in English (Hart, 1968), future research will do well to make clear which meaning of cause, responsibility, and blame is at issue in a given situation. Causal questions (cause vs. responsibility vs. blame; why vs. how; but-for vs. if-only counterfactuals, etc.) can probe different aspects of a situation model, and a major challenge for future research is to specify the processes by which people can determine the intended meaning of the question at hand (for a start, see Hilton, McClure, & Moir, 2016).

Social Attribution and Explanation which meaning of cause, responsibility, and blame is at issue in a given situation. Causal questions (cause, responsibility vs. blame; why vs. how; but-for vs. if-only counterfactuals, etc.) can probe different aspects of a situation model, and a major challenge for future re­ search is to specify the processes by (p. 668) which people can determine the intended meaning of the question at hand (for a start, see Hilton, McClure, & Moir, 2016).

Causal Questions and the Explanatory Relevance of Goals and Pre­ conditions As noted earlier, certain features of events favor the selection of causes from conditions, such as abnormality (Hart & Honoré, 1959; Hilton & Slugoski, 1986; Hitchcock & Knobe, 2009; McClure & Hilton, 1997); intentionality (Hart & Honoré, 1959; Hilton, McClure, & Sutton, 2010; Lagnado & Channon, 2008; McClure, Hilton, & Sutton, 2007) and covaria­ tion with the kind of outcome in general (Mandel, 2010). In addition, certain kinds of causal questions will focus on different aspects of a planned action sequence. McClure et al. (2001) and Malle (1999) found that actions that are normally difficult for the actor to perform, such as passing a difficult exam or a poor man taking a trip around the world, provoke “how come” questions that are most frequently answered by explanations that refer to preconditions (e.g., “He’s a stats whiz,” “He won the lottery”). On the other hand, “why” questions about routine actions tended to elicit explanations that focused on goals, while questions about “what caused” routine actions focused about equally on goals and preconditions (McClure & Hilton, 1998). Each kind of question set up different expecta­ tions about relevance: thus while goals were perceived as most informative and relevant to “why” questions, both preconditions and goals were perceived as equally informative and relevant responses to “what caused” questions. When preconditions become abnor­ mal (e.g., food being available in Ethiopia), then preconditions are favored as explana­ tions (McClure & Hilton, 1997). Studies using both verbal and visual stimuli show that “how” questions solicit the mirrorneurone system (MNS) associated with action understanding, whereas “why” questions solicit the mentalizing system associated with inference of higher-order goals (Spunt, Falk, & Lieberman, 2010; Spunt, Satpute, & Lieberman, 2011). The dissociation of “why” and “how” questions has been implemented in artificial intelligence programs for lan­ guage understanding (e.g., Lehnert, 1978; Schank & Abelson, 1977) and has been empiri­ cally validated in question-answering studies by Graesser and colleagues (e.g. Graesser, Lang, & Roberts, 1991). They showed that mental representations of narratives that dis­ tinguish different kinds of causal connection (e.g., between mental states and actions vs. between actions and physical outcomes) interact with causal question type (why, how, en­ ables, consequence) to constrain appropriate answers. For example, Graesser, Robertson, and Anderson (1981) showed that “why” questions are typically answered by going up a goal hierarchy from an action (e.g. “Why did the prince get a horse?”) to a goal (“Because he wanted to get to the castle”), whereas “how” questions typically involved descending from a goal (“How did he get to the castle?”) to an action that forms part of a plan to at­ tain the goal (“By getting a horse”). Such goal hierarchies can also explain different pat­ terns of question-answering in scientific domains such as biology (where teleological ex­ Page 34 of 49

Social Attribution and Explanation planations are relevant) and physics (where they are not; Graesser & Hemphill, 1991; see also Lombrozo, 2010).

Specifying the Event-to-Be-Changed and Counterfactual Questions Whereas everyday causal explanation focuses on “the effect as it came about” (Mackie, 1974, p. 47, italics in original), everyday counterfactual reasoning is not so constrained. Whereas causal explanation focuses on what actually happened, counterfactual mutations may be particular (e.g., mentally undoing the particular process by which an undesired outcome came about) or more general (e.g., mentally avoiding the undesired outcome by undoing all plausible processes that could have brought it about). For example, consider some arguments about whether Michael Jackson’s doctor was responsible for his death. In evaluating the specific proposition that Dr. Murray’s administration of propofol was a cause of Jackson’s death, we would no doubt agree with this if we consider that Jackson would not have died that day if propofol had not been given. But what if we had to evalu­ ate the more general proposition that the cause of Michael Jackson’s premature death was medical misadventure? If Jackson was already a “dead walking man,” in the words of one participant in a CNN discussion forum concerning the responsibility for his death, he would have died prematurely sooner or later. One contributor to the forum indeed sought to absolve Michael Jackson’s doctor of responsibility by focusing on this more general event and reasoning that it would have happened anyway, arguing that Dr. Murray was not responsible because “[i]f this doctor refused Michael’s demands, then Michael would have simply fired him and found another who would. The outcome would have eventually been the same” (italics added). We can see the relevance of the specificity of the description of the counterfactual contrast case when analyzing causal pre-emption scenarios. These may be viewed as spe­ cial cases of competing situation models, where the listener has a representation of two active causal processes that will result in the same goal (e.g., two assassin plots that share the aim of killing the same well-known gangster, Mr. Wallace). Here, results show that a first candidate cause (e.g., putting poison in a gangster’s coffee) is less likely to be judged as the cause of the event in question (the gangster’s death) if it has been preempted by a second candidate cause that actually results in the gangster’s death. Note that the first action (putting the poison in the gangster’s coffee) is eliminated as cause (p. 669)

even though it has significantly raised the probability of the gangster’s death (Leal, 2013; Mandel, 2003). However, what needs explanation is the gangster’s death as it came about, as the gangster’s death came about through burns sustained in a car accident, not by poisoning. Raising the probability of an outcome may be a valid criterion for selecting causes from conditions within a situation model (e.g., Mandel, 2010; Spellman, 1997), as when danger­ ous driving is privileged as “the” cause in the Mr. Jones scenario reviewed earlier. But this criterion cannot be used if there is no actual causal connection between the action and the outcome. The actus reus (the action that actually did produce Mr. Wallace’s death as it came about) is the intervention of the second assassin who knocked the gangster’s Page 35 of 49

Social Attribution and Explanation car off the road, with the consequence that only he will be held responsible or blamed for Mr. Wallace’s death (Hilton & Schmeltzer, 2015). This analysis suggests that the paradox for the counterfactual analysis of causation identified by Lewis (1973, 2000)—namely, that we treat the second action as the cause even though the gangster would have died any­ way due to the first action—may simply be due to the fact that in such causal pre-emption scenarios, the explanandum (the event-to-be-explained) is more specific than the mutan­ dum (the-event-to-be-changed). When the object of the counterfactual question is made specific (e.g., the gangster’s death due to burns sustained in the road accident), then par­ ticipants indeed undo the actual cause, as predicted by the counterfactual analysis of cau­ sation. When the object of the counterfactual question is left open (e.g., the gangster’s death), participants are more likely to undo his death altogether, by undoing both assassination attempts (Hilton & Schmeltzer, 2015; Leal, 2013). It seems likely, then, that associations between causal and counterfactual judgment are likely to be found when both the explanandum and the mutandum are both at the same level of specificity (i.e., both specific, or both general).
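The role of the level of description can also be made concrete. The sketch below encodes the two-assassin scenario in a deliberately simplified way (the event names and outcome rules are hypothetical) and runs the same but-for test against a specific and a general description of the outcome:

    # What actually happened in the pre-emption scenario.
    facts = {"poison_in_coffee", "assassin_forces_car_off_road"}

    def dies_by_burns(circumstances):
        # Specific explanandum: Mr. Wallace's death as it actually came about.
        return "assassin_forces_car_off_road" in circumstances

    def dies(circumstances):
        # General mutandum: death by either active plot.
        return ("assassin_forces_car_off_road" in circumstances
                or "poison_in_coffee" in circumstances)

    def but_for(event, outcome):
        return outcome(facts) and not outcome(facts - {event})

    print(but_for("assassin_forces_car_off_road", dies_by_burns))  # True
    print(but_for("assassin_forces_car_off_road", dies))           # False

Under the general description the but-for test fails—the gangster would have died anyway—which is Lewis's paradox; once the counterfactual question is matched to the specific explanandum, the test picks out the actual cause, as in the findings just reviewed.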

Conclusions

In the first section of this chapter, we examined the cognitive psychology of social objects. This part developed key ideas of Heider (1958), such as the perception of human action as equifinal, and the causal theory of perception (the attribution of the variable, constantly changing visual field to stable underlying dispositions). It suggests that the perception of equifinality is hardwired in the human visual system, which responds to patterns of movement as indicating intentional agents. While we saw that such patterns may activate the mirror neurone system (MNS), the perception of intentional activity as serving higher-order plans and goals may recruit the mentalizing or theory-of-mind (ToM) system. While the mirror system deals with lower-level sensory input, the mentalizing system can deal with verbal material, and seems to deal with higher-level inferences such as trait attributions. We then saw how Mill's method of difference can be used to infer which goal motivated a behavior (thus clarifying its characterization). It can also be used to infer whether the cause of the behavior is internal to the agent or external (e.g., something to do with the kind of situation or specific occasion), or (in conjunction with the method of agreement) to infer whether the behavior indicates an underlying disposition in the agent or situation. We then reviewed work on the automatic activation of causal schemata (in the form of assumed patterns of covariation) by various forms of social knowledge and beliefs about verbs, the self, stereotypes, and situational scripts, and showed that these can indeed influence causal attributions. While recognizing that the automatic and motivated activation of these beliefs often leads to biases in social judgment, we noted ways in which these biases can be attenuated.

In the second section, we saw how work inspired by Schank and Abelson's (1977) analysis of explanation-based understanding processes has demonstrated that people represent the goal-directed nature of actions and outcomes in understanding narratives. People automatically interpret stories about human action in terms of goal hierarchies, such that the stories often used to study causal judgment processes are already impregnated with causal understandings. We considered how complex explanations are generated, and saw how online counterfactual reasoning results in the construction of causal chains that form the vertebrae of the situation models used to represent such narratives. Recognition of the causal nature of these complex discourse representations can help clarify current debates (e.g., about the relation between causal and counterfactual reasoning) and help distinguish between the questions of causal connection and causal selection. One important distinction we drew was between choosing which of two (or more) causal scenarios is most likely to be true (leading the others to be discounted) and—when a scenario has been accepted—choosing which part of that scenario is most relevant to mention as "the" cause of an event in question (causal selection, leading the other parts of the scenario to be backgrounded).

In the third section, we took the perspective of the social psychology of the higher mental processes. The very complexity of the causal chains studied in the previous section raises the problem of causal selection in everyday explanation, as a plethora of necessary conditions can be cited as a "cause" of any given effect. Selection of "the" cause that should be mentioned in a causal explanation seems often to depend not only on the nature of the event itself (e.g., its abnormality, goal-directed nature, or ability to make the outcome in question predictable), but also on general question-answering processes, as well as on the specific focus of the causal probe used (e.g., why vs. what caused vs. how questions for causal explanations; but-for vs. if-only questions for counterfactuals). Finally, recognition that human causal and counterfactual reasoning seems to be sensitive to the way an event is described underlines the importance of understanding the role of subtle linguistic factors in causal judgment.

Acknowledgments

I thank David Lagnado, John McClure, David Mandel, Michael Waldmann, and Claire Wallez for helpful comments on a previous draft.

References Alicke, M. D. (1991). Culpable causation. Journal of Personality and Social Psychology, 63, 368–378. Alicke, M. D. (2000). Culpable control and the psychology of blame. Psychological Bulletin, 126, 556–574. Alicke, M. D., Mandel, D. R, Hilton, D., Gerstenberg, T. & Lagnado, D. A. (2015). Causal conceptions in social explanation and moral evaluation: A historical tour. Perspectives in Psychological Science, 10(6), 790–812. Allport, G. W., & Postman, L. (1947). The basic psychology of rumor. Transactions of the New York Academy of Sciences, Series II, VIII, 61–81. Page 37 of 49

Social Attribution and Explanation Bennett, W. L. (1992). Legal fictions: Telling stories and doing justice. In M. McLaughlin, M. Cody, & S. Read (Eds.), Explaining one’s self to others: Reason-giving in a social con­ text. Mahwah, NJ: Lawrence Erlbaum Associates. Bierbrauer, G. (1979). Why did he do it? Attribution of obedience and the phenomenon of dispositional bias. European Journal of Social Psychology, 9(1), 67–84. Blakemore, S.-J., Fonlupt, P., Pachot-Clouard, M., Darmon, C., Boyer, P., Meltzoff, A.N., Segebarth, S. & Decety, J. (2001). How the brain perceives causality: an event-related fM­ RI study. Neuroreport, 12, 3741–3746. Bodenhausen, G. V. (1988). Stereotypic biases in social decision making and memory: Testing process models of stereotype use. Journal of Personality and Social Psychology, 55(5), 726. Bodenhausen, G. V. (1990). Stereotypes as judgmental heuristics: Evidence of circadian variations in discrimination. Psychological Science, 1(5), 319–322. Bohner, G., Bless, H., Schwarz, N. L., & Strack, F. (1988). What triggers causal attribu­ tions? The impact of valence and subjective probability. European Journal of Social Psy­ chology, 18, 335–345. (p. 671) Brown, R., & Fish, D. (1983). The psychological causality implicit in language. Cognition, 14, 237–273. Burguet, A., & Hilton, D. (2004). Effets de contexte sur l’explication causale. In M. Bromberg & A. Trognon (Eds.), Psychologie sociale de la communication (pp. 219–228). Paris: Dunod. Buss, A. R. (1978). Causes and reasons in attribution theory: A conceptual critique. Jour­ nal of Personality and Social Psychology, 36(11), 1311. Callan, M. J., Sutton, R. M., Harvey, A. J., & Dawtry, R. J. (2014). Immanent justice reason­ ing: Theory, research, and current directions. Advances in Experimental Social Psycholo­ gy, 49, 105–161. Cheng, P. W., & Novick, L. R. (1990). A probabilistic contrast model of causal induction. Journal of Personality and Social Psychology, 58, 545–567. Cheng, P. W., & Novick, L. R. (1991). Causes versus enabling conditions. Cognition, 40, 83–120. Colby, K. M. (1973). Simulations of belief systems. In R. C. Schank & K. Colby (Eds.), Computer models of thought and language. San Francisco: W. H. Freeman. Darwin, C. (1879/2004). The descent of man. London: Penguin Books. Dodd, D. H., & Bradshaw, J. M. (1980). Leading questions and memory: Pragmatic con­ straints. Journal of Verbal Learning and Verbal Behavior, 19(6), 695–704. Page 38 of 49

Social Attribution and Explanation Duncan, B. L. (1976). Differential social perception and attribution of intergroup violence: Testing the lower limits of stereotyping of Blacks. Journal of Personality and Social Psy­ chology, 34(4), 590. Evans-Pritchard, E. E. (1965). Theories of primitive religion. Oxford: Clarendon Press. Fincham, F. D., & Bradbury, T. N. (1987). The impact of attributions in marriage: a longi­ tudinal analysis. Journal of Personality and Social Psychology, 53(3), 510. Fincham, F. D., & Jaspars, J. M. F. (1983). A subjective probability approach to responsibil­ ity attribution. British Journal of Social Psychology, 22, 145–162. Fischhoff, B. (1982). For those condemned to study the past: Heuristics and biases in hindsight. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases (pp. 335–351). Cambridge: Cambridge University Press. Försterling, F. (1989). Models of covariation and attribution: How do they relate to the analysis of variance? Journal of Personality and Social Psychology, 57, 615–626. Gärdenfors, P. (2003). How homo became sapiens: On the evolution of thinking. Oxford: Oxford University Press. Gerstenberg, T., Goodman, N. D., Lagnado, D. A., & Tenenbaum, J. B. (2014). From coun­ terfactual simulation to causal judgment. In P. Bello, M. Guarini, M. McShane, & B. Scas­ sellati (Eds.), Proceedings of the 36th annual conference of the Cognitive Science Society (pp. 523–528). Austin, TX: Cognitive Science Society. Girotto, V., Legrenzi, P., & Rizzo, A. (1991). Event controllability in counterfactual think­ ing. Acta Psychologica, 78(1), 111–133. Goldberg, J. H., Lerner, J. S., & Tetlock, P. E. (1999). Rage and reason: The psychology of the intuitive prosecutor. European Journal of Social Psychology, 29(56), 781–795. Graesser, A. C., & Hemphill, D. (1991). Question answering in the context of scientific mechanisms. Journal of Memory and Language, 30(2), 186–209. Graesser, A. C., Lang, K. L., & Roberts, R. M. (1991). Question answering in the context of stories. Journal of Experimental Psychology: General, 120(3), 254. Graesser, A. C., Robertson, S. P., and Anderson, P. A. (1981). Incorporating inferences in narrative representations: A study of how and why. Cognitive Psychology, 13, 1–26. Graesser, A. C., Singer, M., & Trabasso, T. (1994). Constructing inferences during narra­ tive text comprehension. Psychological Review, 101(3), 371. Greenwald, A. G., Poehlman, T. A., Uhlmann, E. L., & Banaji, M. R. (2009). Understanding and using the Implicit Association Test: III. Meta-analysis of predictive validity. Journal of Personality and Social Psychology, 97(1), 17. Page 39 of 49

Social Attribution and Explanation Grice, H. P. (1957). Meaning. The Philosophical Review, 66, 377–388. Grice, H. P. (1975). Logic and conversation. In P. Cole & J. L. Morgan (Eds.), Syntax and semantics 3: Speech acts (pp. 41–58). New York: John Wiley & Sons. Hamilton, A. F. D. C., & Grafton, S. T. (2006). Goal representation in human anterior intra­ parietal sulcus. The Journal of Neuroscience, 26(4), 1133–1137. Harré, R. (1988). Modes of explanation. In D. J. Hilton (Ed.), Contemporary science and natural explanation: Commonsense conceptions of causality (pp. 129–146). Brighton: Har­ vester Press. Hart, H. L. A., & Honoré, A. M. (1985). Causation in the law (2nd ed.). Oxford: Oxford University Press. Hassin, R. R., Aarts, H., & Ferguson, M. J. (2005). Automatic goal inferences. Journal of Experimental Social Psychology, 41, 129–140. doi:10.1016/j.jesp.2004.06.008 Hastie, R. (2016). Causal thinking in judgments. The Wiley Blackwell Handbook of Judg­ ment and Decision Making, 590–628. Hastie, R., & Pennington, N. (2000). Explanation-based decision-making. In T. Connolly, H. Arkes, & K. Hammond (Eds.), Judgment and decision-making: A reader (2nd ed.; pp. 212– 228). Cambridge: Cambridge University Press. Heider, F. (1958). The psychology of interpersonal relations. New York: John Wiley & Sons. Heider, F., & Simmel, M. (1944). An experimental study of apparent behavior. American Journal of Psychology, 57, 243–259. Hesslow, G. (1983). Explaining differences and weighting causes. Theoria, 49, 87–111. Hesslow, G. (1988). The problem of causal selection. In D. J. Hilton (Ed.), Contemporary science and natural explanation: Commonsense conceptions of causality (pp. 11–32). Brighton: Harvester Press. Hilton, D. J. (1988). Logic and causal attribution. In D. J. Hilton (Ed.), Contemporary sci­ ence and natural explanation: Commonsense conceptions of causality (pp. 33–65). Brighton: Harvester Press. Hilton, D. J. (1990). Conversational processes and causal explanation. Psychological Bul­ letin, 107, 65–81. Hilton, D. J. (1991). A conversational model of causal explanation. In W. Stroebe & M. Hewstone (Eds.), European Review of Social Psychology, 2, 61–91.

Page 40 of 49

Social Attribution and Explanation Hilton, D. J. (2001). Norms in commonsense explanation: Types of contrast and abnormal conditions. In P. Demeulenaere, R. Boudon, & R. Viale (Eds.), L’explication des normes so­ ciales: Rationalité et cognition (pp. 205–214). Paris: Presses Universitaires de France. (p. 672)

Hilton, D. J. (2002). Thinking about causality: Pragmatic, social and scientific rationality. In P. Carruthers, S. Stich, & M. Siegal (Eds.), The cognitive bases of science (pp. 211– 231). Cambridge: Cambridge University Press. Hilton, D. (2007). Causal explanation. In A. Kruglanski & E. T. Higgins (Eds.), Social psy­ chology: Handbook of basic principles (pp. 232–253). New York: Guilford Press. Hilton, D. (2012). The emergence of cognitive social psychology: A historical analysis. In W. Stroebe & A. Kruglanski (Eds.), Handbook of the history of social psychology (pp. 45– 79). New York: Psychology Press. Hilton, D. J., & Erb, H.-P. (1996). Mental models and causal explanation: Judgments of probable cause and explanatory relevance. Thinking and Reasoning, 2, 33–65. Hilton, D. J., McClure, J. L., & Slugoski, B. R. (2005). The course of events: Counterfactu­ als, causal sequences and explanation. In D. Mandel, D. J. Hilton, & P. Catellani (Eds.), The psychology of counterfactual thinking (pp. 44–73). London: Psychology Press. Hilton, D. J., McClure, J., & Sutton, R. M. (2010). Selecting explanations from causal chains: Do statistical principles explain preferences for voluntary causes. European Jour­ nal of Social Psychology, 40, 383–400. Hilton, D. J., Mathes, R. M., & Trabasso, T. R. (1992). The study of causal explanation in natural language: Analysing reports of the Challenger disaster in the New York Times. In M. McLaughlin, S. Cody, and S. J. Read (Eds.), Explaining one’s self to others: Reason-giv­ ing in a social context. Hillsdale, NJ: Lawrence Erlbaum Associates. Hilton, D. J., McClure, J. L., & Moir, B. (2016). Acting knowingly: Effects of the agent’s awareness of an opportunity on causal attributions. Thinking and Reasoning, 22, 461–494. Hilton, D. J., & Schmeltzer, C. (2015). A question of detail: Matching counterfactuals to judgments of actual cause. Unpublished manuscript. Hilton, D. J., Schmeltzer, C., Wallez, C., & Mandel, D. R. (2016). Do causal attributions de­ pend on counterfactuals? Distinguishing but-for from if-only tests. Paper presented at the EASP Small Group meeting, Counterfactual thinking: Functions, causes, emotions. Aix-enProvence, June 1–4. Hilton, D. J., & Slugoski, B. R. (1986). Knowledge-based causal attribution: The abnormal conditions focus model. Psychological Review, 93, 75–88.

Page 41 of 49

Social Attribution and Explanation Hilton, D. J., & Slugoski, B. R. (2001). The conversational perspective in reasoning and ex­ planation. In A. Tesser & N. Schwarz (Eds.), Blackwell handbook of social psychology, Vol 1: Intrapersonal processes (pp. 181–206). Oxford: Blackwell. Hilton, D. J., Smith, R. H. & Kim, S.-H. (1995). The processes of causal explanation and dispositional attribution. Journal of Personality and Social Psychology, 68, 377–387. Hitchcock, C., & Knobe, J. (2009). Cause and norm. The Journal of Philosophy, 106(11), 587–612. Jones, E. E., & Davis, K. E. (1965). From acts to dispositions: The attribution process in person perception. In L. Berkowitz (Ed.), Advances in Experimental Social Psychology, 63, 220–266. Kahneman, D. (1995) Varieties of counterfactual thinking. In N. J. Roese & J. M. Olson (Eds.), What might have been: The social psychology of counterfactual thinking (pp. 375– 396). Mahwah, NJ: Lawrence Erlbaum Associates. Kahneman, D., & Tversky, A. (1982): The simulation heuristic. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases (pp. 201–208). New York: Cambridge University Press. Kanazawa, S. (1992). Outcome of expectancy? Antecedent of spontaneous causal attribu­ tion. Personality and Social Psychology Bulletin, 18, 659–668. Kashima,Y., McKintyre, A., & Clifford, P. (1998). The category of the mind: Folk psycholo­ gy of belief, desire, and intention. Asian Journal of Social Psychology, 1, 289–313. Kelley, H. H. (1967). Attribution in social psychology. Nebraska Symposium on Motivation, 15, 192–238. Kelley, H. H. (1973). The process of causal attribution. American Psychologist, 28, 107– 128. Khan, S. S., & Liu, J. H. (2008). Intergroup attributions and ethnocentrism in the Indian subcontinent: The ultimate attribution error revisited. Journal of Cross-Cultural Psycholo­ gy, 39(1), 16–36. Knobe, J., & Fraser, B. (2008). Causal judgment and moral judgment: Two experiments. In W. Sinnott-Armstrong (Ed.), Moral psychology (pp. 441–447). Cambridge, MA: MIT Press. Krull, D. S. (1993). Does the grist change the mill? The effect of the perceiver’s inferential goal on the process of social inference. Personality and Social Psychology Bulletin, 19, 340–348. Krull, D. S., & Erickson, D. J. (1995). Judging situations: On the effortful process of taking dispositional information into account. Social Cognition, 13, 417–438.

Page 42 of 49

Social Attribution and Explanation Lagnado, D. A., & Channon, S. (2008). Judgments of cause and blame: The effects of in­ tentionality and foreseeability. Cognition, 108, 754–770. Lalljee, M., Brown, L. B., & Hilton, D. (1990). The relationships between images of God, explanations for failure to do one’s duty to God, and invoking God’s agency. Journal of Psychology and Theology, 18, 166–173. Leal, W. J. (2013). Causal selection and counterfactual reasoning. Revista Colombiana de Psicologia, 22, 179–197. Lehnert, W. G. (1978). The process of question answering: A computer simulation of cog­ nition. Mahwah, NJ: Lawrence Erlbaum Associates. Levinson, S. C. (2000). Presumptive meanings: The theory of generalized conversational implicature. Cambridge, MA: MIT Press. Lewis, D. (1973). Causation. The Journal of Philosophy, 70, 556–567. Lewis, D. (2000). Causation as influence. The Journal of Philosophy, 97, 182–197. Lipton, P. (1990). Contrastive explanation. In D. Knowles (Ed.), Explanation and its limits (pp. 247–266). Cambridge: Cambridge University Press. Loftus, E. F., & Palmer, J. C. (1974). Reconstruction of automobile destruction: An exam­ ple of the interaction between language and memory. Journal of Verbal Learning and Ver­ bal Behavior, 13(5), 585–589. Lombrozo, T. (2010). Causal explanatory pluralism: How intentions, functions, and mecha­ nisms influence causal ascriptions. Cognitive Psychology, 61, 303–332. Majid, A., Sanford, A. J., & Pickering, M. J. (2007). The linguistic description of minimal social scenarios affects the extent of causal inference making. Journal of Experimental So­ cial Psychology, 43(6), 918–932. McArthur, L. A. (1972). The how and what of why: Some determinants and consequences of causal attributions. Journal of Personality and Social Psychology, 22, 171–193. McCauley, C., & Stitt, C. L. (1978). An individual and quantitative measure of stereotypes. Journal of Personality and Social Psychology, 36, 929–940.

(p. 673)

McCauley, C., Stitt, C. L., & Segal, M. (1980). Stereotyping: From prejudice to prediction. Psychological Bulletin, 87, 195–208. McClure, J. L. (1998). Discounting causes of behavior: Are two reasons better than one? Journal of Personality and Social Psychology, 47, 7–20. McClure, J. L., & Hilton, D. J. (1997). For you can’t always get what you want: When pre­ conditions are better explanations than goals. British Journal of Social Psychology, 36, 223–240. Page 43 of 49

Social Attribution and Explanation McClure, J. L., & Hilton, D. J. (1998). Are goals or preconditions better explanations? It depends on the question. European Journal of Social Psychology, 28, 897–911. McClure, J. L., Hilton, D. J., Cowan, J., Ishida, L., & Wilson, M. (2001). When rich or poor people buy expensive objects is the causal question how or why? Journal of Language and Social Psychology, 20, 339–357. McClure, J., Hilton, D. J., & Sutton, R. M. (2007). Judgments of voluntary and physical causes in causal chains: Probabilistic and social functionalist criteria for attributions. Eu­ ropean Journal of Social Psychology, 37, 879–901. McGill, A. L. (1989). Context effects on causal judgment. Journal of Personality and Social Psychology, 57, 189–200. McGill, A. L. (1991). Conjunction effects: Accounting for events that differ from several norms. Journal of Experimental Social Psychology, 27, 527–549. Mackie, J. L. (1974). The cement of the universe (2nd ed.). London: Oxford University Press. Malle, B. F. (1999). How people explain behavior: A new theoretical framework. Personali­ ty and Social Psychology Review, 3, 23–48. Malle, B. F. (2004). How the mind explains behavior: Folk explorations, meaning and so­ cial interaction. Cambridge, MA: MIT Press. Malle, B. F. (2011). Time to give up the dogmas of attribution: An alternative theory of be­ havior explanation. Advances in Experimental Social Psychology, 44(1), 297–311. Malle, B. F., & Holbrook, J. (2012). Is there a hierarchy of social inferences? The likeli­ hood and speed of inferring intentionality, mind, and personality. Journal of Personality and Social Psychology, 102(4), 661. Mandel, D. R. (2003). Judgment dissociation theory: An analysis of differences in causal, counterfactual, and covariational reasoning. Journal of Experimental Psychology: General, 132, 419–434. Mandel, D. R. (2010). Predicting blame assignment in a case of negligent harm. Mind & Society, 9(1), 5–17. Mandel, D. R. (2011). Mental simulation and the nexus of causal and counterfactual ex­ planation. In C. Hoerl, T. McCormack, & S. Beck (Eds.), Understanding counterfactuals, understanding causation: Issues in philosophy and psychology (pp. 147–170). Oxford: Ox­ ford University Press. Mandel, D. R., & Lehman, D. R. (1996). Counterfactual thinking and ascriptions of cause and preventability. Journal of Personality and Social Psychology, 71, 450–463. Milgram, S. (1974). Obedience to authority: An experimental view. London: Tavistock. Page 44 of 49

Social Attribution and Explanation Mill, J. S. (1872/1973). System of logic. In J. M. Robson (Ed.), Collected works of John Stu­ art Mill (8th ed., Vols. 7 and 8). Toronto: University of Toronto Press. Miller, J. G. (1984). Culture and the development of everyday social explanation. Journal of Personality and Social Psychology, 46(5), 961. Morris, M. W., & Larrick, R. (1995).When one cause casts doubt on another: A normative analysis of discounting in causal attribution. Psychological Review, 102, 331–355. Morris, M. W., Nisbett, R. E., & Peng, K. (1995). Causal attribution across domains and cultures. In D. Sperber, D. Premack, & A. J. Premack (Eds.), Causal cognition: A multidis­ ciplinary debate (pp. 577–612). Oxford: Clarendon Press. N’gbala, A., & Branscombe, N. R. (1995) Mental simulation and causal attribution: When simulating an event does not affect fault assignment. Journal of Experimental Social Psy­ chology, 31, 139–162. Nisbett, R. E., & Ross, L. (1980). Human inference: Strategies and shortcomings of social judgment. Englewood Cliffs, NJ: Prentice-Hall. Oatley, K., & Yuill, N. (1985). Perception of personal and interpersonal action in a cartoon film. British Journal of Social Psychology, 24(2), 115–124. Oestermeier, U., & Hesse, F. (2001). Singular and general causal arguments. In J. D. Moore & K. Stenning (Eds.), Proceedings of the 23rd annual conference of the Cognitive Science Society (pp. 720–725). Peng, K., & Knowles, E. D. (2004). Culture, education and the attribution of physical causality. Personality and Social Psychology Bulletin, 29, 1272–1284. Piaget, J. (1954). The construction of reality in the child (Vol. 82). London: Routledge. Premack, D., & Premack, A. (1995). Intention as a psychological cause. In D. Sperber, D. Premack, & A. J. Premack (Eds.), Causal cognition: A multidisciplinary debate (pp. 185– 199). Oxford: Clarendon Press. Reeder, G. D., & Brewer, M. B. (1979). A schematic model of dispositional attribution in interpersonal perception. Psychological Review, 86(1), 61–79. Reuter, K., Kirfel, L., Van Riel, R., & Barlassina, L. (2014). The good, the bad, and the timely: How temporal order and moral judgment influence causal selection. Frontiers in Psychology, 5, 1–10. Samland, J., & Waldmann, M. R. (2014). Do social norms influence causal inferences. In P. Bello, M. Guarini, M. McShane, & B. Scassellati (Eds.), Proceedings of the 36th Annual Conference of the Cognitive Science Society (pp. 1359–1364). Austin, TX: Cognitive Science Society.

Page 45 of 49

Social Attribution and Explanation Schank, R. C., & Abelson, R. P. (1977). Scripts, plans, goals and understanding: An en­ quiry into human knowledge structures. Hillsdale, NJ: Lawrence Erlbaum Associates. Sedlmeier, P., & Hilton, D. (2012). Improving judgment and decision making through com­ munication: The role of conversational rules and representational formats. In M. Dhami, A. Schlottmann, & M. Waldmann (Eds.), Origins of judgment and decision-making (pp. 229–257). Cambridge: Cambridge University Press. Slugoski, B. R., Lalljee, M., Lamb, R., & Ginsburg, G. P. (1993). Attribution in conversa­ tional context: Effect of mutual knowledge on explanation‐giving. European Journal of So­ cial Psychology, 23(3), 219–238. Smith, E. R., & Miller, F. D. (1983). Mediation among attributional inferences and compre­ hension processes: Initial findings and a general model. Journal of Personality and Social Psychology, 44, 492–505. Spellman, B. (1997) Crediting causality. Journal of Experimental Psychology: General, 126, 323–348. Spunt, R. P., Falk, E. B., & Lieberman, M. D. (2010). Dissociable neural systems support retrieval of how and why action knowledge. Psychological Science, 21, 1593–1598. Spunt, R. P., & Lieberman, M. D. (2013). The busy social brain evidence for automaticity and control in the neural systems (p. 674) supporting social cognition and action under­ standing. Psychological Science, 24(1), 80–86. Spunt, R. P., Satpute, A. B., & Lieberman, M. D. (2011). Identifying the what, why, and how of an observed action: An fMRI study of mentalizing and mechanizing during action observation. Journal of Cognitive Neuroscience, 23(1), 63–74. Stewart, R. H. (1965). Effect of continuous responding on the order effect in personality impression formation. Journal of Personality and Social Psychology, 1(2), 161. Taleb, N. N. (2001). Fooled by randomness: The hidden role of chance in the markets and in life. New York: Texere. Taylor, D. M., & Jaggi, V. (1974). Ethnocentrism and causal attribution in a South Indian context. Journal of Cross-cultural Psychology, 5(2), 162–171. Tetlock, P. E. (1985). Accountability: The neglected social context of judgment and choice. Research in Organizational Behavior, 7(1), 297–332. Tetlock, P. E. (1992). The impact of accountability on judgment and choice: Toward a so­ cial contingency model. In M. P. Zanna (Ed.), Advances in Experimental Social Psychology, 25, 331–376. Tetlock, P. E. (2002) Social functionalist frameworks for judgment and choice: Intuitive politicians, theologians and prosecutors. Psychological Review, 109, 451–471. Page 46 of 49

Trabasso, T., & Bartolone, J. (2003). Story understanding and counterfactual reasoning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29(5), 904.
Trabasso, T., Secco, T., & van den Broek, P. (1984). Causal cohesion and story coherence. In H. Mandl, N. L. Stein, & T. Trabasso (Eds.), Learning and the comprehension of discourse. Hillsdale, NJ: Lawrence Erlbaum Associates.
Trabasso, T., & Sperry, L. L. (1985). The causal basis for deciding importance of story events. Journal of Memory and Language, 24, 595–611.
Trabasso, T., & van den Broek, P. (1985). Causal thinking and story comprehension. Journal of Memory and Language, 24, 612–630.
Trabasso, T., van den Broek, P., & Suh, S. Y. (1989). Logical necessity and transitivity of causal relations in stories. Discourse Processes, 12(1), 1–25.
Trope, Y. (1986). Identification and inferential processes in dispositional attribution. Psychological Review, 93, 239–257.
Trope, Y., & Alfieri, T. (1997). Effortfulness and flexibility in dispositional judgment processes. Journal of Personality and Social Psychology, 73, 662–674.
Trope, Y., & Gaunt, R. (1999). A dual-process model of overconfident attributional inferences. In S. Chaiken & Y. Trope (Eds.), Dual-process theories in social psychology (pp. 161–178). New York: Guilford Press.
Turnbull, W. (1986). Everyday explanation: The pragmatics of puzzle resolution. Journal for the Theory of Social Behaviour, 16(2), 141–160.
Turnbull, W., & Slugoski, B. (1988). Conversational and linguistic processes in causal attribution. In D. J. Hilton (Ed.), Contemporary science and natural explanation: Commonsense conceptions of causality (pp. 66–93). Brighton: Harvester Press.
Uleman, J. S. (1999). Spontaneous versus intentional inferences in impression formation. In S. Chaiken & Y. Trope (Eds.), Dual-process theories in social psychology (pp. 141–160). New York: Guilford Press.
Uttich, K., & Lombrozo, T. (2010). Norms inform mental state ascriptions: A rational explanation for the side-effect effect. Cognition, 116(1), 87–100.
Van Fraassen, B. C. (1980). The scientific image. Oxford: Oxford University Press.
Van Overwalle, F. (1997). A test of the joint model of causal attribution. European Journal of Social Psychology, 27, 221–236.
Van Overwalle, F. (2009). Social cognition and the brain: A meta-analysis. Human Brain Mapping, 30(3), 829–858.


Van Overwalle, F., & Baetens, K. (2009). Understanding others' actions and goals by mirror and mentalizing systems: A meta-analysis. NeuroImage, 48(3), 564–584.
Weber, E. U., Böckenholt, U., Hilton, D. J., & Wallace, B. B. (1993). Determinants of hypothesis generation: Effects of information, base rates and experience. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 1–14.
Weber, E. U., Böckenholt, U., Hilton, D. J., & Wallace, B. B. (2000). Confidence judgments as expressions of experienced decision conflict. Risk, Decision and Policy, 5, 69–100.
Weiner, B. (1985). "Spontaneous" causal thinking. Psychological Bulletin, 109, 74–84.
Weizenbaum, J. (1966). ELIZA: A computer program for the study of natural language communication between man and machine. Communications of the ACM, 9(1), 36–45.
Winter, L., & Uleman, J. S. (1984). When are social judgments made? Evidence for the spontaneousness of trait inferences. Journal of Personality and Social Psychology, 47, 237–252. doi:10.1037/0022-3514.47.2.237
Woodward, J. (2003). Making things happen: A theory of causal explanation. Oxford: Oxford University Press.
Woodward, J. (2011). Causation: Interactions between philosophical theories and psychological research. In C. Hoerl, T. McCormack, & S. Beck (Eds.), Understanding counterfactuals, understanding causation: Issues in philosophy and psychology (pp. 147–170). Oxford: Oxford University Press.

Notes:
(1.) "Sensible" in nineteenth-century English refers to a being's excitability, and is closer in meaning to the modern meaning of "sensitive" than to that of "sensible."
(2.) The importance of these dimensions of covariation for causal attribution and blame has also been independently recognized by experimental philosophers such as Sytsma, Livengood, and Rose (2012), though under different names.
(3.) Note that for ease of exposition the following target example is changed from a specific kind of abuse (racism) to a more general one. This is because only one observation of racist behavior, being highly counter-normative, may suffice to justify an inference of racism (cf. Reeder & Brewer, 1979), and would thus complicate the presentation of these forms of causal inference.
(4.) Here we focus on implicit assumptions about covariation configurations, whereas Kelley also evoked two kinds of causal schemata, the multiple necessary causes (MNC) schema and the multiple sufficient causes (MSC) schema. However, it may be more accurate to characterize these as the multiple necessary conditions situation model and multiple necessary causes situation model, following the distinction between causes and conditions made in the next section.
(5.) Of course, aficionados of the Mississippi Delta Blues will make a quite different association with this name!
(6.) Of course, science is often used to respond to questions such as "How and when did the dinosaurs become extinct?" and children often ask questions such as "Why don't the stars fall down from the sky?" But note that in the former case, science is being used to answer a question that has a historical aspect (the extinction of dinosaurs has in principle a specifiable date and place), and in the latter children are asking proto-scientific questions that generalize across space and time. For discussion of differences between particular and general explanation, see Hilton (1995, 2001, 2002; Hilton, McClure, & Slugoski, 2005).
(7.) They used slightly different scenarios from those used by Kahneman and Tversky. Interested readers are referred to their paper for details.

Denis Hilton

Department of Psychology, University of Toulouse, Toulouse, France



The Development of Causal Reasoning
Paul Muentener and Elizabeth Bonawitz
The Oxford Handbook of Causal Reasoning
Edited by Michael R. Waldmann
Print Publication Date: Jun 2017 Subject: Psychology, Cognitive Psychology Online Publication Date: May 2017
DOI: 10.1093/oxfordhb/9780199399550.013.40

Abstract and Keywords

Research on the development of causal reasoning has broadly focused on accomplishing two goals: understanding the origins of causal reasoning, and examining how causal reasoning changes with development. This chapter reviews evidence and theory that aim to fulfill both of these objectives. In the first section, it focuses on the research that explores the possible precedents for recognizing causal events in the world, reviewing evidence for three distinct mechanisms in early causal reasoning: physical launching events, agents and their actions, and covariation information. The second portion of the chapter examines the question of how older children learn about specific causal relationships. It focuses on the role of patterns of statistical evidence in guiding learning about causal structure, suggesting that even very young children leverage strong inductive biases with patterns of data to inform their inferences about causal events, and discussing ways in which children's spontaneous play supports causal learning.

Keywords: development, causal reasoning, learning, induction, bias

Causality plays a fundamental role in human cognition, and has long been a topic of interest for many developmental researchers—causal reasoning has been proposed to be a central aspect of early learning in the physical (Baillargeon, 2004; Carey, 2009; Cohen, Amsel, Redford, & Casasola, 1998; Leslie, 1995; Spelke, 1990), psychological (Csibra & Gergely, 1998; Gopnik & Wellman, 1992; Meltzoff, 2007; White, 2014; Woodward, 2009), and biological world (Carey, 2009; Wellman & Gelman, 1992). Moreover, causal reasoning has been implicated in many theories of early social development, including developing notions of theory of mind (Gopnik & Wellman, 2012) and morality (Cushman, 2008; Hamlin, 2013). Causal representations are also central to many linguistic theories of meaning (Jackendoff, 1990; Levin & Rappaport Hovav, 1995) and have been an area of interest in research on early language acquisition (Fisher, Hall, Rakowitz, & Gleitman, 1994; Naigles, 1990). Finally, researchers have studied the role of causal learning in the development of explanations (Legare, 2012; Rozenblit & Keil, 2002).


Research on the development of causal reasoning has broadly focused on accomplishing two goals: understanding the origins of causal reasoning, as well as how causal reasoning changes with development. Consequently, in this chapter, we review evidence and theory that aim to fulfill both of these objectives. In the first section, we focus on the research that explores the possible precedents for recognizing causal events in the world. How are early notions of possible causal relationships related to a broader abstract framework for recognizing events as causal? We review evidence for three distinct mechanisms in early causal reasoning—physical launching events, agents and their actions,1 and covariation information—and discuss how these mechanisms might be integrated across development. In the second portion of the chapter, we focus on the question of how older children learn about specific causal relationships. Taking for granted that children recognize an event as involving a causal relation, how do children learn about the specific strengths and properties of causes and effects in a system? We focus on the role of patterns of statistical evidence in guiding learning about causal structure and point to evidence that shows that children integrate this evidence with their developing inductive biases.

Origins of Causal Reasoning

One of the hallmarks of mature causal reasoning is that adults can learn the causal structure of any kind of event in the world, ranging from how viruses cause disease to how human behavior influences global warming. In short, adult causal cognition is abstract and domain-general: there are no limits on the kinds of causal structures we can represent. Knowing that event A precedes event B warrants the potential conclusion that event A caused event B. What are the origins of such an ability? One might assume that this feature of mature causal reasoning is continuous throughout development—young children and perhaps even infants may also hold abstract, domain-general beliefs about causal relations in the world.

Yet, despite general agreement about the domain-generality of adult causal reasoning, developmental research suggests that there may be three potential early notions of causation in infancy: (1) the transfer of physical force between objects, (2) the outcomes of goal-directed actions produced by dispositional agents, and (3) the ability to track covariation relations between events. This research raises broad questions concerning the development of causal reasoning. How are these notions of causality related in early childhood development? Do infants integrate all three notions of causation across development? Alternatively, is one of these notions primary in development, and if so, does it serve as a starting point for the development of abstract causal reasoning?

Causal Reasoning Emerges from Representations of Motion Events

The majority of research investigating the development of causal reasoning has focused on infants' representations of caused motion events. These findings originated with systematic psychophysical investigations by Michotte (1963), and have been replicated by several other research groups in more recent years (Schlottmann, Ray, Mitchell, & Demetriou, 2006; Scholl & Nakayama, 2002, 2004; White, 1995, Chapter 14 in this volume). In the prototypical causal motion event, object A approaches and contacts object B, immediately after which object B begins to move. To adults, this event automatically gives rise to the impression that object A caused object B to move. In particular, research has shown that this impression depends upon the spatial and temporal features of the event. If object A stops short of object B before it moves (spatial gap), or if object B moves after a short delay from point of contact with object A (temporal gap), adults do not perceive the interaction as causal. These impressions are as automatic as many other illusions in vision research. Just as participants cannot help but see that one arrow is longer than the other in the Müller-Lyer illusion (Müller-Lyer, 1889), participants cannot help but view object A as the cause of object B's motion, even though both "objects" are really only colored discs on a computer screen and partake in no actual causal interaction.

Much of the early research on infant causal reasoning has similarly investigated the conditions under which infants perceive events as causal interactions between objects. Three general findings have emerged from this research. First, infants' perception of motion events is sensitive to the same spatial and temporal features that influence adult causal perception (Belanger & Desrochers, 2001; Cohen & Amsel, 1998; Cohen & Oakes, 1993; Leslie, 1982, 1984a; Leslie & Keeble, 1987; Mascalzoni, Regolin, Vallortigara, & Simion, 2013; Newman, Choi, Wynn, & Scholl, 2008; Oakes, 1994; Oakes & Cohen, 1990). For example, using a looking-time paradigm, Cohen and Amsel (1998) habituated 4-, 5.5-, and 6.25-month-old infants to either the prototypical causal motion event or to non-causal motion events that had either a spatial or temporal gap. Infants were then presented with all three events (causal motion event, spatial gap, temporal gap) at test. At 4 months of age, the youngest age tested, Cohen and Amsel (1998) found that infants preferred to look at the causal events, regardless of the event to which they were habituated. However, by 6.25 months of age, infants who were habituated to the prototypical motion event increased their looking time to both the gap and delayed motion events.

These findings suggest that 6-month-old infants were sensitive to the change in causality between the habituation and test phase. An alternative interpretation, however, is that infants simply detected changes in either the spatial or temporal features of a motion event. Support for the former interpretation came from infants who were habituated to either the spatial gap or temporal gap events—events that adults typically do not perceive as causal interactions. Cohen and Amsel (1998) found that infants in these conditions increased their looking time to the prototypical "causal" motion event, while maintaining decreased looking time to the other "non-causal" event to which they were not habituated (spatial or temporal gap). This looking-time pattern occurred despite the fact that switching between the non-causal events involved more featural changes than switching from a non-causal to a causal event. For example, the change from a spatial gap event to a temporal gap event involves both a spatial and a temporal change; switching from the spatial or temporal gap to the causal event involves only one change (spatial or temporal). Therefore, by around 6 months of age, infants appear able to categorically perceive motion events along causal dimensions in addition to spatial and temporal dimensions (see also Oakes, 1994, for similar results with 7-month-old infants).

Second, infants' causal representations support causal inference, in addition to causal perception (Ball, 1973; Kotovsky & Baillargeon, 1998, 2000; Muentener & Carey, 2010). For example, Kotovsky and Baillargeon (2000) have shown that infants as young as 7 months of age can infer causal violations between moving objects, even if they do not see the precise moment of contact between the objects. In these studies, infants viewed a scene in which one object stood at the top of the ramp; when released, the object would roll down the ramp toward a second object. For some infants (no contact condition), a solid wall stood at the bottom of the ramp, such that the first object could not hit the second object. For other infants (contact condition), the bottom portion of the wall was removed so that the first object could contact the second object. A black screen was then placed over the point of potential contact between the objects, and infants viewed one of two types of test events. In one test event, the first object rolled down the ramp, disappeared behind the wall, and the second object remained stationary. In the second test event, the first object rolled down the ramp, disappeared behind the wall, and the second object then began to move across the stage.

Infants' looking times were consistent with a causal interpretation of the scene. When the second object moved, infants looked longer in the no contact condition, in which the solid wall should have blocked the point of contact, than in the contact condition, in which the physical interaction should have occurred and caused the second object to move. When the second object remained stationary, however, infants looked longer in the contact condition, in which the second object should have been caused to move, than in the no contact condition, in which the inferred lack of contact between objects explained the lack of subsequent motion. Thus, discussions of infants' causal reasoning abilities are aptly named; infants not only directly perceive physical interactions between objects as causal in nature, they can also make inferences about potential causal interactions based on the spatial and temporal properties of events.

Finally, infants not only encode the spatial and temporal features of motion events that are relevant to causal reasoning and make successful inferences based on these features, they also represent motion events in terms of the situational causal roles (i.e., situational agent, situational patient) that individual objects play in a given interaction.2 Leslie and Keeble (1987) presented infants with caused motion events until they habituated to the displays. Then, rather than switch the spatial and temporal features of the event as in the studies discussed previously, they showed infants the habituation display in reverse: object B now moved across the screen and contacted object A, after which it immediately began to move. Thus, although the spatial and temporal features of the event were the same, the causal representation of the event was altered: object B now appeared to cause object A's motion. In the comparison condition, infants were habituated to delayed events, in which a temporal gap was inserted at the point of contact between object A and object B, and were tested with reversals of the delayed event. The researchers found that infants dishabituated to reversals only in the causal condition, in which object A first appeared to cause object B's motion by direct contact. Infants in the delayed event condition did not similarly increase their looking time to the reversed test events. Belanger and Desrochers (2001) have replicated and extended these findings with non-causal events involving spatial gaps in addition to temporal gaps. Taken together, these findings suggest that infants are indeed representing these motion events in an abstract manner, assigning the conceptual roles of situational agent and situational patient to the events they perceive.

Thus, a long line of research has shown that adults automatically encode the interaction between moving objects in terms of their causal relations, and that by at least 6 months of age infants also represent the causal nature of motion events. One potential conclusion to draw from this research is that causal reasoning is continuous across development and has its roots in early causal perception. Despite not having conducted any developmental research, Michotte (1963) in fact explicitly hypothesized that causal perception of motion events formed the origin of mature adult causal reasoning of events outside the domain of motion. The fact that adults use spatial language to describe many instances of non-spatial causal events ("he pushed me over the edge," "he turned me into a monster") was among the speculative evidence to Michotte that motion events formed the origin of adults' most abstract causal representations.

That motion events form the origin of mature causal reasoning is also implicit in the developmental research on causal reasoning about motion events. First, infants' representations of motion events were the primary focus of causal reasoning research in infancy for over 25 years. Other researchers, although not explicitly stating that abstract causal reasoning was derived from reasoning about caused motion events, framed their research as addressing the early developing causal representations that at least precede mature causal reasoning. Finally, some researchers have offered formal computational models describing the potential development of causal reasoning, placing the spatial and temporal features of spatial contact and delay time as the inputs to causal reasoning from which abstract causal representations are derived (Cohen, Chaput, & Cashon, 2002).

The Development of Causal Reasoning merville, Woodward, & Needham, 2005), and improving infants’ motor repertoire in early development enhances their ability to represent both the objects and the people around them (Libertus & Needham, 2010, 2011; Needham, Barrett, & Peterman, 2002; Som­ merville et. al., 2005). These findings lead to the possibility that abstract causal represen­ tations may also emerge from the close tie between agent action (the infant’s own actions as well as those of other people) and causal events. The hypothesis that causal reasoning may emerge from representations of agents and their actions is most closely associated with Piaget (1954). Piaget believed that causal reasoning emerges slowly over the first few years of life, with children first learning how his or her own actions produce effects in the environment (e.g., that moving a rattle is fol­ lowed by a specific noise), much like animals can learn causal relationships between their own actions and outcomes in operant conditioning (see Schloegl & Fischer, Chapter 34 in this volume, for further discussion on comparative approaches). The child then gradually extends this egocentric causal representation to other intentional agents in his or her en­ vironment. Finally, children extend their causal reasoning to non-agentive entities, such as objects, and eventually to the abstract causal events represented in adulthood. More recently, White (1995, 2014, Chapter 14 in this volume) has made similar arguments that causal reasoning emerges from action representations early in infancy. As reviewed earlier, however, infants appear to reason causally about object motion with­ in the first year of life. Infants are able to represent the causal relations between objects that are separate from their own action at about the same age at which successful reach­ ing behaviors first emerge in infancy. These findings suggest that Piaget may have placed the developmental milestones at the wrong time course in development. However, there are several pieces of evidence which suggest that while Piaget may have been wrong about the timing of developmental change, he may have been correct in identifying the importance of infants’ actions and representations of other agents and their actions in early causal reasoning. First, infants’ representations of motion events are affected by whether the individuals in the event are dispositional agents (Leslie, 1984b; Luo, Kaufman, & Baillargeon, 2009; Woodward, Phillips, & Spelke, 1993). In an early finding, for example, Woodward, Phillips, and Spelke (1993) showed that while 7-month-old infants expected one object to contact a second object before it moved, they did not hold such an expectation for two moving peo­ ple. The (p. 681) researchers inferred that infants reasoned that people were capable of self-generated motion, and thus, contact between the individuals was not necessary for the second person to move (see also Kosugi & Fujita, 2002). In a second example, Luo, Kaufman, and Baillargeon (2009) have shown that although infants expect objects to move upon contact, they do not expect self-moving objects to move upon contact. More­ over, infants dishabituate only when an inert object, but not a self-propelled object, ap­ pears to resist the force of an external cause. Thus, despite the spatial and temporal fea­ tures that elicit infants to represent one object as causing another object to move, these expectations do not seem to apply when the entities involved are dispositional agents.

Page 6 of 38

The Development of Causal Reasoning Second, infants seem to infer that objects are caused to move by other agents, not by oth­ er objects. Saxe and colleagues have shown in a series of studies that when infants see an object launched from off-stage, they infer the location of the “launcher” and have expec­ tations about what kind of individuals can fill the “launcher” role (Saxe, Tenenbaum, & Carey, 2005; Saxe, Tzelnic, & Carey, 2007; see also Kosugi, Ishida, & Fujita, 2003). In­ fants expect causers to be at the origin of motion (rather than the end point of motion) and represent human hands or novel, self-moving puppets with eyes as more likely causes of object motion than prototypical objects such as toy trains. Although infants in these studies do not see the entire Michottian features argued as necessary for causal percep­ tion, if representations of caused motion events served as the origin of causal reasoning, one might expect infants to infer that any individual could cause its motion given the ap­ propriate point of origin. However, this does not appear to be the case. Third, infants’ causal perception develops alongside infants’ increasing motor abilities (Sommerville et. al., 2005). In these studies, 12-month-old infants were shown means–end sequences in which an experimenter pulled a cloth to retrieve an object. Following habit­ uation to the means–end sequences, infants were shown test events that were consistent or inconsistent with the causal structure presented during the habituation phase. In the inconsistent test trials, the experimenter moved the object off the cloth; however, when she pulled the cloth, the object still moved toward her, seeming as if the cloth caused the object to move at a distance. In contrast, in the consistent test trials, the object did not move when the experimenter pulled it. Infants’ attention was drawn more toward the in­ consistent test events than the consistent test events, suggesting that they detected the violation in causal structure. Ten-month-old infants similarly detected causal violations in means–end sequences that involved pulling platforms, rather than cloths. Interestingly, infants’ looking times were correlated with their own ability to engage successfully in the means–end sequences presented during the study. Infants who engaged in more planful actions in a separate reaching task were more likely to detect the causal violations than infants with more immature means-end reaching abilities. These findings provide evidence that the spatial and temporal features that consistently give rise to causal perception of motion events cannot be the sole origin of mature causal reasoning. Infants’ representations of agents influence their causal reasoning well within the first year of life and near the same age at which we first have evidence for infants’ representing object-caused motion. Whether causal reasoning is tied to representations of agents in general (i.e., themselves or other agents) or only initially to infants’ own ac­ tions, the studies reviewed in the preceding suggest that causal reasoning may emerge from representation of agents and their actions. Findings that the development of children’s motor abilities is related to developmental changes in causal reasoning also provides a potential mechanism responsible for developmental change. As children learn to act on objects, they might begin to identify the outcomes that followed their actions and gradually begin to detect more complex causal relations in the world. 
Once they de­ termine that actions are particularly relevant to causal reasoning, they might focus on other people’s actions. Such a process could then rapidly expand the child’s acquisition of causal knowledge. Still, even if developmental change in causal reasoning does not de­ Page 7 of 38

The Development of Causal Reasoning pend specifically on motor development, infants’ initial concepts of causality may still de­ rive from reasoning about agents and their actions.

Causal Reasoning from Covariation Information The previous two sections provide evidence for domain-specific origins for the develop­ ment of causal reasoning (caused motion interactions between two objects and infants’ representations of their own and other agent’s actions on the environment) and suggest that causal cognition becomes more abstract and domain-general over the course of de­ velopment. A third line of research, in contrast, suggests that causal reasoning may be (p. 682) domain-general, even in very young infants. Here, the focus is on children’s abili­ ty to form causal representations based on the covariation information they receive in their environment. In the most fundamental sense, a causal representation is a covaria­ tion relation between two events: when event A occurs, event B follows; when event A does not occur, event B also does not occur. Research has shown that children are sensi­ tive to statistical information in their environment from a very early age (Saffran, Aslin, & Newport, 1996; Wu, Gopnik, Richardson, & Kirkham, 2011), and that this might inform their language learning (Smith & Yu, 2008; Xu & Tenenbaum, 2007a, 2007b) and even so­ cial preferences (Kushnir, Xu, & Wellman, 2010; Ma & Xu, 2011). Therefore, young chil­ dren may be sensitive to statistical information in their environment to represent causal structure. The majority of evidence for causal reasoning being a domain-general process comes from research focused on causal reasoning in the preschool years, which we review at length in the following. Research with younger populations suggests, however, that even very young infants are sensitive to the covariation relations between events (Sobel & Kirkham, 2006, 2007). For example, Sobel and Kirkham (2006) presented 8-month-old in­ fants with covariation information in which the appearance of two objects (A & B) predict­ ed the appearance of another object at location C. A fourth event (D) is not predicted by the two objects (A & B). Over the course of familiarization to the events, because A and B occurred together, it was unclear whether A or B caused the appearance at location C. However, during test events, Kirkham and Sobel asked whether infants were capable of inferring the correct cause. In one condition (backwards blocking), infants saw that ob­ ject B predicted the appearance at location C (thus explaining away A as a potential cause for C). In a second condition (indirect screening off), infants saw B predict the ap­ pearance at D (thus providing evidence that A must have been the cause of C). To adults, this evidence leads to distinct causal interpretations (Le Pelley, Griffiths, & Beesley, Chap­ ter 2 in this volume). Adults infer that A is the cause of the appearance at C in the indi­ rect screening-off condition, but not in the backwards blocking condition. Infants’ subse­ quent looking patterns supported a similar causal interpretation. When they viewed A ap­ pear on the screen, they spent more time looking at location C in the indirect screening off condition than the backwards blocking condition.

Page 8 of 38

The Development of Causal Reasoning Thus, infants are capable of tracking statistical information in causal situations to predict events. These findings provide a clear picture for the development of causal reasoning. If causal reasoning emerged from the ability to track covariation information without con­ straints on the types of events (e.g., motion), features (e.g., temporal or spatial), and indi­ viduals (e.g., objects or agents), then infants and young children would have a domaingeneral and abstract causal reasoning system set up to acquire a wealth of causal infor­ mation across development.

Reconciling the Different Accounts Thus, there are three distinct proposals for the origins, mechanism, and development of causal reasoning across infancy and early childhood. After nearly three decades of re­ search on infants’ folk physics, it might be of little surprise that causal reasoning emerges from early physical reasoning systems. Given infants’ increasing physical and social inter­ actions over the first year of life, and the importance of these experiences to infants’ rea­ soning about objects and people, it also seems plausible that infants’ early causal reason­ ing may be tied to their representations of agents and actions. Finally, research has shown that children are sensitive to the statistical information in their environment from early in infancy; given that covariation relations define a causal relation, they may also entail infants’ earliest causal representations. How then might one reconcile these distinct views on early causal reasoning? One possi­ bility is to accept all three views as providing distinct origins for causal reasoning, much in the same way that multiple origins for agency can be seen in the innate ability to de­ tect face-like stimuli, the early developing ability to pay attention to eyes, and imprinting in animals (see Carey, 2009, for extended discussion; Saxe & Carey, 2006). Yet, how do these distinct causal concepts interact (or not) across development and how do they be­ come integrated across development? Moreover, why do we use the same causal lan­ guage to describe physical and social causation (e.g., “he made the child cry,” “he made the block move”) if causal reasoning emerges from distinct representational systems? Fi­ nally, if there are distinct origins, why is covariation information a part of representations of both caused object motion and the causal consequences of agent action? A second option might be to simply assert that infants start with no causal representa­ tions and are simply sensitive to all the statistical information (p. 683) in their environ­ ment. Since their own actions are likely the most salient in their environment and the out­ comes they notice would likely occur following contact, the early causal notions focused on spatial features of an event and agents’ actions could simply be a product of early causal learning based on covariation information. Similarly, as infants begin to act on the world around them, they typically (although not always) involve moving objects. Thus, children may learn about caused motion events early in development. However, in most studies of early causal reasoning, infants’ attention is always clearly drawn to a highly predictable relation between events, but infants seem to represent only some of them as causal relations. If tracking covariation information is the sole mechanism behind infants

Page 9 of 38

The Development of Causal Reasoning developing causal reasoning abilities, then one might expect infants to be most successful when experiments narrow the hypothesis space for the infant. Finally, a third way to reconcile these differences is to suggest that causal reasoning about motion events is not an independent origin for mature causal reasoning. For exam­ ple, Scholl and colleagues have proposed that causal perception of motion events is a modular process distinct from the abstract causal reasoning seen in adults (Newman et al., 2008; Scholl & Tremoulet, 2000). Alternatively, White (2009, 2014, Chapter 14 in this volume) has argued that representations of agency may in fact underlie our ability to rep­ resent caused motion events involving objects. If infants view the causal agent in a caused motion event as an abstract dispositional agent, then there may only be a singular origin for causal reasoning, one that derives from representations of agents and their ac­ tions. Thus, although there is extensive research documenting causal reasoning in early child­ hood, emerging well within the first year of life, and there are multiple proposals to ex­ plain the emergence and development of causal reasoning across the life span, re­ searchers have yet to reach a consensus explanation for the development of causal rea­ soning in the first few years of life. More recently, researchers have attempted to recon­ cile these approaches by directly testing predictions from each research tradition using highly similar paradigms, with varied dependent measures, across a range of ages. Muentener and Carey (2010) first investigated 8.5-month-old infants’ causal representa­ tion of occluded causal events, which included three important changes from past re­ search on infant causal reasoning. First, the studies investigated causal reasoning of nov­ el state change events such as a box breaking into pieces or lighting up and playing mu­ sic, moving the investigation of causal reasoning outside the domain of canonical motion events. Second, the study manipulated the kind of candidate agents (dispositional agents vs. objects) within these events. Third, the predictive relations within the events were kept constant. In these studies, infants were shown predictive events in which one entity (e.g., a human hand) was the potential cause of an outcome. For instance, infants saw a candidate causal agent travel behind an occluder toward a box, a short time after which the box would break apart into several pieces. Note that since the infants never saw what occurred be­ hind the occluder, they had no visual evidence of a causal relation between the candidate agent and the outcome. The research question was whether infants inferred that the can­ didate agent was the cause of the box’s breaking. Muentener and Carey (2010) tested for this in two ways. In all subsequent test events, in­ fants then were shown the unoccluded test events in which the agent either contacted or stopped short of the box. For half of the infants, the outcome occurred and the box broke; for the other half of the infants, the outcome did not occur and the box remained solid. If infants represented the agent as the cause of the outcome, then they should look longer

Page 10 of 38

The Development of Causal Reasoning when (1) the candidate agent contacts the box and the outcome doesn’t occur, and (2) when the candidate agent stops short of the box and the outcome still occurs. Note that covariation presented initially to the infants was fully consistent with a causal interaction—the candidate agents’ approach always preceded the subsequent outcome that occurs. Thus, if infants rely solely on covariation information to establish causal rep­ resentations, they should have inferred that a causal interaction had occurred because the candidate agent’s action always preceded and predicted the outcome. However, in­ fants only inferred a causal interaction for change of state events when a dispositional agent was the candidate cause. When a human hand (or a self-propelled puppet) was the candidate agent, infants looked longer during the test events when the agent stopped short of the box and the outcome occurred and when the agent contacted the box and the outcome did not occur. In contrast, when the candidate agent was a toy train (a typically inert object), infants’ looking times did not differ across conditions. The findings that infants reason differently about equivalent predictive relations based on the kind of individual in the event suggests that infants’ causal reasoning is un­ likely to be a domain-general process across development; infants should have succeeded (p. 684)

across all conditions if that were the case. Moreover, that infants are able to reason about causal interactions outside the domain of caused motion at nearly the earliest ages at which researchers have evidence for causal reasoning in infancy suggests that mature causal reasoning is unlikely to emerge solely from causal perception (or inference) of mo­ tion events. These results extend prior findings (Leslie, 1984b; Saxe et al., 2005, 2007) that infants infer dispositional agents, such as a human hand, as the cause of object mo­ tion. They also suggest that infants may represent caused motion events in terms of dis­ positional agent–patient relations, rather than simply situational agent–patient relations, akin to arguments made by White (1995, 2014, Chapter 14 in this volume). Thus, there may be a bias toward reasoning about agents and their actions in infants’ early causal reasoning. Follow-up research suggests that this agency bias continues through toddlerhood. In a se­ ries of studies, Bonawitz and colleagues found that 2-year-old toddlers fail to represent predictive events as causal events unless they are initiated by an agent’s action (Bonawitz et al., 2010). The researchers presented toddlers with predictive chains in which a block (A) slid toward and contacted another block (B), after which a small toy airplane (C), con­ nected to B by a wire, began to spin. Between conditions, they manipulated whether A be­ gan to move on its own toward B or whether the experimenter moved A. Toddlers readily learned the predictive relation between A and B’s contact and C’s activation, and there was no difference between conditions. The researchers then asked a simple question: Do toddlers infer that A caused C to activate? If so, then toddlers should be more likely to push A into contact with B and look toward C upon contact—that is, they should expect that their action should cause the airplane to spin. Bonawitz et al. found that this was the case when A’s motion was agent-initiated, but that toddlers failed to make predictive looks when it engaged in self-initiated motion. This failure was not due to a lack of inter­ est in or fear of the self-moving block, as all toddlers played with the block and eventually Page 11 of 38

The Development of Causal Reasoning placed A into contact with B. Rather, toddlers simply did not believe that their action would cause the outcome to occur. Intervening on the block (A) is a fairly low-cost action, and in fact, all children eventually interacted with the block and placed the two blocks in contact with each other. Thus, the difference between the agent and non-agent conditions was not a matter of motivation or imitation. The failure to make a predictive look toward the airplane, thus, seems to be a true failure to represent it as even a potential cause of the airplane’s motion. Again, to pass this task, children are not required to infer that the block is the definitive cause of the outcome, but rather a plausible cause of the outcome. That they fail to look toward the outcome following their action strongly suggests that toddlers did not even entertain this possibility. Toddlers, similar to infants, were biased to represent only predictive rela­ tions involving agents as potential causal interactions. In fact, a visible agent is not even necessary to show such an effect. Using a modified looking-time method with toddlers, Muentener, Bonawitz, Horowitz, and Schulz (2012) asked whether toddlers would infer the presence of an agent when they see an object emerge in sight already in motion (as seen in Saxe et al., 2005) and whether this invisible agent could facilitate toddlers’ causal inferences. In one condition, toddlers viewed the block (A) initiate its own motion toward the base object (B), after which the airplane (C) began to spin. In the second condition, toddlers viewed A emerge from offstage already in motion; it moved toward B, after which C began to spin. In contrast to Bonawitz et al., toddlers viewed occluded causal interactions, similar to the looking time paradigm used in Muentener and Carey (2010) described earlier—that is, toddlers needed to infer whether or not A contacted B behind a screen. During the test, toddlers’ causal interpre­ tations were tested by varying the spatial relation between A and B alongside the pres­ ence or absence of C’s spinning, between conditions. When A’s motion was self-initiated, an analysis of toddlers’ looking times provided a con­ ceptual replication of Bonawitz et al. (2010). Toddlers looked longer when C spun than when it didn’t, but they were insensitive to the spatial relations between A and B—they did not seem to infer that A was the cause of C’s spinning. In contrast, when A appeared onstage already in motion, they appeared to engage in causal inference. When C spun during the test, they looked longer when A stopped short of B than when it contacted B. When C did not spin during the test, they looked longer when A contacted B than when it stopped short of B. Follow-up conditions confirmed that toddlers inferred the (p. 685) presence of a hidden agent. When a hidden hand was revealed behind an occluder at the impetus of A’s motion, infants looked longer when A initiated its own motion onstage than when A appeared onstage already in motion. Thus, across two different ages and two dif­ ferent measures, children’s causal reasoning appears biased toward dispositional agents. Despite equivalent predictive relations between events, infants and toddlers seem to pref­ erentially represent only those events initiated by dispositional agents as causal interac­ tions.

Page 12 of 38

The Development of Causal Reasoning These findings have implications for a complete understanding of the origins of causal reasoning. First, these findings suggest that abstract causal reasoning does not emerge solely from representations of caused motion events triggered by spatial and temporal features. As has been found in prior research, infants and toddlers are capable of repre­ senting causal structure in non-Michottian events, even when the precise spatial interac­ tion is occluded. Second, these findings suggest that early causal reasoning is not simply the result of tracking covariation information in the world, as infants and toddlers failed to represent predictive events as causal relations when they did not include a disposition­ al agent. While it is possible that infants and toddlers have already acquired an expecta­ tion that dispositional agents are more likely causal agents than are objects, toddlers did not engage in causal exploration even when the cost of intervening was very low (i.e., simply looking up toward the outcome). This suggests that the link between agents and causation is either established via tracking covariation information and insensitive to new evidence or that there are early biases on the types of predictive relations that infants and toddlers track when engaged in causal reasoning. Infants may be particularly at­ tuned to attending to the actions of dispositional agents and the outcomes that follow them. This close relation between agents and causation may provide a starting point for the abstract, domain-general causal representations that emerge across development.

Learning Specific Causal Relationships in Early Childhood We have focused on three mechanisms that might support infants’ early recognition of causal events, suggesting that domain-specific knowledge plays an important role in iden­ tifying events as causal. Of course, causal reasoning extends beyond initial recognition that events are causal—it involves reasoning about more specific causal relationships, in­ cluding the identification of which objects or agents in a scene are responsible for effects, assessing the relative strengths of generative and inhibitive causal entities, and under­ standing how causes and effects stand in relation to each other. How might children un­ derstand these more specific causal relationships once events are identified as causal?

Domain-Specific Mechanism Information Consistent with the idea that infants use very early developing or innate knowledge of physical forces and agents’ actions to identify causal events, one way that older children might learn causal relations is through prior causal knowledge expressed in domain-spe­ cific beliefs (Bullock, Gelman, & Baillargeon, 1982; Leslie & Keeble, 1987; Meltzoff, 1995; Shultz, 1982a; Spelke, Breinlinger, Macomber, & Jacobson, 1992; Wellman, Hickling, & Schult, 1997; Woodward, 1998; Spelke, Phillips, & Woodward, 1995). For example, do­ main boundaries (such as boundaries between psychological and biological events) might help early learners identify which kinds of entities in the world are candidate causes for observed events in specific domains. Indeed, research shows that older children’s causal inferences respect domain boundaries. Specifically, preschoolers are hesitant to say that Page 13 of 38

The Development of Causal Reasoning a psychological event, such as being embarrassed, can cause a biological event, such as blushing (Notaro, Gelman, & Zimmerman, 2001). A second way in which domain-specific knowledge could facilitate causal reasoning is in identifying the kinds of causal mechanisms that might link a potential cause and effect. Young children believe that there are domain-specific differences in the kinds of forces that lead to causal transmission from one event to another, with physical events linked by transmission of energy and social events linked by psychological mechanisms such as in­ tention (Schultz, 1982b). Children can also use this domain-specific knowledge to draw inferences about new causal events (Shultz, 1982a). The idea that distinct mechanisms (e.g., between physical and psychological domains) un­ derpin children’s causal understanding is consistent with Piaget (1954), and prevalent in contemporary theories of causal reasoning (e.g., Ahn, Kalish, Medin, & Gelman, 1995; Bullock et al., 1982; Schlottmann, 2001). However, many causal inferences that we draw do not have an obvious mechanism, and learners need not understand a specific process of transmission in order to draw causal conclusions. Indeed, given that our causal mecha­ nism knowledge (p. 686) is quite impoverished (Keil, 2006), children must also rely on oth­ er sources of information to reason causally. We turn our attention to a domain-general mechanism—statistical learning—and discuss how this mechanism has informed our un­ derstanding of children’s causal reasoning.

Statistical Learning How might young learners begin to identify the potential causal relationships among events? Although domain-specific knowledge may help to constrain the space of possible causes, along with their strength and relationship to other events, additional information must drive causal inference and is necessary for learning. As with learning in other do­ mains, sensitivity to patterns of statistical evidence certainly plays an important part in causal reasoning (see Perales, Catena, Cándido, & Maldonado, Chapter 3 in this volume, for further discussion). Perhaps the simplest statistical machinery is an association detector—a mechanism by which the covariation of spatiotemporally related events are learned. Causes and their ef­ fects will be correlated, so these simple association detectors might get reasoning off the ground, without any need for a notion of “cause” (Hume, 1748/1999). In this way, covaria­ tion information could be sufficient for drawing inferences about causal strength (Mack­ intosh, 1975; Rescorla & Wagner, 1972). In more sophisticated models of statistical co­ variation learning, information about effects happening in the presence and absence of potential causes inform causal inferences (e.g., ΔP: Allan, 1980; Jenkins & Ward, 1965; Shanks, 1995). Other models suggest ways in which the strength (or generative power) of a cause on its effect might be measured by taking into account all potential causes on the effect and considering the unique additional difference of a particular cause (e.g., power PC: Cheng, 1997, 2000; see also Cheng & Lu, Chapter 5 in this volume). These models predict the degree to which an effect should occur given a cause and some background Page 14 of 38

Are children able to track probability information in the service of causal reasoning? Indeed, even 4-year-old children can use covariation information to infer causal strength: children who see a block activate a machine 2 out of 6 times believe it to be less effective than a block that activates the machine the same number of times (2 activations) but in fewer trials (4 total trials). This evidence suggests that children are tracking the proportion of activations (e.g., Kushnir & Gopnik, 2005).

An appeal of adopting these covariation approaches to understanding early causal reasoning is that they do not depend on specialized causal learning mechanisms, but instead could be adapted from domain-general statistical learning mechanisms. However, more specialized causal learning mechanisms might better describe preschool-aged children's causal learning. Specifically, learners could attend to the conditional probability of variables, as with causal power models, but this statistical information may then be used to construct or identify the best "causal graph"—an abstract representation of the causal variables and the relationships between them (Glymour, 2001; Gopnik, 2000; Gopnik & Glymour, 2002; Pearl, 1988, 2000; Steyvers, Tenenbaum, Wagenmakers, & Blum, 2003; Tenenbaum & Griffiths, 2003; Spirtes, Glymour, & Scheines, 1993; see Rottman, Chapter 6 in this volume, for detailed discussion).

Causal graphs are generative models, representing the patterns of dependencies between variables and affording a causal-specific framework for reasoning that goes beyond covariation information about the relationships between variables (Glymour, 2001; Pearl, 1988, 2000; Spirtes et al., 1993). Causal graphs may be a critical component of children's early intuitive theories because, as theories require, they are abstract representations that support prediction, explanation, and intervention (Carey, 1985; Gopnik & Meltzoff, 1997; Wellman & Gelman, 1992). Graphs support predictions because a learner with such a representation can "run the model" in the mind to generate possible outcomes. Graphs support explanations in that observed outcomes can be traced back within the graph to the likely causes that explain their presence or absence. Inferring causal structures is also intimately tied to an interventionist account of causal learning (Woodward, 2003, 2007); knowing that there is a causal link between two variables, as expressed in a graph, means that intervening to change the probability of the cause will change the probability of the effect. Given that these causal structure representations may underlie intuitive, potentially domain-specific reasoning, it becomes important to ask whether this structure account captures children's causal reasoning more generally.

There is evidence that the causal structure account underlies children's causal representations (p. 687) in multiple domains, and that causal inference in general depends on this more sophisticated learning account. To start, research has shown that children can use indirect evidence to draw causal inferences that would not be supported under the simpler associative models (Gopnik, Sobel, Schulz, & Glymour, 2001; Schulz, Kushnir, & Gopnik, 2007). For example, Sobel, Tenenbaum, and Gopnik (2004) investigated whether young children could use rules of conditional dependence to draw inferences about likely causes. Children were introduced to possible "blickets" and a "blicket detector machine" that activated when blickets were placed on it (see also Gopnik & Sobel, 2000). In one backwards blocking condition, children observed two blocks placed on the machine simultaneously (A and B), and the machine activated. On the next trial, only block A was placed on the machine, and the machine activated. Children were asked whether blocks A and B were blickets. Unsurprisingly, children correctly inferred that block A was a blicket. However, 4-year-olds were less likely to endorse B as a blicket (3-year-olds' responses were less clear). Simpler associative models cannot explain this pattern of responding (though see Dickinson & Burke, 1996; Larkin et al., 1998; McClelland & Thompson, 2007; Le Pelley et al., Chapter 2 in this volume); the structure learning accounts can. Specifically, Sobel et al.'s (2004) structure learning account considers two models (one in which both A and B are blickets, and one in which only A is a blicket). Assuming that blickets are rare, the "only-A" model is more likely under that structure learning account, qualitatively matching preschoolers' preference.

A second, but related, set of evidence derives from preschoolers' use of base-rate information to draw inferences about potential causes. The structure learning account depends not only on the probability of observing patterns of data given different possible causal structures, but also on the base-rate probabilities of these structures. To directly test the prediction that base-rate information influences children's causal structure inferences, Griffiths, Sobel, Tenenbaum, and Gopnik (2011) manipulated the base rate of blickets and gave children the same backwards blocking task described earlier. Consistent with the predictions of the structure learning account, when blickets were rare, children endorsed only A as the cause following this pattern of data, but when blickets were common, children also endorsed B (Griffiths, Sobel, Tenenbaum, & Gopnik, 2011).

A third line of evidence in support of structure learning accounts pertains to children's use of intervention information to draw causal inferences. The adage "correlation does not imply causation" derives from the pervasive problem of using covariation to inform our causal understanding. Knowing that A and B are correlated does not tell us whether A causes B, B causes A, or some unobserved variable C causes both. However, as with careful scientific experiment, intervention information can resolve the ambiguity among causal structures. Structure learning frameworks afford a privileged place for interventions, because they can account for causal direction in a way that associationist or causal power accounts cannot. Like scientists, children combine intervention information with covariation information to draw inferences about the most likely causal structure, such as whether A caused B or B caused A, and also to draw inferences about unobserved variables (Gopnik et al., 2004). Children can also decide between more complicated causal structures using both covariation and intervention information (Schulz, Gopnik, & Glymour, 2007); for example, following evidence about a toy with two gears spinning, preschoolers accurately inferred whether patterns of data were generated by one of four structures, including causal chains (e.g., A → B → C) and common cause structures (e.g., A → B and A → C).
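
The base-rate effect falls out of simple Bayesian bookkeeping over candidate structures. The sketch below enumerates the four hypotheses about which of blocks A and B are blickets, assumes a deterministic detector, and scores each hypothesis against the backwards blocking evidence. It is a minimal illustration in the spirit of Sobel et al. (2004) and Griffiths et al. (2011), not their published model, and the base rates are invented.

```python
from itertools import product

def p_b_is_blicket(base_rate):
    """Posterior that block B is a blicket after backwards blocking evidence.

    Hypotheses: each subset of {A, B} may be the set of blickets.
    Deterministic detector: activates iff at least one blicket is on it.
    Evidence: (1) A and B together -> activates; (2) A alone -> activates.
    """
    posterior = {}
    for a_blicket, b_blicket in product([True, False], repeat=2):
        prior = ((base_rate if a_blicket else 1 - base_rate) *
                 (base_rate if b_blicket else 1 - base_rate))
        # Trial 1 needs at least one blicket; trial 2 needs A to be one.
        consistent = (a_blicket or b_blicket) and a_blicket
        posterior[(a_blicket, b_blicket)] = prior if consistent else 0.0
    total = sum(posterior.values())
    return sum(p for (a, b), p in posterior.items() if b) / total

print(p_b_is_blicket(0.1))  # blickets rare   -> ~0.1: B largely exonerated
print(p_b_is_blicket(0.9))  # blickets common -> ~0.9: B still endorsed
```

Only the "both are blickets" and "only A" hypotheses survive the evidence, so the endorsement of B collapses to the prior base rate, matching the qualitative pattern in children's judgments.
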
Of course, a causal graphical model does not capture all possible information about the structure of causal events. Additional information, such as the functional form of a cause and the number of ontological kinds (categories) in the observed set, could depend on "higher-order" causal knowledge. Research with preschoolers suggests that children make causal inferences that extend to this higher-order knowledge. For example, causal form can vary across different event types: a causal event might occur when any single activator is present; the event might require that two activators be present at the same time; or the event might occur only when exactly one activator is present, and so on. Recent research suggests that children are not only able to learn these different causal forms from limited evidence, but that they may be better able than adults to learn the more unusual forms (Lucas, Bridgers, Griffiths, & Gopnik, 2014).
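
Causal forms of this kind can be written down as activation rules and compared against evidence. The fragment below is a toy illustration of that idea; the rules and trials are invented, and the all-or-none consistency check stands in for the graded Bayesian treatment in Lucas et al. (2014).

```python
# Candidate causal forms: how many activators must be present for the
# machine to go? Each form maps a count of present activators to an outcome.
FORMS = {
    "disjunctive (any one suffices)": lambda k: k >= 1,
    "conjunctive (two required)":     lambda k: k >= 2,
    "exactly one":                    lambda k: k == 1,
}

# Hypothetical trials: (number of activators on the machine, did it go?)
trials = [(1, False), (1, False), (2, True)]

consistent = [name for name, rule in FORMS.items()
              if all(rule(k) == went for k, went in trials)]
print(consistent)  # -> ['conjunctive (two required)']
```
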
Causal reasoning also depends on a joint inference between categories and causal relations. For example, imagine playing with a set of objects that includes magnets, metals, and inert (e.g., plastic) objects for the first time. Drawing inferences from (p. 688) patterns of exploration depends on three things. First, the learner must be able to learn that there are exactly three causal "kinds": the categories of "metal," "magnet," and "inert." Then, causal inference about any particular object depends both on the correct categorization of the objects (which objects are magnets? which are metals? etc.) and on inferring the causal relations between the categories (magnets react with other magnets and with metals, but metals do not react with one another, and inert objects do not react with anything). Preschool-aged children can jointly infer causal structure and category information like this—discovering how many categories exist, which objects should be assigned to these categories, and the causal laws that govern the relationships between these categories (Bonawitz, Ullman, Gopnik, & Tenenbaum, 2012, in revision; see also Schulz, Goodman, Tenenbaum, & Jenkins, 2008). These results are consistent with hierarchical computational learning accounts that start with logical grammars; these grammars serve as a broad "language of thought" in which more specific kinds of causal structures can be represented and discovered (Goodman, Ullman, & Tenenbaum, 2011; Kemp, Perfors, & Tenenbaum, 2007). Thus, it appears that children employ statistical learning strategies that go beyond naïve associative accounts. Causal structure accounts suggest a means by which learners can draw inferences about causal events from relatively limited data. By at least as early as preschool age, children's sophisticated causal inferences are well explained by this framework.

Reconciling Statistical and Domain-Specific Approaches

The research on children's causal reasoning described in the preceding section suggests that by at least 4 years of age, children are representing causal structures and using these representations to guide their causal inferences. However, children's causal inferences are supported by more than statistical learning principles—statistical learning alone is unlikely to account for the speed and accuracy with which even very young children draw inferences from ambiguous, noisy data. In contrast, domain-specific mechanisms support rapid learning, but do not necessarily capture the flexibility of data-driven approaches. Might children be able to harness the power of both domain-specific knowledge and statistical learning?

Children's causal inferences clearly draw on both domain-specific mechanism information and statistical learning. For example, following covariation information, 3-year-old children are more likely to extend causal efficacy based on an object's internal properties when the mechanism is psychological (an "agent" likes the block) than when it is physical (a machine activates it), adding support to the claim that domain-specific mechanistic knowledge can support causal inferences (Sobel & Munro, 2009). Additionally, given identical covariation evidence, children are more likely to endorse a cause that is domain consistent (e.g., a switch activating a machine) than one that violates domain boundaries (e.g., asking the machine to go); however, given overwhelming evidence, children are willing to endorse causes that cross domain boundaries (Schulz & Gopnik, 2004).

These studies of children's causal reasoning using mechanism information and statistical learning might lead one to believe that children follow an "either-or" approach. Perhaps children choose to stick with either domain knowledge or evidence, depending on the strength of their beliefs and the evidence. However, children seem able to integrate domain knowledge with evidence in a more graded way. Schulz, Bonawitz, and Griffiths (2007) presented 3-, 3.5-, and 4-year-old children with ambiguous but statistically compelling storybook evidence about possible causes (e.g., A & B→E; A & C→E; A & D→E, etc.) and asked them at the end of the book what caused E. In one condition, the recurring variable A crossed domain boundaries (e.g., "worrying," a psychological cause, covaried with "stomach ache," a biological effect); in another, all variables were within domain boundaries (e.g., plants causing itchy spots, both biological). While the youngest children had difficulty learning from evidence in either condition, 3.5-year-olds were able to learn from the evidence when all variables were within domain, and 4-year-olds were able to learn in both the within- and cross-domain conditions. However, 4-year-olds also showed sensitivity to their prior belief in domain boundaries—they were less likely to endorse A as the causal variable when it crossed domain boundaries than when it did not. A follow-up training study with 3.5-year-olds suggested that younger children's failures to learn were due both to a stronger prior belief in domain boundaries and to a fragile ability to reason from the ambiguous evidence presented in these tasks (Bonawitz, Fischer, & Schulz, 2012).

Thus, children may rely on domain-specific information early in life, especially when statistical evidence is probabilistic. However, by 4 years of age, children seem able to integrate these beliefs (p. 689) with patterns of evidence to inform their causal judgments, including drawing inferences about (Seiver, Gopnik, & Goodman, 2013) and from (Kushnir, Vredenburgh, & Schneider, 2013) other people. This research suggests an important role for both domain-specific beliefs and statistical learning in early causal reasoning.
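
One way to make "graded integration" precise is to treat the domain boundary as a prior over candidate causes and the storybook trials as likelihoods. The sketch below applies this logic to the design of Schulz, Bonawitz, and Griffiths (2007); the prior and noise values are invented for illustration and are not estimates from the paper.

```python
def posterior_over_causes(trials, candidates, prior, hit=0.9, background=0.1):
    """Score candidate causes of E against co-occurrence trials.

    trials: list of sets of variables present when E occurred.
    prior:  dict mapping candidate -> prior probability (domain knowledge).
    A candidate predicts E with probability `hit` when present,
    and E occurs with probability `background` otherwise.
    """
    scores = {}
    for c in candidates:
        like = 1.0
        for present in trials:
            like *= hit if c in present else background
        scores[c] = prior[c] * like
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

trials = [{"A", "B"}, {"A", "C"}, {"A", "D"}]       # A recurs in every trial
prior = {"A": 0.1, "B": 0.3, "C": 0.3, "D": 0.3}    # cross-domain A: low prior
print(posterior_over_causes(trials, ["A", "B", "C", "D"], prior))
# -> A wins (~0.90) despite its low prior: the data overwhelm the domain belief.
```
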


Inductive Constraints Beyond Domain Knowledge

In addition to a belief in domain boundaries, children likely employ numerous inductive constraints to guide their causal inferences, and importantly they integrate these constraints with probability information. Such constraints may take many forms. For example, they could include causal-specific principles, such as beliefs in determinism, or biases to favor information that comes from one's own interventions or that has salient spatiotemporal cues. Constraints could also include more general inductive biases that shape children's preference for certain explanations (such as a preference for simpler explanations) and that shape their interpretation of evidence in different contexts (such as when evidence is generated purposefully or accidentally by a knowledgeable or ignorant agent). Is there evidence that children have and use these inductive constraints in the service of causal reasoning?

Recent research suggests that preschoolers do indeed bring numerous causal inductive biases to bear on their interpretation of causal events. Preschool-aged children infer the presence of unobserved inhibitory causes (or absent generative causes) when observed causes appear stochastic (probabilistic), suggesting that children are causal "determinists" (Schulz & Sommerville, 2006). A bias for determinism can provide a powerful basis for inferring the existence of unobserved variables and can help guide exploration in search of these potential hidden causes (Schulz, Hooppell, & Jenkins, 2008; for evidence from toddlers, see Muentener & Schulz, 2014). Children also demonstrate a bias for evidence observed from their own interventions (Kushnir, Wellman, & Gelman, 2009), but they can set this bias aside when their interventions are confounded, such as when an alternate known cause (flipping a switch) is produced at the same time as the intervention (Kushnir & Gopnik, 2005). Furthermore, children use spatiotemporal information (e.g., whether there is contact between a block and a machine) to guide their causal inferences; however, probability information can overturn this bias (Kushnir & Gopnik, 2007). Thus, children seem ready to bring causal-specific biases to bear in reasoning from evidence, though it is unknown whether or not these biases are themselves learned from experience.

More general biases may also help learners constrain and reason from causal information. Lombrozo (2007) hypothesized that a bias for "simplicity" might inform inference to the best explanation (see Lombrozo & Vasilyeva, Chapter 22 in this volume). Lombrozo (2007) found that, controlling for the probability of events, adults preferred explanations that were simpler, in that they appealed to fewer causes. Bonawitz and Lombrozo (2012) extended this investigation to preschoolers, to see whether young children rely solely on probability information to select candidate causes, or whether, like adults, they prefer explanations with fewer causes (controlling for probability information). Children were shown a toy that could light up (when a red chip was placed in an activator bin), spin (when a green chip was placed in the bin), or do both simultaneously (when a blue chip was placed in the bin, or when both a red and a green chip were placed in the bin). Children were shown information about the prevalence of red, green, and blue chips. Then a bag with all three chips was accidentally spilled into the bin. Both the fan and the light activated, and children were asked what fell into the bin. Children were sensitive to the probability information and began to favor the "red and green chip" explanation as its probability increased across conditions. However, children also showed a strong preference for the simpler, one-chip explanation—even when the complex explanation was twice as likely as the one-chip explanation. These results suggest that children may rely on a principle of parsimony as an inductive constraint when drawing inferences about causal events, and that they integrate this constraint with probability information.
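
In posterior terms, the one-chip and two-chip explanations can be scored directly from the chip base rates. The toy calculation below gives the probability-only prediction that children's simplicity preference pushes against; the base rates are invented and are not the proportions used by Bonawitz and Lombrozo (2012).

```python
def explanation_odds(p_red, p_green, p_blue):
    """Relative posterior of 'blue chip' vs. 'red and green chips' as the
    cause of the toy both spinning and lighting, from base rates alone.

    Assumes chips fall into the bin independently and that either
    explanation produces the observed double outcome with certainty.
    """
    p_simple = p_blue              # one-cause explanation
    p_complex = p_red * p_green    # two-cause explanation
    total = p_simple + p_complex
    return p_simple / total, p_complex / total

# Even when the two-chip event is twice as probable, children often
# still prefer 'blue':
print(explanation_odds(p_red=0.6, p_green=0.6, p_blue=0.18))
# -> (0.333..., 0.666...): probability alone favors red+green 2:1
```
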
In addition to integrating inductive biases with data, children can also leverage social information to guide their causal reasoning. In particular, if children are sensitive to the process by which data were generated, that sensitivity can license stronger causal inferences from limited data. Consider watching a person walk over to a wall and flip a light switch, and nothing happens. Although the covariation evidence suggests that the switch is not related to the light, inferences about the person's actions ("there must have been a reason that she tried to flip it") can inform causal inferences. Recent research suggests that children are indeed sensitive to such social cues, and use cues about how data were generated to draw causal conclusions (Kushnir, Wellman, & Gelman, 2008). For example, Gweon, Tenenbaum, and Schulz (2010) found (p. 690) that 15-month-old infants attend to whether objects were drawn purposefully or randomly from a box in order to make inferences about the extension of a causal property to novel objects. Sixteen-month-old infants also track whether probabilistic causal outcomes co-occur with an individual (i.e., person A always makes the toy go, but person B fails) or with an object (i.e., the object fails occasionally, independent of the user) to decide whether a failed event with the same object is due to their own inability or to the object's inconsistency (Gweon & Schulz, 2011). Other work shows that preschool children use information about whether an experimenter was knowledgeable or ignorant when deciding whether to switch responses following a neutral question about a causal event (Gonzalez, Shafto, Bonawitz, & Gopnik, 2012; Bonawitz et al., in review).

Thus, from very early on, children seem to understand that people are part of the generative process; children use information about other people's goals and knowledge to draw stronger causal inferences from data. Pedagogy is a special case of leveraging social information, because the learner can infer not only that data were drawn with intent, but also that data were drawn in order to be maximally efficient for teaching (Shafto, Goodman, & Griffiths, 2014). For example, pedagogical inferences seem to drive children to "over-imitate" causal action sequences; children are more likely to reproduce longer strings of actions to cause an outcome when those strings are demonstrated by a teacher than when they are demonstrated by an ignorant actor (Buchsbaum, Gopnik, Griffiths, & Shafto, 2011).

The teaching assumption yields a special additional constraint for causal inference. Although absence of evidence does not always provide evidence of absence, when a teacher chooses the data, the lack of a demonstration does provide such evidence. For example, Bonawitz, Shafto, et al. (2011) showed children a novel toy with many interesting pieces (knobs, tubes, buttons) that might afford interesting events (perhaps squeaking, lighting up, or making music). In one condition, children were given strong pedagogical cues and were told, "This is how my toy works," as the demonstrator pulled a tube out and caused the toy to squeak. The experimenter did not demonstrate additional properties of the toy. Here, lack of evidence provides evidence of a lack: had there been additional causal affordances, the teacher should have demonstrated them. We can contrast this inference with a non-pedagogical condition: children were shown the same outcome (the tube causing the toy to squeak), but the event happened accidentally, produced by an experimenter who was "ignorant" about how the toy worked. In this case, the child need not draw the inference that squeaking is the toy's only causal function. In the accidental condition, children explored the toy broadly, suggesting that they believed the toy was likely to have additional properties. In contrast, children in the pedagogical condition constrained their exploration to the pedagogically demonstrated function; this less variable play suggests that children believed there was likely only one causal property (the demonstrated squeaking event). Furthermore, children use this pedagogical assumption to evaluate whether teachers are accurate, following causal demonstrations. For example, if children discover additional properties of a toy following a narrow pedagogical demonstration, they rate those informants lower and are less likely to constrain their search following subsequent pedagogical demonstrations from the same informant (Gweon, Pelton, Konopka, & Schulz, 2014).
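
The "omission" logic can be written as a likelihood comparison: a helpful teacher who knew about a second function would almost certainly have shown it. The numbers below are invented placeholders, and this two-hypothesis setup is far simpler than the full pedagogical sampling models of Shafto, Goodman, and Griffiths (2014).

```python
def p_one_function(prior_one=0.5, p_teacher_omits_extra=0.05):
    """Posterior that the toy has only the demonstrated function.

    Observation: exactly one function was demonstrated.
    Under 'one function', that is just what a helpful teacher would do
    (likelihood 1). Under 'two functions', a helpful teacher would only
    rarely omit the second one.
    """
    like_one = 1.0
    like_two = p_teacher_omits_extra
    num = prior_one * like_one
    return num / (num + (1 - prior_one) * like_two)

print(p_one_function())                         # teacher: ~0.95, so stop exploring
print(p_one_function(p_teacher_omits_extra=1))  # accidental demo: back to the 0.5 prior
```

Setting the omission likelihood to 1 mimics the accidental condition, where failing to show a second function carries no information, so broad exploration remains rational.
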
Discussion of Children's Causal Reasoning

Taken together, these results suggest that even very young children combine strong inductive biases with patterns of data to inform their inferences about causal events. Given that children seem to form sophisticated causal theories about the world within the first few years of life, it is perhaps not surprising that multiple constraints inform their inferences from data. Rapid learning from minimal data requires inductive biases to limit the myriad possible causal structures that could have produced the data. Importantly, these multiple constraints point to ways in which we might expect causal inferences to reflect important developmental differences in learning about cause. Namely, if, as we have suggested, causal reasoning is driven by the interaction of data and these inductive constraints, then changes in the constraints will change the resulting causal inferences. Thus, one can begin to explore developmental changes in domain beliefs, inductive constraints, and social (p. 691) inference more broadly as a possible explanation for observed developmental differences in causal reasoning.

Causal Exploration and Discovery

Being able to identify causal scenes in the world is certainly important, but of course it differs from the ability to seek out and discover the causal structure of the world. Although we now know that gravity causes apples to fall from trees, there is nothing in the visual percept that identifies the gravitational pull that eventually connects an apple to the ground beneath a tree. Rather, if we want to understand how the apple falls from the tree, and we believe that the apple is a prototypical object that cannot move on its own, then we must infer and posit the existence of an unknown causal source for the motion. Only then are we in a position to discover new causal structure in the world around us.

This is the root of scientific exploration: positing new theories, creating new hypotheses, and designing experiments to test those hypotheses. In the previous section, we suggested that the structure learning account alone provides a privileged role for interventions of these kinds, and we pointed to evidence that children can use intervention information to draw accurate causal inferences. But to what extent do children spontaneously engage in causal hypothesis testing in their exploratory play? Are children only able to identify the causal structure of fully transparent predictive relations provided for them? Or do they seek out causal structure in their everyday play? Recent studies suggest that children engage in this type of behavior from at least the second year of life (for a review, see Schulz, 2012).

One way that play could facilitate causal reasoning is if actions in play serve as new evidence. If learners were biased to explore events where the evidence has not yet disambiguated between potential causal structures, then play could generate the necessary disambiguating evidence from which to learn. Are children motivated to explore when causal information is ambiguous? Schulz and Bonawitz (2007) investigated this question with preschoolers, using a jack-in-the-box with two levers. When the levers were depressed, two toys popped up. The toys both emerged in the center of the box, so there was no information that could guide inference as to which lever caused which toy. In one condition, children observed confounded evidence in which the levers were always depressed simultaneously. Because the levers were depressed simultaneously, the evidence did not disambiguate the potential causal structures (the right lever could cause toy A, or the left, or the two levers combined could cause one of the toys, etc.). Another group of children was introduced to the same box but observed disambiguating evidence, in which one lever was depressed at a time. For all children, the toy was then removed and returned along with a novel toy, and children were allowed to play freely for 60 seconds. Unsurprisingly, children in the unconfounded condition showed a strong preference for playing with the new toy. However, in the confounded condition, the pattern of play reversed: children overcame the novelty preference, played more with the familiar (confounded evidence) toy, and even spontaneously generated evidence that resolved the confounding. Indeed, other studies have found that when given confounded evidence, children are more likely to produce variable exploration, which in turn provides evidence that supports causal learning (Cook, Goodman, & Schulz, 2011; Gweon & Schulz, 2008). These findings suggest that children are motivated to explore following uncertainty about causal structure.
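
Framed as uncertainty reduction: confounded evidence leaves several lever-to-toy hypotheses standing, so play that isolates a lever carries information, whereas unconfounded evidence leaves nothing to learn. The entropy bookkeeping below is a schematic gloss on the design, not a model fit to Schulz and Bonawitz's (2007) data, and the hypothesis set is invented.

```python
import math

# Hypotheses about the box: which lever makes which toy pop up?
HYPOTHESES = ["left->A, right->B", "left->B, right->A",
              "left->both", "right->both"]

def entropy(consistent):
    """Shannon entropy (bits) of a uniform posterior over the
    hypotheses still consistent with the evidence seen so far."""
    k = len(consistent)
    return math.log2(k) if k else 0.0

# Confounded demo (both levers pressed together, both toys pop): every
# hypothesis survives, so there is uncertainty left for play to resolve.
print(entropy(HYPOTHESES))        # -> 2.0 bits

# Unconfounded demo (each lever pressed alone): one hypothesis survives,
# so a novel toy is more attractive than re-exploring the old one.
print(entropy(HYPOTHESES[:1]))    # -> 0.0 bits
```
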
Other studies have shown that children's play is guided by the interaction of their strong prior beliefs and the evidence they observe (Bonawitz, van Schijndel, Friel, & Schulz, 2012; Legare, 2012; van Schijndel et al., 2015). For example, Bonawitz, van Schijndel, Friel, and Schulz (2012) looked at children's exploratory play in the domain of balance. As Karmiloff-Smith and Inhelder (1975) showed in their influential study, children initially entertain a "center theory" of balance, believing that regardless of the center of mass, an object should be balanced at its geometric center; gradually, children learn a "mass theory"—that blocks balance over their center of mass. Bonawitz, van Schijndel, Friel, and Schulz (2012) first tested children on a set of blocks to determine their beliefs about balance. Then children were shown one of two events. In one event, an unevenly weighted L-shaped block balanced at its center of mass (toward the heavy side). Another group of children saw the L-shaped block "balance" at its geometric center. Note that to mass theorists, the block balancing at its geometric center is surprising and conflicts with their prior beliefs, whereas center theorists observing the same evidence experience no conflict; conversely, the block balancing at its center of mass produces no conflict for mass theorists but is surprising to center theorists. Neither children's beliefs nor the evidence alone predicted children's pattern of play. Critically, it was the interaction of (p. 692) children's beliefs and the evidence that led to significant differences. Children who observed belief-consistent evidence showed a standard preference for the novel toy. However, when the evidence was surprising with respect to their beliefs, children overcame the preference for the novel toy and spent more time continuing to explore the block.

Children's explanations in the Bonawitz, van Schijndel, Friel, and Schulz (2012) task also suggested that they were seeking out potential causal explanations during play: Why might the block balance the way it does? In fact, there was a magnet at the balance point of the blocks and the stand. Although all children discovered the magnet during free play, and the magnet is a reasonable explanation for the block's balancing in all conditions, children were significantly more likely to appeal to the magnet as an explanatory variable when the evidence conflicted with their beliefs than when it confirmed them. In a follow-up condition in which no magnets were present, and thus could not explain away the surprising evidence, center-theory children were significantly more likely to revise their beliefs about balance following play. These results show the combined role of evidence and prior beliefs in guiding play: given evidence that is surprising with respect to their beliefs, children are more likely to explore, and consequently they either discover causal variables that explain away the evidence or generate new evidence from their own interventions that leads to belief revision.

These results suggest that children's desire to learn about causal outcomes is reflected in their play. Other researchers have suggested that the link between play and causal learning extends beyond generating evidence to resolve immediate, tangible causal outcomes; play may also support reasoning through imaginary, possible worlds. Thus, children's pretend play may guide their understanding of causal events. In particular, it has been suggested that pretend play may facilitate causal counterfactual reasoning, much as thought experiments support reasoning in science (Buchsbaum, Bridgers, Weisberg, & Gopnik, 2012; Gopnik & Walker, 2013; Harris, 2000; Walker & Gopnik, 2013; Weisberg & Gopnik, 2013). Some initial evidence supports a link between pretend play and causal reasoning; for example, children's performance in causal counterfactual reasoning tasks and their engagement in pretend play correlate with each other (Buchsbaum et al., 2012). However, many open questions about the relationship between pretense and causal reasoning remain, and additional empirical work is in progress.

We have suggested that play and causal inference are tightly coupled. Although the particular actions children take in the course of play might not be systematic, children's exploratory play might nonetheless be driven by opportunities to learn about causal structure. When evidence is ambiguous or surprising with respect to their beliefs, children explore more, and more variably, producing opportunities for causal learning. Furthermore, children's pretend play may be an early mechanism that supports the development of causal counterfactual reasoning.

General Conclusions and Open Questions

We have reviewed research on the development of causal reasoning in infancy, toddlerhood, and the preschool years, with the broad goals of (1) understanding the origin of our mature causal reasoning abilities, and (2) discussing how the process of causal reasoning and discovery may change throughout early childhood. Research on causal reasoning in infancy and toddlerhood provides evidence for both domain-specific (object motion, agent action) and domain-general (covariation information) roots of causal reasoning. More recent research suggests that representations of agents' actions may play a particularly important role in the development of causal reasoning. However, independent of the precise origin of causal reasoning, our review of studies with preschool-aged children leads to the conclusion that by about 4 years of age, children are integrating domain-general covariation information with domain-specific prior knowledge, as well as with causal inductive constraints and more general inductive biases, to rapidly and effectively represent causal structure.

Yet, despite a large body of research focused on the development of causal reasoning, several open questions remain. With regard to the origins of causal reasoning, how do early notions of causality, such as caused motion and agent action, become integrated across development? One possibility is that causal representations serve to integrate object and agent representations as children come to form event representations. Alternatively, there may be multiple representations of cause that are part of infants' core knowledge of objects and agents (Carey, 2009; Spelke & Kinzler, 2007). While the answers to these questions inform our understanding of the development of causal reasoning, they also have important implications for a fuller understanding of our early conceptual system. If infants have several (p. 693) domain-specific representations of causality, then how do we come to integrate these representations over development? How do early non-linguistic representations of causality map onto emerging causal language in early childhood? The fact that adults (as well as children; see Callanan & Oakes, 1992; Hickling & Wellman, 2001; Muentener & Lakusta, 2011) can use similar language to identify the broad range of causal relations suggests that somehow this integration does occur over development. However, what role might language play in helping young children identify and learn about specific causal relations in their everyday lives (Muentener & Schulz, 2012)?

General Conclusions and Open Questions We have reviewed research on the development of causal reasoning in infancy, toddler­ hood, and the preschool years with the broad goals of (1) understanding the origin of our mature causal reasoning abilities, and (2) discussing how the process of causal reasoning and discovery may change throughout early childhood. Research on causal reasoning in infancy and toddlerhood provides evidence for both domain-specific (object motion, agent action) as well as domain-general (covariation information) roots to causal reasoning. More recent research suggests that representations of agent’s actions may play a particu­ larly important role in the development of causal reasoning. However, independent from the precise origin of causal reasoning, our review on studies with preschool-aged children leads to the conclusion that by about 4 years of age, children are integrating domain-gen­ eral covariation information with domain-specific prior knowledge, as well as with causal inductive constraints and more general inductive biases, to rapidly and effectively repre­ sent causal structure. Yet, despite a large body of research focused on the development of causal reasoning, several open questions remain. With regard to the origins of causal reasoning, how do early notions of causality, such as caused motion and agent action, become integrated across development? One possibility is that causal representations serve to integrate ob­ ject and agent representations as children come to form event representations. Alterna­ tively, there may be multiple representations of cause that are part of infants’ core knowl­ edge of objects and agents (Carey, 2009; Spelke & Kinzler, 2007). While the answers to these question inform our understanding of the development of causal reasoning, they al­ so have important implications for a fuller understanding of our early conceptual system. If infants have several (p. 693) domain-specific representations of causality, then how do we come to integrate these representations over development? How do early non-linguis­ tic representations of causality map onto emerging causal language in early childhood? The fact that adults (as well as children; see Callanan & Oakes, 1992; Hickling & Well­ man, 2001; Muentener & Lakusta, 2011) can use similar language to identify the broad range of causal relations suggests that somehow this does occur over development. How­ ever, what role might language play in helping young children identify and learn about specific causal relations in their everyday life (Muentener & Schulz, 2012)? Page 24 of 38

The Development of Causal Reasoning A second, related question concerns the belief that there is in fact a unified abstract of representation in adult higher cognition. Although adults use similar causal language across domains, the distinction between reasons (e.g., social causation) and causes (e.g., physical causation) in adult causal reasoning suggests a continued distinction within causal reasoning across development. What is the relation between physical and psycho­ logical causation early in development? Although there is recent research suggesting that children use similar processes to reason about causes in the social world (Seiver et al., 2013), future research is needed to understand whether younger infants and toddlers rea­ son in similar ways. The review of research on preschool causal reasoning suggests that children are sophisti­ cated intuitive causal reasoners. However, this finding is in tension with research on children’s causal scientific reasoning abilities. One the one hand, children can successful­ ly use causal information to categorize events and objects (Nazzi & Gopnik, 2003; Schulz, Standing, & Bonawitz, 2008), can understand the relationship between internal parts and causal properties (Sobel, Yoachim, Gopnik, Meltzoff, & Blumenthal, 2007), can use coun­ terfactuals to reason causally (Harris, German, & Mills, 1996), and even can design ap­ propriate interventions on causal systems (Cook et al., 2011; Gopnik, Sobel, Schulz & Gly­ mour, 2001; Gweon & Schulz, 2008). On the other hand, children have difficulty in explic­ it, scientific reasoning studies. In particular, children have a weak explicit understanding of how patterns of data support or falsify possible hypotheses, and they do not isolate variables to produce informative experiments (Klahr, 2000; Kuhn, 1989; Schauble, 1990, 1996). This pattern of results suggests an important divide between the metacognitive processes that support explicit scientific reasoning and the intuitive mechanisms for causal inference in day-to-day experience. Here we have focused on the intuitive mecha­ nisms that might support children’s developing causal reasoning abilities, but under­ standing the way in which metacognitive reasoning might connect to intuitive reasoning remains an important challenge to the field. A second ongoing challenge for studies that demonstrate young children’s rapid and ac­ curate causal inferences pertains to the problem of search. How does a learner find the structure that accurately carves up the world into causes and events, with the relation­ ship and causal weights correctly specified? Consider that the number of possible causal structures that could capture any particular pattern of statistical evidence is vast at best, and often infinite. We have pointed to the role of domain-specific knowledge, inductive constraints, and play-generated interventions as potential tools to deal with this problem of information and search. However, if following a truly rational model of learning—one that considers all possible causal models and selects the best model given the data—then additional constraints only offer a modest narrowing of the space of possibilities; the search problem and computational demands on the learner remain immense. How could children—who, in particular, face even stronger working memory, attentional, and execu­ tive function limitations than adults—possibly hope to solve this search problem?


Recently, researchers have suggested that, rather than considering all possible causal structures, children might "sample" causal hypotheses from a probability distribution (Bonawitz, Denison, Griffiths, & Gopnik, 2014). For example, in a paradigm close to Bonawitz and Lombrozo's (2012) simplicity experiment, children were shown a box full of red and blue chips in different proportions and were asked to predict which chip was most likely to have fallen out of a "spilled" bag to activate a machine (Denison, Bonawitz, Gopnik, & Griffiths, 2013). Children's responses were variable—the same child would sometimes say "red" and sometimes say "blue." But the responses also closely tracked the probability of the relevant hypotheses—children said "red" more often when that was more likely to be the correct answer. In follow-up work, Bonawitz et al. (2011, 2014) proposed specific sampling rules, such as a win-stay, lose-sample strategy, that greatly reduce the computational demands of causal structure search; they showed that in simple causal learning paradigms, children's and adults' behavior matched the predictions of this algorithm well. Although these algorithmic approaches (p. 694) offer a promising potential solution to the search problem, we have only just begun to explore which algorithms best capture young learners' causal learning. Future work is needed to understand the degree to which these algorithms solve the search problem and capture causal learning behavior more generally.
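
The sketch below is a schematic rendering of the win-stay, lose-sample idea: keep a hypothesis while it predicts the data, and only resample when it fails. The resampling distribution here (prior weighted by the single new datum's likelihood) simplifies the posterior-based resampling analyzed by Bonawitz et al. (2014), and the hypotheses and numbers in the usage example are invented.

```python
import random

def win_stay_lose_sample(current, observation, hypotheses, prior, likelihood):
    """One step of a win-stay, lose-sample learner.

    Keep the current hypothesis with probability equal to how well it
    predicts the new observation; otherwise resample a hypothesis in
    proportion to prior x likelihood. The learner never has to enumerate
    and score the full hypothesis space on every trial.
    """
    if random.random() < likelihood(current, observation):
        return current  # "win": the hypothesis survived, stay with it
    # "lose": draw a replacement, weighted by how well each hypothesis
    # explains the new observation
    weights = [prior[h] * likelihood(h, observation) for h in hypotheses]
    return random.choices(hypotheses, weights=weights, k=1)[0]

# Toy usage: two hypotheses about a detector, one datum that favors h2.
hyps = ["h1", "h2"]
prior = {"h1": 0.5, "h2": 0.5}
lik = lambda h, obs: 0.9 if h == "h2" else 0.1
print(win_stay_lose_sample("h1", "activated", hyps, prior, lik))
```

Because each child holds only one sampled hypothesis at a time, responses are variable across children yet track the posterior in aggregate, which is the signature pattern reported in this line of work.
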
A potential limitation of our claims is that the research discussed here comes primarily from "WEIRD" populations (Western, Educated, Industrialized, Rich, Democratic). Indeed, many studies show important cultural differences in causal and categorical inferences (e.g., Coley, 2000; Lopez et al., 1997; Ross et al., 2003). Other studies have demonstrated cultural and developmental differences in identifying causes, such as "scientific" versus "magical" mechanisms (Subbotsky, 2001; Subbotsky & Quinteros, 2002). However, the differences in causal inductive inferences found in these studies are almost certainly driven by cultural differences in the content of knowledge. It is less clear to what degree experience shapes the form of causal learning. That is, we have suggested that causal reasoning is driven by the interaction of data and inductive constraints, and it is unlikely that this process is radically different across cultures. However, if some inductive constraints are themselves learned, this could explain additional cross-cultural differences in causal learning (see Bender, Beller, & Medin, Chapter 35 in this volume, for further discussion).

In sum, understanding the development of causal reasoning can have both near- and far-reaching implications. On the near side, understanding the origins of causal reasoning and how it changes over the course of development can help us to better understand the process by which we continue to discover and learn about causal structure throughout adulthood. On the far side, given the ubiquity of causal reasoning in higher-order cognition, understanding causal reasoning early in development has implications for our understanding of the origins and development of our conceptual system more broadly.

References

Ahn, W., Kalish, C., Medin, D., & Gelman, S. (1995). The role of covariation versus mechanism information in causal attribution. Cognition, 54, 299–352.

The Development of Causal Reasoning Allan, L. (1980). A note on measurements of contingency between two binary variables in judgment tasks. Bulletin of the Psychonomic Society, 15, 147–149. Baillargeon, R. (2004). Infants’ physical world. Current Directions in Psychological Science, 13, 89–94. Ball, W. (1973). The perception of causality in the infant. Presented at the Meeting of the Society for Research in Child Development, Philadelphia, PA. Belanger, N., & Desrochers, S. (2001). Can 6-month-old infants process causality in differ­ ent types of causal events? British Journal of Developmental Psychology, 19, 11–21. Bonawitz, E., Denison, S., Chen, A., Gopnik, A., & Griffiths, T. L. (2011, July). A simple se­ quential algorithm for approximating Bayesian inference. In L. Carlson, C. Holscher, & T. Shipley (Eds.), Proceedings of the 33rd annual conference of the Cognitive Science Soci­ ety (pp. 2463–2468). Bonawitz, E., Denison, S., Gopnik, A., & Griffiths, T. L. (2014). Win-stay, lose-sample: A simple sequential algorithm for approximating Bayesian inference. Cognitive psychology, 74, 35–65. Bonawitz, E., Denison, S., Griffiths, T. L., & Gopnik, A. (2014). Probabilistic models, learn­ ing algorithms, and response variability: sampling in cognitive development. Trends in Cognitive Sciences, 18, 497–500. Bonawitz, E., Ferranti, D., Saxe, R., Gopnik, A., Meltzoff, A., Woodward, J., & Schulz, L. (2010). Just do it? Investigating the gap between prediction and action in toddlers’ causal inferences. Cognition, 115, 104–117. Bonawitz, E., Fischer, A., & Schulz, L. (2012). Teaching 3.5-Year-Olds to Revise Their Be­ liefs Given Ambiguous Evidence. Journal of Cognition and Development, 13(2), 266–280. Bonawitz, E., & Lombrozo, T. (2012). Occam’s Rattle: Children’s use of simplicity and probability to constrain inference. Developmental Psychology, 48, 1156–1164. Bonawitz, E., Shafto, P., Bridgers, S., Gonzalez, A., Yu, Y., & Gopnik, A. (in review) Chil­ dren rationally change their beliefs in response to neutral follow-up questions. Bonawitz, E., Shafto, P., Gweon, H., Goodman, N. D., Spelke, E., & Schulz, L. (2011). The double-edged sword of pedagogy: Instruction limits spontaneous exploration and discov­ ery. Cognition, 120, 322–330. (p. 695) Bonawitz, E., Ullman, T., Gopnik, A., & Tenenbaum, J. (2012, November). Sticking to the evidence? A computational and behavioral case study of micro-theory change in the do­ main of magnetism. In Development and learning and epigenetic robotics (ICDL), 2012 IEEE International Conference (pp. 1–6). IEEE.

Page 27 of 38

The Development of Causal Reasoning Bonawitz, E., van Schijndel, T., Friel, D., & Schulz, L. (2012). Children balance theories and evidence in exploration, explanation, and learning. Cognitive Psychology, 64, 215– 234. Buchsbaum, D., Bridgers, S., Weisberg, D., & Gopnik, A. (2012). The power of possibility: Causal learning, counterfactual reasoning, and pretend play. Philosophical Transactions of the Royal Society B: Biological Sciences, 367(1599), 2202–2212. Buchsbaum, D., Gopnik, A., Griffiths, T., & Shafto, P. (2011). Children’s imitation of causal action sequences is influenced by statistical and pedagogical evidence. Cognition, 120, 331–340. Bullock, M., Gelman, R., & Baillargeon, R. (1982). The development of causal reasoning. In W. J. Friedman (Ed.), The developmental psychology of time (pp. 209–254). New York: Academic Press. Callanan, M., & Oakes, L. (1992). Preschoolers’ questions and parents’ explanations: Causal thinking in everyday activity. Cognitive Development, 7, 213–233. Carey, S. (1985). Conceptual change in childhood. Cambridge, MA: MIT Press. Carey, S. (2009). The origin of concepts. New York: Oxford University Press. Cashon, C., Ha, O., Allen, C., & Barna, A. (2013). A U-shaped relation between sitting abil­ ity and upright face processing in infants. Child Development, 84, 802–809. Cheng, P. (1997). From covariation to causation: A causal power theory. Psychological Re­ view, 104, 367–405. Cheng, P. (2000). Causality in the mind: Estimating contextual and conjunctive power. In F. C. Keil & R. A. Wilson (Eds.), Explanation and cognition (pp. 227–253). Cambridge, MA: MIT Press. Cicchino, J. B., & Rakison, D. H. (2008). Producing and processing self-propelled motion in infancy. Developmental Psychology, 44, 1232–1241. Cohen, L., & Amsel, G. (1998). Precursors to infants’ perception of the causality of a sim­ ple event. Infant Behavior and Development, 21, 713–732. Cohen, L., Amsel, G., Redford, M., & Casasola, M. (1998). The development of infant causal perception. In A. Slater (Ed.), Perceptual development: Visual, auditory and speech perception in infancy (pp. 167–209). London: UCL Press. Cohen, L., Chaput, H., & Cashon, C. (2002). A constructivist model of infant cognition. Cognitive Development, 17, 1323–1343. Cohen, L., & Oakes, L. (1993). How infants perceive a simple causal event. Developmen­ tal Psychology, 29, 421–433. Page 28 of 38

The Development of Causal Reasoning Coley, J. D. (2000). On the importance of comparative research: The case of folkbiology. Child Development, 82–90. Cook, C., Goodman, N. D., & Schulz, L. E. (2011). Where science starts: Spontaneous ex­ periments in preschoolers’ exploratory play. Cognition, 120, 341–349. Csibra, G. & Gergely, G. (1998). The teleological origins of mentalistic action explana­ tions: A developmental hypothesis. Developmental Science, 1, 255–259. Cushman, F. (2008). Crime and punishment: Distinguishing the roles of causal and inten­ tional analyses in moral judgment. Cognition, 108, 353–380. Denison, S., Bonawitz, E., Gopnik, A., & Griffiths, T. (2013). Rational variability in children’s causal inference: The Sampling Hypothesis. Cognition, 126, 285–300. Fisher, C., Hall, D., Rakowitz, S., & Gleitman, L. (1994). When it is better to receive than to give: Syntactic and conceptual constraints on vocabulary growth. Lingua, 92, 333–375. Glymour, C. N. (2001). The mind’s arrows: Bayes nets and graphical causal models in psy­ chology. Cambridge, MA: MIT Press. Gonzalez, A. Shafto, P., Bonawitz, E., & Gopnik, A. (2012) Is that your final answer? The effects of neutral queries on children’s choices. In N. Miyake, D. Peebles, & R. Cooper (Eds.), Proceedings of the Thirty-fourth Cognitive Science Society (pp. 1614–1619). Goodman, N., Ullman, T., & Tenenbaum, J. (2011). Learning a theory of causality. Psycho­ logical Review, 118, 110. Gopnik, A. (2000). Explanation as orgasm and the drive for causal knowledge: The func­ tion, evolution, and phenomenology of the theory formation system. In F. C. Keil & R. A. Wilson (Eds.), Explanation and cognition (pp. 299–323). Cambridge, MA: MIT Press. Gopnik, A., & Glymour, C. (2002). Causal maps and Bayes nets: A cognitive and computa­ tional account of theory-formation. In P. Carruthers, S. Stich, & M. Siegal (Eds.), The cog­ nitive basis of science (pp. 117–132). Cambridge: Cambridge University Press. Gopnik, A., Glymour, C., Sobel, D., Schulz, L., Kushnir, T., & Danks, D. (2004). A theory of causal learning in children: Causal maps and Bayes nets. Psychological Review, 111, 1– 31. Gopnik, A., & Meltzoff, A. N. (1997). Words, thoughts, and theories. Cambridge, MA: MIT Press. Gopnik, A., & Sobel, D. (2000). Detecting blickets: How young children use information about novel causal powers in categorization and induction. Child Development, 71, 1205– 1222

Page 29 of 38

The Development of Causal Reasoning Gopnik, A., Sobel, D., Schulz, L., & Glymour, C. (2001). Causal learning mechanisms in very young children: Two, three, and four-year-olds infer causal relations from patterns of variation and covariation. Developmental Psychology, 37, 620–629. Griffiths, T., Sobel, D., Tenenbaum, J., & Gopnik, A. (2011). Bayes and blickets: Effects of knowledge on causal induction in children and adults. Cognitive Science, 35, 1407–1455. Gopnik, A., & Walker, C. M. (2013). Considering counterfactuals: The relationship be­ tween causal learning and pretend play. American Journal of Play, 6, 15–28. Gopnik, A. & Wellman, H. (1992). Why the child’s theory of mind is really a theory. Mind & Language, 7, 145–171. Gopnik, A., & Wellman, H. (2012). Reconstructing constructivism: Causal models, Bayesian learning mechanisms, and the theory theory. Psychological Bulletin, 138, 1085– 1108. Gweon, H., Pelton, H., Konopka, J. A., & Schulz, L. E. (2014). Sins of omission: Children selectively explore when teachers are under-informative. Cognition, 132, 335–341. Gweon, H., & Schulz, L. (2008). Stretching to learn: Ambiguous evidence and variability in preschoolers’ exploratory play. In B. C. Love, K. McRae, & V. M. Sloutsky (Eds.), Pro­ ceedings of the 30th Annual Conference of the Cognitive Science Society (pp. 570–574). Austin, TX: Cognitive Science Society. Gweon, H., & Schulz, L. E. (2011). 16-month-olds rationally infer causes of failed actions. Science, 332, 1524. Gweon, H., Tenenbaum, J., & Schulz, L. (2010). Infants consider both the sample and the sampling process in inductive generalization. Proceedings of the National Academy of Sciences, 107, 9066–9071. Hamlin, J. K. (2013). Moral judgment and action in preverbal infants and toddlers: Evi­ dence for an innate moral core. Current Directions in Psychological Science, 22, 186–193. (p. 696)

Harris, P. L. (2000). The work of the imagination. Oxford: Blackwell Publishing. Harris, P., German, T., & Mills, P. (1996). Children’s use of counter-factual thinking in causal reasoning. Cognition, 61, 233–259. Hickling, A. K., & Wellman, H. M. (2001). The emergence of children’s causal explana­ tions and theories: Evidence from everyday conversation. Developmental Psychology, 37, 668–683. Hume, D. (1748/1999). An enquiry concerning human understanding. Oxford: Oxford Uni­ versity Press. Jackendoff, R. (1990). Semantic structures. Cambridge, MA: MIT Press. Page 30 of 38

The Development of Causal Reasoning Jenkins, H., & Ward, W. (1965). Judgment of contingency between responses and out­ comes. Psychological Monographs: General and Applied, 79, 1–17. Karmiloff-Smith, A. and Inhelder, B. (1975). If you want to get ahead, get a theory. Cogni­ tion, 3, 195–212. Keil, F. (2006). Explanation and understanding. Annual Review of Psychology, 57, 227– 254. Kemp, C., Perfors, A., & Tenenbaum, J. (2007). Learning overhypotheses with hierarchical Bayesian models. Developmental Science, 10, 307–321. Klahr, D. (2000). Exploring science: The cognition and development of discovery process­ es. Cambridge, MA: MIT Press. Kosugi, D., & Fujita, K. (2002). How do 8-month-old infants recognize causality in object motion and that in human action? Japanese Psychological Research, 44, 66–78. Kosugi, D., Ishida, H., & Fujita, K. (2003). 10-month-old infants’ inference of invisible agent: Distinction in causality between object motion and human action. Japanese Psycho­ logical Research, 45, 15–24. Kotovsky, L., & Baillargeon, R. (1998). The development of calibration-based reasoning about collision events in young infants. Cognition, 67, 311–351. Kotovsky, L., & Baillargeon, R. (2000). Reasoning about collisions involving inert objects in 7.5-month-old infants. Developmental Science, 3, 344–359. Kuhn, D. (1989). Children and adults as intuitive scientists. Psychological Review, 96, 674–689. Kushnir, T., & Gopnik, A. (2005). Young children infer causal strength from probabilities and interventions. Psychological Science, 16, 678–683. Kushnir, T., & Gopnik, A. (2007). Conditional probability versus spatial contiguity in causal learning: Preschoolers use new contingency evidence to overcome prior spatial as­ sumptions. Developmental Psychology, 43, 186–196. Kushnir, T., Vredenburgh, C., & Schneider, L. A. (2013). “Who can help me fix this toy?:” The distinction between causal expertise and conventional knowledge guides preschool­ ers’ causal learning from informants. Developmental Psychology, 49(3), 446. Kushnir, T., Wellman, H. M., & Gelman, S. A.(2008). The role of preschoolers’ social un­ derstanding in evaluating the informativeness of causal interventions. Cognition, 107, 1084–1092. Kushnir, T., Wellman, H. M., & Gelman, S. A. (2009). A self-agency bias in children’s causal inferences. Developmental Psychology, 45, 597–603. Page 31 of 38

The Development of Causal Reasoning Kushnir, T., Xu, F. & Wellman, H. (2010). Young children use statistical sampling to infer the preferences of other people. Psychological Science, 21, 1134–1140. Legare, C. (2012). Exploring explanation: Explaining inconsistent information guides hy­ pothesis-testing behavior in young children. Child Development, 83, 173–185 Leslie, A. (1982). The perception of causality in infants. Perception, 11, 173–186. Leslie, A. (1984a). Spatiotemporal continuity and the perception of causality in infants. Perception, 13, 287–305. Leslie, A. (1984b). Infant perception of a manual pick-up event. British Journal of Develop­ mental Psychology, 2, 19–32. Leslie, A. (1995). A theory of agency. In D. Sperber, D. Premack, & A. Premack (Eds.), Causal cognition: A multidisciplinary debate (pp. 121–141). Oxford: Clarendon Press. Leslie, A., & Keeble, S. (1987). Do six-month-old infants perceive causality? Cognition, 25, 265–288. Levin, B., & Rappaport-Hovav, M. (1995). Unaccusativity: At the syntax-lexical semantics interface. Cambridge, MA: MIT Press. Libertus, K., & Needham, A. (2010). Teach to reach: The effects of active vs. passive reaching experiences on action and perception. Vision Research, 50, 2750–2757. Libertus, K., & Needham, A. (2011). Reaching experience increases face preference in 3month-old infants. Developmental Science, 14, 1355–1364. Lombrozo, T. (2007). Simplicity and probability in causal explanation. Cognitive Psycholo­ gy, 55, 232–257. Lopez, A., Atran, S., Coley, J. D., Medin, D. L., & Smith, E. E. (1997). The tree of life: Uni­ versal and cultural features of folkbiological taxonomies and inductions. Cognitive Psy­ chology, 32, 251–295. Lucas, C., Bridgers, S., Griffiths, T., & Gopnik, A. (2014). When children are better (or at least more open-minded) learners than adults: Developmental differences in learning the forms of causal relationships. Cognition, 131, 284–299. Luo, Y., Kaufman, L., & Baillargeon, R. (2009). Young infants’ reasoning about physical events involving self- and nonself-propelled objects. Cognitive Psychology, 58, 441–486. Ma, L. & Xu, F. (2011). Young children’s use of statistical sampling evidence to infer the subjectivity of preferences. Cognition, 120, 403–411. Mackintosh, N. (1975). A theory of attention: Variations in the associability of stimuli with reinforcement. Psychological Review, 82, 276–298.

Page 32 of 38

The Development of Causal Reasoning McClelland, J., & Thompson, R. (2007). Using domain-general principles to explain children’s causal reasoning abilities. Developmental Science, 10, 333–356. Meltzoff, A. N. (1995). Understanding the intentions of others: Reenactment of intended acts by 18-month-old children. Developmental Psychology, 31, 838–850. Meltzoff, A. (2007). “Like me”: A foundation for social cognition. Developmental Science, 10, 126–134. Michotte, A. (1963). The perception of causality. New York: Basic Books. Muentener, P., Bonawitz, E., Horowitz, A., & Schulz, L. (2012). Mind the gap: Investigat­ ing toddlers’ sensitivity to contact relations in predictive events. PLoS ONE, 7(4), e34061. Muentener, P., & Carey, S. (2010). Infants’ causal representations of state change events. Cognitive Psychology, 61, 63–86. Muentener, P., & Lakusta, L. (2011). The intention-to-CAUSE bias: Evidence from children’s causal language. Cognition, 119, 341–355. Muentener, P. & Schulz, L. (2012). What doesn’t go without saying: Communication, in­ duction, and exploration. Language, Learning, and Development, 8, 61–85. Muentener, P., & Schulz, L. (2014). Toddlers infer unobserved causes for spontaneous events. Frontiers in Psychology, 5, 1496. (p. 697) Müller-Lyer, F. (1889). Optische Urteilstäuschunge Archiv für Physiologie, 2(Suppl.) pp. 263–270. Naigles, L. (1990). Children use syntax to learn verb meanings. Journal of Child Language, 17, 357–374. Nazzi, T., & Gopnik, A. (2003). Sorting and acting with objects in early childhood: An ex­ ploration of the use of causal cues. Cognitive Development, 18, 219–237. Needham, A., Barrett, T., & Peterman, K. (2002). A pick-me-up for infants’ exploratory skills: Early simulated experiences reaching for objects using “sticky mittens” enhances young infants’ object exploration skills. Infant Behavior and Development, 25, 279–295. Newman, G., Choi, H., Wynn, K., & Scholl, B. (2008). The origins of causal perception: Ev­ idence from postdictive processing in infancy. Cognitive Psychology, 57, 262–291. Notaro, P., Gelman, S., & Zimmerman, M. (2001). Children’s understanding of psy­ chogenic bodily reactions. Child Development, 72, 444–459. Oakes, L. (1994). Development of infants’ use of continuity cues in their perception of causality. Developmental Psychology, 30, 869–879.

Page 33 of 38

The Development of Causal Reasoning Oakes, L., & Cohen, L. (1990). Infant perception of a causal event. Cognitive Development, 5, 193–207. Pearl, J. (1998). Graphs, causality, and structural equation models. Sociological Methods Research, 27, 226–284. Pearl, J. (2000). Causality: Models, reasoning and inference. Cambridge: Cambridge Uni­ versity Press. Piaget, J. (1954). The construction of reality in the child. New York: Basic Books. Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement.Classical conditioning II: Cur­ rent research and theory, 2, 64–99. Ross, N., Medin, D., Coley, J. D., & Atran, S. (2003). Cultural and experiential differences in the development of folkbiological induction. Cognitive Development, 18, 25–47. Rozenblit, L., & Keil, F. (2002). The misunderstood limits of folk science: An illusion of ex­ planatory depth. Cognitive Science, 26(5), 521–562. Saffran, J., Aslin, R., & Newport, E. (1996). Statistical learning by 8-month-old infants. Science, 274, 1926–1928. Saxe, R., & Carey, S. (2006). The origin of the idea of cause: Critical reflections on Michotte’s theory with evidence from infancy. Acta Psychologica, 123, 144–165. Saxe, R., Tenenbaum, J., & Carey, S. (2005). Secret agents: Inferences about hidden caus­ es by 10- and 12-month-old infants. Psychological Science, 16, 995–1001. Saxe, R., Tzelnic, T., & Carey, S. (2007). Knowing who dunnit: Infants identify the causal agent in an unseen causal interaction. Developmental Psychology, 43, 149–158. Schauble, L. (1990). Belief revision in children: The role of prior knowledge and strate­ gies for generating evidence. Journal of Experimental Child Psychology, 49(1), 31–57. Schauble, L. (1996). The development of scientific reasoning in knowledge-rich contexts. Developmental Psychology, 32(1), 102. Schlottmann, A. (2001). Perception versus knowledge of cause and effect in children: When seeing is believing. Current Directions in Psychological Science, 10(4), 111–115. Schlottmann, A., Ray, E., Mitchell, A. & Demetriou, N. (2006). Perceived social and physi­ cal causality in animated motions: Spontaneous reports and ratings. Acta Psychologica, 123, 112–143. Scholl, B., & Nakayama, N. (2002). Causal capture: Contextual effects on the perception of collision events. Psychological Science, 13, 493–498.


Scholl, B., & Nakayama, N. (2004). Illusory causal crescents: Misperceived spatial relations due to perceived causality. Perception, 33, 455–470.
Scholl, B., & Tremoulet, P. (2000). Perceptual causality and animacy. Trends in Cognitive Sciences, 4, 299–309.
Schulz, L. (2012). The origins of inquiry: Inductive inference and exploration in early childhood. Trends in Cognitive Sciences, 16, 382–389.
Schulz, L. E., & Bonawitz, E. B. (2007). Serious fun: Preschoolers engage in more exploratory play when evidence is confounded. Developmental Psychology, 43, 1045–1050.
Schulz, L., Bonawitz, E. B., & Griffiths, T. (2007). Can being scared make your tummy ache? Naive theories, ambiguous evidence and preschoolers' causal inferences. Developmental Psychology, 43, 1124–1139.
Schulz, L., Goodman, N., Tenenbaum, J., & Jenkins, C. (2008). Going beyond the evidence: Abstract laws and preschoolers' responses to anomalous data. Cognition, 109, 211–223.
Schulz, L., & Gopnik, A. (2004). Causal learning across domains. Developmental Psychology, 40, 162–176.
Schulz, L., Gopnik, A., & Glymour, C. (2007). Preschool children learn about causal structure from conditional interventions. Developmental Science, 10, 322–332.
Schulz, L., Hooppell, K., & Jenkins, A. (2008). Judicious imitation: Young children imitate deterministic actions exactly, stochastic actions more variably. Child Development, 79, 395–410.
Schulz, L., Kushnir, T., & Gopnik, A. (2007). Learning from doing: Intervention and causal inference. In A. Gopnik & L. Schulz (Eds.), Causal learning: Psychology, philosophy, and computation (pp. 67–85). Oxford: Oxford University Press.
Schulz, L. E., & Sommerville, J. (2006). God does not play dice: Causal determinism and children's inferences about unobserved causes. Child Development, 77, 427–442.
Schulz, L., Standing, H., & Bonawitz, E. B. (2008). Word, thought and deed: The role of object labels in children's inductive inferences and exploratory play. Developmental Psychology, 44, 1266–1276.
Seiver, E., Gopnik, A., & Goodman, N. (2013). Did she jump because she was the big sister or because the trampoline was safe? Causal inference and the development of social attribution. Child Development, 84, 443–454.
Shafto, P., Goodman, N. D., & Griffiths, T. L. (2014). A rational account of pedagogical reasoning: Teaching by, and learning from, examples. Cognitive Psychology, 71, 55–89.
Shanks, D. (1995). The psychology of associative learning. Cambridge: Cambridge University Press.

Shultz, T. (1982a). Rules of causal attribution. Monographs of the Society for Research in Child Development, 47.
Shultz, T. (1982b). Causal reasoning in the social and nonsocial realms. Canadian Journal of Behavioural Science, 14(4), 307.
Smith, L., & Yu, C. (2008). Infants rapidly learn word-referent mappings via cross-situational statistics. Cognition, 106, 1558–1568.
Sobel, D., & Kirkham, N. (2006). Bayes nets and blickets: Infants developing representations of causal knowledge. Developmental Science, 10, 298–306.
Sobel, D. M., & Kirkham, N. Z. (2007). Bayes nets and babies: Infants' developing statistical reasoning abilities and their representation of causal knowledge. Developmental Science, 10, 298–306.
(p. 698)
Sobel, D. M., & Munro, S. E. (2009). Domain generality and specificity in children's causal inference about ambiguous data. Developmental Psychology, 45, 511–524.
Sobel, D., Tenenbaum, J., & Gopnik, A. (2004). Children's causal inferences from indirect evidence: Backwards blocking and Bayesian reasoning in preschoolers. Cognitive Science, 28, 303–333.
Sobel, D., Yoachim, C., Gopnik, A., Meltzoff, A., & Blumenthal, E. (2007). The blicket within: Preschoolers' inferences about insides and causes. Journal of Cognition and Development, 8, 159–182.
Sommerville, J., Woodward, A., & Needham, A. (2005). Action experience alters 3-month-old infants' perception of others' actions. Cognition, 96, B1–B11.
Spelke, E. (1990). Principles of object perception. Cognitive Science, 14, 29–56.
Spelke, E., Breinlinger, K., Macomber, J., & Jacobson, K. (1992). Origins of knowledge. Psychological Review, 99, 605–632.
Spelke, E., & Kinzler, K. (2007). Core knowledge. Developmental Science, 10, 89–96.
Spelke, E., Phillips, A., & Woodward, A. (1995). Infants' knowledge of object motion and human action. In D. Sperber, D. Premack, & A. Premack (Eds.), Causal cognition: A multidisciplinary debate. Oxford: Oxford University Press.
Spirtes, P., Glymour, C., & Scheines, R. (1993). Causation, prediction, and search. Cambridge, MA: MIT Press.
Steyvers, M., Tenenbaum, J. B., Wagenmakers, E. J., & Blum, B. (2003). Inferring causal networks from observations and interventions. Cognitive Science, 27(3), 453–489.


Subbotsky, E. (2001). Causal explanations of events by children and adults: Can alternative causal modes coexist in one mind? British Journal of Developmental Psychology, 19, 23–45.
Subbotsky, E., & Quinteros, G. (2002). Do cultural factors affect causal beliefs? Rational and magical thinking in Britain and Mexico. British Journal of Psychology, 93, 519–543.
Tenenbaum, J. B., & Griffiths, T. L. (2003). Theory-based causal inference. Advances in Neural Information Processing Systems, 43–50.
van Schijndel, T. J., Visser, I., van Bers, B. M., & Raijmakers, M. E. (2015). Preschoolers perform more informative experiments after observing theory-violating evidence. Journal of Experimental Child Psychology, 131, 104–119.
Walker, C. M., & Gopnik, A. (2013). Causality and imagination. In M. Taylor (Ed.), The Oxford handbook of the development of imagination (pp. 342–358). Oxford: Oxford University Press.
Weisberg, D. S., & Gopnik, A. (2013). Pretense, counterfactuals, and Bayesian causal models: Why what is not real really matters. Cognitive Science, 37, 1368–1381.
Wellman, H. M., & Gelman, S. A. (1992). Cognitive development: Foundational theories of core domains. Annual Review of Psychology, 43(1), 337–375.
Wellman, H. M., Hickling, A. K., & Schult, C. A. (1997). Young children's psychological, physical, and biological explanations. New Directions for Child and Adolescent Development, 75, 7–26.
White, P. (1995). The understanding of causation and the production of action: From infancy to adulthood. Hillsdale, NJ: Lawrence Erlbaum Associates.
White, P. (2014). Singular clues to causality and their use in human causal judgment. Cognitive Science, 38, 38–75.
Woodward, A. (1998). Infants selectively encode the goal object of an actor's reach. Cognition, 69, 1–34.
Woodward, J. (2003). Making things happen. New York: Oxford University Press.
Woodward, J. (2007). Interventionist theories of causation in psychological perspective. In A. Gopnik & L. Schulz (Eds.), Causal learning: Psychology, philosophy, and computation (pp. 19–36). New York: Oxford University Press.
Woodward, A. (2009). Infants' learning about intentional action. In A. Woodward & A. Needham (Eds.), Learning and the infant mind (pp. 227–248). Oxford University Press.
Woodward, A., Phillips, A., & Spelke, E. (1993). Infants' expectations about the motion of animate versus inanimate objects. In Proceedings of the 15th annual meeting of the Cognitive Science Society. Boulder, CO.

Wu, R., Gopnik, A., Richardson, D., & Kirkham, N. (2011). Infants learn about objects from statistics and people. Developmental Psychology, 47, 1220–1229.
Xu, F., & Tenenbaum, J. (2007a). Word learning as Bayesian inference. Psychological Review, 114, 245–272.
Xu, F., & Tenenbaum, J. (2007b). Sensitivity to sampling in Bayesian word learning. Developmental Science, 10, 288–297.

Notes:

(1.) The term "agents" can refer to three distinct kinds of individuals. The common, general use of "agents" in developmental research refers to intentional beings, such as people. We use the abstract term "agents," rather than a more specific term such as "people," since studies have shown that infants and adults alike are able to engage in social reasoning about individuals that behave like people (e.g., an object that appears to move on its own and change direction) despite not looking like a person. "Dispositional agents" refers to individuals that are capable of intentional and causal actions; these capacities are enduring properties of the individual. In contrast, "situational agents" are the causes of an outcome in a given situation, but do not necessarily have enduring causal powers outside of that situation (e.g., a billiard ball can cause another ball to move, but is not independently capable of causing motion). We refer to all three notions of agency in this chapter and specify in each case which notion is intended ("agents," "dispositional agents," "situational agents").

(2.) As discussed earlier, the term "situational agent" refers to an individual that causes an outcome in a given situation, without specifying whether the ability to cause an outcome is an enduring property of the individual. Similarly, the term "situational patient" refers to an individual that undergoes a change in a given causal event, without specifying whether the individual is incapable of causing another event to occur. "Patient" is used instead of "effect" to distinguish between the individual that undergoes a change ("patient") and the change itself ("effect").

Paul Muentener

Department of Psychology, Tufts University, Medford, Massachusetts, USA

Elizabeth Bonawitz

Department of Psychology, Rutgers University–Newark, Newark, New Jersey, USA



Causal Reasoning in Non-Human Animals
Christian Schloegl and Julia Fischer
The Oxford Handbook of Causal Reasoning
Edited by Michael R. Waldmann
Print Publication Date: Jun 2017
Subject: Psychology, Cognitive Psychology
Online Publication Date: May 2017
DOI: 10.1093/oxfordhb/9780199399550.013.36

Abstract and Keywords

One goal of comparative cognitive studies is to achieve a better understanding of the selective pressures and constraints that play a role in cognitive evolution. This chapter focuses on the question of causal reasoning in animals, which has mainly been investigated in tool-using and large-brained species. Our survey reveals that numerous animal species appear to be sensitive to violations of causality and may even be tuned to attend to causally relevant features. This, in turn, may facilitate causal learning. The ability to draw logical conclusions and make causal deductions, however, seems to be restricted to a few species and limited to (ecologically) relevant contexts. It seems warranted to reject the traditional associationist view that non-human animals lack any understanding of causality, but convincing evidence for human-like abilities is lacking. For instance, animals do not appear to understand the causal structure of interventions.

Keywords: animals, causal learning, evolution of cognition, tool use

Whether or not we grant non-human animals (hereafter: "animals") the ability to think, or reason, depends very much on the time we live in and the scientific school we belong to (Menzel & Fischer, 2011; Wild, 2008). Radical US behaviorists, for instance, questioned whether the "inner processes" that underlie animal behavior were amenable to scientific investigation (Watson, 1913); they assumed that animal behavior could largely be explained by the formation of stimulus–response patterns. Ethologists, in contrast, were much more interested in innate components, such as instincts, and, to a lesser degree, emotions. The cognitive revolution of the 1960s paved the way for the view that animals are capable of storing, processing, and retrieving information, and thus for cognitive approaches to animal behavior (Menzel & Fischer, 2011). But even before this turn of the tide, a handful of scientists who studied animal intelligence, such as Wolfgang Köhler (1917/1963) and Robert Yerkes (1916), were interested in the reasoning abilities of animals.

The purpose of this chapter is to provide an overview of the studies on animal causal reasoning. After a brief introduction to the history of the field and the terminology used, we discuss the developments over the last two decades. We will begin with an overview of studies on important prerequisites for causal reasoning, namely the ability to tune into causally relevant features and sensitivity to disruptions of causal regularities. Next, we will discuss inference by exclusion and various forms of reasoning about object–object relationships before turning toward reasoning about the outcome of one's own actions. As the reader will see, not all animal taxa (i.e., groups of phylogenetically related species) have attracted the same amount of attention. Great apes tend to be the stars of the show, but considerable work has also been done in monkeys, and more recently in corvids and dogs. We conclude with a discussion of the limitations we face when reconstructing the evolution of cognitive abilities, and we provide some suggestions for future research.

(p. 700)

A Little Bit of History

Wolfgang Köhler (1917/1963) was interested in the problem-solving abilities of chimpanzees (Pan troglodytes). In a series of now-famous experiments, he was able to demonstrate that chimpanzees can combine tools to gain access to otherwise out-of-reach food. In another classic experiment, the animals stacked several boxes on top of each other to reach a banana attached to a string several meters above their heads. Köhler did not use the term "causal reasoning," however, and rather spoke of "insight," although he clearly assumed that the chimpanzees understood causal relationships between objects. Similarly, other scientists who studied animals' understanding of causal relationships at that time did not refer to causal understanding or reasoning (e.g., Grether & Maslow, 1937; Klüver, 1933).

Apart from these initial studies, the topic of causal reasoning was largely neglected until Premack and Premack (1994) published their seminal study on causal understanding in chimpanzees. The authors conducted a series of experiments to distinguish between three "[l]evels of causal understanding in chimpanzees and children," with causal reasoning regarded as the deepest level, defined as an individual "solving problems in which he sees the outcome of a process but not its cause, and must infer or reconstruct the missing events" (p. 348). At the intermediate level, the subject would need to be able to decompose "intact causal sequences" (p. 348), such as an actor using a tool to manipulate an object, and to label the different components accordingly. At the "most superficial level" (p. 348), an individual would be able to "complete an incomplete representation of a causal action, by selecting the correct alternative" (p. 348).

Premack and Premack then set out to test whether their chimpanzees would reach the highest level. In the first task, the apes first learned to run down a path to a spot where food was hidden. After they had comprehended this task, the experimenter hid a rubber snake in the hiding spot at random intervals on 15% of the trials, which greatly disturbed the animals. In the test, the subject was able to see another chimpanzee, which had just completed the task, either after it had encountered the food or after it had encountered a snake. At stake was the question of whether the subject would be able to infer from the (positive or negative) emotional state of the other chimpanzee what would be found in the hide. None of the four chimpanzees was able to do this.

In the second task, the chimpanzees first observed how an experimenter hid an apple and a banana under two boxes. The subject was then distracted from the boxes for two minutes before seeing another person eating either an apple or a banana. Here, the question was whether the chimpanzee would infer that the second person had raided one of the boxes and that it should therefore approach the alternative box to obtain a reward. One of the four test subjects solved this second task instantaneously, while one always chose the container that held the food the other person had been eating. The other two chimpanzees erred on the first trials, but then started to solve the task. Hence, the results of only one chimpanzee suggest that it is within the species' realm to reach the highest level of causal reasoning.

From these findings, Premack and Premack (1994) concluded that learning may be found in numerous species, but reasoning, at least at the highest level, only in a very few. But possibly, reasoning about the cause of an emotional state is very different from reasoning about the causal relationships resulting in the eating of fruit. Failures to reason may therefore be due to a lack of understanding of a very specific causal relationship, rather than a lack of the ability to reason per se. Interestingly, in violation-of-expectancy paradigms, human infants detect some violations of causality at an earlier age than others (Baillargeon, 2004; Muentener & Bonawitz, Chapter 33 in this volume); similarly, animals may not represent all causal relationships as causal and thus may not be able to reason about them.

Prerequisites for Causal Reasoning: Identifying Causally Relevant Features and Detecting Violations of Causality

A first step in the exploration of causal reasoning in animals is to assess whether they are sensitive to causal relationships. O'Connell and Dunbar (2005) showed chimpanzees and bonobos (Pan paniscus) videos of causally plausible and implausible events. For instance, in one set of videos, a banana was either lifted by a hand or was flying upward before the hand touched it. If the apes had seen the causally plausible event several times until they lost interest ("habituation") and were then presented with the implausible event, they looked longer at the screen than if they had been shown the videos in reverse order. The authors interpreted this as sensitivity to causal plausibility (see Cacchione & Krist, 2004, for similar findings). Hanus and Call (2011) found that (p. 701) chimpanzees learn discriminations faster if these are based on causal rather than on arbitrary cues; for instance, when searching for food in opaque bottles, they learned to discriminate between lighter and heavier bottles faster than between bottles of different colors, suggesting that chimpanzees are tuned to attend to causally relevant cues.


Among monkeys, capuchin monkeys (Sapajus libidinosus) have received the most attention. This species is a highly proficient tool user (Fragaszy, Visalberghi, & Fedigan, 2004) and uses stones as hammers to crack nuts, which it positions on hard surfaces ("anvils"). Capuchins are therefore seen as prime candidates for some understanding of causality. Like chimpanzees, capuchin monkeys quickly learn in nut-cracking tasks to attend to the weight of tools and to select heavier stones as hammers (Schrauf, Call, Fuwa, & Hirata, 2012; Schrauf, Huber, & Visalberghi, 2008). Visalberghi, Fragaszy, and colleagues demonstrated that the monkeys also consider several other causally relevant features when selecting tools (i.e., the mass of the stone, its friability, the distance to transport, and features of the anvils; Fragaszy, Greenberg, et al., 2010; Fragaszy, Pickering, et al., 2010; Liu et al., 2011; Massaro, Liu, Visalberghi, & Fragaszy, 2012; Visalberghi et al., 2009).

Besides primates, corvids (ravens, crows, magpies, and jays) are famous for their large brains (Emery & Clayton, 2004) and advanced cognitive abilities (e.g., Bugnyar & Heinrich, 2005; Clayton & Dickinson, 1998; Dally, Emery, & Clayton, 2006). Rooks (Corvus frugilegus), for instance, showed indications of surprise and looked longer at pictures of causally implausible spatial relationships (e.g., objects suspended in mid-air above a surface) than at plausible illustrations (Bird & Emery, 2010). New Caledonian crows (Corvus moneduloides) are among the most prolific tool users in the animal kingdom (Hunt, 1996, 2000; Weir, Chappell, & Kacelnik, 2002); like capuchin monkeys (Fragaszy, Greenberg, et al., 2010; Visalberghi et al., 2009) and chimpanzees (Sabbatini et al., 2012), they attend to causally relevant features and select tools of appropriate, though not necessarily optimal, length (Chappell & Kacelnik, 2002) and diameter (Chappell & Kacelnik, 2004). Similar to chimpanzees (Hanus & Call, 2011), these crows also quickly learned to select causally relevant tools when presented with a novel situation (Taylor et al., 2011; see Jelbert, Taylor, Cheke, Clayton, & Gray, 2014, for similar findings in a different task).

While the aforementioned examples focused mainly on physical causal relationships, some studies have investigated sensitivity to cause–effect relationships in social actions and events. For instance, chimpanzees and bonobos were shown a video in which either a human pushed another person from a chair to obtain a fruit, or one person simply fell off a chair and the fruit moved by itself to the other person (O'Connell & Dunbar, 2005). In another video, they either saw chimpanzees hunting and killing a colobus monkey or saw the same video played backward. In test trials, the apes looked longer if they had been habituated to the causally plausible video than vice versa, again suggesting an understanding of causal plausibility and surprise upon seeing an implausible event. In a study on free-ranging African elephants (Loxodonta africana), Bates et al. (2008) placed a mix of fresh urine and earth in the path of traveling groups and found that the animals inspected the urine samples longer if they stemmed from a female traveling behind them, indicating that they were surprised to detect the scent of this animal in a causally impossible location.

An indication of an innate preference for "causal agents" comes from a study with chicks (Gallus gallus).
Right after hatching, chicks are imprinted on their parents and begin to follow them constantly. In the absence of their parents, chicks can also be imprinted on other individuals or even on moving objects. Mascalzoni, Regolin, and Vallortigara (2010) showed freshly hatched chicks a so-called Michotte launching event, in which a moving object A touches another object B, which subsequently begins to move. Such launching events elicit strong impressions of causality in humans (i.e., that A causes B to move; White, Chapter 14 in this volume). Even though both objects had moved identical distances, the chicks preferred object A when subsequently presented with a choice between the two objects; in other words, they preferred the self-propelled over the non-self-propelled object. As these chicks did not have any previous experience with moving objects, it appears that chicks have an innate sensitivity for causal agents.

Taken together, these studies indicate that several species are sensitive to causally relevant features or cues in their environment (e.g., the weight or length of a tool), may be innately tuned to at least some of these features, and act as if surprised when causally sound relations are violated (e.g., objects floating in mid-air). Nevertheless, there is considerable variation between tasks. In the study of O'Connell and Dunbar (2005) mentioned earlier, the dishabituation in the test trial was strongest for the (p. 702) hunting sequence shown in reversed order and weakest for a banana suspended in mid-air. In the work of Cacchione and Krist (2004), the apes looked equally long at a scene in which an apple was resting on a horizontal board and at a scene in which the apple was touching a vertical board but was otherwise floating. Interestingly, many species also fail at solidity tasks. The general principle here is that a reward cannot pass through a solid barrier, and the subjects are asked to search for the reward; most subjects, however, including non-human primates and dogs (Canis familiaris), often search behind the barrier, suggesting that they assume the reward could have passed through it (see Müller, Riemer, Range, & Huber, 2014, and references therein, and also Kundey, De Los Reyes, Taglang, Baruch, & German, 2010, for positive evidence in dogs).

There are also notable species differences. In a task assessing whether capuchin monkeys and chimpanzees are attentive to the functional, causally relevant features of tools, the animals were trained to insert a stick into a tube to obtain a reward hidden inside the tube (Sabbatini et al., 2012; see Figure 34.1). The animals could choose between sticks of different lengths (with only the longest stick long enough to reach the reward) and with different handles. During a transfer phase, the handles were switched between the tools. Only the chimpanzees attended to the functional features and continued to use the tool of the appropriate length, whereas the capuchin monkeys needed considerably more training to do so. Capuchin monkeys tested by Fujita, Kuroshima, and Asai (2003) had to choose between two rewards and two hooked tools for pulling in the reward (a design based on the original study by Brown, 1990). On each trial, the location of the reward, the presence of a trap into which a reward could fall, and the orientation of the hook ensured that only one tool was functional; hence, only the choice of the correct tool would enable the monkeys to obtain the reward.
The monkeys, however, were not able to incorporate the orientation of the hook, the location of the reward, and the location of the trap into their decisions at the same time. These limitations stand in stark contrast to the monkeys' performance in the nut-cracking tasks mimicking a natural foraging situation (see earlier discussion). Possibly, the capuchin monkeys' attention to causally relevant features is restricted to, and attuned to, their natural tool-using behavior.

Causal Reasoning

Given that at least some species are sensitive to causally relevant features, one may probe whether they are indeed able to make causal inferences, that is, whether animals can use their understanding of (certain) causal relationships to make deductions about unobservable reasons for outcomes they have observed. Within the faculty of reason, differences seem to exist in the complexity of the tasks, as well as in the mental processes required to solve them. In the following, we discuss different types of causal reasoning.

Inference by Exclusion

One relatively simple form of causal reasoning is the ability to select the correct option by excluding potential alternatives. This corresponds to the "superficial level" identified by Premack and Premack (1994) and has been studied extensively in animals. To our knowledge, the first experiment on inference by exclusion was conducted in the 1930s to test "insight" in monkeys. (p. 703) Grether and Maslow (1937) confronted seven different species with two cups, with food hidden underneath one of them. The empty cup was then lifted to show that nothing was underneath it. Some subjects instantaneously selected the alternative cup, suggesting that they grasped that because one cup was empty, the reward must be hidden under the other.

Figure 34.1 (a) A capuchin monkey solving the tube task of Sabbatini et al. (2012); a reward is located at the end of the tube; if pushed with the tool, the reward drops onto the sloped board underneath and can be accessed by the monkey. (b) (1) Tools with different handles as used in the training phase; (2) tools with the new handles used in one transfer phase; for details, see Sabbatini et al. (2012). Photo (a) by Elisabetta Visalberghi; reprinted with permission from Sabbatini et al. (2012).

In the 2000s, the research community rediscovered Grether and Maslow's (1937) task, and a modified version quickly became popular in animal cognition research. Call (2004), for instance, again hid a food reward in one of two cups, but now confronted the subjects not only with a condition in which they were informed that one cup was empty (so that the exclusion could be made), but also with control conditions. In these, the animals saw either the content of the baited cup, of both cups, or of neither cup. In a plethora of studies, success in this task has been demonstrated in the great apes (Call, 2004; Hill, Collier-Baker, & Suddendorf, 2011), capuchin monkeys (Paukner, Huntsberry, & Suomi, 2009; Sabbatini & Visalberghi, 2008), olive baboons (Papio hamadryas anubis) (Schmitt & Fischer, 2009), ravens (Corvus corax) (Schloegl et al., 2009), and Clark's nutcrackers (Nucifraga columbiana) (Tornick & Gibson, 2013).

Although most of the tested species solved this task, performance differs, as several species seem to be highly susceptible to modifications of the experimental procedures. Sheep (Ovis orientalis aries) and dwarf goats (Capra aegagrus hircus) were distracted by the hand movements of the experimenter when he lifted the empty cup to reveal its content. When the setup was changed to control for the distracting movement cues, a few individual goats could solve the task (Nawroth, von Borell, & Langbein, 2014; see Erdöhegyi, Topál, Virányi, & Miklósi, 2007, for parallel results in dogs). Similarly, only one of six carrion crows (Corvus corone corone) managed to avoid the empty cup when its content was shown, whereas two birds significantly preferred the empty cup (the three other birds were indifferent). When the experimenters controlled for the movement during the lifting of the cup, the birds that had previously been distracted now managed to avoid the empty cup, whereas the indifferent birds continued to choose at random (Mikolasch, Kotrschal, & Schloegl, 2012). Kea (Nestor notabilis), a parrot species, failed in the visual task but did not appear to be distracted by hand movements (Schloegl et al., 2009); a follow-up study using a different paradigm suggested, however, that the experimenter's manipulations may nevertheless have prevented the kea from solving the task (O'Hara, Gajdon, & Huber, 2012).

Although it was originally assumed that success in this task is based on "inferential reasoning by exclusion" (e.g., Call, 2004; Erdöhegyi et al., 2007), this interpretation is no longer upheld. To conclude that the animals indeed reason, it would be necessary that they first exclude the empty cup and then logically infer that the other cup must be baited (Aust, Range, Steurer, & Huber, 2008; Paukner et al., 2009; Schmitt & Fischer, 2009). Unfortunately, the animals could also simply avoid the empty cup and choose the other one. This "avoidance" strategy is now considered the more parsimonious explanation for success in these tasks.
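One way to see why mere avoidance cannot be separated from genuine exclusion in the standard two-cup design is to simulate both strategies. The following minimal sketch (our illustration, written in Python; it is not taken from any of the cited studies) implements a "reasoner" that infers the baited cup by exclusion and an "avoider" that merely shuns the cup it has seen to be empty; with only two cups, the two strategies produce identical choices on every trial, so choice data alone cannot discriminate between them.

import random

def reasoner(cups, shown_empty):
    # Exclusion inference: the cup shown to be empty cannot hold the bait,
    # so the reward must be under the remaining cup.
    remaining = [cup for cup in cups if cup != shown_empty]
    return remaining[0]  # with two cups, exclusion singles out one option

def avoider(cups, shown_empty):
    # No inference about the bait: simply avoid the cup seen to be empty
    # and pick at random among whatever is left.
    return random.choice([cup for cup in cups if cup != shown_empty])

cups = ("left", "right")
agreements = sum(
    reasoner(cups, shown) == avoider(cups, shown)
    for shown in (random.choice(cups) for _ in range(1000))
)
print(agreements)  # 1000: the two strategies are behaviorally indistinguishable

Separating the two accounts therefore requires designs in which exclusion and mere avoidance make different predictions, as in the acoustic version discussed next.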

Reasoning About Object–Object Relationships: Noise as Causal Predictor

A more complex level would be achieved if subjects were capable of (mentally) completing a cause–effect relationship (i.e., of inferring the cause of an effect, or vice versa). To tackle this question, several tasks have been developed, among them an acoustic version of the "empty cup" task (Call, 2004). Here, the content of the cups must be inferred from the sound the bait makes: after baiting, the experimenter shakes the baited cup, the empty cup, or both; a rattling noise indicates the presence of the reward, a silent shaking its absence. Importantly, several control conditions are required, among them one in which the cups are not shaken but the rattling noise is played back (e.g., Call, 2004; Schloegl, Schmidt, Boeckle, Weiß, & Kotrschal, 2012). The reasoning behind this control condition is that the animals should not select the noisy cup if they understood that, in the test condition, the noise is a causal consequence of the food's presence in the cup and of its movement due to the shaking. In contrast, the animals should select the noisy cup if they had merely learned an association between the noise and the presence of the food.

So far, only the great apes, grey parrots, and a group of noise-experienced capuchin monkeys chose the other cup when they witnessed a silently shaken cup, and passed all the control conditions (Call, 2004; Hill et al., 2011; Sabbatini & Visalberghi, 2008; Schloegl et al., 2012). In tests with wild boars (Sus scrofa scrofa) and pigs (Sus scrofa domesticus), only the domestic pigs that lived in an enriched environment with frequent human contact preferred the "noisy" cup, but controls suggested that they responded to the shaking movement, not to the noise (Albiach-Serrano, Bräuer, Cacchione, Zickert, & Amici, 2012). (p. 704) Bräuer, Kaminski, Riedel, Call, and Tomasello (2006) reported similar results for dogs, which also chose a non-moving cup from which an arbitrary, causally irrelevant sound was played. Monkeys have received considerable attention but are usually unsuccessful (Heimbauer, Antworth, & Owren, 2012; Paukner et al., 2009; Schmitt & Fischer, 2009) or require extensive experience (Sabbatini & Visalberghi, 2008). This suggests that only very few species solve this task through reasoning, whereas most fail entirely or attend to other cues (e.g., the shaking movement). For some species, this may have to do with sensory differences: Plotnik, Shaw, Brubaker, Tiller, and Clayton (2014) recently suggested that the acoustic domain is inappropriate for tests with Asian elephants (Elephas maximus), but that these animals may solve such tasks if olfactory cues are provided. Similarly, monkeys' problems with this task reflect the general pattern that these animals have enormous difficulties with operant conditioning tasks involving noises (in contrast to visual stimuli); the reason for this remains unclear, as monkeys in the wild rapidly learn to associate specific sounds with the appearance of predators (Fischer, 2002). It is also noteworthy that one study found tentative evidence for better performance in two lemur species when acoustic rather than visual cues were provided (Maille & Roeder, 2012).
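The logic of the design can be summarized as a schematic prediction table. The sketch below is our reading of the design, not the original authors' analysis; the condition labels and decision rules are ours. It contrasts an agent that treats the noise as a causal consequence of food plus shaking with an agent that has merely associated noise with food; the playback control is the condition on which the two accounts diverge.

# Each condition is described from the perspective of the cup that is
# manipulated: was it shaken, and did the subject hear a rattling noise?

def causal_agent(shaken, noise):
    if shaken and noise:
        return "choose noisy cup"   # noise is an effect of bait + movement
    if shaken and not noise:
        return "choose other cup"   # a silently shaken cup must be empty
    return "indifferent"            # noise without shaking is not diagnostic

def associative_agent(shaken, noise):
    # The learned pairing noise -> food applies regardless of shaking.
    return "choose noisy cup" if noise else "indifferent"

conditions = [
    ("baited cup shaken, rattling", True, True),
    ("empty cup shaken, silent", True, False),
    ("playback noise, no shaking", False, True),
]
for label, shaken, noise in conditions:
    print(f"{label:30s} causal: {causal_agent(shaken, noise):18s} "
          f"associative: {associative_agent(shaken, noise)}")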

Reasoning About Object–Object Relationships: The Predictive Value of Covers

That monkeys have rather limited skills in tasks requiring a deeper understanding of cause–effect relationships than simply avoiding an empty cup also became evident in a study on long-tailed macaques (Macaca fascicularis). These monkeys had to find food that was hidden either under a small board or under a hollow cup (Schloegl, Waldmann, & Fischer, 2013). If the reward was hidden under the board (which consequently was inclined), the monkeys located the reward (Figure 34.2). If the reward was under the cup (which obviously did not alter its appearance), the animals chose at random. Apparently, the monkeys could not use the absence of an inclination of the board to exclude this option and to infer in a second step that the food must be under the cup. This finding parallels the results in the auditory exclusion task: there, many animals, including most monkey species, fail to use the absence of noise from the silently shaken cup to exclude this option and to infer that the food must be in the other cup.



Figure 34.2 Illustration of the setup used in the study by Schloegl, Waldmann, and Fischer (2013) with long-tailed macaques. In the upper panel, a food reward was hidden underneath the board, which then was inclined (note that the food reward was not visible to the monkeys, but is shown here for illustration purposes). The monkeys could use the inclination to find the reward. In the lower panel, the reward was hidden in the cup; the monkeys failed to infer the location of the reward based on the absence of an inclination of the board.

The study by Schloegl, Waldmann, and Fischer (2013) was inspired by similar work with chimpanzees: when one board was inclined and the other was lying flat on the ground, the apes demonstrated a clear preference for the inclined board. Thus, the apes seemed to be aware that objects influence the orientation of other objects (Call, 2007). In a next step, they were allowed to choose between a small piece of highly preferred banana and a large piece of much less preferred carrot. The chimpanzees clearly preferred the banana. If, however, the same pieces of food were hidden underneath two boards, the chimpanzees failed to use the strength of the inclination to infer where the small banana and where the large piece of carrot must be hidden; instead, they went for the stronger inclination (Call, 2007). Interestingly, these findings mirror those of the long-tailed macaques, which also tended to show the same bias for the stronger inclination (Schloegl et al., 2013). In a striking study, chicks that were a few days old were imprinted on objects and then were required to find these objects underneath and behind occluders. In contrast to the apes and monkeys, the chicks were not distracted by the strength of the inclination and approached the board with the imprinted object underneath even in the presence of another, more strongly inclined board (Chiandetti & Vallortigara, 2011). When dogs were confronted with an inclined and a flat board, they preferred the inclined board, but (p. 705) only if they had observed the human experimenter hiding the reward and manipulating the boards; this is in contrast to chimpanzees, which chose the inclined board regardless of whether they had seen an experimenter manipulate it beforehand (Bräuer et al., 2006). Finally, in tests with wild boars and domesticated pigs, only the pigs living in a less enriched environment (i.e., living mainly on a concrete floor) showed a preference for inclined boards (Albiach-Serrano et al., 2012). Taken together, species differences, rearing history, and experience may all influence performance in these tasks.

A very different approach was taken by Waldmann, Schmid, Wong, and Blaisdell (2012), who trained rats (Rattus norvegicus) to expect a reward when a light turned on. In the following extinction phase, for half of the rats the access to the reward was blocked by a cover, whereas the other rats experienced that no reward was available from the dispenser. In the subsequent test phase, access to the dispenser was open for all rats, and the animals that had experienced blocked access in the extinction phase showed a stronger expectancy for the reward than the other group. This was interpreted as support for the assumption that rats can distinguish between the absence of events and the lack of evidence for the absence of events.
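The distinction between the absence of an event and the absence of evidence can be given a simple Bayesian reading. The following sketch is our illustration; the probabilities are invented for exposition, and the original study did not present such a model. If a cover blocks the view of the dispenser, failing to see a reward carries no information, whereas an open, visibly empty dispenser is genuine evidence of absence.

prior_reward = 0.9  # expectancy built up while the light predicted food

def posterior_reward(access_blocked):
    # Probability of seeing the reward if it is present: zero when a
    # cover blocks the view, high when the dispenser is fully visible.
    p_see_if_present = 0.0 if access_blocked else 0.95
    # Bayes' rule applied to the observation "no reward seen"
    p_nosee_present = (1 - p_see_if_present) * prior_reward
    p_nosee_absent = 1.0 * (1 - prior_reward)
    return p_nosee_present / (p_nosee_present + p_nosee_absent)

print(posterior_reward(access_blocked=True))   # 0.9: no evidence, belief unchanged
print(posterior_reward(access_blocked=False))  # ~0.31: evidence of absence lowers belief

The rats' stronger reward expectancy after the blocked extinction phase corresponds to the first quantity, the weaker expectancy of the open-dispenser group to the second.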

Reasoning About Object–Object Relationships: Weight and Food Trails as Causal Predictors

Similar to shape, weight can be used as a predictive cue for the presence of an object or a reward. In one study, a banana was hidden in one of two containers, and both containers were placed on a balance; the balance subsequently tipped toward the heavier, baited container. The observing chimpanzees used this information to infer where the banana must have been hidden (Hanus & Call, 2008). Likewise, when a baited and an empty container could be pulled up by a string, chimpanzees quickly learned (a) to pull both containers to compare their weight and then (b) to choose the heavier container (Schrauf & Call, 2011). Long-tailed macaques, in contrast, had considerable difficulties in a similar study (Klüver, 1933); it seemed as if they did not initially pay attention to the weight and started to compare the two containers only after the experimenter had dramatically increased the weight difference to highlight its relevance.

Some authors have argued that performance may improve if tasks are modified (reviewed by Seed & Call, 2009; Shettleworth, 2010) or if their cognitive load is reduced (Seed, Seddon, Greene, & Call, 2012). Völter and Call (2014) proposed that many of the previously mentioned tasks required a relatively elaborate technical understanding, and therefore physical knowledge, which may interfere with pure causal knowledge. To test the latter alone, they designed a task in which a small bowl of yogurt was hidden underneath one of two cups. The bowl was "leaking" and lost some yogurt. When the baited and the non-baited cup were displaced, the baited cup left a visible trail of yogurt. The tested apes used the trail to identify the baited cup; in control conditions, they were also attentive to the temporal component of the cause–effect sequence and ignored yogurt trails that had been visible before the cups were displaced (Völter & Call, 2014).

Reasoning About Actions of Social Agents

Whereas the previously mentioned studies focused on physical relationships, a number of studies have also incorporated social actions. One example mentioned earlier is the apple–banana task of Premack and Premack (1994). In this task, the animals had to infer the social cause (the experimenter removing the apple) of the observed effect (the experimenter eating the apple). These results were replicated in all great ape species (Call, 2006) and in one grey parrot (though six others failed; Mikolasch, Kotrschal, & Schloegl, 2011). Subsequently, the performance of grey parrots in this task improved when the cups used for hiding the rewards had different colors (Pepperberg, Koepke, Livingston, Girard, & Hartsfield, 2013). One (out of six) Clark's nutcrackers was also successful in this task, but only when non-food objects had been hidden (Tornick & Gibson, 2013).

Based on the results of an entirely different paradigm, it has been suggested that New Caledonian crows are able to reason about hidden causal agents (Taylor, Miller, & Gray, 2012). Captive crows were allowed to retrieve a food reward from a box mounted on a table. Next to the box was a hide, and before the crows could retrieve the reward from the box, they saw a wooden stick protrude from the hide and move back and forth; the stick then disappeared into the hide again (Figure 34.3). Importantly, the stick was moving in the position where the bird's head would be when retrieving the food (i.e., the stick could potentially hit the bird on the head). The authors created two conditions. In the first condition, the birds had seen a human agent disappear behind the hide right before the stick appeared; after the stick's movement, (p. 706) the human agent left the hide. In the second condition, no human agent walked behind the hide. Thus, in the first condition it appeared as if the human was responsible for the stick's movement, whereas in the second condition the cause of the stick's movement remained unclear. After the stick had disappeared and the birds had descended to the table to retrieve the reward from the box, the authors measured how often the birds inspected the hide to check whether the stick might reappear. These inspections were significantly more frequent in the second condition, which raises the possibility that the birds did indeed reason about the causes of the stick's movement (but see Boogert, Arbilly, Muth, & Seed, 2013; Dymond, Haselgrove, & McGregor, 2013; Taylor, Miller, & Gray, 2013a, 2013b, for a discussion of these findings).

Figure 34.3 Illustration of the setup used by Taylor et al. (2012) to investigate New Caledonian crows' reasoning about hidden causal agents. In this condition, the agent enters a hide (left panel), a stick probes through a hole in front of the block the crow could feed from (second panel from left), the agent leaves the hide (second panel from right), and the crow approaches and retrieves food from the block (right panel). Figure by Alex H. Taylor; reprinted with permission from Taylor (2014).


Cheney, Seyfarth, and Silk assessed the responses of female chacma baboons (Papio ursinus) to seemingly normal and anomalous interactions of other group members. When high-ranking animals approach lower-ranking ones in a benign fashion (because they want to inspect the subordinate's infant, for instance), they frequently utter grunts, to which the subordinate sometimes responds with fear barks. Using playback experiments, causally consistent (dominant individual grunts, subordinate fear barks) and inconsistent (subordinate individual grunts, dominant fear barks) call sequences were played to the females. The subjects looked significantly longer in the direction of the calls when an inconsistent rather than a consistent call sequence was played; this finding was interpreted as an indicator of the baboons' understanding of the causes of this unlikely scenario (i.e., that a rank reversal must have occurred; Cheney, Seyfarth, & Silk, 1995).

Diana monkeys (Cercopithecus diana) are hunted by chimpanzees, and both species are preyed upon by leopards. If the monkeys see or hear chimpanzees, they usually retreat silently. When they hear chimpanzee alarm screams (which may indicate the presence of a leopard), in contrast, they respond with their own specific leopard alarm calls. The authors took this to indicate that the Diana monkeys understand the underlying causal structure (i.e., that the chimpanzees scream because they have seen a leopard; Zuberbühler, 2003). The problem with this interpretation, however, is that it cannot be excluded that monkeys living in chimpanzee territory simply learned to associate the chimpanzee alarm scream with the presence of leopards.

Taken together, these studies suggest that several animal species can make inferences about physical as well as social causal relationships, with the great apes and some bird species demonstrating particularly advanced skills. It seems, however, that many of these performances are not very robust, as several species appear susceptible to sometimes quite small modifications of the test conditions. In some cases this may be due to the sensory preferences of a species (i.e., some species may have problems dealing with certain sensory modalities, such as acoustic cues), but in other cases it may reflect only weak causal understanding. Species differences may, however, also reflect evolutionary patterns, which we discuss in detail later.

Reasoning About the Outcome of One's Own Actions: Tool Use

As mentioned earlier, tool users receive considerable attention, as these species appear predestined to possess causal reasoning skills. A number of studies have therefore assessed whether tool-using animals, for instance apes (e.g., Mulcahy & Schubiger, 2014; Seed et al., 2012), New Caledonian crows (e.g., Hunt & Gray, 2003; Weir et al., 2002), and capuchin monkeys (e.g., Moura & Lee, 2004; Visalberghi et al., 2009), can predict the effect their own actions will have, (p. 707) and whether a sophisticated understanding of cause–effect relations is the basis for their behavior. Researchers had noted that chimpanzees appear to modify tools "with a plan" (e.g., Boesch & Boesch, 1990), as if they knew in advance what to do to a tool to make it functional. This led to the suggestion that chimpanzees reason causally (Visalberghi & Limongelli, 1994), in the sense that they have at least a rudimentary understanding of the causally relevant characteristics of a given tool.

Causal Reasoning in Non-Human Animals causally (Visalberghi & Limongelli, 1994), in the sense that they have at least a rudimen­ tary understanding for the causally relevant characteristics of a given tool. Capuchin monkeys were trained to use a stick to push a food reward out of a horizontal, non-movable tube. After they had learned how to solve this task, the tube was modified and a trap (i.e., a floored hole in the tube, similar to the pockets in a pool-billiard table; see Figure 34.4) was added to its middle. Now, the animals needed to pay attention to the position of the trap and to push the reward away from the pocket. Would they be able to predict the effect of their actions? In the inaugural study using this now-famous experi­ mental design, only one of four capuchin monkeys avoided the trap (Visalberghi & Limon­ gelli, 1994). When, however, the tube was rotated so that the trap was above the reward and therefore non-functional, the monkey still avoided the trap; thus, capuchin monkeys may use tools without an understanding for cause–effect relationships. Similar results were also obtained in chimpanzees (see Seed, Call, Emery, & Clayton, 2009, and refer­ ences therein) and woodpecker finches (Cactospiza pallida) (Tebbich & Bshary, 2004; see also Emery & Clayton, 2009, for a review). Yet, several studies found that small modifica­ tions to the original design, such as allowing chimpanzees to move the reward with their fingers instead of a stick, can significantly alter the outcome (Seed et al., 2009). Further­ more, Silva, Page, and Silva (2005) demonstrated that human adults, just like capuchin monkeys, also continue to avoid the non-functional trap, leaving the diagnostic value of a failure to avoid the trap unclear. In an improved version of the trap tube (Figure 34.5), non-tool-using rooks were tested with four different versions of a two-trap tube: here, the birds had to identify in advance which of the two traps is functional, and to pull the reward in the other direction. Seven of eight rooks learned to solve the first two versions, and one of them managed to trans­ fer this instantaneously to two novel versions. Thus, this bird may indeed have under­ stood the causal properties of each trap and could predict what would happen if it pulled the reward toward the traps (Seed, Tebbich, Emery, & Clayton, 2006). In a similar task, (p. 708) three (of six) New Caledonian crows learned to solve an initial two-trap tube task and transferred their skills in two (of three) modified tasks and to another trap-table task; the trap-table is functionally similar, but visually distinct from the trap-tube. Because all three crows also solved the trap-table task, it was suggested that the crows may solve this through causal and potentially also analogical reasoning (Taylor, Hunt, Medina, & Gray, 2008).



Figure 34.4 The trap-tube task as used in earlier studies with, for instance, capuchin monkeys and woodpecker finches. Figure by Amanda Seed; reprinted with permission from Seed et al. (2006).

Figure 34.5 Modified versions of the trap-tube task introduced by Seed et al. (2006) in their study with rooks. The stick is already inserted in the tube, and the reward is located between two clear Perspex discs. Moving the stick causes the discs (and thereby the reward) to move. Horizontal black discs indicate closing of the traps. Arrows represent the correct solution in each condition. Figure by Amanda Seed; reprinted with permission from Seed et al. (2006).

An especially prominent example of the assumed cleverness of crows is Aesop's (ca. 620–560 B.C.) fable "The Crow and the Pitcher." In this fable, a thirsty crow cannot reach the water at the bottom of a pitcher and starts dropping stones into it until the water level has risen sufficiently for the bird to reach the water. When orangutans (Pongo abelii) were confronted with an analogous task in which a peanut floated in a water-filled tube out of reach of the apes' hands, they obtained water from a dispenser and spat it into the tube to raise the water level. All subjects solved the task on the first trial and, on average, within less than 10 minutes; again, this suggests that they quickly understood how to solve the task (Mendes, Hanus, & Call, 2007). Rooks also solved the task instantaneously; this time, just as in the fable, they dropped stones into a water-filled tube (Bird & Emery, 2009b). Subsequent studies with Eurasian jays (Garrulus glandarius) and New Caledonian crows suggested, however, that the birds must first learn to drop stones into a tube with water for the water to rise (Cheke, Bird, & Clayton, 2011; Taylor et al., 2011). Learning seemed to be facilitated if causal cues were available: for instance, jays and crows learned to drop stones into the tube, but did not learn, in the same number of trials, to select one of two differently colored tubes (Cheke et al., 2011) or to select a tube based on the size of a stone placed in front of it (Taylor et al., 2011). This led Taylor and colleagues to propose a "causal learning account," according to which correlations between stimuli are learned more easily if they are causally relevant; in other words, dropping a larger stone into a tube with water is causally relevant because it raises the water level, whereas a stone placed in front of the tube is an arbitrary feature and causally irrelevant. The birds would then have a predisposition for (or be attuned to) causal features (see also Hanus and Call, 2011, for a similar argument regarding chimpanzees).
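The "causal learning account" can be made concrete with a standard error-driven learning rule in which causally relevant cues carry a higher associability than arbitrary ones. The sketch below is purely illustrative: the delta-rule form follows Rescorla and Wagner (1972), but the associability values are our assumptions, not parameters estimated from the bird data.

def train(alpha, beta=0.3, lam=1.0, trials=20):
    """Associative strength of a single cue under the delta rule:
    v <- v + alpha * beta * (lambda - v)."""
    v, history = 0.0, []
    for _ in range(trials):
        v += alpha * beta * (lam - v)
        history.append(v)
    return history

# A predisposition for causal features is modeled as higher associability.
causal_cue = train(alpha=0.9)    # e.g., a stone that visibly raises the water
arbitrary_cue = train(alpha=0.2) # e.g., a stone merely placed in front of the tube

print(f"strength after 5 trials: causal {causal_cue[4]:.2f}, "
      f"arbitrary {arbitrary_cue[4]:.2f}")  # ~0.79 versus ~0.27

On this reading, the birds' failures with arbitrary cues would reflect slower acquisition rather than an outright inability to learn.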

Reasoning About the Outcome of One's Own Actions: Causal Reasoning in Rats

In the previous section, we saw that tool-using species seem to have an understanding of what the outcome of their tool-related actions might be. They may not achieve this through sudden "insight," but their predisposition to attend to causally relevant features may facilitate learning. But is this a specific skill of tool-using or large-brained species? In a highly influential study, Blaisdell and colleagues (2006) explored causal reasoning in rats. Using Pavlovian conditioning, the rats first learned to associate a light cue with a subsequent tone and, in a second step, to associate the light with a subsequent food reward. The idea was that the rats would acquire a common-cause model in which the light causes both the tone and the food; as a consequence, the tone should also be predictive of the presence of the food reward. In a second training step, the rats were split into two groups. One group was confronted with a lever that, upon being pressed by the rat, caused the same tone as during the first training period. The other group was also confronted with a lever, but the occurrence of the tone was independent of pressing the lever. In the final test, the authors measured the rats' expectancy of obtaining a reward after hearing the tone in the absence of the light (expectancy was measured by the frequency with which the rats poked their noses into the food dispenser). The rats that were confronted with the lever causing the tone (group 1) expected food less frequently than the rats whose lever presses had no observable effect (group 2). This suggests that the rats had some form of causal understanding: during training, the tone was predictive of the presence of food (via the light); in the test, this relation still held for the rats in the second group, even though they could not observe the light (in fact, the light was off, but the rats may have assumed that they simply could not see it). For the first group, however, this relation no longer held, as a new cause for the tone had been introduced: their own lever pressing (Blaisdell et al., 2006; but see, for instance, Dwyer, Starns, & Honey, 2009, or Penn & Povinelli, 2007, for alternative views). Similar to the results of Blaisdell and colleagues, Beckers, Miller, De Houwer, and Urushihara (2006) demonstrated that conditional forward blocking in rats is sensitive to experimental modifications that are not accounted for by classical associative theories but are expected from a causal reasoning perspective (see also Boddez, De Houwer, & Beckers, Chapter 4 in this volume).
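The logic of this design maps onto a common-cause graph (light causes tone, light causes food). Under such a model, observing the tone is evidence for the light and hence for food, whereas producing the tone oneself cuts the tone off from its usual cause, so food should not be expected above its base rate. The following numerical sketch uses invented probabilities (ours, not Blaisdell and colleagues'); only the qualitative contrast between observing and intervening matters.

# Common-cause model: Light -> Tone, Light -> Food
p_light = 0.5
p_tone_light, p_tone_nolight = 0.9, 0.05
p_food_light, p_food_nolight = 0.9, 0.05

# Seeing: P(food | tone) -- the tone is diagnostic evidence for the light.
p_tone = p_tone_light * p_light + p_tone_nolight * (1 - p_light)
p_light_tone = p_tone_light * p_light / p_tone
p_food_seeing = (p_food_light * p_light_tone
                 + p_food_nolight * (1 - p_light_tone))

# Doing: P(food | do(tone)) -- a self-produced tone says nothing about
# the light, so the food prediction falls back to its base rate.
p_food_doing = p_food_light * p_light + p_food_nolight * (1 - p_light)

print(f"P(food | tone observed) = {p_food_seeing:.2f}")  # well above the base rate
print(f"P(food | tone produced) = {p_food_doing:.2f}")   # the base rate

Group 1's reduced nose-poking after self-produced tones corresponds to the second quantity; group 2's behavior corresponds to the first.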

Inferring a Required Action from Observation

So far, we have talked about animals' ability to reason about the outcome that their own actions (p. 709) will or should have. While at least some species may be able to do so, this may not be a result of "insight"; rather, they may have to learn about the outcomes of their actions before applying them correctly. But would animals also know what to do from observation alone? Tomasello and Call (1997) rejected this idea and suggested that, after having seen fruit fall off a tree shaken by the wind, apes would not get the idea of shaking a tree themselves to reproduce this effect. Interventions in this sense are seen as a pinnacle of human causal understanding (Taylor et al., 2014), but they have received relatively little attention in the animal cognition literature. Inspired by work by von Bayern and colleagues (von Bayern, Heathcote, Rutz, & Kacelnik, 2009), in which the authors demonstrated that New Caledonian crows can transfer a cause–effect relationship to novel types of tools, Taylor et al. (2014) investigated whether these birds would find a novel solution to reproduce an effect they had previously observed. In their experiment, the birds (and 24-month-old children) were confronted with a plastic block inside an apparatus. If they accidentally touched the block, it could fall onto a platform, which, in turn, would lead to a reward. In the subsequent test, the plastic block was located outside the apparatus but could be inserted through a hole. According to the authors, operant conditioning should cause the subjects to act on the block, but only an intervention would allow them to infer that they must insert the block into the apparatus so that it would act on the platform and repeat the effect of producing the reward. Only the children, but not the crows, came up with an idea of how to solve the task, while a control group of crows was able to learn the task through operant conditioning. Importantly, this was not an easy task for the children either; nearly 20% of the children did not produce an intervention, and those who did needed several observations to understand what was required (but see Jacobs, von Bayern, Martin-Ordas, Rat-Fischer, & Osvath, 2015; Taylor et al., 2015, for a discussion of these findings, and Bonawitz et al., 2010, for a study reporting toddlers' problems in such tasks). In sum, there is so far no convincing evidence that animals truly understand the causal structure of interventions.

Evolutionary Patterns

From an evolutionary perspective, one goal is to reconstruct the origins of different cognitive abilities. At what point in evolutionary history did the ability for causal reasoning evolve? Did it evolve several times? What are the selection pressures promoting its evolution? So far, only a few taxonomic groups have been studied in sufficient detail to allow for some very sketchy outlines of the potential evolution of causal reasoning abilities. The available evidence suggests a convergent evolution of reasoning abilities in birds and mammals (e.g., Emery & Clayton, 2004; Pepperberg et al., 2013; Schloegl et al., 2012; Taylor et al., 2012); in the following, we will largely focus on differences within the primate lineage, domestication effects, and the evolution of reasoning in birds.

Thompson and Oden (2000) argued that a cognitive divide exists between "paleological" monkeys and "analogical" apes, as only the latter can perceive relations between relations: for instance, after having seen a sample of two identical objects and subsequently being tasked to identify a pair of objects exhibiting the same relationship, apes choose a pair of identical objects (different from the sample) over a pair of non-identical objects; monkeys, in contrast, can form categorizations only on the basis of shared physical attributes (e.g., same shape or color). In support of convergent evolution in birds and mammals, hooded crows (Corvus corone cornix) have recently been shown to understand relations between relations in an analogical reasoning task (Smirnova, Zorina, Obozova, & Wasserman, 2014), just as the apes did in the example mentioned previously; at the same time, the debate about the cognitive gap between apes and monkeys has been fueled by similar findings in monkeys (e.g., Fagot & Maugard, 2013; Flemming, Thompson, & Fagot, 2013). Still, monkeys seem to fail in a number of tasks that apes solve. For instance, apes master the acoustic version of the "empty cup" task introduced earlier, even though not all subjects are successful and the task in general is challenging (see Call, 2004). Studies on monkeys, in contrast, have typically produced negative results (Heimbauer et al., 2012; Schmitt & Fischer, 2009) or required prior training (Sabbatini & Visalberghi, 2008; but see Maille & Roeder, 2012, for tentative positive evidence in lemurs, which has been explained as an ecological adaptation). Likewise, using weight as a causal predictor seems to be easier for apes than for monkeys (Hanus & Call, 2008, 2011; Klüver, 1933; Schrauf & Call, 2011), even though the tool-using capuchin monkeys complicate the picture (Fragaszy, Greenberg, et al., 2010; Visalberghi et al., 2009).

Another distinction has been drawn between "causal apes" and "social dogs" (Bräuer et al., 2006), (p. 710) arguing that apes are sensitive to causal information, whereas dogs pay more attention to social cues (i.e., pointing, gazing). Dogs' priority for social information is supposed to be a result of domestication, as dogs may have been selected to pay attention to signals provided by humans (Hare, Brown, Williamson, & Tomasello, 2002; see Hare et al., 2010; Udell, Dorey, & Wynne, 2010; Udell & Wynne, 2010, for a discussion of this domestication hypothesis). In a comparative study with wild boars and domesticated pigs, however, differences seemed to be explained best by individual life histories (Albiach-Serrano et al., 2012). It is thus unclear whether the "social dogs" hypothesis can be extended to domesticated animals in general.

Two different evolutionary explanations have been proposed for corvids. Taylor and colleagues stressed that New Caledonian crows may possess superior reasoning skills as adaptations to their elaborate tool use (e.g., Taylor et al., 2008; Taylor et al., 2011; Taylor et al., 2012).
Others have wondered whether the performance in exclusion tasks could be linked to caching behavior (e.g., Mikolasch et al., 2012; Schloegl, 2011), as food-caching species may be more attentive to the presence or absence of rewards in hidden locations (Tornick & Gibson, 2013; but see Shaw, Plotnik, & Clayton, 2013, for a different view).


Open Questions

The debates about the cognitive underpinnings of exclusion or causal reasoning in rats mentioned earlier illustrate one of the key issues in this field of research. On the functional level, the evidence for causal reasoning in animals is increasing. On the mechanistic level, however, it often remains unclear whether the animals indeed have a deep understanding of the causal relationships or whether they merely respond to covariations (see also Le Pelley, Griffiths, & Beesley, Chapter 2 in this volume). In other words, do they understand that one event causes the other, or did they learn that these two usually occur together (e.g., Penn, Holyoak, & Povinelli, 2008; Penn & Povinelli, 2007)? Völter and Call (2012) have shown that great apes learn to solve mechanical problems faster if they can observe the mechanism underlying the problem, but the authors were cautious about interpreting this as evidence for causal understanding. Instead, they argued that the apes might have learned only "what caused the beneficial outcome but not necessarily how it was caused" (p. 935, emphasis theirs). New Caledonian crows were tested in a task in which a reward was placed on a platform at the bottom of a tube. To obtain the reward, the platform had to be collapsed (e.g., by pushing it down; von Bayern et al., 2009; see also Bird & Emery, 2009a). Interestingly, for two birds, the experience of having collapsed the platform themselves by pushing it down with their beak was sufficient to solve the task later by dropping stones into the tube and onto the platform from above. Thus, in this case it seems possible that they understood how to cause an effect.

But is this evidence for "insightful" problem-solving based on causal understanding? For a long time, one of the standard tests of "insightful" problem-solving was the so-called string-pulling task, which has a long tradition in avian research (e.g., Pepperberg, 2004; Vince, 1961; see Jacobs & Osvath, 2015, for an extensive review). Here, a string with food attached to it hangs from a branch. To obtain the food, the birds have to grasp the string with their beak, pull up a length of string, step on the resulting loop with a foot, grasp the string again, and pull once more; this sequence can be repeated multiple times. Because of the complexity of the string-pulling behavior and its lack of an equivalent in the birds' everyday behavior, its spontaneous occurrence had long been interpreted as a textbook example of "insight" (e.g., Heinrich, 1995). This view has recently been challenged (Taylor et al., 2010) by the demonstration that New Caledonian crows fail in this task if they are deprived of the visual feedback of the food moving closer as they pull. Thus, perceptual-motor feedback may be required, and this may facilitate the acquisition of the task's solution through operant conditioning (but see also Jacobs & Osvath, 2015).

This uncertainty about the cognitive mechanisms is partly due to the problem that researchers studying animal cognition, just like developmental psychologists, cannot ask their non- or pre-linguistic subjects about their understanding of a task or why they behaved as they did.
It is therefore difficult to differentiate between what Kummer (1995; cited by Visalberghi & Tomasello, 1998) called "weak causal knowledge" (which is based on learning and acquired through numerous repetitions of a task), "strong causal knowledge" (which is based on rapid or even a priori interpretations of a situation), and "non-causal knowledge" (i.e., associatively learned covariations without an understanding of the underlying cause–effect relationships). Researchers have tried to answer these questions by designing increasingly complex tasks with numerous control (p. 711) conditions, but critics have kept up with this development by offering ever more elaborate alternative explanations. Despite their generally skeptical stance, Penn and Povinelli (2007, p. 97) acknowledged that current comparative evidence "does not fit comfortably into [ … ] the traditional associationist" accounts. Nevertheless, they reject the notion of human-like causal reasoning skills in animals. But even if animals were able to reason causally, the immediate follow-up question would be whether they also have "meta-knowledge about the concept of causality" (Blaisdell & Waldmann, 2012, p. 179), or whether that is, as Penn et al. (2008) have suggested, a uniquely human capacity.

A related issue is the question of how to extrapolate from individual task performances. If an individual animal could be shown to reason in a human-like fashion in a single study, what would this tell us about general reasoning abilities? Do such general reasoning abilities exist at all? There seems to be considerable doubt, as evidenced by the performance differences between species. Seed and Call (2009) deemed it unlikely that causal understanding constitutes a singular ability; similarly, Taylor et al. (2014) suggested that "causal understanding is not based on a single monolithic, domain-general cognitive mechanism" (p. 5).

So far, most studies have focused on relatively simple, all-or-nothing decisions (i.e., they tested whether animals can exclude one wrong option to infer one correct option). In real life, however, one is often forced to base decisions on reasoning about probabilities, but very little is known about animals' abilities to choose between options by inferring the likelihood of each option being correct. When apes had to choose between two buckets filled with different ratios of highly valued banana pellets and less valued carrots, they preferred the bucket from which a banana pellet would be drawn with higher probability (Rakoczy et al., 2014). However, chimpanzees have also been confronted with two sets of cups of varying number, of which some were baited and others were not; one cup was then drawn from each set, and the chimpanzees could choose between these two cups. Whether they would identify the cup with the higher probability of reward depended strongly on the ratio of baited to non-baited cups in each set (Hanus & Call, 2014). Finally, when chimpanzees had to choose between two partially occluded tools, they failed to identify the tool with the higher probability of being intact (Seed et al., 2012; see also Mulcahy & Schubiger, 2014, for similar findings with orangutans).
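To make explicit what such probabilistic choices require, consider the following toy calculation. The numbers are hypothetical and are not taken from Rakoczy et al. (2014) or Hanus and Call (2014); they simply illustrate that the option with more preferred items overall need not be the option with the higher probability of yielding a preferred item on a single draw.

```python
def p_preferred(n_preferred: int, n_other: int) -> float:
    """Probability that a single random draw from a set yields a preferred item."""
    return n_preferred / (n_preferred + n_other)

# Hypothetical sets: (banana pellets, carrots)
sets = {"A": (12, 4), "B": (40, 60)}

for name, (pellets, carrots) in sets.items():
    print(f"Set {name}: {pellets} pellets, {carrots} carrots, "
          f"P(pellet) = {p_preferred(pellets, carrots):.2f}")
# Set A: P(pellet) = 0.75; Set B: P(pellet) = 0.40.
# A reasoner tracking absolute numbers prefers B (40 > 12 pellets); a reasoner
# tracking ratios prefers A -- the normatively correct choice for a single draw.
```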
Even though cause–effect reasoning and effect–cause reasoning can both be based on the same causal relationships, research suggests that human subjects are sensitive to directional differences (Waldmann & Hagmayer, 2013); furthermore, the direction may have consequences for the cognitive demands (Fernbach, Darlow, & Sloman, 2011; Waldmann & Hagmayer, 2013). Human children aged 3.5–4.5 years seem to perform better in cause–effect reasoning than in effect–cause reasoning (Hong, Chijun, Xuemei, Shan, & Chongde, 2005). Given that animals may not understand all causal relationships, just as human infants may not develop an understanding of all causal relationships at the same time (Muentener & Bonawitz, Chapter 33 in this volume), it seems plausible to expect different performances in animals depending on the direction of reasoning. To our knowledge, however, this prediction has never been investigated, as we are not aware of any study using the same task to explore both reasoning directions.

In the future, further emphasis should be placed on the development of formal models that are explicit about the logical operations required, and on experiments that rule out generalization and associative learning. We note, however, that from an evolutionary perspective, it does not really matter whether an animal reasons, associates, or expresses innate behavior, as long as it gets the job done.

Acknowledgments

Our special thanks to Rebecca Jürgens for her invaluable help with the preparation of the manuscript and illustrations. We are grateful to Alex H. Taylor, Amanda M. Seed, Gloria Sabbatini, and Elisabetta Visalberghi for providing us with illustrations and photos, and to Michael R. Waldmann and an anonymous reviewer for their very helpful comments on an earlier version of the manuscript. This work was supported through funding from the Leibniz Association for the Leibniz ScienceCampus Primate Cognition.

References

Albiach-Serrano, A., Bräuer, J., Cacchione, T., Zickert, N., & Amici, F. (2012). The effect of domestication and ontogeny in swine cognition (Sus scrofa scrofa and S. s. domestica). Applied Animal Behaviour Science, 141(1–2), 25–35. doi:10.1016/j.applanim.2012.07.005
Aust, U., Range, F., Steurer, M., & Huber, L. (2008). Inferential reasoning by exclusion in pigeons, dogs, and humans. Animal Cognition, 11(4), 587–597. doi:10.1007/s10071-008-0149-0
Baillargeon, R. (2004). Infants' reasoning about hidden objects: Evidence for event-general and event-specific expectations. Developmental Science, 7(4), 391–424. (p. 712)
Bates, L. A., Sayialel, K. N., Njiraini, N. W., Poole, J. H., Moss, C. J., & Byrne, R. W. (2008). African elephants have expectations about the location of out-of-sight family members. Biology Letters, 4, 34–36. doi:10.1098/rsbl.2007.0529
Beckers, T., Miller, R. R., De Houwer, J., & Urushihara, K. (2006). Reasoning rats: Forward blocking in Pavlovian animal cognition is sensitive to constraints of causal inference. Journal of Experimental Psychology: General, 135(1), 92–102.
Bird, C. D., & Emery, N. J. (2009a). Insightful problem solving and creative tool modification by captive nontool-using rooks. Proceedings of the National Academy of Sciences USA, 106(25), 10370–10375. doi:10.1073/pnas.0901008106
Bird, C. D., & Emery, N. J. (2009b). Rooks use stones to raise the water level to reach a floating worm. Current Biology, 19(16), 1410–1414. doi:10.1016/j.cub.2009.07.033
Bird, C. D., & Emery, N. J. (2010). Rooks perceive support relations similar to six-month-old babies. Proceedings of the Royal Society B: Biological Sciences, 277, 147–151. doi:10.1098/rspb.2009.1456
Blaisdell, A. P., Sawa, K., Leising, K. J., & Waldmann, M. R. (2006). Causal reasoning in rats. Science, 311, 1020–1022. doi:10.1126/science.1121872
Blaisdell, A. P., & Waldmann, M. R. (2012). Rational rats: Causal inference and representation. In E. A. Wasserman & T. R. Zentall (Eds.), The Oxford handbook of comparative cognition (pp. 175–198). Oxford: Oxford University Press.
Boesch, C., & Boesch, H. (1990). Tool use and tool making in wild chimpanzees. Folia Primatologica, 54, 86–99.
Bonawitz, E. B., Ferranti, D., Saxe, R., Gopnik, A., Meltzoff, A. N., Woodward, J., & Schulz, L. E. (2010). Just do it? Investigating the gap between prediction and action in toddlers' causal inferences. Cognition, 115(1), 104–117. doi:10.1016/j.cognition.2009.12.001
Boogert, N. J., Arbilly, M., Muth, F., & Seed, A. M. (2013). Do crows reason about causes or agents? The devil is in the controls. Proceedings of the National Academy of Sciences USA, 110(4), E273. doi:10.1073/pnas.1219664110
Bräuer, J., Kaminski, J., Riedel, J., Call, J., & Tomasello, M. (2006). Making inferences about the location of hidden food: Social dog, causal ape. Journal of Comparative Psychology, 120(1), 38–47. doi:10.1037/0735-7036.120.1.38
Brown, A. L. (1990). Domain-specific principles affect learning and transfer in children. Cognitive Science, 14, 107–133.
Bugnyar, T., & Heinrich, B. (2005). Ravens, Corvus corax, differentiate between knowledgeable and ignorant competitors. Proceedings of the Royal Society B: Biological Sciences, 272, 1641–1646.
Cacchione, T., & Krist, H. (2004). Recognizing impossible object relations: Intuitions about support in chimpanzees (Pan troglodytes). Journal of Comparative Psychology, 118(2), 140–148. doi:10.1037/0735-7036.118.2.140
Call, J. (2004). Inferences about the location of food in the great apes (Pan paniscus, Pan troglodytes, Gorilla gorilla, and Pongo pygmaeus). Journal of Comparative Psychology, 118(2), 232–241.
Call, J. (2006). Inferences by exclusion in the great apes: The effect of age and species. Animal Cognition, 9, 393–403.

Call, J. (2007). Apes know that hidden objects can affect the orientation of other objects. Cognition, 105, 1–25. doi:10.1016/j.cognition.2006.08.004
Chappell, J., & Kacelnik, A. (2002). Tool selectivity in a non-primate, the New Caledonian crow (Corvus moneduloides). Animal Cognition, 5(2), 71–78. doi:10.1007/s10071-002-0130-2
Chappell, J., & Kacelnik, A. (2004). Selection of tool diameter by New Caledonian crows Corvus moneduloides. Animal Cognition, 7(2), 121–127.
Cheke, L. G., Bird, C. D., & Clayton, N. S. (2011). Tool-use and instrumental learning in the Eurasian jay (Garrulus glandarius). Animal Cognition, 14(3), 441–455. doi:10.1007/s10071-011-0379-4
Cheney, D. L., Seyfarth, R. M., & Silk, J. B. (1995). The responses of female baboons (Papio cynocephalus ursinus) to anomalous social interactions: Evidence for causal reasoning? Journal of Comparative Psychology, 109(2), 134–141.
Chiandetti, C., & Vallortigara, G. (2011). Intuitive physical reasoning about occluded objects by inexperienced chicks. Proceedings of the Royal Society B: Biological Sciences, 278(1718), 2621–2627. doi:10.1098/rspb.2010.2381
Clayton, N. S., & Dickinson, A. (1998). Episodic-like memory during cache recovery by scrub jays. Nature, 395, 272–274.
Dally, J., Emery, N. J., & Clayton, N. S. (2006). Food-caching western scrub-jays keep track of who was watching when. Science, 312, 1662–1665.
Dwyer, D. M., Starns, J., & Honey, R. C. (2009). "Causal reasoning" in rats: A reappraisal. Journal of Experimental Psychology: Animal Behavior Processes, 35(4), 578–586. doi:10.1037/a0015007
Dymond, S., Haselgrove, M., & McGregor, A. (2013). Clever crows or unbalanced birds? Proceedings of the National Academy of Sciences USA, 110(5), E336. doi:10.1073/pnas.1218931110
Emery, N. J., & Clayton, N. S. (2004). The mentality of crows: Convergent evolution of intelligence in corvids and apes. Science, 306, 1903–1907.
Emery, N. J., & Clayton, N. S. (2009). Tool use and physical cognition in birds and mammals. Current Opinion in Neurobiology, 19(1), 27–33. doi:10.1016/j.conb.2009.02.003
Erdöhegyi, Á., Topál, J., Virányi, Z., & Miklósi, Á. (2007). Dog-logic: Inferential reasoning in a two-way choice task and its restricted use. Animal Behaviour, 74(4), 725–737. doi:10.1016/j.anbehav.2007.03.004

Fagot, J., & Maugard, A. (2013). Analogical reasoning in baboons (Papio papio): Flexible reencoding of the source relation depending on the target relation. Learning & Behavior, 41(3), 229–237. doi:10.3758/s13420-012-0101-7
Fernbach, P. M., Darlow, A., & Sloman, S. A. (2011). Asymmetries in predictive and diagnostic reasoning. Journal of Experimental Psychology: General, 140(2), 168–185. doi:10.1037/a0022100
Fischer, J. (2002). Developmental modifications in the vocal behavior of non-human primates. In A. A. Ghazanfar (Ed.), Primate audition: Ethology and neurobiology (pp. 109–125). Boca Raton, FL: CRC Press.
Flemming, T. M., Thompson, R. K. R., & Fagot, J. (2013). Baboons, like humans, solve analogy by categorical abstraction of relations. Animal Cognition, 16, 519–524. doi:10.1007/s10071-013-0596-0
Fragaszy, D. M., Greenberg, R., Visalberghi, E., Ottoni, E. B., Izar, P., & Liu, Q. (2010). How wild bearded capuchin monkeys select stones and nuts to minimize the number of strikes per nut cracked. Animal Behaviour, 80, 205–214. doi:10.1016/j.anbehav.2010.04.018
Fragaszy, D. M., Pickering, T., Liu, Q., Izar, P., Ottoni, E. B., & Visalberghi, E. (2010). Bearded capuchin monkeys' and human's efficiency at cracking palm nuts with stone tools: Field experiments. Animal Behaviour, 79, 321–332. doi:10.1016/j.anbehav.2009.11.004 (p. 713)
Fragaszy, D. M., Visalberghi, E., & Fedigan, L. M. (2004). The complete capuchin: The biology of the genus Cebus. Cambridge: Cambridge University Press.
Fujita, K., Kuroshima, H., & Asai, S. (2003). How do tufted capuchin monkeys (Cebus apella) understand causality involved in tool use? Journal of Experimental Psychology: Animal Behavior Processes, 29(3), 233–242. doi:10.1037/0097-7403.29.3.233
Grether, W. F., & Maslow, A. H. (1937). An experimental study of insight in monkeys. Journal of Comparative Psychology, 24(1), 127–134.
Hanus, D., & Call, J. (2008). Chimpanzees infer the location of a reward on the basis of the effect of its weight. Current Biology, 18(9), R370–R372.
Hanus, D., & Call, J. (2011). Chimpanzee problem-solving: Contrasting the use of causal and arbitrary cues. Animal Cognition, 14(6), 871–878. doi:10.1007/s10071-011-0421-6
Hanus, D., & Call, J. (2014). When maths trumps logic: Probabilistic judgements in chimpanzees. Biology Letters, 10(12), 20140892. doi:10.1098/rsbl.2014.0892
Hare, B., Brown, M., Williamson, C., & Tomasello, M. (2002). The domestication of social cognition in dogs. Science, 298, 1634–1636.

Hare, B., Rosati, A., Kaminski, J., Bräuer, J., Call, J., & Tomasello, M. (2010). The domestication hypothesis for dogs' skills with human communication: A response to Udell et al. (2008) and Wynne et al. (2008). Animal Behaviour, 79, e1–e6. doi:10.1016/j.anbehav.2009.06.031
Heimbauer, L. A., Antworth, R. L., & Owren, M. J. (2012). Capuchin monkeys (Cebus apella) use positive, but not negative, auditory cues to infer food location. Animal Cognition, 15(1), 45–55. doi:10.1007/s10071-011-0430-5
Heinrich, B. (1995). An experimental investigation of insight in common ravens (Corvus corax). The Auk, 112(4), 994–1003.
Hill, A., Collier-Baker, E., & Suddendorf, T. (2011). Inferential reasoning by exclusion in great apes, lesser apes, and spider monkeys. Journal of Comparative Psychology, 125(1), 91–103. doi:10.1037/a0020867
Hong, L., Chijun, Z., Xuemei, G., Shan, G., & Chongde, L. (2005). The influence of complexity and reasoning direction on children's causal reasoning. Cognitive Development, 20(1), 87–101. doi:10.1016/j.cogdev.2004.11.001
Hunt, G. R. (1996). Manufacture and use of hook-tools by New Caledonian crows. Nature, 379, 249–251.
Hunt, G. R. (2000). Human-like, population-level specialization in the manufacture of pandanus tools by New Caledonian crows Corvus moneduloides. Proceedings of the Royal Society B: Biological Sciences, 267, 403–413.
Hunt, G. R., & Gray, R. D. (2003). Diversification and cumulative evolution in New Caledonian crow tool manufacture. Proceedings of the Royal Society B: Biological Sciences, 270, 867–874.
Jacobs, I. F., & Osvath, M. (2015). The string-pulling paradigm in comparative psychology. Journal of Comparative Psychology, 129(2), 89–120. doi:10.1037/a0038746
Jacobs, I. F., von Bayern, A., Martin-Ordas, G., Rat-Fischer, L., & Osvath, M. (2015). Corvids create novel causal interventions after all. Proceedings of the Royal Society of London B, 282(1806), 20142504. doi:10.1098/rspb.2014.2504
Jelbert, S. A., Taylor, A. H., Cheke, L. G., Clayton, N. S., & Gray, R. D. (2014). Using the Aesop's fable paradigm to investigate causal understanding of water displacement by New Caledonian crows. PLOS One, 9(3), e92895. doi:10.1371/journal.pone.0092895
Klüver, H. (1933). Behavior mechanisms in monkeys. Chicago: University of Chicago Press.
Köhler, W. (1917/1963). Intelligenzprüfungen an Menschenaffen. Facsimile of the 2nd rev. ed. of "Intelligenzprüfungen an Anthropoiden I". Berlin-Göttingen-Heidelberg: Springer.

Kummer, H. (1995). Causal knowledge in animals. In D. Sperber, D. Premack, & A. J. Premack (Eds.), Causal cognition: A multidisciplinary debate (pp. 26–39). Oxford: Clarendon Press.
Kundey, S. M., De Los Reyes, A., Taglang, C., Baruch, A., & German, R. (2010). Domesticated dogs' (Canis familiaris) use of the solidity principle. Animal Cognition, 13(3), 497–505. doi:10.1007/s10071-009-0300-6
Liu, Q., Fragaszy, D., Wright, B., Wright, K., Izar, P., & Visalberghi, E. (2011). Wild bearded capuchin monkeys (Cebus libidinosus) place nuts in anvils selectively. Animal Behaviour, 81(1), 297–305. doi:10.1016/j.anbehav.2010.10.021
Maille, A., & Roeder, J. J. (2012). Inferences about the location of food in lemurs (Eulemur macaco and Eulemur fulvus): A comparison with apes and monkeys. Animal Cognition, 15(6), 1075–1083. doi:10.1007/s10071-012-0531-9
Mascalzoni, E., Regolin, L., & Vallortigara, G. (2010). Innate sensitivity for self-propelled causal agency in newly hatched chicks. Proceedings of the National Academy of Sciences USA, 107(9), 4483–4485. doi:10.1073/pnas.0908792107
Massaro, L., Liu, Q., Visalberghi, E., & Fragaszy, D. (2012). Wild bearded capuchin (Sapajus libidinosus) select hammer tools on the basis of both stone mass and distance from the anvil. Animal Cognition, 15(6), 1065–1074. doi:10.1007/s10071-012-0530-x
Mendes, N., Hanus, D., & Call, J. (2007). Raising the level: Orangutans use water as a tool. Biology Letters, 3, 453–455. doi:10.1098/rsbl.2007.0198
Menzel, R., & Fischer, J. (2011). Animal thinking: An introduction. In R. Menzel & J. Fischer (Eds.), Animal thinking: Contemporary issues in comparative cognition (pp. 1–6). Cambridge, MA: MIT Press.
Mikolasch, S., Kotrschal, K., & Schloegl, C. (2011). African grey parrots (Psittacus erithacus) use inference by exclusion to find hidden food. Biology Letters, 7, 875–877. doi:10.1098/rsbl.2011.0500
Mikolasch, S., Kotrschal, K., & Schloegl, C. (2012). Is caching the key to exclusion in corvids? The case of carrion crows (Corvus corone corone). Animal Cognition, 15(1), 73–82. doi:10.1007/s10071-011-0434-1
de A. Moura, A. C., & Lee, P. C. (2004). Capuchin stone tool use in Caatinga dry forest. Science, 306, 1909.
Mulcahy, N. J., & Schubiger, M. N. (2014). Can orangutans (Pongo abelii) infer tool functionality? Animal Cognition, 17(3), 657–669. doi:10.1007/s10071-013-0697-9
Müller, C. A., Riemer, S., Range, F., & Huber, L. (2014). Dogs' use of the solidity principle: Revisited. Animal Cognition, 17(3), 821–825. doi:10.1007/s10071-013-0709-9

Nawroth, C., von Borell, E., & Langbein, J. (2014). Exclusion performance in dwarf goats (Capra aegagrus hircus) and sheep (Ovis orientalis aries). PLOS One, 9(4), e93534. doi:10.1371/journal.pone.0093534
O'Connell, S., & Dunbar, R. I. M. (2005). The perception of causality in chimpanzees (Pan spp.). Animal Cognition, 8, 60–66.
O'Hara, M. C. A., Gajdon, G. K., & Huber, L. (2012). Kea logics: How these birds solve difficult problems and outsmart researchers. In S. Watanabe (Ed.), Logic and sensibility (Vol. 5, pp. 23–38). Tokyo: Keio University Press. (p. 714)
Paukner, A., Huntsberry, M. E., & Suomi, S. J. (2009). Tufted capuchin monkeys (Cebus apella) spontaneously use visual but not acoustic information to find hidden food items. Journal of Comparative Psychology, 123(1), 26–33. doi:10.1037/a0013128
Penn, D. C., Holyoak, K. J., & Povinelli, D. J. (2008). Darwin's mistake: Explaining the discontinuity between human and nonhuman minds. Behavioral and Brain Sciences, 31, 109–178. doi:10.1017/s0140525x08003543
Penn, D. C., & Povinelli, D. J. (2007). Causal cognition in human and nonhuman animals: A comparative, critical review. Annual Review of Psychology, 58, 97–118. doi:10.1146/annurev.psych.58.110405.085555
Pepperberg, I. M. (2004). "Insightful" string-pulling in Grey parrots (Psittacus erithacus) is affected by vocal competence. Animal Cognition, 7, 263–266.
Pepperberg, I. M., Koepke, A., Livingston, P., Girard, M., & Hartsfield, L. A. (2013). Reasoning by inference: Further studies on exclusion in grey parrots (Psittacus erithacus). Journal of Comparative Psychology, 127(3), 272–281. doi:10.1037/a0031641
Plotnik, J. M., Shaw, R. C., Brubaker, D. L., Tiller, L. N., & Clayton, N. S. (2014). Thinking with their trunks: Elephants use smell but not sound to locate food and exclude nonrewarding alternatives. Animal Behaviour, 88, 91–98. doi:10.1016/j.anbehav.2013.11.011
Premack, D., & Premack, A. J. (1994). Levels of causal understanding in chimpanzees and children. Cognition, 50, 347–362.
Rakoczy, H., Clüver, A., Saucke, L., Stoffregen, N., Grabener, A., Migura, J., & Call, J. (2014). Apes are intuitive statisticians. Cognition, 131(1), 60–68. doi:10.1016/j.cognition.2013.12.011
Sabbatini, G., Truppa, V., Hribar, A., Gambetta, B., Call, J., & Visalberghi, E. (2012). Understanding the functional properties of tools: Chimpanzees (Pan troglodytes) and capuchin monkeys (Cebus apella) attend to tool features differently. Animal Cognition, 15, 577–590. doi:10.1007/s10071-012-0486-x

Sabbatini, G., & Visalberghi, E. (2008). Inferences about the location of food in capuchin monkeys (Cebus apella) in two sensory modalities. Journal of Comparative Psychology, 122(2), 156–166. doi:10.1037/0735-7036.122.2.156
Schloegl, C. (2011). What you see is what you get—reloaded: Can jackdaws (Corvus monedula) find hidden food through exclusion? Journal of Comparative Psychology, 125(2), 162–174. doi:10.1037/a0023045
Schloegl, C., Dierks, A., Gajdon, G. K., Huber, L., Kotrschal, K., & Bugnyar, T. (2009). What you see is what you get? Exclusion performances in ravens and keas. PLOS One, 4(8), e6368.
Schloegl, C., Schmidt, J., Boeckle, M., Weiß, B. M., & Kotrschal, K. (2012). Grey parrots use inferential reasoning based on acoustic cues alone. Proceedings of the Royal Society B: Biological Sciences, 279, 4135–4142. doi:10.1098/rspb.2012.1292
Schloegl, C., Waldmann, M. R., & Fischer, J. (2013). Understanding of and reasoning about object–object relationships in long-tailed macaques. Animal Cognition, 16(3), 493–507.
Schmitt, V., & Fischer, J. (2009). Inferential reasoning and modality dependent discrimination learning in olive baboons (Papio hamadryas anubis). Journal of Comparative Psychology, 123(3), 316–325. doi:10.1037/a0016218
Schrauf, C., & Call, J. (2011). Great apes use weight as a cue to find hidden food. American Journal of Primatology, 73(4), 323–334. doi:10.1002/ajp.20899
Schrauf, C., Call, J., Fuwa, K., & Hirata, S. (2012). Do chimpanzees use weight to select hammer tools? PLOS One, 7(7), e41044. doi:10.1371/journal.pone.0041044
Schrauf, C., Huber, L., & Visalberghi, E. (2008). Do capuchin monkeys use weight to select hammer tools? Animal Cognition, 11, 413–422. doi:10.1007/s10071-007-0131-2
Seed, A. M., & Call, J. (2009). Causal knowledge for events and objects in animals. In S. Watanabe, A. P. Blaisdell, L. Huber, & A. Young (Eds.), Rational animals, irrational humans (pp. 173–187). Tokyo: Keio University Press.
Seed, A. M., Call, J., Emery, N. J., & Clayton, N. S. (2009). Chimpanzees solve the trap problem when the confound of tool-use is removed. Journal of Experimental Psychology: Animal Behavior Processes, 35(1), 23–34. doi:10.1037/a0012925
Seed, A. M., Seddon, E., Greene, B., & Call, J. (2012). Chimpanzee 'folk physics': Bringing failures into focus. Philosophical Transactions of the Royal Society B: Biological Sciences, 367(1603), 2743–2752. doi:10.1098/rstb.2012.0222
Seed, A. M., Tebbich, S., Emery, N. J., & Clayton, N. S. (2006). Investigating physical cognition in rooks, Corvus frugilegus. Current Biology, 16, 697–701. doi:10.1016/j.cub.2006.02.066

Shaw, R. C., Plotnik, J. M., & Clayton, N. S. (2013). Exclusion in corvids: The performance of food-caching Eurasian jays (Garrulus glandarius). Journal of Comparative Psychology, 127(4), 428–435. doi:10.1037/a0032010
Shettleworth, S. J. (2010). Cognition, evolution, and behaviour (2nd ed.). New York: Oxford University Press.
Silva, F. J., Page, D. M., & Silva, K. M. (2005). Methodological-conceptual problems in the study of chimpanzees' folk physics: How studies with adult humans can help. Learning & Behavior, 33(1), 47–58.
Smirnova, A., Zorina, Z., Obozova, T., & Wasserman, E. (2014). Crows spontaneously exhibit analogical reasoning. Current Biology, 25(2), 256–260. doi:10.1016/j.cub.2014.11.063
Taylor, A. H. (2014). Corvid cognition. Wiley Interdisciplinary Reviews: Cognitive Science, 5(3), 361–372. doi:10.1002/wcs.1286
Taylor, A. H., Cheke, L. G., Waismeyer, A., Meltzoff, A. N., Miller, R., Gopnik, A., … Gray, R. D. (2014). Of babies and birds: Complex tool behaviours are not sufficient for the evolution of the ability to create a novel causal intervention. Proceedings of the Royal Society B: Biological Sciences, 281(1787), 20140837. doi:10.1098/rspb.2014.0837
Taylor, A. H., Cheke, L. G., Waismeyer, A., Meltzoff, A., Miller, R., Gopnik, A., … Gray, R. D. (2015). No conclusive evidence that corvids can create novel causal interventions. Proceedings of the Royal Society of London B, 282(1813), 20150796. doi:10.1098/rspb.2015.0796
Taylor, A. H., Elliffe, D. M., Hunt, G. R., Emery, N. J., Clayton, N. S., & Gray, R. D. (2011). New Caledonian crows learn the functional properties of novel tool types. PLOS One, 6(12), e26887. doi:10.1371/journal.pone.0026887
Taylor, A. H., Hunt, G. R., Medina, F. S., & Gray, R. D. (2008). Do New Caledonian crows solve physical problems through causal reasoning? Proceedings of the Royal Society B: Biological Sciences, 276(1655), 247–254. doi:10.1098/rspb.2008.1107
Taylor, A. H., Medina, F. S., Holzhaider, J. C., Hearne, L. J., Hunt, G. R., & Gray, R. D. (2010). An investigation into the cognition behind spontaneous string pulling in New Caledonian crows. PLOS One, 5(2), e9345. doi:10.1371/journal.pone.0009345 (p. 715)
Taylor, A. H., Miller, R., & Gray, R. D. (2012). New Caledonian crows reason about hidden causal agents. Proceedings of the National Academy of Sciences USA, 109(40), 16389–16391. doi:10.1073/pnas.1208724109
Taylor, A. H., Miller, R., & Gray, R. D. (2013a). Reply to Boogert et al.: The devil is unlikely to be in association or distraction. Proceedings of the National Academy of Sciences USA, 110(4), E274. doi:10.1073/pnas.1220564110

Taylor, A. H., Miller, R., & Gray, R. D. (2013b). Reply to Dymond et al.: Clear evidence of habituation counters counterbalancing. Proceedings of the National Academy of Sciences USA, 110(5), E337. doi:10.1073/pnas.1219586110
Tebbich, S., & Bshary, R. (2004). Cognitive abilities related to tool use in the woodpecker finch, Cactospiza pallida. Animal Behaviour, 67, 689–697.
Thompson, R. K. R., & Oden, D. L. (2000). Categorical perception and conceptual judgments by nonhuman primates: The paleological monkey and the analogical ape. Cognitive Science, 24(3), 363–396.
Tomasello, M., & Call, J. (1997). Primate cognition. Oxford: Oxford University Press.
Tornick, J. K., & Gibson, B. M. (2013). Tests of inferential reasoning by exclusion in Clark's nutcrackers (Nucifraga columbiana). Animal Cognition, 16(4), 583–597. doi:10.1007/s10071-013-0595-1
Udell, M. A. R., Dorey, N. R., & Wynne, C. D. L. (2010). What did domestication do to dogs? A new account of dogs' sensitivity to human actions. Biological Reviews, 85(2), 327–345. doi:10.1111/j.1469-185X.2009.00104.x
Udell, M. A. R., & Wynne, C. D. L. (2010). Ontogeny and phylogeny: Both are essential to human-sensitive behaviour in the genus Canis. Animal Behaviour, 79, e9–e14. doi:10.1016/j.anbehav.2009.11.033
Vince, M. A. (1961). "String pulling" in birds. III. The successful response in greenfinches and canaries. Behaviour, 17, 103–129.
Visalberghi, E., Addessi, E., Truppa, V., Spagnoletti, N., Ottoni, E. B., Izar, P., & Fragaszy, D. M. (2009). Selection of effective stone tools by wild bearded capuchin monkeys. Current Biology, 19, 213–217. doi:10.1016/j.cub.2008.11.064
Visalberghi, E., & Limongelli, L. (1994). Lack of comprehension of cause-effect relations in tool-using capuchin monkeys (Cebus apella). Ethology, 108(1), 15–22.
Visalberghi, E., & Tomasello, M. (1998). Primate causal understanding in the physical and psychological domains. Behavioural Processes, 42, 189–203.
Völter, C. J., & Call, J. (2012). Problem solving in great apes (Pan paniscus, Pan troglodytes, Gorilla gorilla, and Pongo abelii): The effect of visual feedback. Animal Cognition, 15(5), 923–936. doi:10.1007/s10071-012-0519-5
Völter, C. J., & Call, J. (2014). Great apes (Pan paniscus, Pan troglodytes, Gorilla gorilla, Pongo abelii) follow visual trails to locate hidden food. Journal of Comparative Psychology, 128(2), 199–208. doi:10.1037/a0035434

von Bayern, A. M. P., Heathcote, R. J. P., Rutz, C., & Kacelnik, A. (2009). The role of experience in problem solving and innovative tool use in crows. Current Biology, 19, 1965–1968. doi:10.1016/j.cub.2009.10.037
Waldmann, M. R., & Hagmayer, Y. (2013). Causal reasoning. In D. Reisberg (Ed.), The Oxford handbook of cognitive psychology (pp. 733–752). New York: Oxford University Press.
Waldmann, M. R., Schmid, M., Wong, J., & Blaisdell, A. P. (2012). Rats distinguish between absence of events and lack of evidence in contingency learning. Animal Cognition, 15(5), 979–990. doi:10.1007/s10071-012-0524-8
Watson, J. B. (1913). Psychology as the behaviorist views it. Psychological Review, 20, 158–177.
Weir, A. A. S., Chappell, J., & Kacelnik, A. (2002). Shaping of hooks in New Caledonian crows. Science, 297, 981.
Wild, M. (2008). Tierphilosophie zur Einführung. Hamburg: Junius.
Yerkes, R. M. (1916). The mental life of monkeys and apes: A study of ideational behavior. Ann Arbor: Scholars' Facsimiles & Reprints, University of Michigan.
Zuberbühler, K. (2003). Causal knowledge in free-ranging Diana monkeys. In A. A. Ghazanfar (Ed.), Primate audition: Ethology and neurobiology (pp. 13–26). Boca Raton, FL: CRC Press. (p. 716)

Christian Schloegl

Cognitive Ethology Laboratory, Leibniz Institute for Primate Cognition, German Primate Center, Göttingen, Germany

Julia Fischer

Cognitive Ethology Laboratory, Leibniz Institute for Primate Cognition, German Primate Center, Göttingen, Germany


Causal Cognition and Culture

Causal Cognition and Culture   Andrea Bender, Sieghard Beller, and Douglas L. Medin The Oxford Handbook of Causal Reasoning Edited by Michael R. Waldmann Print Publication Date: Jun 2017 Subject: Psychology, Cognitive Psychology Online Publication Date: May 2017 DOI: 10.1093/oxfordhb/9780199399550.013.34

Abstract and Keywords

Causality is a core concept of human cognition, but the extent to which cultural factors constrain, trigger, or shape the way in which humans think about causal relationships has barely been explored. This chapter summarizes empirical findings on the potential for cultural variability in the content of causal cognition, in the way this content is processed, and in the context in which all this occurs. This review reveals cultural variability in causal cognition along each of these dimensions and across physical, biological, and psychological explanations. Specifically, culture helps define the settings in which causal cognition emerges, the manner in which potential factors are pondered, and the choices for highlighting some causes over others or for expressing them in distinct ways. Future tasks include the need to re-conceptualize 'culture' and to overcome blind spots in research strategies such as those linked to disciplinary boundaries and the 'home-field disadvantages' in cross-cultural comparisons.

Keywords: causal cognition, culture, language, content, processing, context

Why does wood float on water, and why do donkeys remain donkeys even if they are painted to look like zebras? Who or what is responsible for rising sea levels, and what can be done to slow them? Why did my friend shout at me? And how do I know whether I am in control of my own actions? All these questions share one important feature: they ask for causal explanations. For our attempts to make sense of the physical world and of social interactions, causality is a core concept. Corresponding knowledge is required for a basic understanding of causal mechanisms, for the constitution of ecological and religious worldviews, for the perception of agency and its assignment among different kinds of actors, and for ascribing responsibility to people and circumstances. Besides simply providing us with a sense of understanding in a variety of contexts and domains, causal explanations also shape our attitudes, judgments, emotions, and intentions.

For most of the preceding questions, we (as ordinary people) believe we know the answers, and for most of them we (as scientists) believe we understand how people generate these answers. But when asked about the possible impact that our cultural background, both as people and as scientists, has on these answers and on the processes that led to them, we have surprisingly little to offer. In spite of a research tradition on causal cognition that spans centuries, if not millennia, most of this research has shown only incidental concern for culture as a possibly constitutive factor.

In an attempt to draw attention to culture as fundamental to causal cognition, in this chapter we compile empirical findings from cross-cultural research identifying factors that potentially constrain, trigger, or shape the way in which humans think about causal relationships. In doing so, we also hope to undermine implicit assumptions about methods, research questions, and even ideas about what is relevant and worthy of attention in research on causal cognition. For example, research has overinvested not only in studies with undergraduates (p. 718) at major Western research universities (i.e., the so-called WEIRD people, from Western, educated, industrialized, rich, and democratic countries; Henrich, Heine, & Norenzayan, 2010), but also in a narrow range of (highly simplified) research paradigms that, at a minimum, beg the question of generalizability. We therefore broaden the focus of causal cognition by considering not only research on reasoning principles at an abstract level devoid of content, but also studies that target the causal concepts and explanations people engage in when reasoning about concrete, everyday phenomena (for similarly integrative approaches, see also Gerstenberg & Tenenbaum, Chapter 27 in this volume).

Our review is organized around three major issues: the potential for cultural variability in the content of causal cognition, in the way this content is processed, and in the context in which all this occurs. Specifically, we first document how people differ in their causal perceptions and explanations, and the culture-specific concepts on which these are based, across the major domains; then focus on potential cultural differences in the processing of causal information; and finally try to identify characteristics of the cultural context that may be responsible for the emergence of these differences. The distinction between content and processing reflects an old division of labor between anthropology and psychology, whereby anthropology studies content and psychology studies processing (D'Andrade, 1981). We reject this division but nonetheless employ it for purposes of illustration before outlining its limitations.

Content and Causal Cognition

The bulk of research on the content of causal cognition has adopted the notion of domain-specific cognition and reasoning processes. Specifically, the claim has been that there is a fundamental difference between reasoning about physical events (naïve physics), biological events (naïve biology), and social or psychological events (naïve psychology). These domains have organized a great deal of research on cognitive development (e.g., Carey, 2009; Goswami, 2002; Hirschfeld & Gelman, 1994; Spelke & Kinzler, 2007; and see Muentener & Bonawitz, Chapter 33 in this volume). The distinction is also intuitively compelling in the sense that notions like agency and intentionality are central to naïve psychology, but presumably are irrelevant to naïve physics. The distinction between naïve biology and naïve psychology may be more debatable, though mind–body dualism appears to be a common understanding, at least in the Western world (Lynch & Medin, 2006).

These ontological domains are motivated by the idea that each has a distinct set of principles that are intuitively available and that allow for understanding and inferences. It has long been proposed that causal explanations—and, as we will see, ideas about their potential susceptibility to cultural influences—depend on the domain to which they refer. That is, each domain is defined by entities having the same kind of causal properties, marked, for example, by the way they move: physical entities are set into motion by external forces, while biological kinds may propel themselves. A further distinction can be made between biological and psychological mechanisms based on the relative importance of agency and intentionality. While agency and intentionality, for instance, seem irrelevant to understanding food digestion or plant growth, they are prominent for organisms endowed with consciousness and deliberative decision-making abilities.

We use this notion of domains to organize our review, but we would not bet a great deal on this distinction having enduring value, as researchers have recently begun to suggest that these domains represent a particular cultural model, and not universal building blocks of human cognition (Medin et al., 2013; Viveiros de Castro, 2004; for a review, see also ojalehto & Medin, 2015). We begin by reporting findings for each domain separately, with an emphasis on those areas for which cultural variation in the content of causal beliefs and explanations has been noted. To this end, we first identify the domain's core entities, key concepts, and principles, then present empirical findings from cross-cultural research, and finally discuss theoretical conclusions and the possible (in)variance of domain boundaries.

The Physical Domain: Objects, Forces, and Effects

The physical domain consists of inanimate objects that behave in accordance with physical laws (animate entities are, of course, also subject to laws of physics such as gravity). Children appear to possess intuitive knowledge of at least the following mechanistic principles: objects have continuity and solidity, their movements are caused by external forces through contact, and they are affected by gravity and inertia (e.g., Carey, 1985, 2009; Spelke & Kinzler, 2007). Specifically, some mechanistic events such as launching (in which an object hits another object, setting it in motion) and states such as floating (when an object floats on a liquid) have (p. 719) long been assumed to trigger universal and immediate impressions of causality (Leslie, 1982; Lewin, 1935; Michotte, 1963; for an overview of causal perception, see White, Chapter 14 in this volume).

Early Understanding of Force and Motion, Animacy, and Weight

The assumption that knowledge about physical principles is universal has rarely been tested across cultures. One exception is Bödeker's (2006) pioneering study on the development of intuitive physical knowledge pertaining to force and motion, animacy, and weight among children of different ages from Germany and the Trobriand Islands, an island group off the coast of New Guinea. Participants were asked, inter alia, to predict and explain the trajectories of moving entities, or to indicate whether and, if so, why specific entities are animate or float on water. Cultural differences did not arise with respect to the concepts of motion and force (or floating, for that matter), but did arise partly with respect to weight conservation and especially with respect to animacy: substantially more Trobriand than German participants described clouds, fire, and waves as animate. This difference in the perception of animacy is important, because it raises the possibility that Trobrianders might see the behavior of clouds, fire, and waves—by virtue of not being simple inanimate entities—as subject to different principles linked to agency. Bödeker's findings suggest an influence of formal education on the development of basic concepts for causal cognition, but also hint at cultural framework theories regarding the living world that seem to play an important role in what and when children learn about causal principles. Whether the conceptual link between animacy and movement is additionally (or perhaps exclusively) paved by semantics remains an open question, though, as the local term for "animate" (-mwawoma in Kilivila) is also used to characterize specific types of movement (Bödeker, 2006, p. 362f.).

Causal Attributions for Launching and Floating

Research into adults' understanding of why objects behave in specific ways has yielded a wealth of empirical evidence and theoretical elaboration, particularly for causal learning and reasoning, as attested to in the numerous chapters of this volume (e.g., Johnson & Ahn, Chapter 8; Lombrozo & Vasilyeva, Chapter 22; Oaksford & Chater, Chapter 19; Over, Chapter 18). However, most of this research appears to be predicated on the implicit assumption that the perception of physical causality is relatively direct and unmediated by culture or education, thus justifying the use of highly restricted samples in Europe and the United States.

In one of the first attempts to investigate whether causal reasoning in the physical domain is susceptible to cultural influences, Morris and Peng (1994, Study 1) presented animated displays to participants having either a US or a Chinese background. Physical displays depicted interactions of geometrical shapes; those categorized as social depicted interactions of fish. In the launching scenario, for instance, one entity moved immediately upon impact of another entity. When rating the extent to which the movements seemed influenced by internal factors (such as air pressure inside a ball or, for fish, the intention to move) or external factors (a person kicking the object or, for fish, being guided to move by other fish), US participants gave higher ratings than the Chinese to internal causes in the social domain, but not in the physical domain, where both groups focused equally on external factors (Morris, Nisbett, & Peng, 1995). These findings were taken as evidence for the assumption that, while the attribution of causality in the social domain may be susceptible to cultural influences, in the physical domain it is not.

This conclusion was qualified by Peng and Knowles (2003, Study 1), who presented US and Chinese participants with animated displays of eight physical interactions (including launching and floating). For three of the eight scenarios, including the launching scenario, the Chinese participants indicated more external causes for the movement of the focal object than the US participants. In a second study, they also investigated the effect of formal physics education, which seemed to supplant individuals' folk theories in the physical domain and thereby to eliminate the cultural differences. Still, the study by Peng and Knowles reopened the idea of cultural differences in the perception of physical causality.

Not discussed by the authors was the result for the floating scenario, for which US and Chinese participants alike preferred external causes. This result is puzzling insofar as Bödeker (2006) had obtained explanations (across cultures) focusing almost exclusively on properties of the floater, an internal cause (see also Lewin, 1935). Scrutinizing this further was the goal of a set of studies conducted in Germany, the Kingdom of Tonga, and China. In the first of these studies (Beller, Bender, & Song, 2009b), participants were asked to indicate (p. 720) which entity they regarded as causally most relevant for statements such as "The fact that wood floats on water is basically due to… ." Overall, the ratings depended on cultural background, exhibiting a preference for the floater (here: the wood, an internal cause) among German participants, a reversed preference for an external cause in the form of the carrier (water) among Tongan participants, and no clear preference among Chinese participants. However, the specific entities involved in the task also affected participants' assessments, and did so distinctly for each cultural group. For instance, the German and Chinese participants, but not the Tongan participants, considered a carrier's capability for buoyancy only when the floater was a solid object such as wood, but not when it was a fluid like oil (Beller, Bender, & Song, 2009b). The general pattern was largely replicated in a second study with German and Tongan participants that included a broader range of physical relations, contents, and linguistic variations. Again, assessments of causal relevance were affected by cultural background, and the most pronounced difference was observed for the floating scenario, with a clear preference for the floater as the main cause among Germans, but not Tongans (Bender & Beller, 2011).

To sum up, the studies conducted in the physical domain indicate that people's explanations of causal relations can differ substantially across cultures. This holds not only for launching and floating, but also for other physical interactions (Bender & Beller, 2011; Peng & Knowles, 2003) and for physical concepts such as weight (Bödeker, 2006). As both the obtained findings and their theoretical interpretations diverge, however, more systematic and in-depth investigations are needed.

The Biological Domain: Animates, Essences, and Vitalistic Functions

The biological domain is populated by animates, which share with physical objects the properties of continuity and solidity but, in the case of almost all animals, can move on their own initiative. More important, animates grow, may become ill or injured and heal, reproduce, pass on essential properties, and eventually die (Inagaki & Hatano, 2002, 2006). Causal explanations in this domain may thus be characterized by "vitalistic" principles—that is, assumptions about a vital power or energy that keeps animates alive (Inagaki & Hatano, 2004)—although there is some dispute as to whether these principles pertain to all animates alike (Goldberg & Thompson-Schill, 2009).

Research on the biological domain has focused primarily on categories and induction (see also Rehder, Chapters 20 and 21 in this volume), and although this may not seem directly relevant to causal cognition at first glance, it indeed is. While questions like whether a pig raised by cows will "oink" or "moo" ask for categorization, they also probe understandings of deep biological causal principles. Similarly, wondering whether a property present in one biological kind is also true of another biological kind may draw on causal reasoning. In the following, we therefore begin by presenting research on folk-biological categories and induction before explicating its implications for causal reasoning.

Folk-Biological Reasoning

The impact of culture on folk-biological concepts, taxonomies, and theories has attracted considerable attention within cognitive science (e.g., Astuti, Solomon, & Carey, 2004; Berlin, 1992; Boster & Johnson, 1989; Ellen, 2006; Inagaki & Hatano, 2002; Medin & Atran, 1999, 2004; ojalehto, Waxman, & Medin, 2013). Early work focused on taxonomic systems (Berlin, 1972; Berlin, Breedlove, & Raven, 1973; see also Atran, 1993; Berlin, 1992). It suggested that there are universal principles of folk naming and classification systems and that they correspond fairly well with classical taxonomic hierarchies. Thus, when an ethnobiologist asks an informant the local name for a tree, she or he can be reasonably sure that the answer will correspond to the level of genus or species (in local contexts, most genera have only one species), and not to a more abstract and inclusive level like tree or plant.

These observations also led to a major discovery in the cognitive science of concepts: demonstrations by Eleanor Rosch and her associates (Rosch, 1975; Rosch et al., 1976; see also Berlin, 1992) that one level in taxonomic hierarchies, the so-called basic level, is psychologically salient and privileged by a range of converging measures. This observation is highly relevant to how adults label objects for children and how children in turn acquire concepts. This level is also privileged in inductive reasoning (Coley, Medin, & Atran, 1997), but whether this privileging of the basic level holds equally for causal learning has received very little attention. One possibility is that different levels in a hierarchy may be selected based on the causal coherence they provide for the particular relations of interest to be explained (Lien & Cheng, 2000). For example, if (p. 721) the causal relation involved the sweetness of the sap of a maple tree, then the causal description should pick out maple tree as relevant; but if the roots of the same tree pushed up through the ground under a sidewalk to produce a bulge in it, then "treeness" and not "mapleness" would be the relevant causal factor (unless, perhaps, maple trees were especially prone to producing bulges). On this account, privilege would be relative to what one is trying to understand—and not necessarily tied to what Rosch calls the "basic level." To our knowledge, there has been no cross-cultural research on the question of causal privilege.

Folk-Biological Reasoning The impact of culture on folk-biological concepts, taxonomies, and theories has attracted considerable attention within cognitive science (e.g., Astuti, Solomon, & Carey, 2004; Berlin, 1992; Boster & Johnson, 1989; Ellen, 2006; Inagaki & Hatano, 2002; Medin & Atran, 1999, 2004; ojalehto, Waxman, & Medin, 2013). Early work focused on taxonomic systems (Berlin, 1972; Berlin, Breedlove, & Raven, 1973; see also Atran, 1993; Berlin, 1992). It suggested that there are universal principles of folk naming and classification systems and that they correspond fairly well with classical taxonomic hierarchies. Thus, when an ethnobiologist asks an informant the local name for a tree, she or he can be rea­ sonably sure that the answer will correspond to the level of genus or species (in local con­ texts, most genera have only one species), and not to a more abstract and inclusive level like tree or plant. These observations also led to a major discovery in the cognitive science of concepts: demonstrations by Eleanor Rosch and her associates (Rosch, 1975; Rosch et al., 1976; see also Berlin, 1992) that one level in taxonomic hierarchies, the so-called basic level, is psy­ chologically salient and privileged by a range of converging measures. This observation is highly relevant to how adults label objects for children and how children in turn acquire concepts. This level is also privileged in inductive reasoning (Coley, Medin, & Atran, 1997), but whether or not this privileging of the basic level holds equally for causal learn­ ing has received very little attention. One possibility is that different levels in a hierarchy may be selected based on the causal coherence they provide for the particular relations of interest to be explained (Lien & Cheng, 2000). For example, if (p. 721) the causal rela­ tion involved the sweetness of the sap of a maple tree, then the causal description should pick out maple tree as relevant, but if the roots of the same tree pushed up through the ground under a sidewalk to produce a bulge in it, then “treeness” and not “mapleness” would be the relevant causal factor (unless, perhaps, maple trees were especially prone to producing bulges). On this account, privilege would be relative to what one is trying to understand—and not necessarily what Rosch calls the “basic level.” To our knowledge, there has been no cross-cultural research on the question of causal privilege.


Biological Essences

The ability to pass on essential properties is distinctive for biological entities and plays a crucial role in categorization and in causal explanations. For example, one might ask why some bird has the particular song that it has. Ethologists conduct studies in which they vary the environment and auditory input conditions to which baby birds are exposed. If the bird has never heard its species' own song, but nonetheless sings it perfectly as an adult, one concludes that this capacity is innate. If, instead, it is crucial for the bird to hear its species' song, learning is implicated as a causal principle. A widely used scenario in cognitive science entails a similar logic. Participants are given a "switched at birth" scenario, as in the case of a baby pig raised by cows mentioned earlier. The belief that the pig will "oink" and not "moo" reflects the idea that the capacity for species-specific sounds is inborn and not based on experience.

Even without an elaborate model of how innate mechanisms may work (e.g., something to do with genes), people consider essential properties as causative for other surface features (Ahn et al., 2001; Medin & Ortony, 1989). Much of the literature on essentialism as a causal principle is nicely reviewed from a developmental perspective in Gelman (2003), and the role of language (e.g., use of generics) in essentialist reasoning has recently begun to be explored (Gelman et al., 2005; Waxman & Gelman, 2010).

Essentialist reasoning principles have been widely observed in different cultures (Astuti et al., 2004; Atran et al., 2001; Gelman, 2003; Sousa, Atran, & Medin, 2002). Culturally specific concepts and theories modify how essences are referred to and which inferences are drawn from them (e.g., Au & Romo, 1999; Hatano & Inagaki, 1999; Waxman, Medin, & Ross, 2007). All of this work implies that biological categories are organized by deep causal principles that determine not only physical properties and biological functions, but also patterns of behavior and psychological properties. Finally, we should note that there is a considerable body of work concerning the cognitive consequences of extending essentialism as a causal principle to social categories (e.g., Bastian & Haslam, 2006). Essentializing social difference implies a lack of capacity for change with experience.

Taxonomic Versus Ecological Causal Reasoning

So far in discussing folk-biological reasoning, we have implied that inductive reasoning is a form of causal reasoning and have used the terms "category-based induction" and "causal induction" more or less interchangeably. A skeptic might suggest that this research is not about causal reasoning, but rather about similarity-based generalization. For example, if people are told that eagles have some enzyme x and are asked whether hawks also have enzyme x, isn't it plausible that they would say "yes" simply because hawks and eagles are fairly similar to each other, without invoking any notion of cause?

One counterargument might be that eagles have enzyme x for a reason, and whatever caused them to have it very likely would also cause hawks to have it. For example, if our informants knew or were told that enzyme x helps lubricate the eyes to counteract the wind's drying action, they might be quite sure that hawks also have it.

One might quibble about whether the preceding example involves causal reasoning, similarity-based generalization, or some form of teleological reasoning. So, let's turn to a more straightforward example. If people are told that field mice have enzyme x and are asked whether hawks also have it, they might well reason that they do, not because field mice are similar to hawks, but rather because hawks eat field mice and eating provides a mechanism for transmitting enzymes from field mice to hawks.

Research on category-based induction has shifted from an initial focus on similarity-based generalization (whatever the merits might be of its potential link to causal reasoning) to the inclusion of reasoning in terms of causal mechanisms, as in the field mice example. Early studies on category-based induction used WEIRD participants, who tend to know relatively little about biological kinds (e.g., Lynch, Coley, & Medin, 2000). When causal (p. 722) induction research was extended to other cultural groups and to informants with more biological expertise, it soon became clear that there was another causal mechanism-based reasoning strategy that had been more or less ignored (Bailenson et al., 2002; López et al., 1997; Proffitt, Coley, & Medin, 2000). As another example of this strategy, some 5-year-old Native American Menominee children reason that bears might have the same novel substance inside them that bees do. Despite the fact that bees are taxonomically distant from bears, these children suggest that bees might transmit this substance to bears via bee stings or through the honey that bears eat.

So, when do reasoners consider causal mechanisms versus some general notion of similarity? First of all, kinds of similarity matter. Physical similarity is important for reasoning about novel physical properties, but behavioral similarity may be more important for reasoning about novel behavioral properties (Heit & Rubinstein, 1994). It now appears that inductive reasoning involving biological kinds relies on a principle of relevance. According to the relevance principle, people tend to assume that the premises are informative with respect to the conclusion (Medin et al., 2003). It is as if they ask themselves, "Why are you telling me this?" and assume that you have a good reason for doing so. When a premise involves bees and the conclusion bears, invoking the relevance principle would suggest that it is important to find some connection between bees and bears (e.g., honey, stinging). Obviously, relevance may vary with culture and expertise.

The Psychological/Social Domain: People, Mind, and Agency

The psychological/social domain is populated by sentient agents who possess mental states such as knowledge and beliefs, goals, self-reflexiveness, and possibly even free will (Leslie, 1995; Pauen, 2000). The set of core psychological principles likely includes autonomous motion, goal-directedness, efficiency, and contingent and reciprocal interactions, which is why motives and intentions play a causative role in agents' behavior. A second core system has been claimed for identifying potential social partners and group members and for capturing the salient features of cooperation, reciprocity, and group cohesion (Spelke & Kinzler, 2007; Spelke, Bernier, & Skerry, 2013). Among the essential prerequisites for assessing psychological-social causality are a theory of mind and the concepts revolving around the notion of agency.

Theory of Mind

A theory of mind provides the basis for inferring mental states in others and, more important, for recognizing these states as causes of behavior. It is thus an essential prerequisite for folk psychology and causal attributions. While there is likely no doubt that people eventually acquire a theory of mind regardless of cultural background, the questions of whether it uniformly emerges at around 4 years of age and as a single ability (Wellman, Cross, & Watson, 2001) are less clear. Callaghan and colleagues (2005), for instance, report that the Samoan children in their study were significantly older than the children in the other cultural groups when they passed the false belief test. These data do not support the claim (even made in the title of that very article) that mental-state reasoning has a synchronous developmental onset across cultures. Significant differences in the onset, in spite of equal trajectories of development, were also found when comparing two North American and two Chinese groups (Liu et al., 2008). And Junín Quechua children from Peru master the appearance/reality distinction well before they seem to acquire an understanding of representational changes and false beliefs (Vinden, 1996). Of course, these assumptions and conjectures rest on the faith that measures and procedures for assessing theory of mind, developed in the West, are commensurable across cultures. As this issue is still under dispute (e.g., Wassmann, Träuble, & Funke, 2013), we consider the status of theory of mind development to be an open question.

In at least two of the three exceptional cases just mentioned, a later onset in mental-state reasoning seems to correspond to a cultural discouraging of "mind-reading." Both Samoans and Quechua appear to be more concerned with what one can objectively know, and to be rather reluctant to speculate about the feelings, thoughts, or intentions of others, especially when asked for causes of other people's behaviors (Gerber, 1985; Shore, 1982; Vinden, 1996; for similar observations in other Pacific societies, see also Wassmann, Träuble, & Funke, 2013). Sometimes this is described as the "opacity of other minds" (Robbins & Rumsey, 2008; and see Danziger, 2006; Danziger & Rumsey, 2013). Not being exposed to mental-state talk, in turn, may make it more difficult for children to acquire a full-fledged theory of mind (Pyers & Senghas, 2009; Woolfe, Want, & Siegal, 2002).

When it comes to adults' folk-psychological theories of mind, cultural differences are, surprisingly, more, rather than less, pronounced (Lillard, 1998 (p. 723); White & Kirkpatrick, 1985). There appears to be a striking cultural variance in the willingness to adopt a first-person-like perspective on others (Hollan & Throop, 2008; Wu & Keysar, 2007), with substantial consequences for the extent to which mental causes are attributed and responsibility is ascribed. If, for example, it is difficult (or not desirable) to guess what others are thinking, people are likely to be held responsible for their wrongdoings, even in cases where accident or error has been acknowledged (Danziger & Rumsey, 2013; see also Astuti & Bloch, 2015). People placing high regard on the opacity of other minds thus appear to be concerned more with the effects of actions than with their causes (Shore, 1982; Throop, 2008). Alternatively, one might say that they are less concerned with intentions as a source of causal explanations and are more focused on other, external factors.

Assignment of Agency in Causal Attribution

While theory of mind accounts are concerned with the (mental) causes of behavior, accounting for behavior as causes of events involves causal attribution (Hilton, Chapter 32 in this volume; for the relation of attribution and theory of mind, see also Alicke et al., 2015) and is thus closely related to how people perceive and/or construe agency. In this field of research, agency has been defined as the experience of being in control of one's own actions and thus of the events they cause (Haggard & Tsakiris, 2009). The entity to which agency is assigned is almost exclusively the individual person, at least in many Western cultures, and consequently in most (Western) cognitive accounts. Elsewhere, agency may be assigned to non-human entities, such as plants, the ocean, or the sun (Harvey, 2005; Kohn, 2013; Narby, 2006), to social groups (Duranti, 1994; Menon et al., 1999; Morris, Menon, & Ames, 2001), and/or to supernatural entities (e.g., Bird-David, 1990, 2008; Norenzayan & Hansen, 2006)—to the extent that these agents are included in theories of mind (Knight, 2008; Knight et al., 2004). As emphasized by Widlok (2014), however, these differences in assigning personhood and agency to such entities are likely to be a matter of degree rather than kind, with equally strong intra-cultural variation in the West—and especially between the scientific model and layperson conceptions—as elsewhere.

Closely related to assigning agency to other entities is the more fundamental question of how people perceive and construe their own agency. Going far beyond the interest in responsibility ascription, this question touches on discussions of free or conscious will to include issues of embodiment and action perception (Schütz-Bosbach & Prinz, 2007). Wegner (2002, 2003), for instance, proposes that agency—the experience of conscious will—is in most cases merely an illusory add-on to perceived action. He identifies three principles for inferring agency: priority, consistency, and exclusivity. If a relevant thought reaches consciousness just before the action, is consistent with it, and is the only reasonable candidate for it, people perceive authorship, even of actions that were not theirs (Wegner, Sparrow, & Winerman, 2004), and if any of these principles is violated, they will fail to perceive authorship, even for events they did cause (Dijksterhuis et al., 2008).
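Wegner's three principles can be read as a simple conjunctive rule. The following toy formalization is our own gloss on that verbal description, not an implementation from the literature:

# Wegner-style inference of authorship: the experience of willing an action
# arises only when a relevant thought precedes the action (priority),
# matches it (consistency), and no rival cause is salient (exclusivity).

def perceived_authorship(priority, consistency, exclusivity):
    return priority and consistency and exclusivity

# An "I did that" illusion: all three cues present, although the action was
# in fact performed by someone else (cf. Wegner, Sparrow, & Winerman, 2004).
print(perceived_authorship(priority=True, consistency=True, exclusivity=True))   # True

# Failure to perceive authorship of one's own action: a salient alternative
# cause violates exclusivity (cf. Dijksterhuis et al., 2008).
print(perceived_authorship(priority=True, consistency=True, exclusivity=False))  # False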

Cross-cultural accounts of how universal the processes of agency construal may be are still lacking. We do know, though, that several aspects of "supernatural" experiences are closely linked to notions of agency—agency either claimed for actions most likely not initiated by the respective person (as in cases of "magic"), or disclaimed for actions evidently committed (as in cases of spirit possession)—and these are obviously culture-specific to a substantial extent. The same holds for agency disorders such as schizophrenia, which is almost universally characterized by symptoms such as hearing voices of others in one's head. Still, depending on cultural theories of mind and of inter-individual relations, these voices are experienced in culture-specific ways (e.g., as intrusive, violent, and unreal symptoms, as communication by morally good and causally powerful spirits, or as useful guidance provided by kin), and, as a consequence, they are also distinctively diagnosed and treated (Luhrmann et al., 2015; and see Ahn, Kim, & Lebowitz, Chapter 30 in this volume; Hagmayer & Engelmann, 2014; Seligman, 2010).

Relevance and (In)variance of Domain Boundaries

Many events in people's lives do not fall neatly within domains, however these are defined. If the roof of a hall collapses after heavy snowfall, one factor involved is clearly the physics of weight and structural engineering, while a second factor is the people who did or supervised the construction work. Similarly, accounts of illnesses in the field of ethnomedicine often comprise social aspects as well as biological ones (e.g., Froerer, 2007; Nguyen & Rosengren, 2004), thus exhibiting a concept of (p. 724) psycho-physical dualism (for a more detailed treatment, see ojalehto & Medin, 2015). In this subsection, we therefore focus on cultural variability at the fringes of the ontological domains, addressing three distinct, yet related topics: (1) the overlap of domains for relevant events, (2) the blending of domains in people's concepts and representations, and (3) the role of culture in determining domain boundaries.

Overlapping of Domains: Ecosystems and Mental Models of Nature

Ecosystems involve components of more than one of the other domains. Typically, they are considered part of the biological domain, because the biological species within an ecosystem are its most salient and (debatably) most relevant components for humans. However, physical entities such as rocks, soil, water, or wind are essential components of ecosystems as well—and are perceived as such in many indigenous groups (Medin & Bang, 2014). On the other hand, many people around the world regard social relationships with other inhabitants of their ecosystem as only natural (e.g., Atran & Medin, 2008; Knight, 2008; Le Guen et al., 2013). In addition to their heterogeneous composition, ecosystems have qualitatively different properties that justify their treatment as a distinct domain. Entities within an ecosystem constitute a tight network that is characterized by complexity, non-transparent relations, non-linear processes, and emergent phenomena. Because of these properties, human interactions with ecosystems are extremely challenging, in terms of causal perception, reasoning, and understanding, as well as in terms of problem-solving and management (Dörner, 1996; Funke, 2014; Hmelo-Silver & Azevedo, 2006; Tucker et al., 2015; White, 2000).

Causal Learning and Complexity

One important line of research focuses on how people deal with complex problems (see also in this volume, Hagmayer & Fernbach, Chapter 26; Meder & Mayrhofer, Chapter 23; Osman, Chapter 16). To some extent, complexity is a question of perspective, and even for relatively simple artifacts, laypeople may come up with different models of how they function (Keil, 2003; Lawson, 2006). Depending on their causal model, people may also handle these artifacts differently. If they conceive, for instance, of a thermostat as turning the furnace on or off according to room temperature, running at a single constant speed (in line with a "feedback" model), they will tend to set it and leave it alone. By contrast, if they conceive of the thermostat as also controlling the rate of heat flow (in line with a "valve" model), they will tend to set it higher if they wish to heat up the house faster (Kempton, 1986).
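The behavioral difference between the two lay models can be made explicit in a small simulation (our own sketch; the heating rates, times, and temperatures are arbitrary):

# Two lay models of a thermostat (after Kempton, 1986). In the feedback
# model the furnace runs at one fixed rate and the setting only determines
# when it switches off; in the valve model the setting scales the heat flow.

def simulate(rate_per_minute, setpoint, minutes=10, temp=15.0):
    for _ in range(minutes):
        if temp < setpoint:
            temp += rate_per_minute(setpoint)
    return round(temp, 1)

feedback = lambda setpoint: 0.5               # fixed furnace output
valve    = lambda setpoint: 0.03 * setpoint   # output scales with the setting

for name, rate in (("feedback", feedback), ("valve", valve)):
    print(f"{name:8s}: after 10 min, set to 20 -> {simulate(rate, 20)} deg, "
          f"set to 28 -> {simulate(rate, 28)} deg")
# feedback: 20.0 vs. 20.0 -> turning the dial up does not heat the room faster
# valve:    20.4 vs. 23.4 -> turning the dial up does heat the room faster

Under the feedback model, a higher setting changes only when the furnace switches off, so valve-model users who crank the dial are acting on a causal model the device does not implement.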

One method of experimentally investigating problem-solving strategies for complex problems is via so-called microworlds: computer simulations of dynamic systems that are characterized by complexity, opaqueness, and dynamic development. Such microworlds range from simple scenarios, like regulating the temperature in a cold storage facility via a thermostat, to the truly complex scenario of serving as mayor of a town (Dörner, 1996). Microworlds are also employed to investigate cultural differences in people's performance in managing complex systems (Güss & Dörner, 2011; Strohschneider & Güss, 1998).

Solving the complexity problems in such situations requires participants to come up with at least implicit assumptions concerning the underlying causal principles as well as the consequences their decisions have, which—due to the very nature of complex problems—poses severe challenges to the ordinary participant (Funke, 2014). As pointed out by Güss and Robinson (2014), efforts to manage the system are affected by cultural background in at least three different ways. First, culturally mediated experiences likely shape the knowledge, problem-solving heuristics, and perceptions of control that people bring to the table. Confucians, for instance, are expected to be more likely than non-Confucians to adopt a heuristic oriented toward a middle-way solution. Second, cultural values likely shape which aspects of both the problem and the solution will be prioritized, for instance, whether social and relational aspects of a decision are taken into consideration. And third, the cultural learning environment of people likely shapes the temporal horizon for planning and decision-making. Whether participants prefer, for instance, long-term observations of changes and slow adjustments to complex systems, or direct reactions to the momentary situation, seems to depend on the extent to which their cultural environment has been stable and predictable, with participants from Germany exhibiting more reliance on long-term stability than those from India (Güss & Dörner, 2011; Strohschneider & Güss, 1998).

The Special Case of Systems Learning and Ecosystems

The implications of environmentally and ecologically sound decision-making (p. 725) and behavior have been addressed during the last 25 years in cross-cultural and anthropological research (e.g., Casimir, 2008; Kempton, Boster, & Hartley, 1995; McCay & Acheson, 1987). Causal knowledge is of specific relevance here, as it affects risk perception and subsequent behavioral intentions (Bennardo, 2014; Viscusi & Zeckhauser, 2006).

The third author has been part of an interdisciplinary research team aiming at a better understanding of how culture affects biological concepts, ecological models, attitudes toward nature, and their implications for resource exploitation and conflicts. For example, our studies of ecological models in Guatemala reveal that one cultural group sees plants as helping animals and animals as helping plants, while another cultural group has the same model for how plants help animals but denies that animals help plants (Atran & Medin, 2008). The richer model is associated with greater sustainability of agro-forestry practices.

Related studies of Native-American and European-American hunters and fishermen in rural Wisconsin (Medin et al., 2006; Ross, Medin, & Cox, 2007) show different resource management strategies (and conflict over such differences) linked to different understandings of nature and of the relation of humans to the rest of nature. For example, if a cultural group sees itself as apart from nature, management practices may focus on having minimal effects, as in the strategy of "catch and release" for sports fishing, where one attempts not to kill a single fish. If, instead, a group sees itself as a part of nature, management strategies such as "do not waste," where only those fish needed are taken, may be more appealing (Bang, Medin, & Atran, 2007; Medin et al., 2006). Broad differences in cultural models, or what one might call "epistemological orientations" (i.e., ways of looking at and understanding the world), are associated with cultural differences in ecological reasoning in particular (Medin & Bang, 2014; Unsworth et al., 2012) and causal reasoning about complex systems more generally (Olson, 2013).

As one example, in one set of interviews, Ross and colleagues (2007) asked Native-American Menominee and European-American hunters questions about whether different animals help or hurt the forest. For the probe involving porcupines, the European-American hunters uniformly said that porcupines hurt the forest because they girdle the bark of trees and the trees die. Menominee hunters were more likely to say that "everything has a role to play," and a significant minority of Menominee hunters indicated that porcupines help the forest. They agreed that porcupines girdle the bark of trees and the trees die, but they went on to say that this opens up the forest to more light, which allows for more forest undergrowth, which in turn helps the forest retain moisture. This cultural difference suggests almost a qualitative difference in causal reasoning in the domain of ecosystems.

Note that ecological cognition is rich with causal relationships and therefore opportunities for causal learning. But also note that the issues of causal learning for such complex systems are almost different in kind from the most common cognitive science paradigms for studying causal cognition. Typically, the latter type of study asks which of a set of potential causes is responsible for some effect. Ecosystems and other complex systems often involve feedback processes where A affects B and B affects A, as well as emergent processes that take place at different levels of scale. For example, wolves prey on individual deer, but wolves may help the deer population as a whole by selectively taking the weak and the sick.

Ecosystems also typically involve a web of relationships, and simple causal reasoning may lead to incorrect inferences. If two predator species of fish target some prey species, removing one of the predators does not necessarily lead to more members of the prey species, as WEIRD college students commonly assume (White, 1997, 1999, 2000). More likely, it will lead to greater numbers of the surviving predator species.
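This inference error can be illustrated with a minimal predator-prey simulation (our own sketch, a Lotka-Volterra-style model with logistic prey growth; the parameters are invented for illustration and are not stimuli from White's studies):

# Logistic prey (x) exploited by two equivalent predator species (y1, y2).
# All parameter values are illustrative.

def simulate(remove_second_predator=False, dt=0.01, steps=50_000):
    x, y1, y2 = 5.0, 1.0, 1.0
    if remove_second_predator:
        y2 = 0.0
    for _ in range(steps):
        dx  = x * (1.0 - x / 10.0) - 0.1 * x * (y1 + y2)  # growth minus predation
        dy1 = 0.05 * x * y1 - 0.2 * y1                    # conversion minus mortality
        dy2 = 0.05 * x * y2 - 0.2 * y2
        x, y1, y2 = x + dt * dx, y1 + dt * dy1, y2 + dt * dy2
    return round(x, 1), round(y1, 1), round(y2, 1)

print("both predators (prey, pred1, pred2):", simulate())
print("pred2 removed  (prey, pred1, pred2):", simulate(remove_second_predator=True))
# The prey equilibrium is pinned at the predators' break-even density,
# mortality / (conversion rate) = 0.2 / 0.05 = 4, so removing one predator
# leaves the prey roughly unchanged while the surviving predator roughly doubles.

The sketch makes the web-of-relationships point concrete: the intervention's benefit flows to the remaining competitor, not to the prey.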


Finally, we note that it is in these complex contexts, which elude a comprehensive causal analysis, that religious beliefs may serve a vital function. For example, in the tragedy of the commons (Hardin, 1968), if each person acts rationally to maximize his or her own benefit, the resource will quickly be exhausted. Individual conservation may thus only serve to subsidize the greed of others. However, if there are spiritual entities that will punish greed and disrespect for the resource, then people may restrain their behavior (e.g., Atran & Medin, 2008). Cultural models of ecosystems or "nature" thus often go hand in hand with religious frameworks (e.g., Bird-David, 1990, 2008).

Blended Concepts and Representations: Science Versus Religion?

Even if cognitive approaches succeed in identifying domain-specific concepts and domain-specific principles for causal reasoning, the boundaries (p. 726) between domains may be far less clear for laypeople than for researchers, particularly if we go beyond the average psychology lab and its population of highly educated (WEIRD) students. People do not always hold just one causal model for a specific event (see Gelman & Legare, 2011, for a review), and conceptual clusters and models are often of a heterogeneous and variable composition, pertaining to different domains.

Religion is one of the cultural domains that provide a particularly rich source of competing concepts—and sometimes of concepts that are in direct conflict with basic biological knowledge (Bering, 2006; Rice, 2003). And yet such concepts need not be incompatible (Legare et al., 2012; Malinowski, 1948; O'Barr, 2001; Watson-Jones, Busch, & Legare, 2015). The belief that a collapsing granary killing a person is proximally caused by termites (Evans-Pritchard, 1937), or that an illness is proximally caused by an infection (Legare & Gelman, 2008), does not contradict the belief that these events are also distally caused by witchcraft. The joint consideration of "natural" causes and "witchcraft" as supplementing each other in bringing about an effect is explained by the Azande (from whom the granary example is taken) with a metaphor: just as two spears that hit an elephant at the same time contribute equally to its death, so do the termites and the ill intentions of a witch contribute equally to the collapse of the granary (Evans-Pritchard, 1937; and see Widlok, 2014, p. 5).

Framing effects are another mechanism that accounts for assessing the same entities differently in different interpretive systems (Keil, 2007). Animals, for instance, can be regarded as objects (e.g., when falling from a tree due to gravity), as living kinds (when being wounded), or as intentional agents (when defending their offspring). Interestingly, children appear to be more likely than adults to adopt a biological perspective, rather than a supernatural one, when asked whether death terminates all bodily and mental processes, as was indicated by a study among the Vezo of Madagascar (Astuti & Harris, 2008). These observations suggest that biological explanations may predate those based on supernatural concepts, and that these original explanations are then increasingly modified by cultural input (see also Barrett & Behne, 2005; Bering, Hernández-Blasi, & Bjorklund, 2005).


Cultural worldviews or framework theories may define domain boundaries in specific ways, thus also suggesting different causal schemata. A dynamistic conceptualization, for instance, revolves around an impersonal force, such as mana in Polynesia (Firth, 1970; Shore, 1989), inherent even in objects of the physical domain. Animistic conceptualizations, on the other hand, consider more entities as animate than Western biologists are prepared to accept. The more people consider such concepts, the more we can expect to encounter permeable category boundaries in causal attributions and explanations. The increasing interest in the cognitive foundations of such concepts (Legare et al., 2012; Whitehouse & McCauley, 2005) is thus also an important step in understanding the cultural constitution of domain boundaries.

The (In)variance of Domain Boundaries Across Cultures

The domain-specific approach to causal cognition seemed to be justified by the assumption that domain boundaries are invariant (Atran, 1989) and that their perception is based on innate core knowledge (Spelke & Kinzler, 2007). These domain boundaries are violated, for instance, when ascribing (psychological) intentions to an (inanimate) computer. Several researchers consider such violations, which intermingle the core attributes of physical, biological, and psychological entities and processes, to be "category mistakes" (Carey, 1985; Keil, 1994), and some even draw on them for defining "supernatural" explanations in contrast to "natural" explanations (e.g., Lindeman & Aarnio, 2006). But whether domain boundaries do restrict generalizations may depend on culture (e.g., Rothe-Wulf, 2014). Recall the findings reported earlier, according to which attribution patterns from the social-psychological domain can extend to the physical domain (Peng & Knowles, 2003). Overlap of the physical and the biological domain was also observed in the animacy task in Bödeker's (2006) study on the development of intuitive physical knowledge. While only a minority of the German participants regarded clouds, fire, and waves as alive, a majority of the participants on Trobriand did so. This cultural difference was modulated by the degree of formal education on Trobriand, with more non-schooled than schooled participants assigning animacy to these entities. Importantly, across cultural groups, the categorization of an entity as animate was justified with the manner of motion. Explanations for the opposite categorization as inanimate, on the other hand, differed considerably: material or inner structure of the entities and their dissimilarity with human beings were (p. 727) the main arguments of the German participants, but were not even considered by the non-schooled Trobriand participants, who exclusively focused on aspects of motion or on their nature as artifacts instead (Bödeker, 2006).

With regard to the biological/psychological boundary, findings are even more puzzling. Some studies indicate that essentialist representations are easily transferred from the biological to the social-psychological domain, more specifically to social groups (e.g., Gelman, 2003; Gil-White, 2001), and this may be a tendency that appears across cultures (Sperber & Hirschfeld, 2004). Other cross-domain generalizations appear to be more constrained by cultural background (e.g., Medin & Atran, 2004; Morris & Peng, 1994). Obviously, cultural framework theories play a more important role in defining domains and their relevance for causal explanations than previously assumed, specifically as they have implications for which causal principles apply (see also ojalehto et al., 2015; Ross et al., 2003).

While all these findings provided indirect evidence for the permeability of domain boundaries, only one study so far has directly targeted these boundaries. To investigate their susceptibility to cultural influence, Rothe-Wulf (2014) collected data on the emic constitution and delineation of domains, and on how causally relevant factors are assigned to these domains. She had Tongan participants categorize such factors from a broad range of semantic fields according to their similarity. These fields included inanimate objects, physical forces and entities (such as sun, wind, or ocean), plants, animals, people of different status and rank, social institutions, supernatural entities, and others. On a coarse level of granularity, her findings seemed to support the widespread assumption of three ontological domains for causal cognition: the physical, biological, and psychological domain. A closer look at the data revealed, however, (1) that animals and plants are rather strictly separated, (2) that the (physical) ocean and its (biological) inhabitants together constitute one distinct domain, and (3) that "supernatural" phenomena and entities are closest to, and in fact overlap with, the human domain. More important, the assumed domain boundaries appear far more permeable than they should be if they were based on innate categories for human information processing. Rothe-Wulf's (2014) findings also fit the general observation that the categories on which the sorting of entities—and hence domain structure—is based may shift depending on context (Atran & Medin, 2008).
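The logic of such similarity-sorting studies can be sketched computationally (our reconstruction of the generic pile-sort method with invented items and data; it is not Rothe-Wulf's actual item set or analysis pipeline): items that participants repeatedly sort together receive a small mutual distance, and emic domains are read off a clustering of the result.

# Sketch of a pile-sort analysis: count how often pairs of items land in the
# same pile, convert co-occurrence to distance, and cluster. Data are invented.
import itertools
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

items = ["stone", "wind", "ocean", "fish", "tree", "pig", "chief", "spirit"]
sorts = [  # each inner list is one hypothetical participant's piles
    [["stone", "wind"], ["ocean", "fish"], ["tree", "pig"], ["chief", "spirit"]],
    [["stone"], ["wind", "ocean", "fish"], ["tree", "pig"], ["chief", "spirit"]],
    [["stone", "wind"], ["ocean", "fish"], ["pig", "tree"], ["spirit", "chief"]],
]

n = len(items)
co = np.zeros((n, n))
for piles in sorts:
    for pile in piles:
        for a, b in itertools.combinations(pile, 2):
            i, j = items.index(a), items.index(b)
            co[i, j] += 1
            co[j, i] += 1

distance = 1.0 - co / len(sorts)   # co-occurrence proportion -> distance
np.fill_diagonal(distance, 0.0)
clusters = fcluster(linkage(squareform(distance), method="average"),
                    t=0.5, criterion="distance")
for c in sorted(set(clusters)):
    print([it for it, lab in zip(items, clusters) if lab == c])

With these toy data, "ocean" clusters with "fish" rather than with the other physical entities, echoing the finding that the ocean and its inhabitants can form one emic domain.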

Summary

The main ontological domains are defined by entities having the same kind of causal properties. Based on the amount of agency involved, these domains are hierarchically structured: from the physical domain with no agency involved to the social domain with a maximum of agency. The degrees of freedom for interpretation increase accordingly from the physical to the social domain, where causal explanations are necessarily based on uncertain data and fallible inferences. Cultural variability in causal explanations appears to increase along the same lines from the physical to the social domain, but nonetheless is evident for all of these domains.

Yet, the data available so far also caution against the a priori assumption that causal cognition depends on domain—or that domain boundaries are universal. If one were to take domain boundaries as fixed and non-negotiable, then a blending of domains could be considered a metaphorical extension or even a "category mistake," and may be taken as the constitutive element of "supernatural" beliefs. However, such a stance would prevent us from addressing the fundamental role that culture may have in defining and delineating domains in the first place. What belongs to the domain of the ocean in Tonga may fall into either the physical or the biological domain in Germany, and what is "supernatural" from a Western perspective may be completely natural from an Azande perspective. Domains do overlap in reality and in people's conceptualization of them (ojalehto & Medin, 2015). If we wish to understand how people with different cultural backgrounds acquire competing concepts, how they access and integrate them, and how they cognitively process them when engaging in causal explanations, we need to investigate this overlap much more thoroughly.

Moreover, new domains may emerge if we adopt multiple cultural perspectives, such as (folk-)ecology with a focus on interactions of organisms, weather, and other forces (Medin et al., 2013), or (folk-)sociology with a focus on social relations (Hirschfeld, 2013). The latter would also be able to account for cultural differences in the emphasis on consequences of human behavior relative to its causes (Danziger, 2006; Lillard, 1998; Robbins & Rumsey, 2008).

Cultural Impacts on Cognitive Processing

The cross-cultural findings compiled in the preceding section justify the call for a more thorough (p. 728) investigation of the critical role that culture may play in causal cognition. In this section, we focus on an initial set of processes that have been identified as candidates for mediating such an influence, including (1) causal attribution tendencies and implicit theories, (2) the causal asymmetry bias, and (3) linguistic factors. Most of them revolve around shifts in attention and/or the activation of additional information.

Attribution Tendencies and Implicit Theories

Earlier, we discussed the relevance of the concept of agency for causal cognition and noted that, across cultures, people may differ in the extent to which they are willing to speculate on other people's mental states as causes of their behavior. This is closely linked to research on causal attributions, which has identified a range of attribution biases, most of which lead people to explain other people's (negative) behavior in terms of their dispositions, while underestimating the influence of circumstances (for a meta-analysis, see Malle, 2006).

None of these biases, however, is immune to cultural influence. The first swath of research on how culture may affect attribution tendencies was inspired by the allegedly deep divide between two cultural clusters: "Western" cultures (primarily US) were found to be comparatively more individualistic and analytical, to hold values that emphasize an independent self-concept, the importance of personal accomplishments for one's identity, and a focus on rights over duties; "Eastern" (or East Asian) cultures, in contrast, were found to be comparatively more collectivistic, to regard the self as interdependent and as part of larger social groups that bind and mutually obligate the person, to value duties over rights, and to be strongly concerned with social harmony (Markus & Kitayama, 1991; Miller, 1984; Norenzayan & Nisbett, 2000; Triandis, 1995). Although subject to debate (e.g., Fiske, 2002; Takano & Osaka, 1999), these observations are largely accepted as supporting an impact of culture on attribution styles (Oyserman, Coon, & Kemmelmeier, 2002), and even on basic perceptual processes (Masuda & Nisbett, 2001; Oyserman & Lee, 2007). Recall that when Morris and Peng (1994), for instance, presented animated displays of interactions between a single fish and a group of fish, their US participants were more likely than the Chinese participants to explain the single fish's behavior by internal causes. The same pattern was found in (US vs. Chinese) newspaper reports on crimes (Morris & Peng, 1994, Study 2) and in participants' accounts of murder cases (Study 3).

As one candidate for the mechanism by which these differences in attribution styles are driven, Morris and Peng (1994) proposed "implicit theories" (or "systems of thought" in Nisbett et al., 2001)—a cognitive framework that guides the encoding and representation of behavioral information, and that has impacts on perceptions, evaluations, and judgments involving causality. As implicit theories revolve around the relationship between the individual and the group, their influence should be constrained to the social-psychological domain (and perhaps extended to interactions in the biological domain that can be anthropomorphized), but should not generalize across domains. On this account, cultural differences in causal attribution should be found for the social domain, but not the physical domain (Morris & Peng, 1994; Morris, Nisbett, & Peng, 1995). Only if the trajectories of physical entities deviate from physical laws should impressions of animacy and thus social interpretations be invited (Heider & Simmel, 1944).

Peng and Knowles (2003) found cultural differences even in the physical domain, but still explained them along similar lines: whereas the Aristotelian folk physics prevalent in the West focuses on objects and their dispositions, such as a stone's propensity to sink in water, Chinese folk physics is based on an inherently relational, contextual, and dialectical conceptualization, accommodating the idea that forces such as gravity may act over distance and may be exerted by a medium such as water (Needham, 1954). That the distinct influences of this folk understanding are attenuated by formal physics education (Morris & Peng, 1994; Peng & Knowles, 2003) is an interesting observation in itself, as it has been suggested that so-called naïve physics is not easily changed by formal education (much to the chagrin of physics teachers).

For studies on causal attribution, which have been primarily concerned with the relative proportions of internal and external attributions, it may have been sensible to operationalize culture in terms of the relationship between individual and group (Peng, Ames, & Knowles, 2001). The assumption that this single, even if central, social-psychological dimension should be sufficient to account for the full range of cultural differences in causal cognition (or even for the intracultural variation), however, is neither plausible nor tenable. For example, consider the set of studies conducted on the floating scenario (described in the section "Causal Attributions (p. 729) for Launching and Floating"). The samples from Germany and Tonga constitute almost perfect complements, with Tongan participants exhibiting a more interdependent self-concept than German participants (Bender et al., 2012), and even more so than the Chinese ones (Beller, Bender, & Song, 2009a). Aggregated across scenarios, these cultural differences in social orientation indeed correlated with differences in causal attributions (Beller, Bender, & Song, 2009b), as would be predicted by attribution theorists: the (individual) floater was rated causally most relevant in Germany and least in Tonga.
However, a closer look at the experimental conditions indicated that the factors content and linguistic cues also affected participants' assessments, and did so distinctly for each cultural group. More concretely, when asked about oil (rather than wood) floating on water, judgments not only shifted, they reversed in the Chinese and the Tongan samples, and in diverging directions: the Chinese participants now focused more strongly on the floater (oil) than the carrier (water), and the Tongan participants more strongly on the carrier (water) than the floater (oil). Such effects of content emerge because people habitually access causal background knowledge that goes beyond the information explicitly given (Beller & Kuhnmünch, 2007; Waldmann, Hagmayer, & Blaisdell, 2006); this knowledge likely includes culture-specific concepts, which apparently also modulate the reasoning process, even in rather simple physical scenarios (Beller, Bender, & Song, 2009a).

Causal Asymmetry Bias

A related, yet distinct pattern to be found in this field is the causal asymmetry bias (White, 2006), whereby people see the forces exerted by bodies on each other as unequal, even when they are not. That is, even in strictly symmetric interactions (in physical terms defined by action = reaction), causal roles are assigned such that one object is identified as "the cause" and another as "the effect," and the force exerted by the cause object is perceived as being greater than the force exerted by the effect object (if the latter is not neglected altogether). For this reason, a scene in which one object hits another object, which then begins to move while the first one stops, is typically described as "the first object launched the second," rather than "the second object stopped the first." White (2006) considers the asymmetry bias a general feature of causal cognition that affects most of what people perceive, believe, and linguistically express with regard to causal relations, and that may even restrict the research questions and methods of researchers.

Initially, this bias was primarily demonstrated in experiments with English-speaking participants on a variety of (dynamic) collision events (White, 2007). In a study that extended the range to static settings and to participants from two very different cultural backgrounds (i.e., German and Tongan), similar patterns were observed, yet with important culturally variable nuances (Bender & Beller, 2011). Causal asymmetries varied across tasks and cultures: in four tasks, asymmetry patterns were the same in that both groups prioritized the same entity as causally relevant; in one task (i.e., the floating scenario), they were opposed in that Tongans prioritized the carrier and Germans the floater; and in the remaining four tasks, asymmetry patterns were detected in one cultural group only. The strength and direction of the asymmetry also varied across types of physical relations (like buoyancy or mutual attraction), the entities involved (such as freshwater, wood, or cornflakes as floaters), and whether the event was described in an abstract or concrete manner (as with "celestial bodies" vs. "earth" and "moon").

Interestingly, whether the cultural differences overall reflect differences in social orientation (or "implicit theories," in Morris and Peng's terminology) depended on the level of aggregation. Aggregated across individual ratings, the asymmetry bias was less pronounced among the Tongan participants, with average ratings close to the midpoint of the rating scale, in line with the prevailingly collectivistic values (and in contrast to the German pattern, which was skewed toward the floater). On the individual level, however, Tongan participants gave substantially more asymmetrical ratings than the German participants, with the vast majority giving strong ratings either for the floater or the carrier (Bender & Beller, 2011)—a finding that cannot be accounted for by any of the implicit theories or ensuing attribution biases presented earlier.
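The role of the aggregation level is easy to see in a toy example (our own illustration; the ratings are hypothetical, on a scale from -3, "the carrier is the cause," to +3, "the floater is the cause"):

# Hypothetical asymmetry ratings. The group mean captures a shared bias
# toward one entity; the mean absolute rating captures how asymmetrically
# each individual responds, regardless of direction.
group_a = [+2, +2, +1, +2, +1, +2]   # consistently floater-skewed
group_b = [+3, -3, +3, -3, -3, +3]   # strongly asymmetric, in both directions

for name, ratings in (("A", group_a), ("B", group_b)):
    mean = sum(ratings) / len(ratings)
    mean_abs = sum(abs(r) for r in ratings) / len(ratings)
    print(f"group {name}: group-level bias = {mean:+.1f}, "
          f"individual-level asymmetry = {mean_abs:.1f}")
# group A: bias +1.7, asymmetry 1.7 -> asymmetric at either level
# group B: bias  0.0, asymmetry 3.0 -> no bias on average, yet every single
#                                      rating is maximally asymmetric

Group B's pattern is the one reported for the Tongan sample: average ratings near the midpoint conceal strongly asymmetrical individual judgments.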

Linguistic Factors

That linguistic frames and cues can influence cognitive processes, and specifically memory, has long been known (e.g., Loftus & Palmer, 1974). Language is also among the most potent tools for focusing attention on possible causes (for overviews, see in this volume, Solstad & Bott, Chapter 31; Wolff & Thorstad, Chapter 9). Introducing one of several equally relevant factors as "given"—as in "given enough sunlight, flowers (p. 730) grow well when a fertilizer is added"—relegates this factor (here: the sun) to an enabling condition and increases the likelihood of other factors (such as the fertilizer) being considered the prime causative (Kuhnmünch & Beller, 2005). The choice of a particular verb may also affect people's assessment, as it implicitly suggests a specific causal relation; "betray," for instance, spotlights the agent as likely cause, whereas "praise" shifts the focus to the patient (overview in Pickering & Majid, 2007). The verb chosen may even reframe the entire scene: while both sentences "A launches B" and "B stops A" describe the same event, they assign agency in fundamentally different ways (e.g., Mayrhofer & Waldmann, 2014).

Because languages differ systematically in how they frame causal events and express or mark agency (e.g., Bohnemeyer et al., 2010; Ikegami, 1991; Kanero, Hirsh-Pasek, & Golinkoff, 2016; Wolff et al., 2005, 2009), they are powerful candidates for mediating cultural effects on causal cognition (please note that we treat language here as one aspect of culture, while the relationship between the two is in fact much more complex). In the following, we focus on how the thematic roles of agent and patient are assigned to either entity in question, as this is not only a feature that can be individually manipulated, but also one for which cross-linguistic differences at both the structural and the conventional level have been documented.

At the structural level, languages differ with regard to how they categorize agents and patients. Consider the intransitive sentence "He walked" versus the transitive sentence "She guided them." Nominative-accusative languages treat subjects or agents in these two sentences alike, namely with the nominative case ("he"/"she"), while setting them apart from the objects or patients of transitive sentences (in the accusative: "them"). Ergative-absolutive languages, on the other hand, treat agents of intransitive and patients of transitive sentences alike (with the absolutive case), while agents of transitive sentences are set apart by the ergative case. These languages thus distinguish grammatically between actions of an agent that have effects only for him- or herself and those that also have consequences for others (Duranti, 1994). According to Goldin-Meadow (2003), the patient focus inherent in the ergative pattern may be the "natural" way of viewing an action. Deaf children of English-speaking parents, for instance, direct more attention to the patient in a sentence like "The mouse eats the cheese" (Goldin-Meadow, 2003). If this holds more generally, then introducing a transitive agent ("by the mouse") into a phrase like "the cheese is eaten," and marking it with the ergative case, should provide a particularly potent tool for attributing agency in an event. This is exactly what has been observed in social interactions in Samoa, where the ergative in the sociopolitical domain is used almost exclusively by people of higher rank, and predominantly for praising God or the polity for positive things (such as social order) and for blaming single persons, typically of lower rank, for violations of rules (Duranti, 1994).

Whether this tool may also affect causal attributions in other domains was investigated by Beller, Bender, and Song (2009b). One of the variations in their design involved a shift in the linguistic cue: describing wood either as floating on water or as being carried by water. In Tongan this implies a shift from absolutive to ergative marking, and this shift inverted the Tongan ratings from a preference for the floater to one for the carrier. The effect appears to be weak, however, and to depend on specific phrasings (Bender & Beller, 2011, submitted).

Another, optional way of dealing with thematic roles is to drop the agent when turning transitive or agentive sentences (as in "Tina broke the glass") into intransitive or non-agentive sentences ("The glass broke"). Fausey and colleagues tested whether speakers of different languages have distinct preferences for using either type of sentence when describing events, and how this might affect eyewitness memory. They presented videos of an intentional or accidental version of various events. Participants were asked to describe each event and to identify the person involved in it. Speakers of English, Spanish, and Japanese alike described intentional events agentively, but the Spanish and Japanese speakers were less likely to use agentive descriptions for the accidental events. The same pattern emerged in the memory task: the groups remembered the agents of intentional events equally well, but the Spanish and Japanese speakers were less likely to remember the agents of accidental events (Fausey & Boroditsky, 2011; Fausey et al., 2010). And finally, after having listened to a series of non-agentive phrasings with the goal of identifying the corresponding picture, even English speakers became less likely to remember the agents of accidental events in a subsequent task than participants who had listened to agentive phrasings (Fausey et al., 2010, Study 3). In other words, priming a non-agentive speech pattern effectively shifted English speakers' (p. 731) attention away from the agents of accidental events. Habitual patterns of linguistic framing thus appear to affect whether people pay attention to, encode, and remember different aspects of the same event.

Summary

This section has demonstrated that, across domains, the way in which people reason causally about certain events may vary depending on their cultural background. We have also addressed the question of whether cognitive processes themselves may vary depending on cultural background, thus mediating such cultural influences. It appears that cultural background affects the readiness to attribute behavior and events to internal versus external causes, the tendency to single out one of several possible factors, and the amount of attention paid to the agent. Most accounts revolve around shifts in the focus of attention and/or the activation of additional information, and alternatively identify implicit theories, habituation processes, and/or linguistic factors as the driving forces behind them.

To address the exact manner in which culture exerts this influence, we will need more sophisticated conceptions of culture that go beyond values on a single dimension of cultural differences. One aspect of a more nuanced conception of culture is the question of whether a separation of content and processing is sensible to begin with. We turn to both issues in the next sections.

The Cultural Context of Causal Cognition

Causal inferences often depend on the causal field; that is, they depend on what is seen as background and hence of little causal relevance (such as birth being a condition for death and thus part of its causal field, but still not its cause), in contrast to factors that make a difference (Einhorn & Hogarth, 1986). Given that so much of causal inference appears to be sensitive to circumstances—the question of when and how domain boundaries are relevant, for instance, or the concepts activated during processing—it seems only natural to also focus more explicitly on the context in which causal cognition occurs as one of its three major dimensions, besides content and processing.

Specific attention has been paid to situations in which religious (or "supernatural") versus "natural" explanations for events are activated (e.g., Astuti & Harris, 2008; Legare et al., 2012; Malinowski, 1948; Tucker et al., 2015; Walker, 1992). A very prominent example is Evans-Pritchard's (1937) chapter on the collapsing granary and how it is explained by the Azande (see the section "Blended Concepts and Representations: Science Versus Religion?"). Whether this event is attributed to an infestation with termites or to witchcraft depends on the question at stake: if the issue is the proximal cause of why the granary collapsed, then its destabilization by termites is the appropriate level to focus on. But there is also the question of the distal cause: Why did the granary collapse exactly at that point in time when a particular person was sitting underneath, resulting in his injury or death? By Azande sensibilities, such an apparent coincidence of unfortunate circumstances is much more likely to be accounted for by the ill intentions of a powerful person than by chance. The main difference between the European and the Azande approach, as identified by Evans-Pritchard (1937), thus arose not so much from how people internally construe causality, but from how far (proximal vs. distal) causal explanations are expanded and from which entities are included as causal agents (Widlok, 2014).
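The Azande "two spears" logic can be rendered in a minimal structural model (our own formalization of Evans-Pritchard's example, not an analysis taken from the chapter): both factors pass a but-for test for the harm, yet they answer different questions.

# A minimal structural rendering of the granary example. Both factors are
# but-for causes of the harm, but they answer different questions:
# "why did it collapse?" (proximal) vs. "why then, on that person?" (distal).

def harm(termites, person_underneath):
    collapses = termites                      # proximal mechanism
    return collapses and person_underneath    # harm also needs the coincidence

actual = dict(termites=True, person_underneath=True)
print(harm(**actual))  # True: the harm occurred

for factor in actual:
    counterfactual = dict(actual, **{factor: False})
    print(f"but-for {factor}: harm = {harm(**counterfactual)}")
# but-for termites: harm = False           (no collapse at all)
# but-for person_underneath: harm = False  (collapse, but no one hurt)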

Similarly, Vezo participants from rural Madagascar have been shown to be more likely to describe death as terminating all bodily and mental processes when the context of the question is related to the corpse than when it is related to the ancestral practices associated with the afterlife (Astuti & Harris, 2008; for similar studies in Nigeria and Vanuatu, see also Walker, 1992; Watson-Jones et al., 2015). More recently, Astuti and Bloch (2015) also investigated the conditions under which Malagasy people take intentionality into account when assessing acts of wrongdoing, and again found an important effect of context (for psychological treatments of this topic, see Lagnado & Gerstenberg, Chapter 29 in this volume). A sharp distinction between intentional wrongdoing and wrongdoing through negligence or by accident is drawn for mundane events, less so for breaches of ancestral taboos, and almost never for instances of incest. While intentionality is recognized as an important factor in cases of wrongdoing, its consideration depends on the importance of the issues at stake—in the case of incest, the catastrophic consequences for the whole community (Astuti & Bloch, 2015).

In a nutshell, depending on culture and context, people appear to differ in their willingness to make causal attributions and ascribe responsibility, being especially less inclined to do so when cultural values emphasize the secrecy or privacy of people's mental life (Danziger & Rumsey, 2013; Robbins & Rumsey, 2008). They also differ in the extent to which they actively search for information for (p. 732) causal attributions and ascriptions of responsibility, either because they rely on what they already know, or because they take the potential causal factor as not enduring and hence irrelevant for anticipating future events (Beer & Bender, 2015). People differ in the number of possible causes they take into consideration, in the extent to which they trace causal links in the system (e.g., in seeking distal as compared to more proximal causes and consequences), in their sensitivity to covariation, and in their likelihood of anticipating changes based on previous trends (Choi et al., 2003; Maddux & Yuki, 2006). And even within cultural groups, the concern with these topics, the usage of the respective vocabulary, and the conceptual ingredients can change over (historic) time (Iliev & ojalehto, 2015).

Tasks for Future Research

People across the world search for causal explanations, but the degree of cultural variation in how they do so is substantial. Even the very assumption that people engage in causal considerations on a regular basis is debatable; in various cultural groups, it seems to be common practice not to reason about the motives of people's behavior, but to focus on its consequences instead. This observation challenges laboratory studies of causal reasoning in two ways. One is that they may be relying on participant samples that are atypical in their focus on intentionality or motive; the other is that studies that ask directly about intentionality bypass the issue of when intentionality is seen as relevant to causal cognition.

Moreover, the concepts upon which causal explanations are based may be drawn from a heterogeneous pool of competing concepts. The sources for these concepts range from (popular) scientific knowledge and folk wisdom to "supernatural" or religious beliefs. Both this pool of concepts and the principles for selecting among them are affected by culture and context. More important, the distinction between such categories is likely culturally construed, as may be the domain boundaries on which these categories are based. And finally, even the mechanisms underlying causal conclusions may be affected by the cultural context in which people grow up and by characteristics of the language they speak. This constitutes an additional set of challenges for laboratory studies of causal reasoning and the theories growing out of them.

Given these substantial influences of cultural background on causal cognition at a range of levels, the field of cognitive science needs to change its research practices, taking sample diversity much more seriously than it has in the past. Constraining research to participants from Western, educated, industrialized, rich, and democratic countries (Henrich, Heine, & Norenzayan, 2010) will not yield a comprehensive understanding of human causal cognition. Even the assumption that WEIRD people are all alike, or that the same methods and body of work used to describe cross-cultural differences may be useful in illuminating intra-cultural differences, is misleading (Romney, Weller, & Batchelder, 1986).

We also need to refocus our perspective on core research questions that have remained unaddressed and therefore unanswered. To begin with the obvious, we need to ask about the culture-specific aspects of content, processing, and context involved in causal cognition, and we need to go beyond simply documenting cultural differences to strive for explanations of how people's cultural background constitutes what we observe:

• What do people learn about general characteristics of causality as part of their cultural heritage, and which concrete concepts and models do they acquire that may shape their perception and reflections?
• How does culture affect the constitution of (heterogeneous) concept pools and the selection between competing concepts, and how are alternative orientations acquired and coordinated in multicultural contexts?
• How do shifting domain boundaries and extensive cross-domain interactions come to bear on causal cognition across cultures?
• Do cultural and linguistic influences already affect causal perception and learning, or are they restricted to subsequent processes of explanation, reasoning, and problem-solving?
• Which factors contribute to the emphasis that people may place on the understanding of causes and reasons (relative to an appraisal of effects and consequences)?
• When, why, and how do people search for additional information? To what extent are the attribution tendencies and reasoning biases observed in previous studies a product of our cognitive toolkit, and to what extent do they arise from learning and habituation?
• And to what extent may cultural background and context modulate agency construal?

Culture is not simply a dichotomous variable that may explain why two groups of people differ in (p. 733) some respect. It is a constitutive part of our cognitive, social, and material world—as essential, and often as invisible to us, as the oxygen we are breathing. This is why all causal cognition is ultimately cultural; the only possible justification for relegating its treatment to a single chapter of this volume is that we still know relatively little about it.


If culture is a constitutive factor in causal cognition, then research strategies based on the idea that culture can be seen as permitting modest variation on a universal causal reasoning module are doomed to fail. There may indeed be aspects of causal cognition that are universal—to humans at least (and perhaps to our primate relatives and/or other social species such as dolphins or crows)—but these aspects can only be appreciated as universal if we have considered in our research designs the possibility that they are not (Bender & Beller, 2016).

An appreciation of the foundational role that culture plays in human cognition will help us to identify our cultural blind spots. Some of these have prevented attribution theorists from discovering that agency may be assigned to entities other than just individuals. These blind spots are largely generated by the "home-field disadvantages," the most relevant of which is the tendency to leave one's own culture unmarked, taking it as the standard from which others deviate (Medin, Bennis, & Chandler, 2010). To overcome the home-field disadvantages, we need not only to question our cultural presumptions, but also to consider the potential for within-group variation, in contrast to between-group variation.

A second type of blind spot is that generated by the causal asymmetry bias. The asymmetric assignment of causal roles leads us (as people) to perceive, understand, and describe interactions not as symmetric relations, but as "relations between doer and done-to" (White, 2006, p. 143). If one entity is regarded as causative, its importance will be overestimated at the expense of the other. Yet focusing on single factors as critical may impair problem-solving for systems with any complexity—for instance, with regard to technical malfunctioning (for severe cases, see Dörner, 1996), ecosystem management, or social conflicts. This asymmetry also biases the methods with which we (as scientists) investigate causal understanding, the causal inferences we draw from our findings, and, even more important, the questions we ask in the first place (White, 2006, p. 144; and see Bender & Beller, 2011).

A third type of blind spot has to do with our scientific traditions, such as the distinction between content and processing, reflected in a division of labor between anthropology, with its search for cultural variability, on the one hand, and psychology, with its focus on universal principles and mechanisms, on the other (D'Andrade, 1981). The very ways in which our subfields of science are organized may reflect the cultural history of Western science, and different histories might well lead to different organizational schemes (e.g., Cajete, 1999). This partitioning of content and process between anthropology and psychology has not only led to a fragmentation of research approaches and findings, but has also obstructed a perspective on content and processes as intricately linked and as affecting each other. While recent years have seen this distinction rejected as neither reasonable nor tenable (Bang, Medin, & Atran, 2007; Barrett, Stich, & Laurence, 2012; Kitayama & Uskul, 2011; Medin & Atran, 2004), it takes much longer to overcome the scientific boundaries (Bender, Beller, & Nersessian, 2015; Bender, Hutchins, & Medin, 2010; Bloch, 2012; and see the debate in Bender, Beller, & Medin, 2012). For real scientific progress in this field, however, this is a challenge we need to rise to.



Conclusions

Causality is a core concept of human cognition, perhaps even the core concept. As we have seen, culture plays a critical role in causal cognition on various levels and in all domains, albeit with graded intensity. Importantly, culture affects not only how, but even whether people engage in causal explanations, by defining the settings in which causal cognition occurs, the manner in which potential factors are pondered, and the choices for highlighting one of several potential causes or for expressing them linguistically.

Our cultural background affects which situations we perceive as problems, how we communicate with respect to them, and how we try to solve them. In an increasingly globalized world, and with a diversity of stakeholders involved, cultural differences in how we perceive, communicate, and handle causal relations may have critically important implications. Given that questions about causes and reasons loom large in our lives—and are answered in such diverse ways across cultures—a systematic and thorough investigation into the cultural dimension of causal cognition is long overdue.

References

Ahn, W.-k., Kalish, C. W., Gelman, S. A., Medin, D. L., Luhmann, C., Atran, S., Coley, J. D., & Shafto, P. (2001). Why essences are essential in the psychology of concepts. Cognition, 82, 59–69.
(p. 734)
Alicke, M. D., Mandel, D. R., Hilton, D. J., Gerstenberg, T., & Lagnado, D. A. (2015). Causal conceptions in social explanation and moral evaluation: A historical tour. Perspectives on Psychological Science, 10, 790–812.
Astuti, R., & Bloch, M. (2015). The causal cognition of wrong doing: Incest, intentionality and morality. Frontiers in Psychology, 6(136), 1–7.
Astuti, R., & Harris, P. L. (2008). Understanding mortality and the life of the ancestors in rural Madagascar. Cognitive Science, 32, 713–740.
Astuti, R., Solomon, G. E. A., & Carey, S. (2004). Constraints on conceptual development: A case study of the acquisition of folkbiological and folksociological knowledge in Madagascar. Boston; Oxford: Blackwell.
Atran, S. (1989). Basic conceptual domains. Mind & Language, 4(1–2), 7–16.
Atran, S. (1993). Cognitive foundations of natural history: Towards an anthropology of science. Cambridge: Cambridge University Press.
Atran, S., & Medin, D. L. (2008). The native mind and the cultural construction of nature. Cambridge, MA: MIT Press.
Atran, S., Medin, D. L., Lynch, E., Vapnarsky, V., Ucan Ek', E., & Sousa, P. (2001). Folkbiology doesn't come from folkpsychology: Evidence from Yukatek Maya in cross-cultural perspective. Journal of Cognition and Culture, 1, 3–42.
Au, T. K.-f., & Romo, L. F. (1999). Mechanical causality in children's "folkbiology." In D. L. Medin & S. Atran (Eds.), Folkbiology (pp. 355–401). Cambridge, MA: MIT Press.
Bailenson, J. N., Shum, M. S., Atran, S., Medin, D. L., & Coley, J. D. (2002). A bird's eye view: Biological categorization and reasoning within and across cultures. Cognition, 84, 1–53.
Bang, M., Medin, D. L., & Atran, S. (2007). Cultural mosaics and mental models of nature. Proceedings of the National Academy of Sciences, 104, 13868–13874.
Barrett, H. C., & Behne, T. (2005). Children's understanding of death as the cessation of agency. Cognition, 96, 93–108.
Barrett, H. C., Stich, S., & Laurence, S. (2012). Should the study of Homo sapiens be part of cognitive science? Topics in Cognitive Science, 4, 379–386.
Bastian, B., & Haslam, N. (2006). Psychological essentialism and stereotype endorsement. Journal of Experimental Social Psychology, 42, 228–235.
Beer, B., & Bender, A. (2015). Causal inferences about others' behavior among the Wampar, Papua New Guinea—and why they are hard to elicit. Frontiers in Psychology, 6(128), 1–14.
Beller, S., Bender, A., & Song, J. (2009a). Conditional promises and threats in Germany, China, and Tonga: Cognition and emotion. Journal of Cognition and Culture, 9, 115–139.
Beller, S., Bender, A., & Song, J. (2009b). Weighing up physical causes: Effects of culture, linguistic cues, and content. Journal of Cognition and Culture, 9, 347–365.
Beller, S., & Kuhnmünch, G. (2007). What causal conditional reasoning tells us about people's understanding of causality. Thinking and Reasoning, 13, 426–460.
Bender, A., & Beller, S. (2011). Causal asymmetry across cultures: Assigning causal roles in symmetric physical settings. Frontiers in Psychology, 2(231), 1–10.
Bender, A., & Beller, S. (2016). Probing the cultural constitution of causal cognition: A research program. Frontiers in Psychology, 7, 245.
Bender, A., & Beller, S. (submitted). Agents and patients in physical settings: Is the assignment of causal roles affected by linguistic cues?
Bender, A., Beller, S., & Medin, D. L. (Eds.) (2012). Does cognitive science need anthropology? Topics in Cognitive Science, 4(3), 342–466.
Bender, A., Beller, S., & Nersessian, N. J. (2015). Diversity as asset. Topics in Cognitive Science, 7, 677–688.
Bender, A., Hutchins, E., & Medin, D. L. (2010). Anthropology in cognitive science. Topics in Cognitive Science, 2, 374–385.
Bender, A., Spada, H., Rothe-Wulf, A., Traber, S., & Rauss, K. (2012). Anger elicitation in Tonga and Germany: The impact of culture on cognitive determinants of emotions. Frontiers in Psychology, 3(435), 1–20.
Bennardo, G. (2014). The fundamental role of causal models in cultural models of nature. Frontiers in Psychology, 5, 1140.
Bering, J. M. (2006). The folk psychology of souls. Behavioral and Brain Sciences, 29, 453–498.
Bering, J. M., Hernández-Blasi, C., & Bjorklund, D. F. (2005). The development of "afterlife" beliefs in secularly and religiously schooled children. British Journal of Developmental Psychology, 23, 587–607.
Berlin, B. (1972). Speculations on the growth of ethnobotanical nomenclature. Language in Society, 1, 51–86.
Berlin, B. (1992). Ethnobiological classification: Principles of categorization of plants and animals in traditional societies. Princeton, NJ: Princeton University Press.
Berlin, B., Breedlove, D. E., & Raven, P. H. (1973). General principles of classification and nomenclature in folk biology. American Anthropologist, 75, 214–242.
Bird-David, N. (1990). The giving environment: Another perspective on the economic system of gatherer-hunters. Current Anthropology, 31(2), 189–196.
Bird-David, N. (2008). Relational epistemology, immediacy, and conservation: Or, what do the Nayaka try to conserve? Journal for the Study of Religion, Nature & Culture, 2(1), 55–73.
Bloch, M. (2012). Anthropology and the cognitive challenge. Cambridge: Cambridge University Press.
Bödeker, K. (2006). Die Entwicklung intuitiven physikalischen Denkens im Kulturvergleich: Bewegung, Kraft, Leben, Gewicht [The development of intuitive physical thinking in cultural comparison: motion, force, life, and weight]. Münster: Waxmann.
Bohnemeyer, J., Enfield, N. J., Essegbey, J., & Kita, S. (2010). The macro-event property: The segmentation of causal chains. In J. Bohnemeyer & E. Pederson (Eds.), Event representation in language (pp. 43–67). Cambridge: Cambridge University Press.
Boster, J. S., & Johnson, J. C. (1989). Form or function: A comparison of expert and novice judgments of similarity among fish. American Anthropologist, 91, 866–889.
Cajete, G. A. (1999). Igniting the sparkle: An indigenous science education model. Skyland, NC: Kivaki Press.
Callaghan, T., Rochat, P., Lillard, A., Claux, M. L., Odden, H., Itakura, S., Tapanya, S., & Singh, S. (2005). Synchrony in the onset of mental-state reasoning: Evidence from five cultures. Psychological Science, 16, 378–384.
Carey, S. (1985). Conceptual change in childhood. Cambridge, MA: Bradford.
Carey, S. (2009). The origin of concepts. Oxford: Oxford University Press.
Casimir, M. J. (Ed.) (2008). Culture and the changing environment: Uncertainty, cognition, and risk management in cross-cultural perspective. Oxford: Berghahn.
(p. 735)
Choi, I., Dalal, R., Kim-Prieto, C., & Park, H. (2003). Culture and judgement of causal relevance. Journal of Personality and Social Psychology, 84, 46–59.
Coley, J. D., Medin, D. L., & Atran, S. (1997). Does rank have its privilege? Inductive inferences within folkbiological taxonomies. Cognition, 64, 73–112.
D'Andrade, R. G. (1981). The cultural part of cognition. Cognitive Science, 5, 179–195.
Danziger, E. (2006). The thought that counts: Interactional consequences of variation in cultural theories of meaning. In N. J. Enfield & S. C. Levinson (Eds.), Roots of human sociality: Culture, cognition and interaction (pp. 259–278). Oxford: Berg Press.
Danziger, E., & Rumsey, A. (2013). Introduction: From opacity to intersubjectivity across languages and cultures. Language and Communication, 33(3), 247–250.
Dijksterhuis, A., Preston, J., Wegner, D. M., & Aarts, H. (2008). Effects of subliminal priming of self and God on self attribution of authorship for events. Journal of Experimental Social Psychology, 44, 2–9.
Dörner, D. (1989/1996). The logic of failure: Recognizing and avoiding error in complex situations. New York: Basic Books. [Orig. 1989: Die Logik des Mißlingens. Reinbek: Rowohlt].
Duranti, A. (1994). From grammar to politics: Linguistic anthropology in a Western Samoan village. Berkeley: University of California Press.
Einhorn, H. J., & Hogarth, R. M. (1986). Judging probable cause. Psychological Bulletin, 99, 3–19.
Ellen, R. (Ed.) (2006). Ethnobiology and the science of humankind [Special issue of the Journal of the Royal Anthropological Institute]. Oxford: Blackwell.
Evans-Pritchard, E. E. (1937). Witchcraft, oracles and magic among the Azande. Oxford: Clarendon.
Fausey, C. M., & Boroditsky, L. (2011). Who dunnit? Cross-linguistic differences in eyewitness memory. Psychonomic Bulletin & Review, 18, 150–157.
Fausey, C., Long, B., Inamori, A., & Boroditsky, L. (2010). Constructing agency: The role of language. Frontiers in Psychology, 1, 162.
Firth, R. (1970). An analysis of mana: An empirical approach. In T. Harding & B. Wallace (Eds.), Cultures of the Pacific (pp. 316–333). New York: Free Press.
Fiske, A. P. (2002). Using individualism and collectivism to compare cultures—A critique of the validity and measurement of the constructs: Comment on Oyserman et al. (2002). Psychological Bulletin, 128, 78–88.
Froerer, P. (2007). Wrongdoing and retribution: Children's conceptions of illness causality in a central Indian village. Anthropology & Medicine, 14, 321–333.
Funke, J. (2014). Analysis of minimal complex systems and complex problem solving require different forms of causal cognition. Frontiers in Psychology, 5, 739.
Gelman, S. A. (2003). The essential child: Origins of essentialism in everyday thought. Oxford: Oxford University Press.
Gelman, S. A., Chesnick, R. J., & Waxman, S. R. (2005). Mother–child conversations about pictures and objects: Referring to categories and individuals. Child Development, 76, 1129–1143.
Gelman, S. A., & Legare, C. H. (2011). Concepts and folk theories. Annual Review of Anthropology, 40, 379–398.
Gerber, E. R. (1985). Rage and obligation: Samoan emotion in conflict. In G. M. White & J. Kirkpatrick (Eds.), Person, self and experience: Exploring Pacific ethnopsychologies (pp. 121–167). Berkeley: University of California Press.
Gil-White, F. (2001). Are ethnic groups biological 'species' to the human brain? Essentialism in our cognition of some social categories. Current Anthropology, 42, 515–554.
Goldberg, R. F., & Thompson-Schill, S. L. (2009). Developmental "roots" in mature biological knowledge. Psychological Science, 20, 480–487.
Goldin-Meadow, S. (2003). Thought before language: Do we think ergative? In D. Gentner & S. Goldin-Meadow (Eds.), Language in mind (pp. 493–522). Cambridge, MA: MIT Press.
Goswami, U. (2002). Blackwell handbook of childhood cognitive development. Oxford: Blackwell.
Güss, D. C., & Dörner, D. (2011). Cultural differences in dynamic decision-making strategies in a non-linear, time-delayed task. Cognitive Systems Research, 12, 365–376.
Güss, C. D., & Robinson, B. (2014). Predicted causality in decision making: The role of culture. Frontiers in Psychology, 5, 479.
Haggard, P., & Tsakiris, M. (2009). The experience of agency: Feelings, judgments, and responsibility. Current Directions in Psychological Science, 18, 242–246.
Hagmayer, Y., & Engelmann, N. (2014). Causal beliefs about depression in different cultural groups: What do cognitive psychological theories of causal learning and reasoning predict? Frontiers in Psychology, 5, 1303.
Hardin, G. (1968). The tragedy of the commons. Science, 162, 1243–1248.
Harvey, G. (2005). Animism: Respecting the living world. Kent Town, Australia: Wakefield Press.
Hatano, G., & Inagaki, K. (1999). A developmental perspective on informal biology. In D. L. Medin & S. Atran (Eds.), Folkbiology (pp. 321–354). Cambridge, MA: MIT Press.
Heider, F., & Simmel, M. (1944). An experimental study of apparent behavior. American Journal of Psychology, 57, 243–259.
Heit, E., & Rubinstein, J. (1994). Similarity and property effects in inductive reasoning. Journal of Experimental Psychology: Learning, Memory and Cognition, 20, 411–422.
Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33, 61–135.
Hirschfeld, L. A. (2013). The myth of mentalizing and the primacy of folk sociology. In M. R. Banaji & S. A. Gelman (Eds.), Navigating the social world: What infants, children, and other species can teach us (pp. 101–106). Oxford: Oxford University Press.
Hirschfeld, L. A., & Gelman, S. A. (Eds.) (1994). Mapping the mind: Domain-specificity in cognition and culture. Cambridge: Cambridge University Press.
Hmelo-Silver, C. E., & Azevedo, R. (2006). Understanding complex systems: Some core challenges. Journal of the Learning Sciences, 15, 53–61.
Hollan, D., & Throop, C. J. (2008). Whatever happened to empathy? Introduction. Ethos, 36, 385–401.
Ikegami, Y. (1991). "Do-language" and "become-language": Two contrasting types of linguistic representation. In Y. Ikegami (Ed.), The empire of signs (pp. 285–326). Amsterdam; Philadelphia: John Benjamins.
Iliev, R., & ojalehto, b. (2015). Bringing history back to culture: On the missing diachronic component in the research on culture and cognition. Frontiers in Psychology, 6, 716.
Inagaki, K., & Hatano, G. (2002). Young children's naïve thinking about the biological world. New York; Brighton: Psychology Press.
(p. 736)
Inagaki, K., & Hatano, G. (2004). Vitalistic causality in young children's naive biology. Trends in Cognitive Sciences, 8, 356–362.
Inagaki, K., & Hatano, G. (2006). Young children's conception of the biological world. Current Directions in Psychological Science, 15, 177–181.
Kanero, J., Hirsh-Pasek, K., & Golinkoff, R. M. (2016). Can a microwave heat up coffee? How English- and Japanese-speaking children choose subjects in lexical causative sentences. Journal of Child Language, 43(5), 993–1019.
Keil, F. C. (1994). The birth and nurturance of concepts by domains: The origins of concepts of living things. In L. A. Hirschfeld & S. A. Gelman (Eds.), Mapping the mind: Domain-specificity in cognition and culture (pp. 234–254). Cambridge: Cambridge University Press.
Keil, F. C. (2003). Folkscience: Coarse interpretations of a complex reality. Trends in Cognitive Sciences, 7, 368–373.
Keil, F. C. (2007). Biology and beyond: Domain specificity in a broader developmental context. Human Development, 50, 31–38.
Kempton, W. M. (1986). Two theories of home heat control. Cognitive Science, 10, 75–90.
Kempton, W. M., Boster, J. S., & Hartley, J. A. (1995). Environmental values in American culture. Cambridge, MA: MIT Press.
Kitayama, S., & Uskul, A. (2011). Culture, mind, and the brain: Current evidence and future directions. Annual Review of Psychology, 62, 419–449.
Knight, N. (2008). Yukatek Maya children's attributions of belief to natural and non-natural entities. Journal of Cognition and Culture, 8, 235–243.
Knight, N., Sousa, P., Barrett, J. L., & Atran, S. (2004). Children's attributions of beliefs to humans and God: Cross-cultural evidence. Cognitive Science, 28, 117–126.
Kohn, E. (2013). How forests think: Toward an anthropology beyond the human. Berkeley: University of California Press.
Kuhnmünch, G., & Beller, S. (2005). Distinguishing between causes and enabling conditions—through mental models or linguistic cues? Cognitive Science, 29, 1077–1090.
Lawson, R. (2006). The science of cycology: Failures to understand how everyday objects work. Memory & Cognition, 34, 1667–1675.
Legare, C. H., Evans, E. M., Rosengren, K. S., & Harris, P. L. (2012). The coexistence of natural and supernatural explanations across cultures and development. Child Development, 83(3), 779–793.
Legare, C. H., & Gelman, S. A. (2008). Bewitchment, biology, or both: The co-existence of natural and supernatural explanatory frameworks across development. Cognitive Science, 32, 607–642.
Le Guen, O., Iliev, R., Lois, X., Atran, S., & Medin, D. L. (2013). A garden experiment revisited: Inter-generational change in environmental perception and management of the Maya Lowlands, Guatemala. Journal of the Royal Anthropological Institute, 19(4), 771–794.
Leslie, A. M. (1982). The perception of causality in infants. Perception, 11(2), 173–186.
Leslie, A. M. (1995). A theory of agency. In D. Sperber, D. Premack, & A. J. Premack (Eds.), Causal cognition (pp. 121–141). Oxford: Clarendon Press.
Lewin, K. (1935). A dynamic theory of personality. New York: McGraw-Hill.
Lien, Y., & Cheng, P. W. (2000). Distinguishing genuine from spurious causes: A coherence hypothesis. Cognitive Psychology, 40, 87–137.
Lillard, A. (1998). Ethnopsychologies: Cultural variations in theories of mind. Psychological Bulletin, 123, 3–46.
Lindeman, M., & Aarnio, K. (2006). Superstitious, magical, and paranormal beliefs: An integrative model. Journal of Research in Personality, 41, 731–744.
Liu, D., Wellman, H. M., Tardif, T., & Sabbagh, M. A. (2008). Theory of mind development in Chinese children: A meta-analysis of false-belief understanding across cultures and languages. Developmental Psychology, 44, 523–531.
Loftus, E. F., & Palmer, J. C. (1974). Reconstruction of automobile destruction: An example of the interaction between language and memory. Journal of Verbal Learning and Verbal Behavior, 13(5), 585–589.
López, A., Atran, S., Coley, J. D., Medin, D. L., & Smith, E. E. (1997). The tree of life: Universal and cultural features of folkbiological taxonomies and inductions. Cognitive Psychology, 32, 251–295.
Luhrmann, T. M., Padmavati, R., Tharoor, H., & Osei, A. (2015). Hearing voices in different cultures: A social kindling hypothesis. Topics in Cognitive Science, 7, 646–663.
Lynch, E., & Medin, D. (2006). Explanatory models of illness: A study of within-culture variation. Cognitive Psychology, 53, 285–309.
Lynch, E. B., Coley, J. D., & Medin, D. L. (2000). Tall is typical: Central tendency, ideal dimensions, and graded category structure among tree experts and novices. Memory & Cognition, 28, 41–50.
Maddux, W. W., & Yuki, M. (2006). The "ripple effect": Cultural differences in perceptions of the consequences of events. Personality and Social Psychology Bulletin, 32, 669–683.
Malinowski, B. (1948). Magic, science and religion. New York: Free Press.
Malle, B. F. (2006). The actor-observer asymmetry in attribution: A (surprising) meta-analysis. Psychological Bulletin, 132, 895–919.
Markus, H. R., & Kitayama, S. (1991). Culture and the self: Implications for cognition, emotion, and motivation. Psychological Review, 98, 224–253.
Masuda, T., & Nisbett, R. E. (2001). Attending holistically versus analytically: Comparing the context sensitivity of Japanese and Americans. Journal of Personality and Social Psychology, 81, 922–934.
Mayrhofer, R., & Waldmann, M. R. (2014). Indicators of causal agency in physical interactions: The role of the prior context. Cognition, 132, 485–490.
McCay, B. J., & Acheson, J. M. (Eds.) (1987). The question of the commons. Tucson: University of Arizona Press.
Medin, D. L., & Atran, S. (Eds.) (1999). Folkbiology. Cambridge, MA: MIT Press.
Medin, D. L., & Atran, S. (2004). The native mind: Biological categorization, reasoning and decision making in development across cultures. Psychological Review, 111, 960–983.
Medin, D. L., & Bang, M. (2014). Who's asking? Native science, Western science, and science education. Cambridge, MA: MIT Press.
Medin, D. L., Bennis, W. M., & Chandler, M. (2010). Culture and the home-field disadvantage. Perspectives on Psychological Science, 5, 708–713.
Medin, D. L., Coley, J. D., Storms, G., & Hayes, B. L. (2003). A relevance theory of induction. Psychonomic Bulletin & Review, 10, 517–532.
Medin, D. L., ojalehto, b., Marin, A., & Bang, M. (2013). Culture and epistemologies: Putting culture back into the ecosystem. In M. Gelfand, C. Chiu, & Y.-Y. Hong (Eds.), Advances in culture and psychology (pp. 177–217). Oxford: Oxford University Press.
(p. 737)
Medin, D. L., & Ortony, A. (1989). Psychological essentialism. In S. Vosniadou & A. Ortony (Eds.), Similarity and analogical reasoning (pp. 179–196). New York: Cambridge University Press.
Medin, D. L., Ross, N. O., Atran, S., Cox, D., Coley, J., Proffitt, J., & Blok, S. (2006). Folk biology of freshwater fish. Cognition, 99, 237–273.
Menon, T., Morris, M. W., Chiu, C.-y., & Hong, Y.-y. (1999). Culture and the construal of agency: Attribution to individual versus group dispositions. Journal of Personality and Social Psychology, 76, 701–717.
Michotte, A. (1963). The perception of causality. New York: Basic Books.
Miller, J. G. (1984). Culture and the development of everyday social explanation. Journal of Personality and Social Psychology, 46, 961–978.
Morris, M. W., Menon, T., & Ames, D. R. (2001). Culturally conferred conceptions of agency: A key to social perception of persons, groups, and other actors. Personality and Social Psychology Review, 5, 169–182.
Morris, M. W., Nisbett, R. E., & Peng, K. (1995). Causal attribution across domains and cultures. In D. Sperber, D. Premack, & A. J. Premack (Eds.), Causal cognition (pp. 577–612). Oxford: Clarendon Press.
Morris, M. W., & Peng, K. (1994). Culture and cause: American and Chinese attributions for social and physical events. Journal of Personality and Social Psychology, 67, 949–971.
Narby, J. (2006). Intelligence in nature: An inquiry into knowledge. New York: Penguin.
Needham, J. (1954). Science and civilization in China. Cambridge: Cambridge University Press.
Nguyen, S. P., & Rosengren, K. S. (2004). Causal reasoning about illness: A comparison between European and Vietnamese-American children. Journal of Cognition and Culture, 4, 51–78.
Nisbett, R. E., Peng, K., Choi, I., & Norenzayan, A. (2001). Culture and systems of thought: Holistic versus analytic cognition. Psychological Review, 108, 291–310.
Norenzayan, A., & Hansen, I. G. (2006). Belief in supernatural agents in the face of death. Personality and Social Psychology Bulletin, 32, 174–187.
Norenzayan, A., & Nisbett, R. E. (2000). Culture and causal cognition. Current Directions in Psychological Science, 9, 132–135.
O'Barr, W. (2001). Culture and causality: Non-western systems of explanation. Law and Contemporary Problems, 64, 317–323.
ojalehto, b., & Medin, D. L. (2015). Perspectives on culture and concepts. Annual Review of Psychology, 66, 249–275.
ojalehto, b., Medin, D. L., Horton, W. S., Garcia, S., & Kays, E. (2015). Seeing cooperation or competition: Ecological interactions in cultural perspectives. Topics in Cognitive Science, 7, 624–645.
ojalehto, b., Waxman, S. R., & Medin, D. L. (2013). Teleological reasoning about nature: Intentional design or relational perspectives? Trends in Cognitive Sciences, 17, 166–171.
Olson, I. D. (2013). Cultural differences between Favela and Asfalto in complex systems thinking. Journal of Cognition and Culture, 13, 145–157.
Oyserman, D., Coon, H. M., & Kemmelmeier, M. (2002). Rethinking individualism and collectivism: Evaluation of theoretical assumptions and meta-analysis. Psychological Bulletin, 128, 3–72.
Oyserman, D., & Lee, S. W.-S. (2007). Priming 'culture': Culture as situated cognition. In S. Kitayama & D. Cohen (Eds.), Handbook of cultural psychology (pp. 255–279). New York: Guilford Press.
Pauen, S. (2000). Early differentiation within the animate domain: Are humans something special? Journal of Experimental Child Psychology, 75, 134–151.
Peng, K., Ames, D., & Knowles, E. D. (2001). Culture and human inference: Perspectives from three traditions. In D. R. Matsumoto (Ed.), Handbook of culture and psychology (pp. 245–264). New York: Oxford University Press.
Peng, K., & Knowles, E. D. (2003). Culture, education, and the attribution of physical causality. Personality and Social Psychology Bulletin, 29, 1272–1284.
Pickering, M. J., & Majid, A. (2007). What are implicit causality and consequentiality? Language and Cognitive Processes, 22, 780–788.
Proffitt, J. B., Coley, J. D., & Medin, D. L. (2000). Expertise and category-based induction. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 811–828.
Pyers, J. E., & Senghas, A. (2009). Language promotes false-belief understanding: Evidence from learners of a new sign language. Psychological Science, 20, 805–812.
Rice, T. (2003). Believe it or not: Religious and other paranormal beliefs in the United States. Journal for the Scientific Study of Religion, 42, 95–106.
Robbins, J., & Rumsey, A. (2008). Introduction: Cultural and linguistic anthropology and the opacity of other minds. Anthropological Quarterly, 81, 407–420.
Romney, A. K., Weller, S. C., & Batchelder, W. H. (1986). Culture as consensus: A theory of culture and informant accuracy. American Anthropologist, 88, 313–338.
Rosch, E. (1975). Cognitive reference points. Cognitive Psychology, 7, 532–547.
Rosch, E., Mervis, C. B., Gray, W. D., Johnson, D. M., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8, 382–439.
Ross, N. O., Medin, D. L., Coley, J. D., & Atran, S. (2003). Cultural and experiential differences in the development of folkbiological induction. Cognitive Development, 18, 25–47.
Ross, N. O., Medin, D. L., & Cox, D. (2007). Epistemological models and culture conflict: Menominee and Euro-American hunters in Wisconsin. Ethos, 35(4), 478–515.
Rothe-Wulf, A. (2014). Verwobene Welt: Zur Rekonstruktion der Domänengrenzen von Kausalkonzepten in der Vorstellungswelt der Tonganer [Interwoven world: On the reconstruction of the domain boundaries of causal concepts in the Tongan worldview] (unpublished MA thesis). Department of Anthropology, University of Freiburg, Germany.
Schütz-Bosbach, S., & Prinz, W. (2007). Perceptual resonance: Action-induced modulation of perception. Trends in Cognitive Sciences, 11, 349–355.
Seligman, R. (2010). The unmaking and making of self: Embodied suffering and mind-body healing in Brazilian Candomble. Ethos, 38, 297–320.
Shore, B. (1982). Sala'ilua: A Samoan mystery. New York: Columbia University Press.
Shore, B. (1989). Mana and tapu. In A. Howard & R. Borofsky (Eds.), Developments in Polynesian ethnology (pp. 137–173). Honolulu: University of Hawai'i Press.
Sousa, P., Atran, S., & Medin, D. L. (2002). Essentialism and folkbiology: Evidence from Brazil. Journal of Cognition and Culture, 2(3), 195–223.
(p. 738)
Spelke, E. S., & Kinzler, K. D. (2007). Core knowledge. Developmental Science, 10, 89–96.
Spelke, E., Bernier, E. P., & Skerry, A. E. (2013). Core social cognition. In M. R. Banaji & S. A. Gelman (Eds.), Navigating the social world: What infants, children, and other species can teach us (pp. 11–16). Oxford: Oxford University Press.
Sperber, D., & Hirschfeld, L. A. (2004). The cognitive foundations of cultural stability and diversity. Trends in Cognitive Sciences, 8, 40–46.
Strohschneider, S., & Güss, D. (1998). Planning and problem solving differences between Brazilian and German students. Journal of Cross-Cultural Psychology, 29, 695–716.
Takano, Y., & Osaka, E. (1999). An unsupported common view: Comparing Japan and the U.S. on individualism/collectivism. Asian Journal of Social Psychology, 2(3), 311–341.
Throop, C. J. (2008). On the problem of empathy: The case of Yap, Federated States of Micronesia. Ethos, 36, 402–426.
Triandis, H. C. (1995). Individualism and collectivism. Boulder, CO: Westview Press.
Tucker, B., Tsiazonera, Tombo, J., Hajasoa, P., & Nagnisaha, C. (2015). Ecological and cosmological coexistence thinking in a hypervariable environment: Causal models of economic success and failure among farmers, foragers, and fishermen of southwestern Madagascar. Frontiers in Psychology, 6, 1533.
Unsworth, S. J., Levin, W., Bang, M., Washinawatok, K., Waxman, S., & Medin, D. L. (2012). Cultural differences in children's ecological reasoning and psychological closeness to nature: Evidence from Menominee and European American children. Journal of Cognition and Culture, 12, 17–29.
Vinden, P. G. (1996). Junín Quechua children's understanding of mind. Child Development, 67, 1707–1716.
Viscusi, W. K., & Zeckhauser, R. J. (2006). The perception and valuation of the risks of climate change: A rational and behavioral blend. Climatic Change, 77, 151–177.
Viveiros de Castro, E. (2004). Perspectival anthropology and the method of controlled equivocation. Tipití: Journal of the Society for the Anthropology of Lowland South America, 2, 3–22.
Waldmann, M. R., Hagmayer, Y., & Blaisdell, A. P. (2006). Beyond the information given: Causal models in learning and reasoning. Current Directions in Psychological Science, 15, 307–311.
Walker, S. J. (1992). Supernatural beliefs, natural kinds, and conceptual structure. Memory & Cognition, 20, 655–662.
Wassmann, J., Träuble, B., & Funke, J. (Eds.) (2013). Theory of mind in the Pacific: Reasoning across cultures. Heidelberg: Universitätsverlag Winter.
Watson-Jones, R. E., Busch, J. T. A., & Legare, C. H. (2015). Interdisciplinary and cross-cultural perspectives on explanatory coherence. Topics in Cognitive Science, 7, 611–623.
Waxman, S. R., & Gelman, S. A. (2010). Different kinds of concepts and different kinds of words: What words do for human cognition. In D. Mareschal, P. C. Quinn, & S. E. G. Lea (Eds.), The making of human concepts (pp. 99–129). New York: Oxford University Press.
Waxman, S., Medin, D. L., & Ross, N. O. (2007). Folkbiological reasoning from a cross-cultural developmental perspective: Early essentialist notions are shaped by cultural beliefs. Developmental Psychology, 43(2), 294–308.
Wegner, D. M. (2002). The illusion of conscious will. Cambridge, MA: MIT Press.
Wegner, D. M. (2003). The mind's best trick: How we experience conscious will. Trends in Cognitive Sciences, 7, 65–69.
Wegner, D. M., Sparrow, B., & Winerman, L. (2004). Vicarious agency: Experiencing control over the movements of others. Journal of Personality and Social Psychology, 86, 838–848.
Wellman, H. M., Cross, D., & Watson, J. (2001). Meta-analysis of theory-of-mind development: The truth about false belief. Child Development, 72(3), 655–684.
White, G. M., & Kirkpatrick, J. (Eds.) (1985). Person, self and experience: Exploring Pacific ethnopsychologies. Berkeley: University of California Press.
White, P. A. (1997). Naive ecology: Causal judgments about a simple ecosystem. British Journal of Psychology, 88, 219–233.
White, P. A. (1999). The dissipation effect: A naive model of causal interactions in complex physical systems. American Journal of Psychology, 112, 331–364.
White, P. A. (2000). Naive analysis of food web dynamics: A study of causal judgment about complex physical systems. Cognitive Science, 24, 605–650.
White, P. A. (2006). The causal asymmetry. Psychological Review, 113, 132–147.
White, P. A. (2007). Impressions of force in visual perception of collision events: A test of the causal asymmetry hypothesis. Psychonomic Bulletin & Review, 14(4), 647–652.
Whitehouse, H., & McCauley, R. N. (Eds.) (2005). Mind and religion: Psychological and cognitive foundations of religiosity. Walnut Creek, CA: Altamira.
Widlok, T. (2014). Agency, time and causality. Frontiers in Psychology, 5, 1264.
Wolff, P., Jeon, G. H., & Li, Y. (2009). Causers in English, Korean, and Chinese and the individuation of events. Language and Cognition, 1, 167–196.
Wolff, P., Klettke, B., Ventura, T., & Song, G. (2005). Expressing causation in English and other languages. In W. Ahn et al. (Eds.), Categorization inside and outside of the laboratory (pp. 29–48). Washington, DC: APA.
Woolfe, T., Want, S. C., & Siegal, M. (2002). Signposts to development: Theory of mind in deaf children. Child Development, 73, 768–778.
Wu, S., & Keysar, B. (2007). The effect of culture on perspective taking. Psychological Science, 18, 600–606.

Andrea Bender
Department of Psychosocial Science, University of Bergen, Bergen, Norway

Sieghard Beller
Department of Psychosocial Science, University of Bergen, Bergen, Norway

Douglas L. Medin
Department of Psychology, Northwestern University, Evanston, Illinois, USA


Index

Index   The Oxford Handbook of Causal Reasoning Edited by Michael R. Waldmann Print Publication Date: Jun 2017 Subject: Psychology Online Publication Date: May 2017

(p. 739)

Index

A abduction, 181–84 Abelson, R. P., 284, 669 abnormality, 608–9 abstract data, fitting, 331–32 abstract relations, 235–36 action(s) character evaluation and permissibility of, 591–94 consciousness and control of, 272–73 identification of, by inferring motive, 651–52 inferring required, from observation, 708–9 from movement to, 266–67 reasoning about, 705–6 reasoning about outcome of one's own, 706–8 representations of agents and their, 680–81 actions on objects hypothesis, 256–59 Adams, E., 314 adaptive causal representation, 67–68 agency assignment of, in causal attribution, 723 of goal-directed actions, 271–72 in psychological/social domain, 722–23 agents reasoning about actions of social, 705–6 representations of, 680–81 social, 705–6 Ahn, W., 123, 127, 129, 132–34, 138, 139, 141, 261, 262, 348, 358, 361, 362, 364–66, 399, 402, 405, 426, 427, 503, 604–11, 613, 615 Aitken, M. R. F., 17 Ajzen, I., 436 algorithmic solutions, 298–300 Ali, N., 340–42 Alice's Adventures in Wonderland (Lewis Carroll), 217 Page 1 of 42

Index Alicke, M. D., 584, 657 Alksnis, O., 333 ALLOW relations, 154 alternative causes, 387–88 alternative models, 403–4 alternative representations, 106–8 amplification, of movement, 252–54 Amsel, G., 256, 678–79 Amsterdam, A., 365–66, 421 analogical reasoning, 459–70 Bayesian model of causal inference by analogy and, 469–70 integrating causal models with, 465–69 representation of causal relationships, 464 in science education, 461–64 subprocesses of, 460–61 analogical reminding, 460–61 analytic knowledge, 76–77 Anderson, J. R., 45–46 Anderson, P. A., 668 animacy, early understanding of, 719 animates, in biological domain, 720–22 Anjum, R. L., 157–59 argument, 475–91 Bayesian approach to strength of, 482–83 for causes, 476–84 from cause to effect, 479–81 cognition and, 484–88 from consequences, 481–82 consequentialist, 485–86 from corpus analysis, 476–77 from effect to cause, 482 reasons vs. causes in, 475–76 research agenda for, 488–91 scheme-based approaches to, 477–79 Aristotle, 2, 4 artifacts, 358–61 artificial categories, 350–58 Asai, S., 702 Asher, N, 627–29 Asoko, H., 462 aspiration, causal invariance as, 71–74 assignment of agency in causal attribution, 723 of causal responsibility, 421–23 associations, as mental representations, 133–34 association theory, 53–54 associative learning, 13–26 causal judgment vs., 238 Page 2 of 42

Index causal learning as, 13–15 causal order and, 19–22 contiguity, contingency, and cue competition in, 15–17 selective conditioning in non-human animals in, 23–25 temporal contiguity in, 22–23 theories of, 17–19 assumption(s) of causal power, 437–38 default, 71–72 implicit covariance, 655–56 Astuti, R., 731 asymmetry bias, causal, 729 Atran, S., 392 attention in multi-process account of causal learning, 58 as stimulus condition, 251–52 attribution automatic vs. controlled, 656–57 of causality, with legal reasoning, 581–86 implicit theories and, 728–29 See also causal attribution; dispositional attribution; social attribution attribution theory, 506–8 attributive inquiries, 667–68 Au, T. K.-F., 630 Austin, J. L., 626 automatic stereotyping and attribution, 656–57 Avrahami, J., 197 B Baetens, K., 648 Baetu, I., 140 Baeyens, F., 58–59 Bailenson, J., 132, 133 Baillargeon, R., 679, 681 Baird, G. L., 611 Baker, A. G., 41, 140 Baker, C. L., 536–39 Baldwin, D., 392 Band, G. P. H., 267 Barbería, I., 41 Barbey, A. K., 153 Bartalone, J., 662 basic causal relations, 170–73 Bates, L. A., 701 Battaglia, P. W., 518–19, 524–25 Baumrind, D., 137 Bayesian approach, to argument strength, 482–83 Bayesian model of causal inference by analogy, 469–70 Bayesian models, hierarchical, 116–17 Page 3 of 42

Index Bayes model, 434–36 Beach, L. R., 85 Bechlivanidis, C., 561 Bechtel, W., 427 Beck, S. R., 637 Beckers, T., 55, 58–59, 81–82, 98, 124, 708 (p. 740) Beebe, H., 2 Beesley, T., 14 Belanger, N., 679 beliefs, about causal structure, 94–95 Beller, S., 730 Bender, A., 730 Bennett, L., 611 Bennett, S., 321 Berry, D., 287 Bes, B., 106 bias(es) causal asymmetry, 729 causal relations as, 105–6 de-biasing tools, 47–48 density, 36–37 in social judgment, 657–58 biological bases, of mental disorders, 609–10, 614 biological domain, 720–22 biological essences, 720–22 Biro, S., 271 Bittner, M., 623 Blaisdell, A. P., 705, 708 blended concepts, 725–26 Bloch, M., 731 Bloom, L., 628 Bluemke, M., 194 Blum, B., 510 Bocklisch, F., 445 Boddez, Y., 58–59 Bödeker, K., 719, 726 Bodenhausen, G. V., 656–57 Bodner, R., 509 Bohr, Niels, 66 Bolles, R., 303 Bonawitz, E. B., 392, 416, 684, 688–93 Borsboom, D., 605 Bott, O., 631, 632 Botvinick, M., 299–300 Bouton, M., 303 Bowerman, M., 156 Boyle, Robert, 180 Bradner, A., 422 Page 4 of 42

Index Bramley, N. R., 94, 95, 449 Bräuer, J., 703–4 Bridgers, S., 124–25 bridging process, 528–29 Bright, A. K., 393, 396 Britton, B. K., 463 Broadbent, D. E., 287 Brown, R., 630 Brubaker, D. L., 704 Buehner, M. J., 22, 70, 71, 74, 426, 551–53 Bulloch, M. J., 368, 393 Burnett, R. C., 381–83, 386, 387 Burns, P., 555 but-for test, 581–83 extensions to, 571–73 imprecision in, 569–70 involvement in, 572 necessity in, 581–83 overdetermination in, 571 pre-emption in, 570–71 promiscuity in, 570 proof in, 569–70 structural model approach to, 572–73 sufficiency in, 581–83 Byrne, R. M. J., 309, 310, 312, 319, 333–35 Byrne paradigm, 337–38 C Cacchione, T., 702 Cafferty, T. P., 508 Call, J., 700–701, 703–5, 709, 710 Cándido, A., 119 Caño, A., 20–21 Caramazza, A., 630 carbon molecules, in causal invariance example, 72–73 Carey, S., 423, 424, 683, 684 Carroll, C. D., 71, 73–74, 120 Carroll, Lewis, 217 categorization, 347–72 as causal reasoning with inferential features, 366–70 concepts as causal models, 348–49 feature weights in, 361–66 with intrinsic features, 349–61 category-based induction, 377–408 category discovery, 398–402 concepts as causal models, 403–7 feature generalization, 388–98 feature prediction, 378–88 category discovery Page 5 of 42

Index category-based induction, 398–402 inter-category causal relations and, 399–402 inter-feature causal relations and, 398–99 category-to-category generalizations, 398 Catena, A., 37, 38, 46, 119 causal assertions, 176–77 causal asymmetry, 381–84 causal asymmetry bias, 729 causal attribution assignment of agency in, 723 attribution theory and, 506–8 implicit covariance assumptions and, 655–56 for launching and floating, 719–20 causal-based conditional probabilities, 378–85 causal-based feature generalization, 394–98 causal Bayes nets (CBNs), 328 causal interpretation and, 332 as causal structure, 86–88 conditional reasoning and, 335 as psychological model, 339–43 causal chains, 151–53 causal cognition causal reasoning and theories of, 5–6 cultural context of, 731–32 culture and content of, 718–27 causal conditionals, 307–22, 328–29 in causal argument, 484–85 close possibilities and, 313–16 experiments on probabilities of realistic, 319–22 imaging, 316–18 material conditional, 311–13 modus ponens, 309–11 probability conditional, 318–19 causal connection, 661 causal considerations, 496–99 causal constructions, 633–38 causal decision-making, 495–510 based on causal considerations, 496–99 engagement in, 499–502 future research on, 509–10 psychological theories of, 502–8 causal expected utilities, 501–2 causal explanation, 415–29 assignment of causal responsibility, 421–23 causal mechanisms and, 426–28 conversational processes in, 665–69 dispositional attributions vs., 655 of inconsistencies, 181–82 Page 6 of 42

Index inference to the best explanation and, 416–19 intuitive theories and, 535–39 process of explaining, 419–21 types of, 423–26 using Mill's method of difference, 653–54 causal exploration, 691–92 causal induction, 115–25 elemental, 98–99 functional form in, 119–24 hierarchical Bayesian models, 116–17 learning ontologies of, 117–18 models of, 123–24 plausible relations in, 118–19 prior knowledge in, 124–25 causal inference by analogy, 469–70 covariation in, 129 mechanisms in, 129–31 structural constraints in, 129–30 temporal cues in, 130–31 causal interpretation, 332 causal invariance, 65–82 in adaptive causal representation, 67–68 application of, 73–74 for binary cause-and-effect variables, 78–79 in causal power theory, 68–71 development of causal knowledge with, 71–72 domain-specific knowledge and, 76–77 empirical evidence on, 79–82 exit conditions for, 77–78 generalization and, 74–76 (p. 741) perceived causality and postulate of relativity, 65–67 as wishful thinking, 72–73 causality attribution of, in legal reasoning, 581–86 in concepts of mental disorders, 604–5 in control-based decision-making, 283–90 at discourse level, 624–28 implicit, in natural language, 628–33 intentionality vs., 559–60 laws of physics and, 260–62 in moral reasoning, 587–90 perceived, postulate of relativity and, 65–67 in planning, 283–85 temporal order and, 560–61 in verbal domain, 620–24 causal judgment, 29–48, 232–35 associative learning vs., 238 Page 7 of 42

Index cell weights and density bias in, 36–37 confirming vs. disconfirming evidence contrasts in, 42–44 counterfactual simulation model of, 529–30 de-biasing tools for, 47–48 Δp and causal power, 40–41 as dual-factor heuristic, 41–42 hypothesis dependence in, 37 intuitive physics and, 521–23 intuitive theories and, 525–34 learning-driven cues-to-causality approach to, 44–47 presentation format effects in, 37–38 probe question effects in, 39–40 rules of, 30–31 statistics as inputs for, 31–33 types of rules for, 33–36 causal knowledge causal invariance and, 71–72 decision-making and, 503 effects of, on DSM-based diagnosis, 610–11 effects of, on judgments of abnormality, 608–9 effects of, on mental disorder diagnosis, 605–11 partial observability and, 301 causal learning association theory of, 53–54 as associative learning, 13–15 complexity and, 724 multi-process account of, 57–59 process of explaining and, 419–21 causal mechanisms, 127–43 causal explanation and, 426–28 defining, 128 learning, 139–42 mental representations of, 133–39 using mechanism knowledge, 128–33 causal models causal model theory of choice and, 504 concepts as, 348–49, 403–7 integrating, with analogical reasoning, 465–69 causal model (CM) theory neural implications of, 230–31 as psychological theory, 224–25 causal narratives, 504–6 causal order, 19–22 causal power diagnostic reasoning under assumptions of, 437–38 Δp and, 40–41 causal power theory, 68–71 causal questions, 667–68 Page 8 of 42

Index causal reasoning, 1–7 alternative representations for, 106–9 of children, 690–91 cognitive functions and, 6 from covariation information, 681–82 descriptive and normative theories of, 2 disposition framework of, 157–59 domains of, 6 emergence of, from representations of agents and their actions, 680–81 emergence of, from representations of motion events, 678–80 evolutionary patterns and, 709–10 frameworks of, 3–5 with inferential features, 366–70 interdisciplinary approach to, 6–7 models of, 227–32 neural implications of models of, 227–32 prerequisites for, by non-human animals, 700–702 psychological theories of, 222–27 review of evidence on, 232–38 social, 236–38 taxonomic vs. ecological, 721–22 theories of, 5 theories of causal cognition and, 5–6 ubiquity of, 2–3 causal relations basic, 170–73 causal representation and, 283–84 control-based decision-making and, 285–87 deductions from, 177–79 implicit and explicit, 625–27 inductions of, 179–81 inter-category, 399–402 inter-feature, 398–99 meanings of, 170–76 planning behaviors and, 284–85 reasoning biased by, 105–6 causal relationships domain-specific mechanism information, 685–86, 688–89 learning, in early childhood, 685–91 prior knowledge about, 124–25 representations of, 464 statistical learning, 686–89 causal representations causal invariance in adaptive, 67–68 causal relations and, 283–84 control-based decision-making and, 285–90 planning behaviors and, 284–85 causal responsibility Page 9 of 42

Index assignment of, 421–23 from diagnostic probabilities to estimates of, 450–51 causal selection, 161–63, 661 causal simulation, 586–87 causal status effect, 605–8 causal strength controlling for other causes in inferences of, 99–100 feature prediction and, 384 between two variables, 98–99 causal structure, 85–111 causal Bayes networks, 86–88 challenges in, 109–10 diagnostic reasoning with uncertainty in, 438–40 future research on, 109–10 learning, 88–100 reasoning with, 100–109 temporal order and, 554–56 underlying decision problems, 500–501 causal thought, 487–88 causation common-sense, legal, and philosophical notions of, 568 counterfactual conditionals and, 173–74 dispositional theory of, 157–59 factual and legal, 568–69 general vs.singular, 201–2 in the law, 567–68 legal reasoning and, 566–67 meaning of conditionals and, 329–30 moral reasoning and, 566–67 by omission, 153–54 process vs. dependency accounts of, 525–34 causative lexical triggers, 633–38 cause(s) alternative, in feature prediction, 387–88 arguments for, 476–84 combinations of, 119–21 diagnostic reasoning with multiple, 439–46 from effects to, in causal argument, 482 empirical vs. analytic knowledge of, 76–77 enabling conditions vs., 161, 178–79 from, to effects, in causal argument, 479–81 (p. 742) as guide to space, 561–62 as guide to time, 557–62 from hidden states to latent, 301–2 how-cause, 531–33 reasons vs., in causal argument, 475–76 robust-cause, 533–34 selection of, from conditions, 665–67 Page 10 of 42

Index strength of, 121–23 sufficient-cause, 533 time as guide to, 549–57 whether-cause, 530–31 CBNs. See causal Bayes nets cell weights, 36–37 Chaigneau, S. E., 359, 360–61, 369–70 Channon, S., 583 character evaluation, 591–94 Chase, C. C., 427 Chater, N., 332 Chen, X., 418 Cheney, D. L., 706 Cheng, P. W., 40, 45, 69–74, 76, 118, 120, 137, 161, 179, 189–90, 400, 405, 406, 437, 451, 466 Chi, M. T. H., 420, 421, 427 children causal reasoning of, 690–91 causal relationships learned by, 685–91 studies of feature generalization with, 393–94 studies of feature weights with, 365–66 See also human development Chin-Parker, S., 422 Chiu, M.-H., 420 Chockler, H., 573, 586 Choi, H., 251–52 Churchland, Patricia, 427 classification as diagnostic reasoning, 366–68 as prospective reasoning, 368–70 Clayton, N. S., 704 Cleary, M. D., 506 Clement, C. A., 463 Clifford, P., 666 Clifton, C. J., 628 close possibilities, 313–16 CM theory. See causal model theory Cobos, P. L., 20–21, 61, 606 Coenen, A., 92, 93–94, 449, 510 cognition, 484–88. See also causal cognition cognitive functions, 6 cognitive neuroscience, 217–40 future research on, 239 neural implications of causal reasoning models, 227–32 psychological theories of causal reasoning, 222–27 review of evidence on causal reasoning, 232–38 cognitive processing impact of culture on, 727–31 reasoning as effortful, 55–56 Page 11 of 42

cognitive reasoning, 678–85
Cohen, L. B., 256, 678–79
coherence-based reasoning, 579–81
coherence effect, 350–61
Cole, S., 16
Coley, J. C., 425–26
Coley, J. D., 392, 393
Coll, R. K., 462
Collins, D. J., 39
common-cause structures, 444–46
common-effect structures, 441–44
common-sense notions, of causation, 568
complexity, causal learning and, 724
complex judgments, of abstract relations, 235–36
concepts
  as causal models, 348–49, 403–7
  of mental disorders, causality in, 604–5
  representations and blended, 725–26
conditional(s)
  causal (See causal conditionals)
  in causal constructions, 634–36
  counterfactual (See counterfactual conditionals)
  material, 311–13
  probability, 318–19
conditional probabilities, 378–85
conditional reasoning, 327–44
  Byrne paradigm, 337–38
  causal, 328–29
  causal Bayes nets, 328, 332, 335, 339–43
  Cummins paradigm, 335–37
  discounting in, 335
  fitting abstract data, 331–32
  further research on, 338–39
  meaning of conditionals and, 329–30
  probabilized, 330–31
  suppression effects, 333–35
conditions
  enabling, causes vs., 161, 178–79
  selection of causes from, 665–67
  stimulus, affecting visual causal impressions, 250–52
confirming evidence, 42–44
consciousness, goal-directed actions and, 272–73
consequences, argument from, 481–82
consequentialist argument, 485–86
considerations, causal, 496–99
constraints
  inductive, 689–90
  structural, 129–30
contiguity
  in associative learning, 15–17
  temporal (See temporal contiguity)
  temporal contingency and, 549–52
  See also spatiotemporal contiguity
contingency(-ies)
  in associative learning, 15–17
  joint frequencies as source of estimation of, 190
  pseudocontingencies vs., 191–93
  temporal, 549–52
  temporal contiguity and, 549–52
contingency assessment
  as module of inductive cognition, 189–91
  pseudocontingencies and, 193
contingency-based causal judgment rules, 31–33
contract cases, 665–67
control, causal representations and, 287–90
control-based decision-making, 279–91
  causality in, 283–90
  defining problems for, 280
  future research on, 290–91
  planning behaviors and, 280–82
  public policy program example, 279–80
controlled stereotyping and attribution, 656–57
conversational processes, 665–69
Copernicus, N., 66
Copley, B., 155–57, 638, 640
Copley and Harley's force-theoretic model, 155–57
Cordier, F., 366, 606
Corner, A., 485–86
corpus analysis, 476–77
correlation, Bayesian approach and, 482–83
Corrigan, R., 630
counterfactual analysis, 590–91
counterfactual conditionals
  in causal constructions, 637
  causation and, 173–74
  in moral reasoning, 587–90
counterfactual questions, 668–69
counterfactual simulation model (CSM), 529–30
counter-normative alternative, 190–91
Courville, A. C., 304
covariance
  as causal inference, 129
  dimensions of, 652–53
  implicit covariance assumptions, 655–56
covariation information, 681–82
covers, predictive value of, 704–5
Craik, Kenneth, 170
Cramer, A. O., 605
Craver, C. F., 427
Critcher, C. R., 592
Crupi, V., 448
CSM. See counterfactual simulation model
cue competition, 15–17
cues, temporal, 130–31
cues-to-causality approach, 44–47
(p. 743)
cultural evolution, 140–41
culture, 717–33
  content and causal cognition, 718–27
  context of causal cognition, 731–32
  domain boundaries across, 726–27
  future research on, 732–33
  impact of, on cognitive processing, 727–31
  indirect source of mechanism knowledge, and, 140–41
Cummins, D. D., 310, 333–34, 336–38, 342
Cummins paradigm, 335–37
Curtis, R., 462
D
Danks, D., 510
Darley, J. M., 582–83
Darlow, A., 387–88, 438
Darwin, Charles, 645, 646
Davis, K. E., 651, 652, 655
Deacon, B. J., 603, 611–12
Deal, D. C., 560
de Almeida, R. G., 624
de-biasing tools, 47–48
de Blijzer, F., 628
Dechter, E., 524
decision-making
  causal (See causal decision-making)
  causal knowledge and, 503
  control-based (See control-based decision-making)
  story model of, 504–6
deductions, from causal relations, 177–79
default assumption, 71–72
de Finetti, B., 318, 319
Dehghani, M., 428
De Houwer, J., 55, 80, 708
de Kwaadsteniet, L., 368, 613
delay, as stimulus condition, 250
De Leeuw, N., 420
Dennis, M. J., 261, 262, 361
density bias, 36–37
dependency accounts
  bridging process and, 528–29
  process vs., of causation, 525–34
dependency framework, of causal reasoning, 3–4
described situations, 552–54
description, causal invariance as, 73–74
descriptive research, on singular causation, 209–12
descriptive theories, of causal reasoning, 2
Desrochers, S., 256, 679
developmental studies
  of feature generalization, 393–94
  of feature weights, 365–66
diagnosis, of mental disorders, 605–11
diagnostic hypothesis generation, 451–52
diagnostic inference, 444–46
diagnostic probabilities, 450–51
diagnostic reasoning, 433–53
  classification as, 366–68
  explaining away, 441–44
  future research on, 449–52
  power PC theory, 437–38
  quantifying diagnostic value of information, 446–49
  sequential, 444–46
  simple Bayes model, 434–36
  structure induction model, 438–40
diagnostic value, of information, 446–49
Dickinson, A., 14–16, 18, 53, 55, 296, 550
Dieterich, J. H., 590
difference-making approaches, to singular causation, 205–8
Dillingham, E. M., 424–25
dimensions of covariance, 652–53
direction, relative, 250–51
direct statistical induction, 139–40
disconfirming evidence, 42–44
discounting, in conditional reasoning, 335
discourse connectives, 627–28
discovery, 691–92
diSessa, A. A., 261
dispositional attribution(s)
  causal explanations vs., 655
  using Mill's method of agreement, 654–55
disposition framework, of causal reasoning, 4–5, 157–59. See also force(s); force dynamics
DiYanni, C., 424
domain(s)
  biological, 720–22
  of causal reasoning, 6
  physical, 718–20
  psychological/social, 722–23
domain boundaries
  overlapping of, 724–25
  relevance and invariance of, 723–27
domain knowledge, 689–90
domain-specific knowledge, 76–77
domain-specific mechanism information, 685–86, 688–89
Domjan, M., 23–24
Donders, F. C., 273
Douven, I., 319, 417–18
Dowe, P., 4, 134, 164, 204
Dowty, D., 625
DSM-based diagnosis, 610–11
dual-factor heuristic, causal judgment as, 41–42
Dunbar, R. I. M., 700–702
Duncan, B. L., 656
dynamic stimuli, 523–25
E
ecological causal reasoning, 721–22
ecosystems, 724–25
Edwards, B. J., 428
effects
  from cause to, in causal argument, 479–81
  diagnostic reasoning with multiple, 439–46
  from, to causes, in causal argument, 482
  in physical domain, 718–20
effortful cognitive process, reasoning as, 55–56
Einstein, A., 66, 67
Eisenhower, Dwight D., 479
elemental causal induction, 98–99
elemental diagnostic reasoning, 434–40
  power PC theory, 437–38
  simple Bayes model, 434–36
  structure induction model, 438–40
Elsner, B., 270, 271
empirical evidence, 79–82
empirical knowledge, 76–77
enabling conditions, 161, 178–79
enforced disintegration, 248–49
engagement, in causal decision-making, 499–502
Engbert, K., 559
Engelmann, N., 614–15
entraining, as causal impression, 247
environmental feedback, 194–95
Erb, C. D., 327–28, 335–37, 343
Erb, H.-P., 422
errors, in causal Bayes nets, 341–42
essences, in biological domain, 720–22
essentialism, 368
Estes, Z., 368
Evans, J. St. B. T., 321, 485
Evans-Pritchard, E. E., 731
Evenden, J., 14, 53
events-to-be-changed, 668–69
Evers-Vermeul, J., 628
evidence
  confirming, 42–44
  disconfirming, 42–44
  empirical, 79–82
  review of, on causal reasoning, 232–38
  theory-laden confirming, 43–44
  theory-laden disconfirming, 43–44
evidential expected utilities, 501–2
evidential reasoning, 574–81
evolutionary patterns, 709–10
exclusion, inference by, 702–3
execution, of goal-directed actions, 268–69
exit conditions, for causal invariance, 77–78
Exner, S., 268
explaining, process of, 419–21
explaining away
  as diagnostic reasoning, 441–44
  reasoning about, 104–5
explanation-based understanding, of stories, 658–65
explanations
  causal (See causal explanation)
  generation of, 659–61
  for visual causal impressions, 252–56
(p. 744)
explanatory gap, implicit causality as, 632–33
explanatory inquiries, 667–68
explanatory reasoning, 574–81
explicit causal relations, 625–27
F
factual causation, 568–69
Fair, D., 4
FC theory. See force composition theory
Featherstone, C. R., 631
feature generalization, 388–98
  category-to-category generalizations, 398
  models of causal-based feature generalization, 394–98
  object-to-category generalizations, 394–96
  object-to-object generalizations, 396–98
  relationship with similarity-based effects, 390–94
feature prediction, 378–88
  alternative causes in, 387–88
  as causal-based conditional probabilities, 378–85
  Markov violations, 385–87
feature weights
  in categorization, 361–66
  causal status effect and, 605–6
Feeney, A., 393, 396, 438
Feldman, J., 258–59
Fenton, N., 578
Fernbach, P. M., 95, 105, 106, 327–28, 335–37, 343, 387–88, 438
Ferstl, E. C., 630
Fiedler, K., 191–92, 194, 197
Fincham, F. D., 538
Fine, K., 315, 635–36
Fischer, J., 704
Fish, D., 630
Fisher, Sir Ronald, 555
Flanagan, E. H., 503, 605, 609
Flanagan, J. R., 256
floating, causal attributions for, 719–20
Flores, A., 606
Fodor, J. A., 624
folk-biological reasoning, 720–21
food trails, as causal predictor, 705
Forbus, K. D., 135, 522–23
force(s)
  applied to skin, 259–60
  early understanding of, 719
  as mental representations, 134–35
  in physical domain, 718–20
force composition (FC) theory
  neural implications of, 231–32
  as psychological theory, 225–27
force creators, 161–63
force dynamics, 147–65
  causes vs. enabling conditions in, 161
  challenges in, 164–65
  commonality between theories of, 160–61
  Copley and Harley's force-theoretic model, 155–57
  force creators, 161–63
  force redirection, 163
  mechanisms in, 163–64
  Mumford and Anjum's dispositional theory of causation, 157–59
  Pinker's theory of, 159–60
  Talmy's theory of, 148–50
  Wolff's force theory, 150–55
force redirection, 163
foresight, intentionality and, 583–84
forward reasoning, 368–70
Fragaszy, D. M., 701
frameworks, of causal reasoning, 3–5
Fraser, B., 210, 667
Fratianne, A., 405
frequency estimates, 31–33
Freytag, P., 191–92, 194
Friel, D., 691, 692
Fugelsang, J. A., 46
Fujita, K., 702
functional features, 364–65
functional form
  in causal induction, 119–24
  feature prediction and, 384
future research, topics for
  causal decision-making, 509–10
  causal structure, 109–10
  cognitive neuroscience, 239
  conditional reasoning, 338–39
  control-based decision-making, 290–91
  culture, 732–33
  diagnostic reasoning, 449–52
  goal-directed actions, 274–75
  human development, 692–94
  intuitive theories, 539–40
  mental disorders, 613–15
  mental models, 185–86
  non-human animals, 710–11
  planning, 290–91
G
Gandevia, S. C., 257
gap
  explanatory, 632–33
  as stimulus condition, 250
Gärdenfors, P., 156–57, 159
Garssen, B. J., 478
Garvey, C., 630
Gavetti, G., 470
Gelman, S. A., 127, 365–66, 393, 426
Gemberling, G. A., 24
general causation, 201–2
generalization, 74–76
Gennari, S., 623–24
George, N. R., 156
Gershman, S. J., 299–300, 303
Gerstenberg, T., 518, 531, 582, 593, 594
Gigerenzer, G., 190, 436
Gilovich, T., 461
Ginsburg, G. P., 666
Girotto, V., 319
Gleason, C., 273
Glymour, C., 137, 434
Glynn, S. M., 463
goal-directed actions, 265–75
  agency and ownership of, 271–72
  consciousness and, 272–73
  future research on, 274–75
  habit and, 267–68
  monitoring of, 273–74
  from movement to action, 266–67
  preparation and execution of, 268–69
  selection of, 269–71
goals, explanatory relevance of, 668
Godoy, A., 606
Goikoetxea, E., 630
Göksun, T., 156
Goldberg, A. E., 638
Goldberg, J. H., 657
Goldin-Meadow, S., 730
Golinkoff, R. M., 156
Gonzalez, J. S., 608, 610
González-Martín, E., 606
Good, I. J., 418
Goodale, M. A., 269
Goodman, M. D., 117
Goodman, N. D., 123, 403, 518, 636
Goodwin, G., 313
Gopnik, A., 124–25, 365, 367, 419, 687
Graesser, A. C., 665, 668
Greene, E. J., 582–83
Greiff, S., 286
Grennan, W., 479
Grether, F. W., 702–3
Greville, W. J., 553
Grice, H. P., 646, 664
Griffiths, O., 14
Griffiths, T. L., 71, 98, 99, 109, 115–25, 406, 418, 518, 525, 687, 688
Gurevitch, G., 507
Güss, C. D., 724
Guy, S., 462
Gweon, H., 538, 690
Gwynne, N. A., 425
H
Ha, Y. W., 43
habit, goal-directed actions and, 267–68
Hadjichristidis, C., 321, 396
Haggard, Patrick, 558, 559, 560
Hagmayer, Y., 118, 120, 131, 400–402, 500–502, 510, 552–53, 613–15
Hahn, U., 485–86
Hall, N., 203
Halpern, J. Y., 573, 586
Hamm, F., 638
Hampton, J. A., 358, 360–61, 368
Hamrick, J. B., 525
Handley, S., 321, 485
Hanus, S., 700–701
Harley, H., 155–57, 638
(p. 745)
Harman, G., 416
Harris, V. A., 443
Harrison, A. G., 462
Harrison, P. D., 508
Hart, H. L. A., 659, 667
Hartshorne, J. K., 630
Hastie, R., 103, 104, 328, 350, 352, 353, 357, 361, 378, 390, 444, 505, 575, 581
Hastings, A. C., 478, 479, 481
Hattori, M., 33, 41, 190
Hawking, S., 66, 72
Hawkins, G. E., 444
Hayes, B. K., 357, 366, 392, 393, 444
Hegarty, M., 136
Heider, F., 516, 535, 537, 647, 651, 669
Heit, E., 392, 393
Hemphill, D., 665
Hermans, D., 58–59
Herrera, A., 38
Hesse, F. W., 476–78, 483, 490
Hesslow, G., 147, 661
Heussen, D., 358
Hewstone, M., 193–94, 538
hidden states, to latent causes, 301–2
hierarchical Bayesian models, 116–17
Hilton, D. J., 422, 659–60, 662–65
Hirsh-Pasek, K., 156
Hitchcock, C., 2, 584, 664
Hoffrage, U., 436
Hohenstein, J., 365–66
Holbrook, J., 650
Holyoak, K. J., 20, 45, 405, 406, 451, 464–68, 470
Hommel, B., 270, 271
Honoré, A. M., 659, 667
Horowitz, A., 684
Hovav, M. R., 622
how-cause, 531–33
human development, 677–94
  causal exploration and discovery, 691–92
  future research on, 692–94
  learning causal relationships in early childhood, 685–91
  origins of cognitive reasoning in, 678–85
Hume, D., 2, 3, 13–14, 78, 129, 169, 173, 201, 205, 549, 556, 561, 634
hybrid causal theories, 5
hypothesis dependence, 37
hypothesis generation, diagnostic, 451–52
hypothesis revision, 71–72
I
IBE. See inference to the best explanation
icons, as mental representations, 135–36
Iliev, R., 428
imaging, 316–18
impetus theory, 521–23
implicit causality, 628–33
implicit causal relations, 625–27
implicit covariance assumptions, 655–56
implicit reliance, 194
implicit theories, 728–29
inconsistencies, causal explanations of, 181–82
indeterminacy, of rational models, 449–50
indirect sources, of mechanism knowledge, 140–41
individual relations, 150–51
induction(s)
  causal (See causal induction)
  of causal relations, 179–81
  direct statistical, 139–40
  mechanisms as, 131–33
inductive cognition, 189–91
inductive constraints, 689–90
inference(s)
  action identification by motive, 651–52
  algorithm for analogical, 464–65
  analogical reasoning and, 460–61
  Bayesian model of causal inference by analogy, 469–70
  causal (See causal inference)
  diagnostic, 444–46
  diagnostic, in common-cause structures, 444–46
  by exclusion, 702–3
  qualitative and quantitative, 103–4
  of required action from observation, 708–9
  social, 652–53
  valid, 195–96
inference to the best explanation (IBE), 416–19
inferential features, 366–70
inferential reasoning theory (IRT), 53–62
  associative theory of causal learning and, 53–54
  components of, 54–57
  evaluation of, 59–61
  multi-process account of causal learning and, 57–59
information
  causal reasoning from covariation, 681–82
  dimensions of covariance, 652–53
  domain-specific mechanism, 685–86, 688–89
  quantifying diagnostic value of, 446–49
  temporal, in described situations, 552–54
Inhelder, B., 33, 42, 691
innate structures, 254–56
inquiries, explanatory vs. attributive, 667–68
intentionality
  causality vs., 559–60
  foresight and, 583–84
interactions, between the systems, 300–301
inter-category causal relations, 399–402
interdisciplinary approach, to causal reasoning, 6–7
inter-feature causal relations, 398–99
interrogation, 340–41
interventions
  learning causal structure from, 91–95
  reasoning based on, 101–2
intrinsic features, 349–61
intuitive psychology, 535–39
intuitive theories, 515–41
  causal explanations and, 535–39
  defining, 516–17
  with dynamic stimuli, 523–25
  future research on, 539–40
  modeling of, 517–19
  process vs. dependency accounts, of causation, 525–34
  purpose and use of, 519–21
intuitive theory of mind, 535–38
invariance
  causal (See causal invariance)
  of domain boundaries, 723–27
IRT. See inferential reasoning theory
J
Jackendoff, R., 154, 638
Jahn, G., 445
James, William, 182, 426
Jara-Ettinger, J., 538
Järvikivi, J., 631
Jaspars, J. M., 538
Jeffrey, R. C., 318, 319
Jeon, G., 163
Jern, A., 389, 398
Jiménez, G., 38
Johnson, S. G. B., 138, 139, 427, 610
Johnson-Laird, P. N., 239, 312, 313, 319
joint frequencies, 190
Jones, C. E., 303
Jones, E. E., 443, 651, 652, 655
judgments
  of abnormality, 608–9
  causal (See causal judgment)
  contingency-based causal judgment rules, 31–33
  of psychological vs. biological bases of mental disorders, 609–10
K
Kahneman, D., 105, 135, 136, 314, 503, 586, 662
Kalish, C. W., 127, 141, 365–66, 426
Kamin, L. J., 54
Kaminski, J., 703–4
Kant, Immanuel, 2, 78
Kareev, Y., 197
Karmiloff-Smith, A., 691
Kashima, Y., 666
Katz, J., 420
Kaufman, L., 681
Kaufmann, S., 428
Keeble, S., 255, 679
(p. 746)
Kehler, A., 628–30
Keil, F. C., 164, 367
Kelemen, D., 424
Kelley, H. H., 421, 538, 652–54
Kemp, C., 117, 118, 123, 389, 392, 396–98, 403
Kemp, J. J., 611–12
Kendler, K. S., 603
Khalife, D., 608
Khemlani, S. S., 313, 417
Kim, N. S., 361, 362, 367, 368, 405, 604, 605, 608, 610, 615
Kim, S., 356, 361, 364
Kim, S.-H., 258–59
Kincannon, B., 582
Kirkham, N. Z., 367, 682
Kirmayer, L., 609
Kistler, M., 164
Klayman, J., 43
Kleiman-Weiner, M., 590
Kliger, D., 507
Knobe, J., 210, 584, 664, 667
knowledge
  analytic, 76–77
  causal (See causal knowledge)
  domain, 689–90
  domain-specific, 76–77
  empirical, 76–77
  inductive constraints beyond domain, 689–90
  mechanism, 128–33, 140–41
Koehler, D., 285
Koehler, J. J., 436
Köhler, W., 700
Kominsky, J. F., 585
Koornneef, A. W., 631
Kotovsky, L., 679
Kratzer, A., 623
Krems, J. F., 445
Krist, H., 702
Krol, N. P., 613
Krugman, Paul, 459
Krynski, T. R., 108, 437
Kuhn, D., 420, 487–89, 577
Kuhn, T., 78
Kummer, H., 710
Kuperberg, G. R., 627
Kuroshima, H., 702
Kushnir, T., 367
Kutzner, F., 194
L
Lagnado, D. A., 101, 343, 555, 560, 561, 583, 586
Lakusta, L., 162
Lalljee, M., 666
Lamb, R., 666
language, natural. See natural language
Larrick, R. P., 443
Lascarides, A., 627–29
Lassaline, M. E., 361, 464
latent causes, 301–2
launching
  causal attributions for, 719–20
  as causal impression, 246–47
LaVancher, C., 420
Lavendar, H., 615
Lavis, Y., 58
law, causation in the, 567–68
laws of physics, 260–62
Lawson, R., 141
learning
  associative (See associative learning)
  causal (See causal learning)
  causal mechanisms, 139–42
  causal relationships in early childhood, 685–91
  causal structure, 88–100
  ontologies, 117–18
  reinforcement (See reinforcement learning)
  statistical, 686–89
  systems, 724–25
  temporal causal structures, 95–97
Lebowitz, M. S., 611
Lee, H. S., 465, 466, 470
Lee, K., 348
legal causation, 568–69
legal inquiry, 574
legal reasoning, 565–87
  attribution of causality with, 581–86
  causal simulation, 586–87
  causation in the law, 567–68
  explanatory and evidential reasoning, 574–81
  factual and legal causation, 568–69
  legal inquiry in, 574
  and moral reasoning, 565–66
  philosophical theories of causation and, 566–67
  See also but-for test
Legare, C. H., 365
Le Pelley, M. E., 14
Lepore, E., 624
Lerner, J. S., 657
Leslie, A. M., 255, 679
Leslie, D. S., 304
Levin, B., 622
Levin, S., 361
Lewis, D., 205, 314–18, 528, 625, 635, 636, 669
lexical triggers, 633–38
Libet, B., 273
Lickel, J. J., 611–12
Lieberman, M. D., 649
Lien, Y., 72, 118, 400, 405
Liljeholm, M., 40, 45, 76, 120, 406
linguistic constructions, 637–38
linguistic factors, in cognitive processing, 729–31
Liu, P. P., 62
Lloyd, K., 304
Lombardi, L., 448
Lombrozo, T., 210, 212, 364, 365, 416–19, 421, 423–26, 428, 527, 655, 689, 693
Lopez, A., 392
López, F. J., 20–21, 61, 606
Love, J., 624
Lovibond, P. F., 58
Lu, H., 45, 71, 109, 121–24, 406, 450, 465, 466
Lubart, T., 333
Lucas, C. G., 98, 120, 124–25
Luhmann, C. C., 62, 348, 362, 371, 405, 606
Luo, Y., 681
Luque, D., 61
M
MacFarland, T., 623
Mackie, J. L., 476, 662
Mackintosh, N. J., 58
Maio, G. R., 486
Maldonado, A., 37, 38, 119
Malle, B. F., 630, 650, 651, 668
Malt, B. C., 358
Mandel, D. R., 37, 40, 42, 43, 662
Mansinghka, V. K., 117, 118, 124, 518
mapping, 460–61
Markman, E. M., 393
Markov condition, 102–3
Markov violations, 385–87
Marsh, J. K., 123, 348, 361, 402, 605
Martin, A. E., 637
Martin, J. B., 357, 404, 405
Mascalzoni, E., 256, 701
Maslow, A., 702–3
material conditional, 311–13
Mathes, R. M., 659–60
Matute, H., 38
May, J., 22
Mayrhofer, R., 103, 123, 162, 444
McCauley, C., 655
McClelland, J. L., 403
McCloskey, M., 522
McClure, J. L., 668
McCormack, T., 555
McDonnell, J. V., 406
McGill, A. L., 422, 666
McGrath, S., 154
McGregor, S., 426, 551–52
McKenzie, C. R., 45
McKintyre, A., 666
McKoon, G., 623, 624
McLaren, I. P. L., 60, 62
McNair, S., 438
McNally, G. P., 16
meaning(s)
  of causal relations, 170–76
  of conditionals, 329–30
mechanism knowledge, 128–33, 140–41
mechanisms
  biological explanations of mental disorders, 614
  causal (See causal mechanisms)
  in force dynamics, 163–64
Meder, B., 109, 131, 439, 444, 510
Medin, D. L., 127, 350, 392, 426
Meehl, P. E., 608, 609, 614
Meiser, T., 193–94
memory
  interrogation and working, 340–41
  in multi-process account of causal learning, 58–59
Mendelson, R., 550, 551
mental disorders, 603–15
  causality in concepts of, 604–5
  diagnosis of, 605–11
  future research on, 613–15
  prognosis of, 611–12
  treatment of, 612–13
mental models, 169–86
  abductions of causal explanations, 181–84
  of causal assertions, 176–77
  deductions from causal relations in, 177–79
  future research on, 185–86
  inductions of causal relations in, 179–81
  meanings of causal relations in, 170–76
  of nature, 724–25
  See also mental models (MM) theory
mental models (MM) theory
  neural implications of, 228–30
  as psychological theory, 222–24
mental physics simulation engine, 523–25
mental representations
  associations as, 133–34
  of causal mechanisms, 133–39
  challenges concerning, 138–39
  forces and powers as, 134–35
  icons as, 135–36
  networks as, 137–38
  placeholders as, 136–37
  schemas as, 138
mental simulations, 339–40
Menzies, P., 2
Mermin, David, 66–67
Mervis, C. B., 350
Meunier, B., 366, 606
Michotte, A., 4, 65, 124, 162, 232–33, 246, 247, 250–55, 258, 259, 261, 523, 550, 678, 679–80
Mikkelsen, L. A., 45
Milgram, S., 656
Mill, J. S., 189
Miller, D., 314
(p. 747)
Miller, F. D., 650–51
Miller, J. G., 650
Miller, R. R., 55, 708
Mills, K. K., 627–28
Mill's method of agreement, 654–55
Mill's method of difference, 653–54
Milne, A., 248, 249
Milner, A. D., 269
Minard, E., 58
mind, in psychological/social domain, 722–23
Miresco, M., 609
Mitchell, C. J., 58, 61
Mlodinow, L., 66, 72
MM theory. See mental models theory
Mobayyen, F., 624
model(s)
  alternative, 403–4
  Bayesian model of causal inference by analogy, 469–70
  causal (See causal models)
  of causal-based feature generalization, 394–98
  of causal induction, 123–24
  Copley and Harley's force-theoretic model, 155–57
  counterfactual simulation model, 529–30
  of evidential reasoning, 577–79
  of intuitive theories, 517–19
  of intuitive theory of mind, 536–38
  mental (See mental models)
  probabilistic contrast, 68–71
  psychological, 339–43
  rational, 449–50
  Rescorla–Wagner model, 17–19
  simple Bayes model, 434–36
  situation, 662–65
  of social attribution, 650–58
  story (See story model)
  structure induction, 438–40
  as tools in science education, 461–64
modules, in visual causal impressions, 254–56
modus ponens (MP), 309–11
Mohamed, M. T., 628
Monfils, M.-H., 303
monitoring, of goal-directed actions, 273–74
Moore, J. W., 560
moral reasoning, 587–95
  from action permissibility to character evaluation, 591–94
  causality and counterfactuals in, 587–90
  counterfactual analysis in, 590–91
  and legal reasoning, 565–66
  philosophical theories of causation and, 566–67
Morís, J., 61
Morris, M. W., 443, 649, 719, 728
motion, understanding of, 719
motion context, 251–52
motion events, 678–80
movement
  action and, 266–67
  amplification of, 252–54
MP. See modus ponens
Muentener, P., 162, 683, 694
Müller, S. M., 46
multiple causes and effects, 439–46
multi-process account (of causal learning), 57–59
  attention in, 58
  memory in, 58–59
  perception in, 57–58
Mumford, S., 157–59
Mumford and Anjum's dispositional theory of causation, 157–59
Murphy, G. L., 399
Murphy, R. A., 37
Murray, A. J., 611
Muth, K. D., 463
N
Nakayama, K., 251
narratives, causal, 504–6
Natsoulas, T., 251
natural language, 619–40
  causal constructions without causative lexical triggers, 633–38
  causality at discourse level, 624–28
  causality in verbal domain, 620–24
  implicit causality in, 628–33
nature, mental models of, 724–25
Neeleman, A., 622
Neilens, H., 485
Nelson, J. D., 448
networks, as mental representations, 137–38
network topology, 381–84
neural implications, of causal reasoning models, 227–32
neurological bases, of social perception, 647–50
Newell, B. R., 444
Newman, G. E., 256
Newton, D. P., 463
Newton, Sir I., 67
Newton, L. D., 463
Nguyen, T. P., 610
Nieuwland, M. S., 637
Nisbett, R. E., 649
Niv, Y., 303
Niyogi, S., 117
noisy Newton theory, 523–25
Nolden, S., 559
Nolen-Hoeksema, S., 611
non-human animals, 699–711
  evolutionary patterns and causal reasoning, 709–10
  future research on, 710–11
  inference by exclusion, 702–3
  inferring a required action from observation, 708–9
  prerequisites for causal reasoning by, 700–702
  reasoning about actions of social agents, 705–6
  reasoning about object–object relationships, 703–5
  reasoning about outcome of one's own actions, 706–8
  selective conditioning in, 23–25
(p. 748)
Noordman, L. G., 626, 628
Norman, K. A., 303
normative theories
  of causal reasoning, 2
  of singular causation, 202–9
norms, in causal attribution, 584–85
Novick, L. R., 161, 179, 189–90, 451, 608
Nozick, R., 497
O
Oaksford, M., 33, 41, 190, 332, 485–86
Oatley, K., 647
object–object relationships, 703–5
objects, in physical domain, 718–20
object-to-category generalizations, 394–96
object-to-object generalizations, 396–98
observation(s)
  inferring a required action from, 708–9
  learning causal structure from, 88–91
  reasoning based on, 101–2
O'Connell, S., 700–702
Oden, D. L., 709
Oestermeier, U., 476–78, 483, 490
Öllinger, M., 288
omission, causation by, 153–54
Onifade, E., 508
Opfer, J. E., 368, 393
Oppenheimer, D. M., 105, 366–67, 371, 417
order
  temporal, causality and, 560–61
  temporal, causal structure and, 554–56
Ortega-Castro, N., 41
Osman, M., 39, 284, 285, 287
outcome of one's own actions, 706–8
Over, D. E., 319–21, 396, 485
ownership, of goal-directed actions, 271–72
Oxford Handbook of Causation (Beebee, Hitchcock, and Menzies), 2
P
p (causal power), 40–41
Pacer, M. D., 124, 417, 418
Page, D. M., 707
Pais, A., 66
Palmeri, T. J., 362, 606
Pander Maat, H., 626
parameterization, 384
Pardeshi, P., 623
Park, J. Y., 79, 82, 103, 386, 426, 427
partial observability, 301
Pasqualino, M., 444
Patalano, A. L., 390
Paulus, D. P., 608, 610
Pavlov, I. P., 14, 302
Payton, T., 37
PCs. See pseudocontingencies
Pearl, J., 117, 137, 139, 171, 174, 315–18, 320, 434, 481, 517, 586, 636
Pearson, D., 14
penetration, as causal impression, 249–50
Peng, K., 649, 719, 728
Penn, D. C., 60, 711
Pennington, N., 505, 575
people, in psychological/social domain, 722–23
Perales, J. C., 37, 38–40, 42, 45, 46, 71, 119, 190
perceived causality, 65–67
perception
  as indirect source of mechanism knowledge, 141
  in multi-process account of causal learning, 57–58
Peterson, C. R., 85
Phelan, J. C., 611
Phillips, A., 680–81
philosophical notions, of causation, 568
philosophical theories
  legal reasoning and, 566–67
  moral reasoning and, 566–67
physical domain, causal cognition about, 718–20
physical events, 232–35
physical process approaches, 203–5
physics, intuitive, 521–23
Piaget, J., 33, 42, 266, 680, 685
Pierce, M. L., 362, 405
Pinker, S., 159–60
Pinker's theory of force dynamics, 159–60
placeholders, as mental representations, 136–37
planning, 279–85, 290–91
  causality in, 283–85
  control-based decision-making and, 280–82
  defining problems for, 280
  future research on, 290–91
  public policy program example, 279–80
plausible relations, 118–19
Plessner, H., 191–92
Plotnik, J. M., 704
pluralistic causal theories, 5
Poeppel, D., 220, 623–24
Politzer, G., 319
populations, sampled, 74–76
possibilities, close, 313–16
Povinelli, D. J., 60, 711
power(s), causal
  as mental representations, 134–35
  p (See p)
power PC theory, 437–38
Prasada, S., 424–25
preconditions, explanatory relevance of, 668
predictions
  and action selection, 269–71
  covers and, 704–5
  noise and, 703–4
Prelec, D., 509
Premack, A. J., 648, 652, 700, 702, 705
Premack, D., 648, 652, 700, 702, 705
preparation, for goal-directed actions, 268–69
presentation format effects, 37–38
prior knowledge, in causal induction, 124–25
probabilistic contrast model, 68–71
probabilities
  causal-based conditional, 378–85
  diagnostic, 450–51
  diagnostic reasoning with empirical, 434–36
  estimates of, 31–33
  feature prediction as causal-based conditional, 378–85
  of realistic causal conditionals, 319–22
probability conditional, 318–19
probabilized conditional reasoning, 330–31
probe question effects, 39–40
problem(s)
  causal structure underlying, 500–501
  defining, for control-based decision-making, 280
  defining, for planning, 280
  formulization of, in reinforcement learning, 297–98
process, of explaining, 419–21
process accounts, of causation, 525–34
process framework, of causal reasoning, 4
Proctor, C. C., 503, 607, 609
Proffitt, J. B., 392
prognosis, of mental disorders, 611–12
propositions, 56–57
Proske, U., 257
prospective reasoning, 368–70
pseudocontingencies (PCs), 189–98
  contingencies vs., 191–93
  contingency assessment, 189–91
  as proxy for dealing with paucity of environmental feedback, 194–95
  as resilient cognitive illusion, 191–94
  as smart heuristic, 195–96
psychological bases, of mental disorders, 609–10
psychological model, 339–43
psychological/social domain, 722–23
psychological theories
  of causal decision-making, 502–8
  of causal reasoning, 222–27
public policy program, 279–80
Puebla, G., 360–61, 369–70
pulling, as causal impression, 247–48
Putnam, H., 423
Pylyshyn, Z. W., 135
Pyykkönen, P., 631
Q
qualitative inferences, 103–4
qualitative reasoning, 521–23
(p. 749)
quantifying diagnostic value of information, 446–49
quantitative inferences, 103–4
Quattrone, G., 509
R
Rafferty, A., 419
Ramsey, F. P., 313, 316, 335
Rao, R. P., 301–2
rationality, mental disorder diagnosis and, 613–14
rational models, 449–50
realistic causal conditionals, 319–22
real-world categories, 358
reasoning
  about actions of social agents, 705–6
  about explaining away, 104–5
  about object–object relationships, 703–5
  about outcome of one's own actions, 706–8
  based on observations vs. interventions, 101–2
  biased by causal relations, 105–6
  with causal structure, 100–109
  as effortful cognitive process, 55–56
  as indirect source of mechanism knowledge, 141
  Markov condition and, 102–3
  qualitative and quantitative inferences in, 103–4
  See also specific types
reasons, causes vs., 475–76
Rebitschek, F. G., 445
Redondo, M., 268
Regolin, L., 256, 701
regularity, temporal, 556–57
Rehder, B., 103, 338, 343, 350, 352–57, 359, 361, 363–64, 366, 381–84, 386–88, 390–91, 394–96, 398, 404–6, 443, 444, 465, 469
Reigeluth, C., 462
reinforcement learning (RL), 295–304
  algorithmic solutions, 298–300
  causal knowledge and partial observability, 301
  formulization of problem, 297–98
  from hidden states to latent causes, 301–2
  historical background of, 296–97
  structure learning and, 302–4
  transitions and interactions between the systems, 300–301
relations
  abstract, 235–36
  ALLOW, 154
  basic causal, 170–73
  causal (See causal relations)
  explicit causal, 625–27
  implicit causal, 625–27
  individual, 150–51
  inter-category causal, 399–402
  inter-feature causal, 398–99
  plausible, 118–19
relationships
  causal (See causal relationships)
  object–object, 703–5
relative direction, 250–51
relative speed, 251
relativity, postulate of, 65–67
relevance, of domain boundaries, 723–27
religion, science vs., 725–26
representations
  adaptive causal, 67–68
  of agents and their actions, 680–81
  alternative, 106–8
  alternative, for causal reasoning, 106–9
  blended concepts and, 725–26
  causal (See causal representations)
  of causal models, 405–6
  of causal relationships, 464
  mental (See mental representations)
  of motion events, 678–80
Rescorla, R. A., 15–16, 17
Rescorla–Wagner model, 17–19
research agenda, for causal argument, 488–91
resilient cognitive illusion, 191–94
responsibility
  attribution of, in groups, 585–86
  from diagnostic probabilities to estimates of causal, 450–51
reverse engineering, 183–84
Riedel, J., 703–4
Rips, L. J., 135–36, 343, 368, 428
Rist, R., 333
RL. See reinforcement learning
Robertson, S. P., 668
Robinson, B., 724
Robinson, E., 501
robust-cause, 533–34
Rogers, T. T., 403
Rojas, R. R., 124
Rosch, E. H., 350, 720
Roscoe, R. D., 427
Rosenthal, J. E., 611
Ross, B. H., 359, 390
Ross, N. O., 725
Rothe-Wulf, A., 727
Rottman, B. M., 103, 104, 328, 378, 444
Rottman, J., 424
Roy, M., 427
Rubinstein, J., 392, 393
Rudolph, U., 630
rule-governed behavior, 30–31
Runeson, S., 254, 262
Ryan, M. M., 362, 405
S
Salmon, Lord, 565
Salmon, W. C., 4, 203
Samland, J., 584–85, 667
sampled populations, 74–76
Sanborn, A. N., 124, 518, 523
Sanders, T. J., 626, 628
Sanislow, C. A., 605
Saxe, R., 538
Scandura, J. M., 30
Schaffer, M. M., 350
Schank, R. C., 284, 669
Scheines, R., 434
schemas, as mental representations, 138
scheme-based approaches, 477–79
Schloegl, C., 704
Schlottmann, A., 21–22
Schmeltzer, C., 662
Schmid, M., 705
Scholl, B. J., 251–52, 255
Scholz, A., 445
Schulz, K., 636
Schulz, L. E., 538, 684, 688, 690–92
Schum, D. A., 579
Schupbach, J. N., 417–18
science, religion vs., 725–26
science education, 461–64
Seligman, M. E., 284
semantics, of implicit causality, 631–32
Semrud-Clikeman, M., 463
sequential diagnostic reasoning, 444–46
Seston, R., 424
Seyfarth, R. M., 706
Shafto, P., 392, 393, 398, 690
Shanks, D. R., 14–16, 18, 21, 25, 38–40, 42, 45, 53, 58, 71, 190, 550–51, 557
Shannon, C. E., 447
Shaw, R. C., 704
Shepard, J., 259, 260
Sheu, C. F., 45–46
Shibatani, M., 622, 623
Shultz, T. R., 527, 550, 551
Silk, J. B., 706
Silva, F. J., 707
Silva, K. M., 707
similarity-based effects, 390–94
Simion, F., 256
Simmel, M., 516, 535, 647
Simmons, C. L., 358
Simmons, S., 368
Simon, D., 580
Simpson's paradox, 193–94
simulations
  causal, 586–87
  counterfactual simulation model, 529–30
  mental, 339–40
  mental physics simulation engine, 523–25
Singh, M., 258–59
Singmann, H., 320
singular causation, 201–13
  challenges with, 212–13
  descriptive research on, 209–12
  general vs., 201–2
  normative theories of, 202–9
situation models, 662–65
Skills of Argument, The (Kuhn), 487–88
(p. 750)
Sloman, S. A., 95, 101, 103, 164, 209, 225, 343, 361, 362, 386–88, 396, 403, 405, 426, 427, 438, 500–502, 527, 555
Slotta, J. D., 427
Slugoski, B. R., 666
smart heuristic, pseudocontingencies as, 195–96
Smith, E. E., 358, 392
Smith, E. R., 650–51
Smith, K. A., 524, 525
Snedeker, J., 630
Snow, John, 180, 181
Sobel, D. M., 365, 367, 682, 687
soccer balls, in causal invariance example, 72–73
social agents, 705–6
social attribution, 645–70
  conversational processes in causal explanation, 665–69
  explanation-based understanding of stories, 658–65
  models of, 650–58
  social perception and, 647–50
social causal reasoning, 236–38
social domain. See psychological/social domain
social inference, 652–53
social judgment, biases in, 657–58
social perception, 647–50
Solstad, T., 622, 631, 632
Song, J., 730
Soto, F. A., 304
space, cause as guide to, 561–62
Spapé, M., 271
spatiotemporal contiguity, 549–62
  causality vs. intentionality, 559–60
  cause as guide to space, 561–62
  cause as guide to time, 557–62
  temporal contiguity and contingency, 549–52
  temporal information in described situations, 552–54
  temporal order and causality, 560–61
  temporal order and causal structure, 554–56
  temporal regularity, 556–57
  time as guide to cause, 549–57
speed, relative, 251
Spelke, E., 680–81
Spellman, B. A., 464, 506, 582
Spirtes, P., 434
Spohn, W., 434
Spooren, W. P., 628
Sprenger, J., 418
Spunt, R. P., 649
Stachenfeld, K. L., 299–300
Stalnaker, R. C., 313, 314, 316, 329–30, 635, 636
Stapleton, J., 571, 572
statistical induction, direct, 139–40
statistical learning, 686–89
statistics, as inputs for causal judgment, 31–33
stereotyping, 656–57
Stevenson, R., 396
Stewart, A. J., 637
Steyvers, M., 90, 92–95, 140, 449, 510
stimulus conditions, 250–52
Stitt, C. L., 655
Storms, G., 358, 392
story model
  of decision-making, 504–6
  in legal reasoning, 575–77
story understanding
  explanation-based, 658–65
  situation models in, 662–65
structural constraints, 129–30
structure. See causal structure
structure, of social perception, 647–50
structure induction model, 438–40
structure learning, 302–4
Sturt, P., 631
subprocesses, of analogical reasoning, 460–61
sufficient-cause, 533
Suppes, P., 260
suppression effects, 333–35
Sussman, A. B., 417
Sutton, R. S., 300
Sweetser, E., 625, 626, 628, 632
systems, interactions between, 300–301
systems learning, 724–25
T
Talmy, L., 148–50, 621, 622
Talmy's theory of force dynamics, 148–50, 154–55
tasks, of causal models, 404–5
taxonomic causal reasoning, 721–22
Taylor, A. H., 709
temporal causal structures, 95–97
temporal contiguity
  in associative learning, 22–23
  as causal inference, 130–31
  contingency and, 549–52
temporal contingency, 549–52
temporal cues, 130–31
temporal information, 552–54
temporal order
  causality and, 560–61
  causal structure and, 554–56
temporal regularity, 556–57
Tenenbaum, J. B., 71, 99, 108, 115–21, 123, 124, 392, 403, 406, 437, 510, 518, 524, 538, 687, 690
Tenney, E. R., 506
Tentori, K., 448
testimony, as indirect source, 140–41
Tetlock, P. E., 657
Thagard, P., 580
theory-laden confirming evidence, 43–44
theory-laden disconfirming evidence, 43–44
theory of mind, 722–23
Thirlaway, K., 611
Thompson, R. K. R., 709
Thompson, S. P., 393
Thompson, V. A., 46, 309, 321
Thornley, S., 85
Tiller, L. N., 704
time
  cause as guide to, 557–62
  as guide to cause, 549–57
Tolman, E. C., 296
Tomasello, M., 703–4, 709
Toulmin, S. E., 479
Trabasso, T. R., 659–60, 662
transitions, systems and, 300–301
Traxler, M. J., 628
Treagust, D. F., 463
"Treatise on Human Understanding, A" (David Hume), 13
treatment, of mental disorders, 612–13
Tremoulet, P. D., 255
triggering, as causal impression, 247
Trope, Y., 651
Tsividis, P., 406
Tversky, A., 105, 135, 136, 285, 314, 509, 586, 662
U
Uhlmann, E. L., 593
Ullman, T. D., 117
unitary causal theories, 5
Urushihara, K., 55, 708
Uttich, K., 655
Uustalu, O., 581
V
Vadillo, M. A., 38, 41
valid inferences, 195–96
Vallée-Tourangeau, F., 37, 38
Valle-Inclán, F., 268
Vallortigara, G., 256, 701
van Berkum, J. J. A., 631
van de Koot, H., 622
Vandorpe, S., 56, 80
van Fraassen, B. C., 422
van Lambalgen, M., 638
Van Overwalle, F., 648
van Schijndel, T., 691, 692
variables, causal strength between, 98–99
variance, of domain boundaries, 723–27
Vartanian, O., 37, 40, 42, 43
Vasilyeva, N., 425–26, 428
Verbrugge, S., 319
Verschoor, S. A., 271
Visalberghi, E., 701
visual causal impressions, 245–63
  actions on objects hypothesis, 256–59
  causality and laws of physics, 260–62
  defining, 245–46
(p. 751)
  explanations for, 252–56
  forces applied to skin and, 259–60
  stimulus conditions affecting, 250–52
  types of, 246–50
vitalistic functions, in biological domain, 720–22
Vogel, T., 194
Völter, C. J., 705, 710
von Sydow, M., 131
Vul, E., 524, 525
W
Wagenmakers, E.-J., 510
Wakefield, A. J., 29
Waldmann, M. R., 20, 97, 103, 118, 120, 121, 123, 131, 162, 400–402, 405, 437, 451, 552–53, 584–85, 590–91, 667, 704, 705
Walker, C. M., 365, 419, 420, 421
Wallez, C., 662
Walsh, C. R., 164, 209, 527
Walther, E., 191–92
Walton, D. N., 478–83
Wason, P. C., 448
Wegner, D. M., 273, 723
Weidema, M., 271
weight
  as causal predictor, 705
  early understanding of, 719
Weiner, B., 507
Wellman, H. M., 421
whether-cause, 530–31
White, P. A., 38, 44, 162, 248–50, 253, 258, 259, 261, 680, 683, 684, 729
White, R. W., 268
WHO. See World Health Organization
Widlok, T., 723
Wiegmann, A., 590–91
Wigmore, J. H., 579
Wilkenfeld, D. A., 421
Williams, J. J., 418, 419
Wilson, N. E., 23–24
Wing, A. M., 256
Winterbottom, A., 506
wishful thinking, 72–73
Wisniewski, E. J., 358–59
Witteman, C. L., 613
Wolff, P., 134, 150–55, 158, 162, 163, 213, 259, 260, 535, 623, 624, 640
Wolff's force theory, 150–55
  ALLOW relations in, 154
  causal chains in, 151–53
  causation by omission in, 153–54
  individual relations in, 150–51
Wong, J., 705
Woodward, A., 680–81
Woodward, J., 72, 534
Woodworth, R. S., 268
working memory, 340–41
World Health Organization (WHO), 603
world knowledge, in explanation of story events, 664–65
Wright, E., 273
Y
Yanowitz, K. L., 463
Yela, M., 250, 255, 257
Yeung, S., 109, 122–23
Yopchick, J. E., 368
Yu, L., 163
Yuill, N., 647
Yuille, A. L., 45, 124, 406
Z
Zwarts, J., 157