Recipes for Science: An Introduction to Scientific Methods and Reasoning [1 ed.] 1138920738, 9781138920736


English Pages 348 [349] Year 2018


Copyright © 2018. Taylor & Francis Group. All rights reserved. Potochnik, Angela, et al. Recipes for Science : An Introduction to Scientific Methods and Reasoning, Taylor & Francis Group, 2018. ProQuest Ebook Central, http://ebookcentral.proquest.com/lib/purdue/detail.action?docID=5584122. Created from purdue on 2021-08-26 19:28:30.

Recipes for Science

Today, scientific literacy is an essential aspect of any undergraduate education. Recipes for Science responds to this need by providing an accessible introduction to the nature of science and scientific methods, reasoning, and concepts that is appropriate for any beginning college student. It is designed to be adaptable to a wide variety of different kinds of courses, such as introductions to scientific reasoning or critical thinking, philosophy of science, and science education. In any of these different uses, the book helps students better navigate our scientific, 21st-century world.

KEY FEATURES

• Contemporary and historical examples of science from many fields of physical, life, and social sciences.
• Visual aids to clarify and illustrate ideas.
• Text boxes to explore related topics.
• Plenty of exercises to ensure full student engagement and mastery of the information.
• Annotated ‘Further Reading’ sections at the end of each chapter.
• Final glossary with helpful definitions of key terms.
• A companion website with author-developed and crowdsourced materials, including syllabi for courses using this textbook, bibliography of additional resources and online materials, sharable PowerPoint presentations and lecture notes, and additional exercises and extended projects.

Angela Potochnik is Associate Professor of Philosophy and Director of the Center for Public Engagement with Science at the University of Cincinnati, USA. Matteo Colombo is Assistant Professor in the Tilburg Center for Logic, Ethics, and Philosophy of Science, and in the Department of Philosophy at Tilburg University, the Netherlands. Cory Wright is Professor of Philosophy and Director of Graduate Studies at California State University Long Beach, USA.


Recipes for Science
An Introduction to Scientific Methods and Reasoning


Angela Potochnik
Matteo Colombo
Cory Wright


First published 2019
by Routledge
711 Third Avenue, New York, NY 10017

and by Routledge
2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

Routledge is an imprint of the Taylor & Francis Group, an informa business

© 2019 Taylor & Francis

The right of Angela Potochnik, Matteo Colombo, and Cory Wright to be identified as authors of this work has been asserted by them in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data
A catalog record for this book has been requested

ISBN: 978-1-138-92072-9 (hbk)
ISBN: 978-1-138-92073-6 (pbk)
ISBN: 978-1-315-68687-5 (ebk)

Typeset in Berling by Apex CoVantage, LLC


Visit the companion website: www.routledge.com/cw/potochnik


For all the excellent teachers from whom we’ve learned our love of science


Contents

List of Figures and Tables  ix
Acknowledgments  xii

Introduction: Science and Your Everyday Life  1

1  What Is Science?  7
   1.1  The Importance of Science  7
   1.2  Defining Science  15
   1.3  Recipes for Science  31

2  Experiments and Studies  46
   2.1  Experiment: Connecting Hypotheses to Observations  46
   2.2  The Perfectly Controlled Experiment  62
   2.3  Experimental and Non-Experimental Methods  72

3  Models and Modeling  89
   3.1  Models in Science  89
   3.2  Varieties of Models  102
   3.3  Learning From Models  115

4  Patterns of Inference  125
   4.1  Deductive Reasoning  125
   4.2  Deductive Reasoning in Hypothesis-Testing  141
   4.3  Inductive and Abductive Reasoning  150

5  Statistics and Probability  167
   5.1  The Roles of Statistics and Probability  167
   5.2  Basic Probability Theory  172
   5.3  Descriptive Statistics  182

6  Statistical Inference  207
   6.1  Generalizing From Descriptive Statistics  207
   6.2  Using Statistics to Test Hypotheses  221
   6.3  A Different Approach to Statistical Inference  232

7  Causal Reasoning  242
   7.1  What Is Causation?  242
   7.2  Testing Causal Hypotheses  255
   7.3  Causal Modeling  262

8  Explaining, Theorizing, and Values  275
   8.1  Understanding the World  275
   8.2  Theorizing and Theory Change  288
   8.3  Science, Society, and Values  297

Glossary  310
References  322
Index  327

Figures and Tables

FIGURES

1.1  Notable early scientists studying carbon dioxide (CO2) and climate  9
1.2  Keeling curve: ongoing increase in atmospheric concentrations of CO2  10
1.3  Ice core data from Antarctica  10
1.4  Unprecedented increases in atmospheric CO2 in the past century  11
1.5  Scientists in the Persian Golden Age  18
1.6  Appearance of retrograde motion  19
1.7  (a) Schematic flowchart of simple falsificationism; (b) Karl Popper  26
1.8  Clever Hans and Wilhelm von Osten  34
1.9  Reorientation from geocentrism to heliocentrism  43
2.1  Illustrations of two crosses between pea plants  47
2.2  Western Electric’s Hawthorne factory illumination study  51
2.3  Isaac Newton’s illustration of his two-prism experiment  52
2.4  William Herschel’s experimental setup to test the relationship between the color and temperature of light  54
2.5  Three scientists who contributed to our knowledge of light  55
2.6  Headlines reporting on Arthur Eddington’s observations during the 1919 eclipse, which confirmed Albert Einstein’s theory of general relativity  65
2.7  Mars Curiosity rover selfie taken on Mount Sharp (Aeolis Mons) on Mars in 2015  74
2.8  Cholera epidemic, close-up of Snow’s Broad Street map  79
2.9  Phineas Gage posing with the rod that passed through his skull  81
2.10  Isaac Newton’s cannon thought experiment  86
3.1  View of the San Francisco Bay Model  90
3.2  The Reber Plan  92
3.3  (a) Drosophila melanogaster; (b) The four chromosomes of Drosophila  96
3.4  Visual representation of the Lotka-Volterra model  99
3.5  The problem of curve-fitting  105
3.6  James Watson and Francis Crick’s double helix model of DNA  107
3.7  William Phillips’s MONIAC hydro-economic model  109
3.8  Visual depiction of the sodium-potassium pump  110
3.9  Accuracy versus precision  121
4.1  Edwin Hubble at Mt. Wilson Observatory  126
4.2  Frieze at the Social Hygiene Museum in Budapest, honoring Ignaz Semmelweis  145
4.3  (a) Flint Michigan water crisis; (b) Lee Anne Walters, the Flint citizen-scientist who initially requested water-testing  151
4.4  The black swan of the family (Black Australian swan surrounded by Bewick’s swans)  154
4.5  (a) The Earth’s landmasses fit together a bit like puzzle pieces; (b) Marie Tharp and Bruce Heezen  157
4.6  The pan-African dawn of Homo sapiens  162
5.1  Visualization of the conditional probability of rolling a number less than four given that you roll an odd number  179
5.2  (a) Pie chart of a coffeeshop’s sales; (b) Bar chart of per capita national beer consumption  185
5.3  (a) Histogram of a unimodal grade distribution; (b) Histogram of a bimodal grade distribution  186
5.4  Examples of (a) uniform, (b) ∪-symmetric, and (c) ∩-symmetric distributions; (d) Examples of asymmetric distributions  188
5.5  (a) Histogram of the Quiz 1 grade distribution in Table 5.2; (b) Histogram of the Quiz 2 grade distribution in Table 5.3  193
5.6  Standard deviation in a normal distribution  195
5.7  An imagined scatterplot of the relationship between alcohol consumption and decibel level in bars  196
5.8  A regression analysis of Galton’s data on the diameter of sweet pea seeds  198
5.9  Scatterplots depicting correlational strength and direction  199
5.10  Francis Galton  200
5.11  Visualizations for Exercise 5.17: (a) Average expenditure per dollar of Indiana property tax, 2013; (b) Composite score GRE and academic major; (c) Iris petal length; (d) Number of digs performed and amphorae found  203
6.1  (a) Probability distribution of heads for 100 coin tosses; (b) Example of normal distribution for a continuous variable  211
6.2  Diagram of the 68%–95%–99.7% rule for standard deviations  214
6.3  Four histograms of roughly normal distributions  219
6.4  Fabiola Gianotti, project leader and spokesperson for the ATLAS experiment at CERN involved in the discovery of the Higgs boson in July 2012  222
6.5  R. A. Fisher  225
6.6  Probability distribution of the number of guesses your friend will get correct if she is randomly guessing  227
6.7  Thomas Bayes  235
7.1  Annual seismic activity in Oklahoma 1978–2017  243
7.2  USGS map showing locations of wells related to seismic activity 2014–2015  243
7.3  Visualization of the correlation between per capita consumption of cheese and number of people who died from getting tangled in their bedsheets  248
7.4  Generic causal graph with nodes representing variables of interest and arrows representing direct causal relationships  264
7.5  Causal graph of the relationships between posting copyrighted material on your Facebook page, a friend reporting you, and your Facebook page being shut down  266
7.6  Causal graph for the dyspnoea case  269
8.1  Oklahoma Senator James Inhofe speaking before the US Congress in 2015 while brandishing a snowball  278
8.2  Ridership data for Yellow Taxis and Uber in New York City 2015–2017  280
8.3  Occurrence of the word law in PsychLit abstracts per 10,000 entries  284
8.4  Partial sketch of a bicycle  287
8.5  Scientists of the chemical revolution  293
8.6  (a) Rosalind Franklin; (b) Franklin’s x-ray diffraction image that famously inspired Watson and Crick’s double-helix model of DNA  299

TABLES

1.1  Checklist for evaluating whether an idea or project qualifies as scientific  27
1.2  Individual and social norms that protect against bias and flaws in reasoning  38
2.1  Elements of the perfectly controlled experiment  71
3.1  Payoff matrix for the prisoner’s dilemma with Dominik  112
3.2  Payoff matrix for a generic prisoner’s dilemma  112
4.1  Conditional statements  130
4.2  Valid inference patterns, invalid inference patterns, and informal fallacies  143
4.3  Annual births, deaths, and mortality rates for all patients at the two clinics of the Vienna Maternity Hospital 1841–1846  144
5.1  Addition, multiplication, and subtraction rules and their conditions  176
5.2  Imagined data set and central tendencies for 17 student scores on 10-point Quiz 1  190
5.3  Imagined data set and central tendencies for 17 student scores on 10-point Quiz 2  192
5.4  Average diameter of parent/offspring sweet pea seeds  197
5.5  Data on Titanic survivors  205
6.1  (a) Frequency distribution of a bag of 35 M&Ms; (b) Relative frequency distribution  209
6.2  Imagined questionnaire scores of 100 university students  213
6.3  Summary of statistical hypothesis-testing and its relationship to general hypothesis-testing  224
7.1  Mill’s methods  259
7.2  Conditional probabilities for the causal graph in Figure 7.5  267
7.3  Possible values for variables in the dyspnoea case  268
7.4  Conditional probabilities of developing lung cancer given level of pollution exposure and whether or not a person smokes  269
8.1  Thomas Kuhn’s four-stage view of scientific change  292
8.2  Five questions that arise when doing science that our values help us answer  302


Acknowledgments

Many people have contributed to this book in a variety of ways. Thanks to Gila Sher, who made possible Cory’s initial conversations with Senior Editor Andrew Beck at Routledge. Without Gila’s encouragement, there never would have been a book proposal. Andy’s initial vision for the book was crucial for framing the project, and his later editorial guidance and support were matched only by his enduring patience and flexibility. Thanks also to Routledge Development Editor Alison Daltroy and Editorial Assistants Vera Lochtefeld and Emma Starr, along with the dozens of anonymous reviewers of both the original proposal and the later completed manuscript draft. Their feedback left an indelible imprint on what went into the book, as well as on the final product that resulted.

Several students provided helpful research assistance. Nathan Sollenberger, Alejandro Garcia, and Karina Laigo from the undergraduate research program at Cal State Long Beach helped kick off the book proposal, and Christopher Laplante provided very helpful editorial assistance in the end stages of production. Micah Freeman and Sahar Heydari Fard at the University of Cincinnati provided valuable comments on the whole manuscript and assistance with glossary compilation. Several colleagues provided extremely helpful feedback on parts of the manuscript, including Zvi Biener, Vanessa Carbonell, Jan Sprenger, Naftali Weinberger, and Nellie Wieland.

Angela owes a further debt of gratitude to Zvi Biener for working with her to design the University of Cincinnati course How Science Works, which inspired her contributions to the book. More generally, she deeply appreciates her colleagues and friends at the University of Cincinnati, inside and outside of philosophy. She also thanks her family for their patience during the periods when she was a bit lost to this project.

Cory is grateful to his family for their patience, and is looking forward to making up for lost time. He would also like to thank Henk, whose unfailing devotion to this project and daily emotional support and encouragement were as great as any hound’s could be.

Matteo would like to thank his colleagues and friends at the Tilburg Center for Logic, Ethics, and Philosophy of Science (TiLPS), his family, and Chiara, for their encouragement, inspiration, and care. During this project, he was generously supported by the Deutsche Forschungsgemeinschaft (DFG) as part of the priority program New Frameworks of Rationality [SPP1516], and by the Alexander von Humboldt Foundation. He would also like to acknowledge Zio P.’s apt reminders of the quote constanter et non trepide.


Introduction: Science and Your Everyday Life


POLIO, HPV, AND OTHER ILLNESSES

What do the American president Franklin D. Roosevelt, the Mexican painter Frida Kahlo, and the Jamaican reggae trio Israel Vibration have in common? Many people today can’t guess the correct answer: they all suffered from polio (or poliomyelitis), which can cause paralysis and even death. The answer can be hard to guess because scientists and doctors have successfully turned polio from a global health problem into mostly just a part of history.

Many other people throughout human history have suffered from this crippling infectious disease—most of them young children. In 1952 alone, a polio epidemic struck nearly 60,000 Americans. In 1955, a team of scientists led by the virologist Jonas Salk developed a vaccine for polio. Thanks to the introduction of mass vaccination programs immediately thereafter, polio cases have decreased worldwide by over 99%. Today, there are only three countries where polio still exists: Pakistan, Afghanistan, and Nigeria. As of 2016, there were only 37 known cases remaining.

The eradication of polio counts among the most important human—and scientific—achievements. Vaccination provides you with immunity, which protects you for life. Going unvaccinated, in contrast, is a serious risk, since polio is highly infectious and human migration is rapid. Outbreaks are still possible. It’s a no-brainer that people should demand that they and their children be vaccinated.

And yet, many people today are not vaccinated for polio. In wealthier countries, like the US, UK, Italy, Australia, France, and Russia, the biggest challenges to vaccination come from skeptics opposed to vaccination for ideological reasons and from mere complacency. In other countries, like Nigeria, Pakistan, and Afghanistan, political and religious challenges intertwine with issues of marginalization and feasibility: it is harder to deliver vaccinations to at-risk communities, which might suffer from extreme poverty and lack needed infrastructure. In any nation, communication of the effectiveness, safety, and public health value of vaccination benefits from a sound understanding of the science of vaccines.

Unlike polio, HPV (human papillomavirus) is extremely widespread, with roughly 40% of the world’s population infected; it’s the most common sexually transmitted disease in the world. Among other effects, HPV substantially increases the risk of various types of cancer. There’s also a vaccine for HPV. It was first available in 2006, after thorough testing for safety and efficacy. The World Health Organization (WHO) recommends HPV vaccines as part of routine vaccinations in all countries.

This discussion of vaccination is meant to illustrate that understanding what’s involved in good science and scientific reasoning is of extreme importance. At some point in their


lives, many people need to make a decision about whether to be vaccinated for some disease or whether to have their children vaccinated. And sometimes vaccine skeptics have louder voices than doctors and other vaccination advocates, so it can seem difficult to get a clear account of vaccines’ safety, effectiveness, and necessity for public health.

The polio vaccine has undergone thorough testing for safety and efficacy, initially in a study involving 1.2 million children and in many other studies since. The same is true for other vaccinations, including the HPV vaccine. And claims from vaccine skeptics about substantive risks of vaccination have been thoroughly debunked.

But don’t take it from us. Learn about scientific experiments so you can assess the quality of vaccination studies. Learn sound and problematic forms of inference in order to assess the scientific inferences supporting the use of vaccines (and the problems with vaccine skeptics’ attempts to sow fear). Study causal reasoning so that you can critically assess the weight of the evidence against the claim that vaccination causes autism (a claim that has, in fact, been decisively refuted). These topics and others important for the critical assessment of scientific findings and their public reception are the focus of this book.


WHY LEARN ABOUT SCIENCE?

As the case of vaccination suggests, scientific findings, and the public’s reactions to them, dramatically shape our world. More than this, science also regularly and dramatically influences your life, whether or not you want it to. If this is not immediately apparent, that may be because of the extensiveness of science’s reach. One way or another, everybody is impacted by science.

The reach of science means that you have a lot to gain from being able to understand and assess scientific reasoning. This ability enables you to make educated decisions about your own and your family’s medical care. It also makes it possible for you to critically evaluate reports of scientific findings and the credentials of experts in order to decide what to believe. This ability is important, since so much of our daily life is impacted by scientific findings.

Here’s another example of unavoidable science related to health. Peanut allergies are serious and develop early in life, and rates of this allergy are on the rise. In 2015, medical recommendations in the US regarding when to introduce peanut products to babies changed radically, from waiting until at least one year of age to introducing them as early as possible. Both waiting and introducing early were said to reduce the risk of allergic reaction. Should you follow this new advice for your baby, if and when you have one? If the medical researchers were (apparently) wrong about the last recommendation, why should you follow the new one?

A sophisticated user of science is also well positioned to make judgments about science more globally. Is it good for the government to fund basic scientific research? Is the level of funding for medical research adequate? Should we worry if private corporations fund science, given that governmental and university funding is in short supply? Answers to these questions require a view about the status of the scientific enterprise as a whole, how it should relate to society, and whether and how funding sources matter.

Scientists are, of course, the main practitioners of science. Other researchers have as their primary focus understanding what science—and scientists—are up to. This latter group is interested in understanding what science is and how it works, its pitfalls and


limitations, and its relationship with society. These topics are what this book is all about.

Several disciplines investigate science in this way; primary among these are history, philosophy, and sociology. Historians have worked hard to make sense of the history of science—how the events unfolded that contributed to making science what it is today. Sociologists also study science, especially the social and cultural influences on how science works and what it produces. This book draws from the history and sociology of science, but its main approach is philosophical. There’s a simple reason for that: we, its authors, are philosophers of science.

If you haven’t studied philosophy of science, it may sound obscure. But philosophy of science is just a way of thinking hard about the scientific enterprise. It focuses especially on questions of what science should be like in order to be a trustworthy route to knowledge and to achieve the other ends we want it to have, such as usefulness to society. Although written from a philosophical perspective, this book does not dwell on philosophers’ debates about science. Instead, we aim to use philosophical insights about science without getting bogged down in controversies, technical terminology, or intricate details.


RECIPES FOR SCIENCE

The title of this book, Recipes for Science, is meant to evoke two ideas about science. First, recipes for baked and cooked items like bread, pies, and stir-fry come in lots of different versions. Some differences are rather trivial, like whether measurements are in weight or volume. Others are substantial, like whether a bread is leavened with yeast or with baking soda and powder. Enough substantial differences can result in very different products, even products that go by the same name but contain entirely different ingredients. Science is also like this. It proceeds in many different ways, and there’s no magical ingredient or essential list of ingredients that guarantees good science.

At the same time, a recipe is a formula intended to lead to a specific outcome, with an intentional combination of ingredients and use of methods to achieve that outcome. Different recipes for a given type of food have certain elements in common, even if many of their other features vary. So, for example, breads generally incorporate grain of some kind as a major ingredient, most have a leavening agent of some kind, and they are cooked, usually but not always by baking in the oven. There are family resemblances among different breads and the recipes used to make them, even if there’s no simple definition of bread and no one recipe required to make bread.

Science is like this as well. Even as it proceeds in different ways, and even as there’s no one overarching set of instructions or mechanical procedures that guarantees good science, there are certain generalizations that can be made about how good science is conducted. Many different activities count as science, and there are also differences in how each of these activities is carried out. But there are also family resemblances among instances of science, just as there are among breads.

This book aims to facilitate a clear understanding of the key elements of science and why those elements are significant, even as it illustrates the tremendous variety of projects that count as science. The first three chapters address the nature of science and its key methods. Chapter 1 surveys what is distinctive and important about science while also showing how elusive


the very concept of science can be. We suggest a checklist approach to distinguishing science from non-science and fake science and suggest—in lieu of a single, one-size-fits-all method—that there are various recipes for science. Chapter 2 outlines the role of experimentation in science and the features of a perfectly controlled experiment. Then the chapter catalogues a range of methods for experimental and non-experimental studies and discusses the advantages and disadvantages of each. Chapter 3 focuses on scientific models: how they are constructed and used, and the main varieties in which they come. The chapter ends by discussing the relationship between modeling and experimentation and asking what features of models contribute to their scientific value.

The next four chapters focus on scientific reasoning. Chapter 4 describes the primary patterns of inference in science: deductive, inductive, and abductive reasoning. The chapter starts with patterns of deductive inference and their use in scientific hypothesis-testing, moves to the importance of and challenges with inductive inference, and then turns to the scientific significance of abductive reasoning, also known as inference to the best explanation. Chapter 5 surveys basic statistical methods, beginning with their basis in probability theory and proceeding through descriptive statistics. Chapter 6 expands on that discussion to outline inferential statistics, including sampling and hypothesis-testing. The chapter ends by introducing the Bayesian approach to statistics and discussing some of its differences from the classical approach. Chapter 7 engages with causal reasoning in science. Topics include the nature of causation, the relationship between causal reasoning and statistical reasoning, testing causal hypotheses, and causal modeling.

Finally, Chapter 8 examines the purpose of science and its relationship to society. We address the nature of scientific explanation and scientific theories, how theory change and progress in science occur, and how society and values influence science. The book closes with a consideration of the current challenges facing science.

Copyright © 2018. Taylor & Francis Group. All rights reserved.

INTENDED AUDIENCES AND HOW TO USE THE BOOK

The intended audience for this book includes anyone who wants to have a more sophisticated understanding of the nature of science and a stronger basis for assessing scientific reasoning. This book is not just for students of philosophy or science majors. Indeed, the primary audience we had in mind as we developed this book is an undergraduate student in a general education course, who may not take any additional science courses in college. We asked ourselves, what would that student most benefit from knowing about how science works? What episodes from historical and current science would that student be interested to read about and contemplate?

That said, we expect this book will also be useful for some more specialized or more advanced courses. These include science education courses, especially those that focus on the nature of science and scientific reasoning. These also include introductory philosophy of science courses, especially if supplemented with more canonical readings or readings that address some of the major philosophical controversies about science. We also expect this book to be of use in introductory science courses, especially methods courses, when supplemented with appropriate material specific to the particular scientific field of study.

Potochnik, Angela, et al. Recipes for Science : An Introduction to Scientific Methods and Reasoning, Taylor & Francis Group, 2018. ProQuest Ebook Central, http://ebookcentral.proquest.com/lib/purdue/detail.action?docID=5584122. Created from purdue on 2021-08-26 19:30:40.


This textbook was designed to be usable in its entirety in a standard 15-week semester. Students spend one-third of the semester learning about the nature of science, including the key features of science, experimentation, and scientific modeling (Chapters 1–3). Most of the remaining semester is then spent learning about scientific reasoning, including deductive, inductive, and abductive reasoning patterns; probability and statistics; and causal reasoning (Chapters 4–7). The final unit of the course addresses the scientific successes of explanations and theories and science's relationship with society (Chapter 8).

Given the range of course levels and disciplines for which this book is appropriate, and given the reality that different instructors have different teaching goals, we have also designed the textbook to be usable in a variety of ways. The textbook is modular; each chapter can be used independently from the others. Instructors (or independent readers) can thus choose to use only the chapters that suit their needs. Each section may rely on information provided in earlier sections of the same chapter but does not presume facility with information from other chapters. Instructors may choose not to assign later sections in some chapters that seem overly specialized or too difficult given the focus of their course. Finally, some material that is more difficult or philosophical is separated from the main text in boxes. Here, too, instructors can choose whether to assign material in boxes.

Here are a few examples of how this might play out in different courses. A critical reasoning course focused on science may limit its attention to Chapters 4–7—deductive, inductive, and abductive inference patterns, probability and statistics, and causal reasoning. A science education course on the nature of science may use Chapters 1–3 and 8, addressing the key features of science, experimentation, modeling, and theories and explanations.
An introductory philosophy of science course might make use of Chapters 1–4 and 8, supplemented with primary philosophical texts. Other introductory courses might use the full book except for some of the more difficult sections, like 6.3 on Bayesian statistics and 7.3 on causal modeling.


SUPPLEMENTARY MATERIALS

Each section in this book ends with a list of exercises. We have tried to provide exercises that will solidify understanding or challenge students to apply what they have learned. We encourage instructors to make use of these exercises for in-class group or individual activities, homework, and exam questions. Individuals who are working through this book independently might also benefit from completing some of the exercises.

There is a list of suggested further reading at the end of each chapter, which provides inroads into a more in-depth investigation of the individual topics covered. The further reading selections thus provide some options for instructors and individual readers who want to focus on specific topics in more depth. At the end of the book, there is a glossary of technical terms and other specialized vocabulary that students can consult as needed. Terms defined in the glossary are indicated in the main text with bold and italics, as 'philosophy of science' was earlier in this introduction.

Finally, there is a website to accompany this textbook: www.routledge.com/cw/potochnik. The website includes example syllabi for different kinds of courses utilizing this text, additional exercises, and links to content available on the internet that will enrich readers' experience with the topics covered in this book. Because introductory scientific reasoning courses are not yet offered at many universities, and are housed in even fewer philosophy departments, the website also provides information and links about the value of such courses and why philosophy departments are good places to house them.

EXERCISES

0.1 What do you expect to learn from this textbook and the course you're reading it for?
0.2 What most concerns you about this textbook and the course you're reading it for?
0.3 Do you think you will benefit from learning more about the nature of science and scientific reasoning? Why or why not?
0.4 What do you think is most valuable about learning about science and scientific reasoning?
0.5 Describe your relationship to science. To help you get started, you might consider the following questions. Have you taken many courses in science or read about science on your own? If so, on what topics? Do you know any scientists? Do you think there are reasons to distrust or dislike science? If so, what are the reasons?


FURTHER READING

For more on HPV and vaccines, see World Health Organization (2017, May). Human papillomavirus vaccines: WHO position paper. Weekly Epidemiological Record, 92, 241–268. Retrieved from http://apps.who.int/iris/bitstream/10665/255353/1/WER9219.pdf?ua=1

For a concise explanation of myths surrounding vaccines, see PublicHealth.org (2018, May). Understanding vaccines: Vaccine myths debunked. Retrieved from https://www.publichealth.org/public-awareness/understanding-vaccines/vaccine-myths-debunked/

For a concise overview of global health and vaccination, see Greenwood, B. (2014). The contribution of vaccination to global health: past, present and future. Philosophical Transactions of the Royal Society B, 369(1645), 20130433.

For a thorough treatment of immunization and vaccination, see World Health Organization, Research and development. Retrieved from www.who.int/immunization/documents/research/en/


CHAPTER 1

What Is Science?

1.1  THE IMPORTANCE OF SCIENCE

After reading this section, you should be able to do the following:

• Describe how scientific research supports the finding of human-caused climate change and why public opinion lags behind the scientific research
• Discuss the nature of knowledge and the varieties of scientific knowledge
• Articulate the limits of science and describe a type of project outside those limits

A Serious Practical Concern

April 22, 2016, was a historic day. While people worldwide celebrated Earth Day and expressed their support for protection of the environment, representatives of 177 nations signed the Paris Agreement at the UN headquarters in New York. The Paris Agreement followed up on the Kyoto Protocol to unite most countries on our planet to deal with climate change. Indeed, there was near-total international unity and consensus; only two countries did not sign: Nicaragua, because the agreement did not go far enough, and Syria, because of its civil war and subsequent governmental collapse.

The Paris Agreement aims to keep the average global temperature rise this century to well below 2° Celsius. Two degrees might seem like a minor change in temperature, but as an average global temperature increase, it would be a really big deal. Think of this temperature increase like a fever. The human body maintains a relatively constant temperature in the range of 36.5–37.5° Celsius (97.7–99.5° Fahrenheit). Were your body temperature to increase just 2° Celsius, you would have a raging fever, as your temperature would be over 38.8° Celsius (102° Fahrenheit). If your body were suddenly that much warmer on average, it would be a serious and potentially devastating medical emergency—and all the more so without medical treatment options. An average global temperature increase of 2° Celsius would be similarly devastating for Earth, and there would be few if any treatment options.

But why, exactly, would it be so devastating? First, because it changes the Earth's climate. The atmospheric concentrations of greenhouse gases, such as methane (CH4), carbon dioxide (CO2), and water vapor, are a major factor affecting the Earth's climate. Greenhouse gases work like a blanket. As incoming radiation from the Sun permeates our atmosphere, some of this heat hits the Earth and is reflected back out to space. But greenhouse gases trap some of the heat in the atmosphere; this blanket of radiant heat warms the planet's surface, making it hospitable to life. But increasing amounts of greenhouse gases trap increasing amounts of heat. As a result, mountain glaciers are shrinking and ice sheets are melting in the Arctic, Greenland, and Antarctica; sea levels are rising; precipitation patterns across seasons are more unstable; more droughts and heat waves are occurring; and the blooming times of flowers and plants are shifting. All these changes are consequences of global warming.

Second, the changing climate has other downstream effects. The rise in global temperature and resulting climate changes threaten to push some animal and plant species to extinction, collapse ecosystems, and make extreme weather more frequent. It also threatens to destabilize social conditions. Drinking water will become scarcer and droughts more frequent and severe; crop yields may decrease. Coastal cities and island nations are at risk of serious floods and devastating hurricanes. In this way, climate change is also affecting global health, poverty, hunger, and various nations' security. Ultimately, global warming will make the Earth less hospitable for all creatures, including humans, and probably also a more unjust place in virtue of who will suffer and how this suffering will be managed.

Earth's climate has never been static; it has been fluctuating for billions of years. Besides the concentration of greenhouse gases, factors that affect it include variations in the Earth's orbit, the motion of tectonic plates, the impact of meteorites, and volcanism on the Earth's surface. So, what's special about the current climate changes? Why is this different? What's special about the current changes in Earth's climate is the role of human activities in generating them. The basic reasoning behind this conclusion is simple and clear.
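The "blanket" picture just described can be made quantitative with a standard zero-dimensional energy-balance idealization. The sketch below is our illustration, not a model from the text; the solar constant, albedo value, and single perfectly absorbing atmospheric layer are textbook simplifications.

```python
# A zero-dimensional energy-balance sketch of the greenhouse "blanket"
# (an illustrative textbook idealization, not the authors' model).

SIGMA = 5.67e-8   # Stefan-Boltzmann constant (W m^-2 K^-4)
S0 = 1361.0       # solar constant: sunlight reaching Earth (W m^-2)
ALBEDO = 0.3      # fraction of incoming sunlight reflected back to space

# Sunlight absorbed, averaged over the whole sphere (hence the factor of 4)
absorbed = S0 * (1 - ALBEDO) / 4

# Surface temperature with no heat-trapping atmosphere at all
t_bare = (absorbed / SIGMA) ** 0.25
print(f"No greenhouse blanket: {t_bare:.0f} K")       # 255 K, about -18 C

# One idealized layer that absorbs all outgoing infrared re-radiates half
# of it back downward, warming the surface by a factor of 2**(1/4)
t_blanket = t_bare * 2 ** 0.25
print(f"With an absorbing layer: {t_blanket:.0f} K")  # 303 K, about +30 C
```

Earth's observed mean surface temperature (roughly 288 K) falls between these two extremes, because the real atmosphere absorbs only part of the outgoing infrared; the point of the idealization is simply that more absorption means a warmer surface.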
We have known since the 18th century that burning carbon-based fossil fuels releases carbon dioxide (CO2) into the atmosphere. During the last three centuries, at least since the beginning of the Industrial Revolution, human activities have been releasing CO2 into the atmosphere at an unprecedented rate. Large-scale releases of CO2—one of the greenhouse gases—into the atmosphere increase its heat retention, thus increasing the Earth's average global temperature. And scientists have in fact measured such an increase in average global temperature. So it's clear that human activity during the last couple of centuries has increased the Earth's average global temperature.

Systematic research on the relationship between CO2 emissions and climate change began in the 19th century, when the American engineer Marsden Manson noted that 'the rate at which a planet acquires heat from exterior sources is dependent upon the power of its atmosphere to trap heat; very slight variations in the atmospheric constituents [produce] great variations in heat trapping power' (Manson, 1893, p. 44). A few years later, the Swedish physicist and chemist Svante August Arrhenius (1859–1927) completed an extensive set of calculations, showing that changes in CO2 function as a 'throttle' on other greenhouse gases like water vapor. He also calculated that there would be an Arctic temperature increase of approximately 8° Celsius (14.4° Fahrenheit) from atmospheric carbon levels two to three times their known value at the time. Arrhenius later predicted that 'the slight percentage of carbonic acid in the atmosphere may, by the advances of industry, be changed to a noticeable degree in the course of a few centuries' (1908, p. 54). Just before the outbreak of World War II, a British steam engineer, Guy Callendar, presented a breakthrough paper to the Royal Meteorological Society entitled 'The Artificial Production of Carbon Dioxide and Its Influence on Temperature'.
FIGURE 1.1  Notable early scientists studying carbon dioxide (CO2) and climate

Callendar pointed out that the atmospheric concentration of CO2 had significantly increased between 1900 and 1935, which he determined with temperature measurements from 200 meteorological stations. Based on further calculations, he concluded that:

As man is now changing the composition of the atmosphere at a rate which must be very exceptional on the geological time scale, it is natural to seek for the probable effects of such a change. From the best laboratory observations it appears that the principal result of increasing atmospheric carbon dioxide . . . would be a gradual increase in the mean temperature of the colder regions of the earth. (1939, p. 38)

Unfortunately, Callendar's prescient recognition of the role of human activity on atmospheric temperatures had to wait several decades to become widely accepted.

In May 1958, the American scientist Charles David Keeling (1928–2005) installed four infrared gas analyzers at the Mauna Loa Observatory in Hawaii; these recorded an ever-increasing atmospheric CO2 concentration. These measurements have been collected continuously since 1958, resulting in the so-called Keeling Curve (see Figure 1.2), which is a graph plotting ongoing change in the concentration of CO2 in the Earth's atmosphere. Keeling's measurements provided evidence of rapidly increasing CO2 levels in the atmosphere, and a 1979 report by the National Research Council—an American nonprofit, non-governmental organization devoted to scientific research—connected this evidence to a rise in average temperature. This report predicted that doubling the CO2 concentration in the atmosphere from 300 to 600 parts per million would result in an average warming of 2° Celsius to 3.5° Celsius. (Parts per million, or ppm, is a unit for measuring small amounts of a substance in some mixture.) We haven't yet reached the ominous level of 600 ppm, but we're now long past safe levels of CO2 in the atmosphere, which had been estimated to be about 350 ppm.
In the past several decades, climate scientists have been tracking CO2 levels in the atmosphere with ever more precise and sophisticated techniques. For example, ice cores taken from various locations in Antarctica have enabled scientists to extrapolate historic CO2 levels for comparison to recent levels (see Figure 1.3). A group of 78 scientists gathered data from 'climate proxies' besides ice cores—including tree rings, pollen, corals, glacier ice, lake and marine sediments, and historical documents about the climate—to demonstrate that there are multiple lines of evidence for increasing levels of CO2 in the atmosphere (see Figure 1.4) and that the average temperature for the end of the 20th century is higher than in the previous two millennia (Ahmed et al., 2013).

FIGURE 1.2  Keeling curve: ongoing increase in atmospheric concentrations of CO2 (CO2 concentration in ppm, recorded at the Mauna Loa Observatory since 1958, with the seasonal annual cycle inset; May 27th 2018 CO2 recording: 411.39 ppm)

FIGURE 1.3  Ice core data from Antarctica: Antarctic ice core measurements of CO2 concentration, 1700–1958, from sites including the Ronne Ice Shelf, Siple Station, Ross Ice Shelf, Dome A, Taylor Dome, Vostok, Dome C, and Law Dome

FIGURE 1.4  Unprecedented increases in atmospheric CO2 in the past century (CO2 level in ppm over the past 400,000 years; for millennia, atmospheric carbon dioxide had never been above the 1950 level until the current spike)

The unprecedented pace of current climate change and its connection to human activities like burning fossil fuels, cattle ranching, and clear-cutting rainforests are clear. In the previous 800,000 years, the concentration of CO2 in the atmosphere had never been over 285 ppm. Since the Industrial Revolution—only 0.025% of the last 800,000 years—the concentration has spiked to 412 ppm. The milestone of 400 ppm was reached in March 2015 (see www.co2.earth). CO2 concentration measured 409.39 ppm on May 30th 2017, the day before Donald Trump announced that he would withdraw the US from the Paris Agreement. One year later, in May 2018, the concentration had risen further, to 412 ppm. The last time CO2 levels were this high, humans did not yet exist. The average temperature of our planet has gone up by about 0.85° Celsius (1.5° Fahrenheit) since 1880, and the last three decades are estimated to have been the hottest in the last 1,400 years.
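The magnitudes in this passage are easy to check with a few lines of arithmetic. The sketch below reproduces the quoted figures; all values are taken from the text, and the roughly 200-year length of the industrial era is our rounding.

```python
# Checking the figures quoted in the text.

PRE_INDUSTRIAL_PEAK_PPM = 285   # highest CO2 level of the previous 800,000 years
MAY_2018_PPM = 412              # concentration as of May 2018

# Relative rise above the 800,000-year peak
rise = (MAY_2018_PPM - PRE_INDUSTRIAL_PEAK_PPM) / PRE_INDUSTRIAL_PEAK_PPM
print(f"Rise above the historic peak: {rise:.0%}")      # 45%

# The industrial era (roughly 200 years) as a share of the 800,000-year record
share = 200 / 800_000
print(f"Share of the record: {share:.3%}")              # 0.025%

# Temperature *differences* convert with the 9/5 scale factor only;
# the +32 offset applies to absolute temperatures, not to changes.
warming_c = 0.85
warming_f = warming_c * 9 / 5
print(f"{warming_c} deg C of warming = {warming_f:.1f} deg F")  # 1.5 deg F
```

The last conversion is worth noting: a warming of 0.85° Celsius corresponds to 1.5° Fahrenheit precisely because anomalies scale by 9/5 without the 32-degree offset used for absolute temperatures.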

The Role of Science

We have already articulated the reasoning leading to the conclusion that human activities are radically altering Earth's climate. But how do scientists really know? The short answer is that scientists know this in the same way that they have come to know anything else. Scientists know that the structure of DNA is a double helix, that Neptune takes about 165 years to orbit the Sun, that HIV is a retrovirus that attacks T-cells, and so on. These and other facts all have good science behind them. None were obvious to begin with; scientists had to reason their way to the correct answer. Understanding how scientists acquire new knowledge, the basis for science's authority as a source of knowledge, and the limits of that authority gives us greater reason to trust scientific knowledge. This is so whether the knowledge is about DNA, Neptune's orbit, HIV, or climate change.

First, it's important to consider the nature of expertise. You should trust climate scientists to do climate science in the same way you trust your mechanic with your car or your favorite restaurant with your dinner. The types of expertise required for these positions take years, even decades, to develop, and the expertise doesn't neatly transfer from one domain to another. Don't trust the average climate scientist to fix your car or make you a delicious meal. Similarly, politicians and policy-makers know things about political and legislative matters, but they should not be looked to as authorities on climate change. This includes politicians who deny climate change, as well as those who grant its existence.

Reputable scientists and scientific societies, including the national science academies of the world and the Intergovernmental Panel on Climate Change (IPCC), agree that human-caused, or anthropogenic, climate change is occurring. This includes virtually all climatologists. In 2004, for instance, the historian of science Naomi Oreskes analyzed 928 abstracts on climate change published in peer-reviewed scientific journals from 1993 to 2003; none expressed disagreement with the consensus position that anthropogenic climate change is occurring (Oreskes, 2004). In 2010, a group of researchers studied the views of the top 200 climate scientists (defined as the scientists with the most extensive publication records) and confirmed that more than 97% actively affirm the existence of anthropogenic climate change as described by the IPCC (Anderegg et al., 2010). So there is striking agreement among climate scientists about the existence of anthropogenic climate change.

Climatologists' agreement on climate change is grounded in a rich body of independent sources of evidence that support the same conclusion: human activities are causing Earth's atmosphere to heat up. Well-established theories in physics explain how heat radiation works. Physical chemistry shows how CO2 in the atmosphere traps heat, contributing to greenhouse effects.
As we pointed out, at least since the 1890s, scientists have known about the relationship between CO2 buildup and average global temperature. Satellites and other technology have enabled scientists to collect many different types of information about relevant changes on our planet—including variations in sea level and in ocean temperatures, and the decreasing mass of polar ice sheets. Since the 1950s, scientific models and computer simulations have been helping scientists to make testable predictions about what would happen to the global climate in response to different changes in human activities. Evidence has confirmed these predictions.

And yet, despite decisive scientific evidence, public awareness of and concern about climate change lag behind the research (Lee et al., 2015). As of 2016, four out of every 10 adults worldwide hadn't even heard of climate change. Whether people are sensitive to the risks of climate change mainly depends on whether they understand its human causes and on their level of education. In some countries, like the US, however, being better educated doesn't guarantee that one is more likely to believe that climate change is really happening and is caused by human activities. Instead, political views are a better predictor of Americans' belief in and concern about the reality of climate change.

People who don't know much about some topic also tend to experience an illusion of understanding, in which a lack of genuine understanding of a topic is linked to a lack of appreciation for the depth of one's ignorance about that topic. Applied to climate change, this means that people who have no advanced education or training in science, or who otherwise don't understand how the climate works, tend to have unwarranted confidence in their ability to assess scientific findings or make pronouncements about climate change.


The illusion of understanding has become easier to sustain in today’s society. In part, this is because finding information through internet searches (so-called Google knowing) has diminished genuine understanding. We also have limited opportunities for productive public discourse and disagreement; our conversations online and in person tend to happen with people who have beliefs similar to our own. Improving public climate literacy is thus important for informed public engagement with global warming. And, more generally, understanding the processes that give rise to trustworthy scientific knowledge is vitally important to deciding what to believe, whom to believe about what, and how to learn more.

What Science Is Good For

Let's back up. Why is science so important? The most obvious answer is science's role in satisfying our practical goals. Many fun and useful—even life-changing—innovations have come about through computer science. The biological and pharmaceutical sciences have vastly improved medical care and our ideas about healthy living. Skyscrapers and airplanes wouldn't be possible without a lot of physics. The list goes on and on.

But practical benefits aren't the only important outcome of science. More generally, science is the best approach we humans have developed for answering questions about the natural world. At its heart, science aims at the production of knowledge.

Philosophers have traditionally thought of knowledge as requiring at least three elements: belief, justification, and truth. First, belief is necessary for knowledge; you can't know something without believing it is true. But to know something, it's not enough to fervently believe it. Knowledge—including scientific knowledge—is an achievement; certain conditions must be met for a belief to count as knowledge. Knowledge requires justification. To know something, one must have good reasons to believe it is so. Finally, sufficiently justified belief isn't enough. One could be justified in believing something that still turns out to be false. In 2007, most American football fans had the justified belief, in some cases a fervently held belief, that the New England Patriots would cap off their perfect season by winning the Super Bowl. But those football fans didn't know this, because they were wrong. Justified beliefs must also be true to count as knowledge. Consider the knowledge that the Earth's atmosphere is warming up.
On the traditional conception of knowledge as justified true belief, you have this knowledge just if (i) it's true that the Earth's atmosphere is warming up; (ii) you believe that it is warming up; and (iii) you are sufficiently justified in believing that it is warming up.

Science is important because it is our best route to knowledge about the world around us. And scientific knowledge also often has practical benefits and can influence how we act. If you genuinely know that Earth is warming up and understand why that's the case, then you may change the ways you behave—for instance, by petitioning your government, your society, and your circle of family and friends to develop more energy-efficient practices.

Some scientific knowledge is so-called pure knowledge, or knowledge for its own sake. For example, scientists have investigated the conditions under which rainbows form, not because they think that learning about rainbows will generate technological inventions or cure diseases, but simply because they are interested in optics. Investigating rainbows yields knowledge about the nature of light and color. Knowledge of these things may have applications, but that is not why scientists study them. Scientific research that aims at knowledge for its own sake is sometimes called basic research. Not all knowledge is equally valuable. For example, it wouldn't be valuable to know how many rainbows have ever occurred on Earth; such truths are pointless truths. When science aims for pure knowledge, the aim is explanatory knowledge, or generating knowledge of how things work and why things are the way they are. We know so much about our world, and we understand so many things, because of scientific discoveries and theories.

A different type of scientific research is applied research. Scientific research is applied when it exploits knowledge in order to develop some product, like software, pharmaceutical drugs, or new materials. Often, a central motivation for applied research is to generate products for profit. For example, the scientists who discovered the neurotransmitter dopamine in the human brain in 1957, Kathleen Montagu and Arvid Carlsson, were doing basic research; by contrast, scientists who are employed by pharmaceutical companies to improve upon existing dopamine-related treatments for Parkinson's disease are doing applied research. As this suggests, basic and applied scientific research can operate synergistically. Scientists aiming at the production of knowledge for its own sake often rely on the new materials and techniques created by scientists doing applied research, while scientists doing applied research often exploit pure scientific knowledge in order to develop new products.

Science's Limitations

So science is our best route to knowledge about the world around us and to developing innovations based on that knowledge. To appreciate science's significance, it's also important to recognize what it doesn't do. Scientists try to gain knowledge about certain kinds of phenomena, or appearances of things occurring in the world, and they do so in a certain kind of way. The list of the phenomena investigated in science is long; in principle, it includes everything in our universe. But there are some important limitations to the scope of science.

Science doesn't replace or limit non-scientific intellectual pursuits, like literature or philosophy—or politics for that matter. Basing our scientific knowledge about climate change on fluctuating political agendas would be a mistake. But when it comes to addressing climate change with policy interventions, debating which steps are politically feasible and desirable is fair game for politicians.

Scientific knowledge differs from theological doctrine and religious practice too. Unlike religious practitioners, scientists attempt to explain things without appeal to supernatural entities or influences, such as deities or miracles, or to literary allegories or culturally significant myths. Of course, one can be religious in any number of ways, and people can be religious and believers in scientific knowledge, or even scientists themselves. People disagree about the role religion should play in our society, but whatever role that might be, science is not designed for fulfilling the role of religion. Science's limitations will become clearer in the next section, where we examine what distinguishes science from other human projects.


EXERCISES

1.1 How do scientists know that human activities are radically altering Earth's climate? Why are these changes a serious concern?
1.2 Do all scientists, by virtue of being scientists, have the expertise to make pronouncements about global warming? Give reasons to support your answer.
1.3 Some people know much more than the average layperson about some topics; these people are experts on those topics. Think of at least three people you consider to be experts and their areas of expertise. Why exactly do you consider them to be experts? Is your answer the same or different for the three experts you listed? Why?
1.4 Laypersons are not always in a position to recognize who is a genuine expert on a certain topic. Many people don't know enough about the topic to assess expertise, and genuine experts sometimes disagree with one another about the topic of their shared expertise. Think again of the people you listed as experts in Exercise 1.3. How can laypeople identify whether they should trust each of these experts? Considering your answers, describe the kind of evidence, in general, that a layperson can use to identify genuine expertise.
1.5 Based on the text or your other knowledge, list a few reasons why public concern about anthropogenic climate change lags behind scientific research. Given that lag, how should climate scientists affect environmental policy in the government? Should they merely collect evidence and produce knowledge, leaving the construction of policy to policy-makers? Do they have any obligations to more actively engage with the public?
1.6 Define knowledge, and say how science relates to knowledge. What are the limitations to the kinds of knowledge science can produce?
1.7 What's distinctive about science, in comparison to activities like literature, music, and art, as a source of knowledge about the world? Do you think there are any important differences between scientific and artistic ways of gaining knowledge? Support your answers with justification.
1.8 Define basic research, and describe why you think scientists may choose to pursue it. Is basic research important? If so, how? Should it be funded by the government? Why or why not? How do you think it should be decided what kind of scientific research to fund?

1.2 DEFINING SCIENCE

After reading this section, you should be able to do the following:

• Define pseudoscience and give examples
• Describe how you might define science by its history, its subject matter, and its methods
• List the most important features of science and characterize each feature
• Analyze whether a given claim or topic of research counts as scientific

15031-1864-Fullbook.indb 15


The Tricky Work of Defining Science

In the last section, we described some of the clear and abundant scientific evidence for anthropogenic climate change. But why should we trust climatologists here and not, say, astrologists? After all, astrologists make predictions about human affairs and events in their horoscopes. A tempting answer to this question is that climatology is a science, while astrology is not—that's why you should trust climatologists and not astrologists. But then this raises a new question: What is science? If we are going to exclude astrology, we have to be able to say what makes it unscientific.

This is a question about the nature of science, and it can be divided into two parts. First, there's a question of what's distinctive and important about science when it comes to generating knowledge. Second, there's a question of where we ought to draw the lines between science and non-science. These questions are, respectively, about the definition of science and the demarcation of science—that is, how to set the boundaries of science in contrast to other kinds of projects.

We have already suggested that science is unrivaled in its ability to generate explanatory knowledge about our world. Science has earned a kind of authority and legitimacy from centuries of successes and improvements. The demarcation of science from non-science is especially important because some non-scientific projects are designed to look enough like science to deceive people into thinking that these projects too can lay claim to the authority and legitimacy of science. These deceptive attempts to appear scientific are sometimes called pseudoscience, which literally means false, or fake, science. (Other non-scientific projects don't pretend to be scientific, and these are perfectly fine.) A classic example of pseudoscience is astrology (which shouldn't be confused with astronomy, the scientific field addressing celestial objects and space).
Astrology is commonly associated with horoscopes, which use zodiac signs to make predictions about future events, romantic relationships, health, job prospects, and the like. Tests of astrologers' theories, however, have demonstrated their utter failure to predict or explain. Perhaps for this reason, advocates of astrological theories, as a community, rarely engage in systematic attempts to test those theories, and the theories have changed little since astrology peaked in popularity centuries ago. These theories fail to incorporate accumulated scientific knowledge of physical mechanisms or add any such knowledge of their own. And yet, even though astrology is bunk, it is often promoted as a legitimate source of prediction and explanation. Massive numbers of astrologers, clairvoyants, psychics, and other charlatans earn billions of dollars every year for their consultations.

The seductive allure of astrology illustrates why a definition of science is needed. Science has an ever-increasing impact on society, and it's dangerous for pseudoscience to be taken seriously in this way or for good science to be dismissed as no better than pseudoscience. How we define science determines who has the authority to speak for the scientific community, who has the legal standing to offer expert testimony, what kind of ideas should inspire the health care we receive, who gets to apply for public funding of science, what ideas about the world we take seriously, what should be taught in the classroom, and many other important matters. Pseudoscientific ideas, by masquerading as science, have done great damage to our legitimate knowledge of the world and the laws and policies informed by science. Besides astrology, damaging pseudosciences include creationism and intelligent design, which are
religiously inspired beliefs intended to compete as scientific alternatives to evolutionary theory; conversion therapy, where psychological or spiritual intervention is used to try to change a person's sexual orientation; and homeopathy, a system of alternative medicine based on the idea that substances that cause the symptoms of a disease can cure similar symptoms in sick people when administered in heavily diluted preparations. Some versions of climate change denial also have features of pseudoscience.

So, a good definition of science must exclude pseudoscience. This is harder than it might at first seem. Take astrology, for example. Astrology's mystical origins might be a reason to deem it unscientific. However, chemistry had its origins in alchemy, mystical ideas developed in the Middle Ages and Renaissance that aimed at discovering methods for converting baser metals into gold and finding an elixir of life, among other things. Some people believe astrology for irrational or illogical reasons. But beliefs in certain principles of quantum physics can also appear strange and irrational. You might point out that, for proper sciences, it's possible to get a degree in that discipline at academic institutions. But there are also organized institutes issuing degrees in graphology—the discredited, pseudoscientific study of handwriting to discern a person's character. Perhaps astrology is not scientific because it's too narrowly focused on personality and the position of celestial bodies, without taking into account other accepted scientific theories. Then again, economics is focused specifically on the production, exchange, distribution, and consumption of commodities and hardly attends to any scientific theories from other fields.

As you see, it's not simple to find the essential difference between science and non-science, especially when pseudoscience is added into the mix. Sure, no one mistakes English literature for science.
But can you state how they're different exactly? Even this may be more difficult than you might think. Scientific results and discoveries are communicated in books, articles, and conferences, as is research in English literature. There are scientific and clinical literature reviews; why aren't these just another form of literature? Studies of English literature employ textual, linguistic, and historical evidence. How is this different from the use of evidence in science? Even when the line between science and non-science is clear, articulating the relevant differences is challenging.

There is also tremendous variety in science. One common distinction is between the physical sciences, like astronomy and inorganic chemistry; the life sciences, like botany and neurobiology; and the social sciences, like anthropology, social psychology, and behavioral economics. Different fields often study different kinds of things, make different assumptions, use different methods, and have different aims. This variety makes it even more difficult to formulate a general definition of science. Any feature used to define science may accidentally exclude not just pseudoscience and non-science but some scientific projects as well. For example, some projects in theoretical physics are only very distantly related to empirical evidence, so it seems we don't want to require that all science be tested directly against empirical evidence.

These difficulties may make it impossible to give a neat and exceptionless definition of science. But there are many candidates for distinctive ingredients of science. Our next task is to survey some of those ingredients and assess how well they do in demarcating science from other activities. Ultimately, we'll conclude that none of these ingredients is decisive by itself, but together, they are useful as a guide to help us puzzle through what counts as science.


Defining Science by Its History

We have already noted that science aims at the production of knowledge. This aim traces back to the origins of the very word science, which derives from the Latin words scientia and scīre, both pertaining to knowledge. So science, from its origins, has been about the pursuit of knowledge. But pursuing knowledge isn't the exclusive province of science. Looking more closely at the origins and history of science might help us diagnose how science's pursuit of knowledge is distinctive.

Most historians of science agree that the cultural, social, and technological changes that unfolded in Europe between roughly 1550 and 1700 are very important to the origination of modern science. This period, often referred to as the Scientific Revolution, began with the work of the Polish astronomer Nicolaus Copernicus (1473–1543), who put forward a heliocentric theory of the cosmos, and ended with the work of the English physicist Isaac Newton (1642–1727), who proposed universal laws of physics and a mechanical universe in his famous treatise Philosophiæ Naturalis Principia Mathematica. The Scientific Revolution brought about fundamental transformations in our knowledge of the natural world and in how claims to knowledge ought to be justified. Many of the methods and ideas developed during that period remain at the heart of science.

But let's start our consideration of science's history even further back. Well before the Scientific Revolution, a variety of innovations across diverse civilizations—including ancient Egypt, Iran, India, China, Greece, and the pre-Columbian Americas—provided fertile ground for proto-scientific activity. For example, a variety of civilizations contributed to the refinement of systems of weights and measures, which was important for a number of later scientific developments.
Early catalogues of descriptions of constellations provided a record of observations against which later astronomical predictions and discoveries could be checked. Arguably the most important period in the development of science prior to the Scientific Revolution was the 500 years from the 8th through 13th centuries, known as the Islamic or Persian Golden Age, involving the work of many scholars from Central Asia to the Iberian Peninsula. Here is a brief account of some of the scientifically important developments from that period.

FIGURE 1.5 Scientists in the Persian Golden Age

The Hindu-Arabic numeral system, which greatly advanced the symbolic representation of numbers and calculation, was invented between the 1st and 4th centuries in
India. The Persian polymath Muḥammad ibn Mūsā al-Khwārizmī (c. 780–c. 850) further developed this system and brought it to Arabic mathematics, and his work later introduced this numeral system to Medieval Europe. Al-Khwārizmī also made significant contributions to algebra, geometry, and astronomy. Around the same time, the Persian Abū Bakr Muhammad ibn Zakariyyā al-Rāzī (854–925) was responsible for many innovations in medicine, including advocating for experimental methods and developing classifications of contagious diseases. And the Arab scientist Ibn al-Haytham (c. 965–c. 1040) did revolutionary work in optics and vision, including the insight that vision occurs by eyes detecting light deflected by objects.

Other Persian and Arabic polymaths, including especially Ibn Sina (980–1037), known also by the Latinized name Avicenna, as well as ibn Aḥmad Al-Bīrūnī (973–1048) and Ibn Rushd (1126–1198), or Averroes, preserved and developed theories about the natural world from the Greek philosopher Aristotle (384–322 BCE). This was in turn the basis of ideas about the natural world in 15th-century Europe, with ideas added from Christian, Jewish, and Islamic cosmogony and theology. Based on Aristotle's views, the universe was thought to be geocentric—with the Earth at the center—and divided into two regions: terrestrial for Earth and celestial for the planets and stars. The celestial region was thought to contain transparent concentric spheres that rotate around the Earth. The Greco-Egyptian astronomer Ptolemy (c. 100–168) had supplemented this with an account of the apparent motions of the stars and planetary paths, including detailed models and tables that could be used to calculate the positions of the stars and planets.

Geocentrism in 15th-century Europe blended observations and calculations with religious ideas and ideas about humanity's place in the universe. A longstanding problem with the geocentric view of the cosmos was the appearance of so-called retrograde motion.
The planets sometimes seem, in observations made over a series of nights, to stop in their orbit, reverse course back across the sky, then stop again, and reverse course yet again to continue on their original way. An example of this is shown in Figure 1.6.

FIGURE 1.6 Appearance of retrograde motion

Following Ptolemy, geocentrists explained retrograde motion by positing epicycles, mini-orbits of planets that themselves orbit the larger orbits. This successfully accounted for retrograde motion, but it wasn't as intuitive and seemingly obvious as the other elements of geocentrism. In 1543, in what is considered to be the beginning of the Scientific Revolution, Copernicus presented a radical alternative conception of the cosmos as heliocentric, or
centered around our sun. This provided an alternative explanation for retrograde motion. According to heliocentrism, retrograde motion of planets was due to Earth changing position relative to other planets as they all revolved around the sun.

Copernicus's proposed heliocentric conception of the cosmos was met with skepticism. It violated widely accepted beliefs and called for a fundamentally new physics of the heavens. Besides, the mathematics of Copernicus's system was just as complex as Ptolemy's epicycle solution to retrograde motion, and it did not make predictions of planetary motion any more accurate. So, few astronomers were convinced by Copernicus's system.

The situation changed with the research of Michael Möstlin (1550–1631), Johannes Kepler (1571–1630), and Galileo Galilei (1564–1642), each of whom championed and improved the Copernican heliocentric system. Möstlin and Kepler were German mathematicians and astronomers with an interest also in astrology. Kepler devised a set of laws that described the motions of planets around the sun. Based on calculations of the orbit of Mars, he inferred that planets do not have the circular, uniform orbits proposed by Copernicus; their orbits are ellipses. This simplified the Copernican theory and significantly improved the predictive accuracy of heliocentric models.

Born in Italy, Galileo is one of the most important figures of the Scientific Revolution. He was instrumental in establishing Copernicus's heliocentric system and, more generally, in replacing Aristotelian mechanics of the separate terrestrial and celestial realms with a new, single physics. Galileo built greatly improved telescopes, which he used to observe the phases of Venus and to discover that Jupiter had moons orbiting it. This was a significant discovery for heliocentrism: if our Earth were the center of the universe around which all things orbited, then those moons should be orbiting Earth instead.
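The heliocentric account of retrograde motion can be checked with a toy calculation. The sketch below is our illustration, not part of the text: it places Earth and Mars on circular, coplanar orbits (a simplification; the radii and periods are rounded real values) and flags the days on which Mars's apparent direction from Earth moves backward across the sky, even though both planets always orbit the sun in the same direction.

```python
import math

# Rounded orbital radii (astronomical units) and periods (days).
AU_EARTH, AU_MARS = 1.0, 1.52
YEAR_EARTH, YEAR_MARS = 365.25, 687.0

def position(radius, period, day):
    """(x, y) of a planet on a circular orbit after `day` days."""
    angle = 2 * math.pi * day / period
    return radius * math.cos(angle), radius * math.sin(angle)

def geocentric_longitude(day):
    """Apparent direction (degrees) from Earth to Mars."""
    ex, ey = position(AU_EARTH, YEAR_EARTH, day)
    mx, my = position(AU_MARS, YEAR_MARS, day)
    return math.degrees(math.atan2(my - ey, mx - ex))

def retrograde_days(n_days):
    """Days on which Mars's apparent longitude decreases."""
    days = []
    for day in range(1, n_days):
        delta = geocentric_longitude(day) - geocentric_longitude(day - 1)
        delta = (delta + 180) % 360 - 180  # unwrap across the +/-180 boundary
        if delta < 0:
            days.append(day)
    return days
```

With both planets starting aligned (Mars at opposition), `retrograde_days(730)` reports a stretch of apparent backward motion near the start of the run: no epicycles needed, just Earth overtaking Mars on its faster inside orbit.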
Recalling the main purpose of this discussion, might this early period of science give us a way to approach defining it? In the Scientific Revolution, the rapid development of new ideas, methods, and tools resulted in the swift accumulation of knowledge. A similar process played out in the later development of the fields of chemistry, biology, and psychology. Perhaps, then, science can be defined simply as those pursuits that have descended from the Scientific Revolution. Something 'clicked' that facilitated the development of knowledge about our world, and today's scientists are still engaged in that process of accumulating knowledge.

One problem with this suggestion is that many of the pursuits that furthered scientific knowledge also included religious, theological, and philosophical ideas that we would not consider scientific nowadays. In the Persian Golden Age and the Scientific Revolution, philosophy, theology, and science were not divided as they are now, and often, the same ideas had significance both for religious belief and for beliefs about the natural world. Another problem with defining science straightforwardly by its history is that it's unclear whether and how some of today's scientific disciplines, like economics and neurolinguistics, relate to the Scientific Revolution.

Perhaps we can instead look to the methods developed as science was established as the defining features of science. Methods established in the Persian Golden Age and the Scientific Revolution that may be characteristic of science include looking to sense experience and performing experiments to decide what's true, the systematic use of mathematics to study phenomena, and the institutionalization of investigation in formal organizations. These will all find their way into our eventual attempt to identify the main ingredients of science. But scientific methods have also significantly developed and changed since the Scientific Revolution.
For example, statistical and computational methods emerged in the late 1800s. These methods are staples of present-day science, and they are essential for understanding complex phenomena like Earth's climate. The institutional and social structures governing scientific practice have also undergone massive changes in the last centuries. One profound transformation in the social organization of scientific activity was the professionalization of science in Europe and North America beginning in the mid-19th century. So, although science's methods are key to defining it, we'll have to look beyond the Persian Golden Age and the Scientific Revolution to fully characterize those methods.

Here's one more idea for defining science inspired by this quick look into science's history: perhaps we can define science by focusing on what it is that scientists investigate. The Scientific Revolution was a decisive step toward the separation of scientific from non-scientific questions. Recall that geocentrism had implications not just for the natural world but also for religious belief and views of humanity's role in the universe. Heliocentrism was more explicitly a view just about the universe around us. So maybe the definition of science relates to its subject matter—the world we see around us—as distinct from philosophical, religious, and theological investigations of, for example, meaning and purpose. We'll explore this idea next.

Defining Science by Its Subject Matter

In science, the world itself and all of its parts and properties are investigated in order to better understand and control them. This seems different from other human projects. So, we might look to the subject matter of science—planets, animals, disease, and so forth—to define it.

An immediate problem with this approach is the sheer variety of topics among the various fields of science. Subjects of these investigations range from subatomic particles like quarks, to DNA, emotions, societies, and many other things besides. It can seem as if there is a science of absolutely everything! Professional sports are a good example: some scientists devote their research to learning how to improve athletic performance. Other topics of scientific research are even more abstract or hypothetical. An example is string theory—a highly theoretical subject in physics that posits one-dimensional 'strings' as the basic building blocks of our universe.

Even if we could give a list of all the topics of science, this wouldn't be a good way to define science. We want to be able to say something about what all those subjects have in common and why pseudosciences like astrology don't belong on the list. So we should look for what all these various topics of science have in common.

Here's an idea. Recall from the end of Section 1.1 that, in contrast to religious belief and literature, science attempts to explain things without appeal to the supernatural, to allegories, or to myths. We can thus describe the aim of science as providing natural explanations of natural phenomena. Natural phenomena are objects, events, regularities, or processes that are sufficiently uniform to make them susceptible to systematic study. Disease epidemics, lunar eclipses, and droughts are all natural phenomena. Inflation, poverty, and unemployment are all phenomena in human societies, but they also count as natural phenomena under this definition.
We've already defined phenomenon as that which appears, is seen, or is otherwise experienced. Phenomena include all observable occurrences, where observable means detectable with the use of our senses, including the use of our senses aided by technological devices like telescopes that extend their reach. The requirement that natural phenomena be uniform, or occur according to some pattern, makes it so that different scientists in
different times and places can observe the same natural phenomena. Observability across people, times, and places is essential to scientific study. Natural explanations invoke observable features of the world to account for natural phenomena. If there’s an epidemic in Florida or increased employment in Colombia, you might wonder how that came to be. A natural explanation of the epidemic might specify a contagion and a mechanism of transmission, or other such factors. A natural explanation for the increase in employment might specify private investments in industry and legislative choices made by labor unions and political parties. These are natural explanations of natural phenomena.

Box 1.1 Naturalism and the Meaning of 'Natural'

Two forms of naturalism often come up in discussions about the character of science. Methodological naturalism is the idea that scientific theories shouldn't postulate supernatural or other spooky kinds of entities. Ontological naturalism is the idea that no such entities exist (ontology is the branch of philosophy concerned with the study of what exists). One can believe methodological naturalism is true without subscribing to ontological naturalism.

The idea that something is or isn't natural also comes up in public debates about applications of scientific knowledge, for example in debates about genetically modified organisms (GMOs). In these debates, calling something natural loosely means that it was produced by nature without human intervention. Many people believe that natural things are healthier, morally better, or kinder to the environment than unnatural things. This wasn't always so: until the end of the 19th century, natural product meant perishable or toxic (Stanziani, 2008). Our current understanding of natural as an indicator of healthfulness and safety is influenced by social concerns about technological innovation. There are certainly cases in which lack of human intervention is better, but the general association many people make between natural and good is not based on scientific evidence (Rozin et al., 2012). In any case, this sense of natural is wholly different from the idea of naturalism we discuss here.

The meaning of natural in this context can be better understood by contrasting it with the term supernatural. Supernatural entities and occurrences, if they exist, are not governed by natural laws and may not be observable. Any supernatural entities or occurrences that might exist are not natural phenomena, and so they are not relevant to science. Were there to be any supernatural entities or occurrences, like miracles or ghosts, science by definition won't be able to deliver knowledge about them. Nor does science appeal to supernatural entities or occurrences in order to explain other things. For instance, 'A miracle caused her to recover from disease' couldn't possibly be a scientific explanation, even though recovering from a disease is a natural phenomenon. Science is always naturalistic in what it investigates and how it explains.

Notice that this does not mean science has demonstrated that there are no supernatural entities or occurrences. Science simply can't tell us anything about miracles, ghosts, or other supernatural subjects, not even whether or not they exist. These are simply outside the
realm of what science can investigate because scientific investigation is limited to naturalistic inquiry. This suggests that science need not interfere with most forms of religious belief. The exception is when religious belief is used to provide competing explanations for natural phenomena.

However, pursuit of natural explanations for natural phenomena doesn't by itself adequately demarcate science from non-science. Some naturalistic approaches to natural phenomena aren't things we consider to be scientific. Take football coaching, for example. Its subject matter ranges from physical training and development of individual technical skills to psychological motivation and knowledge of tactics and strategy, and coaching employs what we know of the world to engage with this subject matter. But football coaching is not a science. Naturalism might be an ingredient of science, but it isn't definitive of science all by itself.

Defining Science by Its Methods

In our attempts to define science by its history and subject matter, we've touched upon one distinctive ingredient of science's methods: science involves empirical investigation using one's senses. These methods facilitated the breakthroughs of the Persian Golden Age and the Scientific Revolution and are linked to the importance of naturalism for science. What scientists see, hear, smell, touch, and so forth can all be used as empirical evidence for or against some attempted natural explanation.

But the use of empirical investigation is, by itself, not enough to define science. We all use our senses in everyday life to learn about the world around us, beginning when we are infants. You know when it is time to wake up because you hear your alarm go off. You know it's a clear day because you can see and feel the sun shining through the window. This approach to gaining knowledge has been fine-tuned and perfected in scientific reasoning, but empirical investigation using one's senses is part of the human condition, not distinctive to science.

So, let's look for other methods that, when coupled with empirical investigation, might be used to define science. A hint of where to start comes from the constant revision of scientific ideas. Even scientific theories that are widely held are continually subject to investigation. Occasionally, widely believed theories are rejected as a result of this continuing investigation. Much more often, continual, critical, and self-corrective investigation results in theories being fine-tuned and expanded in the light of new evidence. This continual investigation results from science's commitment to evidentialism, the idea that a belief's justification is determined by how well the belief is supported by evidence. Coupled with science's commitment to empirical investigation, evidentialism suggests that scientific beliefs should be supported by empirical evidence.
Any scientific claim about the world comes with the burden of showing why that claim should be believed.

We should note, though, that much of the empirical evidence supporting your beliefs doesn't come directly from your own sensory experience. Empirical evidence sometimes comes from other people reporting on their experiences. Only the physicists carrying out an experiment on subatomic particles will have sense experiences that provide evidence in support of the belief that certain subatomic particles—say, quarks—exist. Other people have only indirect access to that empirical evidence in the form of the physicists' reports about their sense experiences. Furthermore, sense experience sometimes confirms
a scientific claim only indirectly. Even the physicists who study quarks haven’t directly observed quarks. Instead, they have made predictions based on the idea that quarks exist, and those predictions have been supported by empirical evidence collected in carefully controlled experimental conditions.

Box 1.2 Empiricism and Rationalism in Philosophy

Historically, there has been philosophical disagreement about the extent to which knowledge about the world depends on sense experience. Rationalists like René Descartes (1596–1650) and Gottfried Wilhelm Leibniz (1646–1716) believed that some genuine knowledge about the world is independent of sense experience and can be gained via pure reasoning; mathematical knowledge is sometimes used as an example. In contrast, empiricists like John Locke (1632–1704) and David Hume (1711–1776) believed that experience is our only way to gain knowledge about the world. In more recent discussions in science and philosophy, the terms empiricism and rationalism have been used to refer to the generic views, respectively, that experience is fundamentally important to knowledge and justification and that human reasoning is the basis of knowledge and should be the basis of beliefs.

In addition to science's commitments to empirical investigation and to evidentialism, we should also note that scientists leave open the possibility that their ideas are mistaken—even their most cherished or most certain beliefs. No scientific claim, no natural explanation of a natural phenomenon, is ever taken to be beyond all doubt.

Karl Popper was a philosopher who studied science in the early 20th century. He was troubled by the problem of separating science from pseudoscience. It occurred to him that both scientific ideas and pseudoscientific ideas can be supported with evidence, but only scientific ideas are tested against evidence that might refute them (Popper, 1963). Based on this insight, he developed the philosophical theory of falsificationism, which states that scientific reasoning proceeds by attempting to disprove ideas rather than to prove them right—that is, by advancing 'bold and risky conjectures' and then trying to falsify or refute them. The idea of falsificationism has been very influential among scientists, but it remains controversial for a number of reasons. We'll consider this debate more in Chapter 4.

For now, we'll sound a few quick cautionary notes about falsificationism as a view of science before focusing on what seems accurate about it. First, the relationship between empirical evidence and a scientific theory can be complicated, so it is sometimes hard to say when the evidence disproves an idea. Second, trying to prove central ideas false again and again would limit scientific progress. Sometimes scientists accept a theory or a finding and move on to developing it or exploring its consequences. It seems like a stretch to claim that scientists are really always aiming to prove their theories false!

But two key elements of falsificationism do seem to accurately describe scientific reasoning. First, any scientific claim should be falsifiable.
Box 1.3 Evidence, Evidentialism, and String Theory

Evidence is information that plays the role of making a difference to what one is justified in believing. Evidentialism implies that one’s beliefs should be backed by evidence and that a belief’s justification is proportional to its evidential support. Scientists use empirical evidence to test their theories. But it can be very hard to obtain empirical evidence regarding some scientific theories. String theory, for example, is a theory in physics that is currently detached from empirical evidence. This theory says that the fundamental objects in the world are strings, which are very tiny, extended, one-dimensional objects that cannot be empirically detected with present-day instruments. Despite the lack of empirical evidence, string theorists hold a strong belief in this theory. Are they being unreasonable? Maybe not—especially if not all evidence is empirical evidence. And, in fact, string theorists justify their belief by appealing to non-empirical evidence. They emphasize the unifying and explanatory power of their theory. String theory would unify quantum mechanics with general relativity theory, providing an integrated explanation of phenomena at a microscopic scale and at a cosmic scale. String theorists also claim that there simply are no alternatives to string theory; it is the only candidate for a ‘final theory of everything’. While these and other non-empirical considerations are routinely used to evaluate scientific theories, most scientists agree that the degree to which one is justified in believing any theory, including string theory, ultimately depends on how well the theory is supported by empirical evidence.

This means that one should be able to describe what kind of empirical evidence would, if found, show that the claim is wrong. This is required for scientific claims to be subject to empirical evidence. Notice that true claims can still be falsifiable—you can describe what kind of evidence would prove them wrong; it’s just that, because they are true, you will never actually find such evidence. Even for false claims, scientists may never be in the right circumstances to obtain falsifying evidence. But for any scientific claims—any bold and risky conjectures—it should at least be possible to say what falsifying evidence would look like, even if we aren’t in a position to get such evidence or even if the evidence does not exist (because the claim is true). Falsifiable claims enable science to be based on empirical evidence and to reject ideas when the evidence warrants doing so.

Second, science requires honesty when evidence seems to go against a claim or theory. When scientists discover apparently falsifying evidence, they should begin to doubt the idea under investigation. In general, we humans try really hard to hold on to our existing beliefs, even when they are challenged. Scientists are no different. But the norms of good science obligate them to doubt any scientific claims—even ones they really like or thought were really promising—in the face of evidence that challenges those claims. It is part of the very idea of science that any claim or theory should be abandoned when the preponderance of evidence suggests it’s wrong. We might call this openness to falsification.

To summarize, falsificationism implies that scientists are always earnestly trying to falsify their scientific theories, even and especially the ones they are the most certain about. This is up for debate. But it does seem like all scientific claims should be falsifiable, at least in principle, and that scientists should be open to the possibility, at least in principle, that any

claim or theory will need to be given up if sufficient evidence is found that goes against it. This is depicted in Figure 1.7 as a process of conjecture and attempted refutation.

FIGURE 1.7 (a) Schematic flowchart of simple falsificationism: conjecture → attempted refutation → successful? (yes/no); (b) Karl Popper (1902–1994)

Let us briefly mention two other candidates for hallmark methods of science. Much of science makes use of mathematical techniques ranging from statistics to linear algebra and geometry. This is another distinctive characteristic of science. Quantitative analysis, or the use of mathematical techniques to measure or investigate phenomena, is found in most science. Not all science employs numbers, however. So to say that quantitative analysis is a hallmark of science is not to say that qualitative analysis, or the investigation of phenomena without using mathematical techniques, is unscientific. For example, social scientists often rely on in-depth interviews, focus groups, and other probative techniques that don’t involve any mathematics.

Finally, another method distinctive of science is found in its social and institutional structure. Science relies on communities of many people working together on related projects but also with different ideas, techniques, aims, and values. Scientists are in some ways always collaborating; teams of scientists work together on research projects, and all research is based on the findings of other scientists’ work. In other ways, scientists are always competing with one another, for example, to make a discovery first, to get their research projects funded, and to show that their ideas are better supported by the evidence than opposing ideas. This social aspect of science is one of its most salient characteristics. This social and institutional structure also relates to science’s role in society, which we’ll explore in Chapter 8.

The Nature of Science

We have discussed many distinctive features of science. These include aiming to generate knowledge, a basis in the Scientific Revolution, naturalism, empirical investigation, evidentialism, falsifiability and openness to falsification, the use of mathematics, and social structure. Some people have advocated one or another of these features as the best way to define science. Others have suggested that these different features together comprise a list of hallmark ingredients of science. We think such a list is the most promising approach.


So we shall characterize science as an inclusive social project of developing natural explanations of natural phenomena. These explanations are evaluated in the light of empirical evidence, and should be subject to additional open criticism, testing, refinement, or even rejection. Science regularly, but not always, employs mathematics in both the formulation and evaluation of its explanations.

Consider how this characterization of science relates to our earlier example of climate change. Because science is naturalistic, it is limited to natural explanations of natural phenomena in the way we described earlier. The warming of the Earth’s climate is a natural phenomenon, subject to empirical investigation. All scientific claims must be testable, and potentially falsifiable, with the use of empirical evidence. Claims such as that the concentration of atmospheric greenhouse gases has increased since the Industrial Revolution and that the global sea level rose about 17 centimeters (6.7 inches) in the last century are testable and falsifiable. These claims have been thoroughly tested and not falsified; they are accepted by the scientific community only because there is strong evidence in their favor.

Scientists gather evidence with a wide variety of tools and often by using quantitative methods. As new evidence becomes available, scientific hypotheses are corroborated, revised, corrected, or rejected. All of this is true of climate change research, which involves a number of different fields of science and techniques, and our understanding of climate change and predictions of its effects are always being adjusted and fine-tuned. Scientific hypotheses are open to criticism and correction by a network of researchers embedded in the social and institutional structures that regulate scientific practice. Climate change research involves numerous scientists and institutions working in tandem and also regularly with different hypotheses in competition with one another. The basic idea of anthropogenic climate change has persisted because no challenges to the idea or to the research supporting it have been successful.

Let’s return now to the demarcation question and, in particular, the problem of distinguishing science from pseudoscience. The characterization of science developed here can provide a kind of checklist for assessing whether some activity qualifies as scientific, as pictured in Table 1.1.

TABLE 1.1 Checklist for evaluating whether an idea or project qualifies as scientific

A scientific activity or project:
✓ Aims to provide natural explanations of natural phenomena (naturalism)
✓ Puts forward ideas that can be tested with empirical evidence (empirical investigation, falsifiability)
✓ Updates ideas based on available evidence (evidentialism)
✓ Would abandon any idea that was thoroughly refuted (openness to falsification)
✓ Employs mathematical tools appropriately when they are useful (mathematical techniques)
✓ Involves the broader scientific community (social and institutional structure)


Here’s an obvious contrast with science as we have defined it: researchers in literature do not collect measurements or other similar forms of evidence to test hypotheses about the literary value of a piece of written work. Disagreements about the literary value of, say, Dante’s most famous work, La Divina Commedia, cannot be settled by running experiments. By reading this work, you can learn about 13th- and 14th-century social life in Italy and about moral and theological views in Europe. But the literary work itself is a work of fiction, not intended to directly provide natural explanations of features of the natural world.

Now consider astrology, a canonical example of pseudoscience introduced earlier. The primary claims made in astrology, such as horoscope predictions, are not designed to be falsifiable, and many are even designed to be unfalsifiable. They are vague in ways that allow many different interpretations; so for any interpretation that is wrong, another can be offered in its place. Further, the systems of horoscopes used by astrologists are inconsistent with well-understood basic theories of biology, physics, and psychology. This violates the expectation of the collaborative exchange of ideas among scientists.

Astrology may be a harmless fad, with negative consequences largely confined to misspent leisure time and money. Other pseudoscientific projects are much more dangerous. Denials of anthropogenic climate change, for example, can be no less pseudoscientific than astrology, and they have contributed to a lack of political will to address climate change—a failure that may well have catastrophic consequences. Generally, the prominent climate-change deniers have no genuine interest in engaging with the science.
Their project is not the earnest and disinterested search for truth, wherever it leads, but instead one of shielding their cultural or political values by introducing doubt, distraction, and bluster and lobbing personal attacks (Oreskes & Conway, 2010). Their denial of climate change is not designed to be falsifiable; no amount or kind of evidence will change their minds. Indeed, some climate-change deniers have even rejected the idea that science is a trustworthy source of knowledge in order to hold fast to their commitment against the idea of climate change. Climate-change deniers also violate the expectation of collaborative and competitive exchange among scientists, insofar as they neither produce hypotheses and evidence for other scientists to evaluate nor acknowledge the vast empirical evidence that supports the theory of anthropogenic climate change.

Anti-vaccination advocacy is another example of pseudoscience with pernicious effects. One popular anti-vaccination argument is that vaccines increase the risk of autism. But, as we will discuss in Chapter 7, all vaccines have been subject to incredibly extensive testing for safety, and those tests have demonstrated conclusively that there is no causal connection between vaccination regimes and the incidence of disorders like autism. This conclusion of safety is scientific; it is based on evidence, is open to falsification, and would be rejected if sufficient evidence against it were found. But existing research is so extensive and compelling that the possibility of newfound disconfirming evidence is virtually nonexistent. Nonetheless, propaganda outlets and anti-vaccination groups peddle misinformation, trying to induce doubt with hearsay and uncritical stories of children who were diagnosed with autism after vaccination. (This does regularly happen, for the simple reason that vaccination regimes and many symptoms of autism tend to emerge in the same stage of early childhood.)
Yet another example of pseudoscience with pernicious effects is creationism and intelligent design. In the United States, for more than 50 years, creationism has posed under


the guise of ‘creation science’ as an alternative to evolution as a theory of the origins of life. In 1987, the US Supreme Court ruled in Edwards v. Aguillard that ‘creation science’ was not actually scientific but a particular religious belief. The ruling held that science has certain features (guided by natural law, explanatory, testable, falsifiable, tentative, and so on) and that creation science failed on all counts.

In response, creationists coined the term intelligent design to describe the idea that an intelligent creator is responsible for life on Earth. This was an attempt to avoid the religious connection of creationism by avoiding explicit reference to gods. In 2005, a US federal district court ruled in Kitzmiller v. Dover Area School District that intelligent design also fails to qualify as scientific.

The basic idea of both creationism and intelligent design is that life forms are so complex that they couldn’t possibly have come about without the help of an intelligent designer (such as the Judeo-Christian God). It’s difficult to see how this claim of the necessity of an intelligent designer could be tested (or falsified). But this idea has inspired some claims against evolutionary explanations of various features of organisms that are testable. These claims have been tested with evidence from biology and related fields of science, and they have been thoroughly refuted. Notice that this does not mean evolutionary theory has proven there is no god. Rather, what has been shown is that science can provide—and has provided—natural explanations of the complex life forms that exist. Despite this scientific success, intelligent design advocates persist in promoting the idea of shortcomings in evolutionary theory, without engaging with the existing evidence against their view or even indicating what evidence would weigh in favor of evolutionary theory and against intelligent design.
Contrast these examples of pseudoscience—astrology, climate-change denial, anti-vaccination advocacy, creationism, and intelligent design—with climate science, our main example of science in this chapter. As we have seen, evidence supporting the theory of anthropogenic climate change and informing claims about the effects of climate change comes from many different sources. As the Earth’s climate includes the oceans, the wind, the biosphere, the atmosphere, glaciers, and clouds, researchers can tap into sources of evidence like the rise of sea levels, the warming of the oceans, the shrinking of ice sheets, glacial retreat, and the increased frequency of extreme weather conditions. Gathering this kind of evidence and assessing the magnitude of climate change involve the use of sophisticated instruments and very complex mathematical models, and reliance on the expertise of a number of different scientists from a variety of different fields of science.

Multiple studies published in peer-reviewed scientific journals independently confirm that human activities have contributed to glacier retreat and climate-warming trends over the past century. In addition, most of the leading scientific organizations worldwide endorse this conclusion. The Intergovernmental Panel on Climate Change (IPCC), for example, issued a public statement in 2014 explaining that the evidence for the warming of the Earth’s climate systems is ‘unequivocal’. The panel wrote:

[S]ince the 1950s, many of the observed changes are unprecedented over decades to millennia. The atmosphere and ocean have warmed, the amounts of snow and ice have diminished, and sea level has risen … Human influence on the climate system is clear, and recent anthropogenic emissions of greenhouse gases are the highest in history. Recent climate changes have had widespread impacts on human and natural systems. (IPCC, 2014)


EXERCISES

1.9 Choose one scientific development from the Persian Golden Age or the Scientific Revolution. Describe how that development constituted progress in the subject matter of science and in the methods of science.

1.10 Order the following disciplines from most scientific to least scientific, consulting the discussion of defining science and the checklist for science in Table 1.1: astrology, economics, cinematic theory, cultural anthropology, social work, paleontology, criminology. (You might need to first investigate what some of these disciplines are.) For each, briefly explain why you ranked it as you did, making reference to the hallmark features of science.

1.11 Describe how the history, subject matter, and methods of science are each relevant to the nature of science.

1.12 Outline the specific elements of science’s history, subject matter, and methods that relate to hallmark features of science. Rate each of these on a scale of 1 to 5, where 1 is the least important to the nature of science and 5 is the most important. Choose one feature you rated ‘1’ and one you rated ‘5’, and say why you gave each this rating.

1.13 Define pseudoscience in your own words. Then, choose one of the examples of pseudoscience from this section and evaluate it using the checklist for science. Describe how it is similar to science and how it is different. Can you identify any features of the example you’ve chosen that seem to be intended to appear more like science than they are?

1.14 Based on the information we have provided in this section, evaluate intelligent design against the checklist for science. Assign it a letter grade, where A+ is fully scientific and F bears no resemblance to science. Defend the grade you’ve assigned with reference to the checklist.

1.15 Enter the phrase intelligent design into an internet search engine. Find and consult at least one site that endorses intelligent design and at least one site that is critical of the idea that intelligent design is scientific.
(a) Evaluate the case presented by each side, taking into account the checklist for science when it’s relevant. Describe your findings in writing.
(b) Say what, if any, differences you identify between the sources—that is, between the two websites—and whether and how those differences matter to the authority of these sources on the question of whether intelligent design is a scientific theory.

1.16 Why must science be limited to the study of natural phenomena? Why must it give only natural explanations? Can you think of any scientific projects that don’t seem to satisfy these requirements? If so, describe one or more such projects and say why they might not be naturalistic. If not, describe a non-scientific idea that seems like it is not naturalistic and say why.

1.17 Mythology and science are generally understood to be very different from one another. And yet early science had its origins in, and then grew out of, mythology, and both myths and scientific theories provide explanations of the natural and social phenomena observed in the world around us.
a. Look up three creation myths from different cultures and historical periods—that is, look up myths of how the world began and how people first came to inhabit it.
b. Identify similarities and differences across the three myths.
c. Describe similarities between the creation myths and scientific theories of human origin. In particular, identify potential similarities between the kind of methods and evidence involved in devising creation myths and in building scientific theories of human origin.
d. Describe differences between the creation myths and scientific theories of human origin. In particular, identify dissimilarities between the kind of methods and evidence involved in devising creation myths and in building scientific theories of human origin. How do these dissimilarities make a difference to what you’re justified in believing about human origin?

1.18 It was discovered in the 19th century that the planet Mercury was not following the orbit predicted by Newton’s theory of gravity. When this happened, Newton’s theory was not considered falsified. Instead, it was hypothesized that this anomaly was the result of another planet, named Vulcan, orbiting between Mercury and the Sun. Despite a systematic search, Vulcan was never found. The anomalies exhibited by Mercury’s orbit could be explained only a hundred years later by Albert Einstein’s theory of general relativity.
a. Why do you think scientists initially refused to consider Newton’s theory falsified?
b. Was this a failure of science? Should the scientists have given up Newton’s theory sooner? Why or why not?
c. Does this mean Newton’s theory of gravity was not falsifiable? Why or why not?

1.3 RECIPES FOR SCIENCE

After reading this section, you should be able to do the following:
• Explain why there is not a single thing we can call ‘the Scientific Method’
• Name two general flaws in human reasoning that science is designed to counteract, and give examples of their influence
• Describe five features of scientists and of the scientific community that are important to the trustworthiness of science
• Describe each of the three steps found in most recipes for science and why each is a challenge

The Scientific Method Is a Myth

We now have a working definition of science and plenty of examples of what is, and is not, properly scientific. Our characterization of science ended up relying heavily on methods common in science, such as evidence-gathering, mathematics, openness to criticism, and collaboration. In this section, we focus further on the methods of science. This is an important part of figuring out what makes science so good at producing trustworthy knowledge about our world. This discussion of methods—or recipes for science—will set the stage for all the topics we will cover in the rest of the book. Spoiler alert: as with our characterization of science, we’re going to start by telling you the answer isn’t a simple one.

In some science class along the way, perhaps in high school, you probably learned about the scientific method. But interpreted literally, the idea that science always uses the scientific method is a myth. The American Association for the Advancement of Science (AAAS) put the point this way: ‘[The scientific method] is often misrepresented as a fixed sequence of steps’, when instead, it is ‘a highly variable and creative process’ (AAAS, 2001, p. 18).

15031-1864-Fullbook.indb 31

6/24/2018 7:38:26 AM

32

What Is Science?

Some of the most important scientific breakthroughs had decidedly unscientific-seeming origins. For example, there was no real method by which German chemist Friedrich August Kekulé (1829–1896) discovered that the benzene molecule was structured like a ring; he just had a daydream of a snake biting its tail. (Although, this daydream came after Kekulé had been studying chemistry and the nature of carbon-carbon bonds for years.) Similarly, the idea that natural selection is the mechanism of evolutionary change occurred to the British naturalist Alfred Russel Wallace (1823–1913) during a feverish attack of malaria while travelling in Indonesia in 1858—or so he wrote in his autobiography.

Not only is there no real method for at least some crucial scientific discoveries; there are also many differences in how, and to what degree, scientific claims are tested by empirical evidence. This is why, in this book, we talk of recipes for science.

Think of culinary recipes. They have several components like the name and origin of the dish, the ingredients and their quantities and proportions, cooking times, and the necessary equipment to make the dish. Following the preparation steps and techniques involved in a culinary recipe doesn’t guarantee a delicious dish. Often, knowledge of ingredients, adapting the recipe to your circumstances, and even collaboration with another cook are also required. Similar to culinary recipes, recipes for good science have several components, involve a wide array of techniques and instruments, and rely on others’ expertise via collaboration or knowledge sharing. Similar to culinary recipes, there is no single set of mechanical instructions and step-by-step procedures that guarantees good science. Just like great cooking, good science is a highly variable and creative process. It’s also often messy.

Box 1.4 Normative Versus Descriptive Claims in Science

Some English-speakers say ‘I don’t know nothing’, even though their teachers have lectured them that double negation isn’t good English. In this case, the teachers are making a normative claim—that is, they are expressing a value judgment. Noting that English-speakers sometimes use double negation is, in contrast, a descriptive claim; it’s simply describing what occurs. A normative claim says how things ought to be. In contrast, a descriptive claim attempts to describe how things in fact are, without making any value judgments.

Descriptive and normative considerations are both part of science. For example, Nobel Prize winner Daniel Kahneman and his collaborator Amos Tversky used, in their studies of the psychology of reasoning and decision-making, a normative standard that specifies what choices rational agents ought to make in order to satisfy their own desires. Relying on this normative theory, Tversky and Kahneman then studied how people actually make decisions, constructing a descriptive theory of decision-making under uncertainty.

Just as science involves both normative and descriptive claims, both kinds of claims can be made about science. One can simply attempt to characterize what science is—that is, how scientists in fact develop theories and test claims—or one can attempt to say how science should work, that is, what features science should have for it to succeed at generating knowledge. We will be doing a little bit of both in this book.


The Flaws in Human Reasoning

Let’s talk a little about the purpose of science before we get to the methods, or, in keeping with our culinary metaphor, the recipes. Why is science needed to give us knowledge about the world in the first place, beyond just our ordinary human powers of observation? We humans are predisposed to investigate our world using our senses from our first days of infancy, which is also central to how science works. But we humans are also predisposed to some pretty serious flaws in how we gather evidence and how we reason. Science is a valuable route to knowledge about the world because it incorporates ways to protect against those flaws. We’ll first introduce some significant flaws in human reasoning; then we’ll survey how science counteracts them.

It is only natural for people to initially favor some ideas over others. We can then use our experiences in the world, investigation of existing knowledge, and hard thinking to make sure the ideas we favor are, in fact, good ideas. The problem is that we also seek out and interpret information in ways that fit with our favored ideas, and we avoid information that challenges those ideas. This is a well-established feature of human reasoning called confirmation bias: the tendency we all have to look for, interpret, and recall evidence in ways that confirm and do not challenge our existing beliefs.

For example, when someone asks her friends if they like the restaurant she’s chosen, she may say, ‘It’s good, isn’t it?’ Framing the question in this way promotes agreement with the view she has of the restaurant—it’s a way of looking for confirming evidence. Similarly, someone who’s skeptical about climate change may google ‘climate change doubt’ to learn more, or may focus on what critics say and ignore what climate scientists say. Both of these strategies tend to generate information only on the side of climate-change denial.
In one study, proponents and opponents of the death penalty both read the same discussion of the death penalty. The two sides interpreted the discussion totally differently; each side thought it weighed more in favor of their own position (Lord, Ross, & Lepper, 1979). And a smoker may clearly remember the distant relative who lived to 100 and smoked a pack of cigarettes every day, while pushing from her mind other smokers who fared less well.

The tendency toward confirmation bias is general to all people, but it tends to be stronger for politically and emotionally charged issues, such as climate change, vaccination of children, and health issues. Confirmation bias can involve looking only for ideas and evidence that support your existing beliefs, cherry-picking which research to believe and which to ignore, holding evidence against one’s views to a higher standard than evidence in favor of one’s views, and more easily remembering supporting evidence than contrary evidence.

Confirmation bias is exhibited by everyone, and scientists are no exception. Sometimes, scientists’ expectations or desires about the results of scientific research end up leading to incorrect findings. One way in which this can happen is through the observer-expectancy effect, when a scientist’s expectations lead her to unconsciously influence the behavior of experimental subjects. A famous example of this involved Clever Hans, a horse who was thought to have sophisticated abilities, including performing arithmetic calculations. Hans’s owner, Wilhelm von Osten, was a mathematics teacher, horse trainer, and phrenologist. (Phrenology involved studying the shape of the skull as an indicator of personality and mental abilities; it has since been discredited as a science.) Hans was trained to recognize numerals from 1 to 9 and to tap his hooves to indicate which numbers he recognized.
Eventually von Osten had Hans tapping out correct answers to questions like ‘What’s the number of 4s in 16?’

FIGURE 1.8 Clever Hans and Wilhelm von Osten

In 1891, von Osten travelled around Germany to exhibit his amazing horse. There was such fanfare that the famous psychologist Carl Stumpf appointed a special commission to provide critical scrutiny. In 1904, the commission concluded that Hans’s abilities were legitimate. The horse was able to answer a great variety of questions on topics from simple arithmetic to square roots, fractions, and decimals; units of time; musical scales; and the value of coins. Hans could respond accurately even when von Osten wasn’t present.

The commission was wrong. Stumpf’s pupil Oskar Pfungst demonstrated that Clever Hans was not actually performing the sophisticated mental calculations attributed to him (Pfungst, 1911). Pfungst used blinders to vary whether Hans could see the questioner, and he varied who played the role of questioner. Hans produced the correct answer even when von Osten himself did not ask the questions, but Hans’s performance fell apart when the questioner either did not know the answer or could not be seen by Clever Hans. In particular, when Hans could not see the spectators and questioners, his ability to produce correct answers fell dramatically from 89% to 6%. Further observations confirmed that Hans was being unwittingly cued by his human audience. Questioners’ body language and facial expressions became tauter as his tapping approached the correct answer, and then more relaxed upon the final tap; this change prompted Hans to stop tapping.

As with von Osten and everyone else who posed questions to Clever Hans, our expectations can affect how things play out, even when we don’t intend for this to happen. More generally, because of confirmation bias, our expectations can influence what sources of evidence we seek, how we view the evidence we encounter, and how well we remember evidence later. These problems increase with emotionally and politically sensitive topics. All of this makes it hard for people—including scientists—to reason their way to the right answers.

15031-1864-Fullbook.indb 34

6/24/2018 7:38:27 AM

What Is Science?

35

Norms of Investigators

An element of science's great success in generating knowledge about our world is its features that protect against or counteract the basic flaws in human reasoning. We'll discuss those features in two categories: first, norms of science that apply to individual scientists and, second, norms of science that apply to the scientific community. You can think of these norms as rules or guidelines against which scientists' actions can be deemed good or bad, desirable or undesirable.

Because of science's aim of producing knowledge, scientists are obligated and trained to have a certain kind of integrity. Scientific integrity requires scientists to be sincere and honest, and to avoid improper influence by others. Violations of these norms can undermine science's ability to produce trustworthy knowledge.

Plagiarism is an obvious example of dishonesty. Plagiarism consists of presenting somebody else's ideas, scientific results, or words as one's own work, intentionally or unintentionally, by not giving proper credit. When plagiarism is discovered in science, it is severely penalized; penalties can include a ban from publishing in peer-reviewed journals or suspension or expulsion from one's institution.

Faking data to support a desired conclusion is another egregious violation of scientific integrity. In 2011, Diederik Stapel, a Dutch social psychologist, published a widely read study in Science—one of the most prestigious scientific journals. The study presented evidence supporting the dramatic conclusion that trash-filled environments lead people to be more racist. But rather than earnestly collecting actual data, Stapel just made up the data. When this was discovered, Stapel's reputation shifted from that of a respected academic to a prominent example of fraud. All of his other publications were scrutinized, and approximately 60 other papers were retracted for data fabrication.
Other scientists have also been forced out of science after their ethics violations were discovered, such as the stem-cell researcher Hwang Woo-suk from Seoul National University and the Harvard evolutionary biologist Marc Hauser. Some science journalists have helped increase awareness of retractions due to issues like data fabrication by running blogs such as Retraction Watch.

But scientific integrity requires more than just not misattributing or misrepresenting ideas and data. Scientists also ought to avoid conflicts of interest—that is, financial or personal gains that may inappropriately influence scientific research, results, or publication. Scientists are obligated to disclose any potential conflicts of interest. The existence of a potential conflict of interest does not necessarily lead to researcher bias or misinterpretation of data, but transparency about any potential conflicts of interest allows others to better evaluate the possibility of improper influence. Conflicts of interest, especially when research is funded by organizations with a financial stake in the findings, can result in researchers intentionally or unintentionally altering the research they conduct, their findings, or what they report in publications.

Clair Patterson, a geochemist at Caltech in California, became famous for definitively calculating the age of the Earth (≈4.54 billion years) in the 1950s. He also led the campaign to remove lead from gasoline in the 1960s and 1970s. Leaded gasoline contained lead tetraethyl, which is extremely toxic—a single drop of pure lead tetraethyl can be fatal—and had to be handled with utmost caution by its manufacturers. Because the campaign against leaded gasoline threatened their profits, the fossil fuel industry—led by the Ethyl Corporation—fought bitterly against Patterson. Among their tactics was to procure a shill, Robert A. Kehoe, who was paid handsomely by the fossil fuel industry to attest to the safety of leaded gasoline.

Led by Patterson, honest science eventually carried the day. But serious damage had already been done: generations of Americans suffered from elevated lead levels in their blood through the end of the 20th century. Kehoe later used his scientific credibility to endorse pollutants like Freon, undermining scientific evidence showing their damage to the earth's ozone layer. He was also commissioned by DuPont, General Motors, and other companies to produce studies asserting that dangerous carcinogens were safe. Ultimately, Kehoe's efforts have been a model for executives in a range of industries (tobacco, asbestos, pesticides, fracking, and so on) for how to obstruct scientific efforts with misinformation. The 'Kehoe approach' is still being deployed by the fossil fuel industry to evade evidence and undermine the scientific research about anthropogenic climate change.

One way to promote scientific integrity is holding scientists accountable for their work. Scientists should be prepared to engage with others about the ideas, methods, and data they use to support their scientific findings. In climate science, for example, scientists should be able to answer questions about what kinds of uncertainties their models involve and what kinds of evidence they have for the reliability of their findings.

This is related to a second norm: scientists should be open to criticisms of their work and to new ideas. Remember that science is always in revision. We have said that scientific ideas should be in principle falsifiable and that scientists should be open to the possibility that any idea will turn out to be false. Similarly, scientists should at least sometimes be willing to entertain ideas that might initially seem unlikely, and they should be open to criticism of their ideas and methods.

A third norm governing scientists as individuals is ingenuity.
This is a natural partner to the norm of openness to new ideas. Science benefits from the development of lots of interesting ideas, including ideas that violate our preconceptions. It's often impossible to tell at the outset which ideas will prove to be promising. Many, even most, new ideas in science will turn out to be false. But time and time again, science has gone in unexpected directions, and that can't happen without new, creative ideas about the world and about how to pursue science.

The Blackawton Bees project is one striking example of how ingenuity can guide scientific research. In this project, 28 children between 8 and 10 years of age from Blackawton in Devon, UK, conducted a collaborative scientific study on bumblebees' visuospatial abilities, supervised by an ophthalmologist and an educator. The children wondered how bumblebees decide which flowers to go to for food and whether bumblebees could learn to recognize different flower shapes and colors. The children ingeniously brainstormed about possible answers and creatively designed experiments to test their ideas. Their coauthored findings were published as an original article in the scientific journal Biology Letters. The article summarizes its discoveries as follows:

We discovered that bumble-bees can use a combination of color and spatial relationships in deciding which color of flower to forage from. We also discovered that science is cool and fun because you get to do stuff that no one has ever done before. (Blackawton et al., 2011, p. 168)

As the children involved in the bumblebee project can attest, scientific reasoning can and should be ingenious, challenging, and creative.


Social Norms

No matter how many requirements are placed on scientists as individuals, and no matter how good scientists get at satisfying those norms, science cannot be fully protected against the flaws inherent to human reasoning by that alone. Individuals—even very bright scientists—often aren't aware of the flaws in their own thinking, and often aren't in a good position to fix them by themselves. Thus, another important form of protection against flaws in reasoning involves requirements placed on the scientific community as a whole.

We've already mentioned the importance of the social organization of science. This organization is especially salient when research involves collaborations among lots of researchers, sometimes with different disciplinary backgrounds, working at different times and in different physical locations, often using different kinds of complex instruments. Climate science regularly involves radically collaborative endeavors of this sort. After all, climates are extremely complex, interconnected systems; no single person and no single field of expertise can alone produce knowledge of how the climate works or how it will change. Instead, many different people perform specific tasks based on their expertise and their available instruments. When these different tasks are added together, they lead to knowledge that no one could have produced alone.

But even scientific research that is not so visibly collaborative relies on the communities of science. Research is informed by previous results found by other scientists, papers are reviewed for publication by other scientists, and other scientists decide whether and how to react to any given published finding. In massively collaborative research and solitary research alike, it is really scientific communities, rather than scientists as individuals, that produce knowledge. The good functioning of scientific communities depends on various social norms and incentive structures.
One primary social norm in science is trust. Scientists' trust in one another is the glue of scientific communities. For example, collaborative projects on climate change involve scientists with a range of different expertise, including climatologists, ecologists, physicists, statisticians, and economists. None of these scientists alone possesses comprehensive expertise to understand the full range of evidence that bears on our understanding of anthropogenic climate change. So these scientists must rely on each other and must trust one another's scientific work. Individual members of the public often don't have the expertise to evaluate most scientific findings, and so they too must trust the scientists who are experts on those topics.

But it's also true that the work of scientists must be critically evaluated, by other scientists and the public alike. For the public, the most straightforward way to evaluate scientific work is to assess the quality of the arguments presented. This is not always possible, however, as scientific information can be technical and difficult for non-experts to understand. For this reason, it is also important to pay attention to the reputation of the alleged expert. Scientists' reputations are based on their track record of accomplishments in their field, as judged by other scientists with similar research expertise. Scientists critically evaluate one another's work by deciding whether particular results warrant publication, evaluating the strengths and weaknesses of scientific studies, and choosing whether and how to respond to published findings.

One form that scientists' healthy skepticism toward the work of others and their openness to criticism takes is the attempt to replicate others' findings. In replication, an experiment or study is carried out again in order to see whether the same results are achieved. If successful, the replicated results further confirm the idea under investigation. If the results are not replicated, this raises doubts about the results of the original work.

This balance of trust and skepticism among scientific communities also helps protect against individual biases. Imagine that a scientist with legitimate expertise is paid by an oil company as a consultant. The same scientist pursues research aiming to show that the evidence for anthropogenic climate change is inconclusive. Such a scientist would have an obvious conflict of interest; this should lead other scientists and members of the public to be very cautious about trusting those findings. It doesn't guarantee the scientist is wrong, but it does raise questions about whether her judgment is clouded.

Communities of experts also protect against the undue influence of individual economic, religious, and political values, and thus confirmation bias and expectancy bias, in another way. Scientists are each susceptible to biases that make them see the world more like what they expect it to be and how they hope it will be. But their expectations and hopes are different from one another's. So, in a scientific community, these biases tend to balance each other out. The conclusions that different scientists all agree to are thus less likely to have resulted from bias. Hence, if a large and diverse group of scientists agree about some result, we should be more confident that the result is accurate.

This brings us to one last point about the social norms of science, on which we will elaborate in Chapter 8. To adequately protect against individual bias and flaws in reasoning, scientific communities need to be diverse in any way potentially relevant to scientists' values and, thus, to flaws in their reasoning. The best science is done by female and male scientists with diverse cultural knowledge, various ethnic and racial backgrounds, and from a variety of nations. This kind of diversity benefits science by guarding against individual biases. The individual and social norms of science we have identified are summarized in Table 1.2.

TABLE 1.2 Individual and social norms that protect against bias and flaws in reasoning

Individual norms    Social norms
Integrity           Trust
Openness            Skeptical evaluation
Ingenuity           Diversity
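To see in miniature why diverse biases can wash out in the aggregate, consider a toy simulation. This sketch is our illustration, not the authors': the quantity, biases, and noise levels are all invented, and real scientific bias is of course not just additive measurement error.

```python
import random

def community_estimate(true_value, biases, noise=0.5, trials=100, seed=0):
    """Average the measurements of observers who each carry a systematic bias."""
    rng = random.Random(seed)
    measurements = [
        true_value + bias + rng.gauss(0, noise)
        for bias in biases
        for _ in range(trials)
    ]
    return sum(measurements) / len(measurements)

TRUE_VALUE = 10.0

# A homogeneous community: every observer shares the same bias (+2.0),
# so averaging their measurements cannot correct for it.
homogeneous = community_estimate(TRUE_VALUE, biases=[2.0] * 5)

# A diverse community: biases pull in different directions and roughly cancel.
diverse = community_estimate(TRUE_VALUE, biases=[2.0, -1.5, 0.5, -2.0, 1.0])

print(abs(homogeneous - TRUE_VALUE) > abs(diverse - TRUE_VALUE))  # True
```

The point mirrors the text: averaging over a homogeneous community preserves the shared bias, while a community whose biases differ tends to converge closer to the right answer.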

Methods in Science

The third and final topic about how science protects against flaws in reasoning is also the most important. The definition of science developed here stressed the methods used in science. And it's these scientific methods that bear most of the responsibility for helping science overcome the flaws and limits of individual scientists' reasoning. We have suggested that, despite what you might have heard in high school science classes, there's no set scientific method: science proceeds in myriad ways. But this doesn't mean there are no methods to science. In fact, as the title of this book suggests, the opposite is true: science may have no set method, but it does proceed according to familiar recipes. It is the purpose of this book to outline some of the main recipes for science.

At the heart of many of those recipes, there is a pattern that includes something like these three steps: the formulation of hypotheses, the development of expectations based on hypotheses, and the testing of expectations against observations. One common way this plays out is for scientists to formulate hypotheses about the world—what was described earlier as bold and risky conjectures—and then use those hypotheses to generate specific expectations regarding their experiences. If their observations conform to those expectations, their hypotheses are confirmed. If not, they return to the drawing board.

These steps can occur in different orders, and they happen again and again in various combinations. Different methods may also be involved in each of the three steps. For example, sometimes scientists have a specific hypothesis to investigate; other times, research is more exploratory and open-ended. Sometimes hypotheses have obvious empirical implications; other times, scientists need to use statistics to develop their expectations. Sometimes scientists design experiments to test their expectations; other times, they develop models. Other scientific work simply isn't described well by this trio of hypothesis, expectations, and observations, such as the theoretical work behind string theory. But these three steps are integral to the production of scientific knowledge. They are the basic ingredients that, with tremendous variation, occur in many of the recipes for science. With the help of these steps, scientific theories, laws, models, and other advances are developed and refined. And these advances, when successful, are the vehicles of our scientific knowledge.
We’ll conclude this chapter by talking about each of these three steps in greater detail. Throughout the rest of the book, we’ll regularly refer back to these three ingredients and the different ways they factor into the recipes for science.

Hypotheses

As we have seen, empirical investigation is how we learn about our world. Scientists make observations to try to figure out what's out there, why things are the way they are, how things change, and so forth. But simple observations can't accomplish that task by themselves. A second ingredient is needed: theoretical claims. Theoretical claims are claims made about entities, properties, or occurrences that are not directly observable. These things might be excessively large or small, like the whole universe or quarks; they might be too distant in time or space to observe, like the first forms of life on Earth or black holes; or they might not be directly observable at all, like what physicists call 'dark matter'.

As an example of a theoretical claim, consider a claim about all of something of some kind, like the claim that all salt dissolves in water. You might have seen plenty of salt dissolving in water, but you will never be able to witness all of the salt that exists dissolving in water. So direct observation can't guarantee that all salt dissolves in water. We have plenty of evidence that this is so, but the claim is theoretical because it goes beyond what we can directly observe.

Science is centrally concerned with a special kind of claim called a hypothesis. A hypothesis is a conjectural statement based on limited data—a guess about what the world is like, which is not (yet) backed by sufficient, or perhaps any, evidence. Scientists do not yet know whether any given hypothesis is true or false; when there is sufficient evidence in favor of some hypothesis, it graduates from that category.

Formulating a hypothesis often requires some imagination: if you could observe something we can't—if you could witness the beginning of life on Earth, fly into a black hole, or see all the salt in the world—what would you find? Scientists might formulate a hypothesis before any observations have been made, just with the use of their imagination. But often initial observations, other hypotheses, or background knowledge about related phenomena help inspire new hypotheses. Before scientists knew about the properties of potassium chloride, they'd seen that table salt—sodium chloride—dissolves in water. This informed their expectations for potassium chloride, a similar compound. Scientists' hypotheses about the first life forms were shaped by what they know about organisms, existing and extinct, and how the Earth has changed over geologic time.

Scientists can have different levels of confidence in different hypotheses. If a hypothesis is informed by lots of experience with similar objects or significant background knowledge of related phenomena, scientists might be much more confident in it than if it is a random guess. But, by their very nature, hypotheses are guesses. This is why hypotheses must be tested.

Expectations

Learning whether a hypothesis is true is often more circuitous than just making direct observations. A second ingredient is usually needed to test hypotheses: developing expectations based on hypotheses. Expectations are conjectural claims about observable phenomena based on some hypothesis. These claims are conjectures since they go beyond what scientists have observed so far, but, unlike hypotheses, their truth or falsity can be discerned directly from the right observations. Indeed, expectations are claims about what scientists expect to observe if a given hypothesis is true.

Expectations do not regard just any potential observations, but observations that scientists anticipate being able to make. We could say what we would expect to experience if we were in the middle of a black hole (given some hypothesis about black holes), but since we don't expect to ever be making observations from inside a black hole, those expectations are useless. Instead, expectations based on a hypothesis regarding black holes should be about what scientists would expect to see through a telescope, patterns of x-rays and gamma rays detected from Earth or from a satellite, and so on.

Depending on the nature of a hypothesis, developing expectations in light of the hypothesis can be relatively straightforward or incredibly complicated. On one extreme, the hypothesis that all salt dissolves in water leads directly to an expectation: any given sample of salt will dissolve when placed in water. But even then, there are some other conditions to stipulate. Should salt dissolve when placed on a chunk of frozen water (that is, ice)? What if some salt is already dissolved in the water? Should we still expect the sample of salt to dissolve?

Expectations regarding black holes are much more complicated to develop. Black holes are objects so massive, so dense, that even light gets pulled inside. No one will ever take measurements on the edge of a black hole. Even if someone could get there and survive to take measurements, the measurements couldn't be recorded in a way that escaped the black hole. Nor does anyone expect to see a black hole through a telescope, since that would require light to leave the black hole and travel to our telescope. Hypotheses about black holes must thus be investigated entirely by formulating expectations regarding their effects on other objects, objects that can emit electromagnetic radiation (and thus be seen through telescopes) and give off measurable x-rays or gamma rays.

Whether deriving expectations is relatively straightforward or incredibly complicated, it is an important and nontrivial step of scientific work. Expectations set scientists up to make observations that can provide evidence for or against the truth of a hypothesis. Deriving expectations thus serves as a bridge between conjectural claims (hypotheses) and immediate observational claims (data).

Observations

Nearly all science fundamentally depends on observations. This is because, as we have discussed, scientific inquiry is ultimately an empirical inquiry. It's not enough to think up interesting ideas about how the world might work; those interesting ideas must also be evaluated by how well they fit with our observations of the world. Observations include any information gained from your senses—not only what you see, but also what you hear, smell, and touch, and what you sense in any other way you can experience the world.

Your observations belong only to you. If we are on a hike together, we might both hear a rattling sound coming from behind a boulder. But each of us only has access to our own experience of the sound. You can't compare my observation to yours. Data are different. Data are public records produced by observation or by some measuring device. (The singular form in Latin is datum.) Observations are important because they are your only way to directly access the world. Data based on observations are important because they allow us to record and compare our observations.

Our powers of observation are ultimately capacities to detect physical information and then—literally—to incorporate it into our bodies. For example, when one hears a serpentine rattle from behind a boulder, an acoustic waveform with a number of distinct physical features enters the ear canal and causes the tympanic membrane to vibrate. The vibrations are 'forwarded' to the cochlea via the bones of the middle ear, where the shearing force of the tectorial membrane mechanically moves the hair cells of the basilar membrane. Hair cell movements are then transduced into electrical signals, which leave the organ of Corti and travel via the main auditory nerve to the brain. The embodied brain then has to interpret that signal (as a serpentine rattle rather than a baby rattle, for instance). This is what it takes for humans to hear a serpentine rattle from behind a boulder.
A similar transformation occurs when you see that a test strip of litmus paper has turned blue, feel the heat produced by a chemical reaction, and so forth.

Observation isn't passive. We can move our heads to see different things and relocate our bodies to different places where we can hear different things. We can also use observations from multiple senses together. If you're wondering about that rattling sound from behind the boulder, you can walk around to the other side to see whether there's a rattlesnake there. Besides changing our position and using multiple senses to enhance our observations, we can also change the world around us to create opportunities for different observations. Crushing a leaf lets you better smell whether it's sage or mint. When we do such things, we have begun to do simple experiments on the world around us. We will talk about experiments in Chapter 2.

Humans have also found many ways to use tools to enhance our powers of observation. Light can be refracted with mirrors, prisms, and lenses to extend the reach of vision. We now can see not just through our eyes alone but also through our eyes aided by telescopes, microscopes, and other devices. To help us hear beyond our ears' capabilities, we have developed microphones, stethoscopes, and so on. These technological enhancements range from observational correctives like eyeglasses and simple sensory aids like microscopes to much more complex technology with highly specific purposes, like an fMRI machine to show brain activity and the Large Hadron Collider, which uses superconducting magnets to cause streams of high-energy particles to collide in a detectable way. Such enhancements have allowed humans to generate what we might call super-observational access to what would otherwise be undetectable to us, given our sensory modalities.

Making observations, collecting data, is at the heart of science's ability to generate knowledge of our world. But observations aren't always independent from the ideas about the world we already have. Changes in what we believe to be true can have a significant impact on what we observe. For instance, when we observe the Sun at the horizon, what we seem to see is the Sun at one point on its path across the sky. Geocentrism organizes this and similar observations into an easily understood pattern, and those observations confirm the geocentric conception.
But from the perspective of the heliocentric conception of the solar system, with your head slightly turned to the side, the Sun at the horizon and the other planetary bodies that appear comprise a different observation (see Figure 1.9). Heliocentrism is a different perspective, and it may also create a different perceptual experience, or observation: the Sun setting not because it moves below the horizon but because your position on Earth rotates away from it. New ideas can sometimes have a strong effect on what we think we see, on our very observations. Observations are crucial to science, but they aren't always the starting point, and they aren't always decisive.

EXERCISES

1.19 Why do the authors suggest there's no unitary scientific method? Evaluate that idea, raising considerations in favor of it as well as considerations opposed to it.

1.20 Describe three types of influence confirmation bias has, and define the observer-expectancy effect. Think of a novel example for each of the four (three types of influence of confirmation bias and observer-expectancy effect). Make sure it's clear how each example illustrates each idea.

1.21 Describe three kinds of scientific fraud or scientific misconduct, giving an example of each. Explain how each example undermined science's ability to produce trustworthy knowledge.

1.22 How should trust and skepticism be balanced in scientific communities, and why is this important to science? How should trust and skepticism of the public toward scientific findings be balanced, and why is this important for the public's relationship to science?


FIGURE 1.9 Reorientation from geocentrism to heliocentrism


1.23 Choose one of the following, and invent a pseudoscientific theory about it. Feel free to be creative!
a. The origin of the universe
b. The healing power of music
c. People's handwriting
d. The change of organisms over time
Then, describe how the norms for scientists, the scientific community, and the methods of science could help guard against your made-up theory. Try to make your answer comprehensive, involving all the main topics from this section.

1.24 Search the internet (news websites, magazines, and so forth) for a story or advertisement about a scientific finding or a medical or health treatment that purports to be based on science, and answer the following questions about it. Make sure you include the story or advertisement when you submit your answers, as well as a link to it on the internet.
a. What is the source? Is the person or entity making the claims someone with genuine expertise in what he or she is claiming?
b. Does it seem like there's any conflict of interest? Why or why not?
c. Does the claim involve vague or ambiguous language?
d. Do the claims fit with other well-confirmed scientific theories?
e. What is the evidence cited in support of the claim?
f. Does this describe good science? Why or why not?

1.25 What is the difference between observations and data? What is important about observations in particular and why? What is important about data in particular and why?

1.26 Hypotheses, expectations, and observations are all important ingredients for most science. Describe the importance of each, a typical way that the three ingredients work together, and what they accomplish together.

1.27 Hypotheses, expectations, and observations are all important ingredients for most science. Describe a difficulty with each, or circumstances in which it can be difficult.

1.28 Imagine you are a doctor in a large medical practice. The other doctors are considering introducing a homeopathic service for their patients. They ask you to prepare a report summarizing the pros and cons of doing so. One of the other doctors, Dr. A, is entirely dismissive of homeopathy on the grounds of the weakness of its scientific basis; another doctor, Dr. B, has read a report of a study that she says shows that homeopathy can outperform placebo and is inclined to be sympathetic. Yet another doctor, Dr. C, has said that he doesn't care about the evidence, so long as homeopathy works and is not toxic. Write a 500- to 800-word report describing homeopathy (you'll probably have to do a bit of research), addressing each of the other doctors' points of view, and recommending whether to introduce homeopathic service. You should employ any of the concepts from this chapter that you find useful.

FURTHER READING

For more on the science of climate change, see the Intergovernmental Panel on Climate Change. (2014). Fifth assessment report (AR5). Geneva: IPCC. Retrieved from www.ipcc.ch/report/ar5/


For the latest data and information for stabilizing Earth's atmosphere, climate, and living environments, see CO2.earth. Retrieved from www.co2.earth/

For more on political influence used to cast doubt on climate change research and other scientific findings, see Oreskes, N., & Conway, E. (2010). Merchants of doubt. New York: Bloomsbury.

For more on the demarcation between science and pseudoscience, see Pigliucci, M., & Boudry, M. (eds.) (2013). Philosophy of pseudoscience: Reconsidering the demarcation problem. Chicago: University of Chicago Press.

For more on the Scientific Revolution, see Kuhn, T. S. (1957). The Copernican revolution: Planetary astronomy in the development of Western thought. Cambridge: Harvard University Press. See also Shapin, S. (1996). The scientific revolution. Chicago: University of Chicago Press.

For more on science in the Persian Golden Age and other periods around the world, see the History of Science Society: Introduction to the history of science in non-Western traditions. Retrieved from https://hssonline.org/resources/teaching/teaching_nonwestern/

For a concise treatment of the illusion of explanatory depth, see Keil, F. C. (2003). Folkscience: Coarse interpretations of a complex reality. Trends in Cognitive Sciences, 7(8), 368–373.

For the psychology of confirmation bias and bias more generally, see Nickerson, R. S. (1998). Confirmation bias: A ubiquitous phenomenon in many guises. Review of General Psychology, 2(2), 175–220. See also Hahn, U., & Harris, A. J. (2014). What does it mean to be biased: Motivated reasoning and rationality. The Psychology of Learning and Motivation, 61, 41–102.

For more on how social norms and social structures influence scientific inquiry, see Merton, R. K. (1942). Science and technology in a democratic order. Journal of Legal and Political Sociology, 1, 115–126. See also Boyer-Kassem, T., Mayo-Wilson, C., & Weisberg, M. (eds.) (2018). Scientific collaboration and collective knowledge. Oxford: Oxford University Press.


CHAPTER 2

Experiments and Studies

2.1 EXPERIMENT: CONNECTING HYPOTHESES TO OBSERVATIONS

After reading this section, you should be able to do the following:

• Describe the role of experiments in testing hypotheses
• Identify the main features of an experiment in an example of scientific research
• Define extraneous variables and articulate why these must be controlled in an experiment
• Describe the problem of underdetermination and how scientists deal with it
• Identify three other uses of experiments in science besides hypothesis-testing

How Experiments Contribute to Science


A central aim of science—perhaps the central aim—is to produce knowledge about the world, which involves formulating natural explanations of natural phenomena. Experimentation is one primary strategy used to achieve this aim. Many scientists perform experiments in order to test new hypotheses and to extend existing knowledge. So let's ask: what are experiments? Recall from Chapter 1 the three common ingredients in recipes for science and the relationships among them:

1. Hypotheses are used to generate expectations.
2. Expectations are compared with observations.
3. That comparison is used to develop, confirm, reject, or refine a hypothesis.

When it comes to testing hypotheses, experiments contribute to ingredient 2. Experiments provide a structured way to make observations—that is, to collect data—and to compare those observations with what we would expect to observe if the hypothesis under investigation were true. We will see later in this chapter and subsequent chapters that there are different ways to collect data besides experiments and that experiments are not only used to test hypotheses. But this is a good starting point. Suppose you want to find out how the physical characteristics of plants and animals influence the characteristics of their offspring—for example, how your height depends on


the heights of your parents or how the shape of a pea plant seed depends on the shapes of the seeds of the parents of that plant. How could you investigate this?

The scientist and friar Gregor Mendel (1822–1884)—born in the Austrian Empire in what is now the Czech Republic—investigated such questions by breeding pea plants. He fertilized some pea plants with pollen from their own flowers and others with pollen from the flowers of plants with different physical characteristics. In this way, Mendel controlled the physical characteristics of the parent plants, such as their seed shape (smooth or wrinkled) and flower color (purple or white). He could then observe what characteristics resulted in their offspring. For example, if a pea plant with purple flowers (whose parents all had purple flowers) is crossed with a pea plant with white flowers, the offspring will all have purple flowers.

Mendel's selective fertilization of pea plants illustrates a key feature of experiments. In an experiment, a researcher introduces specific changes to a system and observes the effects of those changes. The patterns in characteristics resulting from his selective breeding of pea plants led Mendel to posit units of heredity (now known to be genes) that determine variation in inherited characteristics according to set patterns across biological organisms, from pea plants to humans. In part, Mendel conjectured that some heredity units are dominant and others recessive; this accounts for why purple-flowered plants and white-flowered plants have purple-flowered offspring (Mendel, 1865/1996).

Figure 2.1 illustrates two crosses between pea plants, showing flower color (purple or white) and dominant or recessive heredity units, or genes (A and a, respectively; each plant has two). Flower color was observed from experiments; from these observations, Mendel postulated the genes shown here. The grid on the left shows that a purple-flowered pea plant with two dominant genes and a white-flowered pea plant with two recessive genes have offspring with entirely purple flowers. But, as the grid on the right shows, two purple-flowered pea plants bred in this way have one dominant and one recessive gene. Despite having entirely purple flowers, these plants have 25% offspring with white flowers.

    F1 generation (AA × aa)          F2 generation (Aa × Aa)

             A     A                          A     a
        a    Aa    Aa                    A    AA    Aa
        a    Aa    Aa                    a    Aa    aa

FIGURE 2.1 Illustrations of two crosses between pea plants, representing dominant and recessive genes for flower color

In experiments, as Mendel's work illustrates, scientists introduce specific changes to a system in order to make observations about how the system responds. In Chapter 1, we learned that data are public records produced by observation. Experiments are used to produce data of one kind or another. Experimental data can include various kinds of measurements, artifacts, signs, the location of some object, or even an object's absence. Mendel's data consisted of systematic records of the fertilization history and physical properties of each pea plant. For physicians, the results of blood tests and testimony about one's medical history can both count as data. Fossils, tracks, and recordings of the geochemical features of rocks can all count as data for a paleontologist; and for anthropologists, data may include monuments, pottery, and written documents.

Another concept from Chapter 1 that we need to structure this discussion is scientists' use of empirical evidence to justify their scientific beliefs. When a hypothesis is used to develop clear expectations for the outcome of some experiment, and data are gathered from the experiment that match or conflict with those expectations, then the experiment has produced empirical evidence for or against the hypothesis. The data collected by Mendel, for instance, turned out to be empirical evidence supporting the belief that inherited characteristics are caused by discrete units of heredity that come in pairs, one from each parent.

Often, it's not obvious what a hypothesis should lead one to expect to observe. Explicit expectations must be developed from a hypothesis before that hypothesis can be tested with empirical evidence. Before Mendel's experiments, most people believed that physical characteristics resulted from a blending of each parent's characteristics. This hypothesis would lead us to expect that offspring tend to have traits that are intermediate between the traits of their parents. So, for example, the offspring of purple-flowered pea plants and white-flowered pea plants should have light-purple flowers. Mendel's observations did not support this expectation; both crosses in Figure 2.1 illustrate this. But is flower color just an exception to a general pattern of blended inheritance?
Notice that this is a question about what the hypothesis of blended inheritance should lead us to expect. Many different experiments, on different traits, helped confirm that Mendel's hypothesis of hereditary units holds up more generally.

Scientific experiments are designed to be a particularly powerful way to test expectations against observations. Challenges stem from humans' tendency toward biased reasoning, including a tendency to observe what we want to be true, and from the difficulty of discerning what hypotheses should lead us to expect. Experiments offer two different approaches to overcoming such difficulties. In some experiments, typically performed in a laboratory, data are produced under tightly controlled conditions that are designed to make both expectations and observations as clear as possible. In other experiments, often performed outside a laboratory or 'in the field', scientists compare specific features of individuals in different, variable groups to make it easier to check observations against expectations. We will dig into the features of both styles of experiment in the next two sections.
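Mendel's dominant/recessive pattern can be checked with a short simulation: each offspring inherits one allele, chosen at random, from each parent's pair. This sketch is our own illustration, not from the text; the function name and offspring counts are hypothetical.

```python
import random

def cross(parent1, parent2, n_offspring=10000):
    """Simulate a Mendelian cross: each offspring receives one allele,
    chosen at random, from each parent's pair of alleles."""
    offspring = []
    for _ in range(n_offspring):
        allele1 = random.choice(parent1)
        allele2 = random.choice(parent2)
        # Sort so the dominant 'A' is listed before the recessive 'a'
        offspring.append("".join(sorted(allele1 + allele2)))
    return offspring

random.seed(0)

# F1: true-breeding purple (AA) crossed with true-breeding white (aa)
f1 = cross("AA", "aa")
print(set(f1))  # every F1 plant is Aa, so all flowers are purple

# F2: two F1 hybrids (Aa x Aa)
f2 = cross("Aa", "Aa")
white_fraction = sum(1 for plant in f2 if plant == "aa") / len(f2)
print(white_fraction)  # close to 0.25, matching the 25% white-flowered offspring
```

Running this shows how the 25% white-flowered F2 offspring in Figure 2.1 emerge from nothing more than random inheritance of paired units of heredity.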

Variables and Their Control

Experiments involve introducing specific changes to a system in order to make observations about how the system responds. At this point, introducing some terminology will help clarify this central feature of experiments. A variable is anything that can vary,


change, or occur in different values. For example, the number of books read during the past year, height, the flower color of a pea plant, and the temperature in your hometown are all variables. The value of a variable is just its state or quantity in some instance. For example, the flower color of a pea plant may have values like white, purple, or pink; and your hometown temperature might have the value 62° Fahrenheit one summer evening.

In experiments, there are three categories of variables: independent, dependent, and extraneous. An independent variable is a variable that stands alone, that is, whose values vary independently from the values of other variables in an experiment. When scientists introduce specific changes to a system in an experiment, they do so by changing the value of one or more independent variables. This is often called an intervention. A dependent variable is a variable whose change depends on another variable. When scientists change the value of an independent variable in some experiment, they do so in order to investigate how that change affects one or more dependent variables. For example, one might vary the amount of visible light (independent variable) in a factory or workspace and then look for changes in workers' productivity (dependent variable).

Experimental methods are designed to enable scientists to isolate the relationship between independent and dependent variables. This requires controlling background conditions, or extraneous variables, as much as circumstances allow. Extraneous variables are variables other than the independent variable that can influence the value of the dependent variable. If you're exploring the relationship between the amount of visible light in a factory (independent variable) and workers' productivity (dependent variable), then extraneous variables include noise levels in the factory, the heights of the workers, the amount of coffee workers drink daily, the country in which the factory is located, the weather, and so on. If extraneous variables are not taken into account, they, and not the independent variable, may be responsible for any changes in the dependent variable. Alternatively, extraneous variables may counteract the influence of the independent variable on the dependent variable. In these ways, extraneous variables can 'confound' the relationship between the independent and dependent variables. When they do so, they are known as confounding variables: extraneous variables that vary in ways that influence the value of the dependent variable in unanticipated ways. Confounding variables can interfere with the accuracy of the conclusions drawn from an experiment.

Imagine now that you want to investigate the relationship between the amount of visible light in a factory and workers' productivity. In particular, your hypothesis is that better lighting increases workers' productivity. To test this hypothesis, you could run an experiment by varying the amount of light (independent variable) and then looking for changes in workers' productivity (dependent variable). What are some ways you can think of to change the value of the independent variable? One option is to change the number of light fixtures in some workspace; another option is to wait for the time of year to change (there's more sunlight for longer hours in summer than in winter). A third is to compare two factories, one with better lighting than the other. One thing to consider when weighing these options is the possibility of confounding variables. Of these options, which introduces the fewest extraneous variables? Think about all the changes between summer and winter beyond the amount of light;


perhaps wearing scratchy wool clothing in the winter or the shorter winter days decrease work productivity, or perhaps summer heat decreases productivity. These are extraneous variables that could easily become confounding variables. Likewise, different factories can have many other differences between them beyond just the quality of lighting. Perhaps one pays a better hourly wage than the other, offers more vacation, or has free coffee in the break room. The best option seems to be to choose one workspace and then vary the number of light fixtures, while keeping all other conditions as uniform as possible.

You will want to measure and record the values of the independent variable (amount of light), so as to compare those values with the values observed for the dependent variable (workers' productivity). How could you measure worker productivity? Perhaps by the number of widgets produced in one hour? But what if the number of people who come to work on a given day varies? It's probably better to measure the number of widgets produced in one hour divided by, or averaged over, the number of workers. That takes into account, or controls for, the extraneous variable of number of workers.

The general point here is that the setup of the experiment, including its location and timing, how the independent variable is intervened upon, the type of measurements, and so forth, are all shaped by the need to minimize the possibility of confounding variables. This is the key to effective experimental design: an independent variable is varied in a controlled way, and the value of a dependent variable is measured, while keeping all extraneous variables fixed or taking them into account in some other way. (This is why we divided widgets produced by the number of workers present: to take into account any variation in the number of workers who showed up to work that day.)
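The per-worker measure just described amounts to a one-line calculation. The numbers below are hypothetical, purely to show why dividing by attendance matters: the day with the larger raw total is not necessarily the more productive one.

```python
def productivity_per_worker(widgets_produced, workers_present):
    """Control for the extraneous variable of attendance by averaging
    output over the number of workers who actually showed up."""
    if workers_present <= 0:
        raise ValueError("need at least one worker present")
    return widgets_produced / workers_present

# Raw totals alone would suggest the dimly lit day was more productive...
dim_day = productivity_per_worker(widgets_produced=400, workers_present=20)
bright_day = productivity_per_worker(widgets_produced=360, workers_present=15)

print(dim_day)     # 20.0 widgets per worker
print(bright_day)  # 24.0 widgets per worker: higher once attendance is controlled for
```

Here the brighter day produced fewer widgets in total only because fewer workers showed up; per worker, productivity was actually higher.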
In experiments involving human participants, one confounding variable can be the Hawthorne effect, or observer bias, where experimental participants change their behavior, perhaps unconsciously, in response to being observed. The name of this effect originates from a series of experiments from the late 1920s and early 1930s that were performed in a Chicago suburb at Western Electric's Hawthorne factory (Parsons, 1974).

Some of these experiments investigated the effects of lighting conditions on workers' productivity. Two groups of workers participated in the study. One group worked in an area where there was a dramatic increase in the quality of the lighting; for the other group, the lighting conditions remained just as before. Experimenters discovered that workers' productivity in the well-illuminated area increased significantly compared to the other group. This finding seemed to support their hypothesis that improved lighting increases productivity.

However, the experimenters were surprised to discover that workers' productivity also improved with changes to rest breaks, to working hours, and to the types of meals offered in the factory's canteen. As they experimented further, they found that even dimming the lights to the original levels increased productivity! This result undermined the initial findings about the relationship between the amount of light and productivity. The experimenters eventually concluded that the changes to the quality of illumination had no real impact on job productivity. As it turned out, workers became more productive simply when they knew they were being studied. This is, of course, the Hawthorne effect in action.


FIGURE 2.2 Western Electric's Hawthorne factory illumination study

The Hawthorne effect can be found in almost any experiment with human participants and can be a serious confounding variable. This is related to the observer-expectancy bias discussed in Chapter 1, which is when researchers’ expectations are themselves a confounding variable in an experiment. Fortunately, there are experimental methods that control for the extraneous variables of researchers’ and participants’ expectations; we’ll get to that topic later in the chapter.


What's the Stuff of Light?

Let's dig more deeply into the features of experiments by looking at how experimentation contributed to our knowledge of light. The light we ordinarily see is visible or 'white' light; what is not illuminated appears to us as shadow, darkness. What is the nature of light? Is it made of more basic matter, and if so, what? And is the light we see the only kind of that stuff there is? Intuitively, it's hard to imagine that light could be anything other than something visible to us.

The nature of light and its relation to the color spectrum visible in rainbows have been studied for millennia. In Chapter 1, we mentioned Ibn al-Haytham (Latinized as Alhazen), who, during the Persian Golden Age, made important contributions to the scientific understanding of vision, optics, and light. In his book Kitāb al-Manāẓir (Book


of Optics), he evaluated existing theories of light and vision, emphasizing that carefully designed experiments are a basis of our knowledge of the world. Through experiments using lenses and mirrors, Ibn al-Haytham showed that light travels in straight lines. From dissections, he began to explain how the eye works and synthesize the medical knowledge of previous scholars. In particular, Ibn al-Haytham demonstrated that light is not produced by the eye, as some theories had claimed, but instead that it enters the human eye from the outside. Once it was clear that light given off by objects enters the eyes, this raised new questions about the nature of light (Al-Khalili, 2015).

In the centuries following Ibn al-Haytham's breakthrough work, many other philosophers and scientists engaged with those questions. In the 17th century, influential natural philosophers thought that colored light was produced by the modification of white light by interactions with objects and the materials through which it travels. So, passing light through a glass prism was thought to produce a spectrum of colors because white light is modified by the impurities of the glass. Similarly, it was thought that we perceive colorful rainbows because sunlight is modified by going through drops of moisture.

Isaac Newton (1643–1727), one of the most influential scientists of all time, was not convinced by this view. Instead, he hypothesized that colors are always contained within the light itself and that passing light through materials just separates out the colors of which light is made. To test these competing hypotheses, Newton darkened his room and bored a small hole in the window shutters, so that only a thin beam of light could enter the room. When Newton placed a glass prism in the beam, the spectral colors—a rainbow of light—appeared on his wall. This observation was consistent with both hypotheses, however. Both the modification hypothesis and Newton's hypothesis that white light is a mixture of colors could explain the observation that a beam of light travelling through a glass prism produces a spectrum of colors.

In another experiment, Newton passed a beam of light through two prisms instead of one. What would you expect to observe if the modification hypothesis were true? Presumably, the impurities contained in the two glass prisms would continue to modify white sunlight and just spread out the color spectrum further. When Newton let the beam of light pass through the first prism, it split into a spectrum of colors as expected, just like in the previous experiment. But when the spectrum of colored light passed through

FIGURE 2.3 Isaac Newton's illustration of his two-prism experiment


the second prism, it recomposed back into white light! This observation was unexpected under the modification hypothesis, but it was consistent with Newton's thought that white light is composed of colors. So, this experiment provided Newton with evidence against the modification hypothesis and in support of his own hypothesis that passing light through a prism merely separates out what is already there.

While experiments can generate scientific knowledge, they also often prompt new questions. This was so for Ibn al-Haytham's finding that light does not originate in the eye and also for Newton's later prism experiments. Light isn't just something we see but also something we feel; surely you've noticed that ordinary sunlight is warm. Newton's finding that visible white light is actually a spectrum of colors prompted further questions. If light is a spectrum of colors, is it also a spectrum of temperatures? Or are different colors of light the same temperature as one another?

In 1800, the British astronomer William Herschel (1738–1822) used a telescope to observe sunspots, which are regions on the Sun that appear temporarily dark (Herschel, 1801). Observing sunspots is hazardous for the eyes, so he used colored glass filters to reduce the intensity of the rays. Herschel noticed that he could feel the Sun's heat coming through the filters. Different filters seemed to differ in temperature; but since the filters didn't differ in material, Herschel wondered whether the different colors of the filters might actually be responsible for the differences in temperature. Notice that this wasn't what Herschel had set out to investigate; sometimes experiments, or observations more generally, take us in unanticipated directions.

Herschel tested his hypothesis about a relationship between light's color and temperature by directing sunlight through a prism to spread the spectral colors, as Newton had. Then he measured each color—red, orange, yellow, green, blue, indigo, violet—with a mercury thermometer. He also measured the ambient temperature in the room in order to have a baseline temperature to compare with the temperature measurements of the light. This setup yielded data in the form of measured values of color (independent variable) and measured values of temperature (dependent variable), which could be used as evidence to evaluate the hypothesis that different colors of light differ also in temperature. The evidence confirmed this hypothesis: Herschel found that the temperatures increased incrementally from the 'cool' colors like blue to the 'warm' colors like orange.

Another of Herschel's observations introduced a new question about light. Herschel also measured the temperature of the air just beyond the beam of red light, outside the edge of the spectrum created by sunlight through the prism, where no light was visible. His hypothesis was that this temperature would be the same as the ambient temperature in the room, since it was beyond the edge of the light spectrum. To his surprise, the temperature at that location was much warmer than the ambient room temperature, even higher than any of the temperature measurements for the light spectrum. How could that be? Herschel's observation immediately led to a new hypothesis: some kind of invisible, hot light exists just beyond the red part of the visible spectrum. This hypothesis would explain the observation—anticipated by the French physicist Émilie du Châtelet (1706–1749) almost 65 years earlier—that the temperature continued to increase beyond the edge of red light. Later observations confirmed this hypothesis, and we now accept the existence of this hot, invisible light. It's called infrared light.


FIGURE 2.4 William Herschel's experimental setup to test the relationship between the color and temperature of light


FIGURE 2.5 Three scientists who contributed to our knowledge of light


Experimental Setup

Experiments have different aspects—physical, technological, and social—that need to fit together in the right way for scientists to harvest useful evidence; how these aspects are arranged is the experimental setup.

First, there are concrete, physical aspects. Experiments involve one or more subjects: humans, non-human animals, or inanimate objects. They also often include instruments: technological tools or other kinds of apparatus that help enable the experimental process. Newton and Herschel used telescopes, lenses, prisms, light filters, pencils, and notebooks to collect and analyze their data. Present-day experiments in high-energy physics at the European Organization for Nuclear Research, CERN, take place in the Large Hadron Collider. This is located in a tunnel on the border between France and Switzerland, and it is used to accelerate and collide subatomic particles. The Large Hadron Collider took 10 years to construct (1998–2008) and involved the collaboration of over 10,000 scientists and technicians from more than 100 countries and hundreds of universities and laboratories. With a circumference of 27 kilometers, it is currently the largest scientific instrument in the world. CERN experiments also require the use of powerful computers for data collection, analysis, and visualization of the myriad particles produced by collisions in the accelerator.

Experiments also occur in some place, over some period of time. Experiments can take place in laboratories located in universities and hospitals or in the field, that is, in natural settings like classrooms, subway stations, glaciers, coral reefs, nesting areas, and so on. Some experiments have a short duration; others can last many years. Herschel observed different temperatures related to different colored sun filters in one day, on February 11, 1800. Mendel's experiments with pea plants stretched over a seven-year period. Present-day experiments at CERN can take dozens of years, as do the experiments carried out in space by the US National Aeronautics and Space Administration (NASA).

Experiments are also normally carried out by one or more individual scientists. Collaborative experiments are common in contemporary science; this is one element of the social structure of science discussed in Chapter 1. Most collaborative experiments involve


scientists with different backgrounds who rely on one another's expertise. Experiments at CERN, for example, are highly collaborative, run by hundreds of scientists and engineers from all over the world, each of whom brings some specific expertise to bear. This is more extensive collaboration than is common across science, though, and some scientific experiments are still run by a single lab or even an individual. But even in those cases, communities of scientists, represented by scientific institutions and societies, determine protocols to be followed in experimental design and data analysis.

Another aspect of experimental setup is harder to discern but just as important to producing evidence. These are the background conditions, or extraneous variables. Consider Newton's prism experiments. The room at Trinity College, Cambridge, where he performed these experiments had a certain ambient lighting, temperature, and humidity. The angle at which sunlight hit the room's windows varied by time of day and season. Prisms, the instruments Newton used, were not commonly thought of as scientific instruments in the 1660s and so were sold simply for their entertainment value. As a result, they were irregular in both size and composition. These factors were all in the background of Newton's experiment. So, Newton needed to show that none of these background factors undermined his conclusion that apparently white sunlight contains distinct colors within it (Newton, 1671/1672).

As it happened, the Royal Society—the learned society for science of which Newton was a member—criticized his results on the basis of the condition of the prisms. The Royal Society suggested that, consistent with the earlier modification hypothesis, the prisms' bubbles, veins, and other impurities caused the light to become colored as it passed through. In general, managing background conditions is one of the most challenging issues of running experiments.


Collecting and Analyzing the Data

An experimental setup elicits data. Those data must be collected and analyzed in order to compare the experimental outcome with expectations. Collecting data involves gathering and often measuring information about the values of variables of interest at particular times, places, and contexts. Climate scientists collect data from things like glaciers, oceans, and the atmosphere—for example, glaciers' mass balance, sea surface temperatures, and the atmospheric pressure at sea level. The choice of an appropriate method for data collection depends on many factors, including one's research interests, the hypothesis under investigation, the variables of interest, and the available instruments. Any method should ensure that data are collected thoroughly and accurately—enough to provide evidence of the desired form and to enable replication.

Quite often, data collection involves one or more specialized instruments. This may sound odd, but the acceptance of instruments for data collection in science was not achieved without struggle. During the Scientific Revolution, a main challenge was to legitimize the data collected using glassware like prisms, telescopes, and microscopes, as well as scales, chronometrical devices, and other instruments. We saw earlier how this challenge factored into the reception of Newton's findings (Schaffer, 1989).

Potochnik, Angela, et al. Recipes for Science : An Introduction to Scientific Methods and Reasoning, Taylor & Francis Group, 2018. ProQuest Ebook Central, http://ebookcentral.proquest.com/lib/purdue/detail.action?docID=5584122. Created from purdue on 2021-08-25 02:23:11.


Experiments and Studies

57

While there's no longer any question that instruments in general can play an essential role in data collection, questions about the reliability of specific instruments still arise. No scientific instrument is free from error. For example, in 2017, scientists at the National Institute of Standards and Technology (NIST) used a Kibble balance—an instrument for making extremely accurate measurements of the weight of an object—to determine the most precise value yet of the Planck constant, an important quantity in quantum physics named after the German physicist and Nobel Prize winner Max Planck (1858–1947). But even after more than 10,000 measurements, those scientists were still left with uncertainty about the exact value of the Planck constant, partly because of the error involved in any measurement (Haddad et al., 2017). (The value of the Planck constant is about 6.626069934 × 10−34 Joule · second, in case you were wondering.) Such measurement error is an inherent part of data collection. Ultimately, the best that scientists can do is to avoid systematic measurement error by continually calibrating instruments. Calibration involves comparing the measurements of one instrument (for example, an electronic ear thermometer) with those of another (for example, a mercury thermometer) to check the instrument's accuracy so it can be adjusted if needed.

Different types of data can be analyzed in different ways. One basic distinction is that data can be either quantitative or qualitative. Quantitative data are in a form—often numerical—that makes them easily comparable. Climate science data, for example, are often quantitative. They are recorded as arrays of numbers, numerical indices, and symbols that correspond to measurable physical quantities. Such quantitative data can be used for statistical analysis (see Chapters 5 and 6) and computer simulation (see Chapter 3). Qualitative data consist of information in non-numerical form.
This information can be obtained, for example, from diary accounts, unstructured interviews, and observations of animal behavior. Analysis of qualitative data is often less straightforward than quantitative analysis. It requires accurate description of subjects' responses and behavior, trustworthy informants, and significant background knowledge. We will say more about qualitative research in Section 2.3.

In experiments with human subjects in the social, cognitive, and behavioral sciences, data collection often involves questionnaires that create quantitative, numerical data from qualitative information. These questionnaires may include multiple-choice questions and scales of various kinds. For example, standardized tests like the SAT, used for admission decisions to colleges and universities in North America, are considered predictors of student performance. Student performance varies along multiple dimensions, but the SAT and similar tests boil this down to a single score for each test taker that is relative to other test takers' performance. Other questionnaires provide quantitative data about personality traits, political opinions, attitudes toward some topic or group of people, and so on.

Questionnaires can be a very useful form of data collection, but good questionnaire design is vital for collecting reliable data. This is like the need for instrument calibration described earlier. And, for questionnaires, effective design and calibration can be surprisingly difficult. A poorly designed question can prime subjects to answer in a certain way—often, because of the observer-expectancy effect, the way the experimenter expects or desires them to answer. Questions can also be vague or ambiguous, eliciting different kinds of responses from different people, or they can unintentionally ask about more than one thing at once. Frankly, there are many ways to go wrong, and so there are many more poorly designed surveys out there than well-designed ones. Poorly designed questionnaires can result in data that are too weak to count as evidence or to support inferences, or that are otherwise useless because they cannot be analyzed in the intended way.
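The idea of turning qualitative responses into quantitative data can be made concrete with a small sketch of questionnaire scoring on a 5-point Likert scale. The scale labels, the items, and the reverse-coded question are hypothetical illustrations, not details from the text.

```python
# Hypothetical 5-point Likert scale mapping verbal responses to numbers,
# turning qualitative answers into quantitative data.
SCALE = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

def score(responses, reverse_coded=()):
    """Total score for one subject.

    Reverse-coded items (questions worded in the opposite direction)
    have their scale flipped, a common guard against response bias.
    """
    total = 0
    for i, answer in enumerate(responses):
        value = SCALE[answer]
        if i in reverse_coded:
            value = 6 - value  # flip 1<->5, 2<->4
        total += value
    return total

answers = ["agree", "strongly agree", "disagree"]
print(score(answers))                     # 4 + 5 + 2 = 11
print(score(answers, reverse_coded={2}))  # 4 + 5 + 4 = 13
```

Even in this tiny example, design choices (the wording of items, which items are reverse-coded) determine what the resulting numbers mean, which is why questionnaire design matters as much as instrument calibration.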


Crucial Experiments and Repeat Experiments

Think back to Newton's two-prism experiments. Newton thought these experiments proved that white light is a mixture of colors, that it is not modified by a prism to become colored. Newton called this a crucial experiment (experimentum crucis), which is an experiment that decisively adjudicates between two hypotheses, settling once and for all which is true. Such decisive experiments are exciting, but very few experiments are actually crucial experiments. One reason why relates to extraneous variables. We have said that controlling extraneous variables, or background conditions, is important to an experiment's ability to provide good evidence. But it's virtually, if not entirely, impossible to control all background conditions. Some minor background condition assumed to be irrelevant might turn out to be a confounding variable, invalidating the experimental result.

Even if an experiment could completely control all extraneous variables, this still might not be enough to guarantee the experimental result is correct. Data may match the expectations, or may fail to do so, for unexpected reasons. Some other, unknown phenomenon that hasn't yet been investigated might actually turn out to be responsible for the experimental result; some other hypothesis might turn out to be true instead. This is called the underdetermination of hypotheses by data: the evidence is not sufficient to determine which of multiple hypotheses is true. Some think that every hypothesis is always underdetermined by the data, that there is always some hypothesis (perhaps not yet known) that is also consistent with all the data, no matter how much is collected.

Here is an illustration of underdetermination. Suppose you want to test the hypothesis that playing violent video games causes violent behavior. If this hypothesis is true, then you should expect that more time spent playing violent video games is linked to more instances of violent behavior.
But this is also what you should expect if those people who are already prone to violence tend to play violent video games more often than other people, or if people's tendencies to be violent and to play violent video games are both caused by some other factor, such as a personality disorder or parental neglect. There are experiments that can determine which of these three possibilities is right. For instance, you might assign people to play different amounts of violent video games (intervening on the independent variable of violent-video-game playing) and then record their level of violence. If you observe increased violent behavior, then the intervention—the violent-video-game playing—is responsible for it. But is it the violent video games, or would playing any video games at all result in more violent behavior? A new experiment is called for. If it's the violent video games, is it a particular form of violence or any violent video games? These kinds of questions are always possible. Other untested hypotheses often lurk right around the corner. For that reason, few if any experiments are crucial experiments that decisively favor a given hypothesis.

There's also a problem with the idea that an experiment can definitively prove some hypothesis is wrong. An experiment to test some hypothesis involves a number of auxiliary assumptions—assumptions that need to be true in order for the data to have the intended relationship to the hypothesis under investigation. When data do not match expectations, this might be because the hypothesis is wrong, or it might be because one of the auxiliary assumptions is wrong. Perhaps your data collection instrument is miscalibrated, or your group of subjects is atypical, or there's some confounding variable you haven't predicted. So, whether the data from an experiment match your expectations or not, this is not truly decisive. One experiment can weigh in favor of or against some hypothesis, but it generally can't settle the matter once and for all.
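The video-game illustration can be made concrete with a small simulation. In the hypothetical data-generating model below, a latent trait drives both hours of play and violent incidents, and playing itself has no causal effect at all; yet the observational data still show a strong association. All numbers and the model itself are invented for illustration, which is exactly the point: two very different causal stories fit the same correlational data.

```python
import random

random.seed(0)

# Hypothetical model: a latent trait (say, aggressiveness) raises BOTH
# violent-video-game hours AND violent incidents; play has no causal
# effect on violence in this simulated world.
n = 1000
trait = [random.random() for _ in range(n)]
hours = [10 * t + random.gauss(0, 1) for t in trait]
violence = [5 * t + random.gauss(0, 1) for t in trait]

def corr(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

r = corr(hours, violence)
# A strong positive correlation, despite zero causal effect of play:
print(round(r, 2))
```

Observational data like these leave the causal hypothesis underdetermined; only an intervention on the independent variable, as described above, can separate the candidate explanations.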

Box 2.1 How Should Scientists Handle Underdetermination?


We have suggested that the underdetermination of hypotheses by the data is common or possibly unavoidable. How should scientists proceed in light of this? One response to the problem would be to suspend judgment about which hypothesis should be accepted. But suspending judgment just isn’t an option when we need to build a bridge or design an effective drug. One solution is to seek more evidence to help us decide between the hypotheses we’re most concerned with. Additionally, hypotheses and theories that fit with the existing data are sometimes more or less appealing in other regards. In the mid-16th century, both the Ptolemaic geocentric and Copernican heliocentric theories fit all of the existing cosmological data. But Copernicus’s theory was said to be more elegant and harmonious than Ptolemy’s. It certainly was simpler, as Ptolemy’s theory could accommodate some data only by introducing adjustments that complicated the theory tremendously. Considerations of this kind were part of the reason why Copernicus’s theory superseded Ptolemy’s. In other cases, one hypothesis might lead to more fruitful novel experiments, might fit better with other scientific findings, and so on. The general point is that underdetermination seems to be a circumstance in science where considerations beyond empirical evidence contribute to which hypotheses or theories scientists accept.

We've discussed three sources of uncertainty about what an experiment shows: extraneous variables, unanticipated hypotheses, and auxiliary assumptions. One of the primary ways to minimize uncertainty from these three sources is for experiments to be replicated. Replication involves performing the original experiment again—often with some modification to its design—in order to check whether the result remains the same. If, for example, the spectrum of light recombining into white light observed by Newton is also observed by different people, using different prisms, in different places and at different times, then this additionally supports Newton's hypothesis that white light contains a spectrum of colors. If some experimental result cannot be replicated—if different scientists follow similar experimental procedures but do not get the same result—then the original experimental result may be a fluke, or it may be due to some confounding variable in the experimental setup that the scientists haven't yet identified. The replicability of experiments is an indispensable ingredient of science, so much so that a persistent failure to replicate findings may undermine a scientific field's credibility. For example, we saw in Chapter 1 that astrology's failure to replicate findings is part of its pseudoscientific status. Recently, it has also been suggested that the field of social psychology faces a crisis in replicability, where different research groups have tried but failed to replicate some classic experimental results. This suggests we should perhaps not put too much stock in those findings, unless this failure in replicability is resolved (Pashler & Wagenmakers, 2012).

The difficulties in designing genuinely crucial experiments and the importance of replication fit with the idea that science is essentially a collaborative, social venture. Because of this, gaining scientific knowledge via experimentation is generally a more complicated and slower process than any single dramatic experiment can provide. Also, scientific knowledge can go in unexpected directions: a surprising finding that upends something we thought we understood might be right around the corner.
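The logic of replication can be sketched with a toy simulation: a genuine effect keeps showing up when an experiment is rerun, while a null effect only occasionally produces a striking result by chance. The sample sizes, effect sizes, and detection threshold below are arbitrary illustrative choices, not values from any real study.

```python
import random

random.seed(1)

def experiment(true_effect, n=30, noise=1.0):
    """One run: measured mean difference between treatment and control."""
    control = [random.gauss(0, noise) for _ in range(n)]
    treatment = [random.gauss(true_effect, noise) for _ in range(n)]
    return sum(treatment) / n - sum(control) / n

def replicates(true_effect, runs=200, threshold=0.5):
    """Fraction of repeated runs whose measured effect exceeds a threshold."""
    hits = sum(experiment(true_effect) > threshold for _ in range(runs))
    return hits / runs

# A real effect shows up in almost every rerun; in a world with no effect,
# only occasional flukes cross the threshold.
real = replicates(true_effect=1.0)
chance = replicates(true_effect=0.0)
print(real, chance)
```

A single striking result is thus compatible with both worlds; it is the pattern across repeated runs that distinguishes a robust finding from a fluke.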


Other Roles for Experiment

So far, we have focused on one central purpose of experiments: to test hypotheses by providing confirming or disconfirming evidence. But experiments play many other roles as well. Experiments can be used to evaluate whether scientific instruments like telescopes and prisms function as expected. For example, to persuade other members of the Royal Society that his hypothesis was true, Newton had to show that his prisms 'worked properly'. Many of his experimental trials were then aimed at testing how prisms with different shapes and composition affected the spectrum produced. The publication of his Opticks (1704/1998) described these trials and their results in detail. Supported by these extensive data and Newton's theory of colors, prisms became accepted scientific instruments.

We have already discussed calibration in this chapter. Any instrument for data collection must be calibrated using known measurements before it can be used in an experiment with uncertain results. For example, a thermometer must be shown to measure a known temperature accurately before we can trust its measurement of an unexpected temperature. Calibrating thermometers requires the establishment of 'fixed points', such as the boiling (100° Celsius) and freezing (0° Celsius) points of water, to create a meaningful temperature scale to apply across different thermometers. When some standardized scale is established, instruments can be used repeatedly and their measurements compared over time and across instruments. This body of measurement data might then be used to construct more stable measurement scales and more accurate instruments.

Brain-imaging techniques provide another illustration of using experiments to establish the function of and to calibrate an instrument for data collection. Functional magnetic resonance imaging (fMRI) machines track blood flow in the brain.
They do not directly measure neural activity, but that is what the scientists employing these machines want to assess. Neuroscientists use data about blood flow to reason about neural activity because they know that greater neural activity requires more energy, which requires increased metabolism, which uses more oxygen, and oxygen is delivered by blood flow. The expectation that blood flow provides a good proxy for neural activity is also confirmed by findings concerning brain metabolism and the relationship between different brain areas and functions.

Besides evaluating and calibrating instruments, experiments can be used to determine the value of physical constants, or quantities that are believed to be universal and unchanging over time. We mentioned Planck's constant earlier. Another physical constant is the speed of light in a vacuum. In the Opticks, Newton reported the calculations of the Danish astronomer Ole Rømer (1644–1710) regarding the speed of light. Rømer observed that there could be a difference of up to 1,000 seconds between the predicted and observed times of the eclipses of Jupiter's moons. Based on the estimated distance between Jupiter and the Earth, Rømer concluded that light travels at about 200,000 kilometers per second.

In 1849, the French physicist Hippolyte Fizeau ran the first major experiment to precisely determine the speed of light. Fizeau built an experimental apparatus in which an intense light source and a mirror were placed eight kilometers (about five miles) apart. He placed a rotating cogwheel between the light source and the mirror and increased the speed of the wheel until the reflection back from the mirror was obscured by the spinning cogs. Based on the rotational speed of the wheel and the distance between the wheel and the mirror, Fizeau calculated that the speed of light is 313,000 kilometers per second. Rømer's estimate and Fizeau's later calculation were on the right track; today, we take the speed of light to be 299,792 kilometers per second.

A third role of experiments is exploratory. In this use, experimentation does not rely on existing theory and may not be aimed at testing a specific hypothesis. An exploratory experiment is used to gather data to suggest novel hypotheses or to assess whether a poorly understood phenomenon actually exists. Herschel's work on the relationship between heat and light, for example, did not rely on a particular theory or a hypothesis about the relationship. When, in the course of investigating sunspots, he discovered that red light has a greater heating effect, Herschel surmised that the light spectrum is made of both heat and colors.
This idea was on the right track, but it was not until James Clerk Maxwell's (1831–1879) theory of electromagnetic radiation that Herschel's observations could be adequately explained and his work vindicated.
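Fizeau's arithmetic can be reconstructed in a few lines. The apparatus parameters below (a 720-tooth wheel, a first eclipse of the returning light at about 12.6 rotations per second, and 8,633 meters to the mirror) are the commonly reported historical values; the text above does not state them, so treat them as assumptions.

```python
# Reconstruction of Fizeau's 1849 speed-of-light calculation.
teeth = 720                  # cogs on the wheel (assumed historical value)
rotations_per_second = 12.6  # rotation rate at first eclipse (assumed)
distance_m = 8_633           # wheel-to-mirror distance in meters (assumed)

# Light travels to the mirror and back (2 * distance) in the time it takes
# the wheel to advance half a tooth spacing, i.e. 1 / (2 * teeth) of a turn.
time_s = 1 / (2 * teeth * rotations_per_second)
speed_m_s = 2 * distance_m / time_s

print(round(speed_m_s / 1000))  # about 313,000 km/s, as the text reports
```

The calculation recovers Fizeau's figure of roughly 313,000 kilometers per second, somewhat above the modern value of 299,792 km/s.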

EXERCISES

2.1 Review the discussion of Newton's prism experiment. Identify the hypothesis under investigation, the independent variable, and the dependent variable, and describe the intervention.

2.2–2.5 The Anglo-Irish scientist Robert Boyle (1627–1691) used equipment like vacuum chambers, air pumps, and glass tubes in his experiments. With the assistance of Robert Hooke, Boyle conducted a series of experiments in the 1660s to ascertain how the pressure and volume of the air vary when the air is either 'compressed or dilated'. He used a J-shaped glass tube. The tube was closed off at the short end, and the long end was left open. By adding mercury in the longer end, Boyle could trap air in the curved end of the tube; by changing the amount of mercury, he was also able to change the air pressure at the short end. Boyle repeated this experiment, measuring the volume of the air in the short end of the tube at a range of pressures. What he discovered was that, as he increased the pressure on the air, the volume of the air would decrease. Boyle's formulation of this relationship would become the first gas law, now known as Boyle's law.

2.2 What was the hypothesis under investigation? Use that hypothesis to identify the independent variable and the dependent variable. What evidence was gained from this experiment?

2.3 Make a list of 10 extraneous variables in Boyle's experiment. Put a star next to any variables that you think might have been confounding variables, and say why. Try to do this for at least two variables on your list.

2.4 Think of an alternative hypothesis that could account for the results of Boyle's experiment. State that hypothesis, and describe how it could account for the data.

2.5 Define calibration, and describe how it was involved in Boyle's experiments.

2.6 Describe three features of experiments that are particularly valuable to testing hypotheses, and describe the value of each of those features.

2.7 What is the relationship between extraneous variables and confounding variables? Why are experiments designed to limit confounding variables?

2.8 List the three kinds of sources of uncertainty regarding what a given experiment shows. Describe each one, and give an example of each.

2.9 Describe the problem of underdetermination, and discuss how scientists deal with it.

2.10 Briefly describe three roles for experiments other than testing hypotheses, and give an example of each. Then discuss how each of these might relate indirectly to testing hypotheses.

2.11 Before Ibn al-Haytham's work, some thought that vision involved light shining out of the eye, coming into contact with objects, and thereby making them visible. This was known as the emission theory of vision.

Describe an experiment that would test the emission theory of vision. What would you expect to observe in that experiment if the emission theory were true? Finally, list the auxiliary assumptions you would need to make in order for the emission theory to generate those expectations.

2.12 Ibn al-Haytham set up the following experiment to test the emission theory of vision. He stood in a dark room with a small hole in one wall. Outside of the room, he hung two lanterns at different heights. He found that the light from each lantern illuminated a different spot in the room. For each, there was a straight line between the lighted spot, the hole in the wall, and one of the lanterns. Covering a lantern caused the spot it illuminated to darken, and exposing the lantern caused the spot to reappear.

a. What data were produced by this experiment?
b. How do the data provide evidence against the emission theory?
c. Describe one way in which the emission theory might be adapted to account for the data (but still remain an emission theory of vision).
d. Describe one new hypothesis you can formulate based on the results of Ibn al-Haytham's experiment.

2.2 THE PERFECTLY CONTROLLED EXPERIMENT

After reading this section, you should be able to do the following:

• Identify the features of a perfectly controlled experiment and characterize the importance of each
• Describe the difference between direct and indirect variable control
• Describe the steps to conducting a perfectly controlled experiment of a given hypothesis

The Perfect Experiment

In a perfectly controlled experiment, experimenters perform an appropriate intervention on an independent variable and then measure the effect of this intervention on the dependent variable. All extraneous variables are fully controlled, so no confounding variables are possible. Any change in the behavior of the system thus must be due to the experimenters' intervention. This doesn't eliminate the possibility that some unknown hypothesis also accounts for the data or that some auxiliary assumption was wrong, but it does eliminate the possibility that some confounding variable interfered with the effect.

Such an experiment is simple to describe, but in fact no experiment is perfect. It's very difficult to get even close to this ideal in practice. But a great way to shed light on important elements of experiment design is to consider the ideal of the perfectly controlled experiment. We'll start the discussion with a step that didn't even make it into the brief characterization of the perfectly controlled experiment just given: defining expectations. We'll then discuss intervention, variable control, and controlling for bias. Our description of the perfectly controlled experiment leaves out the later stages of data collection and analysis. We gave an overview of those stages in Section 2.1, and Chapter 6 discusses the statistical analysis of data in depth.


Defining Expectations

To test a hypothesis with an experiment, an important first step is to articulate what the hypothesis would lead you to expect for the outcome of the experiment. Those expectations are predictions of the results of some intervention if the hypothesis in question is true. The expectations might also be informed by background knowledge or some general accepted theory. Ideally, expectations are clearly and precisely defined in advance in a way that makes them easily comparable to the data the experiment will produce. This is important for controlling the extraneous variable of experimenters' beliefs, which otherwise may influence their perceptions of the experimental results (recall from Chapter 1 the power of confirmation bias).

Suppose that your knowledge of Sigmund Freud's psychoanalytic theory leads you to form a hypothesis about someone's personality. Perhaps you wonder whether your friend Philippe's fear of horses is due to an Oedipus complex—that is, to Philippe's unconscious and suppressed desire for his mother. How might you test this? Freudian psychoanalysis involves interesting ideas, but they're just too imprecise and intractable to use as the basis for the formulation of clear predictions. Since psychoanalysis does not yield any clear predictions in advance, it seems to be open only to confirmation, not falsification or disconfirmation. This recalls the discussion of pseudosciences, like astrology and homeopathy, from Chapter 1.


Contrast this with Albert Einstein's theory of general relativity. This theory revolutionized our understanding of space and time. While Newton believed that space is a sort of absolute stage on which events unfold, Einstein conceived of space and time as a single interwoven manifold, a fabric of sorts. For Newton, gravity was a force; Einstein instead explained gravity as the curvature of the space-time manifold. Just as marbles placed on a fabric sheet held in the air bend the sheet around them, massive objects like the Sun warp space-time in their vicinity. This is why other objects accelerate toward those massive objects.

Unlike Freud's psychoanalytic theory, Einstein's theory of general relativity generates clear expectations. One of these expectations is that light, just like any other form of matter, is affected by gravity. If a beam of starlight passes near the Sun, then it should be deflected, or bend, toward the Sun. The beam's deflection can be measured as the angle between where we actually see the star and where we would expect to see the star if the beam of light had travelled in a straight path. Einstein's theory also provides us with a precise prediction of this angle.

This prediction could first be tested a few years after Einstein completed his theory in 1915. On May 29, 1919, when a total solar eclipse blocked out the dazzling light of the Sun, a group of scientists led by English astronomer Arthur Eddington took photographs of stars visible near the dimmed Sun. They compared these to other photographs taken at night, when the light of those same stars did not pass close to the Sun before reaching Earth. From this comparison, Eddington was able to test, and confirm, Einstein's prediction of the light's deflection. The Sun changed the path of nearby starlight as the theory of general relativity predicted, providing confirmation of that theory (Dyson, Eddington, & Davidson, 1920).
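The precision of Einstein's prediction can be illustrated with a back-of-the-envelope calculation. General relativity predicts a deflection of 4GM/(c²R) for starlight grazing the Sun's edge; the physical constants below are standard reference values, supplied here as assumptions rather than taken from the text.

```python
import math

# Einstein's predicted deflection for starlight grazing the Sun:
# theta = 4 * G * M / (c**2 * R). Constants are standard reference
# values (assumed, not given in the text).
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
M_sun = 1.989e30   # solar mass, kg
c = 2.998e8        # speed of light, m/s
R_sun = 6.963e8    # solar radius, m

theta_rad = 4 * G * M_sun / (c**2 * R_sun)
theta_arcsec = math.degrees(theta_rad) * 3600

print(round(theta_arcsec, 2))  # about 1.75 arcseconds
```

A deflection of about 1.75 arcseconds is a tiny but definite angle, which is why Eddington needed photographs of star positions taken during an eclipse to test it.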
When the press reported that a key prediction of Einstein's theory had been borne out by observation, Einstein became a famous public figure.

Here's another example of a clear and precise expectation based on a hypothesis. This example comes from game theory, which is a broad framework for thinking about conflict and co-operation among strategic decision-makers. Imagine you are given $10. You're asked to share this sum with a partner, and you and your partner must agree about how to divide it. You can propose a division of the $10, and your partner can accept or reject that offer. If your partner rejects your proposed division, neither of you will get any money; if your partner accepts your offer, you'll each get your agreed-upon share of the money. What would you do?

Based on standard game theory, if everyone acts in their own self-interest, one would expect that proposers in this situation will offer close to nothing to their partners and that responders will accept anything more than $0. For responders, it's rational to accept anything, since otherwise they'll get nothing. And proposers know this, so it's rational for them to offer only a small amount. This expectation has been experimentally tested time and again, and it turns out to be wrong (Güth, Schmittberger, & Schwarze, 1982). The average offers are around 40 to 50% of the total sum, that is, about $4 or $5 when $10 is being divided. And when proposers offer less than 30%, responders consistently reject the offer, deeming it unfair, even though this results in them getting no money at all. The proposers and responders were on the same page, apparently willing to sacrifice self-interest for fairness. This was not at all what standard game theory predicted.
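The contrast between the game-theoretic expectation and the observed behavior can be sketched in a few lines. Modeling responders by a single 'fairness threshold' is our own simplification: a threshold of zero represents the purely self-interested responder of standard game theory, while a threshold of 30% mimics the empirical finding that lower offers are rejected.

```python
def responder_accepts(offer, pot, fairness_threshold=0.0):
    """A responder accepts any offer above their fairness threshold.

    threshold=0.0 models the 'rational' responder of standard game theory;
    threshold=0.3 mimics the observed rejection of offers under ~30%.
    """
    return offer > pot * fairness_threshold

def payoff(offer, pot, fairness_threshold):
    """Payoffs (proposer, responder) for a proposed split of the pot."""
    if responder_accepts(offer, pot, fairness_threshold):
        return pot - offer, offer
    return 0, 0

pot = 10
# Against a purely self-interested responder, a $1 offer is accepted:
print(payoff(1, pot, 0.0))  # (9, 1)
# Against a fairness-minded responder, the same offer is rejected:
print(payoff(1, pot, 0.3))  # (0, 0)
# A 40% offer is accepted by both kinds of responder:
print(payoff(4, pot, 0.3))  # (6, 4)
```

The point of the sketch is the precision of the prediction: standard game theory says low offers should be accepted, so observing consistent rejections is a clear disconfirmation.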


FIGURE 2.6 Headlines reporting on Arthur Eddington’s observations during the 1919 eclipse, which confirmed Albert Einstein’s theory of general relativity

As both this example and the one before illustrate, scientists' hypotheses and theories often involve concepts and variables that we don't have an obvious way to test. This is a stumbling block in formulating clear expectations for an experiment. How can you measure the values of variables like wealth, violence, mood, and fairness? To manage this difficulty, scientists often use operational definitions and clusters of indicators to characterize fuzzy concepts in a way that allows for measurement. An operational definition is a specification of the conditions when some concept applies, enabling measurement or other kinds of precision. In the game theory example above, we might operationally define fair offers to include any offer of 40–60% of the money. This definition is clear and useful, not because it correctly states the nature of fairness, but because it offers a precise way to proceed with testing.

Operational definitions can lack nuance. Wealth, for example, means more than simply having a high income or a parent who is a high earner. This is why economists often study poverty and wealth using a combination of indicators such as yearly income, access to education and health care, and permanent housing. Such cluster indicators identify several markers of some variable in order to more precisely measure it while not oversimplifying it.
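Both devices can be written down directly. The 40–60% band for fair offers comes from the text; the wealth indicators and the income threshold below are invented purely for illustration.

```python
def is_fair_offer(offer, pot):
    """Operational definition: an offer counts as 'fair' iff it is
    40-60% of the total pot (the text's working definition)."""
    share = offer / pot
    return 0.40 <= share <= 0.60

def wealth_indicators(income, has_degree, stable_housing):
    """Cluster of indicators: count how many markers of wealth apply.

    Combining several markers avoids collapsing a fuzzy concept into a
    single oversimplified number. Thresholds here are invented.
    """
    return sum([income > 100_000, has_degree, stable_housing])

print(is_fair_offer(5, 10))   # True: a 50% split meets the definition
print(is_fair_offer(2, 10))   # False: 20% falls outside it
print(wealth_indicators(120_000, True, False))  # 2 of 3 markers apply
```

Neither function claims to capture the nature of fairness or wealth; each simply makes the concept precise enough to measure, which is all an operational definition is meant to do.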


For some concepts, there simply is no single best definition or measure. There is, accordingly, a choice in how to operationally define a concept or which cluster indicators to use. Still, some definitions are better than others. Some definitions or sets of indicators get closer than others to capturing what we have in mind for, say, a fair deal or being wealthy. Our theories of the phenomena under investigation regularly inform how we define concepts. For example, some definitions may be shown to specify the nature of poverty more accurately than others because they accord better with our best economic and sociological theories or because they have been shown to predict future events more consistently across studies.


Intervention

An experimental intervention is the centerpiece of a perfectly controlled experiment. Recall that an intervention is a direct manipulation of the value of a variable. Because of this intervention, that variable is called the independent variable. Interventions could include the administration of a drug to a group of patients, fertilizer to a plot of land, or deliberate changes in the lighting conditions in a workplace. During an experiment, scientists deliberately intervene on the independent variable and then measure the impact of their intervention on the dependent variable.

In an agricultural experiment, for example, scientists may assess the hypothesis that a particular fertilizer is better for crop yield. Their intervention would consist in changing the value of the variable of interest: the type of fertilizer. In particular, they would change the fertilizer to the particular fertilizer the hypothesis predicts is better. They then would watch for changes in crop yield, the dependent variable. The expectation based on the hypothesis is that crop yield will increase; measuring the value of the dependent variable, crop yield, is a way to assess this hypothesis.

There are many different ways to perform experimental interventions. But ideally, scientists want interventions to be 'surgical'. This metaphor suggests interventions should be made with the precision that surgeons bring to the operating table; the incision should be carefully made at the exact location that will bring about the desired effect. If an intervention is surgical in this sense, it affects only the independent variable. Any change in the value of the dependent variable can then be traced back to the independent variable's influence. A surgical intervention on the type of fertilizer will simply switch out the old fertilizer for a new kind.
Everything else should remain the same: when and how frequently the fertilizer is applied, the method used to apply it, the location of the field, the crop, the growing time, and so forth.

Some interventions cannot be performed for ethical or practical reasons. For example, it obviously would be unethical to subject healthy individuals to major brain damage or to diseases like syphilis in order to study their consequences. In other cases, experimental intervention is impractical. Suppose you want to find out how the distance of the Moon from the Earth influences the motion of the tides. The most direct way would be to intervene on the distance of the Moon from the Earth and see how the tides change in response, but no scientist can currently alter the orbit of the Moon. And even if this were possible, the change would not be surgical. Altering the orbit of the Moon would almost certainly have other effects on Earth, and these changes may in turn have an effect on the tides. In such cases, when
ethical or practical considerations prevent a surgical intervention, scientists look for ways to approximate a desired intervention. Different experimental and non-experimental approaches do this in different ways, as we will see in the next section.


Controlling Variables

In 'surgical' interventions, conditions are created in which no variables, other than the independent variable and the dependent variable, change when an intervention is performed. So, another key feature of a perfect experiment is the full control of all extraneous variables. Full variable control is exceedingly difficult to accomplish. There are always countless extraneous variables in an experiment, many of which scientists don't fully understand or aren't even aware of. All of those extraneous variables need to be controlled in order to avoid confounding variables, but it's hard to control what you have not identified!

Control over variables can be approached in a number of ways. These can be divided into two broad categories: direct and indirect. Direct variable control is when all extraneous variables are held at constant values during an intervention. Because the extraneous variables are unchanging, they cannot be responsible for any changes to the dependent variable. So, if direct variable control is successful, only the intervention can be responsible for a change in the dependent variable.

Recall Newton's prism experiments. Newton could directly control some extraneous variables, like the time of day at which he ran his experiments and the lighting conditions in his chambers. Keeping those variables constant ensured that, for example, any difference in the composition of morning and afternoon light didn't affect his findings. Newton also attempted to control for the confounding influences of air bubbles and other impurities in the prisms by using higher-quality prisms. The carefully arranged conditions in today's laboratories help scientists to directly control many variables. Temperature, cleanliness, lighting, noise, instructions to human subjects—all of these factors and more are extraneous variables, and all should be held fixed during an experiment.
Consider again experiments conducted with the Large Hadron Collider at CERN, the world's largest laboratory. One important independent variable is the proton-proton collision. Dependent variables, which are measured and analyzed by scientists at CERN, are features of the by-products of these collisions. During experiments, scientists use sophisticated technologies to keep many variables under direct control, such as the magnetic fields and temperature in the collider.

In many experiments, however, direct control of all extraneous variables is simply not possible. As we have seen, scientists often don't even know all the extraneous variables that may be relevant. The second category of variable control, indirect variable control, helps with this. The basic idea is to allow extraneous variables to vary in a way that is independent of the intervention. Then, although extraneous variables will vary, they should vary in a way that is the same for the different values of the independent variable. Any systematic differences in the dependent variable between different values of the independent variable can then be reasonably attributed to the independent variable.

The first step to indirect variable control is to set up two groups of experimental entities (whether cells, plots of land, people, mice, or other subjects) to compare. The intervention should be the only thing that distinguishes these groups from one another. One group,
the experimental group, receives the intervention to the independent variable. The other group, the control group, experiences the default or other value(s) of the independent variable. And then, some approach is used to try to ensure that all extraneous variables affect the two groups equally.

One approach to indirect variable control is randomization: the indiscriminate assignment of experimental entities to either the experimental group or the control group. Some method of group assignment is adopted so that no features of the experimental entities can be taken into account, even unconsciously, in determining group membership. This is meant to ensure that any differences among the experimental entities vary randomly across groups and thus bear no relation to the systematic difference between groups, the intervention. Many scientists believe randomization is the gold standard of indirect variable control.

Randomization is one of the best approaches to indirect variable control, but it's not a surefire guarantee. It could happen that all patients with some characteristic—say, all smokers—are randomly assigned to the experimental group, while all nonsmokers are randomly assigned to the control group. In an experiment designed to test, say, the effect of exercise on health, whether people smoke is surely a significant confounding variable. This example is extreme, but there is a much more general point behind it. Random group assignment guarantees that extraneous variables are not related to group assignment, but it does not guarantee that extraneous variables do in fact vary equally across the two groups. Even with random assignment, the experimental and control groups may still differ from one another in ways other than the intervention. For this reason, there's another condition that must be met for randomization to be an effective approach to indirect variable control: the sample size must be sufficiently large.
Sample size refers to the number of individual sources of data in a study; often, this is simply the number of experimental entities or subjects. If the sample size is very small, chance variations between randomly assigned experimental and control groups are likely. If the sample size is very large, such chance variation is exceedingly unlikely, so unlikely that these variables can be considered effectively controlled.

Imagine an experiment that involves only four people, two of whom are smokers. It is reasonably likely that both smokers will be randomly assigned to one group. Indeed, this would happen one out of every three times they are randomly assigned to groups. Now think about all of the variables among those four people: age, gender, medical history, education level, and so on. It's all but guaranteed that at least some of these extraneous variables will be randomly distributed unevenly between the experimental and control groups, becoming confounding variables. Imagine, in contrast, an experiment with a sample size of 10,000 people, roughly half of whom smoke. It is exceedingly unlikely—so unlikely as to be virtually impossible—that all smokers would be randomly assigned to one group. More generally, a large sample size helps to make sure that variation in all extraneous variables is more or less equally distributed across randomly assigned groups.
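The one-in-three figure for the four-person example can be checked with a short simulation. This sketch is ours, not the book's: it randomly splits a pool of people into two equal groups many times and counts how often all the smokers land in the same group.

```python
import random

def both_smokers_together(n_people, n_smokers, trials=100_000, seed=0):
    """Estimate how often all smokers end up in the same group when
    n_people are randomly split into two equal-sized groups."""
    rng = random.Random(seed)
    people = ['smoker'] * n_smokers + ['nonsmoker'] * (n_people - n_smokers)
    together = 0
    for _ in range(trials):
        rng.shuffle(people)
        group_a = people[: n_people // 2]
        smokers_in_a = group_a.count('smoker')
        if smokers_in_a in (0, n_smokers):  # all smokers in one group
            together += 1
    return together / trials

# Four people, two smokers: both smokers share a group about 1/3 of the time.
print(both_smokers_together(4, 2))  # ≈ 0.33
```

Running the same function with a much larger pool, say 10,000 people with half of them smokers, yields a value near zero, matching the text's point that large samples make such lopsided assignments virtually impossible.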

Controlling for Bias

An important set of extraneous variables that must be controlled are human expectations. As we saw with the Hawthorne effect and observer bias in Section 2.1, human experimental subjects' expectations or desires can confound the results of an experiment. Likewise,
scientists often harbor background beliefs or even specific expectations about the outcome of an experiment. These are also extraneous variables that can readily become confounding variables; recall from Chapter 1 the power of confirmation bias. The strategies of direct and indirect variable control that we have discussed so far don't help with these kinds of extraneous variables. Recall the example of investigating the effects of some exercise regime on health and, in particular, how randomization and a sufficiently large sample size control for the extraneous variable of cigarette smoking (and many others as well). If the researchers administering the tests used to evaluate health (the dependent variable) know whether the subjects they are testing exercised or not, then this knowledge and their expectations regarding the effects of exercise might subtly influence their evaluation of subjects' health. Randomization and large sample size are no help here.

To control for potential researcher bias, scientists sometimes design their experiments so that not even they know which subjects are in the control group and which are in the experimental group. This protocol is called a blind experiment. In the exercise/health experiment, assignment to groups should be not only random but also blind; researchers shouldn't know which subjects are in which group. Then, when they test a subject's health, their expectations regarding the effects of exercise can't influence their judgments of that individual's health, since they won't know whether the subject has exercised or not.

With a blind experimental setup, the researchers' expectations cannot influence the findings, but the expectations of the experimental subjects might. Imagine you're assigned to the experimental group, and you dutifully exercise as assigned.
You might be motivated to work extra hard on the assessment of your health, or your expectation of good health might decrease your blood pressure, or there may be some other unintended influence on your health because of your expectation of the exercise's effects. You might also simply want to please the researchers by helping show they are right about the value of exercise. This possibility is eliminated if both researchers and subjects are unaware of which subjects are in which group. This is called a double-blind experiment.

Double-blind experiments are especially important for drug trials that test new medicines. If participants or experimenters expect a particular medicine to be effective, then that expectation can directly lead to improved health. This is called the placebo effect. For this reason, it's important that neither experimenters nor experimental participants know which participants receive the medicine being tested. The control group receives a placebo, an inert substance or therapy. This way, no participants can discern whether they are receiving the real medicine, and they will be equally subject to the placebo effect. (This is, then, indirect control of the extraneous variable of the placebo effect.)

Another way to control for participants' expectations is with deception. Whereas blinding involves omitting some piece of information, deception involves actively misinforming participants to interfere with how their expectations influence their behavior. The American social psychologist Stanley Milgram (1933–1984) often used deception in his experiments. For instance, Milgram wanted to understand people's willingness to obey an authority figure who instructed them to inflict serious harm on others. It probably wouldn't have worked to tell the experimental participants that this was what was being tested. Few of us want to be viewed as inflicting harm on others just because someone in power told us to! So, Milgram falsely told participants that
they were helping another person learn some material by quizzing the other person and delivering electric shocks to them to punish any incorrect answers. In reality, there was no other person learning, and no electric shock. The experimenters were simply studying how far participants would go in harming others merely because they were told to. (Ethics guidelines are more stringent now than they were then, and this study would probably not pass muster today.)
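Returning to the blinding protocols described above, here is a small sketch of how double-blind group coding might be set up. It is our illustration, not a protocol from the text: subjects are randomly split into balanced treatment and control groups, but everyone running the experiment sees only opaque codes; the key linking codes to groups is held by a third party until the data are collected.

```python
import random

def double_blind_assign(subject_ids, seed=0):
    """Randomly assign subjects to 'treatment' or 'control', returning
    (visible, key). Experimenters and subjects see only the codes in
    `visible`; `key` stays sealed until after data collection."""
    rng = random.Random(seed)
    ids = list(subject_ids)
    rng.shuffle(ids)                 # random, balanced group assignment
    half = len(ids) // 2
    # Randomly drawn codes carry no information about group membership.
    codes = [f"C{n:04d}" for n in rng.sample(range(10_000), len(ids))]
    visible, key = {}, {}
    for i, (subject, code) in enumerate(zip(ids, codes)):
        key[code] = 'treatment' if i < half else 'control'
        visible[subject] = code      # all anyone sees during the trial
    return visible, key

# Hypothetical subjects; during the trial only codes like 'C4821' are used.
visible, key = double_blind_assign(['ana', 'ben', 'caleb', 'dina'])
print(visible)
```

Because neither researchers nor subjects can infer group membership from the codes, expectations on either side cannot systematically influence the measurements.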


Box 2.2 The Milgram Experiment

Milgram's experiment involved three roles: learner, experimenter, and teacher. Each subject waited in a lounge with another person whom they were led to believe was a second subject. In fact, the second person was a confederate—an actor pretending to be a subject—who was to play the role of learner. A third person—yet another confederate—played the role of authoritative experimenter. This person briefly and vaguely gave a contrived explanation of the experiment (not the real experiment). The supposed experimenter then pretended to randomly assign the other two individuals to play the roles of teacher and learner. In fact, the assignment was rigged; the naïve subject was always assigned to play the role of teacher, and the second 'subject'—Milgram's confederate—was always assigned to be the learner.

The experimenter accompanied both individuals into a staged laboratory setting, using heavy restraints to strap the learner into what appeared to be an electrified chair apparatus. To ensure that the naïve subject believed that the chair was actually operative, the experimenter delivered a real but mild shock. The experimenter then led the subject to a separate room with what appeared to be an electric shock generator. The machine had an instrument panel consisting of 30 horizontal switches, each labeled by voltage, or strength of electric current. The labeled voltages ranged from 15 to 450 volts. Switches were grouped into eight categories of shock: slight, moderate, strong, very strong, intense, extreme intensity, danger: severe shock, and, finally, just 'XXX'. When a switch was flipped, a red light turned on, an electric buzzing sounded, and the voltage meter fluctuated.

The experimenter then had the teacher (the naïve subject) administer a learning task of four word pairs, which the learner was supposed to learn.
The experimenter instructed the subject to flip a switch for each wrong answer, starting with 15-volt shocks and increasing with each error until the learner had learned all the pairs correctly. The dependent variable of the real experiment was the maximum shock subjects were willing to administer before refusing to continue.

What results do you think Milgram obtained? Out of, say, 100 subjects, how many do you think would have administered shocks up to the highest level when instructed to do so? In Milgram's first study, he found that, although many displayed deep discomfort at doing so, a full 65% of subjects administered the highest level of shock, marked 'XXX'.

TABLE 2.1 Elements of the perfectly controlled experiment

1. Expectations are clearly articulated; if needed, concepts are defined operationally or using cluster indicators.
2. An intervention is performed on the independent variable.
3. All other variables are controlled, either (a) directly, by holding all other features constant, or (b) indirectly, by comparing an experimental group to a control group, with randomization and large sample size.
4. The experiment is blind or double-blind, as appropriate, to control for bias.

EXERCISES

2.13 List all the features of a perfectly controlled experiment. For each, say what is important about that feature and what is challenging about accomplishing it.

2.14 Imagine you want to establish what effect, if any, taking notes on a laptop during class instead of on paper has on retention of information.
a. Specify your hypothesis regarding the note-taking medium and memory. What are your expectations for your experiment, given this hypothesis?
b. Describe your ideal experiment to test this hypothesis. Don't worry about how easy it would be to actually conduct the experiment or if it's even possible. Make sure to specify all the main features of the experiment.
c. Identify three major challenges to conducting the ideal experiment you have described. Say why each is a problem.


2.15 Philosophy majors tend to perform very well on all of the main entrance exams required by graduate programs and professional schools. They are the only major to score above average on all four of the following: the Graduate Management Admission Test (GMAT), the Law School Admissions Test (LSAT), the verbal portion of the Graduate Record Examination (GRE), and the quantitative portion of the GRE. Philosophy majors vie with physics majors each year for the best comprehensive GRE scores, and they have also had the highest average on the verbal portion of the GRE, the second highest on the GMAT (after mathematics), and the third highest on the LSAT (after physics and economics). Formulate three different hypotheses that are each compatible with these data. Choose one of the three hypotheses, and design an experiment that could test it. Make sure you specify the independent and dependent variables, the intervention, your expectations for the findings if the hypothesis is true, and how you will control for extraneous variables, including experimenter and subject bias.

2.16 We have discussed how Einstein's theory of general relativity generates the expectation that light, just like any other form of matter, is affected by gravity. This was surprising in the sense that it predicted certain events that had not been observed before.
a. Why are surprising expectations, or novel predictions, important for testing hypotheses?
b. How can surprising expectations, or novel predictions, be generated in sciences like archaeology and paleontology that study the past?

c. How can surprising expectations be generated about events that have already occurred or about data that scientists already have?

2.17 Suppose you want to test the hypothesis that baseball players who eat pizza every day hit more home runs. Let's suppose that to test this hypothesis, you want to divide the baseball players of some team into two groups that are balanced in all important background variables that can affect players' performance. The only difference you want between the two groups is that the members of one group eat pizza every day and the members of the other group do not. Rank the following four strategies from best to worst for accomplishing this goal:
1. Sit in the clubhouse after a game. The first players who enter the clubhouse are assigned to the group of pizza eaters (the experimental group), while the following players are assigned to the control group.
2. Allocate players born in the first six months of the year to the experimental group and players born in the second six months of the year to the control group.
3. For each player on the team, toss a coin. If the coin lands on heads, the player is in the experimental group; otherwise, the player is assigned to the control group.
4. Assign all players over 230 pounds to the experimental group and the rest of the players to the control group.
Justify each of your rankings by describing how well or poorly you expect each strategy will control the extraneous variables.

2.18 What is the purpose of having an experimental group and a control group in an experiment? How does division into two groups achieve this purpose?

2.19 Describe what randomization involves, why it can help to control for confounding variables, and what its limitations are.


2.20 Define direct variable control and indirect variable control. Then, describe (a) how each is accomplished and (b) the advantages and disadvantages of each approach.

2.21 The American Psychological Association (APA) code of ethics maintains that experimentation may not involve the use of deceptive techniques unless doing so has significant prospective scientific, educational, or applied value; that effective non-deceptive alternative procedures are not feasible; that participants are not deceived about research that is reasonably expected to cause physical pain or severe emotional distress; and that psychologists explain any experimental deception to participants as early as is feasible. Given these guidelines, think about Milgram's (1963) experiment, and answer these questions:
a. How were Milgram's experimental participants deceived?
b. Was deception necessary for this study? Why or why not?
c. Evaluate the importance of this research. In your view, did this work justify deception? Why or why not?

2.3 EXPERIMENTAL AND NON-EXPERIMENTAL METHODS

After reading this section, you should be able to do the following:

• Distinguish between lab and field experiments and identify the features of each
• Define external validity and internal validity and describe the importance of each
• Describe the main types of non-experimental design
• Assess the advantages and disadvantages of the various features of experiments and non-experimental studies


Variation from the Perfect Experiment

The perfectly controlled experiment may be the ideal way to test hypotheses, but such experiments are seldom if ever possible to perform. And even when a near-perfect experiment is possible, circumstances can favor other approaches. So, real experiments deviate from the ideal in a variety of interesting ways. The type and degree of variation are influenced by the kind of phenomena under investigation, the goal of the investigation, the nature of the hypothesis, what confounding variables are expected, and the types of experimental entities. When experiments cannot be performed, there are a variety of non-experimental methods of empirical investigation that may provide insight into phenomena of interest. We might call investigations that use these methods, generically, non-experimental studies. In this section, we describe a variety of experimental and non-experimental approaches used to acquire scientific knowledge, indicating some of the main advantages and disadvantages of each.

Non-experimental studies may be called for when performing the intervention needed to investigate a hypothesis experimentally is unethical, impractical, or downright impossible. Suppose you are investigating whether major childhood stress decreases life span. The relevant intervention, imposing on an experimental group of children distressing conditions like parental death, extreme poverty, or poor nutrition, would be morally repugnant.

Other interventions are impractical. Space exploration provides many straightforward examples. In 1975, two probes—Viking 1 and Viking 2—were launched to conduct experiments on Mars aiming to determine whether the chemical makeup of Mars's soil supports microbial life. A year after launch, the probes landed and conducted their experiments, but they returned negative or inconclusive results.
The cost of designing and constructing a new probe, the time needed to travel to Mars, and other limitations weighed against repeating the experiments. In the end, NASA's next successful Mars landing wouldn't come for another 20 years, with the Mars Pathfinder in 1997, and then another 15 years passed before the Mars Curiosity rover became operational (at a cost of $2.5 billion).

Finally, some interventions are literally impossible to conduct because of the laws of nature. Astrophysicists and cosmologists have long pondered the nature of black holes, which have such strong gravitational fields that they bend the surrounding space-time so that all light and matter spiral inescapably into them. No one can possibly be in the right position to directly observe this, let alone to intervene on it.

In the Lab or in the Field?

We have noted that some experiments occur in laboratories and others are field experiments, occurring in the outside world. There are advantages and disadvantages to each approach.

FIGURE 2.7 Mars Curiosity rover selfie taken on Mount Sharp (Aeolis Mons) on Mars in 2015

Laboratory experiments give researchers control over many aspects of the experiment, specifically over any interventions performed and the direct and indirect control of many extraneous variables. Depending on the nature of the experiments, a lab's design features may include constant temperature, a sterile environment, special equipment to produce unusual conditions, or, for experiments with human subjects, carefully selected lighting and furniture, soundproofing, and experimenters' confederates who behave in a specified way. Those design features, and the control they provide, constitute one of the greatest advantages of the laboratory. Laboratory conditions are designed to control extraneous variables, to aid in the detection and measurement of focal variables, and to create unique situations that don't often or ever occur outside the lab. These features can enable scientists to discover regularities that are not easy to discern in the outside world.

The high degree of control enabled by laboratory conditions brings with it a high degree of internal experimental validity. An experiment has high internal validity when scientists can correctly infer conclusions about the relationship between the independent and dependent variables with great certainty. This amounts to the absence of confounding variables, achieved by direct or indirect control of all relevant extraneous variables. A second advantage of laboratory experiments is that the experimental setup and data analysis can follow predetermined, standard procedures, which makes it easier to assess and replicate an experimental finding.

However, there are also some disadvantages to lab research. To start with, some phenomena are not easily investigated in a lab. Suppose you are investigating the effects of climate change on large marine mammals. Specifically, you want to determine the effects of elevated Arctic Ocean temperatures on the deep-diving behavior of narwhal whales.
Narwhals—the so-called unicorns of the sea because of their tusks—can dive as deep as
1.8 kilometers (6,000 feet) in Arctic waters. To directly investigate this phenomenon in a lab, you will need—for starters—a huge tank of freezing salt water nearly two kilometers deep. Good luck with that, right?

Furthermore, the same conditions that make it easy to directly and indirectly control variables make lab conditions different from the outside world, and that has some disadvantages too. The artificiality of the experimental setting might mean that results obtained in the lab do not generalize well to real-life settings outside it. This is problematic, since it’s ultimately the features of real-world phenomena that we want to know about. Laboratories thus facilitate high internal validity, but potentially at the cost of external validity. External experimental validity is the extent to which experimental results generalize from the experimental conditions to other conditions—especially to the phenomena the experiment is supposed to yield knowledge about. External validity has two components: population validity and ecological validity.

Population validity is the degree to which experimental entities are representative of the broader class of entities of interest. For experiments with human subjects, this is the broader population they represent. The more representative a sample is of the broad class or population, the more confident scientists can be of the experiment’s external validity. Here’s an illustration of the importance of population validity. Many clinical trials testing the efficacy and side effects of drugs are performed only on men, but the results are expected to generalize to women as well. This decreases the population validity of the results, since women and men differ in a number of medically relevant ways. There is thus relatively limited experimental knowledge about the effects of some drugs on women, and this may have serious consequences for health and medicine.
Indeed, many prescription drugs have been withdrawn from the market after they were belatedly revealed to pose greater health risks for women than for men (Simon, 2005).

The second component of external validity, ecological validity, is the degree to which experimental circumstances are representative of real-world circumstances. Experimental settings, or what subjects are asked to do, can be artificial, unlike real-world circumstances, in ways that impact the phenomenon under investigation. Consider again Milgram’s experiment on compliance. How do you think the ecological validity of this experiment rates? To answer this question, we need to consider how similar the situation encountered in this experiment, administering electrical shocks to other people following instruction from an authoritarian leader, is to scenarios in which people are usually asked to comply. Limited ecological validity is a reason to question an experiment’s external validity, that is, its significance for the broader conclusions we want to draw from it.

Field experiments are conducted outside of a laboratory, in the participants’ everyday environment. Researchers still manipulate an independent variable, and they still aim to control extraneous variables. Often, this involves indirect control, perhaps with randomization if circumstances allow. Field experiments are more prevalent in the social, behavioral, and biological sciences than in physics and chemistry. The previously described experiment on the effects of lighting conditions on the productivity of workers at Western Electric’s Hawthorne factory is one example of a field experiment.

Field experiments tend to have more external validity than lab experiments because they occur in natural circumstances. Their ecological validity is higher as a result. The experimental subjects are also likely to be a somewhat arbitrary subsection of the broader population of interest, which increases population validity. The downside to
these advantages is decreased internal validity. Less influence over the circumstances and the selection of experimental subjects is also linked to decreased control over extraneous variables and sometimes a decreased ability to intervene in the desired way. Because randomization may not be feasible in field experiments, researchers must decide how best to divide the subjects into control and experimental groups, which may introduce confounds. Besides decreasing internal validity, this decreased influence on experimental design also makes it more difficult for other researchers to replicate the experiment. Researchers conducting field experiments may also be constrained in what they are in a good position to observe or measure, the number of subjects they can involve, and how long they can run the experiment. Many field experiments, for example, require special permissions from individual subjects or from authorities that control access to areas like nature preserves. Gaining these permissions can be difficult, and authorities can impose limitations on researchers. Uncontrollable events like inclement weather or warfare can disrupt observation or limit the length of study that’s feasible.

Let’s see how these features play out in a real field experiment. In their study entitled ‘Women as policy makers’, Raghabendra Chattopadhyay and Esther Duflo (2004) investigated how women village council leaders, or pradhan, might affect the social services provided by councils in India. This experiment was possible because of an Indian constitutional amendment in 1993 calling for one-third of pradhan positions to go to women. Thus, the experimenters had no say in the assignment of pradhan positions to women, as this was established by the Indian government. This also means the intervention was not implemented by the researchers, but the law was structured so that the change in leadership was randomly implemented across villages, mimicking a surgical intervention.
Data were collected on 265 village councils in West Bengal and Rajasthan. In each village council, the two researchers collected the minutes of village meetings and interviewed the pradhan. They also collected data from each village about social services, infrastructure, and complaints or requests that had been submitted to the village council. The pradhans’ policy decisions and villagers’ requests were not affected by their interactions with the experimenters, since those requests and decisions had already been made at the time of data collection.

It was found that women policy makers (independent variable) had important effects on social service policy decisions (dependent variable). Women pradhan invested more in the social goods more closely connected to women’s concerns in a village: drinking water and roads in West Bengal and drinking water in Rajasthan. They invested less in public goods connected to men’s concerns: education in West Bengal and roads in Rajasthan.

Choices in Variables, Sample Size, and Groups

Just as lab experiments are better in some respects and field experiments are better in others, there are many trade-offs among other elements of experimental design as well. First, experimenters choose the independent and dependent variables and then decide how best to intervene on the independent variable. Selecting the right variables is crucial for successful experimentation, but it’s sometimes not obvious how best to proceed. In fact, in several scientific disciplines, including climate science, macroeconomics, neuroscience, and psychiatry, the worry has been raised that experiments are often conducted using the wrong variables. The wrong variables may not allow proper
intervention or accurate measurement. For example, current classifications of psychiatric disorders, such as schizophrenia, are criticized for being too broad and coarse-grained. Because psychiatric classifications lump together several different psychiatric symptoms and variables that may have little in common, experiments based on such classifications may not provide evidence reliable for diagnosis and treatment.

One approach to variable choice is to select variables that correspond to properties or quantities that are well-defined targets for intervention and measurement. These are usually easy to intervene upon and measure (things like the pressure and volume of a gas, hours slept, and grade on a math test). Another approach is to select independent variables based on what you can intervene on ‘surgically’, manipulating their values independently of the values taken by other variables. A third approach is to focus on macro-variables that aggregate measurable variables in a meaningful way. In climate science, for example, temperature and atmospheric pressure at sea level are measured at various locations around the Earth’s surface, and then these are aggregated to form macro-variables. With the right aggregation procedures, these macro-variables stand in relations that can be captured in climate models, which climatologists use to formulate more reliable and stable predictions than could be made about individual temperature or pressure measurements (Woodward, 2016).

A second choice in experimental design is sample size. In describing the perfectly controlled experiment, we noted that randomization as an approach to indirect variable control is successful only if the sample size is adequately large. In general, a larger sample size increases the success of indirect control of extraneous variables, thus increasing the experiment’s internal validity.
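This effect of sample size on randomization can be illustrated with a quick simulation (a sketch in Python, not from the text; the subjects’ ages are invented, and the seeding is only for reproducibility):

```python
import random

random.seed(0)

def mean(xs):
    return sum(xs) / len(xs)

def imbalance_after_randomization(n):
    """Randomly split n subjects into two groups and return the absolute
    difference between the groups' mean ages (an extraneous variable)."""
    ages = [random.uniform(20, 60) for _ in range(n)]
    random.shuffle(ages)
    group_a, group_b = ages[: n // 2], ages[n // 2:]
    return abs(mean(group_a) - mean(group_b))

# Average imbalance over many simulated experiments, small vs. large samples
small = mean([imbalance_after_randomization(10) for _ in range(500)])
large = mean([imbalance_after_randomization(1000) for _ in range(500)])
# Larger samples leave far less room for chance imbalance between the groups
```

With ten subjects per experiment, the two groups’ average ages can differ by several years just by chance; with a thousand, the difference all but vanishes, which is what indirect control by randomization relies on.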
In this way, larger sample sizes make it more likely that experimental results are actually due only to the experimental intervention. But these considerations must be balanced against the downsides of large sample sizes. Large samples are more difficult to assemble, and they can be more difficult to manage in the experiment. In some cases, data collection and analysis are also more difficult for a larger sample. These drawbacks are practical. A different kind of problem is that a large sample size increases the chance of spurious findings. Just as large samples make it easier to discern the intervention’s effects on the dependent variable, they also make it easier to discern other kinds of differences in the dependent variable. This increases the chance of a confounding variable influencing the dependent variable in a way that impacts the experimental results.

A third choice in experimental design regards group assignment. Randomization is one particularly effective way to assign experimental entities to experimental and control groups. But sometimes randomization isn’t possible for practical or ethical reasons. If you’re studying the effects of gestational diabetes on fetuses, for example, you can’t simply assign subjects to mothers with or without gestational diabetes (the independent variable). And it’s not ethical to randomly assign pregnant women to experimental conditions aimed to increase the chance of developing gestational diabetes.

Other methods can be used to control variables when randomization isn’t feasible. One method is to restrict participation in an experiment to experimental subjects with the same levels of some extraneous variable. For example, suppose that age and smoking are the two extraneous variables of greatest concern in an experiment aimed to test the relationship between cholesterol level and heart disease. Randomization is not possible here, or at least not ethical, but the extraneous variables of age and smoking can be
controlled by restricting admission into the experiment to subjects who are non-smokers aged 30–50. This method is simple. However, it decreases the achievable sample size and limits the external validity of the experimental findings (due to decreased population validity).

Another approach is to use data about extraneous variables and their effects in order to account for their influence on the dependent variable. For example, in a landmark study known as the Harvard Six Cities Study, researchers investigated the effects of air pollution on health (Dockery et al., 1993). During the 1980s and 1990s, different areas in the US had very different levels of air pollution. The researchers studied 8,000 experimental participants living in six cities in different areas, including Boston, an industrial area in Ohio, and rural Wisconsin. Participants’ health was monitored for 20 years and compared with air pollution measurements in the six cities. The researchers used statistics regarding the health effects of socioeconomic factors, demographics, and smoking to estimate the likely effects of those extraneous variables on participants’ health. This was a way to indirectly control for those variables, even if there were systematic differences in how they affected participants in the different groups of the study. The researchers found that, taking all these other variables into account, decreased air pollution is linked to increased life expectancy.

Yet another approach to indirect variable control is to match the members of the experimental and control groups so the groups don’t differ in the values of known extraneous variables. This involves matching every subject in the experimental group with a subject in the control group, based on knowledge of how certain extraneous variables, such as age and smoking history, affect individual subjects.
For example, researchers might include pairs of smokers of the same age and pairs of non-smokers of the same age in their study. One member of each pair should experience the experimental condition (say, complete an exercise regime) and the other should experience the control condition (say, exercise as they ordinarily would). In this way, groups of subjects can be made similar with respect to the primary extraneous variables, thereby indirectly controlling them. This method is often effective, but it has some limitations. It only works for extraneous variables researchers are already aware of. It can also be time-consuming and expensive to find matched subjects, and this may limit the sample size.

A fourth choice in experimental design concerns how many groups to include in an experiment. So far, we have focused on experiments with two groups: an experimental group and a control group. More complicated experimental designs include multiple experimental groups, each of which experiences a different but related intervention. We saw an example of this in the Harvard Six Cities study. There were six different groups, each corresponding to a city with some measured value of air pollution. Participants were assigned to groups simply according to which city they lived in. Including multiple experimental groups can be enlightening, but it also complicates experiments, making them more difficult to perform. Multiple groups also make it more difficult to get adequately large sample sizes for each group, which leads to the drawbacks we’ve already discussed. And finally, multiple groups can make analysis of the results more difficult.
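The assignment strategies just discussed, randomization and matched pairs, can be sketched in a few lines of code (a toy illustration; the subject records and attribute values are invented, not drawn from any study in this chapter):

```python
import random

random.seed(1)

# Hypothetical subject records: (id, age, smoker)
subjects = [(i, random.randint(30, 50), random.random() < 0.4) for i in range(40)]

# Strategy 1: randomization -- shuffle, then split in half
shuffled = subjects[:]
random.shuffle(shuffled)
experimental, control = shuffled[:20], shuffled[20:]

# Strategy 2: matching -- pair each subject with another of the same age
# and smoking status, sending one member of each pair to each group
def matched_pairs(pool):
    pairs, unmatched = [], list(pool)
    while len(unmatched) >= 2:
        first = unmatched.pop(0)
        partner = next((s for s in unmatched
                        if s[1] == first[1] and s[2] == first[2]), None)
        if partner is None:
            continue  # no match available; this subject is excluded
        unmatched.remove(partner)
        pairs.append((first, partner))
    return pairs

pairs = matched_pairs(subjects)
# Every pair agrees on the measured extraneous variables by construction
```

Matching guarantees that each pair agrees on the measured extraneous variables, but subjects without a match are excluded, which is one way matching can shrink the sample.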

The Cholera Outbreak of 1854

Most of the variations in experimental design we have discussed involve compromises away from the aim of surgical intervention and full control of extraneous variables. Let’s move on to discuss methods of observation employed in non-experimental studies, when
intervention and variable control are significantly compromised or impossible. Most varieties of non-experimental scientific study are observational studies, which involve collecting and analyzing data without performing interventions or controlling extraneous variables.

One example of an observational study is John Snow’s investigation into the source of a cholera outbreak in London, England. Cholera epidemics ravaged London in the mid-19th century, with notable outbreaks in 1831–1832 and again in 1849. Snow studied these outbreaks, recording the details of dozens of cases. Because his research seemed to indicate that cholera was transmitted from person to person, Snow wanted to find out how it was transmitted. Previous reports suggested that cholera began with ‘an affection of the alimentary [digestive] canal’. From this, Snow hypothesized that cholera was transmitted through the inadvertent ingestion of ‘morbid material’ from the vomit and ‘evacuations’ of cholera patients.

Then, on Thursday, August 31, 1854, cholera hit London’s Soho district. The outbreak appeared to be concentrated in certain areas. One such area was the corner of Broad and Cambridge Streets, where more than 100 neighbors died in three days. Three-quarters of the neighborhood residents fled within a week, but hundreds more died nonetheless. Snow reported that ‘within 250 yards of the spot where Cambridge Street joins Broad Street, there were upwards of 500 fatal attacks of cholera in 10 days’ (Snow, 1855). At this intersection, there was a water pump from which locals could draw water. Snow’s own observations of the pumped water led him to note that it looked abnormal. Given his prior reasoning about cholera transmission, Snow began to suspect that the

FIGURE 2.8 Cholera epidemic, close-up of Snow’s Broad Street map

pumped water contained ‘morbid material’. He learned that of the 89 deceased cholera victims, 61 were known to have consumed water from the Broad Street pump. This was suggestive evidence.

However, there was an apparent anomaly—that is, a phenomenon that deviates from the expectations of a theory or hypothesis. One detail didn’t fit the pattern suggested by Snow’s hypothesis: very near the Broad Street pump was a brewery, but none of the more than 70 brewers had died from cholera. This was puzzling.

Snow had the Broad Street pump handle disabled seven days after the outbreak began. Even though the epidemic had already begun to fade, he was convinced of having reasoned correctly from his detailed observations:

Whilst the presumed contamination of the water of the Broad Street pump with the evacuations of cholera patients affords an exact explanation of the fearful outbreak of cholera in St. James’s parish, there is no other circumstance which offers any explanation at all, whatever hypothesis of the nature and cause of the malady be adopted. (1855, p. 54)

In other words, Snow could think of nothing else that could account for the outbreak’s features, other than the hypothesis of contaminated water from the Broad Street pump.

Snow was right. It was later discovered that the well serving the Broad Street pump had been dug only a few feet away from an old cesspit, which had begun to leak fecal bacteria. The lack of cholera deaths among brewers turned out to be further evidence in favor of Snow’s inference; the brewers drank only their own beer, which used water from their own well, water that was sterilized in the beer-brewing process. In this study, Snow did not perform an intervention, control variables, and study the results. What he did was assemble a system of detailed observations and reason his way to the one hypothesis that best explained those observations.
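Snow’s retrospective style of reasoning, grouping recorded cases by a suspected exposure and summarizing, is easy to mimic in code. Here is a toy reconstruction (the individual records are invented; only the totals, 89 deaths of which 61 were known pump-water drinkers, come from the text):

```python
# Hypothetical case records standing in for Snow's notes: one entry per
# deceased victim, flagged by the suspected exposure
cases = [{"case_id": i, "drank_pump_water": i < 61} for i in range(89)]

exposed = sum(1 for c in cases if c["drank_pump_water"])
fraction_exposed = exposed / len(cases)  # roughly 0.69 of recorded deaths
```

No intervention and no control group: the analysis simply summarizes observations already made, which is what makes the study observational.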


Case Studies and Natural Experiments

One form of observational study that is very different from a controlled experiment is a case study, a detailed examination of a single individual or system in a real-life context. Case studies allow researchers to gain a first-hand qualitative understanding of a phenomenon as it occurs in its specific context and from various sources of data—including perhaps observations of a person’s daily routine, unstructured interviews with participants and informants, letters, e-mails, social media activity, health or archival records, and physical artifacts. Case studies are frequently employed within the context of qualitative research in epidemiology, psychiatry, education, ethnography, and other social sciences.

One of the most famous case studies in science is in neuropsychology. Phineas Gage was an American railroad construction foreman. In 1848, he was helping to manage the construction of the Rutland and Burlington Railroad, located near Cavendish, Vermont. While he was using blasting powder to blast away rock, an iron tamping rod, measuring 1.1 meters long and almost 3.2 centimeters in diameter, was blasted through Gage’s skull. The tamping rod entered through his left cheekbone and erupted through the top

FIGURE 2.9 Phineas Gage posing with the rod that passed through his skull

front of his head, ultimately landing about 25 meters away. The rod destroyed much of his brain’s left frontal lobe, but Gage survived (Harlow, 1848).

In 1868, Dr. John Harlow, one of the physicians attending Gage, reported on the patient’s mental condition after this accident. He described Gage as ‘fitful, irreverent, indulging at times in the grossest profanity’, ‘manifesting but little deference for his fellows’, and ‘at times pertinaciously obstinate’. He claimed that this was a radical change in Gage after the accident, ‘so decidedly that his friends and acquaintances said he was no longer Gage’ (1868, p. 277). Overall, the damage seems to have resulted in a major degradation of, among other things, Gage’s social skills.

Since the 19th century, neurologists, neuropsychologists, and cognitive neuroscientists have studied the case of Phineas Gage to understand the role of the frontal cortex in social behavior. But it has been difficult to make precise inferences from this case, since the immediate damage to Gage’s frontal cortex was so extensive, with surgical repairs and subsequent infections complicating matters further. Another complicating factor is, of course, that there is just one instance of Gage’s injury; a single case study creates no opportunity for variable control or the observation of how different instances play out.

For these reasons, although case studies can provide a rich body of qualitative information, they have limited internal and external validity. A case study’s internal validity is limited by the lack of control over extraneous and confounding variables. Case studies are also particularly vulnerable to bias because of the evaluation of qualitative data and the absence of blinding. And because the research focuses on only one individual, event, or group, results can be difficult to replicate and to generalize.

Every now and again, nature yields a case that can play the role of an experiment. These so-called natural experiments occur when an intervention on an independent variable occurs naturally in real life, without experimenters doing anything. This very thing happened in the case of Phineas Gage and also in other famous cases from the history of neuropsychology, like the case of Louis Leborgne.

When he was about 30 years old, Louis Leborgne lost the ability to speak. He could utter only a single syllable, tan, which he usually repeated twice in succession, giving rise to his nickname ‘Tan Tan’. Apart from his inability to speak, Leborgne exhibited no symptoms of physical or psychological trauma. He could understand other people, and his other mental functions were apparently intact. After Leborgne died at the age of 51 in a hospital in Paris in 1861, the French physician Paul Broca performed an autopsy and found that Leborgne had a lesion in the frontal lobe of the left cerebral hemisphere (in what later came to be known as ‘Broca’s area’). This case is a kind of natural intervention. The variable of interest, damage to a particular brain region, was not deliberately manipulated, but there was no evidence of any confounding variables associated with that manipulation. Broca used this case to identify a brain region important for the articulation of speech; injure Broca’s area, and an inability to produce speech—that is, Broca’s aphasia—would ensue. Leborgne just happened to suffer the very kind of brain damage that could make clear the function of that area of the brain.

Sometimes, even groups of individuals just happen to get sorted—naturally and without any scientific intervention—into something approximating experimental and control groups. Some natural or historical process separates them out, such that one group but not the other can be construed as receiving an experimental treatment or condition.
The Indian councils and Harvard Six Cities studies discussed earlier are examples of natural experiments. Their conditions approximated experiments well enough that we described them as such, but really the experimenters were not in a position to intervene.

Another example of a natural experiment on experimental and control groups occurred with the separation of the Korean territory and population into two sovereign nations. When the Korean War ended in 1953, the peninsula was partitioned in half. Many aspects of the resulting two nations—South and North Korea—have remained similar. For example, both nations have a shared history, and they have similar geographies, climates, languages, and cuisines. But they differed in one main respect: political regime. North Korea adopted single-party state socialism, headed by a totalitarian military dictatorship, whereas South Korea eventually became a multi-party liberal democracy.

The separation of the Korean population into two groups is often described as a large-scale natural experiment, insofar as the political regime (independent variable) seems to be related to many observable differences between the two nations. These differences include changes in economy, infrastructure, religion, education, and health. By 2010, the difference in infant mortality, an indicator of population health, was striking: 3.8 deaths per 1,000 births in South Korea but 27.4 deaths per 1,000 births in North Korea. By 2011, life expectancy in South Korea was 77.5 years for men and 84.4 for women, but only 65.1 years for men and 71.9 for women in North Korea (Khang, 2013). The differences are even visible from space: per capita power consumption in the two countries differs greatly (South Korea at more than 10,000 kilowatt hours, North Korea at less than 750 kilowatt hours).

Yet another example of a natural experiment comes from an investigation in which researchers tracked the development of 65 children in order to study the effects of institutional upbringing on later attachments (Hodges & Tizard, 1989). The participants were all 16 years old and had lived in residential nurseries and institutional care from infancy until at least two years of age, when most of them were either adopted or restored to their biological parents. A comparison group was also studied, consisting of children who had been with their families all their lives. So, the independent variable in this study—the children’s environments—varied because of an accidental course of events. Researchers could study the effect of this variation on the children’s later social relationships (the dependent variable).

It was found that parental deprivation at early ages did not necessarily prevent children from forming strong and lasting bonds with parents once they were placed in a family. Whether such bonds developed depended on the later family environment. Yet, because of their early institutional experience, these children did have more difficulty socializing with peers and developed fewer close relationships.


Studies Extending Over Time

Some observational studies extend over time. These can be critical to understanding, for example, the long-term effects of treatments. An important observational method for such studies is the cohort study, in which researchers select a group of subjects according to set parameters and then track those subjects over time, at set intervals, to observe the effects of some condition they experience. Cohorts can have fixed membership, as with the people in Sierra Leone, Liberia, or Guinea during the West African Ebola virus epidemic of 2014–2015, for example, or can have changing membership, as with double majors in public universities or state organ donor registries. In either case, the cohorts are determined by some property of interest.

Cohort studies include retrospective and prospective studies, or backward-looking and forward-looking studies. In a retrospective study, researchers first identify a group of subjects who have the property of interest, and then investigate their past in an attempt to identify the cause of that property. A common use of retrospective studies is in epidemiology, in which subjects are grouped according to their exposure status and incidence of disease, and then compared using available data about them. John Snow’s cholera investigation was like this. In a prospective study, researchers still identify a group of subjects with some property of interest but then track their development forward in time to check the effects of that property. The Harvard Six Cities study was like this.

Longitudinal research is another approach that tracks subjects over time. In a longitudinal study, the same subjects are measured repeatedly over a period of time, sometimes many years, allowing the researchers to track subjects’ change. A benefit of such diachronic studies is that they can reveal changes over time in the characteristics of a group of subjects.
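The data shape that distinguishes a longitudinal study can be sketched directly (hypothetical subjects and scores, invented for illustration):

```python
# Each subject is measured at several waves (e.g., yearly test scores)
longitudinal = {
    "s1": [48, 55, 61],   # wave 1, wave 2, wave 3
    "s2": [52, 50, 58],
    "s3": [45, 49, 47],
}

# Within-subject change from first to last wave -- the kind of trajectory
# a single-time snapshot cannot reveal
change = {sid: scores[-1] - scores[0] for sid, scores in longitudinal.items()}
```

Because each subject carries a whole trajectory of measurements, within-subject change can be computed directly, which is precisely what a one-time measurement cannot provide.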
The Early Childhood Longitudinal Study started in the late 1990s and followed 20,000 American children, examining their development, performance at school, and early school experience. Researchers also conducted extensive interviews with their families. This study provides a lot of information about American children’s development and family life. Analyzing this longitudinal data, the economists Steven Levitt and Stephen Dubner (2005) showed that many things that parents do to make their kids ‘smarter’ do not seem to actually help children do well on tests. Reading to kids every day, for
example, does not relate to higher test scores. Higher test scores are strongly related to being born to a mother over 30, but not to a mother taking time off to raise the child.

In a cross-sectional study, different subjects are measured at a single time in order to get a sense of the prevalence of some trait(s) in the population at large. For example, a cross-sectional approach to studying children's development and family life would involve assessing the kinds of variables just discussed—family characteristics, reading exposure, test scores, and so on—all at once. One advantage of cross-sectional studies is that they enable researchers to measure and compare several variables. They are also easier to accomplish, as there is no need to track individuals over time. But the information they provide is correspondingly more limited and perhaps less accurate. For example, instead of assessing whether kids are read to every day based on subjects' actual experiences, researchers must rely on subjects' memories of earlier years.
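The contrast between longitudinal and cross-sectional designs can be sketched with a toy example (the subjects, years, and scores below are invented for illustration): a longitudinal analysis computes change within the same subjects across measurements, while a cross-sectional analysis summarizes many subjects at a single time.

```python
# Hypothetical records: three subjects, each measured in 2010 and 2015.
records = {
    "subject_1": {2010: 82, 2015: 90},
    "subject_2": {2010: 75, 2015: 74},
    "subject_3": {2010: 68, 2015: 80},
}

# Longitudinal view: repeated measurements of the same subjects,
# so we can compute each subject's change over time.
change = {name: scores[2015] - scores[2010] for name, scores in records.items()}

# Cross-sectional view: a single snapshot (here, 2015) across subjects,
# which reveals prevalence at one time but not individual trajectories.
snapshot_2015 = sum(scores[2015] for scores in records.values()) / len(records)

print(change)
print(round(snapshot_2015, 1))
```

Only the longitudinal layout reveals that subject_2 declined while the others improved; the cross-sectional average on its own would hide that.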


Studies Using Big Data

A different category of non-experimental studies uses so-called big data. Big data are very large data sets that cannot be easily stored, processed, analyzed, and visualized with traditional methods. Big data sets are especially interesting because they can reveal unexpected patterns, trends, and associations relating to human behavior. A number of fields of science use big data to understand, for example, the factors influencing the climate, genetic disease prevention, and business trends, among many other phenomena.

Social media, business transactions, cameras, audio files, e-mails, and the internet more generally have produced an ever-increasing stream of data in recent years. It's been estimated that humanity accumulated 10 times as much data from 2006 to 2011 as was accumulated between the advent of writing thousands of years ago and 2006, and this amount of data is expected to quadruple every three years (Floridi, 2012). It ranges from the 500 million tweets per day on Twitter to the tremendous amount of data produced each year by the extremely sensitive detectors of the Large Hadron Collider at CERN.

Supercomputers and machine learning techniques, also known as data analytics, are used to manage and mine these large data sets. Machine learning techniques help researchers compress and visualize big data sets in charts or graphs; they also help filter data sets so as to allow researchers to draw conclusions about their characteristics. Imagine, for example, that you want to determine general trends in food preferences, and you have at your disposal a data set containing all tweets produced in one year. Filtering those tweets to a subset relevant to food preferences is extraordinarily valuable, as is visualizing the data about the popularity of various foods. The patterns and trends uncovered by analyzing big data can give insight into relationships among variables of interest and can be used to make predictions.
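The filtering step just described can be sketched in a few lines. The tweets and the list of food terms below are invented for illustration, and a real pipeline would also need to handle punctuation, capitalization, and data far too large for a single machine:

```python
from collections import Counter

# A handful of hypothetical tweets standing in for a year's worth of data.
tweets = [
    "just had the best ramen of my life",
    "traffic on the bridge again",
    "pizza night with friends",
    "ramen beats pizza every time",
    "reading a great book about glaciers",
]

food_terms = {"ramen", "pizza", "sushi", "tacos"}

# Step 1: filter the stream down to the food-relevant subset.
food_tweets = [t for t in tweets if food_terms & set(t.split())]

# Step 2: count mentions of each food term as a crude popularity measure.
mentions = Counter(
    word for t in food_tweets for word in t.split() if word in food_terms
)

print(mentions.most_common())
```

The same two steps, filter then summarize, underlie far more sophisticated analyses; what changes at scale is the machinery, not the logic.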
One well-publicized example of ambitious research based on online data is the long-term analysis of user data from the online dating website OKCupid (Rudder, 2014). But it can be difficult to assess big data research, and some are concerned that it's taken more seriously than it should be. In 2008, researchers from Google claimed that they could immediately detect which regions were experiencing flu outbreaks based simply on people's online searches. The idea was that when people are sick with the flu, they often search for flu-related information on Google. Unfortunately, this idea wasn't borne out. Google Flu Trends made very inaccurate predictions, significantly overestimating flu outbreaks, and was shut down (Lazer et al., 2014).


Perhaps the biggest challenge facing big data techniques is their opacity. The algorithms used to sample, filter, and order data are often unknown to outside researchers, and the people who create the data in the first place are generally unknown even to the researchers performing the investigation. This makes it difficult to assess study procedures, the significance of the data, and the possibility of confounding variables.

Another challenge with big data techniques concerns population validity (see Section 2.2). Many people in the world don't use any social media, so those who do may not be representative of the broader population, and more nuanced versions of this problem exist for any particular form of online data. There are issues with privacy too. Online data are often in the public domain, but big data research publicizes data and reveals trends that the people responsible for the data may not be comfortable with. The publication of OKCupid user data was an instance of this issue widely discussed in the popular press.

These challenges do not erase the scientific value of big data, though. And the analysis of data can even help us better understand how science works. For example, in the field of library and information science, bibliometrics is used to understand the dissemination and production of literary work by analyzing big data sets of written publications. This approach is also applied to scientific publications. Bibliometric methods, including the analysis of networks of citations in published work, can be used to investigate the productivity of a certain field of research, trends in the topics of scientific research, and even the social dynamics underlying scientific practice. The number of citations of a published article is an index of recognition, which is one of the primary rewards for scientists.
So, citation rates and patterns can be used to quantify scientific impact and to predict what factors might affect the future course of science.
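The simplest bibliometric calculation, counting citations received, can be sketched on a toy citation network (the papers and citation links below are invented):

```python
from collections import Counter

# Hypothetical citation network: each paper mapped to the papers it cites.
citations = {
    "paper_A": ["paper_B", "paper_C"],
    "paper_B": ["paper_C"],
    "paper_C": [],
    "paper_D": ["paper_B", "paper_C"],
}

# Count how often each paper is cited by the others.
counts = Counter(cited for refs in citations.values() for cited in refs)

# Rank by citations received, a crude index of recognition.
ranking = counts.most_common()
print(ranking)
```

Real bibliometric methods go further, for instance by weighting a citation by the standing of the citing paper, but the underlying data structure is this kind of network.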


Other Kinds of Interventions

Most of the approaches to non-experimental studies we have discussed are observational studies. Big data studies are, perhaps, the exception. We'll conclude this chapter by briefly discussing another approach that, in a way, still employs some of the ideas behind experimentation. This is the indirect study of a phenomenon by studying interventions on something similar. The most significant form this approach takes in science is when models are developed and studied in order to learn about phenomena of interest. This is the topic of Chapter 3. For now, we'll just mention one example.

One extension of, or replacement for, experimentation is accomplished with the use of computer models or simulations. Such computer simulations can play a role analogous to experiments. Computer programs are developed that use algorithms to mimic the behavior of a real-world system. For example, computer simulations of the Earth's climate represent the dynamic interactions of solar energy, chemicals in the atmosphere, oceans, landmasses, ice, and other factors. Such simulations can then be studied to yield insight into real phenomena such as anthropogenic climate change. Interventions can be performed in a simulation of the climate system that would be undesirable or impossible to actually perform in Earth's climate system. For example, climate scientists might investigate what a specific increase in the amount of carbon in the atmosphere would do to the rate of glacier melt.

Another extension of the concept of intervention is to our rich imaginations. Thought experiments are devices of the imagination that scientists sometimes use to learn about reality. Thought experiments involve an imagined intervention on a system. In the right
conditions, these can be used to test a hypothesis, to show that nature does not conform to one’s previously held expectations, and to suggest ways in which expectations can be revised. Just like experiments in a lab or in the field, thought experiments may be criticized because their setup is faulty or because scientists draw unjustified conclusions from them. Galileo used many thought experiments in his investigations of physics and astronomy. In one instance, he wished to investigate an idea of Aristotelian physics that objects with different weights fall at different speeds. Galileo asked his readers to assume, as Aristotle did, that heavier objects fall faster than lighter objects. He then imagined two objects, one light and one heavy, connected to each other by a string and dropped from the top of a tower. If Aristotle’s assumption was correct, then the string would pull taut as the heavier object falls faster than the light object. But, Galileo reasoned, both objects together are heavier than the heavy object. So, for Aristotle, the two objects together should actually

FIGURE 2.10 Isaac Newton's cannon thought experiment
fall faster than either object alone. At the same time, the taut string means the lighter object acts as a drag on the heavier one, so the pair should also fall slower than the heavy object alone. These objects cannot simultaneously fall both faster and slower, so the Aristotelian idea that was the starting point for this reasoning process could not be right. Galileo's thought experiment provided a refutation of the Aristotelian theory of motion, suggesting that the speed of a falling body does not depend on its weight.

Newton also used thought experiments to help show how his theory of gravitation worked. He had readers imagine a cannon at the top of an extremely tall mountain and then asked what would happen if somebody loaded the cannon with gunpowder and fired. Plausibly, Newton reasoned, the cannonball would follow a curve, falling faster and faster because of gravity's force, and would hit the Earth at some distance from the mountain. But what if one used more gunpowder? The velocity of the cannonball would be greater, and it would travel farther before falling back to Earth along a curved trajectory. But if one used vastly more gunpowder, then, Newton suggested, the cannonball would travel so fast that it would fall all the way around the Earth, never landing. The cannonball would be in orbit, going around again and again just like the Moon! This is pictured in Figure 2.10. If the cannonball went even faster, then it would escape Earth's gravity, heading out into space.

Newton's theory of gravitation provided the resources to arrive at these same conclusions through mathematical calculations. Imagining this situation gives a satisfying, intuitive sense of how an object like the Moon can stay in orbit by remaining in constant free fall.
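Newton's qualitative conclusions can be checked with the back-of-the-envelope calculation his theory supports: a cannonball skimming the Earth's surface stays in circular orbit when gravity supplies exactly the centripetal acceleration required, so v^2/R = g, and it escapes at roughly the square root of 2 times that speed. A quick sketch using standard values for g and the Earth's radius:

```python
import math

g = 9.81       # surface gravity, m/s^2
R = 6.371e6    # mean radius of the Earth, m

# Circular orbit at the surface: centripetal acceleration v^2 / R equals g.
v_orbit = math.sqrt(g * R)         # roughly 7.9 km/s

# Escape speed is sqrt(2) times the circular orbital speed.
v_escape = math.sqrt(2) * v_orbit  # roughly 11.2 km/s

print(round(v_orbit), round(v_escape))
```

Any muzzle velocity below about 7.9 km/s brings the cannonball back to Earth; between that and about 11.2 km/s it orbits; beyond that it heads out into space (ignoring air resistance, as Newton's thought experiment does).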

EXERCISES

2.22 Recall the ideal experiment you described in Exercise 2.14 and the three challenges to that experiment you identified. Describe an alternative experiment that is more practical but that still can successfully test your hypothesis.


2.23 Describe a different approach to the experiment you described in 2.22. Then list the advantages and disadvantages of each approach, with an eye to the trade-offs among features of experiments described in this section.

2.24 Recall, from Section 2.1, the experiment in which participants divide $10, with one person offering some division and the other only being able to accept or reject the offer. (Rejecting the offer results in neither participant getting any money.) The finding was that people offered fairer divisions and also rejected divisions deemed unfair even though this resulted in no money won. The researchers concluded that, in general, people seem to be willing to sacrifice self-interest to promote fairness. In this experiment, participants haven't previously interacted with one another, and they don't interact with the same participant more than once. Let's assume participants are randomly selected and randomly assigned to roles.
a. Define internal validity, and assess this experiment's internal validity, justifying your assessment.
b. Define external validity and name and define each of its two components. Assess this experiment's external validity, justifying your assessment.
c. What was the researchers' conclusion from this study? Does the experiment's internal validity or external validity cast doubt on this conclusion? Why or why not?

2.25 What are the main advantages and disadvantages of a laboratory experiment? How about a field experiment?


2.26 Decide whether each of the following statements is true or false. For any false statement, write a new sentence, changing the original sentence so it is true.
a. A completely randomized design offers no control for confounding variables.
b. Randomization controls for the placebo effect.
c. A cohort is a group of subjects with some defining characteristic in common.
d. Longitudinal studies involve repeated observations of the same variables over long periods of time.
e. Natural experiments occur when experimenters intervene on an independent variable in the real-life setting of their subjects.
f. In observational studies, the independent variable is under the control of the researcher.

2.27 What are three reasons experiments sometimes cannot be performed? For each reason, say whether it absolutely prohibits experimentation or whether experimentation might be possible at another time or in another way.

2.28 Briefly describe case studies, cohort studies, prospective studies, and longitudinal studies. What features do these have in common? How do they differ?


FURTHER READING

For an introduction to the philosophy of experiments with a focus on the natural sciences, see Hacking, I. (1983). Representing and intervening: Introductory topics in the philosophy of natural science. Cambridge: Cambridge University Press.

For a historical perspective on experiment with a focus on the debate between Robert Boyle and Thomas Hobbes over Boyle's air-pump experiments in the 1660s, see Shapin, S., & Schaffer, S. (1985). Leviathan and the air-pump: Hobbes, Boyle, and the experimental life. Princeton: Princeton University Press.

For more on the experimental approach in the social sciences with a focus on economics, see Guala, F. (2005). The methodology of experimental economics. Cambridge: Cambridge University Press.

For a case study on the role of instruments and measurements in experiments and studies, see Chang, H. (2004). Inventing temperature: Measurement and scientific progress. Oxford: Oxford University Press.

For an account of the scientific method in physics and an early statement of the problem of underdetermination, see Duhem, P. (1954/1991). The aim and structure of physical theory. Princeton: Princeton University Press.

For a concise treatment of qualitative research and its methodology, see Golafshani, N. (2003). Understanding reliability and validity in qualitative research. The Qualitative Report, 8(4), 597–606.

For more on the role of thought experiments in science, see Horowitz, T., & Massey, G. (Eds.) (1991). Thought experiments in science and philosophy. Lanham: Rowman & Littlefield.

For more on the use of big data in science, see O'Neil, C. (2017). Weapons of math destruction: How big data increases inequality and threatens democracy. New York: Broadway Books.


CHAPTER 3

Models and Modeling

3.1 MODELS IN SCIENCE

After reading this section, you should be able to do the following:

• Characterize models, target systems, and how they relate
• Describe how similarities, differences, and scientists' purposes are each important for modeling
• Give three examples of scientific models, describing their features and how they have been used
• Describe why and when modeling can be a useful scientific approach
• Outline the three main steps that are involved in modeling and say how each works


The Bay Model

In an unassuming warehouse north of San Francisco, California, there lies an enormous model of the San Francisco Bay and the surrounding Sacramento–San Joaquin River Delta. This Bay Model is amazing.

The Bay Model is basically a downsized reconstruction of an area in Northern California—an area the size of the state of Rhode Island in the US, stretching from the Pacific Ocean inland to Stockton and almost all the way to the state capital of Sacramento. The model is more than 1.5 acres (over 6,000 square meters) in area and is made out of 286 five-ton concrete slabs pieced together like a jigsaw puzzle. If you viewed it from above, you would see the whole Sacramento–San Joaquin River Delta, and you could gaze directly from the Port of Oakland to the Golden Gate Bridge (about 12 miles, or 19 kilometers, away in real life). This is possible because, as large as it is, the Bay Model is 1,000 times smaller than the actual San Francisco Bay, a large body of salty ocean water surrounded by a large urban population living in a variety of geological terrains and climates.

The Bay Model is a hydraulic model; it can be filled with water, just as the real San Francisco Bay is. Pumping systems move the hundreds of thousands of gallons (1 gallon = 3.785 liters) of water in the model and do so in a way that mimics the tides and currents of the real bay. This works in part because the model is three-dimensional and proportional, so the different parts of the bay and river delta in the model are the right amount lower than sea level, and the surrounding land is the right amount above sea level. The
FIGURE 3.1 View of the San Francisco Bay Model

Bay Model also includes many other features that affect water flow, such as rivers, canals in the delta, wharfs, bridges, and breakwaters.

The Bay Model is not just a toy model, however. It's a scientific model, and this has some important implications. Scientific models are constructed and investigated in order to learn, not just about the model itself, but also about phenomena in the real world. This particular model is a terrific tool for learning about the San Francisco Bay and how human activities can affect it. Teachers, students, and scientists use it to study geography, ecology, human and natural history, and hydrodynamics. It has been used to help answer questions about how dredging new shipping channels would affect the San Joaquin River Delta, about how mining during the California Gold Rush changed the rivers, and about what would happen if the system of dikes and levees in the delta failed.

Why Models?

Chapter 2 discussed the role of experiments and non-experimental studies in science, considering especially how these are used to generate data to compare with expectations, providing evidence for or against hypotheses. In this chapter, we will survey another important feature of science that relates to experimentation in interesting ways: the use of models. To uncover the roles that models play in science and to see how the Bay
Model in particular works, let's look back at why that model was originally constructed. (See Weisberg, 2013, on this case study and an overview of the use of models in science.)

John Reber moved from Ohio to California in 1907 and set up as an amateur playwright, dramatist, and theatrical producer in the 1920s and 1930s. Because of his work, he enjoyed social connections with numerous businessmen and politicians. In the 1940s, Reber became dismayed that the transcontinental railroad terminated in Oakland rather than San Francisco, and came to believe that the bay that isolated San Francisco from the rest of California and the United States interfered with industry. He saw that large body of water as a 'geographic mistake' to be corrected.

Reber's career was in entertainment, and he had no expertise in science or engineering. Nonetheless, Reber intrepidly proposed a grand plan to re-engineer, and then exploit, natural features of the bay that he thought would enable more efficient use of it. He suggested filling in some parts of the bay to create additional land for things like airports and factories and to establish two lakes to store freshwater supplied by the rivers that empty into the bay. As freshwater has always been a limited resource in the San Francisco Bay area, it could be valuable to repurpose the bay for potable drinking water and irrigation.

Reber's plan was taken seriously, and the US Army Corps of Engineers decided to test it out. An immediate problem, though, was that the Corps couldn't effectively test out Reber's plan in the actual bay without implementing the plan. What to do? How could they consider the effects of the plan without going ahead and carrying it out? Such circumstances highlight one way in which scientific models are particularly useful. When performing an intervention on a system of interest isn't possible, practical, or otherwise desirable, a model of the system can be used instead.
Consider another example of a circumstance in which modeling is useful. Suppose you are playing chess against a computer and are considering moving, say, your rook. How will that move affect the next three moves in the game? The easiest way to find out would be just to move the rook and see what happens. But the easy thing to do isn't always the best thing to do. Without thinking through the consequences first, such a move might result in a quick defeat.

It would be helpful to have a second chessboard set up to be just like the game that you're actually playing but 'offline'—in other words, not in the midst of an actual game. That way, you could try out various moves and consider moves that might be made in response. Doing so would help you anticipate how the actual game might proceed without suffering any bad consequences in the process. The offline chessboard might be a chessboard you've set up beside you, or it could just be a chessboard you imagine, or it could be another game on a computer but not in active play. Regardless, if the second chessboard is used in this way, it is a model of the actual chess game. You've set it up to have the pieces in the same places, and you can then try to figure out what your opponent might do were you to move your rook.

This is just like the decision of how to study Reber's plan for the San Francisco Bay. The Army Corps of Engineers wasn't prepared to radically alter the bay and the surrounding river delta before knowing what the results would be. They recognized that such changes might have unintended negative consequences for the local water supply, wildlife, vegetation, agriculture, and human population. So, like a second chessboard used to explore possible consequences of moves in a real game of chess, the Corps of Engineers built a hydraulic model designed to be like the San Francisco Bay in some important respects.
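In computational terms, the offline chessboard is built by copying the state of a system and intervening only on the copy. A minimal sketch, using an invented toy representation of a game state rather than a real chess engine:

```python
import copy

# A bare-bones 'game state': piece -> square (hypothetical positions).
game = {"white_rook": "a1", "white_king": "e1", "black_king": "e8"}

# The offline model: a deep copy that stands in for the real game.
model = copy.deepcopy(game)

# Intervene on the model, not on the real game.
model["white_rook"] = "a8"

# The real game is untouched; the model shows the tried-out move.
print(game["white_rook"], model["white_rook"])
```

The deep copy matters: a model that shared structure with its target would change the target when manipulated, defeating the purpose of working 'offline'.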


FIGURE 3.2 The Reber Plan


This enabled them to investigate the consequences of the changes Reber had proposed, by this time known as 'the Reber Plan'. Once they were confident that their model was sufficiently similar to the real San Francisco Bay in the important respects, scientists could make predictions about the real bay based on what they saw happening in the Bay Model. The model could then be manipulated—an intervention could be performed on it—to determine what would happen in the real bay were the Reber Plan implemented.

The scientists did exactly that. They built scale models of the dams that would create the proposed lakes and landmasses, and then they sat back to see what would happen. It turned out that, when the Reber Plan was implemented in the Bay Model, its unintended consequences were disastrous. The dams didn't create lakes at all but instead stagnant pools with poor water quality that wouldn't support ecosystems and couldn't be used for drinking or irrigation. Altering the dam configuration in the model in an attempt to solve that problem just created another problem: fast currents that again destroyed ecosystems and made travel in the bay significantly more dangerous. When the Corps of Engineers reported these findings, the Reber Plan was abandoned.

Similarity and Difference

The real-world system that scientists want to study using a model is often called a target system, or sometimes just a target. Many different kinds of things can serve as a model of a given target. The important requirement for something to be used as a scientific model is that it is taken to stand in for, or represent, a target system. Typically, this means that the model needs to be like the target system; that is, it should be similar to, or resemble, the target.


Box 3.1 History of Modeling

The rise of scientific modeling began in earnest in the mid-20th century, although scientific models were by no means new at that point. The word model originates from the Latin modulus and was used as early as the 1st century BCE by the architect Vitruvius and later by the theologian Tertullian to describe sculptural replicas. The use of models gained traction in the 14th and 15th centuries for various artistic and engineering purposes; for instance, in 1576, the astronomer Thomas Digges described Copernicus's heliocentrism as a model of the world, and half a century later, Francis Bacon described mental representation as model- or copy-based.

Nonetheless, the general trajectory of scientific research through the 18th and 19th centuries aimed at naming, ranking, and classifying entities in nature, as well as at the discovery of physical laws, causal generalizations, and mathematical equations by direct empirical investigation and theorizing. It was only in the 1940s and 1950s, when the search for laws began to wane in many fields of science, that the use of scientific models became increasingly common.

In psychology, Edward Tolman and Kenneth Craik revived Bacon's notion of mental models, whose structure corresponds to the structure of the world and
which we use to interact successfully with our environment. In cognitive science, a wide range of formal and computational models have been developed to capture specific aspects of the mind. Modeling also became common in biology, perhaps most famously with the groundbreaking double-helix model of DNA. In philosophy, Mary Hesse's work on models and analogies was equally important; she showed how models provide scientists with resources for metaphorical redescriptions of what is being observed or experimented upon.

This rise of scientific modeling also coincided with important (and controversial) efforts to develop classificatory models of people on the basis of racial, cultural, ethnic, and socio-economic differences. These models not only serve scientific goals but are also used for bureaucratic purposes and to inform public policy.

Everything is similar to everything else in at least some regards, so any old similarity won't necessarily result in a good model. Rockets from the US Apollo space program were white, cylindrical, rigid bodies, which were shaped much like a parsnip, but no one uses an Apollo rocket as a scientific model of parsnips. Scientific models need to be similar to their targets in relevant ways and dissimilar in irrelevant ways, at least for the most part. This is why the Bay Model replicated tides and currents and other important features of the San Francisco Bay, but not the number of sailboats in the bay. So, the features of a model that scientists construct should be relevantly similar to the features of the target system they think are important. This is what makes it possible to get accurate information about a target from studying a model.

Things are a bit more complicated, though, since relevant similarity can be achieved in different ways. In the example of the second offline chessboard on which you try out chess moves, it wouldn't matter too much if you replaced the chess pieces with colored paperclips or berries of various sizes. You could even just draw your own chessboard on a napkin. The dissimilarities between these approaches and the target—the actual chess game—don't matter, so long as they don't interfere with the model's ability to represent the intended features of the chess game. Here's a difference that would matter: using different-sized piles of sand on a chessboard to represent chess pieces isn't a good idea, since these piles can't be easily moved like chess pieces.

Intuitively, one way to achieve relevant similarity is to construct a model as similar as possible to the target system. But as it turns out, this is usually a bad idea. Too much similarity between a target and a model can actually be counterproductive.
Had the Corps of Engineers tried to build a model exactly like the San Francisco Bay in all relevant respects, it would have been too large for them to have anywhere to put it, and it would have changed so slowly they would have had to wait years to find out about the consequences of the Reber Plan. Consider constructing a map of your hometown that is exactly like it in every respect; it is three-dimensional, the same size as the real town, contains a full representation of every building, shrub, alley, fire hydrant, stray cat, and so on. Even if this could be done, why even bother with the model? You might as well just investigate the town itself! So, scientific models need not—indeed, should not—be similar to their targets in every respect or even in most respects. Like maps, models are incomplete and usually
simpler than their targets. They're designed to represent selected features of the target, the features about which scientists want to learn. Their lack of completeness is part of what makes them useful.

But what's the right amount of similarity then? This is an important question that doesn't have a general answer. All scientists who work with models regularly consider the extent to which some particular model should be like its target and the extent to which it should be different. The Bay Model's different spatial and temporal scales are two features that made it useful for learning about the real San Francisco Bay and Delta. The model is much smaller than the real bay, with much faster tidal cycles, which allowed the scientists to observe what would happen with a spatially distributed, long-lasting sequence of events in a short time and without having to leave the warehouse of the Bay Model. Some other features of the real bay that were changed or ignored either didn't matter or would have been too difficult to accurately incorporate. For instance, the model doesn't have any trees or buildings, as those were unimportant for its purpose. And being inside a big warehouse is a difference with a practical benefit: the model isn't exposed to changing weather like the real bay is. The model also doesn't incorporate the oceanic wind currents that affect the bay; it's tricky to see how those could be replicated and whether the outcome of doing so would be worth the effort. The scientists thus decided which features of the Bay Model should be similar to the real bay and which could, or should, be different. They also had to decide how to represent changing features of the San Francisco Bay. For example, they had to decide whether the model should be like the actual bay is during dry seasons or wet seasons or some combination of these.
They had to get all of these features right, or right enough, for the model to give them trustworthy information about how the bay would change if the Reber Plan were carried out. As it turned out, the model they developed was sufficiently similar to the real bay not only to serve this purpose but for it to eventually be put to other uses as well. For example, the Bay Model was also used to study how a later plan of deepening water channels would affect water quality.

One special type of similarity is called exemplification. For a model to exemplify some group of target systems, it must itself be one of the target systems. Such a model is called an exemplar. Researchers can use an exemplar to represent the broader class of targets that includes the exemplar and can thus draw conclusions about the whole class of targets by investigating the exemplar. For example, the fruit fly (which goes by the scientific name Drosophila melanogaster) is a common model organism in genetics and developmental biology. Just like Mendel used pea plants to understand how certain characteristics are passed from one generation to the next, biologists have used the fruit fly to learn how genes influence the development of embryos from single cells to mature organisms. Fruit flies are small and reproduce quickly, and large populations are easily maintained in labs. In addition to fruit flies being easy to keep and work with, scientists know about their entire genome and so can intervene on their genes in precise ways. These interventions allow scientists to identify specific sections of DNA within the genome that carry information needed to produce specific molecules like proteins, which in turn influence characteristics like fruit-fly size and color. As a model organism, the fruit fly is used to reason about other organisms, such as the biological mechanisms of hereditary disease and the regularities in the inheritance of physical characteristics observed by Gregor Mendel.
Scientists might study one population of fruit flies to learn about all fruit flies or to learn about all insects or even about all forms of life, including human life. The last, broadest range of target systems is surprisingly common in genetics research. Like all models, exemplars are both similar to and different from the target systems they represent. For example, fruit flies have genes organized into chromosomes, as do all other living organisms. This is an important similarity for their use as a genetic model. But fruit flies have only four chromosomes, so they are much simpler genetically than many other organisms. Further, because they breed very quickly, they have much shorter generations than many organisms. These features make them very good models to use in labs, but they also make them somewhat unrepresentative of all other organisms out there.

FIGURE 3.3 (a) Drosophila melanogaster; (b) The four chromosomes of Drosophila (image from droso4schools.wordpress.com)

To sum up, target systems are real-world phenomena selected for study, models are constructed to represent target systems for particular purposes, and models are similar to but also different from their targets in various ways. Most similarities and differences are carefully chosen, not only so the model can be developed and studied, but also—importantly—so it can provide accurate information about the target system. Studying a model can lead to knowledge about a target system insofar as the model can stand in for that system.

Specification of Target Systems

Next, let's think more carefully about the process of scientific modeling. Recall that scientific experimentation tends to follow the general pattern of generating expectations, performing an experimental intervention, and then comparing the data produced to the predictions to confirm or disconfirm the hypothesis. There are, of course, many variations on this pattern, and experiments are used for more than just testing hypotheses. Just like experimental reasoning, model-based scientific reasoning comes in many different forms and occurs in many different ways. But despite the numerous variations, there is also
a general pattern of how models tend to be constructed and used in science. That pattern has three basic steps: (1) specification of the target system(s), (2) construction of the model, and (3) analysis of the model. Let's consider each of these steps, starting with the first.

At first glance, it might seem easy to specify the target system; this basically just requires scientists to decide what it is they want to find out about using a model. Do they want to learn about the effects of proposed changes to the San Francisco Bay? Examine the genetic influences on some trait? Or, say, learn more about how the number of predators influences other animal populations? But like everything else in science, things aren't as simple as they at first seem. An archer cannot accurately hit a target with her arrow if she doesn't know where the target is or what it looks like. Similarly, scientists need to know quite a bit about a target system before they can construct a model of it. This is a version of an age-old problem called the paradox of inquiry: if you don't already know what you're looking for, how can you inquire about it? The central reason to develop a model in the first place is to gain knowledge about the target, but in order to learn about a target using a model, scientists must already know about that target. Scientists may initially know little to nothing about the target systems they want to investigate—especially when those systems are very distant in space or time, or excessively large or small. Yet, without some knowledge about a target, scientists can't evaluate whether the model is similar enough to the target, and in the right ways, to accurately represent it.

So, at the beginning of the modeling process, scientists need to be able to conceive of what a model should be a model of and what they want to learn from the model. This can be preliminary and partial, just enough to get the process going.
For the Bay Model, for example, the task was to evaluate the feasibility and any unforeseen consequences of the Reber Plan for damming up the bay. Scientists didn’t know what in particular they’d be evaluating—for example, whether strong currents would result or excessive evaporation would occur. In order to later construct a model that relates to the target in the right ways, scientists must also possess more specific information about at least some aspects of the target system. This point actually suggests two requirements: scientists need to know which features of the target system are important, and they need to have more specific information about those features. For example, when planning the Bay Model, scientists had to guess that the tides and currents might be important features. And then, in order to be able to calibrate the model to have the same tides and currents as the real San Francisco Bay, the engineers needed access to a lot of information about these features of the real bay. To get the needed data, 80 people took measurements at different locations throughout the 1,424 square kilometer (550 square mile) bay every 30 minutes throughout a full tidal cycle of 48 hours. They recorded tide velocity and direction, changes in the water’s salinity (salt content), and the concentration of sediment. All of these data were needed in order to even decide what features a model of the bay should have.

Constructing the Model

Once a target system has been specified, scientists can begin constructing models. Some of their preparatory work is already accomplished in the specification of the target system, since part of that task was to specify its important features. But there are still questions
to answer about how a model should be constructed in relationship to a target. What kind of model is called for? What features should be designed to be similar to the target and to what degree should they be similar? Should the model be designed to apply to more than one target system? The answers to these and other questions are influenced by scientists' exact goals and the nature and extent of their background knowledge about the target system.

For the Bay Model, the scientists elected to construct a physical replica of the target, but, as we'll see, there are many other approaches to models. The San Francisco Bay is a complex system, and one advantage of a physical model is that the scientists didn't need to understand how changes occur in the bay to predict those changes. Instead, their approach was to make the replica as similar to the bay as possible in all the ways they thought might matter, and then sit back and see what happened. Still, the model required extensive calibration—comparison with the real bay followed by adjustment—before it was sufficiently accurate. The engineers had to tinker with the scales used for depth and width of the bay in order to get the proper water flow. They ended up making the model bay much deeper proportionally than the real bay, which helped. But this resulted in water moving too quickly in shallow parts of the model. The researchers compensated for this by adding 250,000 copper strips to the bay floor in the model to increase water resistance. They chose how many copper strips to add to any given place by comparing the model's water flow with that of the real bay.

Other modeling approaches offer different advantages and involve different difficulties of model construction. We'll survey different kinds of models later in this chapter. For now, consider an example of a different kind of scientific model.
The Lotka-Volterra Model is an influential model in ecology developed (independently) by Alfred Lotka and Vito Volterra in the 1920s (see Volterra, 1928). Unlike the Bay Model, the Lotka-Volterra model does not lie in any warehouse. It’s a simple, abstract mathematical model. What this means is it uses mathematical equations to represent the interactions of predators and their prey, like foxes and hares, lions and wildebeest, polar bears and seals, and so on. Here are the equations:


dx/dt = αx − βxy
dy/dt = δxy − γy

One variable, x, stands for the number of prey animals (for example, seals), and another variable, y, stands for the number of predator animals (in this case, polar bears). In this model, both x and y are dependent variables: their values depend on time, the independent variable. (Independent and dependent variables were discussed in Chapter 2.) These equations can be used to calculate how predator and prey population numbers change over time (represented in the model as the derivatives dx/dt and dy/dt) from the combination of those population numbers and a few other parameters. A parameter is a quantity whose value can change in different applications of a mathematical equation but that only has a single value in any one application of the equation. In these equations, α, β, δ, and γ are parameters. These help the model take into account the prey population's rate of growth without predation, the rate at which prey encounter predators, the predator population's rate of growth, and the loss of predators by either death or emigration.
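To see how these equations behave, the model can be stepped forward in time numerically. The following Python sketch uses simple Euler integration; the parameter values and starting populations are purely illustrative and are not drawn from any real predator-prey data.

```python
# Rough Euler-method sketch of the Lotka-Volterra equations.
# All numbers here are illustrative; they describe no real populations.

def lotka_volterra(x0, y0, alpha, beta, delta, gamma, dt=0.001, steps=20000):
    """Return lists of prey (x) and predator (y) population sizes over time."""
    xs, ys = [x0], [y0]
    x, y = x0, y0
    for _ in range(steps):
        dx = (alpha * x - beta * x * y) * dt   # dx/dt = alpha*x - beta*x*y
        dy = (delta * x * y - gamma * y) * dt  # dy/dt = delta*x*y - gamma*y
        x, y = x + dx, y + dy
        xs.append(x)
        ys.append(y)
    return xs, ys

# Hypothetical run: prey start at 5, predators at 3.
prey, predators = lotka_volterra(x0=5.0, y0=3.0,
                                 alpha=1.1, beta=0.4, delta=0.1, gamma=0.4)
```

Plotting `prey` and `predators` against time would show the model's characteristic offset oscillations, with the predator population peaking shortly after the prey population does.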


The Lotka-Volterra model represents predator-prey interactions, but there's no straightforward way in which these equations are similar to animals eating other animals. Instead, the similarity is between the numbers that solve these equations for particular values of the variables and parameters and the change in size of predator and prey populations over time in particular circumstances. Recall that the pieces in a chess game can be represented with paperclips or berries, so long as they can make similar moves. The Lotka-Volterra model is like that but with even more radical a difference between the model and the target.

The variables and parameters of the Lotka-Volterra model are both explicit parts of the model; they are visible in the equations printed here. What doesn't appear are the model's assumptions, but those are just as important a part of the model. An assumption, in this sense, is a specification that a target system must satisfy for a given model to be similar to it in the expected way—in this case, in order for the numbers solving the equations to indicate the actual change in predator and prey population sizes. Numerous assumptions must be satisfied for the Lotka-Volterra model to apply. For example, the model assumes that the prey population will expand if there are no predators and that the predator population will starve without prey. Both of these assumptions are pretty likely to be true. The model also assumes that prey populations can find food at all times, that predators are hungry at all times, and that both predators and prey are moving randomly through a homogenous environment. These three assumptions are probably not true of any target system, that is, of any predator and prey populations. These assumptions are idealizations, or assumptions made without regard for whether they are true, often with full knowledge they are false (see McMullin, 1985).
These and other idealizations enable scientists to concentrate on the bare essentials of predator-prey interactions they want to focus on, without getting lost in complicating details of real predator and prey populations. There are many deep questions to ask about idealizations in science; for now, notice that these assumptions are good enough, even if they are wrong, if the model’s solution matches up with how the population size really changes. In that case, these idealizations don’t interfere with the model adequately representing the target.

FIGURE 3.4 Visual representation of the Lotka-Volterra model


Because models can be similar to target systems in different ways, a single target is sometimes represented by multiple models. This can be useful when the real-world phenomenon is so complex that no single model can provide scientists with all of the desired information. The weather is a good example of this. Any meteorological model can only capture a few of the factors needed to generate reliable predictions about the weather. Some meteorological models may invoke humidity, temperature, and dew point to describe and predict certain basic weather patterns like precipitation. Other models may invoke more specialized parameters, such as central pressure deficit, along with more basic ones, such as wind speed and direction, to describe and predict a particular phenomenon like hurricanes. Sometimes meteorologists aim to make more reliable predictions by carefully cobbling together the results of different models of a given weather system.

It's also possible for a single model to have more than one target system. A model might be designed to represent a repetitive activity or a type of event that occurs in many different places. The Lotka-Volterra model is like that; it is designed to capture something important about seal and polar bear populations, wildebeest and lion populations, and many more. And the same meteorological models can be used to represent a number of different hurricanes, as well as typhoons and cyclones.


Analyzing the Model

Once a target has been specified and a model selected or constructed with that specification in mind, the model must be analyzed in order to learn about the target. Using or manipulating a model can occur in different ways. Scientists might literally move parts of the model or alter certain internal relationships or introduce some external condition. This kind of physical manipulation was used on the Bay Model to test the Reber Plan. For a model organism like the fruit fly Drosophila, scientists may alter a gene in some flies and see how those flies' offspring then change. Models that involve equations, like the Lotka-Volterra model, can be mathematically analyzed with different values for parameters or variables; these represent specific assumptions about the target populations. Such manipulations produce data that—if all goes well—can be used to learn about the target. This is perhaps the main purpose of analyzing a model: to draw conclusions about the target system(s). For example, the Bay Model was eventually used to show that freshwater lakes couldn't be maintained in the San Francisco Bay, as the Reber Plan called for, and that the planned dams would have disastrous unintended consequences for the local environment. On this basis, it was concluded that the Reber Plan shouldn't be implemented in the real San Francisco Bay.

Another purpose of analyzing models is to use existing data to assess and improve the extent to which a model represents its target. Recall that specifying the target and constructing the model involves a bit of guesswork. If scientists fully understood a target system, it wouldn't be necessary to model it. And some of the assumptions needed for a model might end up interfering with how well the model represents the target. For these and other reasons, researchers may not trust that what happens in the model will happen exactly as it does in the target.
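The back-and-forth of calibration can be pictured as a simple loop: run the model, compare its output with measurements of the target, adjust, and repeat. The Python sketch below is purely schematic; the numbers and the `model_flow` function are hypothetical stand-ins, not the Corps of Engineers' actual procedure.

```python
# Schematic calibration loop with made-up numbers: increase a resistance
# parameter (like adding copper strips) until the model's output matches
# the observed value within a tolerance.

observed_flow = 3.2           # hypothetical measurement from the target system

def model_flow(resistance):
    # Toy stand-in for running the physical model at a given resistance.
    return 8.0 / (1.0 + resistance)

resistance = 0.0
while model_flow(resistance) - observed_flow > 0.01:   # tolerance of 0.01
    resistance += 0.01        # small adjustment, then re-run the comparison
```

Here each increment to `resistance` plays the role of adding another batch of copper strips, and the loop ends once the model's flow agrees with the observed flow closely enough.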
An example of this use of model analysis is the extensive calibration of the Bay Model that we discussed earlier. This also highlights how the different steps of modeling can come in different orders or be intertwined. Different models with the same target are sometimes also analyzed to see whether and to what extent the different models have the same results. This kind of analysis is called
robustness analysis. This is one way of determining which models are trustworthy for prediction and explanation—especially when their targets are highly complex systems like the climate or predator-prey interactions. Robustness analysis begins by generating multiple models of a target. For example, climatologists develop several distinct models for predicting changes in the temperature in a specific region. If multiple meteorological models with different variables, parameters, and assumptions all predict an upcoming increase of temperature in the region, this prediction is robust (and should be taken particularly seriously). On the basis of similar predictions from different models, scientists may be able to find the common features of the models that give rise to the robust prediction. They can then examine how this core structure might relate to stable relationships involved in the complex phenomenon of interest. In this way, climatologists and other scientists studying complex systems can learn whether and to what degree the predictions of a model should be taken seriously.
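The logic of robustness analysis can be illustrated with toy "models." In the Python sketch below, three simple trend estimates, each resting on different assumptions, are applied to the same made-up temperature record; the warming prediction counts as robust only if all three agree. Both the data and the estimators are invented for illustration.

```python
# Toy robustness analysis: do three different trend "models" of a
# (hypothetical) yearly temperature series all agree it is warming?

temps = [14.1, 14.3, 14.2, 14.5, 14.6, 14.8, 14.7, 15.0]  # made-up data

def linear_trend(ys):
    # Slope of a least-squares line, computed by hand.
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

def endpoint_difference(ys):
    # Crudest model: compare the last measurement with the first.
    return ys[-1] - ys[0]

def half_means(ys):
    # Compare the mean of the later half with the mean of the earlier half.
    mid = len(ys) // 2
    return sum(ys[mid:]) / (len(ys) - mid) - sum(ys[:mid]) / mid

predictions = [f(temps) > 0 for f in (linear_trend, endpoint_difference, half_means)]
robust = all(predictions)  # the warming prediction is robust only if all agree
```

If one of the three estimators disagreed, `robust` would be `False`, signaling that the prediction depends on some particular modeling assumption rather than on a stable feature shared by all the models.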


EXERCISES

3.1 Define model and target system in your own words, and say how the two relate. For a modeling example from this section, say what the model is, what the target system is, how they are related, and what the model is useful for.

3.2 One very familiar kind of scientific model is a mechanical model of the solar system, called an orrery. These models are used to represent the relative positions and movements of the Sun, planets, and moons. (If you have never heard of an orrery model, then do some research on the internet or elsewhere to get a better idea of what they are.)
a. List as many similarities and as many differences between this model and target system as you can. You should have at least six similarities and at least six differences.
b. Order the similarities from the most important to least, and then do the same with the differences.
c. Describe the significance of each of the two similarities and two differences that seem to be the most important. For each, say why you think the model-builders chose to make the model similar to or different from the target system in that way.

3.3 State in your own words the main goal of each of the three steps of modeling, as described in this section. Then, describe how each step may be involved for some use of an orrery (a mechanical model of the solar system).

3.4 Suppose that you want to model the interactions between predators and prey, for example, hawks (the predator) and mice (the prey). Make a list of at least five features of that target system you think your model should take into account. Then, for each feature, say how it is similar or different in other predator-prey systems. For any features that are different, can you think of a related feature that would be similar between the systems?

3.5 What features of modeling make it a useful approach when an experiment is not possible and why? What features of modeling make it a useful approach when a phenomenon of interest is highly complex and why?

3.6 Chapter 2 outlined the perfectly controlled experiment, which some refer to as the 'gold standard' for science. However, the National Weather Service usually opts for modeling when studying the weather and making weather forecasts. Does this suggest the Weather Service's results are less scientific, in so far as they don't aim for this ideal? Why or why not?

3.7 The National Weather Service uses lots of climate models. Each of the models (1) represents the climate system in a different way and (2) is inaccurate in some way. Explain each of these features with reference to information from this section. Why do you think the National Weather Service does not rely on just one single climate model in making its predictions?

3.8 Can you think of another complex target system that, like the weather, may require multiple models to investigate? Name two such systems. Then, explain what makes those systems so complex. Why do you think scientists may benefit from constructing multiple models of these systems?

3.9 Sketch how experiments involve the three main steps of generating expectations, performing an intervention, and then analyzing the resulting data. State the three main steps in modeling, and describe the similarities between those and the three main steps in experimenting. Then, describe how modeling and experimenting are different.

3.10 Find two different maps of your city or town, on the internet or on paper.
a. For each map, assess its (i) completeness (does it represent all/most/many or just a few features of the city/town? Which features?), (ii) accuracy (does it provide an accurate representation of the city/town? How accurate? What does it get wrong?), and (iii) purpose (what does it seem like people use the map for? How is that purpose served by the attributes you identified with respect to completeness and accuracy?).
b. In light of your analysis, say whether one of these maps is better than the other. If so, in what way(s) is it better? If not, why not?

3.2 VARIETIES OF MODELS

After reading this section, you should be able to do the following:

• Indicate the differences between models of data and models of phenomena
• Describe the three steps to constructing a data model, using an example
• Give examples of models of these five types: scale, analog, mechanistic, computer, mathematical
• Discuss how each of the five types of models varies along the concrete/abstract dimension

Types of Models

As we have seen, scientific models aren't always like toy models of airplanes or bays filled with water. Indeed, the range of things that count as scientific models is extremely broad. Scientific models can be concrete physical objects, such as the Bay Model or Watson and Crick's double helix model of DNA, which is made of metal plates. They can also be abstract mathematical objects, like the Lotka-Volterra model of predator/prey
interaction, or mental simulations of possible sequences of events. Some models have both concrete and abstract features, such as computer models, which include concrete physical components as well as software components that allow us to interact with the computer and perform tasks. Scientists often rely on computers for modeling complex phenomena, including the weather and global climate change, the origin of the universe, and what the world economy will be like in 20 years.

In this section, we'll classify some types of models. This will help clarify how scientific models differ, and the kinds of choices scientists make when they use models to investigate the world. We'll first distinguish between models of data and models of phenomena. Everything we've discussed so far in this chapter has been about models of phenomena; data models play a different role. Then, we'll discuss five different types of models of phenomena. The categories identified aren't mutually exclusive: a single model might count as more than one of these types. Nor are the categories jointly exhaustive, since there are also other types of models beyond those we discuss—robot models, for example, which are sometimes used in science and engineering to model how humans or other animals can interact with their environment to successfully perform complex tasks.


Models of Data

A model of data, or data model, is a regimented representation of some data set, often with the aim of highlighting whether or not the data count as evidence for a given hypothesis. The concept of data was encountered in Chapter 2, in the discussion of experimental and observational studies. Recall that data are any public records produced by observation, measurement, or experiment. Video recordings of capuchin monkey behavior, observations of the positions of planets in the night sky, readings of a thermometer, participants' answers on a questionnaire in a psychological experiment, and GPS location logs from phones are all examples of data. Such recordings are raw data, which must be processed before they are useful to scientists. For instance, observations of the positions of planets in the night sky need to be corrected for measurement errors, organized by time and day, arranged into some scale, and put into a visual format such as a graph or table. Only then can astronomers use those data to gain knowledge about the behavior of the planets. This process of data correction, organization, and visualization results in a model of the data.

Data models are a rather different kind of model from the models discussed so far. They do fall under our general definition of a model, since they are representations that are investigated in place of what they represent. But what is represented are not phenomena—what we've called target systems—but data. Data models thus play a wholly different role in scientific reasoning than models of phenomena.

The first step in constructing a data model is to eliminate presumed errors from the data set. Consider measurements of the positions of a certain planet in the sky—say, Mercury, over a period of days. Those measurements will be influenced by more than Mercury's position.
They will also be affected by some combination of human mistakes, flaws and limitations of instruments, like the telescope, and inaccuracies due to changing atmospheric conditions. Scientists can try to identify and correct these errors in various ways. They might calibrate the telescope or record the atmospheric conditions along with their measurement of Mercury’s position. This additional information can guide the

Recipes for Science : An Introduction to Scientific Methods and Reasoning, Taylor & Francis Group, 2018. ProQuest Ebook Central, http://ebookcentral.proquest.com/lib/purdue/detail.action?docID=5584122. Created from purdue on 2021-08-24 00:09:41.


Models and Modeling

decision of which data are questionable and should be eliminated. This process is called data cleansing.

Once erroneous data are removed from the data set, the next step is to represent the clean data in a meaningful way. Data of Mercury's position in the sky over a period of days may initially be visualized as points on a chart. These points will probably be used as the basis for a curve that represents Mercury's progression in the sky. The points represent the scientists' measurements. The curve, in turn, represents the scientists' best guess for Mercury's continuous path through the sky. This final representation is the data model.

We can generalize from this example to other data models. Of course, it's not always spatial position that's being measured. There is, though, a common progression of (1) eliminating errors, (2) displaying measurements in a meaningful way, and then (3) extrapolating from those measurements to the expected data for measurements that weren't actually taken. This is what happens when scientists use points on a chart to draw a curve representing Mercury's position, even for times and days when data weren't collected. As we've suggested, this involves some amount of guesswork. Indeed, how to extrapolate from measurements to create a data model is a complicated enough task that it has its own name: the problem of curve fitting.

To get an idea of the problem, suppose that you have data for two variables—say, air pollution and life expectancy—and you want to figure out the general mathematical relationship between the two. That is, you want to learn how people's life expectancy changes as a function of the level of air pollution where they live. The mathematical equation capturing this relationship will describe a curve that will 'fit' your observations. The basic problem of curve fitting is that data, no matter how much you collect, are always consistent with different curves.
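A small sketch makes the point concrete. The data below are made-up values purely for illustration; the contrast is between a "memorizing" model, which fits the collected data perfectly, and a simple straight line fit by ordinary least squares, which fits imperfectly but supports prediction:

```python
# Hypothetical data: air-pollution index vs. life expectancy (made-up values)
data = [(1, 80.1), (2, 78.9), (3, 78.2), (4, 76.8), (5, 76.1), (6, 74.9)]

def fit_line(points):
    """Least-squares line y = a*x + b, via the closed-form formulas."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

def residual(points, predict):
    """Sum of squared differences between model predictions and data."""
    return sum((predict(x) - y) ** 2 for x, y in points)

# A model that simply memorizes every data point fits with zero residual...
memorized = dict(data)
memorize_error = residual(data, lambda x: memorized[x])

# ...while the line fits imperfectly but can predict unmeasured cases:
a, b = fit_line(data)
line_error = residual(data, lambda x: a * x + b)
prediction_at_7 = a * 7 + b  # the memorizing model has nothing to say here
```

The memorizing model "fits" best, yet it only restates the imperfect measurements; the line, despite its nonzero residual, extrapolates to pollution levels that were never measured.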
Put in terms of underdetermination, which was introduced in Chapter 2, the data underdetermine which equation captures the relationship between these two variables, air pollution and life expectancy. See Figure 3.5.

So, how should scientists decide which of the equations defining a curve passing through their data captures the real relationship? There is no easy answer. Finding the curve that best fits all available data, no matter what, is seldom the best approach. Sometimes, data models can fit the data too well; this is called overfitting a model to the data. The problem with sticking too closely to the actual data is that those data are never perfect. There might be outliers, or values that deviate from the norm for one reason or another. There is also the possibility of noise, or influences on the data that are incidental to the focus, such as confounding variables. Scientists want their data models to be better than the actual data they've collected. In the end, which model of data is the right one depends on several factors, including the goals of the scientists, their background knowledge, and considerations of how easy the data model is to use to make predictions.

Big data approaches, discussed at the end of Chapter 2, present significant data modeling challenges. Big data sets provide science, public policy, and business with an impressive resource for answering important questions. Data collected from social media, for example, can be used to understand how often the public talks about politics, sports, and sex; to make predictions about complex political and social events; and to explain consumer behavior. But using big data to make predictions requires finding the right models of the data. The difficulties we have briefly surveyed here are compounded when modeling big data sets, as the conditions for and features of the data tend to be


FIGURE 3.5 The problem of curve-fitting


less well understood. Chapters 5 and 6 elaborate on the statistical techniques scientists employ to represent data and draw inferences from them.

Models of Phenomena

As we've already seen in this chapter, models of phenomena provide ways to learn about a phenomenon indirectly by studying the model. This use of models is very different from data models, both in model development and in the role the models play. Models of phenomena have been the main subject of this chapter, so here we focus on how they contrast with data models. Data models are used in experiments and non-experimental studies, where the phenomena are investigated directly. In contrast, models of phenomena are often used to investigate phenomena indirectly. In order to do this, scientists have to first learn about the model itself. Then they have to find a way to convert their knowledge about the model into knowledge about the phenomenon being modeled.


Building a model of a phenomenon is kind of like taking apart a toaster and putting it back together again. A great way to learn about something is to try to build it, or something that's like it. Physical models might literally be built; other kinds of models, like equations and computer programs, are also built, only in a more metaphorical way. Regardless, model construction should result in a model that represents the target system(s). Scientists then manipulate and analyze the model to learn about the target system(s). Just as the model represents the target system, manipulations of the model represent manipulations of the system. Depending on the type of model, though, the manipulations might be very different from what would happen in the actual target. And then, so long as the model is similar in the right respects to its target system, scientists can transform the knowledge they gain about the model's behavior into knowledge of the target system.

Recall how data models are better, more informative, than the data themselves. Similarly, good models can be better for study than their targets. Consider a few ways in which this is so. A physical model can provide a simplified version of a system that changes more quickly than the original. A mathematical model can enable precise predictions about a system when its equations are solved. A computer model can be run again and again with different conditions, simulating a range of possibilities. Differences between a model and the phenomenon that is modeled are key to the value of model-based science, or learning about the world indirectly through models. Recall the discussion of how overfitting—that is, corresponding too closely to the actual data—can hamper the value of a data model. Something similar is true for models of phenomena. Scientists can go wrong by constructing a model that builds in too many elements of the target system or is too similar to the target system.
Such a model may apply only in very narrow circumstances or be too difficult to study, either of which limits its usefulness. If instead a model is constructed to incorporate only the most important, or most interesting, features of a phenomenon, then it will be useful in a wide range of circumstances. We see examples of this in what follows.


Scale Models

To illustrate this use of models and the range of forms it can take, consider some categories of models of phenomena. To begin, scale models are concrete physical objects that serve as down-sized or enlarged representations of their target systems. Architectural models of urban landscapes are a familiar example; these are widely used in civil engineering. The Bay Model also belongs to this class, since it is a three-dimensional physical object made of concrete slabs, copper tags, and water. The spatial scale of the Bay Model is 1:1000 (that is, 1 foot in the model represents 1,000 feet in the real world, where 1 foot = 0.3 meters) on the horizontal axis and 1:100 (that is, 1 foot in the model represents 100 feet in the real world) on the vertical axis. Temporally, the Bay Model is also scaled; each tidal day of 24 hours and 50 minutes is represented as a 14.9-minute sequence, divided into 40 equal intervals of about 22.35 seconds (that is, one minute in the model represents 1 hour and 40 minutes in the real-world target system, the San Francisco Bay).
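These scale factors can be applied mechanically. As a small sketch (the helper function and its names are our own, not part of any Bay Model documentation), converting model measurements to their real-world equivalents looks like this:

```python
# Scale factors for the Bay Model, taken from the text
HORIZONTAL = 1000  # 1 model foot = 1,000 real feet
VERTICAL = 100     # 1 model foot = 100 real feet
TIME = 100         # 1 model minute = 100 real minutes (1 hour 40 minutes)

def model_to_real(horizontal_ft, vertical_ft, minutes):
    """Convert Bay Model measurements to real-world equivalents."""
    return (horizontal_ft * HORIZONTAL,
            vertical_ft * VERTICAL,
            minutes * TIME)

# One 14.9-minute model run covers 1,490 real minutes: a tidal day of
# 24 hours and 50 minutes
h_ft, v_ft, real_minutes = model_to_real(1.0, 0.5, 14.9)
```

Note the deliberate distortion: because the horizontal and vertical scales differ by a factor of ten, the model is not a uniformly shrunken Bay, a point we return to when discussing idealization.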


FIGURE 3.6 James Watson and Francis Crick's double helix model of DNA

While the Bay Model is a scaled-down representation, other scale models are enlarged representations of their targets. The historic discovery of the structure of DNA by James Watson and Francis Crick in 1953 provided understanding of how genes replicate and how parents transmit their characteristics to their offspring. Using wire and tin plates, Watson and Crick had begun building scale models of DNA in 1951. After several failures, the two scientists recognized from the work of Rosalind Franklin that a model with a double helical structure had the best fit to current knowledge about DNA. This model had a spatial scale of roughly 1,000,000,000:1. That is, an inch (2.54 centimeters) in the double-helix model represented one-one-billionth of an inch in a real DNA molecule. (See Chapter 8 for more discussion of the discovery of DNA’s structure, including Rosalind Franklin’s role.)


Analogical Models

Analogical models can be physical or abstract objects, depending on whether they rely on physical or abstract analogies to represent their target systems. Scale models like the Bay Model can be characterized as concrete analog models, as they share several physical properties with their targets. An example of an abstract analog model is the computer model of the mind, which is based on formal similarities between computers and minds. Like computers, the human mind is an information-processing system that can be described in functional terms, without talking about its actual physical composition, or 'hardware'. Like computers, minds can be understood in terms of the operations they carry out in order to solve certain tasks, or in terms of their 'software'.

Here is another example of an analogical model, located somewhere between the Bay Model and the computer model of the mind on the concrete-abstract spectrum. Another hydraulic model, like the Bay Model, was built by William Phillips in 1949. But whereas the Bay Model used water flow to represent a real body of water, Phillips's model used water flow to represent the British economy! This model is called the Phillips machine or Monetary National Income Analogue Computer (MONIAC). The Phillips machine was a set of plastic tanks, each representing some aspect of the economy, which were connected by pipes, sluices, and valves. Dyed water, representing money, was hydraulically pumped around the machine by an old airplane motor to simulate the 'flow' of money in an economy. An overhead tank, representing a treasury, could be drained so that the water inside could flow to other economic sectors, like education, health care, infrastructure and investment, savings, and so on. Water could be pumped back to the 'treasury' tank to represent taxation and state revenue, with pumping speeds adjusted to simulate changes in tax rates.
Exports and imports could also be simulated by adding or draining water from the model. The Phillips machine was a physical model, but not a scale model. (The British economy isn't itself operated hydraulically, of course.) Unlike the Bay Model, the Phillips machine used water flow as an analog to money flow. Changes in water level and flow were analogous to changes in highly complex, abstract parameters of the British economy. In its day, it was an amazingly accurate tool for learning about how changes in different economic sectors affect others (Morgan & Boumans, 2004).

Relying on analogies is a particularly useful strategy in early stages of modeling, when scientists may have little or no knowledge of the phenomenon they are interested in. Analogies enable scientists to focus on the salient features of a model and to let the discovery of analogous features guide modeling approaches. For example, the similarity of the physical arrangement of a spiral staircase to a DNA molecule was striking to Watson and Crick, guiding their modeling of DNA toward a double-helix structure. Watson, in his memoir, says, '[E]very helical staircase I saw that weekend in Oxford made me more confident that other biological structures would also have helical symmetry' (1968, p. 77). Spiral staircases were useful analogical models for DNA, stepping stones toward the scale model Watson and Crick ended up developing. As knowledge about the target develops, analogical models may give way to models less obviously related to the target systems they represent. As we have mentioned, the Lotka-Volterra model is a set of mathematical equations, which is hardly analogous to


FIGURE 3.7 William Phillips's MONIAC hydro-economic model

populations of predators and prey. But knowledge about those target systems was used to develop mathematical equations that effectively—if indirectly—represent key relationships among the populations in question.

Mechanistic Models

Mechanistic models are representations of mechanisms. Mechanisms are systems of component parts and component operations, organized spatially and temporally so as to causally produce a phenomenon. Certain features of cells (like


FIGURE 3.8 Visual depiction of the sodium-potassium pump (labels: outside cell, cell membrane, inside cell; K+ (potassium ion), Na+ (sodium ion), ATP)

neurons), organs (like brains), and whole organisms can be seen as mechanisms. Examples of phenomena produced by mechanisms include blood circulation, protein synthesis, and cellular respiration. Mechanistic models represent the causal activities of organized component parts that produce some such phenomenon. By doing so, they can help illuminate how the target phenomenon works and, in particular, how it depends on the orchestrated functioning of the mechanism that produces it. Mechanistic models can be physical structures representing concrete target systems, such as an orrery. Other mechanistic models are physical structures representing more abstract phenomena, such as the MONIAC Phillips machine model of the British economy. But most mechanistic models are schematic representations of abstract structures and functions and the relationships among them. For example, consider the mechanistic model of the sodium-potassium pump in cells depicted in Figure 3.8. This is not a model of a particular instance of a particular cell exchanging sodium ions for potassium ions. Instead, it is a generic representation of what all such exchanges, in any living cell, have in common.

Mathematical Models

As we have seen with the Lotka-Volterra model of predator-prey populations, mathematical models are equations that relate variables, parameters, and constants to one another. These models attempt to quantify one or more dependences among variables in the target.
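To preview how scientists actually work with such equations, here is a minimal numerical sketch of the Lotka-Volterra equations examined in this section. The parameter values and step size are hypothetical choices of ours, and simple Euler stepping is just one way to simulate the model:

```python
def lotka_volterra_step(x, y, alpha, beta, delta, gamma, dt):
    """Advance prey (x) and predator (y) populations by one Euler step."""
    dx = (alpha * x - beta * x * y) * dt   # prey growth minus predation
    dy = (delta * x * y - gamma * y) * dt  # predator growth minus mortality
    return x + dx, y + dy

# Hypothetical populations of mice (prey) and hawks (predators)
x, y = 10.0, 5.0
for _ in range(1000):  # simulate 10 time units in steps of 0.01
    x, y = lotka_volterra_step(x, y, alpha=1.1, beta=0.4,
                               delta=0.1, gamma=0.4, dt=0.01)
```

Running many such steps traces out the coupled oscillations of the two populations; with no predators present, the prey term reduces to pure exponential growth, just as the first equation says.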


For example, the Lotka-Volterra model uses a pair of first-order differential equations to represent changes in predator and prey populations over time. The first equation,

dx/dt = αx − βxy

describes the change in the prey population x over time t, where αx represents the prey population's exponential growth and βxy represents the rate of predator/prey interaction. The number of mice at a given time, for example, is determined by their population growth, minus the rate at which they're preyed upon by hawks. By contrast, the number of hawks is fixed by their population growth given the supply of prey, minus their mortality rate. Hence, the second equation,

dy/dt = δxy − γy

describes the change in the predator population y over the same time interval, where δxy represents predator population growth and γy represents the loss of predators due to death, disease, resettling, and so on.

Another example of a mathematical model is a game theory model called the prisoner's dilemma. Suppose that you and your friend Dominik have been arrested for robbing a bank, and you've been placed in different cells. A prosecutor makes this offer to each one of you separately:

Copyright © 2018. Taylor & Francis Group. All rights reserved.

You may choose to confess or to remain silent. If you confess and your accomplice keeps silent, all charges against you will be dropped, and your testimony will be used to convict your accomplice. Likewise, if your accomplice confesses and you remain silent, your accomplice will go free while you will be convicted. If you both confess, you will both be convicted as co-conspirators, for somewhat less time in prison than if only one of you is convicted. If you both remain silent, I shall settle for a minor charge instead. Because you are in a different cell from your friend, you cannot communicate or make agreements before making your decision. What should you do? Assuming that neither you nor Dominik want to spend time in prison, you face a dilemma. Each of you will be better off confessing than remaining silent, regardless of what the other does. Either Dominik doesn’t confess, or he does. If Dominik doesn’t confess and you do, you go free, whereas if you didn’t confess, you’d both be charged with a lesser crime—and going free is better than being charged with a crime. If Dominik does confess and you do also, you get charged as co-conspirators, whereas if you didn’t confess, you’d be charged as solely responsible for the crime—and this carries a longer prison sentence. So, regardless of Dominik’s decision, you are better off confessing. However, the outcome of both you and Dominik confessing is worse for both of you than the outcome of both you and Dominik remaining silent. In the first scenario, you are both charged as co-conspirators, while in the second scenario, you are both charged merely with a lesser crime. Thus, the prisoner’s dilemma seems to raise a puzzle for rationality. You are better off confessing, regardless of Dominik’s choices, but if you both


are inspired by that fact to confess, things are worse for you than if you had both kept your mouths shut. Reasoning independently, you should confess. But, even so, both of you employing that reasoning leads to a worse outcome than if you'd both acted in the best interest of your conspirator. This situation is customarily represented using the mathematical formalism of game theory. In its simplest form, the prisoner's dilemma is a game described by the payoff matrix shown in Table 3.1.

Although this situation may seem contrived, many real-life scenarios can be modeled with a generic version of the payoff matrix, such as the one shown in Table 3.2. Here the numbers are generic payoffs, or consequences for each decision. The higher the number, the more desirable the payoff. The first number in each set of parentheses represents Player 1's payoff, the second number Player 2's payoff. The players are also generic; they might be suspects in a crime, or they might be any other people, businesses, nations, animals, or even bacteria. Any entities that vary their behavior in response to others' behavior are fair game. The most basic relationship that characterizes the prisoner's dilemma also dictates the situations to which it can be applied. This basic relationship is that, no matter what one's partner chooses to do, one always does better by choosing to defect (in the original story, to rat out your friend) rather than to cooperate (in the original story, to remain silent). But—and this is key—players always do better if they are partnered with cooperators than if they are partnered with defectors. (You're always better off if your buddy doesn't rat you out, regardless of what you choose.) This mathematical model boils that scenario down to simple numbers that represent the desirability of different outcomes.
The dilemma of the prisoner’s dilemma thus amounts to how to encourage cooperative behavior, which is better for everyone, in the face of the temptation to defect into selfish

TABLE 3.1  Payoff matrix for the prisoner's dilemma with Dominik

                                      Dominik
                       Remains Silent            Betrays
  You
  Remain Silent        Each pays a small fine    You get 3 years of prison;
                                                 Dominik goes free
  Betray               You go free;              Each gets 2 years of prison
                       Dominik gets 3 years
                       of prison

TABLE 3.2  Payoff matrix for a generic prisoner's dilemma

                            Player 2
                 Cooperate        Defect
  Player 1
  Cooperate      (2, 2)           (0, 3)
  Defect         (3, 0)           (1, 1)


behavior. The prisoner’s dilemma model has been applied in a variety of circumstances to help account for scenarios involving cooperative behavior, ranging from symbiotic relationships among organisms to the practice of not killing opponent soldiers that developed spontaneously in the trenches of World War I (Axelrod, 1984). For example, consider the cleaning symbiosis. Individuals of one species, the cleaner, remove parasites and dead skin from individuals of the other species, the client. This happens in many pairs of species, but let’s focus on cleaner fish and client fish. Cleaner fish have the choice of cooperating by cleaning the client fish or defecting by eating extra skin from the client fish. Client fish have the choice of cooperating by allowing the cleaner fish to clean safely or defecting by threatening or eating the cleaner fish. The fish are better off if both cooperate: the client fish gets an important cleaning, and the cleaner fish gets dinner. But there’s a benefit to defecting for each: the cleaner fish would get a bigger dinner by eating more from the client fish, and the client fish would get to eat the cleaner fish. The prisoner’s dilemma has been used to represent these options and the circumstances that can enable cooperative symbiosis to evolve.


Computer Models

Many real-world situations can be modeled as cases of the prisoner's dilemma. But what we've seen so far isn't enough to demonstrate why business firms, gangsters, animals, bacteria, and nations so often cooperate in real life. One important reason is that, in most real-life scenarios, decisions about whether to cooperate aren't made in an isolated room, cut off from your partner, and in expectation that you'll never see that partner again. Real firms, gangsters, animals, bacteria, and nations interacting with one another do not make their decisions once and for all, and without communicating with one another. Instead, they might guess at what each other might do, signal their own intentions, or interact repeatedly over time, allowing for reputations to form. The model of the prisoner's dilemma introduced here does not capture these kinds of interactions, but it can be extended so that it does. One common extension is to the iterated prisoner's dilemma, where we suppose that two agents play the prisoner's dilemma with each other repeatedly. This is one way in which cooperative behavior has a chance of winning out over the selfish choice to defect.

Insight into how this can happen was provided in the 1980s by a computer game. The political scientist Robert Axelrod invited various social scientists to submit computer programs for a tournament of the iterated prisoner's dilemma. Each computer program had its own strategy governing the circumstances in which it would cooperate or defect, and these programs were pitted against one another to see which would do the best in the long run. This was a computer model. Computer models or simulations are programs run on a computer using algorithms, or step-by-step procedures, to explore aspects or changes of a target system. Like other models encountered thus far, computer models can range from incredibly simple to quite complex.
The goal is to create insight into some target system(s) by examining a similar set of dynamics encoded in a computer program.


It’s unusual for computer models to invite participation from other scientists, as in Axelrod’s tournament, but by doing so, Axelrod made it so that the strategies available weren’t limited by what he could imagine or what he thought would be successful. And, indeed, the result surprised him. The winning strategy—that is, the strategy that accumulated the most points in the iterated prisoner’s dilemma tournament—belonged to a program named Tit-for-Tat, submitted by a psychologist Anatole Rapoport. The program was so simple that it had only a few lines of programming code. Tit-for-Tat cooperated in the first round of any game it played in the tournament, and then it simply mirrored the other player’s previous action in every round thereafter. So, when Tit-for-Tat played against generally cooperative players (other programs), it also cooperated and so reaped the rewards of that mutual benefit. But when Tit-for-Tat played against uncooperative, selfish players, which defected a lot, it too played selfishly after that initial cooperative move. This protected it from exploitation by selfish programs. Axelrod’s computer simulation thus demonstrated the success of a strategy of reciprocal cooperation, which is often called reciprocal altruism (see also Rapoport, Seale, & Colman, 2015 for a more recent assessment of Tit-for-Tat).

EXERCISES

3.11 In your own words, characterize models of data and models of phenomena, and give an example of each. How are these types of models similar? How are they different from each other?

3.12 We have characterized the steps of data modeling as (1) eliminating errors, (2) displaying measurements in a meaningful way, and then (3) extrapolating from those measurements to the expected data for measurements that weren't actually taken. Describe each of these steps for any example of a dataset from this section or Chapter 2.


3.13 Describe the curve-fitting problem, and indicate how it relates to the three steps of data modeling.

3.14 List the five types of models of phenomena described in this section, and give an example of each. For each example, indicate why it counts as a model of that type and what target system(s) it is supposed to represent. Then, rank your examples from 1 to 5, where 1 is the most concrete relationship to the target system(s) and 5 is the most abstract.

3.15 Define mechanism in your own words. Then, refresh your memory of photosynthesis. (You probably encountered this in high school science if not since.) Consider this as an example of a mechanism by outlining (a) the main component parts and (b) how their operations are organized so as to constitute the mechanism's activity. Then, consulting the description you've developed, say whether you think photosynthesis is a mechanism and why or why not.

3.16 Thomas Schelling, an American economist and Nobel Prize winner, famously developed a model of segregation in 1971 (see also Schelling, 1969). The model utilizes a checkerboard, pennies, and dimes. Initially, squares on the board are filled randomly by either a penny or a dime or left empty. Over time, pennies and dimes are


moved around the board according to a rule representing whether they were satisfied to stay in their current location. Schelling discovered that a movement rule representing a preference for at least a small percentage of like neighbors would, over time, lead to segregated patches of pennies and dimes on the board. An example of a movement rule representing such a weak preference is the following: an occupant moves if fewer than three of the eight adjoining squares have occupants the same as the occupant (pennies if the occupant is a penny, dimes if the occupant is a dime); otherwise, it stays. A main application of this model is to housing segregation, where the model shows that even a weak preference for at least a minority of neighbors to be the same as oneself can lead to segregated patches of like inhabitants. (Importantly, this does not show such a weak preference was in fact what led to housing segregation in any given instance.)

a. In this application, what does the checkerboard represent, and what do the pennies and dimes represent?
b. What does the movement rule represent? (This one is tricky.)
c. List some of the idealizations needed to use the model to represent housing segregation. (Idealizations were discussed in 3.1.)
d. We said that Schelling's model doesn't show that weak individual preference in fact led to housing segregation. What are the implications of this model for segregated housing?

3.17 Mathematical models are among the most abstract representations of target systems. Describe how it is that mathematical models represent target systems. You might look back at our discussion of the Lotka-Volterra model and/or the prisoner's dilemma model for help.

3.3 LEARNING FROM MODELS

Copyright © 2018. Taylor & Francis Group. All rights reserved.

After reading this section, you should be able to do the following:

• Characterize the similarities and differences between models and experiments
• Identify the three features all models share
• Describe how similarity, difference, and social convention are involved in representation
• Define trade-off and give an example

Modeling as Experimentation and Theorizing

To close this chapter, let’s consider more fully the roles that models of phenomena play in science, and the relationship to phenomena that enables them to play those roles. A helpful comparison is with the roles of experiments that we outlined in Chapter 2. In experiments, scientists intervene directly on the target system. By contrast, modelers often intervene on a model, as a representation of the target. Nonetheless, constructing and analyzing a model shares similarities with experimentation. Both modelers and experimenters often perform their interventions in order to test expectations based on some hypothesis; and


like experimentation, modeling can provide evidence for or against hypotheses about real-world systems. For example, animal models like Drosophila melanogaster are used to indirectly test expectations about the genetic and molecular mechanisms of human disorders, like Parkinson’s disease and diabetes. And interventions were made on certain features of the Bay Model to test expectations about the consequences the Reber Plan would have for the real San Francisco Bay. The iterated prisoner’s dilemma has been studied to test expectations about the conditions that enable cooperative behavior to emerge among self-interested individuals. Each of these uses of models is a way to indirectly test scientists’ hypotheses about real-world systems. And in some cases, the results were quite surprising.

So, models can play a role similar to experiments. One big difference is that, with experiments, interventions are performed directly on the experimental system, whereas with models, interventions on the model are used to draw conclusions about the target system. This is why models must aptly represent their targets.

As we have seen, the work of modeling also includes gaining a better understanding of the phenomena under investigation and then constructing models to reflect that understanding, so that the models more accurately reflect the phenomena. Indeed, sometimes getting a model to more accurately reflect its target is the primary task of modeling. In such cases, a model of some phenomenon can play a role similar to a theory; a model can be a way to capture a set of ideas about what that phenomenon is really like. When a model is proposed as a theory about what some phenomenon is like, data gathered about the phenomenon, and perhaps about the model, can be used as evidence to confirm or disconfirm that theory. An example of such a theoretical use of modeling is the Lotka-Volterra model of predator-prey interactions.
Given an initial setting of parameters in the equations, one can make predictions about changes in the sizes of a given predator population and the population of its prey, say, polar bears and seals. Those predictions can then be tested against observations of the actual predator-prey system—polar bears and seals living in the same broad area. When a model’s behavior matches that of its target system(s) in more and more instances and across different circumstances, it may become accepted as an account of how the target behaves.

So, models can play an experimental role by providing a way to empirically investigate a phenomenon. Or they can play a theoretical role, by positing an account of some phenomenon. Sometimes the same model can even play both a theoretical and an experimental role. In Axelrod’s tournament, a computer simulation was used as a virtual environment to test which strategies would perform best in an iterated prisoner’s dilemma game. There was no expectation that Tit-for-Tat would succeed in the competition. However, the outcome accorded with an existing theory in evolutionary biology, called reciprocal altruism. The basic idea is that it can be evolutionarily advantageous for an organism to help another at some cost to itself if there is a chance the favor will be returned in the future. The success of Tit-for-Tat was consistent with this theory, for it was based on reciprocity. It paid off for Tit-for-Tat to cooperate with others—but only when the others were cooperating too. Thus, the success of Tit-for-Tat in Axelrod’s computer tournament was taken to confirm the idea in biological theory that natural selection can favor cooperative behavior, even when it has a cost.
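The predictive use of the Lotka-Volterra model just described can be sketched numerically: step the equations dx/dt = αx − βxy and dy/dt = δxy − γy forward through time and read off the predicted population sizes. The parameter values, initial populations, and step size below are arbitrary choices for illustration, not estimates for any real predator-prey system such as polar bears and seals.

```python
# Minimal Euler integration of the Lotka-Volterra equations.
# x: prey population size; y: predator population size.
alpha, beta = 1.1, 0.4   # prey growth rate; predation rate
delta, gamma = 0.1, 0.4  # predator growth per prey eaten; predator death rate

def simulate(x, y, dt=0.001, steps=50_000):
    """Return the trajectory of (prey, predator) population sizes."""
    trajectory = [(x, y)]
    for _ in range(steps):
        dx = (alpha * x - beta * x * y) * dt
        dy = (delta * x * y - gamma * y) * dt
        x, y = x + dx, y + dy
        trajectory.append((x, y))
    return trajectory

traj = simulate(x=10.0, y=10.0)
# The populations oscillate: prey decline while predators are abundant,
# predators then decline as prey become scarce, and the cycle repeats.
```

A prediction of this kind, such as the timing of a decline in the prey population, is what would be compared against field observations of the actual system.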


Three Features All Models Share

As this chapter has made clear, there are many different kinds of scientific models, which are used to do lots of different kinds of things. These include data models; many varieties of models of phenomena, in more concrete and more abstract forms; models used in something like experimentation; and models used as a kind of theorizing (and a combination of both). With all these differences, one might suppose there’s nothing to say about what all scientific models have in common. While no single definition perfectly characterizes what scientific models are or what uses they have, they do share at least three salient features. We have already encountered all three features in this chapter. Here, we make them explicit and discuss each in a bit more depth.

First, all models play some representational role: they are in some sense about their target system, which is what qualifies them as representing it. In playing a representational role, models represent, or stand for, something else. It’s something of a challenge to say exactly what’s required for a model to represent a target, but some basic components are more or less agreed upon. The model must be like the target in the right ways, where this likeness might be understood in terms of similarity or even a much stronger mapping relationship like isomorphism, which is minimally a one-to-one correspondence between each feature of the model and of the target. But as we have seen, models typically aren’t exactly like the target systems they represent. They are often dissimilar from their targets in important ways—recall the Phillips machine, a hydraulic apparatus that looks nothing like the British economy. Something’s needed to overcome that gap—the differences between the model and target—in a way that enables the model to nonetheless be about the target.
It’s increasingly believed that what fills that gap is social convention—that is, scientists’ shared practices in using and interpreting their models. Think of it this way. Scientists intend for models to be similar to and different from their targets in certain ways. Beyond the actual similarities and differences, it’s those intentions that set up models to relate to their targets. Social conventions in modeling allow these intentions to be conveyed and shared. Social conventions enable modelers to see what similarities and differences they should expect between a model and a system, which in turn governs how the model should be interpreted and properly used.

For example, cartographers (mapmakers) need to communicate what features of the territory their maps aim to get right. We have been told not to trust customary maps of the Earth about the shapes and sizes of the oceans, for example, as those are distorted in order to picture the landmasses more accurately. These kinds of social conventions are essential to our ability to use any map effectively and to know which map to use for which purpose. The same goes for scientific models. Social conventions in model construction and use help scientists understand how a model is supposed to relate to one or more target systems; the similarities between model and target aren’t enough by themselves.

We should also note that not all models have targets that actually exist. The Bay Model was used to represent the Reber Plan, which, thankfully, was never implemented. The Schelling segregation model shows how a mere preference for not being too much in the minority among one’s neighbors can lead to segregation, but, as we have mentioned, this doesn’t mean that such a preference is in fact solely responsible for segregation. (It isn’t.) And some scientific models aim to explore possibilities that are even more distantly related


to real occurrences. Regardless, those models are used to represent scenarios of scientific interest, and the knowledge gained from them concerns natural phenomena.

Second, all scientific models are used to learn about the world. Data models represent data in forms that advance hypothesis-testing. By constructing and investigating models of phenomena, scientists can reason about the targets they represent in hopes of gaining new scientific knowledge. In both cases, the models are used as vehicles for learning about natural phenomena investigated in science.

Third, all scientific models involve abstraction and idealization. Recall that models bear not only similarities to their targets but also differences from them. The differences come in at least two varieties, abstraction and idealization, which are not always easy to distinguish neatly. Roughly, in representing a target system, you may leave things out, or you may introduce features that the system clearly does not possess. Omitting or ignoring certain known features of the system is abstraction; including features the target system doesn’t have is idealization.

Abstraction and idealization serve different goals. Modelers often disregard many properties of their targets to focus on a limited set of features deemed important for the purposes at hand. The Lotka-Volterra model, for example, abstracts away from properties of prey and predators, like their speed; their size; their capacity for camouflage; their particular senses of smell, sight, and hearing; their location; and much else. Those features aren’t essential to how predator-prey interactions influence population size and so have been abstracted, or removed, from the model. Like abstractions, idealizations are a way of simplifying a model, enabling scientists to focus on the bare essentials of the phenomenon they’re interested in, without getting lost in complicating details.
But whereas abstraction involves leaving features of the target out of the model, idealizations are properties of the model that the target doesn’t actually have. We encountered the concept of an idealization earlier, when the Lotka-Volterra model was first introduced. There we defined idealizations as assumptions made without regard for whether they are true, and generally with full knowledge that they are false. In modeling, this results in the misrepresentation of certain aspects of the system being studied. For the Lotka-Volterra model, idealizations include the assumptions that prey can find food at all times, that predators are hungry at all times, and that both predators and prey move randomly through a homogeneous environment. Scientists don’t think these assumptions are true. But, in many situations, the falsity of these idealizations doesn’t interfere with the Lotka-Volterra model’s representation of predator-prey dynamics.

To recap, the three features shared by scientific models are that (1) they represent one or more targets; (2) they are used to learn about natural phenomena under investigation in science; and (3) they involve abstraction and idealization. The last two features are also related to models’ representational purpose. Abstraction and idealization are features of models that affect how they represent their targets, and the ways models represent their targets partly determine what can be learned. Representation is, then, at the heart of scientific modeling.

What Makes a Model Good?

A target system can be represented in many different ways. A physical model of a hydrological system, like the Bay Model, represents water flow in ways that significantly differ


from how a mathematical model of fluid dynamics does. And both are different from the computer model that eventually took over the work of the Bay Model. There’s no one perfect model of a given phenomenon. Instead, the goodness of a model is judged by considering what the modelers want to learn from and do with the model and, perhaps, the ease of developing or using the model. Sometimes one model will be enough for learning about a target system; other times, multiple models of the same target will be necessary to gain knowledge.

Several features are desirable for models to have. These include accuracy (realistically representing the target), generality (applying to a range of related target systems), precision (providing exact information), tractability (ease of use), and robustness (stable behavior across different assumptions). Each of these features helps make a model valuable. And each comes in degrees. A model isn’t simply general or not, or precise or imprecise; instead, models vary in the extent of their accuracy, generality, precision, tractability, and robustness.

Attempting to create the perfect model by maximizing all of these features is futile, since the features usually trade off against one another: gaining more of one desirable feature often requires losing ground on others. For example, a model that is more general, applying to more target systems, is often less precise and less accurate about any one target system. This is because targets differ from one another in some regards, so tailoring a model to be precise and accurate about a specific target makes it ill-suited to represent a different system. For related reasons, a model that is more precise and accurate is often less tractable and robust. So, when constructing models, scientists must decide which desirable features to emphasize and which to compromise on.
In the rest of this section, we elaborate on how the desirable features of models trade off against one another. (See Levins, 1966, on the issue of trade-offs in model-building in population biology.)


Accuracy

Models representing more actual features of a target system tend to be more descriptively accurate, or realistic. A model representing all and only the actual components and features of its target, exactly as the target has them, would be maximally accurate. But this ideal is seldom achieved, and it’s unnecessary for practical success; recall that models are improved by some differences from their targets. For example, a mathematical model of drought-resistant landscaping is improved by accurately accounting for how water-intensive different plantings are. But such a model would be unwieldy if it included a parameter for the number of blades of grass in order to be more accurate. Even if such parameters increased the model’s accuracy, this wouldn’t give any additional insight into drought resistance. And it would come at a tremendous cost to tractability and generality: each time you had a different number of blades of grass, the model would work differently. However, for mathematical models of which kinds of turf are the most water-intensive, it may be entirely relevant to know how many blades of grass there are per square meter of sod or (perhaps) differences in water absorption rates.

So, which features are important for models to represent will depend on which phenomena modelers are interested in. Think of the Bay Model again. The engineers cared about salinity and how water moved in the bay but not about the color of the


bay floor or the exact number of water molecules. The features worth modeling accurately are the features most relevant to the modelers’ interests. Accuracy benefits a model by increasing its similarity to its target, which in turn makes findings about the model more certain to hold of the target as well. However, some properties of a target are best excluded from a model because their exclusion has compensatory benefits.

Generality

A model is more general when it applies to a greater number of target systems. Generality is a desirable feature of models insofar as it enables models to be reused in a variety of circumstances and, more significantly, because general models make it possible for scientists to discern what a variety of phenomena have in common with one another. This is a step toward formulating general theories or laws about phenomena of interest. Consider the prisoner’s dilemma model again. Because it can apply to humans, bacteria, corporations, and many other entities, this is a general model with numerous applications. That generality also reveals something that all those types of entities have in common: repeated interactions can enable cooperation to spontaneously emerge.

However, sacrificing some generality in a model can be worthwhile if doing so enables the model to more accurately represent its target. A general prisoner’s dilemma model might be supplemented with information about, say, how natural selection favors bacteria that can coexist in close proximity to one another (a form of cooperation). The resulting model will give more insight into bacterial cooperation in virtue of this additional detail. But it will also be less general—it will no longer apply to humans or corporations. Which is better depends on the modelers’ aims.


Precision

A model is more precise to the extent that it more finely specifies features of the target. For example, a climate model that allows scientists to predict how much warmer the global average temperature will be in 30 years within a range of ±1° Celsius is more precise than a model that allows them to predict a ±5° Celsius range of temperature increase in 20 years. Notice that precision is different from accuracy. Whereas accuracy is a matter of a given value’s proximity to the true value, precision is a matter of the proximity of values in a range to one another. Think again of an archer loosing arrows at a target. Arrows that are scattered all around the bull’s-eye but very near to it are accurate but imprecise. Arrows that are tightly clustered together but off-center, away from the bull’s-eye, are precise but inaccurate. See Figure 3.9 for an illustration of this.

Consequently, a model could be very precise but still inaccurate. For example, the prediction enabled by the more precise climate model might turn out to be wrong. Greater precision benefits a model by enabling it to give a more specific characterization of its target and to make more specific predictions about that target. But increasing precision usually comes at the cost of a model’s generality, its tractability, and sometimes its accuracy. Like generality, precision often trades off against accuracy: the more specific a prediction is, the easier it is for that prediction to be incorrect.
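The archery analogy can be put in numeric terms: take accuracy to be how close the mean of a set of values is to the true value, and precision to be how tightly the values cluster around one another. The measurement sets below are invented purely for illustration.

```python
import statistics

TRUE_VALUE = 10.0   # the "bull's-eye" (an invented true value)

# Centered on the true value but widely scattered: accurate, imprecise.
accurate_imprecise = [8.0, 12.1, 9.2, 10.9, 9.8]
# Tightly clustered but off-center: precise, inaccurate.
precise_inaccurate = [12.4, 12.5, 12.6, 12.5, 12.4]

def accuracy_error(values):
    """Distance of the mean from the true value (lower = more accurate)."""
    return abs(statistics.mean(values) - TRUE_VALUE)

def imprecision(values):
    """Standard deviation of the values (lower = more precise)."""
    return statistics.stdev(values)
```

Computing both quantities for each set shows the two features coming apart: the first set wins on accuracy, the second on precision.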

FIGURE 3.9 Accuracy versus precision

Tractability

Tractability is the ease of developing and using a model. This could involve different considerations, for example, the time it takes to run a model on a computer, or whether the equations of a mathematical model have exact solutions. It could even involve whether a modeler happens to already be familiar with one approach but not another. More tractable models are easier to construct, manipulate, or analyze. Consider, for example, that the iterated prisoner’s dilemma involves agents having repeated encounters, and so this model is less tractable than the original prisoner’s dilemma. One


consequence of this decreased tractability is that scientists know exactly what the possible outcomes are for the original prisoner’s dilemma, but they cannot in general predict the outcomes for its iterated version. This is why Axelrod ran a computer tournament to explore some of the possible outcomes. For obvious reasons, though, tractability is never simply maximized: the easiest thing to accomplish is usually nothing at all. And more complicated models can yield more accurate, precise, and insightful findings. The iterated prisoner’s dilemma reveals how repeat encounters can (in certain circumstances) overcome the dilemma entirely, making cooperation directly beneficial.
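Axelrod’s tournament can be sketched in miniature. The payoff values below are the standard prisoner’s dilemma payoffs; the particular strategies and the ten-round match length are our own illustrative choices (the actual tournament pitted many submitted strategies against one another over far more rounds).

```python
# Miniature iterated prisoner's dilemma in the spirit of Axelrod's tournament.
# 'C' = cooperate, 'D' = defect. Standard payoffs: mutual cooperation pays
# 3 each; mutual defection 1 each; a lone defector gets 5, its victim 0.
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def tit_for_tat(opponent_history):
    """Cooperate first; afterwards, copy the opponent's last move."""
    return 'C' if not opponent_history else opponent_history[-1]

def always_defect(opponent_history):
    return 'D'

def play(strategy_a, strategy_b, rounds=10):
    """Return each player's total score over repeated encounters."""
    score_a = score_b = 0
    moves_a, moves_b = [], []
    for _ in range(rounds):
        a = strategy_a(moves_b)   # each strategy sees the opponent's history
        b = strategy_b(moves_a)
        pa, pb = PAYOFF[(a, b)]
        score_a, score_b = score_a + pa, score_b + pb
        moves_a.append(a)
        moves_b.append(b)
    return score_a, score_b
```

Against always_defect, Tit-for-Tat is exploited only in the first round; against a fellow cooperator it cooperates throughout, which is the reciprocity that made it so successful in the tournament.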

Robustness

A more robust model is one whose behavior changes less despite variation in its assumptions. Consequently, robustness is a measure of insensitivity to the features that differ from the target, including the model’s abstractions and idealizations. Normally, scientists don’t want their models’ predictions to be sensitive to such features. To be trustworthy, the predictions should be based as much as possible on known similarities between the model and target. But limited robustness is inevitable. Models incorporate assumptions that are needed for them to produce the desired information, so to some extent, those assumptions always matter. What scientists aim to avoid is over-reliance on specific assumptions that are unlikely to be true or even known to be false.

Multiple models are sometimes used to determine how robust a model’s predictions are. If different models, with different assumptions and details, all predict roughly the same result, that prediction seems more trustworthy than if it had been generated by just one model, with uncertain assumptions and parameters. Robustness analysis, which was introduced in Section 3.1, capitalizes on this idea. Robustness analysis is possible whenever multiple models are employed; it’s common in climate science, for example.


Trade-offs in Model Building

In considering these valuable features of models, notice how and why each is limited. Increasing one feature often comes at the cost of others; this is called a trade-off. There is no single answer to how a model should best balance these desirable features, nor is there a perfect trade-off among them. Instead, scientists strategically develop their models to be tractable enough for their current circumstances; robust enough to be trusted to some reasonable degree; accurate and precise enough to make interesting, trustworthy predictions; and general enough to be enlightening. The balance struck thus depends in subtle ways on the phenomena under investigation, the scientists’ circumstances, and the purposes to which the models are put.

EXERCISES

3.18 Describe the experimental use of models, and explain why models are well situated to play this role. Then describe the theoretical use of models, and explain why models are well situated to play that role. Can the same models play both roles? Why, or why not?


3.19 Think again about the use of the Bay Model in testing the Reber Plan. This bears some similarity to an experiment, but it is conjectural in a way that directly experimenting on the actual San Francisco Bay would not be.

a. Characterize the experimental features of this use of the Bay Model: the independent variable, the dependent variable, how the independent variable was intervened upon, and what the findings were.
b. Describe at least one way in which the findings are less certain in their implications for the effects of the Reber Plan than an actual experiment would have been.
c. Describe at least three advantages to using this model instead of directly investigating the effects of the Reber Plan. You might consider the desirable features of models described earlier in formulating your response.

3.20 What are the three features that we have said all models share? How do these three features relate to one another?

3.21 In a paragraph, describe how models represent their targets. You should reference all of the following: similarities, differences, social conventions, abstractions, and idealizations.

3.22 Define abstraction and idealization in your own words. What is the difference between them?

3.23 Choose one of the models we have discussed in this chapter. Say which model you’ll focus on and what target system(s) it represents. Then, formulate a list of the abstractions involved in using that model to represent this system and a separate list of the idealizations involved. You’ll need to think beyond what’s actually said about the model, considering especially the differences between the model and its target(s).


3.24 Describe in your own words all five of the desirable features of models characterized in the last part of this section. Then, compare the classic game theory model of the prisoner’s dilemma and the computer model of the iterated prisoner’s dilemma on each feature. For each feature, write down whether you think one model is better (and which one), whether you think the two models tie, or whether you don’t have enough information to decide. In all cases, explain your answer.

3.25 Consider your answer to 3.24. Describe a purpose you think the classic game theory model of the prisoner’s dilemma would serve better than the computer model of the iterated prisoner’s dilemma. Then, describe a purpose you think the computer model of the iterated prisoner’s dilemma would serve better than the classic game theory model of the prisoner’s dilemma.

3.26 Scientists have constructed models of atoms, genetic lineages, economies, rational decisions, traffic, forest fires, and climate change. Locate and investigate a scientific model we have not discussed in this chapter.

a. Identify the type of model it is and what target system(s) it’s used to represent.
b. Describe how the elements of the model represent features of the target system(s).
c. Describe what scientists have learned about the target system(s) from the model.
d. Why is this model a helpful way for scientists to investigate this phenomenon? In answering this question, think back to the challenges of experimentation discussed in Chapter 2, the advantages of modeling discussed in 3.1, and the desirable features of models discussed in 3.3.


FURTHER READING


For more on the use of models in science, see Weisberg, M. (2013). Simulation and similarity: Using models to understand the world. Oxford: Oxford University Press.

For more on mechanistic models, see Glennan, S. (2005). Modeling mechanisms. Studies in History and Philosophy of Biology and the Biomedical Sciences, 36, 443–464.

For a discussion of computer modeling, with attention to climate change models, see Winsberg, E. (2010). Science in the age of computer simulation. Chicago: University of Chicago Press.

For a more general discussion of computational methods in science, see Humphreys, P. (2004). Extending ourselves: Computational science, empiricism, and scientific method. Oxford: Oxford University Press.

For a classic treatment of scientific modeling, and especially models’ relationship to analogies, see Hesse, M. (1963). Models and analogies in science. London: Sheed & Ward.

For more on how models represent target systems, see Giere, R. (2004). How models are used to represent reality. Philosophy of Science, 71(Suppl.), S742–S752.

For an account of idealization and how it influences science, see Potochnik, A. (2017). Idealization and the aims of science. Chicago: University of Chicago Press.


CHAPTER 4

Patterns of Inference

4.1 DEDUCTIVE REASONING

After reading this section, you should be able to do the following:

• Summarize how the minimum age of the universe is inferred
• Describe reasoning, inference, and argument, and explain how they are involved in science
• Define deductive inference, validity, and soundness
• Recognize and assess common patterns of deductive inference
• Analyze whether a criticism of a scientific inference provides logical grounds to question that inference


How Old Is the Universe?

How old is the universe? One possible answer is that the universe is eternal. The ancient Greek philosopher Aristotle (384–322 BCE) developed several arguments in support of this conclusion. A plausible assumption is that everything that comes into existence requires some underlying matter from which it comes. Aristotle then reasoned as follows. If the universe came into existence and is not eternal, then it came into existence from some pre-existing material substratum. Now, either this material substratum is itself eternal or it is not. If the material substratum from which the universe came into existence is not eternal, then the substratum must have come into existence from some other pre-existing material substratum. But then the same reasoning can be applied to this other pre-existing substratum, which is either eternal or came into existence from yet another pre-existing substratum. Hence, if the universe is not eternal, an infinite regress arises; the sequence of reasoning never terminates. Each purported material substratum itself requires another substratum from which it comes. Aristotle concluded that matter must be eternal and that the universe did not have any beginning.

From the early Middle Ages (roughly the 7th century) to the end of the Renaissance (roughly the 16th century), scholars and theologians continued to engage with questions about the age of the universe. The structure of Aristotle’s reasoning was largely kept, but the eternality of the universe was replaced by the eternality of God in order to fit with various creation stories. The universe itself was often estimated to have come into existence around 4,000 BCE (that is, roughly 6,000 years ago). The estimate was derived

Recipes for Science : An Introduction to Scientific Methods and Reasoning, Taylor & Francis Group, 2018. ProQuest Ebook Central, http://ebookcentral.proquest.com/lib/purdue/detail.action?docID=5584122. Created from purdue on 2021-08-24 00:11:18.


Patterns of Inference


from arithmetical calculations based on genealogical records in various religious texts, and remained prevalent through the 18th century.

By the 19th century and early 20th century, most scientists believed that the universe is eternal and unchanging—that is, in a steady state. In the 1920s, the American astronomer Edwin Hubble (1889–1953) made two discoveries that were inconsistent with that belief. Using a telescope with a 2.5-meter aperture at

FIGURE 4.1 Edwin Hubble at Mt. Wilson Observatory

Mount Wilson Observatory in Southern California, Hubble discovered evidence that the universe is much larger than people previously thought and that the universe is expanding.

Pointing the telescope toward the Andromeda Nebula, Hubble saw stars similar to those nearer to Earth, only dimmer. One of those was a Cepheid variable, a star whose brightness as seen from the Earth changes periodically. Hubble knew of the relationship between the period of time it takes a Cepheid's brightness to change and the luminosity of the star, which is the total amount of energy it emits in one second. Thus, from the period of the Cepheid, Hubble could calculate its luminosity, thereby determining how much brighter it was than the Sun.

Light travels at a constant speed of about 300,000 kilometers per second. Over the course of a year, light travels nearly 9.5 million gigameters (a gigameter is one billion meters); this distance is one light-year. Furthermore, the apparent brightness of a star—that is, how bright a star appears to be as seen from a distance—depends on the distance to the star. Once this relationship is known, it can be used, along with knowledge of the speed of light, to determine the distances to stars and faraway galaxies. Hubble did just that: he used his knowledge of the relationships between light's speed of travel, the apparent brightness of a star, and its distance to calculate the Cepheid's distance from Earth. Based on the distance of that Cepheid variable, Hubble reasoned that Andromeda was in fact a different galaxy from our galaxy, the Milky Way. This discovery, announced in 1925, demonstrated that the universe is much larger than had been thought.

Hubble also demonstrated that the universe has not always been this large. It's expanding. His reasoning started from the claim that light, like sound, will change its frequency depending on the relative movement of the object emitting it and the observer.
An example is the change in frequency of an ambulance siren as it moves toward, and then away from, an observer. The siren sounds higher pitched as it approaches, and then lower pitched once it has passed. This frequency change, called the Doppler effect, was discovered in the mid-19th century by the Austrian physicist Christian Doppler (1803–1853). It has proven useful in a number of scientific investigations. For Hubble's purposes, the important implication was that a star moving away from Earth appears redder, while a star moving toward Earth appears bluer. The degree of redness of receding stars is called redshift.

Using the technique of astronomical spectroscopy, Hubble discovered that the redshift of starlight from any galaxy increased in proportion to the galaxy's distance from Earth. This indicates that galaxies are moving further and further away from Earth. In 1929, Hubble announced these findings, which suggest that the universe is expanding. This is now known as 'Hubble's Law'. According to recent estimates, the universe's expansion rate, known as 'Hubble's constant', is about 70 kilometers per second per megaparsec (km/sec/Mpc), where 1 megaparsec (Mpc) is approximately three million light-years—an extremely long distance!

So, Hubble showed that the universe is not only much larger than previously estimated but also expanding. But how do these findings bear on the question of the age of the universe? The answer again concerns the relationship between time and the movement of starlight through space. The simple fact that astronomers like Hubble can observe stars from very distant galaxies indicates something about the age of those stars and thus about the age of the universe containing them too. No star can be older than the universe. So, we can estimate the minimum age of the universe on the basis of the age of the most
distant stars we can observe. In this way, Hubble was able to show that the universe was at least 10 billion years old. Currently, the furthest objects that deep space telescopes have detected are approximately 13.8 billion light-years away. Therefore, the universe must be at least 13.8 billion years old. This finding has also been supported by convergent evidence from sciences like cosmological physics and geochemistry.

The previous three chapters have focused in part on the importance of empirical evidence in science. And indeed, empirical evidence is essential for developing scientific knowledge. But for observations to lead to knowledge, scientists must assess their significance and implications, and the relationships among them. In other words, scientific knowledge comes not from mere observation, but from reasoning about observations. Aristotle sought to establish that the universe is eternal by showing that the denial of this would lead to an absurd infinite regress. Hubble combined empirical observations with calculations of light's travel over distances and through time to support a precise estimate of the universe's age. Hubble appealed to empirical evidence in ways Aristotle did not, but both reasoned their way to conclusions.
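The arithmetic behind these age estimates is simple enough to check for yourself. The following Python sketch (an illustration added here, not part of the original text) computes two rough bounds using the chapter's approximate figures: the light-travel-time bound from the most distant observed objects, and the "Hubble time" 1/H0 implied by an expansion rate of about 70 km/sec/Mpc.

```python
# Back-of-the-envelope sketch (illustrative, not from the original text):
# two ways to bound the age of the universe from the chapter's figures.

SPEED_OF_LIGHT_KM_S = 300_000          # approximate speed of light
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# 1. Light-travel-time bound: light from an object 13.8 billion light-years
#    away has been traveling for 13.8 billion years, so the universe must be
#    at least that old.
distance_ly = 13.8e9
minimum_age_years = distance_ly        # by the definition of a light-year

# 2. Expansion bound ("Hubble time"): if galaxies recede at about
#    H0 = 70 km/sec/Mpc, running the expansion backward at a constant rate
#    gives a rough age of 1/H0.
H0 = 70.0                                                  # km/sec/Mpc
KM_PER_LIGHT_YEAR = SPEED_OF_LIGHT_KM_S * SECONDS_PER_YEAR # about 9.5 trillion km
KM_PER_MPC = 3.0e6 * KM_PER_LIGHT_YEAR                     # 1 Mpc ~ 3 million light-years
hubble_time_years = (KM_PER_MPC / H0) / SECONDS_PER_YEAR   # 1/H0 expressed in years

print(f"Light-travel bound: at least {minimum_age_years / 1e9:.1f} billion years")
print(f"Hubble time 1/H0:  roughly {hubble_time_years / 1e9:.1f} billion years")
```

With these rounded inputs, 1/H0 comes out near 13 billion years, in the same ballpark as the light-travel bound; a more careful treatment would allow the expansion rate to vary over cosmic history.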


Reasoning, Inference, and Argument

In this chapter, we focus explicitly on patterns of inference in scientific reasoning. This will help us better see how reasoning is involved in the development of scientific knowledge from a basis in empirical evidence.

Reasoning is a psychological process, which cognitive psychologists divide into the operations of two cognitive systems: System 1 and System 2. As the Israeli-American psychologist and Nobel Prize winner Daniel Kahneman characterizes them, System 1 operates automatically and quickly, with no conscious mental effort needed. System 2 operates slowly, engaging working memory and allocating attention to effortful mental activities, accompanied by a sense of voluntary control (Kahneman, 2011). Scientists, like everyone else, reason with both of these systems. There are famous cases of creative 'Eureka!' moments in which a scientist suddenly grasped some solution or conclusion. Recall from Chapter 1, for example, Friedrich August Kekulé's fever dream of a snake biting its tail that led to the discovery of the ring-structure of the benzene molecule. Nonetheless, as we have seen, science is a collaborative social process of giving and taking reasons. It thus mainly engages System 2 processes. This means that most scientific reasoning is slow and deliberative. Hubble labored with other astronomers and assistants to collect and interpret data over many years. The argument from this data set to the conclusion that the universe is expanding was also developed and refined over time, with not just Hubble but many other astronomers contributing (Kragh & Smith, 2003).

Scientific reasoning involves the application of broad reasoning skills to the concerns and content of science: to greenhouse gases, light-years, molecules, ecosystems, and, as in Kahneman's work, even to reasoning processes themselves. We have already encountered many examples of scientific reasoning.
These include, to name a few, reasoning from large-scale carbon release during the last two centuries to the dramatic increase in the average global temperature (Chapter 1); reasoning from the temperature of colored lenses to the hypothesis that light colors vary in temperature (Chapter 2); and reasoning from the results of modeling the San Francisco Bay to the rejection of the Reber Plan
(Chapter 3). Chapter 4 began by describing how scientists reasoned from the speed of light and observation of distant astral bodies to the conclusion that the universe must be at least 13.8 billion years old.

Deliberative scientific reasoning involves making and evaluating inferences, and inferences are the backbone of any argument. An inference is a logical transition from one thought to another that obeys abstract rules. Whereas reasoning, as we've characterized it, is a psychological process, the features of inference are instead logical. An argument is a set of statements (stated propositions) with inferential structure. You might think of an argument as a set of instructions for performing inferences to reason your way to some conclusion. This differs from the everyday use of the word argument to mean bickering—a quarrel one might have with friends or family. An important part of scientific work is reasoning from empirical evidence in ways that involve logical inferences, and assembling arguments reflecting the structure of those inferences.

Making inferences and assembling arguments requires being able to distinguish the roles of premise and conclusion. The premises of an argument are statements that provide rational support, the basis for inference. The conclusion of an argument is the statement that is supported by the premises, the endpoint of an inference. For example, recall Aristotle's reasons for thinking that the universe is eternal. These can be reconstructed into an argument as follows:

1. If the universe is not eternal, then the universe came into existence.
2. Everything that comes into existence requires some pre-existing material substrate.
∴ 3. If the universe is not eternal, then some material substrate existed before the universe came into existence.
4. It cannot be the case that some material substratum existed before the universe came into existence.
∴ 5. The universe is eternal.

The argument is written as an ordered list of statements. The first four statements are the premises of the argument; the argument's conclusion is the last statement in the list. Statements inferred from one or more premises are marked with the symbol '∴', which is notation symbolizing words like therefore, so, or hence. As this example shows, an argument may involve more than one inference. The inference to the third statement is made from the first two premises, and the inference to the fifth statement—the argument's conclusion—is made from the third and fourth premises.

Scientific reasoning involves three main patterns of inference: deductive, inductive, and abductive. An argument is a deductive argument when the relationship of its premises to its conclusion is purportedly one of necessitation: the premises should together guarantee, or make necessary, the conclusion. Inductive and abductive inferences are non-deductive; the premises do not guarantee the conclusion, but they still give reason to infer the conclusion. Inductive and abductive reasoning play a more central role in scientific reasoning than deductive reasoning. We discuss these patterns of inference in Section 4.3, and they also relate to the main topics of Chapters 5–7. But for now, let's concentrate on deductive inference.


Copyright © 2018. Taylor & Francis Group. All rights reserved.

Conditional Statements

Statements of the form 'if …, then …' are crucial elements of inferential reasoning. These if/then statements are called conditional statements because one circumstance is given as a condition for another circumstance. As an intuitive example, imagine the parents of a young child asking her to eat her vegetables in order to get dessert: 'If you eat your broccoli, then you can have dessert'. The child then knows the ticket to dessert—shoveling down that broccoli!

The first circumstance, following the 'if', is called the antecedent. This is the condition upon which the other circumstance is introduced. The second circumstance, following the 'then', is called the consequent. This is the condition that arises from or hinges upon the introduction of the antecedent. The latter term is closely related to the word consequence, and in the previous example, it is just that: getting dessert is a consequence the parents commit to on the basis of the antecedent condition, eating the broccoli.

Antecedent means existing prior to, coming first in time, and also being logically prior. But for conditional claims, only the last meaning is relevant. Nothing guarantees that an antecedent will come before its consequent. For example, consider the conditional statement, 'If Piet is a dog, then Piet is an animal'. This is a true conditional, because being an animal is a guaranteed consequence of being a dog. But unlike broccoli and dessert, being a dog doesn't come before being an animal. Instead, in this example, if the antecedent is true, the consequent is simultaneously true. Time-ordering of the antecedent and consequent can also be reversed. For example, 'If you are hungry now, then you must not have eaten enough dinner'. In this case, the consequent (not eating enough dinner) happened before the antecedent (being hungry now). But the antecedent is still logically prior: being hungry is the condition placed on not eating enough dinner.
A good way to think about the logical relationship between antecedents and consequents is in terms of requirements and guarantees, or, more formally, in terms of necessary and sufficient conditions. For a conditional statement to be true, the antecedent occurring guarantees that the consequent also occurs. The antecedent is thus a sufficient condition for the consequent. Consider again the conditional statement, 'If Piet is a dog, then Piet is an animal'. Piet's being a dog guarantees that Piet is also an animal; being a dog is sufficient for being an animal. This doesn't work in reverse. For a true conditional statement, the consequent occurring doesn't guarantee the antecedent will occur. Piet might be an animal but not a dog; the consequent might be true but the antecedent false. Instead, the consequent is a requirement, or a necessary condition, for the antecedent. Piet's being an animal is a requirement placed on Piet being a dog but no guarantee that he is one.

TABLE 4.1 Conditional statements

Standard Form | Name       | Concept                              | Condition Type
If A, …       | Antecedent | Basis for a guarantee or requirement | Sufficient condition for C
… then C      | Consequent | What is guaranteed or required       | Necessary condition for A


Let’s consider a conditional statement important in Hubble’s reasoning about the age of the universe. Recall that Hubble calculated the age of distant stars to be greater than 10 billion years old, and he reasoned from this that the universe was at least of that age. Put in the form of a conditional statement, the idea was ‘If a star is more than 10 billion years old, then the universe must be more than 10 billion years old’. This is the claim that a sufficient condition for the universe to be 10 billion years old is that some star in the universe is 10 billion years old. Put in reverse, if the universe weren’t that

Box 4.1 Conditionals

Scientific inquiry as well as everyday reasoning often involves the formulation and evaluation of conditional statements about the relationships among objects, states, and events. For example, if water is heated to 100º Celsius, it boils. One complication is that conditionals can be, and often are, expressed in various non-standard forms. Instead of if A then C, one might say, equivalently (where A is still the antecedent and C the consequent):

Copyright © 2018. Taylor & Francis Group. All rights reserved.

C if A
A only if C
A guarantees C
Without C, A is not the case
Not A unless C

And there are still further ways to express this same conditional relationship. One approach to navigating these non-standard forms is to understand the meanings of their parts. Suppose somebody states that you have an identical twin only if you have a sibling. What must be the case for this statement to be true? What about for it to be false? Without a sibling, you can't have an identical twin sibling; having a sibling is necessary for having a twin. Consulting Table 4.1, you'll see that a necessary condition is a consequent. So, this statement was the same as saying that if you have an identical twin, then you have a sibling. If it were possible to have an identical twin without having siblings, then the statement—either in its original form or the if/then formulation—would be false. But every other circumstance—having a twin sibling, having a sibling but no twin, or having no twin and also no sibling—is consistent with this statement being true. This might aid in navigating conditional claims disguised in other formulations.

There's also a point here about when conditional claims are true. Put abstractly, a conditional statement is only false when its antecedent A is true while its consequent C is false. If parents told their child that if she eats her broccoli then she'll get dessert, and she eats her broccoli, but then they withhold dessert, they were lying. (Which isn't a very nice thing to do to a little kid who wants dessert!) This kind of conditional, the only kind we've discussed, is called a material conditional.


old, it couldn’t contain any objects that old. The universe being 10 billion years old is thus a requirement for any star to be that old. Notice, in contrast, that finding out the universe is a certain age would not guarantee that any star is that old. It is possible that the universe is, say, 15 billion years old and all stars are younger. The universe having a given age is a necessary condition for there to be a star of that age, but it is not a sufficient condition.
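The truth conditions of the material conditional described in Box 4.1 can be tabulated mechanically. The short Python sketch below (an illustration added here, not part of the original text) encodes 'if A then C' as a truth function and prints its full truth table:

```python
# Material conditional: "if A then C" is false only when A is true and C is false.
# Truth-functionally it is equivalent to (not A) or C.

def implies(a: bool, c: bool) -> bool:
    """Material conditional 'if a then c'."""
    return (not a) or c

for a in (True, False):
    for c in (True, False):
        print(f"A={a!s:<5}  C={c!s:<5}  'if A then C' -> {implies(a, c)}")
```

Only the row with A true and C false comes out false, matching the broccoli example: the parents' promise is broken only when the broccoli is eaten but dessert is withheld.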

Copyright © 2018. Taylor & Francis Group. All rights reserved.

Evaluating Inferences

Scientific reasoning can be evaluated as good or bad based on the abstract rules and formal properties of the inferences involved. The study of the rules and patterns of good and bad inference is called logic. Logic is a subject that can, and does, fill many textbooks. We'll keep our discussion here as brief as possible, but some basic ideas of logic are important for understanding successful scientific reasoning.

The evaluation of both deductive and non-deductive inferences focuses on two main questions. First, are the premises sufficient to rationally support the conclusion? And second, are those premises true? The first question assesses the logical relationship between premises and conclusion, the grounds for inference. The second question assesses the status of the inference's premises themselves. Good inferences answer both questions affirmatively: there is good reason to believe that all premises are true, and together, those premises provide sufficiently good reason to infer that the conclusion is true. The premises of a good inference should together provide a logically compelling reason for thinking the conclusion either must be true (in deductive inference) or is likely to be true (in inductive and abductive inference).

When the truth of the premises of a deductive inference guarantees the truth of the conclusion, the inference has the property of being valid. This term has several different meanings. In one non-technical use, it simply indicates something is reasonable or understandable. In Chapter 2, we discussed the external and internal validity of experiments; this is another meaning of validity. Here, in the context of deduction, validity has a technical definition different from these other meanings. A deductive inference is valid just when the truth of the premises logically guarantees, or necessitates, the truth of the conclusion.
In a deductively valid inference, it is impossible for the conclusion to be false provided that the premises are true. To assess whether a deductive inference is valid, first suppose all of its premises are true. You should imagine those premises are the only things you know about the world. Then, ask yourself whether there is any possible way the conclusion could be false. If there is any way for the conclusion to be false while the premises are true, say, by imagining strange things about the world, then the inference is invalid. If not, if the truth of the premises alone guarantees the truth of the conclusion, the inference is valid. Any deductive inference is either valid or invalid.

A valid deductive argument cannot be made more valid, or rendered invalid, by adding more premises. This property of deductive reasoning is called monotonicity. Reasoning is monotonic if the addition of new information never invalidates an inference or forces the conclusion to be retracted. For this reason, deductive arguments are rock-solid; you might be wrong about a starting point—one or more of your premises might be false—but if you have
a valid inference, you can be absolutely certain that your premises (if true) guarantee your conclusion.

Some patterns of deductive inference are common enough to have been given names. For example, one of the most basic patterns of deduction is affirming the antecedent of a conditional statement (also known by its Latin name modus ponens). This is when a conditional statement and its antecedent are used as premises for concluding the consequent must be true. For example,

1. If a star is more than 10 billion years old, then the universe must be more than 10 billion years old.
2. This star is more than 10 billion years old.
∴ 3. The universe must be more than 10 billion years old.

Another elementary form of reasoning is called denying the consequent of a conditional (also known by its Latin name modus tollens). This is when a conditional statement and the negation of its consequent are used as premises for concluding the antecedent must be false. For example,

1. If the universe is in a steady state, then astral bodies remain the same distance from one another.
2. It is not the case that astral bodies remain the same distance from one another.
∴ 3. It is not the case that the universe is in a steady state.

Copyright © 2018. Taylor & Francis Group. All rights reserved.

Each of the previous two arguments is deductively valid. The premises may not be true. But if they were true, they would logically guarantee that the conclusion must also be true. This holds for every other instance of these general patterns of inference. No matter how long and deep you think, you will not be able to find an instance of either pattern that is invalid. Affirming the antecedent and denying the consequent, as general patterns of deductive inference, can be expressed as follows. ('It is not the case that' can be indicated with the negation sign '¬'.)

1. If A, then C
2. A
∴ 3. C

1. If A, then C
2. ¬C
∴ 3. ¬A
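Because validity is purely a matter of form, it can be checked mechanically for arguments built from a handful of propositions: enumerate every assignment of truth values and look for one that makes all the premises true and the conclusion false. The Python sketch below (an illustration added here, not part of the original text) does this for the two patterns above:

```python
from itertools import product

def implies(a, c):
    # Material conditional: false only when a is true and c is false.
    return (not a) or c

def is_valid(premises, conclusion, num_vars=2):
    """Valid iff no assignment makes every premise true and the conclusion false."""
    for values in product([True, False], repeat=num_vars):
        if all(p(*values) for p in premises) and not conclusion(*values):
            return False  # this assignment is a counterexample
    return True

# Affirming the antecedent (modus ponens): If A then C; A; therefore C.
mp_valid = is_valid([lambda a, c: implies(a, c), lambda a, c: a],
                    lambda a, c: c)

# Denying the consequent (modus tollens): If A then C; not C; therefore not A.
mt_valid = is_valid([lambda a, c: implies(a, c), lambda a, c: not c],
                    lambda a, c: not a)

print(mp_valid, mt_valid)  # True True: neither pattern has a counterexample
```

This brute-force check mirrors the advice in the text: suppose the premises true in every possible way, and hunt for a way the conclusion could still be false.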

Keep in mind that to have a valid argument, it is not enough to start with all true premises and to have a true conclusion. Rather, the truth of the premises must force the conclusion to be true; there must be no way around having a true conclusion (if the premises are true). Consider another example:



1. Cats are mammals.
2. Tigers are mammals.
∴ 3. Tigers are cats.


Both premises are true: cats and tigers are kinds of mammals. The conclusion is true as well: tigers are one kind of cat. But this is an invalid inference. Even though every statement comprising it is true, the truth of the conclusion isn’t guaranteed by the truth of the premises. To see this, substitute in ‘dogs’ for ‘cats’ in the argument. (Remember you can do whatever you want, other than making a premise untrue, to try to make the conclusion come out false. If you can accomplish this, the argument is invalid.) With this substitution, the two premises are still true, but the conclusion is not. The inference is invalid. Here’s one more argument:



1. The Earth is 6,000 years old.
2. Buenos Aires is in South America.
∴ 3. The Earth is 6,000 years old and Buenos Aires is in South America.

This argument is valid. If both premises were true, then the conclusion must also be true. There is no possible way for both premises to be true but the conclusion false. Of course, the premises aren't both true. Buenos Aires is in fact in South America; but the age of the Earth is approximately 4.54 billion years. So, even though this is a valid argument, we don't have good reason to believe the conclusion.

The previous two examples illustrate that valid arguments can have false premises and conclusions and invalid arguments can have true premises and conclusions. The best deductive inferences are those that combine both validity and truth. These inferences are sound. A sound inference is a valid deductive inference with all true premises. Being valid rules out inferences like the cats and tigers example, where the conclusion is only accidentally true. Having all true premises rules out inferences like the Earth and Buenos Aires example, where the inference is valid but the conclusion is nonetheless false because one or more premises are false.

A sound deductive inference takes all the guesswork out of establishing proof for a claim. If you know both that all the premises are true and that the inference is valid, then you know that the conclusion must be true. No additional evidence or reasoning can change that. If it does, then either you didn't actually have a valid deductive inference, or you didn't actually know that all the premises are true. Thus, if scientists know some inference is sound, they can be certain that the conclusion is true beyond a shadow of a doubt.
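The validity/soundness distinction can be made concrete in code. In this sketch (an illustration added here, not part of the original text), the Earth and Buenos Aires argument passes a formal validity check but fails the soundness check, because soundness additionally requires that every premise actually be true:

```python
from itertools import product

def is_valid(premises, conclusion, num_vars):
    """Valid iff no assignment makes all premises true and the conclusion false."""
    return not any(
        all(p(*values) for p in premises) and not conclusion(*values)
        for values in product([True, False], repeat=num_vars)
    )

# Form of the argument: P1; P2; therefore (P1 and P2).
valid = is_valid([lambda p1, p2: p1, lambda p1, p2: p2],
                 lambda p1, p2: p1 and p2, num_vars=2)

# Soundness also depends on the world, not just the form:
# "The Earth is 6,000 years old" is false; "Buenos Aires is in South America" is true.
premises_actually_true = [False, True]
sound = valid and all(premises_actually_true)

print(f"valid={valid}, sound={sound}")  # valid=True, sound=False
```

Validity is settled by the argument's form alone; soundness requires checking the premises against the facts.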


Uncovering Bad Arguments

Whether or not someone is persuaded by another's reasoning is mainly a matter of human psychology. People can fall for bad arguments, or they may not be persuaded by good ones. But whether a deductive inference is good or bad is simply a matter of logic and truth. The two main criticisms that can be made of a deductive argument are that (i) its premises are false and that (ii) the conclusion isn't validly inferred from the premises. When evaluating a deductive argument, one should determine whether either or both of these criticisms apply. And it is here that psychological reasoning and logical inference intersect. If you think an argument is faulty on one or both of these grounds, you should consider whether it can be repaired by replacing any false premises with true ones or whether additional premises could be supplied such that there is a valid argument for the conclusion.

The valid inference patterns involving conditional statements discussed earlier—affirming the antecedent and denying the consequent—have related invalid inference patterns
that result from confusing the roles of necessary and sufficient conditions in conditional statements. Denying the antecedent occurs when a conditional statement and the negation of its antecedent are used as premises for concluding that the consequent must be false as well. Here is an argument that commits the error of denying the antecedent:

1. If a star is more than 15 billion years old, then the universe is more than 15 billion years old.
2. No star is more than 15 billion years old.
∴ 3. It's not the case that the universe is more than 15 billion years old.

This is an invalid argument. Even if the first two premises are true, that doesn't guarantee the conclusion is also true. As we have seen, the age of the oldest star is just a minimum age for the universe. The conditional statement in the first premise reflects this, as the consequent (the age of the universe) is a requirement for the antecedent (the age of the oldest star). The antecedent guarantees the consequent but not the other way around. So, denying the antecedent, as the second premise does, provides no good reason to believe that the consequent is the case, but it doesn't demonstrate that the consequent is not the case either.

Affirming the consequent occurs when a conditional statement and its consequent are used as premises for concluding that the antecedent must also be true. Here is an argument that commits the error of affirming the consequent:

1. If the Andromeda Nebula is 13.8 billion light-years away, then the universe is at least 13.8 billion years old.
2. The universe is at least 13.8 billion years old.
∴ 3. The Andromeda Nebula is 13.8 billion light-years away.

This is also an invalid argument. Both premises are true, but they don't guarantee the truth of the conclusion. Some specific astral body that we can view from Earth being 13.8 billion light-years away does guarantee the universe is at least 13.8 billion years old, but this is not required for the universe to be that old. The conclusion here is in fact false, since Andromeda is only about 2.5 million light-years away.

Situations that you can describe, whether real or imagined, in which the premises of an argument are true but the conclusion is false are called counterexamples to the argument. Counterexamples demonstrate that an argument or inference is invalid.

So far, the defects in reasoning we have seen are with the form of the inference. But sometimes the problem with an inference is an empirical one, not a logical one. Sometimes, even when an argument is valid, the world doesn't cooperate with the statements made about it. This is one place where the detective work of science often comes in. Consider, for example, the following argument about atoms (recall also that the word atom means indivisible, from the Greek a- + temnein, meaning not + to cut):



1. 2. 3. 4.

The word atom means indivisible. If the word atom means indivisible, then atoms are indivisible. If atoms are indivisible, then atoms are the smallest type of matter. Atoms are the smallest type of matter.


This is a valid argument, which involves affirming the antecedent—a valid inference pattern. Given premises 1 and 2, it follows that atoms are indivisible; and from the conjunction of that claim with premise 3, it follows that atoms are the smallest type of matter. The problem, of course, is that scientists discovered particles smaller than atoms over a century ago. Electrons were discovered in 1897, followed by the subsequent discoveries of protons, neutrons, neutrinos, positrons, muons, bosons, and hadrons, which are all smaller than atoms. These discoveries show the conclusion to be false: atoms are not the smallest type of matter. So, the argument is not sound. Because the argument is valid, learning that the conclusion is false also tells us something about the premises: at least one of the three premises is also false. Can you figure out which is to blame?

We have seen that arguments can falter because of a defect in the form of the inference or because they contain a false premise. In other cases, the defect in reasoning owes to an informal fallacy: a faulty inference pattern where the defect lies with the inference’s content rather than its form, and which goes beyond merely having false premises. Unfortunately, there is no fully unified theory of informal fallacies, nor any universally agreed upon definition (Walton, 1989/2008); and there are hundreds of such fallacious patterns. Here are a few that are unfortunately common in debates about science.

The strawman fallacy involves caricaturing someone’s thoughts in order to criticize the caricature rather than the actual thoughts. Here is an example:

1. Evolutionary theory claims that humans recently evolved from monkeys.
2. The idea that humans recently evolved from monkeys is clearly wrong.
∴ 3. Evolutionary theory is clearly wrong.

This argument seems to be an instance of affirming the antecedent, which is a valid inference pattern. But the argument misrepresents evolutionary theory, so premise 1 is false. (Evolutionary theory instead posits, among other things, that humans and apes share a common ancestor several million years ago.) This is an instance of the strawman fallacy because evolutionary theory is misrepresented in order to claim it is clearly wrong. The complexity of many scientific theories makes them easy targets for the strawman fallacy.

Another common error in reasoning about science is called appeal to irrelevant authority. For example, the pseudoscientific pronouncements of scientologists—a waning religious cult from the 1950s—often appeal to L. Ron Hubbard’s book Dianetics. Hubbard, however, had no expertise in any academic subject whatsoever. Appeals to his book are poor grounds for scientific conclusions about well-being, mind, or the cosmos. It’s sometimes difficult to assess whether some authority is legitimate. For example, sometimes genuine experts in one scientific field make pronouncements about other fields in which they have no authority. Uncovering appeals to irrelevant authority thus can require careful analysis of credibility. This relates to Chapter 1’s discussion of why politicians should not be viewed as experts on climate science, and to the broader issues about expertise introduced there.

Finally, appeal to ignorance is another informal fallacy. Arguments that commit this fallacy conclude that a certain statement is true because there is no evidence proving that it is not true. For example,

1. There is no compelling evidence that the pyramids were not built by extraterrestrial creatures.
∴ 2. The pyramids were built by extraterrestrial creatures.

Plainly, this is a bad inference. Indeed, there’s a slogan that ‘absence of evidence is not evidence of absence’. In other words, not having evidence that something is true isn’t necessarily reason to think it isn’t true. For this example, we can imagine things that might provide evidence that the pyramids were built by extraterrestrial creatures, but it’s hard to even imagine how we could provide evidence that they weren’t. More generally, a lack of empirical evidence in support of some scientific claim is usually reason not (yet) to believe the claim is true. But this is generally not grounds for declaring the claim false, for the lack of evidence may say more about the limits of our scientific knowledge than how the world really is.

The fallacy of appealing to ignorance highlights three interesting features of reasoning. First, it is generally easier to prove that something is the case than that it is not the case. Perhaps it would be better to examine evidence for who did in fact build the pyramids than to simply look for evidence that it wasn’t aliens. Second, the burden of proof, or the obligation to provide evidence in support of a belief, generally lies with the person who makes an assertion. So, if you assert that the pyramids were built by aliens or that genetically modified foods are risky for human health, then you should be able to provide evidence in support of your assertion when asked to do so. Third, the more extraordinary a statement is, the more evidence it requires. When a chemist asserts that a solution must be acidic because the litmus paper turned bright red, there is usually little need to ask her how she knows that the color was red. Extraordinary claims, however, such as that all life on Earth has evolved from a single common ancestor, require a lot of evidence. The English naturalist, geologist, and biologist Charles Darwin (1809–1882) spent years assembling evidence for his theory of evolution and common ancestry, and many scientists following Darwin have added and improved upon that store of evidence.
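Because the inference patterns discussed in this section involve only the two statements A and C, their validity can be checked mechanically by enumerating truth values and searching for a counterexample: an assignment that makes every premise true and the conclusion false. The short Python sketch below is an illustration added by the editors, not part of the original text; the function names are our own.

```python
from itertools import product

def implies(a, c):
    # Material conditional: 'If A, then C' is false only when A is true and C is false.
    return (not a) or c

def valid(premises, conclusion):
    # An argument form is valid iff no truth-value assignment is a counterexample,
    # i.e., no assignment makes all premises true while the conclusion is false.
    for a, c in product([True, False], repeat=2):
        if all(p(a, c) for p in premises) and not conclusion(a, c):
            return False  # found a counterexample
    return True

# Denying the consequent: If A, then C; not C; therefore not A.
print(valid([implies, lambda a, c: not c], lambda a, c: not a))  # True (valid)

# Affirming the consequent: If A, then C; C; therefore A.
print(valid([implies, lambda a, c: c], lambda a, c: a))          # False (invalid)

# Denying the antecedent: If A, then C; not A; therefore not C.
print(valid([implies, lambda a, c: not a], lambda a, c: not c))  # False (invalid)
```

The assignment on which affirming the consequent fails (A false, C true) is exactly the kind of counterexample described above: both premises come out true while the conclusion is false.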


Bad Reasons to Reject Inferences

Keeping a lookout for invalid inference patterns, false premises, and informal fallacies can help uncover bad arguments. These, as well as the valid inference patterns we have discussed, are summarized in Table 4.2. But these logical and empirical reasons to challenge some arguments should be carefully distinguished from the negative psychological reactions some arguments can evoke. These reactions are usually not reasons to reject an inference, but they may inhibit the recognition of sound reasoning. Some scientific findings and inferences can be counterintuitive or difficult to understand. But this, by itself, is not grounds for rejecting the finding or inference. A person with limited background in evolutionary theory may find it difficult to imagine how humans could have evolved ultimately from single-celled organisms. Similarly, without training in physics and cosmology, it can be difficult to wrap your head around the universe being over 13.8 billion years old and expanding out from an initial Big Bang. But evolutionary theory and cosmological research, including the Big Bang theory, provide solid grounds for accepting the truth of both of these bewildering claims. Just as a claim’s intuitiveness is not a guide to whether it is true, an argument’s difficulty or complexity is irrelevant to whether the inferential structure of the argument is any good.


Likewise, whether someone finds the conclusion of an argument distasteful, offensive, or disagreeable is irrelevant to whether that conclusion is true. The conclusion that global warming is caused by human activity is politically inconvenient for friends of the fossil fuel industry, including many politicians. This has motivated some of those individuals to cast doubt on the finding, and they’ve been incredibly successful at creating public doubt about climate change. These disinformation campaigns often have pointed to the mere occurrence of disagreement as a reason for doubting climate change, as well as other unwelcome scientific findings. But the evidence and structure of inferences supporting anthropogenic climate change are incredibly strong. Some people object to the idea that the universe is billions of years old, sometimes suggesting this is ‘just an opinion’. Similarly, skeptics of evolutionary theory love to point out that it is ‘just a theory’ that biological species have evolved from a common ancestor. These are also bad objections. Natural phenomena, and natural explanations of those phenomena, are not simply a matter of opinion. And scientific theories are developed on the basis of a tremendous amount of confirming evidence and careful inference. These criticisms are not based on disagreements about evidence or the logic of arguments, but instead appeal to the trivial fact that people have different ideas about some things. Ideas that are supported by evidence and sound inference should be taken seriously.


EXERCISES

4.1 Define reasoning, inference, and argument, and describe how they are involved in science (even though science is based on empirical evidence).

4.2 The following statements concern necessary and sufficient conditions. For each statement, rephrase it in the form of a standard if/then conditional statement and say whether it’s true or false.
1. Being a mammal is a sufficient condition for being human.
2. Being human is a sufficient condition for being an animal.
3. Being alive is a necessary condition for having a right to life.
4. Being alive is a sufficient condition for having a right to life.
5. Having a PhD is necessary if you want to be a scientist.
6. It’s sufficient for being awarded the Nobel Prize in immunology that one generates the cure for cancer.

4.3 Rephrase each of the following statements into standard conditional statements, and then say whether they’re true or false.
1. P is a sufficient condition for Q if it is true that if P then Q.
2. It is true that if P then Q, but only if Q is a necessary condition for P.
3. It is true that P only if Q, but only if P is a sufficient condition for Q.
4. Not Q is a sufficient condition for P if it is true that P unless Q.
5. Something is a brother if and only if it is a male sibling. So, being a male sibling is necessary for being a brother.
6. Something is a brother if and only if it is a male sibling. So, being a male sibling is sufficient for being a brother.


4.4 Define deductive inference, validity, and soundness, and then answer the following questions. Explain each answer.
a. Is every deductive argument valid?
b. Is every deductive argument sound?
c. Is every valid argument sound?
d. Is every sound argument valid?

4.5 Rewrite each of the following arguments in standard form, with numbered premises and a conclusion. For each argument, say whether it is valid and whether it is sound. Give reasons to justify each of your answers.
1. LeBron James must be mortal. After all, all humans are mortal, and LeBron James is a human.
2. God is often characterized as the most perfect being. A perfect being must have every trait or property that it would be better to have than not to have. Since one of those properties is existence—that is, it is better to exist than not to exist—then God exists.
3. The number 1 is a prime number, and 3 is a prime number. So too are 5 and 7. Therefore, all odd integers between 0 and 8 are prime numbers.
4. Real Madrid has won more than 17 games every year for the past 30 years. So, you can safely bet Real Madrid will win more than 17 games this year.
5. The universe cannot be younger than 11 billion years old because the age of the oldest known stars is 11 billion years old.
6. The term tachyon refers to a particle that travels faster than light. Therefore, it’s not the case that nothing travels faster than light.

4.6 Come up with an example argument employing the inference pattern of affirming the antecedent. Do the same for denying the antecedent, affirming the consequent, and denying the consequent. For each argument, say whether it’s valid. For each invalid argument, provide a counterexample. For each valid argument, say whether it’s sound.

4.7 Describe the three informal fallacies outlined in this section. Give a new example of each. Try to think of a real instance you’ve encountered, but if you can’t, it’s fine to make up an example.

4.8 Review the passage about Hubble’s discoveries in the first part of this section. Summarize the inferences that led Hubble to conclude that the universe is over 10 billion years old.

4.9 Review the passage about Hubble’s discoveries in the first part of this section. Identify three conditional statements involved in Hubble’s inference that the universe is over 10 billion years old. (These might not be written in the text in if-then form, and some of the conditional claims involved in Hubble’s inference process might not even be explicitly written out.) Write out the three statements in standard if-then form.

4.10 Review the passage about Hubble’s discoveries in the first part of this section. Summarize the inferences Hubble made that led to the conclusion that the universe is expanding. Then, put that argument into standard form, with numbered premises and a conclusion. Are any premises needed for a valid deductive argument missing? If so, add them, even if they weren’t explicitly stated in the description of Hubble’s reasoning.


4.11 Read the following passage, and try to understand the argument it makes.

Anybody who wants to repeat an experiment in modern subatomic physics has to undergo many years of training. Only then will he or she be able to ask nature a specific question through the experiment and to understand the answer. Similarly, a deep mystical experience requires, generally, many years of training under an experienced master and, as in the scientific training, the dedicated time does not guarantee success. If the student is successful, however, he or she will be able to ‘repeat the experiment’. A mystical experience, therefore, is not any more unique than a modern experiment in physics. On the other hand, it is not less sophisticated either, although its sophistication is of a very different kind. The complexity and efficiency of the physicist’s technical apparatus is matched, if not surpassed, by that of the mystic’s consciousness—both physical and spiritual—in deep meditation. The scientists and the mystics, then, have developed highly sophisticated methods of observing nature which are inaccessible to the layperson. A page from a journal of modern experimental physics will be as mysterious to the uninitiated as a Tibetan mandala. Both are records of inquiries into the nature of the universe. (Capra, 1975, pp. 35–36)

a. What’s the conclusion of the argument developed in this passage?
b. The passage draws an analogy between science and mysticism. What purpose does the analogy play in the argument?
c. Assess the author’s reasoning. What are good points or inferences? What weaknesses are there in the author’s reasoning?
d. Assess the author’s conclusion. Do you think the conclusion is right? Has the author given adequate grounds for believing the conclusion?


4.12 Read the following passage, and try to understand the argument it makes.

An electron is no more (and no less) hypothetical than a star. Nowadays we count electrons one by one in a Geiger counter, as we count the stars one by one on a photographic plate. In what sense can an electron be called more unobservable than a star? I am not sure whether I ought to say that I have seen an electron; but I have just the same doubt whether I have seen a star. If I have seen one, I have seen the other. I have seen a small disc of light surrounded by diffraction rings which has not the least resemblance to what a star is supposed to be; but the name ‘star’ is given to the object in the physical world which some hundreds of years ago started a chain of causation which has resulted in this particular light-pattern. Similarly in a Wilson expansion chamber I have seen a trail not in the least resembling what an electron is supposed to be; but the name ‘electron’ is given to the object in the physical world which has caused this trail to appear. How can it possibly be maintained that a hypothesis is introduced in one case and not in the other? (Eddington, 1935/2012, p. 21)

a. What’s the conclusion of the argument developed in this passage?
b. The passage draws an analogy between electrons and stars. What purpose does the analogy play in the argument?
c. Assess the author’s reasoning. What are good points or inferences? What weaknesses are there in the author’s reasoning?
d. Assess the author’s conclusion. Do you think the conclusion is right? Has the author given adequate grounds for believing the conclusion?

4.2 DEDUCTIVE REASONING IN HYPOTHESIS-TESTING

After reading this section, you should be able to do the following:

• Define hypothetico-deductive method
• Describe how an example of hypothesis-testing might be construed as an application of the H-D method
• Describe how auxiliary assumptions complicate the H-D method
• Characterize the axiomatic method and indicate how it’s been used in science


The Hypothetico-Deductive Method

In Chapter 2, we learned that hypothesis-testing is a central part of experimental research. Testing hypotheses requires at least two ingredients: empirical evidence and rational inference. Empirical evidence is the primary source of justification for scientists’ hypotheses about the world, but rational inference is needed to evaluate hypotheses on the basis of the available evidence. One form that evaluation can take makes key use of deductive inference. This has been described as the hypothetico-deductive (H-D) method. The two parts of that name will be familiar at this point: hypothesis and deduction. In general, we have said that hypothesis-testing involves establishing expectations from a hypothesis, and then comparing those expectations with observations. On the H-D method, the expectations formulated on the basis of a hypothesis should be logically implied by that hypothesis using deductive inference. Hence, if the hypothesis is true, the expectation derived from it also must be true. Sound familiar? This is what’s required to have a valid deductive argument. That the truth of the hypothesis guarantees the truth of the expectations also means there is a conditional statement with the hypothesis as the antecedent and the expectation as the consequent: ‘If H, then E’. If we’ve formulated the expectations properly, this conditional statement will be true. We don’t yet know whether the hypothesis is true, but we do know that if the hypothesis is true, then the expectation will be true. This conditional statement can be thought of as an answer to the question: ‘If this hypothesis is true, what must be the case about the world?’

After deductively inferring expectations from the hypothesis, scientists make observations, perhaps by conducting an experiment. Those observations are then compared with the expectations. Here too the H-D method sees a role for deductive inference. If the observation does not match the expectation, that is, if the expectation is not observed, then this enables a deductive argument for the conclusion that the hypothesis is false. The inference pattern is denying the consequent, which we’ve learned is always a valid form of deductive inference:


Refutation

1. If H, then E
2. ¬E
∴ 3. ¬H

In this case, from the observations, we can deductively infer that the hypothesis is false. In other words, the observations refute the hypothesis. If instead the observations and expectations match, this enables the inference pattern of affirming the consequent. Careful—that was an invalid form of deductive inference! In this case, no deductive argument for or against the hypothesis is possible. A match between expectations and observations is consistent with the truth of the hypothesis, but it does not guarantee the truth of the hypothesis. If the evidence matches expectations, the hypothesis is confirmed, but if not, it is refuted.

Confirmation

1. If H, then E
2. E
∴ 3. Probably or possibly H

Let’s work through a really simple example. Imagine the hypothesis is that all swans are white. If it is true that all swans are white, then the swan you next observe will be white. This is a true conditional claim: the antecedent guarantees the consequent. So, you go out looking for swans, with the expectation that, if your hypothesis is true, you will see a white one. Let’s say you instead encounter a black swan. This observation violates your expectation; by denying the consequent, you’ve shown the antecedent (the hypothesis) is false. Breaking news: it’s not the case that all swans are white! However, if the next swan you see is white, then your observation matches the expectation. You haven’t proven anything, but you do have a bit more evidence in favor of the hypothesis.

There is, then, a crucial difference between refutation and confirmation. Refutation is a valid deductive argument that demonstrates the hypothesis is false. In contrast, confirmation is not a deductively valid argument. The truth of the premises does not guarantee the conclusion is true. The argument scheme for confirmation shown here reflects this by concluding not H but ‘probably or possibly H’. An observation matching what a hypothesis leads us to expect generally is taken to provide some evidence for the hypothesis. But this isn’t always so, and it’s surprisingly tricky to articulate how this works. We will return to this difficulty later in the chapter.
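The asymmetry between refutation and confirmation in the swan example can be sketched in a few lines of Python. This is an illustration added by the editors, not part of the original text, and the function and variable names are our own invention.

```python
def hd_test(expectation, observation):
    # H-D schema: from hypothesis H we deduce an expectation E ('If H, then E').
    # A violated expectation refutes H by denying the consequent (a valid pattern);
    # a matching observation only confirms H, since inferring H from E
    # would be affirming the consequent (an invalid pattern).
    if not expectation(observation):
        return "refuted"    # ¬E, therefore ¬H
    return "confirmed"      # consistent with H, but H is not proven

# Hypothesis H: all swans are white.
# Deduced expectation E: the next swan observed is white.
next_swan_is_white = lambda swan_color: swan_color == "white"

print(hd_test(next_swan_is_white, "white"))  # confirmed
print(hd_test(next_swan_is_white, "black"))  # refuted
```

A single "black" observation settles the matter, while any number of "white" observations only accumulates evidence without proof—exactly the asymmetry described above.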

The Case of Puerperal Fever

A real instance of scientific reasoning famously used by the philosopher Carl Hempel (1905–1997) to illustrate the H-D method is the story of Dr. Ignaz Semmelweis (Hempel, 1966). Semmelweis was a scientifically trained doctor working in the 1st Maternity Division of the Vienna General Hospital in the 1840s, when many women delivering babies there were contracting a serious and often fatal illness. The illness was known as


puerperal or childbed fever. (Puerperium refers to the postpartum period following labor and delivery.) A puzzling observation was that the mortality rate in the 1st Maternity Division was about three times higher than in the adjacent 2nd Maternity Division. These rates are shown in Table 4.3. Why was the rate of puerperal fever so much higher in the first clinic? An answer to this question might provide some insight into how to decrease the incidence of puerperal fever overall. Semmelweis (1861) made several observations that seemed potentially relevant. Women with dilation periods longer than 24 hours during delivery died of puerperal fever much more often. He also observed that patients in the first clinic fell ill in a sequential manner, one after another. The health of patients and the skill and care provided by their caretakers did not seem related to the incidence of puerperal fever. Finally, not only was the illness rate in the 2nd Maternity Division lower, but women who instead

TABLE 4.2 Valid inference patterns, invalid inference patterns, and informal fallacies

Some Valid Inference Patterns

Affirming the antecedent:
1. If A, then C
2. A
∴ 3. C

Denying the consequent:
1. If A, then C
2. ¬C
∴ 3. ¬A

Some Invalid Inference Patterns

Denying the antecedent:
1. If A, then C
2. ¬A
∴ 3. ¬C

Affirming the consequent:
1. If A, then C
2. C
∴ 3. A

Some Informal Fallacies

Strawman fallacy: caricaturing a position or argument in order to criticize the caricature rather than the actual position.
Appeal to irrelevant authority: appealing to the views of an individual who has no expertise in a field as evidence for some view.
Appeal to ignorance: concluding that a certain statement is true because there is no evidence proving that it is not true.


TABLE 4.3 Annual births, deaths, and mortality rates for all patients at the two clinics of the Vienna maternity hospital 1841–1846

              First Clinic                      Second Clinic
Year          Births    Deaths   Rate (%)      Births    Deaths   Rate (%)
1841          3,036     237      7.70          2,442     86       3.50
1842          3,287     518      15.80         2,659     202      7.50
1843          3,060     274      8.90          2,739     164      5.90
1844          3,157     260      8.20          2,956     68       2.30
1845          3,492     241      6.80          3,241     66       2.00
1846          4,010     459      11.40         3,754     105      2.70
Total/Avg.    20,042    1,989    9.92          17,791    691      3.38

gave birth at home or elsewhere outside the clinic—even unattended on the street—were unaffected by puerperal fever. Semmelweis used these observations to rule out a number of proposed sources of the illness. Puerperal fever wasn’t a city-wide epidemic. If it were, women who gave birth outside the hospital would also suffer from the illness, but they didn’t. Nor was puerperal fever triggered by psychological traumas during childbirth, like intense modesty from being medically examined by male doctors (as had been proposed). If it were, surely some women who gave birth in the streets would also experience puerperal fever, but they didn’t. Most crucially, all proposed sources of the illness led to the expectation of equal rates of the illness in the 1st and 2nd Maternity Wards. That expectation did not match observations. So, reasoning in a way that is captured well by the H-D method of refutation, Semmelweis rejected all these hypotheses about the cause of puerperal fever.

Semmelweis tried to develop hypotheses that were consistent with the observed difference in puerperal fever rates between the two maternity wards. One difference between the wards was that the 1st Ward was staffed by male doctors and medical students, while the 2nd Ward was staffed by female midwives. Women in the former gave birth on their backs, women in the latter on their sides. Semmelweis changed procedures in the 1st Ward so that all women there also gave birth on their sides. From the hypothesis that giving birth on one’s back increases incidence of the illness, one can deductively infer the expectation that changed birth position will decrease the incidence of the illness. Alas, this expectation did not match Semmelweis’s observation: changing birth position in the 1st Ward made no difference. Other hypotheses were similarly tested and similarly ruled out.

Then, at the end of March 1847, Semmelweis learned that his colleague Dr. Jakob Kolletschka had died.
Kolletschka was a professor of forensic medicine. He had been performing an autopsy on a woman who had died from puerperal fever when a scalpel had lacerated his finger. Kolletschka subsequently exhibited the same symptoms as the


FIGURE 4.2 Frieze at the Social Hygiene Museum in Budapest, honoring Ignaz Semmelweis

mothers and infants who had died of puerperal fever. Semmelweis was distraught by his friend’s death, but he also saw the value of this information for the investigation of puerperal fever. He hypothesized that the scalpel had contaminated Kolletschka’s blood with ‘cadaverous particles’, and this caused the puerperal fever that led to his death. Semmelweis also realized that this was supported by the observation of the difference in illness rates between the two wards: doctors and medical students performed autopsies, whereas midwives did not.

Semmelweis reasoned that if the hypothesis that cadaverous particles caused puerperal fever were true, then the illness could be prevented by eliminating the cadaverous particles. To test this hypothesis, he required all students and midwives to thoroughly wash their hands in a solution of chlorinated lime prior to examining patients. If this made no difference, then cadaverous particles weren’t to blame, and this new hypothesis would also be refuted. But, instead, the mortality from puerperal fever began to decrease, and the incidence in the 1st Ward dropped to a similar level as in the 2nd Ward. Semmelweis’s hypothesis was confirmed.

This is a good illustration of the H-D method and in particular the difference between refutation and confirmation. Recall that, on the H-D account, refutation is decisive, as it is the result of a valid deductive inference, whereas confirmation is weaker. It turns out that Semmelweis’s confirmed hypothesis was wrong. Cadaverous material wasn’t responsible for puerperal fever; it was a bacterial infection of the uterus. Luckily, chlorinated lime is an antibacterial agent. Semmelweis thought the prescribed handwashing worked because it removed cadaverous material, but instead, it worked because it removed bacteria. Some other important instances of hypothesis-testing are also well described by the H-D method.
Another example, which we encountered in Chapter 2, is the case of Arthur Eddington’s confirmation of Einstein’s theory of relativity from the 1919 solar eclipse. This was also a refutation of Newton’s cosmological theory. Einstein’s theory of general relativity, as you may recall, implies that light will bend around a massive object like the Sun. Newton’s theory also predicts light will bend because of gravity. However, the


146

Patterns of Inference

theory of general relativity implies that light will bend twice as much as the value predicted by Newtonian physics. Measuring how much light bends around the Sun allowed Eddington to refute Newtonian physics and provide some confirmation of Einstein’s theory of general relativity.

Auxiliary Assumptions

The H-D method seems to accurately capture something important about hypothesis-testing, namely the distinctive power of refutation. Data that fit our expectations are well and good, but we can really learn something from data that contradict our expectations. This also accords with the importance of hypotheses that are falsifiable, as outlined in Chapter 1. The power of refutation is also what makes the idea of crucial experiments compelling, as we discussed in Chapter 2 with the case of Newton's prism experiments. Yet, the H-D method also has its limitations. We'll close this discussion by describing one challenge to this account of hypothesis-testing; then, later in the chapter, we will survey two powerful alternatives based on non-deductive patterns of inference.

The challenge to the H-D method is that the inference from a hypothesis to some expectation is never truly deductive. Or, more precisely, additional claims are needed in order to make a deductive inference from hypothesis to expectation valid. These additional claims include background assumptions about how the world works, what in Chapter 2 we called auxiliary assumptions. Lurking in the background of Semmelweis's inference about handwashing, for example, was the assumption that handwashing would remove cadaverous material. Behind Eddington's refutation of Newtonian physics were a number of assumptions about the behavior of instruments, the properties of light, the location of certain astral bodies, and so on. Such auxiliary assumptions often go unnoticed, either because they are assumed to be true or, in some cases, simply because no one has noticed them. But because valid deductive inference requires the premises to guarantee the conclusion, these auxiliary assumptions are essential premises for the deductive inference from a hypothesis to some empirical expectation, a key component of the H-D method.
So, the schemes we identified earlier for refutation and confirmation on the H-D account need to be adapted as follows:


Refutation
1. If H and A, then E
2. ¬ E
∴ 3. ¬ H

Confirmation
1. If H and A, then E
2. E
∴ 3. Probably or possibly H

In this new formulation, the letter A stands for statements of whatever auxiliary assumptions are required as additional premises to validly deduce E from H. Required auxiliary assumptions may include background ideas about the phenomenon under investigation, as well as assumptions about the reliability of experimental instruments and measurement procedures.

Taking into account auxiliary assumptions in the H-D schemes more realistically captures the type of reasoning that underlies hypothesis-testing. But this also introduces a new problem. The refutation scheme with ¬ H, or it's not the case that H, as its conclusion
is no longer a valid deductive argument. It is no longer an instance of denying the consequent. To fit that pattern, the argument's conclusion must instead be:

∴ 3′. ¬ (H and A)

This amounts to the statement that it's not the case that both H and A are true. In other words, taking into account auxiliary assumptions, all you can deductively conclude from observations not matching expectations is that either the hypothesis is wrong or one or more auxiliary assumptions is wrong (or both). Because of the need for auxiliary assumptions, the H-D method can't provide a deductive argument that the hypothesis is false. This problem is known as the Duhem-Quine problem, named after the French physicist, mathematician, and philosopher of science Pierre Duhem (1861–1916) and the American philosopher and logician Willard van Orman Quine (1908–2000).

One upshot of the Duhem-Quine problem is that deductive logic alone is insufficient for successful hypothesis-testing. In the face of refutation, scientists need to decide whether to give up on a hypothesis or to question one or more of their auxiliary assumptions. It seems there's an element of choice. A scientist may well want to hold on to a hypothesis she likes and look for another explanation for why the observations didn't turn out as expected.

The hope of reasonably deciding whether to reject a hypothesis or an auxiliary assumption isn't entirely destroyed by the Duhem-Quine problem. Scientists typically have independent evidence for many of their auxiliary assumptions. Instruments and measurement procedures have been tested and employed in other circumstances, and background beliefs about a phenomenon are often based on evidence. These considerations can help scientists decide whether, and when, to reject the hypothesis under investigation. Yet the need for auxiliary assumptions limits the power of the H-D method of hypothesis-testing. The Duhem-Quine problem makes clear that, just like confirmation, refutation is messier than simple deductive inference.
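The logical point behind the Duhem-Quine problem can be checked mechanically. The following sketch (our own illustration, not from the original text) enumerates every truth assignment consistent with the premise "if H and A, then E" and the observation ¬ E; the hypothesis H survives in some of them, so only the conjunction of H and A is refuted:

```python
from itertools import product

# Sketch of the Duhem-Quine point. Premise: the material conditional
# (H and A) -> E. Observation: not E. Enumerate every truth assignment
# consistent with both.

survivors = []
for H, A, E in product([True, False], repeat=3):
    premise = (not (H and A)) or E  # (H and A) -> E
    observation = not E             # the predicted expectation failed
    if premise and observation:
        survivors.append((H, A, E))

# Every surviving assignment falsifies the conjunction "H and A"...
assert all(not (H and A) for H, A, E in survivors)
# ...but H alone is not refuted: it remains true in one surviving
# assignment, where the auxiliary assumption A takes the blame.
assert any(H for H, A, E in survivors)
print(survivors)
# [(True, False, False), (False, True, False), (False, False, False)]
```

The surviving assignments show formally why the blame for a failed prediction may fall on an auxiliary assumption rather than on the hypothesis itself.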


Axiomatic Methods

Deductive inference plays a different kind of role in some fields of science. Progress in scientific reasoning is sometimes achieved through formal axiomatization, a constructive procedure by which statements are derived from foundational principles. The foundational principles, called axioms, are accepted as self-evident truths about some domain. The axioms are then used to deductively infer other truths about the domain, called theorems.

The most venerable example of axiomatization comes from the Greek mathematician Euclid, who lived between the 4th and 3rd centuries BCE. Book I of Euclid's Elements of Geometry begins with 23 definitions and five axioms. The five axioms are the following:

1. A straight line may be drawn between any two points.
2. Any terminated straight line may be extended indefinitely.
3. A circle may be drawn with any given point as center and any given radius.
4. All right angles are equal.
5. If two straight lines in a plane are met by another line, and if the sum of the internal angles on one side is less than two right angles, then the straight lines will meet if extended sufficiently on the side on which the sum of the angles is less than two right angles.

Together, these five axioms form the premises of Euclidean geometry. From these premises, one can validly deduce theorems about the congruency of figures, parallel lines, and other results of Euclidean geometry. In turn, these theorems can be treated as premises in new arguments aimed at validly deducing new theorems.

Euclid's axiomatization of geometry was accepted as decisive for almost two millennia. It was a clear example of rigorous scientific reasoning grounded in first principles, with the power to systematize all existing knowledge of geometry. It deeply influenced Ibn al-Haytham's work in optics and Newton's physical theory of mechanics. Since the 19th century, however, non-classical geometries have been developed that diverge from Euclid's axiomatization. Just as Euclid's geometry was central to earlier physics and astronomy, these non-Euclidean geometries paved the way for Einstein's radical new theories of the relativity of space and time. This implies that the geometry of physical space itself is not in general Euclidean.

Another example of an important use of the axiomatic method concerns the foundations of arithmetic. Concerned with questions about the exact nature of numbers, the Italian mathematician Giuseppe Peano (1858–1932) employed axiomatic reasoning to give a rigorous foundation for the natural numbers (0, 1, 2, 3, 4, …). Peano's axiomatization of natural numbers began with three primitive concepts, that is, concepts that were not defined in terms of other concepts. Peano thought these primitive concepts were self-evident: the set of natural numbers, N; the number zero, a member of the set N; and the successor function S. This successor function can be applied to any natural number, and it will yield the next number after it. For example, S(6) = 7. Likewise, S(0) = 1. From here, Peano laid down several axioms:

1. Zero is a number.
2. If n is a number, then S(n) is a number.
3. Zero is not the successor of a number.
4. Distinct natural numbers have distinct successors.
5. If 0 is an element in a set of numbers and the successor of every number is in that set, then every number is in that set.

Given these axioms, the basic properties of natural numbers could be described, and theorems about them, including ones governing arithmetic operations like addition and subtraction, could be deduced. To take a simple example, the supposition that there is a number preceding zero (S(k) = 0) would contradict axiom 3. Accordingly, the theorem that zero has no predecessor in N can be derived from axiom 3.
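Peano-style definitions lend themselves to direct implementation. As a toy illustration (the representation and names below are our own, not from the text), numbers can be built from nothing but a zero object and a successor function, with addition defined recursively from its two Peano-style defining equations:

```python
# A toy model of Peano arithmetic. Numbers are built from zero and the
# successor function S alone; addition is defined recursively by the
# equations m + 0 = m and m + S(k) = S(m + k).

ZERO = ()

def S(n):
    """Successor: wrap the number one level deeper."""
    return (n,)

def add(m, n):
    if n == ZERO:           # m + 0 = m
        return m
    return S(add(m, n[0]))  # m + S(k) = S(m + k)

def to_int(n):
    """Count successor applications, for display only."""
    count = 0
    while n != ZERO:
        n, count = n[0], count + 1
    return count

two = S(S(ZERO))
three = S(S(S(ZERO)))
print(to_int(add(two, three)))  # prints 5: the theorem 2 + 3 = 5
```

Every output of `add` is derived purely by unwinding the defining equations, mirroring how theorems of arithmetic are deduced from Peano's axioms.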

EXERCISES

4.13 Summarize the H-D method. How does this method relate to hypotheses? How does it relate to deductive reasoning? What's the crucial difference between refutation and confirmation?


4.14 There are supposed to be two applications of deductive inference in each H-D refutation. (a) What are those two deductive inferences, and how is each related to how we have characterized hypothesis-testing in general? (b) Define the Duhem-Quine problem. Which application(s) of deductive inference does this problem interfere with?

4.15 Return to the description of Semmelweis's investigation of puerperal fever. Identify three inferences that can be described as uses of the H-D method (either refutation or confirmation). For each, write out the inference as an argument in standard form with premises and conclusion.

4.16 After reading the passage below, identify the hypothesis under investigation. What would the researchers expect to find if the hypothesis is true? Finally, list five important auxiliary assumptions required for a deductive inference from the hypothesis to the expectations.


In an influential study published in 1979, the primatologist and psychologist team of Woodruff and Premack examined whether chimpanzees can learn to deceive; specifically, to act with the intention of causing a person to hold a false belief about the location of food. In one of Woodruff and Premack's tests, a chimpanzee could see the location of food, which was placed in one of two containers. However, because both containers were out of reach, the animal could only obtain the food from a human. The human didn't know where the food was, but was instructed by the experimenters to search the container that the chimpanzee seemed to be indicating through its orientation or by pointing. In some trials of this experiment, the human was dressed in green and was cooperative: if they found the food, they gave it to the chimps. In other trials, the human was dressed in white and was competitive: if they found the food, they kept it for themselves. Thus, write Woodruff and Premack, 'the chimpanzee's success in procuring the goal depended upon his ability to convey accurate locational information to a cooperative partner on the one hand, and suppress or convey misleading information to a competitive individual on the other'. (Woodruff & Premack, 1979, p. 335)

4.17 Read the passage from exercise 4.16 above. Woodruff and Premack (1979) found that after 120 trials, each of the four chimpanzees they tested showed a reliable tendency to indicate the container with food in the presence of a cooperative human and an empty container in the presence of the competitive human. Say whether the hypothesis under investigation is confirmed, refuted, or neither by this evidence. Justify your claim.

4.18 Imagine you want to estimate a rock's age using the technique of radiometric dating. This technique allows scientists to estimate age from the known decay rate of radioactive materials, given that traces of radioactive materials were incorporated when the rock was originally formed.
(a) What are some auxiliary assumptions that you think are involved in a test like this? (b) You hypothesize that the rock is 3.8 billion years old, but the test results do not match your expectations. What are some possible reasons that you could have gotten this result, even if the rock is actually 3.8 billion years old? List at least three.

4.19 Describe the axiomatic method in your own words. How has this method been used in science?

4.3 INDUCTIVE AND ABDUCTIVE REASONING

After reading this section, you should be able to do the following:

• Define inductive inference, indicating how it differs from deductive inference
• Characterize the problem of induction
• Define abductive inference, indicating how it differs from deductive and inductive inference
• List the strengths and weaknesses of each of the three forms of inference: abductive, deductive, and inductive


Flint's Water Crisis

It is sometimes said, even if only metaphorically, that the conclusions of deductive arguments are already contained in the premises. What this saying means is that the conclusion doesn't add any new content beyond what the premises provided. This is a consequence of the premises' truth guaranteeing the conclusion's truth, the requirement for any valid deductive argument. Deductive inferences are thus non-ampliative; the conclusion cannot augment the content of the premises. The non-ampliative nature of deductive reasoning limits the usefulness of deductive inference patterns. Scientific and everyday reasoning are often not like this.

Consider the water contamination crisis in Flint, a city of approximately 100,000 people in Michigan (USA). About 40% of the population of Flint falls below the poverty threshold, due in part to the downsizing of an automobile factory located there. Flint had received its water supply from neighboring Detroit, but in 2014, major budget deficits led the city council to change the city's water supplier. The new supplier would provide water from the Flint River. The problem was that this river had been badly polluted for decades, and the environmental cleanup of its contaminants and toxic waste was improperly performed.

Soon after the change in water source was made, bacteria and other contaminants were detected in the water. Residents were instructed to boil any water before drinking it, and water treatment changes were made. Levels of disinfectant by-products in violation of the Safe Drinking Water Act were found, followed by a buildup of total trihalomethanes (TTHM). Both of these have negative health effects, including increasing the risk of cancer. But this wasn't the worst of it. Officials ignored federal environmental regulations that required treatment of the water supply system with anticorrosive chemicals.
As a result, lead pipes in the system corroded, causing high levels of lead to leach into the water and soil. Lead is extremely toxic and can have massively adverse health effects in everyone, but especially in children. Lead concentrations in children's blood as low as five parts per billion (ppb) can result in decreased intelligence and behavioral and learning deficits.

After anecdotal reports of pets and children becoming sick from the water supply, frustrated residents carried jugs of brown tap water to community meetings in early 2015. City officials affirmed that the water was safe. When concerned residents began demanding scientific testing of the water, this assurance was shown to be false, and the magnitude of the problem was revealed.


FIGURE 4.3 (a) Flint, Michigan water crisis, with numbers indicating parts per billion (ppb), comparing the WHO and EPA lead standards with the April 2015 sampling at Walters's home in Flint, Michigan; (b) Lee Anne Walters, the Flint citizen-scientist who initially requested water-testing

An initial test conducted by the Environmental Protection Agency (EPA) in March 2015 showed lead levels at 104 ppb in the water of one home—seven times the legal limit in water. Levels of some other chemicals, such as iron, couldn't be specified because they exceeded the measurement capabilities of the instruments. The next month, the lab team of Virginia Tech Professor Marc Edwards arrived to conduct more extensive testing. Lead levels at the same home at which the previous test had been conducted ranged, depending on water flow rate, from 217 ppb to a staggering 13,200 ppb. (Anything over 15 ppb is over the legal limit for water. According to the EPA, anything higher than 5,000 ppb is hazardous waste.) Across the 269 homes that were eventually sampled, 40% had lead levels above the EPA's recommended guidelines. The evidence suggested that Flint's water supply was toxic.

The Flint water crisis raises a number of interesting and difficult issues, including questions about how science relates to the public. For now, notice the general form of the inferences that were made in determining the problem, and the extent of the problem, with Flint's water. The aim was to know something about the water quality across all houses in Flint, and the path to getting there started with testing water quality in one family's house. A conclusion about the water quality in all houses in Flint can't be truly deductive without exhaustive testing of every house. This isn't the best way to proceed in the face of a public health emergency, though. As in much of scientific reasoning, the scientists investigating Flint's water instead proceeded inductively. Their conclusions were ampliative; they went beyond the sampling results.
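Statistical tools, previewed in Chapters 5 and 6, attach a measure of uncertainty to inductive generalizations like this one. As a rough sketch, only the 269-home and roughly 40% figures come from the text; the interval method below is a standard normal approximation, not anything the Virginia Tech team is reported to have used:

```python
import math

# Rough sketch: a 95% normal-approximation confidence interval for the
# citywide proportion of homes with elevated lead, based on the reported
# sample of 269 homes with about 40% elevated.

n = 269          # homes sampled (from the text)
p_hat = 0.40     # observed proportion with elevated lead (from the text)
se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of a proportion
margin = 1.96 * se                       # 95% margin of error

print(f"Estimated citywide proportion: {p_hat:.2f} +/- {margin:.2f}")
# prints: Estimated citywide proportion: 0.40 +/- 0.06
```

The interval quantifies the ampliative step: the conclusion about all Flint homes goes beyond the sample, but the sample makes a narrow range of citywide values probable.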

Inductive Inference

Imagine you go to the grocery store, hankering for some grapefruit. The grocer takes one grapefruit from the top of one box, cuts it open, and offers you a slice to taste. It tastes good! What you may not notice is that the grocer is tacitly expecting you to make the following inference:



1. One grapefruit from this box is good.
∴ 2. All grapefruits from this box are good.


You draw three other grapefruits from the box at random. The three grapefruits look good, like the grapefruit the grocer showed you. So, you buy that box of a dozen grapefruits. What’s the inference you’re tacitly making?



1. The three grapefruits picked at random from this box are good.
∴ 2. The next nine grapefruits drawn from this box will also be good.

Neither of these is a valid deductive inference; the truth of the premises does not guarantee the truth of the conclusions. Assuming the premises are true, the conclusions are at best likely or probable. Accordingly, both inferences are inductive. An inference is inductive when the inferential relationship from premises to conclusion purports to be one of probability, not necessity. Even if the premise in each inference is true, the conclusion may nonetheless be false. Perhaps not all grapefruits from the box are good, even if the grapefruit the grocer showed you was good and even if you checked three other randomly picked grapefruits from the box. For all you know, the rest of the grapefruits in the box could be rotten.

Because the truth of the premises in inductive arguments does not guarantee the truth of the conclusion, inductive inferences are always logically invalid. Nonetheless, reasoning inductively is a primary form of making inferences in science and everyday life. Two common forms of inductive inference are generalizations and projections. Inductive generalizations are inferences to a general conclusion about the properties of a class of objects based on the observation of some number of the objects in the class. If the conclusion applies to all members of the class, the generalization is a universal inductive generalization. The form of inductive generalizations is something like this:

Inductive generalization
1. O1, O2, O3, …, and On each has property P.
∴ 2. All Os have property P.


The grocer's inference was like this; it went from the premise that one grapefruit from the box is good to the conclusion that all grapefruits are good. In contrast, inductive projections (sometimes called next-case induction) are inferences to a conclusion about the feature of some object that has not been observed, based on the observation that some objects of the same kind have that feature. The form of inductive projections is something like this:

Inductive projection
1. O1, O2, O3, …, and On each have been observed to have property P.
∴ 2. The next observed O will have property P.

Your argument at the market was like this; it went from the premise that each of three grapefruits you observed is good to the conclusion that the next nine grapefruits will be good. These two patterns of inference are similar. The difference is that a generalization makes a prediction about an entire class of entities, whereas a projection makes a prediction about entities that have not yet been encountered.
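A quick calculation shows why such inferences are only probable. This sketch (our own illustration, using standard hypergeometric counting) computes the chance that a random sample of three grapefruits from a box of twelve all look good, for various numbers of hidden rotten ones:

```python
from math import comb

# If the box of 12 secretly contained `bad` rotten grapefruits, how likely
# is it that 3 randomly drawn ones all look good anyway?
# Standard hypergeometric counting: C(good, sample) / C(total, sample).

def p_all_good_sample(total=12, sample=3, bad=0):
    good = total - bad
    if good < sample:
        return 0.0
    return comb(good, sample) / comb(total, sample)

for bad in range(4):
    print(bad, round(p_all_good_sample(bad=bad), 3))
# 0 1.0
# 1 0.75
# 2 0.545
# 3 0.382
```

Even if three of the twelve grapefruits were rotten, a clean sample of three would still occur more than a third of the time, so the premises of the generalization can be true while its conclusion is false.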



Characteristics of Inductive Arguments

Inductive arguments have three characteristics that distinguish them from deductive arguments: they are ampliative, they are non-monotonic, and they come in different strengths. All forms of non-deductive inference have these features, so they're also possessed by abductive arguments, introduced later in this section.

We've already encountered the first of these characteristics. Ampliative inferences have conclusions containing information that in some sense goes beyond, that is, amplifies, the content present in the premises. Ampliative inferences enable us to extend beyond that which we already know. The conclusion that all grapefruits in the box are good, for example, goes beyond the information contained in the premise that three randomly picked grapefruits from the box are good. Likewise, the conclusion that Flint's water supply in general is toxic goes beyond the evidence provided by the Virginia Tech samples. The consequence of ampliative inference, as we have seen, is that the conclusions of inductive inferences are not necessitated by the premises. An inductive inference may preserve truth, but it does not necessarily do so.

Second, inductive arguments are also non-monotonic, which means that whether an inductive inference's premises adequately support the conclusion may change with new information. Adding new premises to the existing premises of a non-monotonic inference can undermine the support for its conclusion. For example, adding to either of the previous grapefruit arguments the new premise that one grapefruit from the box is rotten undermines the reasoning for the conclusion that all grapefruits from the box (or all remaining grapefruits) are good. In contrast, recall that deductive arguments are monotonic; no additional information can possibly render a valid deductive inference invalid. Scientists often face surprising findings that force them to adjust and update their ideas.
This feature is captured well by inductive inference patterns because they are non-monotonic. The addition of new information can reveal how a good inference from true premises may nevertheless be wrong. For example, you may know that the smallpox (variola) virus was completely eradicated in 1979. That smallpox was eradicated doesn't entail that the virus no longer exists, however. The World Health Organization (WHO) permits and oversees two vaults containing variola specimens: one at the Centers for Disease Control and Prevention in Atlanta, Georgia (USA), and the other at the State Research Centre of Virology and Biotechnology in Novosibirsk, Russia. It was thus reasonable to infer that there are no remaining variola specimens out in nature or otherwise outside of the control of WHO—at least until 2014, that is. Then, scientists stumbled upon some 60-year-old unsecured vials of smallpox while cleaning out a storage closet at the Bethesda campus of the National Institutes of Health in the USA. This discovery undermined the reasonable inductive inference from the eradication of smallpox and WHO's strict control of remaining specimens to the conclusion that there were no other smallpox specimens unaccounted for. It was a good inductive argument—until a premise was added that directly contradicted its conclusion.

Third, and last, inductive inferences are of different strengths. The conclusion that the grapefruits in the box are good would be stronger if the grocer had let you eat two grapefruits from the box and both tasted good. Similarly, the conclusion that all of Flint's water is toxic was strengthened when the Virginia Tech team sampled water from nearly
300 homes, compared to the earlier inference based only on problematic water samples from a single home. Good inductive inferences are strong, that is, likely to preserve truth. This means that true premises are grounds for inferring that the conclusion is probably true as well. Deductive arguments are either valid or not, but inductive arguments are not like this. Strength comes in degrees: two arguments might both be strong, but one might be even stronger than the other. Further, any inductive argument, no matter how strong, can be additionally strengthened. The degree of strength of an inductive inference may be measured by the probability that the conclusion is true given that all the premises are true.

Strong inductive inferences may nonetheless have false conclusions, as the smallpox example shows. To take a more famous example, until the 17th century, Europeans believed that all swans were white. Their belief was supported by strong evidence: no European had ever observed a black swan, and no one they'd ever consulted had either. However, in 1697, the Dutch explorer Willem de Vlamingh returned to Europe with two black swans he had captured on Australia's Swan River. The strong inductive argument in favor of all swans being white was undermined by this development, and the conclusion was shown to be false.

We discussed the hypothesis that all swans are white with the H-D model of hypothesis-testing. The point there was to show the deductive force of refutation, or falsification, in

FIGURE 4.4 The black swan of the family (Black Australian swan surrounded by Bewick's swans). © Copyright Colin Smith and licensed for reuse under a Creative Commons License.

contrast to confirmation. The discussion of inductive inference here sheds additional light on the process of confirming hypotheses. Earlier, we merely pointed out that confirmation does not involve a valid deductive inference. What it does involve is inductive inference. From this perspective, inferring that a hypothesis is true from some observation(s) can be judged according to the inductive strength of the inference.
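One standard way to make inductive strength precise is as a conditional probability. The numbers below are purely illustrative (our own, not from the text); the calculation applies Bayes' theorem to show how an observation can make a hypothesis more probable without proving it:

```python
# Measuring the inductive strength of "E confirms H" as a conditional
# probability, via Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E).
# All numbers here are illustrative assumptions.

p_h = 0.3              # prior probability of the hypothesis
p_e_given_h = 0.9      # how strongly H leads us to expect observation E
p_e_given_not_h = 0.2  # how expected E is if H is false

# Law of total probability for P(E), then Bayes' theorem for P(H | E).
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
p_h_given_e = p_e_given_h * p_h / p_e

print(round(p_h_given_e, 3))  # prints 0.659
```

Observing E raises the probability of H from 0.3 to about 0.66: confirmation without proof, which is exactly the inductive character of the H-D confirmation scheme.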


The Problem of Induction

Why do you look to the east if you want to see the morning sunrise at the horizon? Why do physicians continue to prescribe Tylenol to reduce fevers? At first glance, the answers to these questions might seem obvious. That's where the Sun has appeared on the horizon every morning, for us and for every other human. And Tylenol has reduced fevers almost without fail so far. These are very strong inductive inferences.

And that's fine, as far as it goes. But what justifies inductive inference in general? Well, you might say, inductive inference works pretty well! (Aside from that whole business of not all swans being white, that is.) More seriously, we've said any inductive inference, no matter how strong, may be shown to be incorrect with the addition of new information. Nonetheless, inductive inference has led us to buy good grapefruit and other foods based on samples, to wear a coat when we leave the house in the winter, to look for the morning sun in the east, and to take Tylenol (or similar) as needed. It occasionally leads us astray, but by and large, inductive inference has a very good track record.

The problem, however, is that this good track record can't justify inductive inference. This reasoning is itself an instance of inductive inference: because inductive reasoning has guided us so well up to this point, we conclude that it will continue to do so. But this just prompts the same question: what justifies the inductive inference that inductive reasoning is justified? This is the problem of induction.

This problem was set out in the 18th century by the Scottish philosopher David Hume (1711–1776). Hume (1748/1999) argued that the problem of induction cannot be solved. The argument goes as follows. Consider how we might justify inductive inference. There are two possibilities: either use deductive reasoning or use non-deductive reasoning.
Because a strong inductive inference with true premises may still have a false conclusion, inductive inferences are invalid deductive inferences, so they cannot be justified using deduction. The only other option, then, is to justify inductive inference with non-deductive reasoning. But the claim that inductive inference is justified requires showing that it is generally reliable, and showing that requires nothing other than inductive inference itself. So, looking to a non-deductive justification for inductive inference leads to circular reasoning: we would need to rely on inductive inference in order to establish that inductive inference is reliable. In other words, we would have to assume the reliability of the very method whose reliability we need to establish. Consequently, inductive inferences cannot be justified using non-deductive reasoning, either. Given that deductive and non-deductive reasoning exhaust the possibilities, Hume concluded that inductive reasoning cannot be rationally justified. Hume also noted that the justification for induction appears to depend on what he called the uniformity of nature assumption. This is the idea that the natural world is sufficiently uniform, or unchanging, so that we are justified in thinking our future experiences

Recipes for Science : An Introduction to Scientific Methods and Reasoning, Taylor & Francis Group, 2018. ProQuest Ebook Central, http://ebookcentral.proquest.com/lib/purdue/detail.action?docID=5584122. Created from purdue on 2021-08-24 00:11:18.

156

Patterns of Inference

will be consonant with our past experiences. The uniformity of nature assumption can’t justify induction either, though, since this assumption is merely based in our past experience. We think nature is uniform because, so far, it has been. But what do we know about tomorrow? Philosophers of science have proposed several solutions to the problem of induction. One possible solution begins from the observation that inductive inferences are intended to warrant probable conclusions—not guarantees. And there are rational grounds for inferring claims about the probability of something being the case on the basis of empirical evidence. Perhaps, then, tools of statistical reasoning, which we focus on in Chapters 5 and 6, can justify some varieties of inductive inference. And statistical reasoning does have a rational basis, provided by probability theory (an axiomatic theory). A different approach to solving the problem of induction is simply to show that inductive inference is the best we can reasonably hope for when it comes to making reliable predictions (Reichenbach, 1938). Either nature is uniform, or it isn’t. If nature is uniform and we want to make reliable predictions, then a non-inductive method like, say, fortune-telling may or may not work. In contrast, inductive inference will clearly work. (Remember the uniformity of nature assumption.) So, if nature is uniform, induction will be more reliable than non-inductive methods. Now suppose nature is not uniform. In that case, inductive inference will be unreliable, but so will any alternative methods. Why is that? Well, suppose that fortune-telling were better than induction, that is, that fortune-tellers were able to reliably predict the future. This success would imply some kind of uniformity. But any uniformity in nature can be exploited by inductive inference. You could, for example, inductively infer the future success of fortune-tellers from their past successes. 
Consequently, whether or not nature is uniform, the best approach one can take to making reliable inferences about the future or the unobserved is inductive inference. So, while the Duhem-Quine problem shows that deductive inference isn’t the full story for hypothesis-testing, the problem of induction indicates that inductive inference probably isn’t the full story of scientific inference either. At the very least, both problems challenge us to think more deeply about our grounds for inference. In the case of induction, we’ll see in Chapter 6 how statistical inference may be able to support inductive reasoning and make its nature more precise.
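The statistical route to justifying particular inductive inferences can be made concrete with a small sketch. The example below is our own illustration, not drawn from the text: it uses the normal approximation to a binomial proportion (a standard statistical tool, adopted here as an assumption) to attach a probability-based margin of error to an inductive inference from a sample of grapefruit to a whole crate.

```python
import math

# Illustrative sketch: statistical reasoning puts a number on the
# strength of an inductive inference. Suppose we sample 50 grapefruit
# from a large crate and 46 of them taste good. What may we infer
# about the crate as a whole?
n, good = 50, 46
p_hat = good / n  # sample proportion of good grapefruit: 0.92

# 95% confidence interval via the normal approximation to the binomial
# (reasonable here since the counts of successes and failures are not tiny).
se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the proportion
low, high = p_hat - 1.96 * se, p_hat + 1.96 * se

print(f"estimated proportion good: {p_hat:.2f}")            # → 0.92
print(f"95% confidence interval: ({low:.2f}, {high:.2f})")  # → (0.84, 1.00)
```

The conclusion remains ampliative and defeasible, just as the chapter stresses; the interval only quantifies how strong the inference is, given the sample.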


Abductive Inference In 1915, the German scientist Alfred Wegener advanced a systematic proposal about the geologic history of Earth. He proposed that a single landmass, named Pangaea, had fragmented into the continents that we recognize today. Initially, Wegener’s hypothesis was not widely accepted. At the time, most scientists accepted that the Earth’s molten surface cooled billions of years ago and that the remnants of this cooling process are the major landmasses that we recognize today. There were good reasons to accept the hypothesis that, once encrusted, the Earth’s surface was relatively fixed and stable. But some surprising geological features were left unaccounted for. For instance, if the continents are fixed and stable and do not move, then the rough congruence of the shapes of some continents (think of Africa and South America) is a puzzling coincidence; see Figure 4.5. Further, some rocks that are now several thousands of kilometers apart have a variety of


FIGURE 4.5 (a) The Earth’s landmasses fit together a bit like puzzle pieces; (b) Marie Tharp and Bruce Heezen

characteristics in common. And fossils of some early types of plants and animals were distributed across continents. Wegener hypothesized that the continents are not fixed on the surface of the Earth but are instead very slowly drifting in relation to one another. If true, that hypothesis would account for the puzzling observations that lacked an explanation if the Earth’s landmasses are unchanging (see Wegener, 1929). In the 1950s, a few decades after Wegener’s initial
proposal, the American geologists Marie Tharp and Bruce Heezen were working to map the ocean floor, when they made a fascinating discovery about the Mid-Atlantic Ridge, an extensive mountain range running the whole length (north to south) of the Atlantic Ocean, almost entirely underwater. They learned that at the top of that ridge, running its full length, was a valley, and many earthquakes originated in this valley. This, too, fit with the hypothesis of continental drift. They had, it seemed, discovered that the seafloor was spreading, further separating the landmasses on either side of the Atlantic Ocean, the edges of which were roughly congruent, a bit like puzzle pieces. Continental drift, if true, would account for all of this evidence. Like the shape of the continents on Earth, everything—all the observations—would then seem to fit together. Various other kinds of evidence came to light in investigations carried out in a diversity of fields of science, all supporting continental drift. Today, continental drift is part of the accepted theory of plate tectonics. What kind of inference pattern was used when scientists eventually reasoned, from a variety of evidence, that the hypothesis of continental drift was true? This is clearly an ampliative inference, in that it goes beyond what’s contained in the evidence. So, it’s not deductive reasoning. But this doesn’t correspond very well to the pattern we’ve seen of inductive inference either; it’s not a generalization or projection from an observation of a certain kind, like the quality of some grapefruit or water, to the expectation of more observations of that kind. There’s a bigger leap involved in the inference from premises about geologic features to the conclusion that landmasses have separated and moved apart over the course of Earth’s history. This is an abductive inference, a type of non-deductive inference that attributes special status to explanatory considerations. 
Abductive inference is also called inference to the best explanation. The conclusion is not validly deducible from the premises, nor is it a generalization or projection on the basis of the premises. Instead, in reasoning abductively to some conclusion, one considers whether or not the conclusion, if true, would best explain the premises. Suppose, for example, that you know your roommate Theresa had a serious accident yesterday while preparing dinner. This morning, you see her walking down the hallway with stitches in her hand. The best explanation for the stitches seems to be that Theresa was cut with a kitchen knife, and that, because of the severity of the cut, she sought medical attention. So, you hypothesize that Theresa accidentally cut herself and got the stitches from having gone to the local hospital. This conclusion might not be true. But if it were, it would account for the available evidence. Thus, abductive inference is characterized by an appeal to explanatory considerations to conclude that some hypothesis is true. Reasoning that corresponds to the form of abductive inference is quite common in both everyday reasoning and scientific contexts. Recall from earlier in this chapter Hubble’s reasoning from empirical data to the conclusion that the universe is more than 10 billion years old (now recognized to be at least 13.8 billion). Support for this hypothesis included the fact that a universe at least 10 billion years old best explained a rich body of other data. Among these data were the pattern Hubble observed in the redshift in the spectral lines of distant galaxies, observations about the life cycle of stars, and the observation of cosmic microwave background radiation. The hypothesis also coheres with fundamental theories of physics, like the theory of general relativity, as well as various dating methods in geochemistry. That agreement with other theories is also best explained by the truth
of this hypothesis about the age of the universe. Abductive inference from observations and other scientific theories thus confirmed Hubble’s hypothesis. The Dutch mathematician and scientist Christiaan Huygens (1629–1695) said of abductive reasoning: One finds in this subject a kind of demonstration which does not carry with it so high a degree of certainty as that employed in geometry; and which differs distinctly from the method employed by geometers in that they prove their propositions by well-established and incontrovertible principles, while here principles are tested by the inferences which are derivable from them. The nature of the subject permits no other treatment. It is possible, however, in this way to establish a probability which is little short of certainty. This is the case when the consequences of the assumed principles are in perfect accord with the observed phenomena, and especially when these verifications are very numerous; but above all when one employs the hypothesis to predict new phenomena and finds his expectations realized. (1690/1989, p. 126) The principles Huygens discussed are hypotheses about the nature of light that could explain experimental results in optics. One way to interpret Huygens’s suggestion is that hypotheses that provide good explanations of these results are probably true. The rule of inference he is suggesting is something like the following: from a given set of observations, infer the best explanation of those observations.

What’s Distinctive about Abductive Inference Abductive arguments have a distinctive logical form. Abduction is, in a sense, reasoning backwards. The American philosopher Charles Sanders Peirce (1839–1914) characterized abduction in this way:


I call all such inference by the peculiar name, abduction, because its legitimacy depends upon altogether different principles from those of other kinds of inference. The form of inference is this: the surprising fact, C, is observed; but if A were true, C would be a matter of course, [and hence], there is reason to suspect that A is true. (1903/1940, p. 151) The logical form of abduction Peirce described is this:



1. The surprising fact C is observed.
2. If the hypothesis A were true, then C would be unsurprising.
3. There is reason to believe that A is true.

Characterized in this way, abduction is similar to the deductively invalid inference of affirming the consequent. Recall that the conclusion A is not a valid deductive inference from the premises if A then C and C. But notice that this is the same pattern of inference as in the preceding argument scheme. In fact, this argument scheme corresponds to the H-D
method’s scheme for confirmation. Recall that the confirmation scheme was not deductively valid; the role for deductive inference was in the refutation scheme. There’s an extra element in the abductive inference scheme characterized here, though, beyond what’s contained in the pattern of affirming the consequent or H-D confirmation. This extra element is the reference to a level of surprise regarding the observed fact. An abductive argument can’t be used to infer that any antecedent is true simply from the fact that its consequent is. Instead, the idea is that if the antecedent accounts for a consequent that would otherwise be left unexplained, then this is grounds for believing the antecedent is true. The power of a hypothesis to explain what is otherwise unexplainable is a reason to infer that it is (probably) true. Abductive inference thus differs from both deductive and inductive inference. Abduction looks like a kind of deductive inference, but it is deductively invalid. Like inductive inference, then, abductive reasoning is a form of non-deductive inference: it is ampliative and non-monotonic, and the quality of abductive arguments is a matter of degree. But unlike induction, abductive inference does not generalize or project from what has been observed. The special weight abductive inference accords to explanatory considerations means that its conclusions are harder to predict from existing observations. It’s not clear how to characterize the idea of some hypothesis best explaining some set of observations. How should a hypothesis relate to the observations in order to explain them? Abductive inferences seem to rely on an inferential ‘leap’—a leap in the reasoning of one or more scientists having an ‘aha!’ moment, of seeing how some new idea about the world might explain otherwise puzzling observations.
Scientists employing abductive inference in favor of a hypothesis need to hope that their audience grasps the connection, that their audience sees how the hypothesis accounts for the observations. It’s not clear whether there is anything definitive that can be said about what it takes for a hypothesis to accomplish that task. One suggestion is that a hypothesis best explains a set of observations if it predicts the observations, that is, if it shows why the observations were to be expected. By itself, this isn’t enough to make for a good explanation. Just saying that the observations in fact occurred is a way to make those observations unsurprising, but it doesn’t explain anything. Explanations must also have some other qualities. Perhaps explanations should also be simple, fit with other explanations we already accept, and generate new expectations for what we will observe. These qualities seem to make an abductively inferred hypothesis—a best explanation—enlightening, as well as a ‘bold and risky conjecture’. We have emphasized the value of the latter periodically throughout this book. Indeed, qualities like simplicity, coherence with other explanations, and fecundity of new ideas have been shown to play central roles in people’s assessment of explanatory goodness. Like inductive inferences, the goodness of abductive inferences comes in degrees. Given the difficulty of pinning down the definition of a best explanation, it’s worth considering what features of abductive inferences contribute to their strength. First, it seems the number and variety of surprising observations that a hypothesis explains contributes to its strength. The abductive inference to continental drift became stronger over the decades, as geological observations accumulated that would be expected if continental drift had occurred and that would be surprising otherwise. Second, the degree of an observation’s surprisingness and the degree to which the hypothesis dispels the surprisingness
contributes to the strength of abductive inference. The finding of a rift down the center of the Mid-Atlantic Ridge with significant seismic activity is pretty shocking unless different parts of the Earth’s crust are in (very slow) motion. Third, if appealing to features of the hypothesis like simplicity, coherence, and fecundity is the right way to characterize its value as an explanation, then the degree to which those features are possessed by the hypothesis contributes to the strength of the grounds for the inferential leap to the truth of this explanation. An amazing discovery in Morocco illustrates how scientists can appeal to a hypothesis’s explanatory virtues as evidence in support of the hypothesis. Before this discovery, fossils from Ethiopia were commonly regarded as the first anatomically modern humans, early representatives of our species Homo sapiens. These fossils indicated that humans evolved relatively quickly in a specific region of Africa about 200,000 years ago. The discovery of new fossils from an archeological site in Morocco, named Jebel Irhoud, challenged this conclusion (Hublin et al., 2017). At Jebel Irhoud, archeologists and evolutionary anthropologists found several specimens of stone tools and human bones, including a remarkably complete jaw and skull fragments. The researchers used dating techniques to determine that the remains were about 315,000 years old. If these were Homo sapiens, this would push back the origin of our species by about 100,000 years. This would also suggest that humans did not evolve only in eastern sub-Saharan Africa (modern Ethiopia) but in multiple locations across the African continent. The previously favored hypothesis that Homo sapiens evolved in eastern sub-Saharan Africa around 200,000 years ago could explain the findings at Jebel Irhoud as remains from some hominid species that lived prior to Homo sapiens, perhaps the Neanderthals.
The Jebel Irhoud findings also prompted a new hypothesis, though: that the Homo sapiens species’ evolution was a pan-African process that occurred about 300,000 years ago. This new pan-Africa hypothesis was simpler than the previously favored hypothesis, as it doesn’t require positing an archaic hominid species in North Africa, later replaced by Homo sapiens. The pan-Africa hypothesis also cohered with archeological and anatomical observations about Neanderthals and Homo sapiens. For example, the teeth found in Jebel Irhoud better matched what would be expected for Homo sapiens than what would be expected for Neanderthals. The morphology of the skull was almost indistinguishable from that of anatomically modern humans. And the pan-Africa hypothesis is consistent with geographical and ecological evidence that the Sahara was green, filled with rivers, and hospitable around 300,000 years ago. Animals like gazelles and lions inhabiting the East African savanna then also populated the Saharan region and migrated to northwest Africa. In fact, remains of plants and animals indicate biological and environmental continuity between those regions. Finally, the pan-Africa hypothesis explained a greater number of diverse observations about human origins than the East Africa hypothesis, including the mix of anatomical features seen in the Jebel Irhoud remains and in other Homo sapiens–like fossils from elsewhere in Africa. It also better fits with genomic evidence collected in South Africa that seems to indicate that the lineage split between archaic hominid species and anatomically modern humans occurred more than 260,000 years ago. Explanatory considerations, including simplicity, coherence, and fecundity, thus favored the pan-Africa hypothesis. The researchers involved in the Jebel Irhoud discovery concluded that ‘the Garden of Eden in Africa is probably Africa—and it’s a big, big garden’ (Callaway, 2017).
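One way to see how explanatory virtues could adjudicate between rival hypotheses is with a toy scoring model. Everything below is a hypothetical illustration, not the researchers’ method: the numeric virtue scores and the equal weighting are made-up stand-ins, meant only to exhibit the structure of comparing explanations along the dimensions the chapter names.

```python
# Toy model of inference to the best explanation. The "virtue" scores
# below are invented for illustration; nothing in abductive practice
# supplies such numbers, nor fixes how the virtues trade off.
VIRTUES = ["observations_explained", "simplicity", "coherence", "fecundity"]

hypotheses = {
    "East Africa, ~200 kya": {"observations_explained": 0.5, "simplicity": 0.4,
                              "coherence": 0.5, "fecundity": 0.3},
    "Pan-Africa, ~300 kya":  {"observations_explained": 0.9, "simplicity": 0.8,
                              "coherence": 0.8, "fecundity": 0.7},
}

def explanatory_score(hypothesis):
    # Equal weighting of virtues is itself an assumption of this sketch.
    return sum(hypothesis[v] for v in VIRTUES) / len(VIRTUES)

best = max(hypotheses, key=lambda name: explanatory_score(hypotheses[name]))
print(best)  # → Pan-Africa, ~300 kya
```

The point is structural: the distinctively abductive step is the move from ‘this hypothesis scores best on explanatory grounds’ to ‘there is reason to believe it is true’.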


FIGURE 4.6 The pan-African dawn of Homo sapiens


Testimony The testimony of others plays a central role in reasoning. Many of your beliefs actually originate from what other people think is true, and the same is true in science. Belief in others’ testimony is a key component of the system of trust and skepticism we’ve said is crucial for science. Suppose that you are a resident of Flint, Michigan. You attend a community meeting, where the governor, Rick Snyder, reports that the city water is safe to drink. To demonstrate this, he himself drinks some tap water. On the basis of this testimony, you infer that the water isn’t toxic. Later, you learn about the results of the water quality testing by the EPA and Virginia Tech scientists. This new information undermines your earlier inference on the basis of the governor’s testimony; you no longer believe Flint’s water is safe to drink. Later still, Virginia Tech scientist Marc Edwards reports that Flint’s water is getting better and is far less risky to drink if one uses a high-grade water filter. You are willing, again, to update your beliefs based on testimony. You probably wouldn’t take the governor’s word for it at this stage, but given Edwards’s role as an outside scientific investigator, you take his word for it. Because science is so collaborative, several scientists—sometimes even thousands of scientists, like at CERN or NASA—typically conduct research together. In these cases, they rely on the specific expertise and the honesty of collaborators. This trust in the testimony of collaborators—believing that collaborators are also operating, like yourself, under norms of sincerity and accuracy—is essential for many scientific projects. Reliance on the
testimony of subjects is also essential in some disciplines, such as ethnography, where researchers doing fieldwork must rely on the word of locals to understand certain social or cultural practices. Trustworthy sources are thus essential to the validity of their research findings. Because people can lie, make up fantasies, or simply be wrong, reasoning from testimony is risky. This is certainly not deductive inference. But it does seem plausibly described as a form of abductive inference. When we believe a statement is true based on someone’s testimony, we do so because the truth of the statement is the best explanation for why the person would say it is so. This accounts for why you are inclined to believe the Virginia Tech scientist’s testimony about Flint’s water quality but not the governor’s. The best explanation for the governor’s claim of safety is that he wants to reassure the public and, perhaps, protect himself from any culpability. Thinking about inference from testimony as a kind of abductive inference might help distinguish the circumstances in which testimony provides sufficient grounds for belief from when it does not. Expertise about the topic of the claim means that someone is less likely to be wrong about the claim. The motivations of the person providing testimony can be taken into consideration to determine the likelihood of intentional deception. More generally, an assessment of a source’s credibility and the credibility of that person’s claim is essential in determining when testimony provides reason for belief—and when it should instead be regarded with skepticism.
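The idea that a source’s credibility and motivations bear on whether testimony warrants belief can be given a simple probabilistic gloss. The sketch below is our own Bayesian rendering, not the book’s formalism, and all the probabilities are invented: it shows how the very same assertion shifts belief by different amounts depending on how likely the source would be to assert the claim even if it were false.

```python
# Hypothetical Bayesian model of testimony. All probabilities are
# made-up inputs chosen purely for illustration.
def update_on_testimony(prior, p_assert_if_true, p_assert_if_false):
    """Bayes' rule: P(claim true | source asserts it)."""
    numerator = p_assert_if_true * prior
    return numerator / (numerator + p_assert_if_false * (1 - prior))

prior = 0.5  # initial credence that the water is safe

# An outside scientist is unlikely to assert safety if it's false.
scientist = update_on_testimony(prior, 0.9, 0.1)

# An official motivated to reassure may assert safety either way.
official = update_on_testimony(prior, 0.9, 0.8)

print(f"after the scientist's testimony: {scientist:.2f}")  # → 0.90
print(f"after the official's testimony:  {official:.2f}")   # → 0.53
```

On this gloss, testimony from a source with little to gain from asserting a falsehood carries far more evidential weight, which matches the contrast drawn in the Flint example.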


EXERCISES 4.20 Decide whether each of the following inferences is deductive, inductive, or abductive. Provide a justification for each of your answers. 1. Disorder in a system will increase, unless energy is expended. Your home is a system. So, disorder will increase in your home unless energy is expended. 2. The president says that human activities are not a cause of global warming. Therefore, human activities are not a cause of global warming. 3. There is no such thing as drought in Australia. The town of Darwin is in Australia. Therefore, the town of Darwin needn’t ever make plans to deal with drought. 4. Bread appears to grow mold more quickly in the bread bin than the fridge. Therefore, temperature determines the rate of mold growth. 5. Over two million people on Twitter say that aliens are coming to Earth, which is more than the number of people on Twitter who are not saying it. So, aliens are coming to Earth. 6. All mathematicians like math. Jun is a mathematician. Therefore, Jun loves math. 7. Gravity has always operated in the universe. So, gravity will continue to operate in the universe. 8. The weather forecast indicates that tomorrow will be sunny. So, tomorrow will be sunny. 9. My brother has black hair, as does my father. Therefore, everyone related to me has black hair. 10. The library has millions of books. I have a book in my hand, and I just left the library. Therefore, the book was borrowed from the library.


4.21 Assess the quality of each of the arguments in 4.20, using the proper standard for its form (deductive, inductive, or abductive). Explain your reasoning. For any bad arguments, assess whether they would be better arguments of a different inferential pattern (inductive instead of deductive, for example). If so, reclassify those arguments to be of the pattern they are better at achieving. 4.22 Decide whether each of the following inferences is deductive, inductive, or abductive. If you aren’t 100% sure of your answer, you should also provide a justification for your decision. 1. Whenever it rains, the streets get wet. The streets are wet now. Therefore, it must have rained. 2. Of the students interviewed, 65% say that they prefer Italian to French wine. Therefore, all students prefer Italian wine. 3. A medical technology ought to be funded if it has been used successfully to treat patients. Adult stem cells have been used to treat patients successfully. Therefore, adult stem cell research and technology ought to be funded. 4. The murder weapon has Pat’s fingerprints on it. Therefore, Pat is the murderer. 5. Sociologists agree that global inequality has decreased because of economic liberalization in China and India. Therefore, it must be true that global inequality has decreased. 6. Studies found a strong correlation between IQ scores and language competence. Therefore, if a person has a high IQ score, that person has high linguistic competence. 7. The witness testified that a paisley yellow car caused the accident. Given how unmistakable paisley is, it’s very likely that a paisley yellow car did cause the accident. 8. These beans have been randomly selected from this 25-pound bag, and they are black. So, it is likely that all the beans from this bag are black. 9. The best explanation of the acquisition of language is that we possess an innate universal grammar. So we must possess an innate universal grammar. 10. 
Leaded gasoline and lead pipes were both used for a while but eventually discontinued. So, all lead products are toxic.


4.23 Assess the quality of each of the arguments in 4.22, using the proper metric for its form (deductive, inductive, or abductive). Explain your reasoning. For any bad arguments, assess whether they would be better arguments of a different inferential pattern (inductive instead of deductive, for example). If so, reclassify those arguments to be of the pattern they are better at achieving. 4.24 Define deductive inference, inductive inference, and abductive inference in your own words, and give an example of each. 4.25 Consider each of the patterns of inference, deductive, inductive, and abductive, as an account of hypothesis-testing. For each account, describe what features of hypothesis-testing it captures well and at least one drawback or limitation it faces. 4.26 The conclusion that Flint’s water supply is toxic is based on substantial evidence. Nonetheless, the inference to that conclusion is non-monotonic. What kinds of new information could you learn that would undermine the inference to that conclusion? Give three examples. 4.27 We have said that good inductive arguments are strong, but we haven’t said much about what it takes for an inductive argument to count as strong. Consider what we’ve learned about inductive inference and examples of inductive inference we’ve
encountered, as well as the features of experiments and other studies from Chapter 2. List at least three features an inductive argument could have that contribute positively to its strength. 4.28 Consider this passage from Darwin’s Origin of Species (1872: 421): It can hardly be supposed that a false theory would explain, in so satisfactory a manner as does the theory of natural selection, the several large classes of facts above specified. It has recently been objected that this is an unsafe method of arguing; but it is a method used in judging of the common events of life, and has often been used by the greatest natural philosophers. The undulatory theory of light has thus been arrived at; and the belief in the revolution of the earth on its own axis was until lately supported by hardly any direct evidence. It is no valid objection that science as yet throws no light on the far higher problem of the essence or origin of life. Who can explain what is the essence of the attraction of gravity? No one now objects to following out the results consequent on this unknown element of attraction; notwithstanding that Leibnitz formerly accused Newton of introducing ‘occult qualities and miracles into philosophy’. What ‘method of arguing’ do you think Darwin had in mind? What objections to this method of arguing does he consider, and how does he dispute those objections? 4.29 Thinking about whatever examples of science you want, from elsewhere in this book or other sources, come up with a clear instance of inductive inference. (This should be a more realistic example than grapefruit choosing.) Put the inference in standard argument form with numbered premises and a conclusion (as best you can), and then assess its strength. If you were a scientist focused on this inference, what kinds of steps could you carry out to additionally support the conclusion? 
4.30 Thinking about whatever examples of science you want, from elsewhere in this book or other sources, come up with a clear instance of abductive inference. Why should this be viewed as an abductive inference? Assess the explanatory strength of the inference. If you were a scientist focused on this inference, what kinds of steps could you carry out to additionally support the conclusion?

Copyright © 2018. Taylor & Francis Group. All rights reserved.

4.31 Describe one instance in which you would, or in fact have, taken someone’s word for something, that is, used testimony as grounds for belief. Then, try to characterize this as an abductive inference. Is abductive inference a good way to think about this use of testimony? Why or why not?

4.32 Describe an instance in which you would not, or in fact have not, taken someone’s word for something, that is, used testimony as grounds for belief. What was different between this situation and the situation you described in your answer to 4.31? Does consideration of the features of good abductive inferences account for the difference? Why or why not?

FURTHER READING

For a philosophical treatment of reasoning in general, see Harman, G. (2008). Change in view: Principles of reasoning. Cambridge: Cambridge University Press.


For more on conditional reasoning, see Nickerson, R. (2015). Conditional reasoning: The unruly syntactics, semantics, thematics, and pragmatics of “if”. Oxford: Oxford University Press.

For an in-depth summary of the Flint water crisis, see www.cnn.com/2016/03/04/us/flintwater-crisis-fast-facts/index.html

For Hume’s problem of induction, see Hume, D. (1748/1999). An enquiry concerning human understanding, ed. T. L. Beauchamp. Oxford/New York: Oxford University Press. Sections 4–6.

For a helpful guide to Hume’s problem, see Salmon, W. (1975). An encounter with David Hume. In J. Feinberg (Ed.), Reason and responsibility (3rd ed., pp. 245–263). Encino: Dickenson Publishing Co.

For a different version of the problem of induction, see Chapter 3, entitled ‘The new riddle of induction’, of Goodman, N. (1983). Fact, fiction and forecast. Cambridge: Harvard University Press.

For more on abductive reasoning, see Lipton, P. (2003). Inference to the best explanation. New York: Routledge.


CHAPTER 5

Statistics and Probability

5.1 THE ROLES OF STATISTICS AND PROBABILITY

After reading this section, you should be able to do the following:

• Give three new examples of situations involving statistical reasoning and describe how statistical reasoning is involved in each
• Characterize the difference between descriptive and inferential statistics
• Define probability theory and say how it relates to statistics


Statistical Thinking as a Pillar of Health, Wealth, and Happiness

‘Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write!’, the British writer H. G. Wells has been paraphrased as saying. Wells was right. Whether you realize it or not, statistics rules your life. When you are making a prediction about how long it takes to drive from your home to your weekend destination and whether the weekend trip will be rained out, when you are arguing with a friend about basketball teams or wondering how your grade compares to your classmates’ grades—in all these situations, you are using statistical tools and relying on statistical information. Statistical reasoning is an important part of making good decisions in everyday life. Statistical reasoning is also a staple of scientific inquiry.

Statistics is related to several topics we’ve already discussed. Recall our description of one central recipe for scientific reasoning from Chapter 1: using a hypothesis to generate expected observations, testing the expectations against actual observations, and using actual observations (or data) to help decide whether the hypothesis is a good one. We’ve seen that determining exactly what expectations follow from a hypothesis can be tricky. This is especially the case whenever there is variation in how events unfold. Variation means that we get different results when we repeat measurements. As the study of variation is central to statistical reasoning, statistics provides tools to help determine what a hypothesis should lead us to expect. It thus contributes to hypothesis-testing in science. Because statistical reasoning can be construed as a kind of inductive reasoning, it also helps us extend from what we think we know about the world to make predictions that we’re less certain of—as when we predict the weather or driving time based on traffic conditions.

All of this will become clearer as we dive into statistical reasoning in this and the next chapter. For now, simply notice that, if all of this is true, then it’s no exaggeration to say

that our health, wealth, and happiness all hinge upon understanding and communicating statistics.

Up until the 18th century, the word statistics (from Latin for ‘state’) meant any data relevant to running a nation or country. These data included demographic and economic information relevant to the condition of the country—for example, about birth and death rates, individual and national wealth, and level of employment. Today, statistical reasoning is applied to virtually any kind of data—from data concerning the performance of basketball players to data about casinos, medical diagnoses, and issues of global importance like anthropogenic climate change.

Consider these three scenarios:

1. Arguing about basketball: Your friend says that LeBron James is a better basketball player than Michael Jordan. You disagree. You remind her that Jordan won the NBA Championship for the last time in 1997–1998, playing 82 games. Over that season, he scored a total of 2,357 points, rebounded 475 missed shots, and made 283 assists. His free throw percentage was .784. No way LeBron is better than MJ! Your friend responds that the first time LeBron won the NBA Championship in 2011–2012, he played 79 games, scoring a total of 2,111 points, rebounding 590 missed shots, and making 554 assists. LeBron’s free throw percentage was .759. And he’s a monster for stealing balls and blocking shots.

As this example shows, people often appeal to statistical evidence in sports, perhaps to support the claim that some sports player is best or to argue that some team is likely to win the next game.

2. Playing roulette: Imagine that you’re at a casino in Monte Carlo, eager to play roulette. The wheel includes 37 colored and numbered pockets, of which 18 are black, 18 are red, and one is green. If you bet €10 on red, and the winning color is red, then you will win €20—and likewise if you bet €10 on black and black wins. Now, imagine that the winning color has been black for 26 times in a row. You might bet on red, reasoning that red should come up very soon since there have been so many black wins.

You are making a prediction based on past occurrences, and your prediction is based on statistical reasoning. (This is also flawed statistical reasoning, as we’ll see.)

3. A medical test: You have a sore throat, so you go to the doctor. The doctor examines your throat and calls for a ‘rapid strep test’. While you wait for her to return with the results, you ponder how you should react. What if she tells you the test was negative? Does this mean you don’t have strep throat? Not necessarily. It means there’s an approximately 95% chance that you don’t have strep throat. If you have all the symptoms of this illness though, your doctor may want to follow up with another test—a ‘strep culture’—to verify the negative result. There might be strep bacteria lurking there, undetected. What if your doctor tells you the test was positive? Then you can be pretty certain you do have strep bacteria in your throat. However, about 20% of people are carriers for strep. This means that even if strep bacteria are present, there’s a chance this isn’t the cause of your sore throat.


The rapid strep test—like most medical tests—gives you statistical data. You and your doctor then need to decide how to interpret the data, whether they are evidence for a particular conclusion, and what steps to take next.

Each of these three cases involves the collection, presentation, analysis, and interpretation of statistical information. Reasoning with statistical information is everywhere! Learning to reason better with statistics can thus help you make good decisions about questions concerning your health, wealth, and happiness—and basketball too.


Populations and Samples

We have said that statistical reasoning is used in science to determine expectations from hypotheses and is a form of inductive reasoning. These and other uses of statistics stem from one core feature: statistics is exceedingly useful in managing and understanding variation. There is variation across most types of things: butterfly coloration, the severity of oil spills, how often people smoke cigarettes, and the grades of students, to name a few. How can one know what to expect, given all this variation? How much and in what ways will things vary, and in what ways will they stay the same? What are the meaningful patterns and regularities, and what is meaningless variation? These are all tasks that statistics helps scientists address.

When scientists interpret their experimental results, they regularly need to discern between what we might call background variation and the variation between experimental and control groups due to an intervention. Did the students who attended the study session really perform better on the test, or was that just background variation due to chance differences between the students? Scientists also regularly need to generalize from the groups they’re familiar with to another group, a population.

The term population is often used by statisticians, scientists, and others to refer to the target systems of experiments and studies. In statistics, a population is a large collection of things that share some characteristic. For example, the population of Indian people share a common geographic origin, literature, genetic heritage, and linguistic history. Some populations consist of people, but others consist of bacteria, stars, or more abstract objects like companies, households, homicides, and free-throws. As previously discussed, data are public recordings of observations, such as measurements, that are elicited from the real world and used to evaluate hypotheses about a target system.

In most cases, it is impossible to collect data about each individual in a population of any of these kinds—think about surveying all the people in India on some question or how difficult it would be to collect data about all stars of the Milky Way galaxy. For this reason, scientists regularly obtain data about a subset of the population they are interested in. This subset is a sample of the population, and the data concerning individuals in this subset are sample data.
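The population/sample relationship can be sketched in a few lines of Python. This is an illustrative sketch, not from the text; the population values, sample size, and seed are all invented:

```python
import random

# A hypothetical population: heights (in cm) of 10,000 individuals.
# The distribution parameters here are made up for the sketch.
random.seed(42)  # fixed seed so the example is reproducible
population = [random.gauss(170, 10) for _ in range(10_000)]

# Measuring every individual is usually impractical, so we draw a sample:
# a randomly selected subset of the population.
sample = random.sample(population, k=100)

print(len(population), len(sample))  # 10000 100
```

The sample data (the 100 measured heights) then stand in for the population in later statistical work.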

Descriptive and Inferential Statistics

There are two main kinds of statistical reasoning. The first kind, which will be a main focus of this chapter, involves the use of statistics to describe features of data sets. This is descriptive statistics: summarizing, describing, and displaying data in a meaningful way. For example, finding a class’s average score on an exam or quiz is a common use of

descriptive statistics. Finding patterns in sets of data—averages, extent of variation, and trends over time, for example—and visually representing them are all forms of descriptive statistics.

The second kind of statistical reasoning, which will be a main focus of the next chapter, involves the use of statistics to make inferences based on data. This is inferential statistics: using statistical reasoning to draw broader conclusions from data, such as the generalizations and projections discussed in Chapter 4. For example, from data about a sample of American citizens, inferential statistics can be used to estimate what proportion of Americans who smoke marijuana every day will develop one or more mental health problems. Likewise, test scores from a subset of students can be used to infer what distribution of scores we should expect in the class as a whole. Inferential statistics uses patterns in existing data sets to inform our expectations when we do not have data about a population or a system we want to learn about. With the help of inferential statistics, existing data sets can be used to make predictions about larger groups, different groups, and the future.

The idea of inferring conclusions that go beyond what is already known calls to mind inductive reasoning. Recall from Chapter 4 that an inference is inductive when the nature of the relationship between the premises and conclusion is one of probability rather than necessity. Inferential statistics can be understood as a specific type of inductive inference, which is especially useful in the face of variation. An inductive inference counts as statistical when it uses the tools and follows the rules of statistics.

There are two main uses of inferential statistics: (1) either inferring properties of a population based on sample data or inferring properties of a sample based on information about a population and (2) testing hypotheses about a population by performing an experiment or observational study on a sample. In this latter use of statistics, the sample often includes experimental and control groups. The role of these groups in experimentation was discussed in Chapter 2.
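As a rough illustration of descriptive statistics, Python's standard statistics module can summarize a data set in the way the class-average example describes; the exam scores below are invented for the sketch:

```python
import statistics

# Hypothetical exam scores for a ten-student class.
scores = [72, 85, 90, 65, 78, 88, 95, 70, 82, 75]

# Descriptive statistics summarize the data set itself:
mean_score = statistics.mean(scores)   # the class average (80 here)
spread = statistics.stdev(scores)      # how much the scores vary
print(mean_score, round(spread, 1))
```

Inferential statistics would go one step further, using summaries like these to draw conclusions about students beyond this particular class.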


Statistics and Probability Theory

Statistics is an important use of mathematics in science, and the mathematical power of statistics comes from probability theory. Probability theory is a mathematical theory that has been developed to deal with randomness—that is, with outcomes that are individually unpredictable but that behave in predictable ways over many occurrences. We have already said that statistics excels at dealing with variation. The trick behind this is thinking about variation as a kind of randomness. In the context of statistical reasoning, randomness does not mean haphazard or lacking aim or purpose. Instead, randomness is a measure of uncertainty of an outcome and applies to concepts of chance, probability, and information.

The simplest examples of randomness are things like coin tosses and dice throws. In a normal roll of a standard die, you can’t possibly know whether you’ll roll a one, two, three, four, five, or six. But you do know that if you roll that die 500 times, or roll 500 dice, you probably won’t roll a six every time. The word probably is important there. Probability theory actually enables us to calculate what that probability is; it can tell us exactly how unlikely it is to roll a six 500 times in a row.

These kinds of probability calculations are put to work in statistical reasoning. For example, suppose you use probability theory to work out the chance of all possible

different outcomes for 500 dice rolls. In working out these probabilities, you assume the die is fair; that is, you assume each possible outcome—one, two, three, four, five, six—is equally likely on each roll. If you then roll a die 500 times, and six comes up 200 times, you can use those probabilities to infer that this is a somewhat improbable, or unlikely, outcome. You can use statistics to decide, based on the level of improbability, whether something’s fishy—whether, perhaps, your die isn’t fair after all.

So, statistical reasoning relies on mathematics and in particular on probability theory. But it doesn’t just boil down to running calculations. It is much more important to understand the meaning of the numbers, probabilities, and equations behind the statistics. Acquiring this understanding will help make you a stats-savvy person, someone who can critically examine claims based on statistical reasoning in science and in everyday life and who can better handle the barrage of statistical information that fills our lives. This will be our focus in this chapter and in Chapter 6 as well. In this chapter, we’ll work through some basic concepts of probability theory, and then discuss descriptive statistics. Then, in Chapter 6, we will turn our focus to inferential statistics.
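The '500 rolls' calculation just described can be made concrete with the binomial formula, which gives the probability of exactly k successes in n independent trials. This is a sketch in Python; the function name is ours, not the book's:

```python
from math import comb

def binomial_prob(k, n, p):
    """Probability of exactly k successes in n independent
    trials that each succeed with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Chance of exactly 200 sixes in 500 rolls of a fair die.
# The expected count is 500/6, about 83, so 200 is far out in the tail.
p_200 = binomial_prob(200, 500, 1/6)
print(p_200 < 1e-30)  # True: astronomically unlikely
```

A result this improbable under the 'fair die' assumption is exactly the kind of evidence that should make you suspect the die isn't fair after all.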

EXERCISES


5.1 First, describe the difference between a sample and a population. Second, state whether the following statements refer to a sample or to a population:
a. Researchers found that 2% of the Americans they interviewed believed they had seen a UFO.
b. Based on their survey data, the researchers concluded that one in three of all car crashes in the country are linked to alcohol impairment.
c. Two-thirds of the butterflies we observed were pink.
d. After reading four essays, the teacher expects that 85% of the class will pass the exam.
e. Twenty-five percent of the planets in the Solar System have no moons.
f. More than one billion people in the world live on less than one dollar a day.

5.2 What is the difference between descriptive statistics and inferential statistics? Indicate whether each of the following statements is based on descriptive or inferential statistics, and explain why.
a. As of 2017, the director Quentin Tarantino has received a total of two Academy Awards.
b. Students with an undergraduate GPA of 3.00 are expected to have a starting salary of $30,000.
c. In 2016, the population of São Paulo, Brazil, was 12,038,175.
d. The mean grade in the class was B+.
e. A study stated that British adults are nearly 12 kilograms (26 pounds) heavier now than they were in 1960.
f. Economists say that mortgage rates may soon drop.
g. The gross national income per capita in South Sudan in 2013 was $2.
h. According to World Health Organization data published in 2015, life expectancy in Bangladesh is 71.8 years.


5.3 Describe what probability theory is in your own words. Then, looking back at the definitions of descriptive statistics and inferential statistics, describe how you think statistics relies upon probability theory. Name three everyday situations where probability theory is used.

5.4 Find a news article or opinion column published in the past month that uses statistical reasoning of some kind. After citing the source, write a paragraph describing the following:
a. The main point of the article or column
b. What statistics are provided
c. How the author makes use of statistics in his or her reasoning
d. How good this use of statistical reasoning seems to be and why (or why not)

5.5 Statistical reasoning pervades our lives, often in ways we don’t realize. After reflecting on your daily routine, write out a list of 10 ways in which variation, statistical reasoning, and probability are part of that routine, either explicitly or implicitly.

5.2 BASIC PROBABILITY THEORY

After reading this section, you should be able to do the following:

• Define these seven terms: random variable, outcome space, mutually exclusive, collectively exhaustive, total probability, statistical independence, and conditional probability
• Calculate the probability of multiple outcomes occurring (together or individually) based on the probabilities of individual outcomes
• Calculate conditional probabilities


Random Variables

The number rolled on a die and whether a coin lands on heads or tails are both random variables. Random variables have different values that are individually unpredictable but predictable in the aggregate. You can’t predict whether a coin will land on heads or tails, but you can predict that lots of coin tosses will give you roughly equal numbers of heads and tails. The set of all values a random variable can have is called its outcome space, or sample space.

Let’s work through these ideas using the simple coin-toss example. The random variable involved in a coin toss is the figure shown on the top of the coin. We can refer to this variable with a capital letter, say, X. The set of possible values of X—its outcome space—is heads and tails: these are all the values our random variable can possibly take. To distinguish the variable from its possible values, we will refer to the values of a random variable with small letters, in this case, say, h and t. We can now define the outcome space of a coin toss as follows:

X = {h; t}

(The symbols ‘{’ and ‘}’ are curly braces, which is the conventional notation used to indicate a set, that is, any abstract grouping of items.)


Random variables are the building blocks of probability theory and, in turn, of statistical reasoning. Probabilistic reasoning begins with the observation of how probable it is for a random variable to take on any given value. For our coin-toss example, there’s a 100% chance that the coin lands on either heads or tails (since this is the whole outcome space). Probabilities vary between 0 (maximally improbable) and 1 (maximally probable), so we write this as follows:

Pr(X=h or t) = 1

No matter how many values a random variable can have, that whole set of values—its whole outcome space—has a probability of 1. This means it’s guaranteed that the variable will take on one of those values. The total probability of an outcome space is always 1.

The outcomes in any outcome space have two important properties: they are mutually exclusive and collectively exhaustive. Mutually exclusive outcomes occur when no more than one of the outcomes can occur at any given time. On a single coin toss, you might get heads or tails, but you will never get both. Heads and tails are mutually exclusive outcomes. Collectively exhaustive outcomes occur when at least one of the outcomes must occur at any given time. For a successful coin toss, the coin must land heads up or tails up—there is no third option. This means that heads and tails are collectively exhaustive outcomes.

Now, if the coin is fair, then the probability of the coin landing on heads will equal the probability of it landing on tails. That is, for a fair coin, Pr(X = h) = Pr(X = t). Since we already know the probability of the whole outcome space together is 1, and there are two equally probable outcomes in that outcome space, we can calculate that:


Pr(X=h) = Pr(X=t) = 1⁄2, or .5, or 50%

Because there are two equally probable outcomes, each outcome has a probability of ½ (.5 or 50%). That’s just the total probability for the outcome space (which is always 1) divided by the number of possible outcomes (which is two, in this case). To generalize, for any random variable with equally probable outcomes, the probability of one of those outcomes is one divided by the number of possible outcomes. So, for a fair, six-sided die, the probability of rolling any one number is one divided by six, or 1⁄6.

A random variable that is not fair is biased in favor of one or more outcomes. This means one or more outcomes are more likely—have a higher probability of occurrence—than other outcomes. French roulette is fair, but American roulette is not—at least not in the statistical sense of fairness. This is because a French roulette wheel has 37 pockets, numbered zero through 36, whereas an American roulette wheel has 38, two of which are zeroes. In the latter case, the roulette is biased toward zero, because zero will occur more often than any other number on the wheel if we spin the roulette over and over again. More precisely, in American roulette, the probability of getting any number from one to 36 is 1⁄38, while the probability of getting zero is 2⁄38, or 1⁄19.

There is another way in which a roulette, or any series of outcomes, might be unfair. A series of outcomes might have ‘memory’, in the sense that previous outcomes might influence future outcomes. In one of the scenarios described at the beginning of the chapter, we imagined a person who thought roulette wheels work in this way. This person

reasoned, ‘Red should come up very soon since there’s been so many black wins. So, I’ll bet on red!’ If the roulette were unfair because it had memory, this might be good reasoning: the roulette might change to red because there had been lots of black wins. But fair roulette spins have independent outcomes: the probability of each outcome is not influenced by past outcomes. So, in order to be fair, roulette spins and any other random variables must be independent of one another.

To summarize, a random variable must be unbiased and its outcomes must be independent for the random variable to be fair. Coin tosses, dice throws, and French roulettes are all examples of fair random variables.

Lots of random variables are unfair. For example, LeBron James’s free throw success is a random variable. Let’s call this variable Y. There are only two possible outcomes: LeBron either misses the free throw or scores. So, this random variable has an outcome space of Y = {miss; score}. So far, this is simple. The problem is that the chance of LeBron scoring versus missing is probably not 50⁄50. There is a bias in favor of the outcome of scoring; for LeBron James, this is more likely than missing. The outcomes might also fail to be independent: missing a shot might make LeBron more, or less, likely to score on the next free throw. It’s much more difficult to calculate probabilities for unfair variables like free throw success. So, for now, we’ll stick with fair random variables, like coin tosses and dice throws.
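The two marks of a fair random variable, individually unpredictable outcomes that are predictable in the aggregate, can be seen in a quick simulation. This is a sketch; the toss count and seed are arbitrary choices of ours:

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# A single toss is individually unpredictable...
one_toss = random.choice(['h', 't'])

# ...but many independent, unbiased tosses behave predictably in aggregate:
n = 100_000
heads = sum(random.choice(['h', 't']) == 'h' for _ in range(n))
print(abs(heads / n - 0.5) < 0.01)  # True: close to the theoretical 1/2
```

Nothing about earlier tosses changes the probability of later ones; the regularity comes entirely from aggregation over many independent trials.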


The Addition Rule

The larger the probability associated with an event, the more likely it is to occur. One assigns probability 1 to events that are guaranteed to happen or to statements that we are entirely certain are true. And we assign probability 0 to events that are guaranteed not to happen or to claims that are certainly false. For instance, the probability that you roll a seven on a single, regular die is 0. This cannot happen, since no side of the die shows seven dots or marks. Assuming you successfully roll the die, the probability that you roll some number between one and six is 1. Between 0 and 1, a higher number—a larger probability—means an outcome is more likely.

For a fair die, rolling any number between one and six is equally probable. In our official notation, we could write this as Pr(D = 1) = Pr(D = 2) = . . . = 1⁄6. But we might also wonder about the probability of other possible outcomes. For example, on a single die roll, how probable is rolling an even number? How about an odd number? How about any number greater than one?

These probabilities can be found using simple addition. Consider the example of rolling an even number. This can be expressed as: Pr(D = 2 or D = 4 or D = 6). We already know that each of those three outcomes has a probability of 1⁄6. The probability that any of those outcomes occurs on a given roll is just the probability of each outcome, all added up together as follows:

Pr(D=2 or D=4 or D=6) = Pr(D=2) + Pr(D=4) + Pr(D=6) = 1⁄6 + 1⁄6 + 1⁄6 = 3⁄6 = 1⁄2

Beware! Adding probabilities in this way only works for mutually exclusive outcomes. If we wanted to ask about the probability of rolling an even number or a five, we could

just add in another 1⁄6, yielding 4⁄6 or 2⁄3 as the probability. But this doesn’t work if someone asked us about the probability of rolling an even number or a six. Because six is one of the even numbers on the die, the outcomes of rolling a six and rolling an even number are not mutually exclusive. You can’t simply add up the different probabilities to find the answer. In the case of rolling an even number or a six, the probability is the same as it was for rolling an even number (since rolling a six is one way to roll an even number). The probability is still ½.

This way of calculating probabilities is called the addition rule. This rule says that the probability that any of a series of outcomes will occur is the sum of their individual probabilities. It’s very important to ensure that the requirement of mutually exclusive outcomes is met. If not, addition will lead you astray.
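The addition rule for the die example can be checked with exact arithmetic using Python's Fraction type (a sketch of the calculations above):

```python
from fractions import Fraction

p_face = Fraction(1, 6)  # probability of each face of a fair die

# Addition rule: Pr(even) = Pr(2) + Pr(4) + Pr(6),
# legitimate because 2, 4, and 6 are mutually exclusive outcomes.
p_even = p_face + p_face + p_face
print(p_even)  # 1/2

# 'Even or five' still involves mutually exclusive outcomes, so add again:
print(p_even + p_face)  # 2/3

# 'Even or six' does NOT: six is already an even number, so simply
# adding Pr(6) would overcount. Pr(even or 6) is just Pr(even) = 1/2.
```

Using exact fractions rather than floating-point decimals keeps the arithmetic identical to the hand calculation in the text.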

The Multiplication Rule

A different rule of probability uses multiplication to calculate the probability of all of a series of outcomes occurring. For example, what is the probability of rolling two sixes when you roll two dice? Put another way, this question is asking for the probability of rolling a six on one die and also rolling a six on a second die. The probability we are looking for is thus Pr(D1=6 and D2=6), where D1 and D2 are the two dice. Of course, there’s a 1⁄6 probability of a six for any given die roll. So:


Pr(D1=6 and D2=6) = Pr(D1=6) × Pr(D2=6) = 1⁄6 × 1⁄6 = 1⁄36

The probability of 1⁄36 is a lot closer to zero than 1⁄6 is. That's why rolling two sixes, or two ones ('snake eyes'), is exciting: it seldom happens!

Beware though! There's an important condition for multiplying probabilities as well: the outcomes must satisfy the independence condition. The probability of each outcome must be independent of the others; no outcome may influence the probability of the other outcomes. Think of it this way. If, instead of calculating the probability of rolling two sixes on two dice, we wanted to calculate the probability of rolling a six and a one on the very same die roll, Pr(D1 = 6 and D1 = 1), we can't just multiply 1⁄6 × 1⁄6. These outcomes aren't independent. In fact, they are mutually exclusive: if one occurs, the other is guaranteed not to occur. This means the probability in question is maximally improbable: it's zero.

So, we can only use multiplication to find the probability of a series of outcomes all occurring if the outcomes in question are independent. According to the multiplication rule, the probability that all of a series of independent outcomes occurs is the result of multiplying their individual probabilities. Again, if the requirement of independent outcomes is not met, then multiplication will lead you astray. When two events are not independent, the probability that both happen depends on the nature of the connection between the events; simple multiplication won't work.

Let's take a moment to compare the multiplication rule with the addition rule. We saw that the addition rule is used to calculate the probability of any of a series of mutually exclusive outcomes occurring. You could ask about the probability of getting a six or a one on a given roll. (They have to be on the same roll to be mutually exclusive outcomes.) To calculate this, we would add 1⁄6 and 1⁄6 to get 2⁄6, or 1⁄3.
The multiplication rule is instead used to calculate the probability of all of a series of independent outcomes occurring. You could ask about the probability of getting a six on one roll and a one on a different roll. (They have to be different rolls or different dice to be independent outcomes.) To calculate this, we would multiply 1⁄6 and 1⁄6 to get 1⁄36.

In these two examples, notice that the addition rule led to a larger probability (closer to 1) and the multiplication rule led to a smaller probability (closer to 0). This will always happen. Addition will always increase probability, and multiplication will always decrease probability. This is because probabilities are always positive numbers between 0 and 1, and multiplying two numbers in that range (such as two fractions) always yields a smaller number, while adding two positive numbers always yields a larger number.

This can provide a quick way to remember when to add and when to multiply. Do you expect the probability of the occurrence you're calculating to be larger or smaller than the probabilities of the outcomes that generate it? It's easier (more probable) to get any of a one, two, or three on a die roll than to get each of these numbers individually: use addition. Any, or, addition, and larger probabilities go together, and the outcomes linked with the word or need to be mutually exclusive. It's harder (less probable) to get a six on all of the first, second, and third rolls than on a single roll: use multiplication. All, and, multiplication, and smaller probabilities go together, and the outcomes linked with the word and need to be independent. This is all summarized in Table 5.1.

TABLE 5.1  Addition, multiplication, and subtraction rules and their conditions

Rule                | Language | Function          | Condition               | Result
--------------------|----------|-------------------|-------------------------|----------------------------------
Addition rule       | Any      | Disjunction (or)  | Mutually exclusive      | Probability always increases
Multiplication rule | All      | Conjunction (and) | Independent             | Probability always decreases
Subtraction rule    | Not      | Negation (not)    | Collectively exhaustive | Probability can be large or small
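As with the addition rule, the multiplication rule can be checked by enumerating the full outcome space, here the 36 equally likely results of rolling two dice. A small sketch in Python (the helper `pr` is my own naming, not the book's):

```python
from fractions import Fraction
from itertools import product

two_dice = list(product(range(1, 7), repeat=2))  # all 36 equally likely (d1, d2) pairs
assert len(two_dice) == 36

def pr(event):
    """Probability that a pair of rolls satisfies `event`, found by counting pairs."""
    return Fraction(sum(1 for roll in two_dice if event(*roll)), len(two_dice))

# Independent outcomes on different dice: multiply. Pr(D1=6 and D2=6) = 1/6 * 1/6.
assert pr(lambda d1, d2: d1 == 6 and d2 == 6) == Fraction(1, 6) * Fraction(1, 6)

# Mutually exclusive outcomes on one die: add. Pr(D1=6 or D1=1) = 1/6 + 1/6.
assert pr(lambda d1, d2: d1 in (1, 6)) == Fraction(1, 6) + Fraction(1, 6)
```

The same counting function verifies both rules, which makes the contrast between 'and' (1⁄36) and 'or' (1⁄3) easy to see.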


The Subtraction Rule

Here's one more mathematical relationship among probabilities. Recall that the total outcome space, all of the available possibilities, always has a probability of 1. The subtraction rule makes use of this fact: you can calculate the probability of some outcome by subtracting the probability of all other outcomes in the outcome space from 1 (the total probability). For example, what is the probability of getting anything but a two on a single die roll? The total probability is 1, and the probability of rolling a two is 1⁄6 (as it is for any other number from one to six). So, the probability of getting anything but two, or Pr(D = not 2), is:

Pr(D=not 2) = 1 − Pr(D=2) = 1 − 1⁄6 = 5⁄6


The subtraction rule is only for collectively exhaustive outcomes. Just as with the requirements placed on the addition and multiplication rules, the requirement of collectively exhaustive outcomes is crucial. This is what makes probabilities sum to 1. This requirement is most easily satisfied with the use of the word not—rolling a two and not rolling a two, rolling an even number and not rolling an even number, rolling two sixes in a row but not rolling two sixes in a row. Each of these pairs is collectively exhaustive; any possible outcome would fall in one or the other category. So, the main word to prompt you to use the subtraction rule is not, which is one way of guaranteeing collectively exhaustive outcomes. This is summarized in Table 5.1.
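The subtraction rule, too, can be verified by enumeration: 'rolling a two' and 'not rolling a two' are collectively exhaustive, so their probabilities must sum to 1. A minimal Python sketch (helper names are my own):

```python
from fractions import Fraction

die = range(1, 7)  # outcome space for one fair die

def pr(event):
    """Probability that a single roll satisfies `event`."""
    return Fraction(sum(1 for d in die if event(d)), len(die))

p_two = pr(lambda d: d == 2)
p_not_two = pr(lambda d: d != 2)

# Collectively exhaustive (and mutually exclusive) outcomes sum to 1...
assert p_two + p_not_two == 1

# ...so the complement can be found by subtraction: Pr(not 2) = 1 - 1/6 = 5/6.
assert p_not_two == 1 - Fraction(1, 6) == Fraction(5, 6)
```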

Conditional Probability


There’s one final probability concept we need to discuss: conditional probabilities. Sometimes it can be useful to know how the probability of some event changes in light of other events occurring. The conditional probability of an event is the probability of its occurrence given that some other event has occurred. In the notation we’ve been developing, we can write the conditional probability of a random variable Y taking the value y, given that a variable X takes the value x as Pr(Y=y | X=x). The symbol ‘|’ can be read as given that. Notice that, for two independent events, the conditional probability of one event given the other’s occurrence will be the same as the original probability of the event. Indeed, the concept of conditional probability enables us to more exactly articulate what independence amounts to. Two random variables X and Y are statistically independent when Pr(Y=y | X=x) = Pr(Y=y) and Pr(X=x | Y=y) = Pr(X=x). This means that the outcome x occurring doesn’t make the outcome y any more or less likely, and the outcome y occurring doesn’t make the outcome x any more or less likely. If an event y is not statistically independent from an event x, then the probability of y occurring goes up or down if x occurs. In extreme cases, one event can result in the probability of another event becoming 1 or 0. For example, the probability of a die roll resulting in an even number is ½. But the probability of an even number given that you roll a two is 1, since rolling a two is one way of rolling an even number. The probability of an even number given that you roll a three is 0, since three is odd. (In both cases, we’re assuming there’s only one roll of the die.) That is:

Pr(D1=2 or 4 or 6 | D1=2) = 1
Pr(D1=2 or 4 or 6 | D1=3) = 0

In other cases, the statistical dependence is subtler. The probability of an event might be raised or lowered by the occurrence of another event, but not all the way to 0 or 1. Consider again the probability of getting two sixes when two dice are rolled, which we previously calculated to be 1⁄36. We can also ask what the probability of getting two sixes on two rolls is, given that the first roll yielded a six. The chance of getting two sixes has gone up if one roll is a six, but it's still not guaranteed.


Figuring out the conditional probability in cases like this one requires calculation. For x and y, the values of two random variables, the probability of y occurring given that the other event x occurs can be calculated using the following conditional probability formula:

Pr(Y=y | X=x) = Pr(Y=y & X=x) / Pr(X=x)

This calculation only works when the probability of x is greater than 0. Think of this formula as a two-step procedure for finding the probability of y given x. First, you limit your attention only to cases in which x occurs. This is the role of Pr(X = x) as the denominator (the bottom) of the equation. Second, you look within those cases of x occurrences for occurrences of y. This is the role of Pr(Y = y & X = x) as the numerator (the top) of the equation. The basic idea is that if the outcomes are restricted to only those cases when x occurs, this becomes the new outcome space for the variable Y.

Let's try this out to find the probability of getting two sixes in two dice rolls, given that the first roll is a six. (To make the scenario more intuitive, perhaps imagine that you decide to roll the dice one at a time and you've rolled the first but not yet the second.) Plugging this example into the formula gives us:

Pr(D1=6 & D2=6 | D1=6) = Pr((D1=6 & D2=6) & D1=6) / Pr(D1=6)

Before moving on, take a moment to figure out why this equation is the right version of the formula for calculating conditional probabilities. We can solve this equation by plugging in the probabilities we already know and doing some simple math. Notice that Pr(D1 = 6 & D2 = 6) and Pr((D1 = 6 & D2 = 6) & D1 = 6) will be the same probability; in the second, D1 = 6 is just listed twice. It shows up twice because the first roll had to be a six (D1 = 6) in order for it to be possible for both rolls to be sixes. So, plugging in the probabilities:


Pr(D1=6 & D2=6 | D1=6) = (1⁄36) / (1⁄6) = 6⁄36 = 1⁄6

One nice thing about starting with this simple example is that we can check the answer. What is the probability of rolling two sixes given that you've already rolled one six? This is the same as the probability of getting a six on one roll, since that's exactly what needs to happen if you are to get two sixes, given that you already have one six. And we know the probability of getting a six, or any other number from one to six, on a single die roll is 1⁄6. So, our calculation of the conditional probability gave us the right answer.

Let's try our hand at finding a slightly more difficult conditional probability for dice throws. What's the probability that you roll a number less than four on a die throw, given that you roll an odd number on that throw? This is the same as asking about the probability of rolling a one, two, or three (the outcomes less than four) given that you roll a one, three, or five (the odd outcomes). Applying our conditional probability formula, this yields:

Pr(D=1 or 2 or 3 | D=1 or 3 or 5) = Pr((D=1 or 2 or 3) & (D=1 or 3 or 5)) / Pr(D=1 or 3 or 5)

Notice that the probability of rolling a one, two, or three and rolling a one, three, or five is the same as the probability of rolling a one or three. Why? Because those are the only outcomes that satisfy both conditions. So the numerator is 2⁄6, the denominator is 3⁄6, and the conditional probability is (2⁄6) / (3⁄6) = 2⁄3.
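The two-step procedure behind the conditional probability formula (restrict the outcome space, then count) can be written directly as code. A sketch in Python covering both dice examples (the helper `pr_given` is my own construction, not from the text):

```python
from fractions import Fraction
from itertools import product

def pr_given(event, given, space):
    """Conditional probability: restrict `space` to outcomes satisfying `given`,
    then count how often `event` holds within that restricted space."""
    restricted = [o for o in space if given(o)]
    return Fraction(sum(1 for o in restricted if event(o)), len(restricted))

# Two dice: Pr(two sixes | first roll is a six) comes back to 1/6.
two_dice = list(product(range(1, 7), repeat=2))
assert pr_given(lambda o: o == (6, 6), lambda o: o[0] == 6, two_dice) == Fraction(1, 6)

# One die: Pr(less than four | odd) = 2/3, since {1, 3} are the odd rolls below four.
one_die = [(d,) for d in range(1, 7)]
assert pr_given(lambda o: o[0] < 4, lambda o: o[0] % 2 == 1, one_die) == Fraction(2, 3)
```

Restricting the list of outcomes first is exactly the role the denominator Pr(X = x) plays in the formula.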


If Pr(H|O) > Pr(H), then we say that the observation O confirms hypothesis H. That is, an observation confirms a hypothesis if the probability of the hypothesis, a rational degree of belief that the hypothesis in question is true, goes up once the observation has been made. So, comparing the prior and posterior probabilities shows us whether an observation confirms or disconfirms a hypothesis and by how much. A big increase in probability implies a large degree of confirmation, and a small increase implies a small degree of confirmation; likewise, a big decrease in probability implies a large degree of disconfirmation, and a small decrease implies a small degree of disconfirmation.

Comparing Support for Different Hypotheses

The core of Bayesian statistics we just described can be used in numerous ways in statistical hypothesis-testing. Let's now turn to two main ones.

First, Bayesian statistics can be used to calculate the degree to which some observation or data set favors one hypothesis over another. In theory, posterior probabilities can be calculated for any number of hypotheses from the same observation, taking into account the prior probabilities of the various hypotheses and the probability of the observation given each of the various hypotheses. These posterior probabilities can then be compared with one another to find which hypothesis is more likely, taking into account the observation that's been made. Unlike classical statistics, this provides a comparative approach to hypothesis-testing.

Another approach is a kind of shortcut: compare not the posterior probabilities of different hypotheses but the probability of the observation given each hypothesis, or Pr(O|H1) versus Pr(O|H2). These probabilities are usually easier to find than posterior probabilities.

Consider this example. Imagine that Lasha and Janine are interested in public opinion about the theory of evolution. Based on their separate research, Lasha believes that 70% of the public is convinced by the theory of evolution, while Janine believes that 60% of the public is convinced. They decide to query 100 randomly selected people about their opinions. Based on their different hypotheses, Lasha and Janine can predict what they will observe: Lasha predicts that about 70 of the 100 will say they believe the theory of evolution; Janine's prediction puts that number at about 60. In fact, using tools of inferential statistics described earlier in this chapter, we can find the probability distribution each predicts for this random sample of 100 people.

As it turns out, of the 100 people in the sample, 62 said they are convinced by the theory. According to the probability distribution based on Lasha's hypothesis of 70% belief in evolution, this observation has a probability of .02, that is, Pr(O|H1) = .02. According to the probability distribution based on Janine's hypothesis of 60% belief in evolution, this observation has a probability of .08, that is, Pr(O|H2) = .08. An observation favors one hypothesis over a second hypothesis to the degree that the first hypothesis predicts the observation better than the other hypothesis. Given Janine's hypothesis, the observed result is much more likely than it is given Lasha's hypothesis.
This can be expressed numerically with the Bayes factor, the ratio of the probability of the observation given one hypothesis to the probability of the observation given the other, for instance Pr(O|H2) / Pr(O|H1). The Bayes factor expresses the discriminatory power of the evidence with respect to the two hypotheses. In this case, the Bayes factor is .08 / .02, or 4. This means that the result of the survey favors Janine's hypothesis over Lasha's by a factor of four.

Here's a shorthand method for calculating the Bayes factor in circumstances like this (random sampling, independent outcomes, and different hypotheses about the distribution of the values of a binomial random variable). If Lasha's hypothesis is right, each individual has a 0.7 probability of saying he or she believes the theory of evolution; if Janine's hypothesis is right, each individual has a 0.6 probability of doing so. The Bayes factor can be found by comparing these probabilities. In particular, the factor favoring Janine's hypothesis is:

[Pr(yes|H2)^(# of yes answers) × Pr(no|H2)^(# of no answers)] / [Pr(yes|H1)^(# of yes answers) × Pr(no|H1)^(# of no answers)]

In this case, this is (.6^62 × .4^38) / (.7^62 × .3^38), which is approximately 4.
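Both the full binomial probabilities and the shorthand ratio can be computed in a few lines; the binomial coefficient appears in both likelihoods and cancels, which is why the shorthand works. A sketch in Python (the function `binom_pmf` is my own helper):

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k 'yes' answers in n independent trials,
    each with 'yes' probability p (the binomial distribution)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of observing 62 'yes' answers out of 100 under each hypothesis.
p_obs_lasha = binom_pmf(62, 100, 0.7)   # roughly .02, the book's rounded figure
p_obs_janine = binom_pmf(62, 100, 0.6)  # roughly .08, the book's rounded figure

# Bayes factor favoring Janine's hypothesis, and the shorthand ratio
# in which the binomial coefficient comb(100, 62) has cancelled.
bayes_factor = p_obs_janine / p_obs_lasha
shorthand = (0.6**62 * 0.4**38) / (0.7**62 * 0.3**38)

assert abs(bayes_factor - shorthand) / shorthand < 1e-6  # same value, about 4
```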

Bayesian Belief Updating A second use of Bayesian statistics is to guide how our beliefs should be updated in light of new evidence. This is simple, assuming we can calculate posterior probabilities. The rule is called Bayesian conditionalization, and it states that the new degree of belief in a hypothesis H ought to be equal to the posterior probability of H: Prnew(H) = Pr(H|O). Every time we make observations, we learn something new, so we should update our beliefs accordingly.


Bayesian conditionalization shows us how. These updated beliefs are our new prior probabilities for hypotheses, which are then the basis for assessing how to respond to the next observation. There's even a slogan for this: 'Today's posteriors are tomorrow's priors'.

Here's an example. Around age 40, most women undergo routine mammography screening. Mammograms are x-ray photographs of the breast tissue, which can be used to screen for breast cancer in women who otherwise have no signs or symptoms of the disease. Suppose you are a doctor and that one of your patients is a 50-year-old woman with no symptoms who is participating in routine mammography screening. She tests positive. She is alarmed for obvious reasons and immediately wants to know from you, the doctor, whether she has breast cancer. You can't tell her that without more testing, but you can tell your patient the probability that she has breast cancer given the positive test result and the probability that the result was a false positive. That is, you can calculate Pr(H1|O) and Pr(H2|O), where the first hypothesis is that she has breast cancer and the second hypothesis is that it was a false positive. You need three pieces of information for the calculation:

1. The probability that a 50-year-old woman has breast cancer is around 1%.
2. If a woman has breast cancer, the probability that she tests positive is around 90%.
3. If a woman does not have breast cancer, the probability that she tests positive anyway is around 9%.

Given this data set (which is regularly updated), how should you answer the patient's questions in light of the screening results? Here, again, is Bayes's theorem:

Pr(H|O) = Pr(O|H)Pr(H) / Pr(O)

This theorem can be rewritten in a form that's easier for the task at hand:


Pr(H|O) = Pr(O|H)Pr(H) / [Pr(O|H)Pr(H) + Pr(O|not-H)Pr(not-H)]

Pr(O|H) and Pr(O|not-H) are the probabilities of the observation given a specific hypothesis and given the negation of that hypothesis, much like the alternative and null hypotheses of classical statistics. That is just what we have in this example: the two hypotheses under consideration are that your patient has breast cancer (H1) and that the test was a false positive (H2), which is another way of saying that your patient doesn't have breast cancer. This version of Bayes's theorem simplifies the calculation by eliminating the need to find Pr(O), the overall probability of the observation.

Your patient is looking for Pr(H1|O) and Pr(H2|O), so we'll need to use Bayes's theorem on each hypothesis. To start, for each, we need to find the prior probability of the hypothesis in question and the probability of the observation given the hypothesis. From these numbers, we can calculate the posterior probability of each hypothesis, given the observation of the positive test result.

For the first hypothesis, that your patient has breast cancer, the prior probability, Pr(H1), is the rate of breast cancer in the general population (#1 above). Before the exam, the rational degree of belief in the hypothesis that your patient has breast cancer is just the disease's incidence in the population, so Pr(H1) = .01. The likelihood of the positive test result given the first hypothesis (that is, if it's true that your patient has cancer) is 90% (#2 above). So, Pr(O|H1) = .90.


For the second hypothesis that the test was a false positive, the prior probability, Pr(H2), is the rate in the general population of not having breast cancer, which is 100% of the population minus the 1% that does have breast cancer, or 99%. So, Pr(H2) = .99. The likelihood of the positive test result given the hypothesis of a false positive is 9% (#3 above). So, Pr(O|H2) = .09. Now we can calculate both Pr(H1|O) and Pr(H2|O):

Pr(H1|O) = Pr(O|H1)Pr(H1) / [Pr(O|H1)Pr(H1) + Pr(O|H2)Pr(H2)]
         = (.90 × .01) / [(.90 × .01) + (.09 × .99)]
         = .009 / (.009 + .0891) = .009 / .0981 = .0917

Pr(H2|O) = Pr(O|H2)Pr(H2) / [Pr(O|H2)Pr(H2) + Pr(O|H1)Pr(H1)]
         = (.09 × .99) / [(.09 × .99) + (.90 × .01)]
         = .0891 / (.0891 + .009) = .0891 / .0981 = .908

We are imagining that your patient has just tested positive for breast cancer. We have found that, given this positive result, she has a .0917, or 9.17%, chance of having breast cancer and a .908, or 90.8%, chance of getting a false positive on the test. It's true your patient should be concerned; her chance of breast cancer just increased from 1% to more than 9%. But she shouldn't be as concerned as she no doubt is: there's no guarantee she has breast cancer, and in fact, there's over a 90% chance that she does not.
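This arithmetic is easy to mechanize. The sketch below (my own Python, using the chapter's rounded 1%, 90%, and 9% figures) computes both posteriors with the rewritten form of Bayes's theorem:

```python
def posterior(prior_h, likelihood_h, likelihood_not_h):
    """Pr(H|O) via Bayes's theorem, expanding Pr(O) over H and not-H:
    Pr(O|H)Pr(H) / [Pr(O|H)Pr(H) + Pr(O|not-H)Pr(not-H)]."""
    prior_not_h = 1 - prior_h
    numerator = likelihood_h * prior_h
    evidence = numerator + likelihood_not_h * prior_not_h
    return numerator / evidence

# H1: the patient has breast cancer; H2: the positive result is a false positive.
p_cancer = posterior(prior_h=0.01, likelihood_h=0.90, likelihood_not_h=0.09)
p_false_positive = 1 - p_cancer

assert abs(p_cancer - 0.0917) < 0.0005       # matches the hand calculation
assert abs(p_false_positive - 0.908) < 0.001
```

Because H1 and H2 exhaust the possibilities, the second posterior can simply be taken as one minus the first.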


Problems with Bayesian Inference

In describing Bayesian statistics, we've also gestured at a few of its benefits over classical statistics. But Bayesianism faces its own problems. For one thing, it is often criticized for its apparent lack of objectivity. There's often not enough information to have objective grounds for the prior probability, the probability of the hypothesis before data are gathered. In the mammogram/breast cancer example, we could calculate prior probabilities from well-established facts about the incidence of breast cancer in the general population and the frequency of false positives for mammograms. But such data are unavailable for many hypotheses. Recalling the earlier example, where do Lasha and Janine get their different ideas about the percentage of people convinced by evolutionary theory? Without clear, objective information to guide the selection of prior probabilities, individual biases and subjective values can find their way in. This is a problem because prior probabilities influence posterior probabilities, and so subjective starting points can find their way even into conclusions based on data. This possibility seems to undermine the objectivity of Bayesian reasoning. It is perhaps the main challenge facing Bayesian statistics, and it's received a lot of attention.

Some responses to this challenge about subjectivity have been to develop rules for how prior probabilities should be established. A different kind of response is to argue that the variability in prior probabilities is a good thing. Different people often have different background beliefs, and one might think these different background beliefs should be taken into account. Different choices of priors make it transparent how two scientists' judgments differ. So instead of lurking in the background, with unclear influence on science, different background beliefs and how they influence scientific judgment are brought into explicit consideration by Bayesian statistics. What's more, this transparency in prior beliefs enables rational disagreement. Scientists should be able to provide justification for particular choices of prior probabilities, articulating what sorts of theoretical or empirical considerations informed their choice. In this respect, Bayesian and classical statistics are in similar situations. When testing hypotheses or making general inferences, scientists using classical statistics must decide on sample size, which kind of statistical test to employ, and so forth. These decisions are also open to criticism, and scientists making them should be able to justify them. However, some remain unconvinced by this argument for subjectivity. The choice of prior probabilities in Bayesian statistics is a kind of direct influence of background beliefs on scientists' beliefs about the hypotheses under investigation, which many scientists are uncomfortable with (Gelman & Hennig, 2017). And so far, no rule for how prior probabilities should be established is both broadly applicable and enjoys broad support.

A second problem for Bayesian statistics is that it's not obvious that Bayesian conditionalization, in which one updates one's beliefs in accordance with posterior probabilities, is always the right thing to do. Some have suggested that abductive reasoning, or inference to the best explanation, is a better alternative. Recall from Chapter 4 that when people engage in abductive reasoning, they use explanatory considerations as evidence to support one hypothesis over others. You see cheese crumbs, small droppings, and some chewed-up paper, and so you might reason that a mouse resides in your kitchen. But does that inference follow Bayesian conditionalization? It's not clear it does. The kind of work and reasoning performed by some scientists, such as paleontologists, is akin to CSI-style forensic work.
They gather different pieces of evidence from several fields, and on the basis of that evidence and explanatory considerations, they weed out implausible hypotheses and develop the most plausible hypothesis about the distant past of life on Earth. Bayesian conditionalization may not capture this explanatory leap.

There is no universal method for statistical inference. There are different approaches to classical statistics, an alternative framework of Bayesian statistics, and even different approaches to Bayesian statistics. All of these offer tools that scientists can use in hypothesis-testing, depending on the type of hypothesis to be tested, the type of experiment or observational study that will be run, and the nature of the relevant background knowledge. Bayesian statistics is perhaps a better guide to belief than classical statistics when prior probabilities can be reliably estimated, as with medical diagnoses based on epidemiological studies. The classical statistics method of hypothesis-testing we described earlier in the chapter is perhaps better when there is little background knowledge to draw upon or when scientists are unable to specify multiple alternative hypotheses.

The statistical toolbox is large, with many different tools. It contains many forms of inferential statistics, which we've barely scratched the surface of in this chapter. It also contains descriptive statistics, as introduced in Chapter 5, as well as other tools we haven't discussed. Statistics is not the mindless application of mathematical formulas but careful scientific work influenced by scientists' aims and concerns, just as are other forms of scientific reasoning. And, as we emphasized at the outset of Chapter 5, statistical reasoning is an important form of literacy in today's world.


EXERCISES

6.21 In your own words, describe three problems for the classical statistics approach to hypothesis-testing.

6.22 Write out the mathematical formula for Bayes's theorem, then state what it means in your own words. Write out the definition of conditional probability from Chapter 5. Bayes's theorem can be derived from this definition; describe anything you notice about how the two relate.

6.23 Describe two ways in which Bayes's theorem can be used in inferential statistics, including what you can accomplish with each. Illustrate each of these two main uses of Bayes with a simple example (imaginary or real).

6.24 Suppose that you are being screened for a disease that affects about one person in 1,000. You have no symptoms, and the test is accurate 90% of the time. That is, if you actually have the disease, then the test result is positive with 90% probability, and if you do not actually have it, the test result is negative with 90% probability. After several anxious minutes, the test results come back: positive! How worried should you be?
a. Find the prior probability of the hypothesis that you have the disease, the prior probability of the hypothesis that you don't have the disease, the probability of the test result given the hypothesis that you have the disease, and the probability of the test result given the hypothesis that you don't have the disease.
b. Use Bayes's theorem with these probabilities to calculate your chance of having the disease given your positive test result. Describe how concerned you think you should be in light of your positive test result.
c. Consider that, out of 1,000 people, 100 will test positive. About how many of those people will actually have the disease? Does this consideration change your reasoning in (b)?


6.25 A small company has bought three software packages to solve an accounting problem. These packages are called Fog, Golem, and Pear. On first trials, Fog crashes 10% of the time, Golem 20% of the time, and Pear 30% of the time. Of 10 employees, six are assigned to Fog, three are assigned to Golem, and one is assigned to Pear. Jan was assigned a program at random. It crashed on the first trial. What is the probability that Jan was assigned Pear? You can answer this question by finding the posterior probability of Jan being assigned to Pear given that the program crashed, from the prior probability of Jan being assigned to Pear and the overall chance of one of the three programs crashing.

6.26 Seamus and Amanda have different opinions regarding public support for a smoking ban in restaurants and pubs. Seamus believes that 75% of the people in town support the ban; Amanda thinks that only 50% support the ban. They decide to ask 100 randomly selected people; 65 are in support of the ban, 35 against it.
a. Calculate the Bayes factor. Is Seamus's hypothesis or Amanda's hypothesis more favored by the data?
b. How would the Bayes factor change if a single survey participant changed his or her opinion from a 'yes' to a 'no', resulting in 64 in support of the ban and 36 opposed? Calculate the Bayes factor for this alternative outcome.

Potochnik, Angela, et al. Recipes for Science : An Introduction to Scientific Methods and Reasoning, Taylor & Francis Group, 2018. ProQuest Ebook Central, http://ebookcentral.proquest.com/lib/purdue/detail.action?docID=5584122. Created from purdue on 2021-08-26 19:08:03.

c. Do you find the change in Bayes factor in this alternative scenario surprising? Why or why not?
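The Bayes factor in exercise 6.26 can be computed as the ratio of the two hypotheses' likelihoods for the observed survey result. The sketch below assumes a binomial model, the natural choice for a yes/no survey of 100 randomly selected people:

```python
from math import comb

def binomial_likelihood(p, k, n):
    """P(data | hypothesis): probability of k 'yes' answers among n
    respondents if the true level of support is p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n = 100
for yes in (65, 64):  # the observed result, then the alternative in part (b)
    likelihood_seamus = binomial_likelihood(0.75, yes, n)  # Seamus: 75% support
    likelihood_amanda = binomial_likelihood(0.50, yes, n)  # Amanda: 50% support
    bayes_factor = likelihood_seamus / likelihood_amanda
    print(f"{yes} in favor: Bayes factor (Seamus vs. Amanda) = {bayes_factor:.1f}")
```

With 65 respondents in favor the factor comes out near 8, favoring Seamus; with 64 it drops below 3, so a single changed answer substantially weakens the comparison.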

6.27 Imagine that you are a lawyer with a client who has been accused of committing a heinous crime. Your client's DNA matches some of the traces found on the victim. This is the only piece of evidence against her, but it is a serious one. The court is told that the probability that this match occurred by chance is one in 100,000 (or 0.001%). Do you believe this proves your client is guilty? Why, or why not? (Hint: consider what the numbers mean in terms of frequencies. Out of every 100,000 people, one will show a match. If you live in a city with two million people, for example, how many will have DNA matching the trace on the victim?)

6.28 In your own words, describe (a) three different types of problems for the Bayesian approach to statistics and (b) three different advantages that the Bayesian has over the classical approach to statistical testing.
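Following the hint in exercise 6.27, the frequency reasoning takes only a few lines. The two-million population is the illustrative city size suggested by the hint:

```python
match_probability = 1 / 100_000  # chance that a random person's DNA matches
population = 2_000_000           # illustrative city size from the hint

expected_matches = population * match_probability
print(f"Expected number of people whose DNA matches: {expected_matches:.0f}")  # 20

# If the DNA match is the only evidence, the client is just one of about
# 20 matching people, so on this evidence alone:
p_guilty_given_match = 1 / expected_matches
print(f"P(guilty | match) = {p_guilty_given_match:.2f}")  # 0.05
```

The one-in-100,000 figure describes the chance of a match for a random innocent person, not the chance that a matching person is innocent; conflating the two is sometimes called the prosecutor's fallacy.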


FURTHER READING

For a historically informed treatment of different approaches to statistical inference and the relationships among them, see Gigerenzer, G. (1993). The superego, the ego, and the id in statistical reasoning. In G. Keren & C. Lewis (eds.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 313–339). Hillsdale: Erlbaum.

For an example from economics of the difference between statistical significance and scientific significance, see McCloskey, D., & Ziliak, S. (1996). The standard error of regressions. Journal of Economic Literature, 34, 97–114.

For more on Bayesianism, see Howson, C., & Urbach, P. (2006). Scientific reasoning: The Bayesian approach (3rd ed.). La Salle: Open Court. For a more concise overview, see Hartmann, S., & Sprenger, J. (2010). Bayesian epistemology. In S. Bernecker and D. Pritchard (eds.), Routledge companion to epistemology (pp. 609–620). London: Routledge.

For more on the classical approach to statistical inference and a vigorous critique of the subjective Bayesian approach, see Mayo, D. G. (1996). Error and the growth of experimental knowledge. Chicago: University of Chicago Press.

For an accessible treatment of an approach to statistics focused on the notions of effect sizes, confidence intervals, and meta-analysis, see Cumming, G. (2013). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York: Routledge.


CHAPTER 7

Causal Reasoning

7.1 WHAT IS CAUSATION?

After reading this section, you should be able to do the following:

• Describe the difficulty about causal claims that worried David Hume
• Give three reasons why correlation and probabilistic dependence don't guarantee causation
• Describe the physical process and difference-making accounts of causation
• Indicate how each of the following informs the investigation of causal relationships: spatiotemporal contiguity, correlation, probabilistic dependence, causal background
• Analyze whether a cause is necessary, sufficient, or probabilistically related to an effect, and gauge the strength of a probabilistic causal relationship


Does Fracking Cause Earthquakes?

Hydraulic fracturing, or 'fracking', is used across North America and in several other places in the world to unearth natural energy resources. Fracking involves drilling into the earth, and then using high-pressure injections of sand and water treated with hydrochloric acid and other chemicals to break up shale formations. This produces small underground explosions, which result in the release of oil or gas for corporate capture.

Over the last decade, concerns about fracking have increased. In part, this is because fracking is correlated with increased seismic activity, as well as environmental contamination, habitat loss, and damage to surrounding surface structures. In the US, for instance, Oklahoma has become infamous for a sudden increase in earthquakes thought to be linked to fracking. From 1978 to 1999, Oklahoma averaged approximately one earthquake of at least 3.0 magnitude per year. By 2009, that average had surged to over 20 such earthquakes per year—20 times the past rate. Since 2009, Oklahoma has experienced approximately 2,300 earthquakes of 3.0 or greater magnitude. It now has earthquakes more frequently than Pacific Ring of Fire states like California, historically known for their earthquakes.

These dramatic numbers, shown in Figure 7.1, provide sufficient reason to rule out the possibility that this is merely an unlucky coincidence. Statistical hypothesis-testing would undoubtedly show the increase in seismic activity in Oklahoma to be due to something other than chance. But is the increase because of fracking or something else? Does fracking cause earthquakes?


FIGURE 7.1 Annual seismic activity in Oklahoma 1978–2017

Sources: USGS-NEIC ComCat & Oklahoma Geological Survey; preliminary as of July 4, 2017

FIGURE 7.2 USGS map showing locations of wells related to seismic activity 2014–2015




The answer may seem obvious. How could there be such a dramatic rise in earthquakes in Oklahoma if fracking weren't the cause of them? Lobbyists and other advocates for the US oil and gas industry are quick to remind everyone that correlation does not guarantee causation. Just because fracking has increased and seismic activity has increased—that is, just because these are correlated—doesn't mean that one caused the other. Some other unknown cause might be responsible for the earthquakes.

However, while not all correlated types of events are causally related, correlation does raise the question and can even be the proverbial 'smoking gun' for causation. We need to look more closely to know whether fracking causes earthquakes or if there is an alternative explanation for the correlation. And, in fact, the answer is a bit subtle. There is evidence of some type of relationship between fracking in particular and the uptick in seismic activity in Oklahoma. But how are they related?

Since 2009, most of the Oklahoma earthquakes have been located very close to fracking wells, which pump massive volumes of liquid up to the surface. The spatial correlation of wells and earthquakes adds some support for the idea that fracking is, in some sense, involved in the rising numbers of earthquakes. However, geologists, hydrologists, and the other scientists involved in the US Geological Survey—a federal agency devoted to the scientific study of the American landscape and the natural hazards that can threaten it—have concluded that earthquakes resulting directly from fracking tend to be relatively minor in Oklahoma. So fracking operations are highly correlated with the dramatic increase in seismic activity in Oklahoma, but they are not directly responsible for causing many of those earthquakes. Instead, wastewater injection from both fracking and non-fracking wells appears to be more directly responsible for the increase in earthquakes in Oklahoma.

During hydraulic fracturing, some of what's pumped up is oil, while some is the byproduct of fracking: salty, sandy, chemically treated wastewater. After capturing oil and gas, corporations inject large volumes of this wastewater back into disposal wells. Doing this raises the pressure within the pores of a hydrocarbon reservoir over large areas, which tends to shift subterranean stress. And shifting stress tends to destabilize preexisting faults. This subterranean stress from wastewater injections back into the Earth's sedimentary formations has been implicated as one cause of seismic activity.

The US Geological Survey results identifying wastewater injection as the primary cause of the significant uptick in Oklahoma earthquakes are fairly conclusive. Of course, energy industry operations involved in fracking are still the culprit. Even if fracking does not directly cause earthquakes, it is not causally unrelated either. Cease all fracking activity, and the volume of wastewater injected back into the earth will significantly diminish; so too will the risk of earthquakes. So while not solely to blame, fracking is one of several oil and gas operations that are together causing increased seismic activity.

Over the last decade, Canada has seen a similar increase in seismic activity, which has also been tightly correlated in time and space with fracking. Researchers documented more than 900 seismic events near shale drilling sites in northwest Alberta and observed a pattern between the timing of fracking operations and the timing of earthquakes (Bao & Eaton, 2016). In this case, scientists found that both the increase in pressure during fracking operations and the increase in pressure from wastewater injection induced seismic activity. The triggers for induced seismicity in Alberta may be different from those in Oklahoma.



Thus, existing evidence indicates that fracking does play some causal role in producing seismic activity but that the role it plays may not be simple. Fracking's causal role may be modulated by other factors, such as the local geology of Alberta versus Oklahoma, and the extent to which it plays that causal role may be changed by other causes of earthquakes, like wastewater injections back into the earth. It is an important task for seismologists, geologists, and hydrologists to clarify the complex web of causal relationships that lead to earthquakes. More generally, unravelling the causes underlying complex phenomena like polio or cholera epidemics, global warming, or economic crises is usually a tricky process; scientific investigation is our best hope for doing so.


Scientific Reasoning about Causes

Scientific investigation of the causal consequences of fracking illustrates three general characteristics of causal reasoning in science and in everyday situations.

First, causal relationships are learned on the basis of information about the timing, location, and frequency of events. We described how scientists considered the timing, location, and frequency of seismic activity to help discern fracking's role. Correlation in time or in location can be suggestive of a causal relationship. If you get sleepy after lunch, perhaps eating lunch is the cause of your drowsiness. If you get a rash where your arm brushed against an unknown plant in your backyard, perhaps contact with that plant caused the rash. Similarly, an increase or decrease in the frequency of some type of event—like earthquakes—draws our attention to what else changed during that time. This is also a form of correlation—correlation in the frequency of two outcomes. (Recall our discussion of statistical correlation in Chapter 5; two variables or events are correlated when higher values of one are related to either higher or lower values of the other.)

Second, testing causal hypotheses often involves doing something in the world, such as performing an intervention. Varying a suspected causal factor while leaving other factors unchanged can provide more insight into causal relationships than just observing a correlation in the frequency of two outcomes. For instance, scientists could test the hypothesis that wastewater injection is the main trigger of earthquakes by performing fracking operations but not injecting the wastewater into disposal wells. If seismic activity is significantly reduced, this suggests wastewater injection is largely to blame for the earthquakes in that region. If not, other possible causes, including fracking itself, should be considered.

Third, causal reasoning has great practical significance. Knowing about causes is how we can make things happen—and prevent things from happening—in the world. Besides the effects of fracking on seismic activity and other features of our environment, causal reasoning is also crucial for inferring the effects of economic policies like tax rates, for inferring medical conditions from symptoms, and for establishing legal responsibility or liability, among many other things. Good causal reasoning thus can be an urgent matter of scientific and practical importance.

It's no wonder, then, that causal reasoning is a central feature of science. This chapter explores how the scientific tools encountered thus far in this book—especially experimentation and observational studies, modeling, and inference using logic, probability, and statistics—are used to identify causal relationships. This chapter thus refers to several ideas from earlier chapters that are helpful to clarify what's involved in good causal reasoning.



Skepticism about Causation

Imagine the following scenario. You are playing a game of billiards at your local pool hall. You hit the cue ball, which then rolls across the felt and strikes the 8-ball, which is itself then set in motion. Did the cue ball cause the 8-ball to move? The answer seems obvious. What else could have possibly made the 8-ball move?

There's a worrisome issue here though—one that has some significance for science. The Scottish philosopher David Hume (1711–1776) challenged us to examine what our experience allows us to know about the nature of causal relationships. Hume argued that experience doesn't tell us much. When your cue ball hit the 8-ball, you only saw a series of events, one after another. You saw the cue ball moving toward the 8-ball, the cue ball touching the 8-ball, and then the 8-ball itself moving. Where was the causation in all of that? What makes you think there is any ingredient above and beyond just a series of events?

Hume agreed that we regularly experience a constant association between two events and their correlation in space and time. Cue balls hit 8-balls, earthquakes follow fracking operations, you experience drowsiness after lunch, and so on. Hume would not have denied that there is a regular association between such events. What he doubted was that there is anything more beyond just those events occurring together in a certain order; all you really discern when you perceive causation is constant association. Hume doubted that there was anything special to call causation (Hume, 1738).

We're not going to address Hume's concern head-on. Instead, we are going to use this as a challenge for us to get very clear about the specifics of causal reasoning. Causal claims are important to science, and they have real practical significance as well. But it's tricky to say what, exactly, causation amounts to. What is it, exactly, that you would look for to decide whether smoking causes cancer? There are smokers who are in excellent health, and there are people with cancer who have never smoked. And even when smokers develop cancer, how are we to say smoking was responsible for their cancer?


Spatiotemporal Contiguity as a Guide to Causation

The perception of causal relationships is a robust, automatic, and often reliable process. It was systematically investigated by the Belgian psychologist Albert Michotte (1881–1965) in the 1940s. Michotte's experiments showed that it's very hard not to perceive certain sequences of events as involving causation (Michotte, 1962). These experiments also show that causal perception depends on spatial and temporal information. If two events—for example, pressing a piano key and hearing B sharp—are spatially and temporally contiguous, that is, if they happen at the same time and place, then we perceive them as causally related without requiring repeated exposure to those events. When there is a spatial or a temporal gap between two events, we are much less likely to perceive the one event as causing the other.

Although spatiotemporal cues can be an important element of the perception of causal relations, they are not always a reliable guide. Sometimes they mislead us. It can be mistaken to conclude that one event causes another simply because the events occur in succession close to each other. A child in Oklahoma might stamp her foot right before an earthquake, but we know the stamp couldn't have caused the quake. The mistake of



reasoning from spatiotemporal succession to causation is named, from Latin, the post hoc, ergo propter hoc fallacy ('after this, therefore because of this'). So, spatiotemporal contiguity doesn't guarantee causation.

It's not necessary for causation either. Many causes are separated from their effects in time and even space. For instance, when you hang out with a friend who has the flu, you may begin to feel ill a few days later. In this case, your friend's flu caused your own illness, despite an intervening delay. And when you play a video game, pressing buttons on a remote control causes changes in the game, even though the two events happen in different places. Indeed, many of the cause-effect relationships investigated in science, and important for everyday life, are spatiotemporally separate to some degree.

Sometimes the degree of temporal separation is used to distinguish among the causes of an event. Proximate causes are those that occurred more closely in time and place to the event that was caused, while distal causes occurred further back in time or place from their effects. For example, when asked about the cause of your illness, you may cite your friend's recent case of the flu. Or you might instead reply that we're in the midst of flu season, and this year's seasonal flu has spread extensively. The former cause is proximate, the latter distal.

As the fracking example illustrates, identifying a cause of some event doesn't imply that you have identified the cause or have ruled out other causes. The distinction between proximate and distal causes shows that any event may have multiple causes. One way to think about this is in terms of 'chains' of causation—like neural firings that contracted the muscles, that moved the hand, that pushed the cue stick, that hit the cue ball, that hit the 8-ball into the corner pocket. Such causal chains go back and back and back. Another way to think about the multiple causes of some event is in terms of complex webs and networks: all the different factors that contributed to bringing about some outcome. The 8-ball's moving was caused not only by my cue ball hitting it, but also by my choosing to go to the pool hall and picking up the cue stick, the 8-ball resting where it in fact was, the cue stick being chalked by the previous player, the billiard cloth having a certain smoothness, and so on. Whether you think of causal relationships in terms of chains or in terms of contributing factors, it seems clear that causal relationships are everywhere.


Correlation as a Guide to Causation

Besides spatiotemporal cues, we also tend to use information about correlation between events to discern causal relations. Correlation is a measure of the association between two variables. If two variables are correlated, then they are not statistically independent; the variation in their values shows some trend. If the values of two variables are correlated, then we may wonder if one causes the other. For example, imagine you have always observed that whenever the price of a beer at your local pub is $5, there are fewer customers than when the price is $3. This is a correlation. You may wonder, based on this, whether the increased price of the beer decreases demand for beer. This is a causal claim. You may think this is so even if the timing doesn't match; maybe customers only start to trail off a while after the price of beer has gone up.

While correlation is a guide to causation, it's an imperfect guide. Correlation can exist when causation does not. For one thing, correlation is symmetric: if an event A correlates with another event B, then B correlates with A as well. But causation isn't symmetric. Having cancer correlates with death, and death correlates with having cancer,



but cancer causes death and not the other way around. In other cases, neither correlating event causes the other, but they share a common cause—a third event that causes both. Ice-cream consumption and homicide rates are famously correlated, but eating ice cream is not a cause of murder, nor does committing murder cause ice-cream eating. Instead, there is some evidence that hot days increase both ice-cream consumption and homicide rates.

There are also spurious correlations, where two types of events happen to be correlated but are not related in any interesting way, causally or otherwise. For example, from 2000 to 2009, data from the US Dairy Association regarding per capita cheese consumption and data from the Centers for Disease Control regarding the numbers of people who died by becoming tangled in their bedsheets were highly correlated (see Figure 7.3), but obviously there's no causal relationship connecting these variables.

Causal relations between events can also exist even when they don't seem to be correlated. Philosopher Nancy Cartwright has suggested the following example in which counterbalanced causal relationships cancel each other out. Smoking cigarettes is well established as a cause of heart disease. It's also the case that adequate exercise prevents heart disease. If, for whatever reason, smoking is strongly correlated with exercise, then a well-established cause of heart disease will also be strongly correlated with its prevention, and smoking and heart disease will not generally correlate. But, smoking would remain a cause of heart disease (Cartwright, 1989).

Here's another example. Pregnancy is a cause of thrombosis, which involves blood clots forming inside blood vessels. Since taking contraceptive pills reduces the chance of pregnancy, one might hope that taking contraceptive pills indirectly prevents thrombosis. However, taking contraceptive pills is also a cause of thrombosis. So contraceptive pills prevent thrombosis by reducing the chance of pregnancy, while also causing thrombosis. If these opposed influences exactly cancelled each other out, then thrombosis and taking contraceptive pills would not exhibit a statistical correlation even though the two events are related causally (Hesslow, 1976).
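Cartwright's scenario can be made concrete with a small simulation. All numbers below are invented for illustration: we stipulate that smoking adds 0.2 to heart-disease risk, exercise subtracts 0.2, and smoking happens to be strongly correlated with exercising. The marginal disease rates for smokers and non-smokers then come out nearly equal, even though smoking's causal effect is present throughout and reappears once exercise is held fixed.

```python
import random

random.seed(1)

n = 200_000
smokes = [random.random() < 0.5 for _ in range(n)]
# Stipulated correlation: smokers exercise 95% of the time, non-smokers 5%.
exercises = [random.random() < (0.95 if s else 0.05) for s in smokes]

def risk(s, e):
    # Invented risks: baseline 0.3, smoking +0.2, exercise -0.2
    return 0.3 + (0.2 if s else 0.0) - (0.2 if e else 0.0)

disease = [random.random() < risk(s, e) for s, e in zip(smokes, exercises)]

def rate(mask):
    """Disease rate among the individuals selected by mask."""
    selected = [d for d, m in zip(disease, mask) if m]
    return sum(selected) / len(selected)

# Marginally, smokers and non-smokers barely differ:
p_smoker = rate(smokes)
p_nonsmoker = rate([not s for s in smokes])
print(f"P(disease | smoker)     = {p_smoker:.2f}")     # about 0.31
print(f"P(disease | non-smoker) = {p_nonsmoker:.2f}")  # about 0.29

# Holding exercise fixed reveals smoking's 0.2 causal effect:
p_s_ex = rate([s and e for s, e in zip(smokes, exercises)])
p_ns_ex = rate([(not s) and e for s, e in zip(smokes, exercises)])
print(f"P(disease | smoker, exerciser)     = {p_s_ex:.2f}")   # about 0.30
print(f"P(disease | non-smoker, exerciser) = {p_ns_ex:.2f}")  # about 0.10
```

The marginal gap (about 0.02) is far smaller than smoking's stipulated effect (0.2), which is why an unwary look at the raw correlation would miss the causal relationship.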

FIGURE 7.3 Visualization of the correlation between per capita consumption of cheese and the number of people who died by becoming tangled in their bedsheets, 2000–2009

Source: tylervigen.com; reproduced under Creative Commons.



So, while correlation is a guide to causation, causation doesn’t just boil down to correlation. There must be something more to causation, Hume’s skepticism notwithstanding.
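A brief simulation can also make the common-cause pattern from this section concrete. The numbers below are invented: a 'temperature' variable drives both 'ice cream' and 'homicides', which never influence one another, yet the two come out strongly and symmetrically correlated.

```python
import random

random.seed(42)

n_days = 10_000
temperature = [random.gauss(20, 8) for _ in range(n_days)]
# Each variable responds to temperature plus its own independent noise;
# neither variable responds to the other.
ice_cream = [t + random.gauss(0, 4) for t in temperature]
homicides = [0.5 * t + random.gauss(0, 4) for t in temperature]

def correlation(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

r = correlation(ice_cream, homicides)
print(f"corr(ice cream, homicides) = {r:.2f}")  # strongly positive

# Correlation is symmetric, even though no causal arrow runs either way:
assert correlation(homicides, ice_cream) == r
```

Removing the shared dependence on temperature (for example, by comparing days with the same temperature) would make the correlation between the two outcome variables vanish.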

The Nature of Causation: Difference-Making and Physical Processes

Here are two ideas about what causal relationships are, beyond the mere correlation of types of events. One idea is that causal relationships are, at root, relationships of difference-making. Put simply, if the occurrence of one event makes a difference to the occurrence of a second event, then the first event is a cause of the second event. If the billiard ball had not struck the 8-ball, then the 8-ball wouldn't have moved. If the billiard ball had struck the 8-ball in a different place or at a different speed, then the 8-ball would have moved in a different direction or at a different speed. The billiard ball's motion made a difference to the 8-ball's motion. Thus, the billiard ball's motion caused the 8-ball to move. This difference-making relationship is something beyond the mere correlation of events.

A second idea about what causal relationships are, beyond the mere correlation of events, is that of a physical process. On this view, causation occurs when there is a continuous physical process connecting a cause to its effect, such as the transfer of energy. When the billiard ball knocked into the 8-ball, some of its kinetic energy transferred to the 8-ball, which is why the 8-ball started moving. This is a physical process connecting the billiard ball's motion to the 8-ball's motion. Thus, the billiard ball's motion is a cause of the 8-ball's motion. This physical process is something beyond the mere correlation of events.


Box 7.1 Counterfactual Statements and Difference-Making

According to the difference-making account of causation, causes are those factors that make a difference to whether an effect happens or not. The idea of difference-making can be made more precise with the help of counterfactual conditionals. Recall that a conditional is any 'if/then' statement. Counterfactual conditionals, statements like 'if you had scored three goals, your team would have won the game', have the form:

If it were the case that C, then it would be the case that E.

These are called counterfactuals because the antecedent of the conditional is contrary, or counter, to fact. For material conditionals, truth or falsity is simply determined by the truth or falsity of the antecedent and consequent. This isn't so for counterfactual conditionals. You didn't score three goals, but that doesn't necessarily make it true that you would have won if you had.

On a counterfactual approach to difference-making, to identify causes, you should check that the following two counterfactual conditionals are true:

(I) If C had occurred, then E would have (probably) occurred.
(II) If C had not occurred, then E would not have (probably) occurred.
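The counterfactual test in (I) and (II) can be mimicked in code if we are willing to assume a toy structural model of the situation. The model below (a cue-ball strike, plus a possible table bump, either of which would move the 8-ball) is invented purely for illustration:

```python
# A toy structural model: the 8-ball moves if the cue ball strikes it
# or if something bumps the table. Both the model and the scenario are
# invented to illustrate the counterfactual test from Box 7.1.
def eight_ball_moves(cue_strikes, table_bumped):
    return cue_strikes or table_bumped

# Actual background: nothing bumped the table.
background = {"table_bumped": False}

# (I)  If C had occurred, E would have occurred:
holds_i = eight_ball_moves(cue_strikes=True, **background)
# (II) If C had not occurred, E would not have occurred:
holds_ii = not eight_ball_moves(cue_strikes=False, **background)

print(f"Strike is a difference-maker: {holds_i and holds_ii}")  # True

# Against a different background (someone did bump the table), (II) fails,
# so the strike no longer makes a difference to whether the ball moves:
print(not eight_ball_moves(cue_strikes=False, table_bumped=True))  # False
```

Notice that the verdict depends on the background conditions held fixed while C is varied, a point the chapter returns to under the heading of the causal background.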



Physical process and difference-making accounts of causation may be compatible. Perhaps both physical processes and the ability to make a difference distinguish causal relationships from mere correlations. Perhaps, in the billiard ball case, the billiard ball's motion makes a difference to the 8-ball's motion precisely because of the transfer of kinetic energy from one to the other. However, some philosophers think that one of these, and not the other, is the right account of causation. Others think that causation might include numerous different kinds of relationships, including both of these and perhaps others as well.

Each of these accounts is more useful for thinking about causal reasoning in different circumstances. For some causal claims, physical processes are difficult to track. Imagine you want to investigate the causal influence of the average values of homes on the stock market. How would you even start thinking about energy transfer or other physical processes based on the average value of homes? In contrast, it's clear how to think about changes in average home value and whether those changes do or don't make a difference to the stock market.

For other causal claims, the idea of difference-making doesn't apply very well. The moon orbits around the sun because of the curvature of space-time. (If you want, you can just think about this in terms of gravity.) How would you start thinking about space-time having a different curvature? (Or gravity not existing?) It's a bit confusing; this seems like an ingredient of reality that can't be changed. And without that, you can't very well assess whether such a change would be a difference-maker.

To sum up, difference-making and physical processes offer two ways to think about causation that go beyond mere correlation. These might be compatible accounts of causation, or one or the other might be better, or they each might be right in some circumstances.
Regardless, we think these are both helpful ways to think about the nature of causation.


Necessary and Sufficient Causes

Sometimes a cause is, by itself, enough to bring about an effect. To say 'electrocution causes death' is to say that electrocution is enough to cause death, although there are other ways of dying. Likewise, your reasoning about the increased price of beer driving down demand for beer (if accurate) cites one cause that suffices to bring about the effect. Increased price is one way to ensure fewer customers, but other things—like a snowstorm or holiday closure—might also lead to fewer customers. These are sufficient causes: the causal condition is enough to bring about the presumed effect, but that effect might sometimes occur because of some other cause. If the occurrence of a cause doesn't guarantee the occurrence of the effect, then the cause is not a sufficient cause.

Some causes are needed for an effect to occur but may not by themselves guarantee the effect. To say 'oxygen causes combustion' is to say that combustion never occurs without oxygen present, although oxygen is often present in the air without causing fires. This is a necessary cause: the causal condition must be present for the effect to occur, but the cause might sometimes occur without bringing about the effect. If the occurrence of a cause isn't required for the occurrence of the effect, then the cause is not a necessary cause.

So, sufficient causes guarantee their effects, while necessary causes are required for their effects. This should bring to mind the discussion of necessary and sufficient conditions from Chapter 4. It can be useful to keep in mind the difference between necessary and sufficient causes. Knowledge of sufficient causes empowers us to bring about desired effects.

Potochnik, Angela, et al. Recipes for Science : An Introduction to Scientific Methods and Reasoning, Taylor & Francis Group, 2018. ProQuest Ebook Central, http://ebookcentral.proquest.com/lib/purdue/detail.action?docID=5584122. Created from purdue on 2021-08-26 19:08:17.

Causal Reasoning


If we introduce the causes that are sufficient to bring about an effect, we’re guaranteed that effect will occur. To have healthy teeth, for example, it’s ordinarily sufficient to brush, floss, and visit the dentist regularly. Knowledge of necessary causes enables us to prevent some effects from happening. If we remove just one necessary cause, this will eliminate the effect. For example, spaying or neutering one’s pets prevents unwanted kittens or puppies, regardless of what other conditions occur. This is because intact reproductive systems are necessary for reproduction. And abstaining from excessive drinking prevents hangovers and liver cirrhosis, because significant alcohol consumption is necessary for both of these health conditions.

Although there’s a useful distinction between necessary and sufficient causes, matters are often not so simple. For many putative necessary conditions, alternative causes can be found. For example, having sex is usually necessary for sexual reproduction, but it isn’t always; in vitro fertilization is an alternative. Likewise, for many putative sufficient causes, exceptions can be found, when the cause doesn’t bring about the effect as expected. Raising the price of goods, like beer, does not always decrease demand. Sometimes, instead, demand is sustained by institutions that regulate the market.

The exceptions to sufficient and necessary causal relationships hint at the importance of background conditions for causal relationships, what we might call the causal background. The causal background of two events comprises all the other factors that actually do, or in principle might, causally influence those two events, thereby also potentially affecting the causal relationship between them. Oftentimes the causal background is ignored when causal claims are made, but it’s actually crucial for causal relationships to occur as expected.

Revisiting a couple of our previous examples shows that causes only count as sufficient or necessary assuming a given causal background. Brushing, flossing, and visiting the dentist regularly is sufficient to ensure healthy teeth only if your dentist is qualified and (say) you haven’t already had all your teeth removed. And spaying or neutering one’s pets prevents unwanted kittens and puppies only because intact reproductive systems are necessary for reproduction—and only if in vitro fertilization isn’t employed and no stray kittens or puppies show up at your house.
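To make the distinction concrete, here is a small Python sketch of how sufficiency and necessity could be checked against a list of observed cases. The helper functions and the data are our own invention for illustration, not part of the text:

```python
# Classifying a candidate cause as sufficient and/or necessary,
# given observed cases. Each case is a pair (cause_present, effect_present).

def is_sufficient(cases):
    """Sufficient: whenever the cause occurs, the effect also occurs."""
    return all(effect for cause, effect in cases if cause)

def is_necessary(cases):
    """Necessary: the effect never occurs without the cause."""
    return all(cause for cause, effect in cases if effect)

# Hypothetical oxygen/combustion observations: fire never occurs without
# oxygen, but oxygen is often present without fire.
oxygen_cases = [(True, True), (True, False), (True, False), (False, False)]

print(is_necessary(oxygen_cases))   # oxygen looks necessary here
print(is_sufficient(oxygen_cases))  # but not sufficient
```

Of course, such a check can only classify a cause relative to the cases observed, which is one way of seeing why the causal background matters: a different set of observations could overturn either verdict.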


Causation and Probability

This discussion of causal background suggests that causal relationships are seldom straightforward guarantees. They depend on the causal background, often in subtle ways. Some causal relationships may even have exceptions within a given causal background.

Consider again the example of fracking causing earthquakes. If this is true in Alberta but not in Oklahoma, this may well be because of different causal backgrounds in those two locations, perhaps having to do with geological features. If fracking causes an earthquake at one site in Alberta but not at another, is this also due to different causal backgrounds in those two locations, or is it pure chance? Occurrences of a cause do not always lead to occurrences of its effect, either because causation itself is probabilistic or because causal backgrounds vary.

Here’s another example. There are people who smoked two packs of cigarettes a day without ever getting cancer, even though smoking does cause cancer. Is this because smoking causes cancer probabilistically or because some feature of the causal background prevents some people from getting cancer? This is a matter of debate.


A factor that increases the likelihood of an event occurring despite being neither necessary nor sufficient for the effect is called a contributing cause or partial cause. Contributing causes are much more common than truly necessary or sufficient causes. For this reason, it is useful to think about causation probabilistically. Usually, a cause raises the probability of its effect. This idea can be formalized in terms of conditional probabilities, which we discussed in Chapter 5. For a cause C and an effect E,

Pr(E|C) > Pr(E|not-C)

The effect is (usually) more likely to occur if the cause occurs than if the cause doesn’t occur. This idea is deeply related to correlation as a guide to causation.

The probabilistic relationship that generally holds between causes and their effects can also be exploited beyond the observation of correlations. Recall the difference-making account of causation. If researchers bring about some event and observe a resulting increase in the frequency of a different event, this is some evidence that the first causes the second. The evidence is even better if the intervention is carried out with extraneous variables controlled, directly or indirectly. This enables the causal background to be held fixed or to vary randomly, leaving the intervention on the suspected cause as the only difference between the circumstances in which the suspected effect does and doesn’t occur. This relates deeply to our discussion of experimental design in Chapter 2. If you suspect that playing video games causes violent behavior, you might ask one group of people to play several hours of video games and another group of people to do something else, like read books, and then query them about their moods and dispositions afterward. If more video game players are agitated or aggressive or disposed to act violently, this may point at the video games as the culprit: the cause of violent behavior.

Thinking about causation in terms of conditional probabilities also provides a way to define the strength of a causal relationship. If Pr(E|C) = 1 and Pr(E|not-C) = 0, then the cause is both necessary and sufficient for the effect, in any causal background(s) where this is true. When the cause occurs, so does the effect; when the cause is absent, so is the effect. For probabilistic causal relationships, the stronger they are, the closer they will be to this ideal. You can judge the strength of a causal relationship with the following calculation:


Strength = Pr(E|C) − Pr(E|not-C)

Notice that a necessary and sufficient cause will result in the maximum value of 1. If, at the other extreme, there is no difference in the probability of E whether C is present or absent (holding fixed the causal background), then the occurrence of C is causally irrelevant to the occurrence of E. For the video gaming and violence example, this would correspond to the finding that the experimental and control groups do not differ in their levels of violent behavior. The strength of most causal relationships is somewhere in between the two extremes of perfect guarantee and irrelevance.

We have already discussed that causation doesn’t just boil down to correlation. The same goes for probabilistic dependence. Changes in causal backgrounds can interfere with probabilistic dependence. Beyond this, probabilistic dependence may change in different causal backgrounds and only hold in some causal backgrounds. Smoking may not raise
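The strength calculation can be sketched in Python by estimating the two conditional probabilities from observation data. The function and the numbers below are invented for illustration:

```python
# Estimating Strength = Pr(E|C) - Pr(E|not-C) from hypothetical data.

def causal_strength(cases):
    """cases: list of (cause_present, effect_present) observations."""
    with_c = [effect for cause, effect in cases if cause]
    without_c = [effect for cause, effect in cases if not cause]
    p_e_given_c = sum(with_c) / len(with_c)          # Pr(E|C)
    p_e_given_not_c = sum(without_c) / len(without_c)  # Pr(E|not-C)
    return p_e_given_c - p_e_given_not_c

# A necessary-and-sufficient cause: the effect occurs exactly when the
# cause does.
perfect = [(True, True)] * 5 + [(False, False)] * 5
print(causal_strength(perfect))  # → 1.0

# A probabilistic cause: effect in 8 of 10 cause-cases, 2 of 10 others.
prob = [(True, True)] * 8 + [(True, False)] * 2 \
     + [(False, True)] * 2 + [(False, False)] * 8
print(causal_strength(prob))  # ≈ 0.6
```

A strength near 0 would instead indicate that the candidate cause is (in this causal background) irrelevant to the effect.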


the probability of someone getting heart disease if the person also starts a serious exercise regime at the same time. And smoking does not raise the probability of someone getting cancer if the person already has cancer. Also, probabilistic dependence, like correlation, doesn’t distinguish among causes, effects, and events that are correlated but not causally related. All of these are reasons why causation isn’t just probabilistic dependence.

These are also reasons for ensuring good experimental design when looking for probabilistic dependence. Intervention is a way of isolating the suspected cause, which avoids mistaking an effect for a cause, or mistaking events that merely share a common cause for cause and effect. And having a control group is a way of controlling for the influence of the causal background. These steps enable researchers to determine which events truly make a difference to the occurrence of other events.


Box 7.2 Simpson’s Paradox

This paradox has nothing to do with Homer Simpson’s toast, ‘To alcohol! The cause of, and solution to, all of life’s problems’. Rather, it concerns how an aggregate statistical trend can differ from the individual trends that comprise it.

In the 1970s, the University of California, Berkeley was one of the first universities to be sued for sexual discrimination against women who had applied for admission to graduate school. In the fall of 1973, 12,763 people applied for admission; 44% of the men were admitted but only 35% of the women. There was a positive correlation between being a woman and being rejected from Berkeley’s graduate school in 1973. That is, Pr(rejected|woman) > Pr(rejected|man). For this reason, suspicions were raised of sexual discrimination.

But when admission rates to individual programs were examined, the correlation between being a woman and being rejected vanished. It wasn’t the case, for any given program, that Pr(rejected|woman) > Pr(rejected|man). So women were less likely overall to be admitted to graduate school at Berkeley, yet they were not less likely to be admitted to any individual program. How could that be? It turns out that during that year, more women applied to competitive programs with low admission rates, whereas more men applied to less competitive programs with higher admission rates. The positive correlation between rejection and being a woman was thus due not to gender itself but to a correlation between gender and the competitiveness of the program applied to.

This is an instance of Simpson’s paradox, described in 1951 by the British statistician Edward Simpson. Simpson’s paradox demonstrates the importance of considering the causal background. A correlation between two types of events can disappear, or be reversed, when data are grouped in a different way, because different groupings take into account different factors in the causal background (here: the competitiveness of different graduate programs).
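The reversal described in this box can be reproduced with a short Python sketch. The numbers below are made up for illustration; they are not the actual Berkeley figures:

```python
# Simpson's paradox with invented admission numbers.
# Each entry is (applicants, admitted).

data = {
    "easy program": {"men": (80, 48), "women": (20, 14)},   # 60% vs 70%
    "hard program": {"men": (20, 4),  "women": (80, 24)},   # 20% vs 30%
}

def rate(pairs):
    """Pooled admission rate over a list of (applicants, admitted) pairs."""
    applied = sum(a for a, _ in pairs)
    admitted = sum(x for _, x in pairs)
    return admitted / applied

for group in ("men", "women"):
    overall = rate([data[program][group] for program in data])
    print(group, round(overall, 2))  # men 0.52, women 0.38

# Women are admitted at a higher rate within *each* program, yet at a
# lower rate overall, because more women applied to the hard program.
```

Grouping by program holds fixed a factor in the causal background (program competitiveness); pooling across programs lets that factor drive the aggregate correlation.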


EXERCISES

7.1 Describe Hume’s worry about causal reasoning in your own words. Evaluate the merits of his concern, taking into account the main points of discussion throughout this section.

7.2 Define correlation, and give three examples of events that you believe are correlated.

7.3 Describe how correlation and probabilistic dependence relate to causation. Give an example of a causal relationship that results in straightforward correlation, an example of a causal relationship that in some contexts does not seem to result in correlation, and an example of a correlation that is not due to a causal relationship.

7.4 Describe each of the following scenarios as a causal claim put in terms of difference-making, and then as a causal claim put in terms of chains of physical processes. You might need to invent some details about these causal relationships to give a thorough answer—feel free to get creative.
a. The high tide washing ocean debris up to a certain point on the beach
b. Your pickup basketball team winning its game yesterday
c. Smoking causing lung cancer

7.5 What do you think the advantages are to thinking about causation in terms of difference-making? How about the disadvantages? What are the advantages and disadvantages of thinking about causation in terms of physical processes?

7.6 Describe what each of the following is and how each informs, or is taken into account in, the investigation of causal relationships: spatiotemporal contiguity, correlation, probabilistic dependence, and causal background.


7.7 Give a novel example of each of the following:
a. A causal relationship that violates spatial contiguity
b. Events at the same place that are not causally related
c. A causal relationship that violates temporal contiguity
d. Events at the same time that are not causally related
e. A causal relationship and causal background in which the cause is not correlated with the effect
f. Two correlated events that are not cause and effect

7.8 Define proximate causes and distal causes. Then, for each of the following events, describe a more proximate cause and a more distal cause. You might need to invent some details about these causal relationships to answer this question; feel free to be creative.
a. The Titanic sinking
b. Ruth leaving a tip after her meal at the restaurant
c. A hurricane occurring

7.9 For each of the following pairs of events, say which is the cause and which is the effect. Then decide whether the cause is necessary or sufficient to bring about the effect. You can assume that the causal background is normal, that is, what things are usually like.
a. Buying a lottery ticket, winning the lottery
b. Attending a concert, buying a concert ticket
c. Attending class, getting an A in the course
d. Becoming an attorney, being accepted into law school
e. Passing the bar exam, becoming an attorney


7.10 Write down the formula regarding conditional probabilities that gives the strength of causal relationships. Then, considering that formula, order the following causal relationships from strongest to weakest:
a. Brushing your teeth, flossing, and visiting the dentist prevents cavities.
b. Frequent smiling increases well-being.
c. Eating pizza prevents getting the flu.
d. Consuming anabolic steroids improves physical strength.
e. An increase in the minimum wage produces higher attendance at football games.
f. Warmer summers lead to longer periods of drought.

7.11 For each of the causal relationships in 7.10, name one feature of the causal background that would make the causal relationship stronger and one feature of the causal background that would make the causal relationship weaker. It might help to consider the conditional probability relationship that gives the strength of causal relationships.

7.2 TESTING CAUSAL HYPOTHESES

After reading this section, you should be able to do the following:

• Characterize the relationship between intervention and difference-making
• Identify and describe Mill’s five methods
• Discuss the significance of having a control group and random assignment to groups for causal hypothesis-testing
• Articulate how statistical hypothesis-testing helps to test causal hypotheses


Intervention and Difference-Making

Understanding the causal structure of the world requires more than just sitting back and seeing what happens. In the first section of this chapter, we discussed how causation is more than just correlation or probabilistic dependence. Despite Hume’s skepticism, it sure seems like there must be something else to causation. We have also touched on two candidates for what that something else might be: difference-making and physical processes like energy transfer.

But there is a lingering problem related to Hume’s skeptical worry. It’s relatively simple for scientists to discover correlations. Going beyond correlation to discover causal relationships is much trickier. As we discussed in the previous section, for most scientific investigations, looking directly for physical processes like energy transfer from cause to effect won’t work, even if such processes turn out to be involved in all causal relationships. This is because the events under investigation are usually only distantly related, and scientists often have a limited understanding of intervening processes. For example, the question of how smoking causes emphysema is much more difficult to answer than establishing that smoking causes emphysema. That smoking causes emphysema is a causal relationship inferred from a large body of scientific evidence, without requiring a detailed account of the underlying biophysical processes. In contrast, the question of how this happens requires knowledge of processes that are more difficult to understand or not yet known.


The idea of difference-making is much more useful for causal analysis in many fields of science. Scientists have at least two methods to go beyond statistical information about correlation to uncover difference-making relationships. One method is to run an experiment—ideally, a perfectly controlled double-blind experiment, as detailed in Chapter 2. Another method, when experimentation isn’t feasible, is to construct a causal model and rely on statistical information about variables of interest to make causal inferences. This section discusses how experiments can be used to uncover causal relationships; causal modeling will be addressed in the next section.

We have covered topics related to testing causal hypotheses and causal modeling earlier in the book, including Chapter 2’s discussion of experimentation, Chapter 3’s discussion of modeling, and Chapter 6’s discussion of statistical hypothesis-testing. But let us reconsider these topics now with an eye to how they relate to causation in particular.

Let’s suppose that you are a farmer and you are interested in finding out whether using a new fertilizer will increase your crop yield. This involves a causal hypothesis. How would you test it? One way would be to try out the fertilizer on your crops this year and see what kind of a yield you get. But the causal background might vary from last year to this year in a way that affects crop yield. You wouldn’t be able to distinguish that influence from the specific effect of the fertilizer on the yield. What you want to know is whether the fertilizer makes a difference to crop yield.

A better approach would be to divide your field into different plots of equal size. You can then use the new fertilizer on some of the plots but not on the others. After some time, go to your field and compare the crop yield from the fertilizer plots to the crop yield from the other plots. If the plots treated with the new fertilizer produced, on average, a larger crop yield than the other plots, then the fertilizer made a difference. If the two groups of plots yielded about the same amount of crop, then the new fertilizer is probably useless (or no better than your old fertilizer, if that’s the comparison you were studying). If the fertilizer plots do worse, the fertilizer makes a difference—but the wrong kind!

Let’s redescribe this scenario using concepts from Chapter 2. The farmer has created an experimental group of plots (to which the new fertilizer is applied) and a control group of plots (which is handled according to the farmer’s past practices). The application of fertilizer to plots in the experimental group is an intervention (or treatment). In causal terms, the farmer is intervening on a suspected cause in order to see whether this makes a difference to the suspected effect. The suspected cause is the independent variable, and the suspected effect is the dependent variable.

In testing causal hypotheses like this, sometimes the aim is to establish whether there is a causal relationship. Other times, the aim is to clarify the nature and strength of a causal relationship. For example, some drug trials simply seek to establish safety—that a drug won’t have negative effects. Others seek to establish efficacy—that a drug will have the expected positive effect. And still others aim to determine whether some drug is more effective than another drug already on the market, that is, to establish the relative strength of a causal relationship already identified.

By introducing an external influence on a system, interventions disrupt ordinary functioning in a way that can help to disentangle causal relations. That’s in part why the suspected cause is called an independent variable—the intervention independently determines its value, which eliminates the possibility that the suspected cause is affected by
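The farmer’s experiment can be sketched as a small simulation in Python. All of the numbers below (baseline yield, noise, fertilizer effect, plot counts) are assumptions invented for illustration:

```python
# Simulated fertilizer trial: plots are randomly assigned to treatment
# or control, then average yields are compared.

import random

random.seed(0)  # fixed seed so the sketch is reproducible

def plot_yield(fertilized):
    """Hypothetical yield: a noisy baseline plus an assumed fertilizer boost."""
    return random.gauss(100, 10) + (15 if fertilized else 0)

plots = list(range(20))
random.shuffle(plots)                       # random assignment balances the
treated, control = plots[:10], plots[10:]   # causal background on average

treated_yields = [plot_yield(True) for _ in treated]
control_yields = [plot_yield(False) for _ in control]

diff = sum(treated_yields) / len(treated) - sum(control_yields) / len(control)
print(f"average yield difference: {diff:.1f}")  # roughly the 15-unit effect
```

The observed difference won’t be exactly 15 because of the noise, which is precisely why statistical hypothesis-testing (Chapter 6) is needed to decide whether a difference of this size could be due to chance.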


the causal background. Other features of experimental design, such as having a control group, are used to minimize the chance that changes to the suspected effect are due to the causal background instead of the intervention. Altogether, these features help scientists test causal hypotheses, identifying which particular factor is a genuine difference-maker.


Mill’s Methods

The English philosopher and social scientist John Stuart Mill (1806–1873) emphasized the role of both observation and experimentation in discerning causal relationships (Mill, 1893). Mill identified five methods (see Table 7.1) used in the science of his day—and before the development of statistics—to evaluate hypotheses concerning cause and effect. (Scholars have suggested that some of these methods were discussed by scientists and philosophers well before Mill—for instance by the Persian polymath Avicenna, whom you encountered in Chapter 1.) Mill’s methods have proven to be a helpful way to think about how observation and experiments, even nowadays, are used to identify causal relationships.

Let’s start with what Mill called the method of concomitant variations. This method begins with the observation of correlation: that the values of two variables change in the same circumstances. Mill noted that when one variable varies together with another, we may infer a causal connection of some kind between them, although we won’t yet know just how they are causally related. More specifically, we won’t yet know whether the two variables are cause and effect, or share a common cause, or are related in some other way. So, for example, we might see that people who play more video games than average are also more violent than average. But while these attributes may be causally related, we cannot tell just from their concomitant variation whether propensity to violence causes an interest in video games, whether people become more violent by virtue of video game exposure, or whether there is some indirect relationship between them, like a love of excitement causing both a propensity to violence and an interest in video games. The other methods Mill identified help get to the bottom of that question, and they do so in ways that suggest the importance of intervention and randomization or other forms of variable control.

According to the method of agreement, one begins with cases that agree in effect, and then scrutinizes them to learn what possible cause they have in common—some way in which they agree. If in all instances when an effect occurs there is one prior event or condition common to all of those cases, then one may infer that the event is the cause of the effect. To use this method, one might let the causal background vary while keeping the suspected cause the same. If the suspected effect still occurs in those different instances, this is evidence that the suspected cause is indeed responsible for the effect. If the causal background is varied sufficiently, this rules out a common cause or other circuitous causal relationship.

The opposite approach is the method of difference. It begins with cases that differ in effect, and then scrutinizes them to learn whether there’s some other respect in which they differ. If in one case an effect is observed and in another case that effect is not observed, and the only difference is the presence of a single event or condition in the first case that is absent in the second case, then one may infer that this event is the cause of the effect. An instance in which the suspected effect occurs is compared to an instance in which the suspected effect does not occur. If the suspected cause is the only factor present in the former but not the latter, this suggests the suspected causal relationship obtains.

The method of difference can also be employed when agreement has been discovered; this is called the joint method of agreement and difference. We can consider cases where the suspected effect occurs and see what they have in common and consider also cases where the suspected effect does not occur and see what those have in common. If the suspected cause is the only difference between the two sets of cases, then this affirms a causal relationship between the suspected cause and the suspected effect. Imagine interviewing people with a record of violence and people without such a record. If the only distinguishing feature we find is that those in the former group play a lot of video games and those in the latter group do not, this result would indicate a causal connection between video games and violence. This joint method of agreement and difference provides more evidence of the causal relationship than either the method of agreement or the method of difference by itself.

None of these methods—the method of agreement, the method of difference, or the joint method of agreement and difference—eliminates the possibility that the suspected effect is instead the cause. From the investigation described above, we have established a causal relationship between video games and violence, but we can’t know whether video games cause violence or the other way around. To resolve this, we can perform an intervention on an experimental group with the joint method, with the added element of external influence on the independent variable. If we randomly choose groups of participants (thereby eliminating any pre-existing differences between people in the groups) and ask one group to play a lot of video games, then we’ve eliminated the possibility of violent tendencies causing video-game-playing. In observational studies (see Chapter 2), the joint method of agreement and difference can be supplemented not with an intervention but by using other forms of causal analysis. So, for example, we might ask our interview subjects not just how much gaming they do but also for how many years they’ve played video games. For each person, we can compare that with when his or her violent behavior began.

Finally, the method of residues is a way to apportion causal responsibility. With this method, one traces all other effects to their causes and looks for the causal variable that remains. If scientists have learned that some causal factors bring about certain effects, and some of those causes present by themselves bring about some but not all of the effects, then the missing cause(s) should be taken to be responsible for the absent effect(s). This is a way of taking into account the causal background in order to focus on some specific cause and determine the difference it makes. Imagine we’ve learned that obesity and smoking cause diabetes, heart disease, and lung cancer. From our knowledge that obesity causes diabetes and heart disease but not lung cancer, we can infer that smoking causes lung cancer. A limitation of this form of causal reasoning is that it assumes causal relationships are simpler than they often are. What if, for example, the combination of obesity and smoking together causes lung cancer, but neither does by itself? The method of residues can’t evaluate this possibility.

Consideration of Mill’s methods is in part of interest because causal hypothesis-testing in today’s science inherits some of the features of these methods. These include a focus on similarities among like situations, differences among unlike situations, and causal apportioning. Mill’s methods also illustrate the difficulty of establishing the direction of causation, the importance of intervention, and the limitations of apportioning causal influence. With Mill’s methods in the background, let’s now move on to these and other topics regarding causal hypothesis-testing.
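The methods of agreement and difference lend themselves to a simple Python sketch, with each case described as a set of conditions. The scenario and all condition names are invented for illustration:

```python
# Mill's methods of agreement and difference over cases represented
# as sets of observed conditions.

def method_of_agreement(cases_with_effect):
    """Conditions common to every case in which the effect occurred."""
    common = set(cases_with_effect[0])
    for case in cases_with_effect[1:]:
        common &= set(case)
    return common

def method_of_difference(case_with_effect, case_without_effect):
    """Conditions present only in the case where the effect occurred."""
    return set(case_with_effect) - set(case_without_effect)

# Hypothetical food-poisoning cases: what did each sick diner consume?
sick = [
    {"ate salad", "drank water", "ate oysters"},
    {"ate bread", "ate oysters", "drank wine"},
    {"ate oysters", "drank water"},
]
print(method_of_agreement(sick))  # {'ate oysters'}

healthy = {"ate salad", "drank water", "drank wine"}
print(method_of_difference(sick[0], healthy))  # {'ate oysters'}
```

Note that, just as the text cautions, this mechanical version inherits the methods’ limitations: it cannot tell cause from effect, and it misses causes that only operate in combination.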

TABLE 7.1 Mill’s methods

1. Method of agreement: Start with cases that agree in the effect, and find a possible cause they have in common.
2. Method of difference: Start with cases that differ in the effect, and find a possible cause on which they differ.
3. Joint method of agreement and difference: Compare cases that agree in the effect to cases that agree in not having the effect, and find if there is one possible cause that cases in the former group have in common but cases in the latter group do not.
4. Method of residues: Trace all known causes to their effects, and find a possible cause and possible effect that are left over.
5. Method of concomitant variations: Find a possible cause that varies (directly or inversely) with the effect.


Testing Causal Hypotheses

In this book, we’ve repeatedly returned to the simple picture of formulating a hypothesis, generating expectations, comparing those expectations to observations, and reasoning from the results of that comparison. Let’s consider a few distinctive features of this process for causal hypothesis-testing. Causal hypotheses can posit the existence of a causal relationship, the direction of a causal relationship, or even the strength of a causal relationship.

Two of the founders of microbiology, Louis Pasteur (1822–1895) and Robert Koch (1843–1910), wanted to better understand the causes of diseases like tuberculosis and cholera. They advanced a causal hypothesis: that some diseases are caused by microorganisms like bacteria. This is called the germ theory of disease. It posits the existence of a causal relationship between diseases and microorganisms and the direction of the relationship: exposure to certain microorganisms leads to (or increases the chances of) developing certain diseases. Recall John Snow’s observational study of cholera in Chapter 2 and, from Chapter 4, Ignaz Semmelweis’s discovery that patients were being infected by ‘cadaverous particles’ from doctors who didn’t wash their hands thoroughly enough after performing autopsies. Both of these were steps toward the germ theory of disease.

Compare Pasteur’s and Koch’s causal hypothesis about some microorganisms causing disease to the example of a farmer testing how a new fertilizer influences crop yield. In the latter case, the farmer’s hypothesis not only posits a causal relationship and its direction but also something about the strength of the relationship. In particular, the farmer is interested to know whether the fertilizer increases crop yield by at least enough to justify the additional cost of purchasing and applying the fertilizer.

Hypotheses about causal relationships, their direction, and their strength are used to develop specific expectations regarding how dependent variables will change in response to changes to independent variables. Based on the germ theory of disease, Koch expected


that healthy mice infected with the proper bacteria would develop anthrax, an infectious disease. The farmer was evaluating the expectation that the fertilizer plots of land produced enough additional crop yield to offset the increased costs of supplies or labor. But expectations based on causal hypotheses inherit all the complications of causal reasoning in general. Should Koch expect all treated mice to develop anthrax? We have seen that many causes aren’t sufficient by themselves but only increase the probability of their effects. So, if not every mouse, how many should Koch expect to develop anthrax? And, in what conditions should we expect this to happen? Other features of the causal background might interfere with this causal relationship, even if the germ theory of disease is true. We don’t expect the application of fertilizer to increase crop yield if the crops aren’t watered, after all. These are a few of the complications in determining what expectations we should generate from a causal hypothesis.

These complications with causal reasoning make some features of experiments and observational studies particularly significant. To start, we have seen that control groups provide a way to eliminate differences in the causal background, keeping them from becoming confounding variables. In Koch’s experiments, he inoculated some mice with blood taken from the spleens of farm animals that had died of the anthrax disease. He inoculated other mice with blood from the spleens of healthy animals. The only (relevant) difference between these groups of mice was thus their exposure to blood from an animal that died from anthrax (Ullman, 2007). Random assignment to groups is also important to control variation in the causal background. Our farmer’s investigation of the new fertilizer won’t be very illuminating if all the fertilizer plots are in an arid, low-yield part of the farm and the control plots aren’t.
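The idea of random assignment can be sketched in a few lines of code. This is a hypothetical illustration only: the plot names are made up, and the point is simply that a random split, rather than a deliberate one, keeps background differences (like soil quality) from systematically lining up with the treatment.

```python
# A minimal sketch of random assignment of experimental units (here,
# hypothetical farm plots) to treatment and control groups. Randomizing
# the split helps keep differences in the causal background from
# becoming confounding variables.
import random

def randomly_assign(units, seed=0):
    """Shuffle the units and split them evenly into two groups."""
    rng = random.Random(seed)   # fixed seed so the example is reproducible
    shuffled = units[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

plots = [f"plot-{i}" for i in range(1, 21)]   # 20 hypothetical plots
treatment, control = randomly_assign(plots)
print(len(treatment), len(control))  # 10 10
```

Every plot ends up in exactly one group, and which group it lands in has nothing to do with where on the farm it sits.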
Statistical hypothesis-testing is also crucial for testing causal hypotheses. As we saw in Chapter 6, statistical hypothesis-testing involves the development of specific expectations regarding the probability distribution of a random variable on the assumption that the null hypothesis is true. This is important for hypotheses that predict probabilistic causal influence. Causal hypotheses play the part of the alternative hypothesis in statistical hypothesis-testing. The null hypothesis is, usually, simply that the posited cause does not actually influence the phenomenon of interest. So, for our farmer, the null hypothesis is that the fertilizer is causally inefficacious: the range of crop yield from the fertilized plots of land will only differ from the range of crop yield from the other plots by chance variation. Taking into account the number of plots of land and the average crop yield for the plots in the control group, the farmer can predict how high a crop yield for the fertilizer plots is sufficiently unlikely, given the null hypothesis, to warrant rejecting the null hypothesis, and thus to warrant buying this new fertilizer.

Many scientific hypotheses are concerned with causal relationships. Knowledge about causes and effects is, as we have seen, key to bringing about desirable outcomes and preventing undesirable outcomes. Productive farming practices are worth adopting, and wasting the farm’s money is worth avoiding. Given the range of negative health effects from smoking, perhaps that’s an activity we should each work to avoid or minimize. Coming to grips with the effects of fracking is crucial to deciding whether to use this form of energy capture and, if so, how the industry should be regulated.
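One simple way to make the farmer’s reasoning concrete is a permutation test. The yields below are hypothetical numbers invented for illustration; the logic is that, under the null hypothesis, the treatment labels are arbitrary, so shuffling them at random should often produce a difference in mean yield as large as the one observed.

```python
# A hedged sketch of statistical hypothesis-testing for the farmer's
# causal hypothesis, using a permutation test. Under the null hypothesis
# (fertilizer is causally inefficacious), random relabelings of the plots
# should produce mean differences as large as the observed one fairly
# often; a tiny p-value warrants rejecting the null.
import random

def permutation_p_value(treated, control, trials=10000, seed=1):
    rng = random.Random(seed)
    observed = sum(treated) / len(treated) - sum(control) / len(control)
    pooled = treated + control
    count = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        t, c = pooled[:len(treated)], pooled[len(treated):]
        if sum(t) / len(t) - sum(c) / len(c) >= observed:
            count += 1
    return count / trials

fertilized = [3.9, 4.2, 4.5, 4.1, 4.4]     # tons per plot (hypothetical)
unfertilized = [3.2, 3.5, 3.3, 3.6, 3.4]   # tons per plot (hypothetical)
p = permutation_p_value(fertilized, unfertilized)
print(p < 0.05)  # a small p-value warrants rejecting the null hypothesis
```

With these (made-up) data, every fertilized plot out-yields every control plot, so almost no random relabeling matches the observed difference and the null hypothesis is rejected.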
The techniques of experimental and control groups, randomized assignment to groups, and intervention, as well as statistical hypothesis-testing, are all motivated in large part by their ability to discern causal relationships from mere correlation, cause from effect, and causal influence from chance variation.


EXERCISES

7.12 Describe how you might apply each of Mill’s methods to test the causal hypothesis that not getting enough sleep makes you (you in particular) hungrier the next day.

7.13 Describe the ideal experiment, looking back to Chapter 2 if helpful. You should reference experimental and control groups, random assignment, independent and dependent variables, extraneous variables, and intervention. Then, articulate the significance of each of the features of the ideal experiment for testing causal hypotheses in particular. Your response should discuss causal background, distinguishing causes and effects, common causes, and spurious correlation.

7.14 a. Describe how statistical hypothesis-testing can be used to investigate a causal hypothesis—say, that the death penalty prevents crime. (Look back to Chapter 6 if this is helpful.) Make sure you specify the null hypothesis and describe, in general, what is needed in order to reject it.
b. Write out the formula for determining the strength of a probabilistic causal relationship (from 7.1). What is the relationship between the two sides of this equation if C does not influence E, that is, if C is not a cause of E?
c. Considering your answers to (a) and (b), answer the following questions. What would the process of statistical hypothesis-testing show if C is not a cause of E (and there is no type I or type II error)? If one causal relationship (CR1) is probabilistically stronger than another causal relationship (CR2), is there a greater chance of a type I error with CR1 or with CR2? How about a type II error?

7.15 Headlines in popular media often misrepresent the scientific studies they discuss. One way this happens is that many headlines suggest a causal relationship where the evidence provided by the scientific study only supports a correlation. Consider the following headlines. For each, (a) identify whether it makes either a causal or a correlational claim; (b) rewrite any headline using causal language so that it reads as a correlational study; and (c) suggest a possible explanation for each correlation that is not the posited or suspected causal relationship.

1. ‘Lack of Sleep May Shrink Your Brain’, CNN, September 2014
2. ‘To Spoon or Not to Spoon? After-Sex Affection Boosts Sexual and Relationship Satisfaction’, Science of Relationships, May 2014
3. ‘Daytime TV (Soap Operas) Tied to Poorer Mental Scores in Elderly’, Reuters, March 2006
4. ‘Study Suggests Attending Religious Services Sharply Cuts Risk of Death’, Medical Xpress, November 2008
5. ‘Facebook Users Get Worse Grades in College’, Live Science, April 2009
6. ‘Texting Improves Language Skill’, BBC, February 2009
7. ‘Study Suggests Southern Slavery Turns White People into Republicans 150 Years Later’, Think Progress, September 2013
8. ‘Dogs Walked by Men Are More Aggressive’, NBC News, November 2011
9. ‘Want a Higher GPA? Go to a Private College’, New York Times, April 2010
10. ‘Sexism Pays: Men Who Hold Traditional Views of Women Earn More Than Men Who Don’t, Study Shows’, Science Daily, September 2008


7.16 Choose three of the headlines listed in Exercise 7.15, and then, for each, look up the text of the popular media report. Write a paragraph evaluating the strength of the evidence cited in the media report supporting the claim (causal or correlational) in the headline. Try to note both positive features and negative features.

7.17 For each of the following claims, identify three possible confounding variables in the causal background that may impact the relationship. Say whether each possible confounding variable would be an alternative cause, contributing cause, common cause of both stated cause and stated effect, or something else.
a. Watching pornography leads to committing sex crimes
b. Eating pizza promotes immunity to flu
c. Ice-cream consumption raises the probability of drowning deaths
d. Being an American scientist raises the chance of having a scientific paper published
e. Volcanic eruptions cause tsunamis

7.18 Describe an experiment you could use to determine whether smoking marijuana is a cause of schizophrenia. Address how extraneous variables are to be controlled. Finally, identify the expectations given the hypothesis, that is, what finding would enable you to conclude that smoking marijuana is a cause of schizophrenia.


7.19 Psychologists have long studied the causes of altruistic behavior. In a classic psychological study by Darley and Batson (1973), participants walked down an alley on their way to another experiment. Some were told they were late for the experiment; others were told they were on time. Each passed by a confederate slumped in a corner. Darley and Batson found that time pressure decreased helping behavior. Describe the specific causal hypothesis and the features the experimental design must have had to adequately test this hypothesis.

7.20 Economists have taken a different approach to studying altruistic behavior. They have investigated it using experimental paradigms, such as the ultimatum game encountered in Chapter 2—a task in which one player is given a real sum of money and decides how to split that money with a partner; the partner can then decide only whether to accept or reject the offer. The finding was that people offered fairer divisions than self-interest predicts, and they rejected divisions deemed unfair even though this results in no money won. The researchers concluded that people sacrifice some self-interest to promote fairness. What are some important differences and similarities between this approach and the experiment described in 7.19? Evaluate each approach for how well it can investigate the causes of altruism.

7.3 CAUSAL MODELING

After reading this section, you should be able to do the following:

• Describe the advantages of causal modeling and when this approach is called for
• Define causal Bayes nets and say what they are good for
• Specify the kinds of assumptions embedded in causal Bayes nets and discuss their significance and limitations


Modeling to Search for Causal Relationships

Scientists have developed a variety of sophisticated modeling approaches based on mathematical results and statistical methods that can help investigate causal relationships. Most causal modeling approaches are closely associated with the difference-making account of causation, which emphasizes the importance of interventions to find out about causal relationships. Recall that, according to the difference-making account, if an event C causes another event E, then intervening on C will change the value of E. In contrast, if C and E are merely correlated, then intervening on C won’t change the value of E.

Causal modeling approaches can be used when experimentation or observational studies are not possible, or in combination with them. With causal modeling, scientists can learn about causation from data and can derive precise expectations about the outcomes of experiments from causal hypotheses. The basic idea is to use probabilities to infer causal relationships. Causal modelers use patterns of probabilistic conditional independence within a set of variables to draw inferences about causal relationships among those variables. Causal modeling requires some specialized assumptions in order to relate probabilistic dependencies to causal relationships. When these assumptions hold for a data set, one can reliably learn about causal relationships from the causal model.

There are several different approaches to causal modeling. Galton’s method of regression analysis, introduced in Chapter 5, is one of the oldest causal modeling procedures. The basic idea of regression analysis is to estimate the correlation of two variables conditional on all other measured variables. You can think of this as drawing a best-fitting line for the relationship in the values of two variables based on data points on a scatterplot.
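The best-fitting line can be computed directly by ordinary least squares. This is a sketch with hypothetical data, not Galton’s own procedure; it simply shows how a slope and intercept are estimated from paired observations.

```python
# A minimal sketch of fitting a best-fitting line y = a*x + b by
# ordinary least squares. The data points are hypothetical.

def least_squares(xs, ys):
    """Return the slope and intercept of the least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

xs = [1, 2, 3, 4, 5]                # values of the suspected cause
ys = [2.1, 3.9, 6.2, 7.8, 10.1]     # values of the suspected effect
a, b = least_squares(xs, ys)
print(round(a, 2), round(b, 2))
```

The slope estimates how much the second variable changes, on average, per unit change in the first; as the text notes, this by itself does not tell us which variable is the cause.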
When a causal relationship between the two variables is suspected, this can be used to estimate how the causal variable affects the other variable. This cannot tell us whether each variable is the cause or the effect or whether they share a common cause. But when there is independent reason to suspect one of these causal relationships, regression analysis can be used to estimate the nature of that relationship.

The building blocks of other causal modeling approaches are also, as in Galton’s regression analysis, statistical correlations between variables. In graphical representations of causal models, nodes in the graph stand for variables of interest, and arrows connecting different nodes stand for direct causal relationships between variables. See Figure 7.4 for a generic example. A causal graph of a system enables scientists to make reliable predictions about how the value of a variable would change, should the value of another variable change. In other words, scientists can use causal graphs to figure out what difference an event would make to the occurrence of another event.

Suppose you are interested in the relationship between three variables: vaccination (V), immunity (I), and autism (A). All three variables have two possible values: true and false, or yes and no. You already know that vaccination causes immunity. But—worried about what you’ve heard about potential side effects of vaccination—you make three hypotheses about the dependency between autism and vaccination. The first hypothesis is that vaccination causes immunity, which in turn causes autism. This structure can be graphically represented as a straightforward chain: V → I → A. The second hypothesis is that vaccination is a common cause of immunity and autism. Using arrows pointing from a cause to its effect, you can graphically represent this structure as: I ← V → A. If this is right, then vaccination is a way to become immune to various


FIGURE 7.4 Generic causal graph with nodes representing variables of interest and arrows representing direct causal relationships

diseases, but it also has some chance of inducing autism. Your third hypothesis is that autism isn’t causally related to either vaccination or immunity. To evaluate these hypotheses, scientists would collect data about the values of the three variables of interest in different patients. What should you expect to find if each of these hypotheses were true?

Consider the first hypothesized causal structure: V → I → A. This hypothesis states that immunity causally depends on vaccination and that autism depends on immunity. If this is right, then an intervention on immunity (say, due to decreases in the levels of antibody that protect from acquiring a disease) will decrease the chance of autism but will not affect whether one was vaccinated. This intervention would set the variable immunity to the value false and disrupt causal links from vaccination to immunity. And, if this first hypothesis were true, then intervening on immunity in this way would interfere with any correlation between vaccination and autism, making the variables vaccination and autism statistically independent, or uncorrelated. Put another way, on this hypothesis, if you consider everyone, vaccination would be correlated with autism, but if you consider only patients who are immune to a disease, then having autism would be uncorrelated with having been vaccinated. Vaccination would have no effect on autism beyond its influence on immunity.

Consider the second hypothesis, that vaccination is a common cause of both immunity and autism: I ← V → A. What should the data look like to support this conjecture? You should find two correlations, one between the variables vaccination and immunity and another between vaccination and autism.
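The screening-off pattern for the chain hypothesis can be checked by direct enumeration. The probabilities below are made-up numbers for illustration only; the point is that, under the chain factorization V → I → A, conditioning on immunity makes vaccination irrelevant to autism.

```python
# A sketch of screening-off in the chain V -> I -> A, with hypothetical
# probabilities. Under this structure, Pr(A | I, V) equals Pr(A | I):
# once immunity is fixed, vaccination carries no further information.

p_v = 0.7                                # Pr(V = T), assumed
p_i_given_v = {True: 0.9, False: 0.1}    # Pr(I = T | V), assumed
p_a_given_i = {True: 0.2, False: 0.05}   # Pr(A = T | I), assumed

def joint(v, i, a):
    """Joint probability Pr(V=v, I=i, A=a) under the chain structure."""
    pv = p_v if v else 1 - p_v
    pi = p_i_given_v[v] if i else 1 - p_i_given_v[v]
    pa = p_a_given_i[i] if a else 1 - p_a_given_i[i]
    return pv * pi * pa

def prob(predicate):
    """Total probability of all value-assignments satisfying predicate."""
    return sum(joint(v, i, a)
               for v in (True, False)
               for i in (True, False)
               for a in (True, False)
               if predicate(v, i, a))

# Conditional on I = T, does knowing V = T change the chance of autism?
p_a_given_i_true = prob(lambda v, i, a: a and i) / prob(lambda v, i, a: i)
p_a_given_i_and_v = (prob(lambda v, i, a: a and i and v)
                     / prob(lambda v, i, a: i and v))
print(abs(p_a_given_i_true - p_a_given_i_and_v) < 1e-12)  # True
```

Whatever numbers you substitute, the chain factorization guarantees this conditional independence, which is exactly the expectation the text derives for the first hypothesis.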
Generally, if you find a correlation between two variables, then this dependence may result from one variable causing the other, but it is also possible that there is some third variable, a common cause, that causes the values of both variables and explains their correlation. Given the common cause structure associated with our hypothesis, you should also find that altering the value of the variable autism will not affect the value of immunity and that altering the value of immunity will not affect the value of autism. Holding fixed the value of vaccination makes autism probabilistically independent from, or uncorrelated with, immunity. So, if this hypothesis is true, then examining only people who are vaccinated (or aren’t vaccinated) would result in no correlation between immunity and autism.

In actuality, there is no evidence that vaccination of any kind causes autism. Both hypotheses are false. Before delving into that, let’s first consider where the practice of vaccination came from. Vaccination has been practiced for three centuries. In the 1700s, there


was some recognition that survivors of certain infectious diseases would become immune to future exposure, and researchers began a primitive form of inoculation by infecting themselves with a disease to gain immunity. The risk of sickness and death with these primitive forms of inoculation was high. Then, the English physician and scientist Edward Jenner (1749–1823) discovered that if he infected people with the cowpox virus, related to smallpox but less dangerous, they had far lower mortality rates from smallpox. Vaccine research advanced significantly again almost a century later, when Louis Pasteur identified bacteria as a major cause behind several diseases; this knowledge led to the germ theory of disease we discussed earlier in this chapter, and to the first synthetically made vaccine. The 1900s saw the introduction of several successful vaccines, including those against diphtheria, measles, mumps, and rubella.

As vaccines became more common, their causal mechanism became well understood. Basically, vaccines train the immune system to identify and combat pathogens, either viruses or bacteria. Certain molecules from the pathogen must be introduced into the body to trigger an immune response, but not necessarily the whole pathogen. So, many modern vaccines have no chance of making you sick from the pathogen, since they don’t even contain the full viruses or bacteria.

But despite increased understanding of how vaccines work and drastically increased vaccine safety, misconceptions remain. The myth that vaccines cause autism originated with a study published in a prestigious medical journal in 1998. The study linked the measles, mumps, and rubella (MMR) vaccine to increasing autism in British children. This was a correlation. Several other studies were independently conducted to test whether this correlation was due to a causal relationship; none found a causal relationship between vaccination and autism.
In fact, several studies couldn’t even replicate the correlation between vaccination and autism. In the meantime, several other researchers pointed out that there were several methodological errors in the original study, that the authors had financial conflicts of interest, and that the study was ethically problematic. The article was eventually retracted from the journal. While the causes of autism are unclear, it has been definitively shown that vaccination is not among them.

From this, we can conclude that any data you gathered would not confirm either your first or second hypothesis about a causal pathway from vaccination to autism. Vaccination and immunity are strongly correlated with each other; the reason why is that vaccination is one of the major causes of immunity. But vaccines have undergone extremely extensive safety testing with huge groups of test subjects, and none has shown a correlation with autism. And scientists now believe there are physiological signs of autism even in utero, well before exposure to vaccination. Neither vaccination nor immunity is correlated with autism; nor does vaccination cause autism.

Despite the scientific knowledge already achieved on this issue, the belief in a connection between vaccination and autism persists. This stems in part from the fact that the initial symptoms of autism can occur in early childhood, around the same time that many vaccinations customarily occur. So there’s a temporal connection between vaccination and autism diagnosis, and we have emphasized that spatiotemporal connection is a guide to causation. But in this case, the temporal connection is simply because autism diagnosis and vaccination share a common cause: being a young child.

Now, back to causal models. The main strengths of causal modeling are transparency and flexibility. In constructing causal models, scientists are forced to be explicit about their assumptions regarding the causal relationships in question.
An example is the different sets


of expectations stemming from the three different hypotheses for how vaccination may causally relate to autism. Once the assumptions about causal relationships are explicit, a causal model can simply represent dependencies between different variables in the model, and precise expectations can be formed about what would happen if you changed the value of a variable in the model. Patterns of statistical information can be used to test these expectations and, thus, the causal hypotheses behind them. Using causal models, scientists can make a fine-grained evaluation of whether correlational evidence supports a causal hypothesis; they can identify what manipulations to perform when conducting an experiment to assess a causal connection; and they can better recognize what factors in the causal background must be controlled.

Causal models are used across many different fields of science, from epidemiology to economics. While there are several different approaches to causal modeling, the leading approach to causal learning and reasoning is the causal Bayes nets approach. The rest of the chapter will survey this approach.

Causal Bayes Nets


Causal Bayes networks, or ‘nets’, are a kind of graphical causal model. They are made up of two components: a graph representing the variables in the system of interest with directed links representing causal relationships, as in Figure 7.4, together with a set of conditional probabilities specifying the strength of each causal relationship. The purpose of causal Bayes nets is to provide a compact visual representation of a system’s causal relationships and their strengths. This purpose is accomplished using joint probability distributions, that is, the probability distribution for each variable, taking into account the probability of the other variables. These graphical models are called Bayes nets because they use the rule of Bayesian conditioning to compute posterior distributions, updating the probabilities in the network whenever new information is acquired. (See Chapter 6 for a discussion of Bayesian statistics.)

Suppose there are only two events that could cause your Facebook page to be shut down: either you post material that infringes copyright or a friend reports you. Also suppose that posting material that infringes copyright affects the chance that your friend reports you: when you post a new song on your page, your friend will enjoy it and thus won’t report you. This situation can be modeled with a Bayes net like the one seen in Figure 7.5.

FIGURE 7.5 Causal graph of the relationships between posting copyrighted material on your Facebook page, a friend reporting you, and your Facebook page being shut down. (Copyright infringement lowers the chance that your friend reports you (−); both copyright infringement and being reported raise the chance that the page is shut down (+).)

This doesn’t yet show the conditional probabilities that specify the strength of each causal relationship. The strength of those relationships matters; this decides whether on balance posting copyrighted material increases or decreases the chance of your Facebook page being shut down. Suppose that all three variables have two possible values, true and false, and that their conditional probability relationships are given in Table 7.2.

Causal Bayes nets like this one can be used to make probabilistic and causal inferences and to learn about causal relationships. Because they are complete models for specified variables and their relationships, they can be used to answer questions about the probability that a certain variable takes on a specific value. For example, the causal Bayes net model outlined in Figure 7.5 and Table 7.2 can be used to determine the probability that you’ve been reported, given that your Facebook page has been shut down but you posted no copyrighted material. Another use: when the values of certain variables are observed, the network can determine the values of other variables by computing their posterior probabilities using Bayesian conditioning. Bayes nets can also be used to estimate causal relationships that are related to statistical features of our observations—for example, the negative correlation between copyright infringement and being reported by a friend. And they can be used to predict the effects that potential interventions on some variables would have on the values of other variables—for example, to predict what would happen if you posted copyrighted material on your page.

TABLE 7.2 Conditional probabilities for the causal graph in Figure 7.5

Pr(Copyright infringement):
  Copyright infringement = T: 0.20    Copyright infringement = F: 0.80

Pr(Reported | Copyright infringement):
  Copyright infringement = T:  Reported = T: 0.01    Reported = F: 0.99
  Copyright infringement = F:  Reported = T: 0.40    Reported = F: 0.60

Pr(Page shut down | Copyright infringement, Reported):
  Copyright infringement = T, Reported = T:  Page shut down = T: 0.99
  Copyright infringement = T, Reported = F:  Page shut down = T: 0.80
  Copyright infringement = F, Reported = T:  Page shut down = T: 0.90
  Copyright infringement = F, Reported = F:  Page shut down = T: 0.00


To better understand how scientists use Bayes nets to learn about causal relationships, consider this scenario: Suppose that a patient has been suffering from shortness of breath (called dyspnoea) and visits the doctor, worried that he has lung cancer. The doctor knows that other diseases, such as tuberculosis and bronchitis, are possible causes of this symptom, as well as lung cancer. She also knows that other relevant information includes whether or not the patient is a smoker (increasing the chances of cancer and bronchitis) and what sort of air pollution he has been exposed to. A positive x-ray would indicate either TB or lung cancer. (Korb & Nicholson 2010, p. 30 ff.)


There’s plenty of causal information here, but how that information relates to the case at hand is tricky to figure out. Constructing and using a causal Bayes net is one effective way to assist the doctor in making a medical diagnosis.

To construct such a model, the first thing to do is to identify the relevant variables. As in the previous example, each variable will be represented with a node. There’s no uniquely right way of setting up the causal Bayes net, but it helps to make choices about what nodes to include that enable us to represent the relevant, known aspects of the situation with enough detail to perform the desired reasoning. One possible modeling choice is shown in Table 7.3. In this case, the variables include dyspnoea, smoker, pollution exposure, x-ray result, and lung cancer.

The second step of constructing a causal Bayes net is to specify the causal structure of the system by drawing arrows between the nodes. Smoking and living in a polluted area are two factors affecting the patient’s chance of having lung cancer. In turn, having lung cancer is a factor affecting the result of an x-ray and the patient’s difficulty in breathing, that is, the patient’s suffering from dyspnoea. If this is the structure of the situation, then we may draw the graph pictured in Figure 7.6.

Several forms of causal relationships can be represented in a causal Bayes net. A cause can increase or decrease the probability of some variable taking on a given value, causes can influence themselves, or there can be a feedback loop where two or more variables influence one another in a cyclical way. Most of the time, however, Bayes nets are assumed to be directed acyclic graphs (sometimes abbreviated DAG), which means that all the causal relationships are taken to go in one direction without feedback loops. This means

TABLE 7.3 Possible values for variables in dyspnoea case

Variable       Values
Dyspnoea       {T, F}
Smoker         {T, F}
Pollution      {low, high}
X-ray          {positive, negative}
Lung cancer    {T, F}


Causal Reasoning

FIGURE 7.6 Causal graph for the dyspnoea case. [The figure shows the nodes Smoker, Pollution, Lung cancer, X-ray, and Dyspnoea, with arrows from Smoker and Pollution into Lung cancer, and from Lung cancer into X-ray and Dyspnoea.]

TABLE 7.4 Conditional probabilities of developing lung cancer given level of pollution exposure and whether or not a person smokes

Pollution    Smoker    Pr(Lung cancer=T | Pollution, Smoker)
High         T         0.050
High         F         0.020
Low          T         0.030
Low          F         0.001

that earlier causes are assumed not to also be later effects. You can see from Figure 7.6 that our graph satisfies this assumption; no arrows form circles like X → Y → Z → X, and no arrow is bidirectional like X ↔ Y.

Having specified the nodes and their structure, we must now specify the strength of the relationships between connected nodes. To do so, one needs to define a probability distribution for each node, conditional on any node(s) that causally influence it. In the dyspnoea case, statistical information from medical studies or observed frequencies can be used to specify these probability distributions. For variables for which no such information is available, initial probabilities can be based on an intuition, guess, or estimation. These are exactly like the prior probabilities from the discussion of Bayesian statistics in Chapter 6. It turns out that Bayes nets can be accurate in the long run even if they start off with imprecise or inaccurate initial probabilities.

Let’s take a look at the variable lung cancer in Figure 7.6. The variables that causally influence it are pollution and smoker, each of which can take two possible values, for a total of four combinations of values: {⟨high, T⟩; ⟨high, F⟩; ⟨low, T⟩; ⟨low, F⟩}. We can specify the conditional probability of having cancer in each of these four cases. One way to represent these conditional probabilities is in a table, as in Table 7.4.

Once all the conditional probability distributions are determined, our causal Bayes net captures all of the relevant knowledge available. Now we can start to reason with it. Reasoning with a Bayes net amounts to the task of computing a posterior probability distribution for one or more variables of interest given the values of variables that you have information about. These computations are governed by Bayesian conditioning. Think of this as updating your beliefs about a variable based on changes to your beliefs about

other variables. The arrows connecting nodes in the causal Bayes net show the paths along which changes to probability distributions propagate. Belief updating can happen either from cause to effect, based on information about the value of a cause variable, or from effect to cause, based on information about the value of an effect variable. For example, if we’re certain that the patient has dyspnoea and her x-ray results are negative, then we can update our diagnosis about whether the patient has cancer, a causal influence on both dyspnoea and x-ray results. In turn, updating our diagnosis of cancer will affect our beliefs about whether the patient is a smoker and lives in an area with high levels of pollution, proceeding up the chain of causal influence. Or, if we are certain that the patient is a smoker, we can update our beliefs about her chance of having lung cancer, which is causally influenced by smoking status. This in turn influences our expectations about the x-ray result.

A different type of reasoning with causal Bayes nets concerns the relationship between two causes that compete to explain an observed effect. In our case, smoker and pollution are two such causes. They compete to explain the value of the variable lung cancer, which they both influence. Suppose we learn that the patient has cancer. This new piece of information raises the probability of both possible causes. Suppose that we learn further that the patient lives in a badly polluted city. Something interesting now happens in our causal Bayes net: this new piece of information explains the patient’s having cancer, and it also lowers the probability that the patient is a smoker. Although the variables smoker and pollution are initially probabilistically independent, given that we know that the patient has cancer and lives in a highly polluted area, the probability that the patient is a smoker goes down.
Now that we know the patient has been exposed to significant pollution, this information accounts for the lung cancer, so the cancer no longer lends as much support to the hypothesis that the patient smokes. Put another way, we don’t need to suppose that the patient was a smoker in order to explain the lung cancer. This pattern of reasoning is often called explaining away.

In the simple cases we’ve considered, a Bayes net is fully specified and then used to make causal inferences and predictions. In some scientific applications, in contrast, causal Bayes nets are incomplete in two respects. First, there are many other variables that could be added to the model: variables that precede, mediate, or follow the variables that are explicitly represented. Second, information might be lacking about the causal relationships between the variables represented in the model. In that case, the structure of the network and the relevant probabilistic dependencies must be learned from data as the model is developed.

Cognitive neuroscientists, for example, are interested in the causal relationships between brain areas that support the same cognitive capacity. To find out about these causal relationships, they often rely on brain imaging data, where subjects perform tasks that tap the cognitive capacity of interest while having their brain activity recorded. Neuroscientists already have some background knowledge about which brain regions might be involved in a task, so they often focus their attention on recorded activity from only a few regions of interest, each of which can be treated as a variable and represented as a node in a causal Bayes net. The challenge is then to discover the causal structure of these regions of interest—to determine the nature of the arrows. Machine learning algorithms help neuroscientists tackle this challenge. One such algorithm searches the brain imaging data set for the causal structure that best explains observed statistical dependencies between the variables of interest.
Roughly, this search procedure begins with a graph with no arrows. Arrows are added

sequentially, based on how well they help account for observed correlations. When no further addition of arrows improves the account of the observed correlations, the procedure switches to eliminating arrows until the model is as simple as it can be while still matching those correlations. The resulting causal structure is invoked as the best explanation of the observed data (Glymour, 2007).


Assumptions of Causal Modeling

Reasoning with causal Bayes nets and other forms of causal modeling requires a number of assumptions (see Eberhardt, 2009). In closing this chapter, we’ll discuss three such assumptions: modularity, the causal Markov condition, and faithfulness. When these assumptions are satisfied, causal Bayes nets are promising for learning reliably about causal relationships between variables from their observed statistical features. The failure of these assumptions can, in some cases, undermine the usefulness of causal Bayes nets.

Modularity is the assumption that interventions on one causal relationship will not change other causal relationships in the system. If a system is modular and there is a correct causal Bayes net of that system, then dependencies between variables in the model that are not directly manipulated should not change. Thus, if modularity holds, it should be possible to change the value of a variable X in the Bayes net without making arrows into variables that depend on X appear or disappear. And if variable X is not a cause of Y, then the probability distribution of Y should remain unchanged when there’s an intervention on X. In contrast, in systems that are not modular, an intervention on one variable may change other causal relationships in the system.

The assumption of modularity allows one to make precise predictions about the effects of intervening on a particular variable. When the modularity assumption is not satisfied, Bayes nets may not provide correct answers to questions about the effects of an intervention. Different systems are modular to varying degrees, and systems that are not modular can sometimes be rearranged so that their causal relationships can be correctly represented by a Bayes net.

Closely associated with modularity is the causal Markov condition. This is one of the most important assumptions of causal Bayes nets.
The causal Markov condition specifies that each variable in a Bayes net, conditional on its direct causes, is independent of all other variables except its direct and indirect effects. The basic idea is that remote causes do not matter to conditional probabilities, and thus to causal inference, so long as we know the immediate causes of an event. In the dyspnoea case, for example, the causal Markov condition implies that whether the patient has a positive x-ray is influenced by whether she has cancer but, once we take into account whether she has cancer, it is not influenced by whether she is a smoker or by whether she lives in a high-pollution area. The idea is that cancer causes a positive x-ray result regardless of whether the cancer was caused by smoking or by pollution.

The causal Markov condition indicates which variables will be probabilistically independent conditional on other variables. This enables scientists to reason from probabilistic information to causal relationships. If the causal Markov condition holds, then a Bayes net can correctly represent the absence of a direct causal relationship between two variables by their conditional independence. Our reasoning about vaccination, immunity, and autism relied on this principle. The causal Markov condition might fail if the set of variables

included in a Bayes net is incomplete in certain ways. But here, too, there are sophisticated machine learning techniques for causal discovery that work reliably.

The third assumption of causal Bayes nets we’ll discuss is faithfulness. While the causal Markov condition indicates which variables in a Bayes net will be probabilistically independent, faithfulness specifies which variables will be probabilistically dependent conditional on other variables. In the dyspnoea example, if having cancer is causally related to tuberculosis, then TB and cancer in our Bayes net should be probabilistically dependent. The basic motivation for the faithfulness condition is that a causal relationship between two variables almost always entails a probabilistic dependence between those variables. This implies that different causal pathways from one cause to an effect will not exactly cancel out each other’s influence.

However, faithfulness doesn’t always hold. In Section 7.1, we discussed two examples of events that are causally related but uncorrelated. If smoking causes heart disease but also causes exercise, and exercise prevents heart disease, then these causal influences may exactly cancel out. Here the faithfulness assumption fails. Failures of faithfulness don’t compromise causal inference as seriously as failures of the causal Markov condition: conditions where faithfulness fails are much better understood than conditions where the causal Markov condition fails, and there are more techniques for causal discovery that don’t rely on faithfulness.

There are many more assumptions underlying reasoning with causal Bayes nets, beyond modularity, the causal Markov condition, and the faithfulness condition. As we have said of causal modeling in general, specifying these assumptions, and seeing where they fail to hold, is an important step toward making causal claims transparent.
Understanding how causal modeling works when some of these assumptions fail, and what kinds of errors their failure may introduce, is one of the most important challenges at the forefront of current causal modeling approaches.
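Two of the assumptions discussed above can be illustrated with toy numbers. The first function checks screening off, the probabilistic signature of the causal Markov condition, in the dyspnoea network; the second shows a faithfulness failure of the smoking, exercise, and heart disease kind, with deliberately tuned probabilities that cancel exactly. Apart from Table 7.4’s cancer probabilities, every number here is an illustrative assumption.

```python
from itertools import product

# 1) Screening off (causal Markov condition) in the dyspnoea network.
# Pr(Cancer | Pollution, Smoker) from Table 7.4; other numbers assumed.
P_HIGH = 0.1  # assumed Pr(Pollution = high)
P_CANCER = {("high", True): 0.050, ("high", False): 0.020,
            ("low", True): 0.030, ("low", False): 0.001}
P_XRAY = {True: 0.9, False: 0.2}  # assumed Pr(X-ray = positive | Cancer)

def p_xray_pos(cancer, smoker):
    """Pr(X-ray = positive | Cancer, Smoker), marginalizing Pollution."""
    num = den = 0.0
    for pol, xr in product(("high", "low"), (True, False)):
        p = (P_HIGH if pol == "high" else 1 - P_HIGH)
        pc = P_CANCER[(pol, smoker)]
        p *= (pc if cancer else 1 - pc)
        p *= (P_XRAY[cancer] if xr else 1 - P_XRAY[cancer])
        den += p
        if xr:
            num += p
    return num / den

# Conditional on cancer, the smoker variable makes no difference:
screened_t = p_xray_pos(cancer=True, smoker=True)
screened_f = p_xray_pos(cancer=True, smoker=False)

# 2) A faithfulness failure: smoking raises heart disease among
# non-exercisers but also promotes exercise, which lowers risk; the
# tuned (purely illustrative) numbers below cancel exactly, so smoking
# and heart disease are marginally independent despite a causal link.
P_EXERCISE = {True: 0.8, False: 0.2}  # assumed Pr(Exercise | Smokes)
P_HEART = {(True, True): 0.10, (True, False): 0.50,
           (False, True): 0.10, (False, False): 0.20}  # Pr(HD | Smokes, Ex.)

def p_heart(smokes):
    """Pr(Heart disease = T | Smokes), marginalizing Exercise."""
    pe = P_EXERCISE[smokes]
    return pe * P_HEART[(smokes, True)] + (1 - pe) * P_HEART[(smokes, False)]

print(screened_t, screened_f, p_heart(True), p_heart(False))
```

In the first sketch, both conditional probabilities come out equal (to the assumed 0.9), just as the causal Markov condition requires; in the second, smoking plainly matters among non-exercisers (0.50 versus 0.20), yet the marginal risk of heart disease is the same for smokers and non-smokers.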

EXERCISES


7.21 Describe what causal modeling can be used for. What are some advantages and limitations compared to other strategies we have seen for learning about causal relationships?

7.22 For each of the following cases, (a) indicate the causal hypothesis, explicitly distinguishing the cause from the effect; (b) offer another plausible cause for the effect; and (c) draw a simple causal model to help you assess whether the reasoning described in the case is good or bad.

1. You have eaten your birthday dinner at your favorite pizzeria in town for the past 10 years. This year, you got sick. This was also the first time your uncle Sam was there. You conclude you got sick because uncle Sam was there.

2. Every time Felipe goes to see Real Madrid play, they lose. Whenever he is not there, they win. If I want Real Madrid to win, I had better not let Felipe go to any more games.

3. Eryka normally goes to bed at midnight and gets up by 7:00 a.m. each morning. She usually runs two kilometers after having some breakfast. This morning,

however, she ran only half a kilometer and had to stop, as she was so tired. She recalled that she had gone to sleep unusually early the night before and concluded that too much sleep made her too tired to run.

4. In Albystown, there are two kinds of students: those who own a diary and those who own a smartphone. A first-grade teacher in Albystown noticed that all the students who consistently failed exams owned a smartphone. He concluded that those students who own a smartphone are intellectually inferior to those who own a diary, and that’s why they failed more exams.

5. Phineas Gage’s moral character changed dramatically after an explosion blew a tamping iron through his head. Gage was leading a railroad construction crew near Cavendish, Vermont, when the accident occurred. ‘Before the accident he had been a most capable and efficient foreman, one with a well-balanced mind, and who was looked on as a shrewd smart business man.’ After the accident, he became ‘fitful, irreverent, and grossly profane, showing little deference for his fellows. He was also impatient and obstinate, yet capricious and vacillating, unable to settle on any of the plans he devised for future action’.

7.23 Causal reasoning involves various types of probabilistic inferences: predictive inferences (from causes to effects); diagnostic inferences (from effects to causes); and reasoning about interventions (what would happen if you manipulated a certain feature of a system). For each of the following situations, (a) indicate whether you would make a predictive or a diagnostic inference to find out about the events described; (b) describe what intervention you would carry out to find out about the events described; and (c) explain why you would make those inferences and interventions.

1. You are a physician working at a hospital, and you notice that some patients have been infected with influenza.

2. You notice that you have a runny nose, body aches, and a sore throat.

3.
You notice that there is an unusual smell coming from the engine of your car, while the needle on the temperature gauge creeps up quickly past the normal limit.

4. Every morning, you notice a continuous tinkling noise coming from the kitchen in your apartment.

5. You notice that the countryside of your town has more animals than the site could support for a grazing season.


7.24 Describe the important elements of a causal Bayes network and what each represents.

7.25 A group of psychologists is interested in how intrinsic motivation of university students affects their exam results. They believe that intrinsic motivation affects both class attendance and home preparation (reading the textbooks, doing the assignments, and so on). They also believe that both class attendance and home preparation affect exam results. They do not believe that there are any further causal interactions. All relevant variables (intrinsic motivation, class attendance, home preparation, and exam results) have two values: high and low for intrinsic motivation, class attendance, and home preparation, and pass and fail for exam results. The psychologists observe the following frequencies:

1. 40% of all students have a high intrinsic motivation.

2. 90% of all highly motivated students attend classes regularly, as opposed to 60% of all students with low motivation.

3. 70% of all highly motivated students prepare well, as opposed to 20% of all students with low motivation.

4. 80% of all students who prepare well and attend class regularly pass the exam.

5. 60% of all students who prepare well and do not attend class regularly pass the exam.

6. 45% of all students who do not prepare well and do attend class regularly pass the exam.

7. 40% of all students who do not prepare well and do not attend class regularly pass the exam.

Draw the causal Bayes net that corresponds to the story. Then, suppose that the university implements a new policy that forces students to attend class. Assume that all students comply with this policy. From the causal Bayes net and the frequencies given above, determine the probability that students pass the exam after this intervention.

7.26 Construct causal Bayes nets for simple examples of causal relationships with (a) a common cause structure, (b) a common effect structure, and (c) a chain structure.

7.27 Tillbourg College admits students who are either brainy or sporty (or both). Let C denote the event that someone is admitted to Tillbourg College, which is made true if they are either brainy (B) or sporty (S). Suppose in the general population, B and S are independent. Draw a causal Bayes net to represent this situation, defining all relevant variables and probabilities. If you learn that all students at Tillbourg College are sporty, what can you infer about the value of S? Explain your reasoning.

7.28 Give an example of explaining away, a situation in which discovering one causal relationship diminishes the probability of some presumed cause.

7.29 Suppose that we measure the variables storm (S), barometer reading (B), and atmospheric pressure (A). You find that storm and barometer reading are probabilistically dependent, as are barometer reading and atmospheric pressure, and storm and atmospheric pressure.
Furthermore, you find that storm and barometer reading are independent given atmospheric pressure. From these constraints alone (assuming the causal Markov condition and faithfulness hold), what underlying causal structures can you infer? For each, provide a causal Bayes net.


FURTHER READING

For more on the psychology of causal reasoning, see Sloman, S., & Lagnado, D. (2015). Causality in thought. Annual Review of Psychology, 66, 223–247.

Pasteur’s influence on the history and sociology of medicine is described in more detail in Latour, B. (1993). The pasteurization of France. Cambridge: Harvard University Press.

For an account of the difference-making view of causation and its importance in scientific explanation, see Woodward, J. (2003). Making things happen: A theory of causal explanation. Oxford: Oxford University Press.

For a pluralist view of the nature of causation and discussion of causal analysis, including causal Bayes nets, see Cartwright, N. (2007). Hunting causes and using them: Approaches in philosophy and economics. Cambridge: Cambridge University Press.

For advanced treatments of causal modeling, see Pearl, J. (2009). Causality: Models, reasoning, and inference, 2nd edition. New York: Cambridge University Press. Also see Spirtes, P., Glymour, C., & Scheines, R. (2001). Causation, prediction, and search, 2nd edition. Cambridge: MIT Press.


CHAPTER 8

Explaining, Theorizing, and Values

8.1 UNDERSTANDING THE WORLD

After reading this section, you should be able to do the following:

• Articulate the roles played by explanation in science
• Describe the nomological, pattern-based, and causal conceptions of explanation
• List advantages and problems of each of these three conceptions of explanation


The Workdays of Taxi Drivers

Many taxi drivers in New York City are independent workers. They lease their cabs for a fixed fee, or they own them outright. They keep the fares they earn. And they can call it a day at any time during their shift. Some days are good for business: during weekdays when Wall Street is open, when it’s rainy, or when some big event is in town. On a good day, taxi drivers spend less time cruising around searching for customers, and they earn a relatively high hourly wage. Other days are bad, however, and taxi drivers may have a hard time finding customers. So, although taxi fares are set by law, taxi drivers’ daily wages can fluctuate significantly. Given this variation in their hourly wages from day to day, how do taxi drivers in New York City choose the number of hours they work each day?

Economists can answer this question by appealing to the law of supply. According to the law of supply, there is a direct relationship between price and quantity of goods and services: generally, as the price of an item increases, suppliers will attempt to increase their profit by increasing the quantity of items offered for sale. So, people will sell more of something when the price is high than when the price is low. Taxi drivers in particular will tend to sell more of their labor hours—that is, they will tend to work longer—when wages are higher than when they are lower. In other words, they will work more when it really pays off and cut out early on bad days when it doesn’t.

The law of supply—along with its counterpart, the law of demand—is one of the most fundamental and intuitive explanatory principles in economics. Assuming people strive to do what is in their best interest, economists invoke general principles like the laws of supply and demand to explain how people set the prices of goods and services and how people allocate resources like their time. When
When an employer pays higher overtime hourly rates, the number of hours employees are willing to work increases. When

consumers are willing to pay more for a slice of pizza than for a cupcake, bakeries will increase their production of pizza and reduce their production of cupcakes. The law of supply captures the relationship between price changes and suppliers’ behavior, as in these examples.

Psychologists give a different answer to the taxi-driver question. There is a theory about daily income, called the ‘daily-income-targeting theory’, that appeals to two psychological tendencies. One tendency is that, when confronted with multiple related decisions over a period of time, people often consider the merits and weaknesses of only a single decision at a time, instead of considering the consequences of all decisions at once. The second psychological tendency is loss aversion: people dislike losing money or other resources more than they enjoy gaining similar amounts. Applied to taxi drivers, these tendencies suggest that drivers decide how much to work day by day instead of all at once and that they generally will resist quitting until they reach their daily target income. This predicts that taxi drivers will work longer hours on low-wage days and quit early on high-wage days. This is, of course, the opposite of what economists’ law of supply predicts.

A group of economists and psychologists tested these competing predictions by carrying out a field study, analyzing data about New York taxi drivers’ behavior from the years 1988, 1990, and 1994 (Camerer et al., 1997). Their data indicated that less-experienced drivers tend to work more hours on bad days, when working does not pay off, and to clock off too early on good days. The income-targeting theory explains this apparently irrational behavior in a simple way: inexperienced taxi drivers use a simple rule of thumb—a heuristic—that guides them to aim for a certain amount of earnings over a certain period of time.
If they are falling behind that rate, they work longer to catch up, and if they are ahead, they quit early. The data showed that more experienced taxi drivers don’t display this pattern of behavior. To figure out why, the researchers evaluated their data sets with an eye to other possible explanations. Two plausible explanations emerged: taxi drivers may learn with experience to resist the temptation to quit early on good days, or they may simply learn that driving a fixed number of hours each day is more efficient than aiming for a certain amount of money. Neither of these possible explanations appeals to general economic principles. Taxi drivers, inexperienced or experienced, don’t seem to act in accordance with the law of supply.


Explanation, Understanding, and Scientific Knowledge

In Chapter 1, we discussed how science aims at the production of knowledge—an aim that is constitutive of the very meaning of the word science. There are many kinds of knowledge, each of which can be important. But the kind of knowledge that science aims to produce is distinctive: scientific knowledge is explanatory knowledge of why or how the world is the way it is. In the case of taxi drivers, scientists have used both the law of supply and the daily-income-targeting theory to attempt to explain how taxi drivers decide how long to work each day. This can help scientists understand more generally how humans make decisions about things like time and revenue.

Unfortunately, the explanation issuing from the law of supply doesn’t seem to adequately account for the behavior of actual taxi drivers, and so it doesn’t produce knowledge of how or why drivers make the decisions that they do. The explanation based on

the daily-income-targeting theory does a better job at accounting for drivers’ behavior. This explanation seems to help us understand taxi drivers’ decisions about how long to work each day, and it may be a promising start for explanations of other, similar human behavior (for a nice example see Camerer, 1997).

To say that science aims to produce a special kind of knowledge is not to say that scientific explanations are entirely different from ordinary, everyday explanations. The explanatory knowledge produced in science is a special kind of knowledge, explicitly supported by evidence through the use of methods discussed in this book. But there’s significant overlap between scientific and everyday forms of explanation. All of us sometimes notice things that cry out for explanation. We routinely ask questions such as: ‘how much does drinking corrode the liver?’, ‘why did the economic crisis happen?’, ‘why do colleges and universities have vastly more highly paid administrators than they used to, given steep declines in public funding for higher education?’, and, of course, ‘how did the dinosaurs go extinct?’

Even children regularly engage in this pursuit of explanatory knowledge. Many have wondered why the sky is blue. A parent might quickly answer that the sky is blue because it looks that way to us or because that’s just the way the sky is. Such answers don’t explain why the sky is blue; they offer no insight into why or how the phenomenon is the way it is. A satisfying explanation of why the sky is blue relies on some sophisticated scientific theorizing: sunlight travels in straight lines unless some obstruction either reflects it, like a mirror; bends it, like a prism; or scatters it, like the molecules of gas in the Earth’s atmosphere. Because blue light has shorter wavelengths, it is scattered more than other colors in the spectrum. That’s why we normally see a blue sky.
In contrast to most parents’ quick answers to this question, this explanation appeals to other facts about the world and scientific laws or theories in order to give a deeper understanding of the phenomenon in question.

Generating explanations serves a variety of cognitive roles. It facilitates learning and discovery, and plays a central role in confirmation and reasoning. As we discussed in Chapter 4 in relation to abductive reasoning—also known as ‘inference to the best explanation’—explanatory considerations can be used as evidence in support of a hypothesis, making the hypothesis more credible. With respect to learning, generating explanations to oneself or to others facilitates the integration of new information into existing bodies of knowledge and can lead to deeper understanding; this is called the self-explanation effect. Performance on a variety of reasoning tasks, including logical and probabilistic tasks, can be improved when one is asked to explain. This is why explaining the study material and responding to explanatory questions is such a good way to learn new material encountered in a course. Instructors and tutors learn material faster and with more depth of insight by virtue of explaining it to others.

Perhaps most important among these cognitive roles, explanation produces understanding. Understanding the world around and within us is a supreme achievement that is absolutely central to science. Understanding involves grasping why or how something came about or is the way it is. This makes it possible for us to intervene in the world and to anticipate what will happen next. When we understand how a system works—say, the tidal system of the San Francisco Bay, an example from Chapter 3—we are able to anticipate how changes in some features of the system (like the Reber Plan) will lead to changes in some other features (tides, salinity, and so forth). When explanations generate genuine understanding, they can satisfy our curiosity.
To satisfy our curiosity and have that experience of ‘Aha! Now I get it!’ can feel really good.

Potochnik, Angela, et al. Recipes for Science : An Introduction to Scientific Methods and Reasoning, Taylor & Francis Group, 2018. ProQuest Ebook Central, http://ebookcentral.proquest.com/lib/purdue/detail.action?docID=5584122. Created from purdue on 2021-08-29 21:53:53.

278

Explaining, Theorizing, and Values

Copyright © 2018. Taylor & Francis Group. All rights reserved.

Psychologist Alison Gopnik (1998) once likened understanding to orgasm. Sex evolved to feel good because it leads to babies, which is needed for a species to continue. Similarly, Gopnik reasoned, understanding is enjoyable because explanations are tremendously helpful to people getting around in the world. And so, the desire to satisfy our curiosity has led humans to ever more sophisticated and accurate theories about our world.

The satisfaction of curiosity is no guarantee of a good explanation, though. People can have a sense that they understand something without genuinely understanding it—explanations can be wrong. People also often fall prey to an illusion of explanatory depth, believing they understand the world more clearly and in greater detail than they actually do. We all regularly overestimate our competence and depth of knowledge; recall our discussion in Chapter 1 of the cognitive errors, like confirmation bias, which science is designed to correct for.

An illustration of how one can be dangerously misled by the feeling one understands something is the public reception of climate change research. As you may recall, climate change was originally called ‘global warming’. But this terminology misled many people about what they should expect to experience. When a season was not warmer than usual in some particular location, some people were tempted to doubt the reality of climate change—it seemed to them like things weren’t getting warmer after all. But climate change does not produce warmer temperatures in every location at every point in time. Instead, it produces a global increase in average temperatures and increasingly extreme weather and storms along the way. Unfortunately, some people—including some politicians who shape how nations respond to climate change—still disregard scientific knowledge of climate change because of apparent conflicts with the daily weather they experience. Figure 8.1 pictures Oklahoma

FIGURE 8.1 Oklahoma Senator James Inhofe speaking before the US Congress in 2015 while brandishing a snowball. Reproduced from C-SPAN.


Senator James Inhofe speaking before the US Congress in Washington D.C. in February 2015. Inhofe brought a snowball to illustrate that it was (he claimed) unseasonably cold outside. In fact, it was not unusually cold in D.C., and meanwhile, the West Coast of the United States was unusually warm. The year prior, 2014, had the warmest average temperatures in recorded history, and the Earth has continued to warm in the years since.

Another example of the illusion of explanatory depth concerns public reception of neuroscientific information. Experimental data suggest that people are often misled into judging bad psychological explanations as better than they really are when accompanied by completely irrelevant neuroscientific information. This ‘seductive allure’ of neuroscientific explanations might interfere with people’s ability to critically evaluate the quality of an explanation (Weisberg et al., 2008). Coupled with an illusion of explanatory depth, this interference can have negative practical effects when, for example, it is exploited by advertisements for ‘brain training’ that promise brain enhancement ‘proven by neuroscience’. This is the opposite of the climate change case. Instead of scientific expertise being disregarded because of personal experience, scientific credibility is misapplied to get people to believe something there’s not actually sufficient evidence for.

Given the centrality of explanation to the scientific enterprise and the potential for all people, including scientists, to feel like they understand something even when they do not, it’s an important task to clarify the nature of scientific explanation. If we can say what features good explanations must have, then we will be better able to judge whether something counts as an adequate explanation.
One simple idea is that explanations are just true answers to why or how questions, such as ‘why is the sky blue?’ or ‘how do bicycles move?’ But we have suggested that some true answers to the question of why the sky is blue, like ‘because that’s the way it is’, don’t count as explanations. So, we need a way to determine when a true answer to a why- or how-question is a good explanation.

What features should good answers to why- or how-questions have? Philosophers of science and some scientists have thought long and hard about this question. The possible answers relate to other topics we have discussed in this book. Some have suggested that explanations should cite laws in order to account for phenomena, either deductively or probabilistically. Another idea is that explanations should show how phenomena fit into patterns. Others have suggested that explaining is a kind of causal reasoning and that explanations should say what causes a phenomenon.

Nomological Explanation and Pattern Explanation

Let’s consider these conceptions of explanation in greater depth. The first is that successful explanations appeal to scientific laws. This idea is at the heart of the nomological conception of explanation (from the Greek nomos, meaning law). According to this conception, a scientific explanation references a law that can account for the phenomenon to be explained. The nomological conception of explanation was developed most fully by the German philosopher of science Carl Hempel (1905–1997). Hempel proposed that explanations are arguments that appeal to general scientific laws to derive statements about the occurrence of the phenomena we want to explain. Explanations demonstrate that there are one


or more scientific laws or principles that, together with background conditions, make it so that the phenomenon to be explained was to be expected. So, according to Hempel, nomological explanations have a form like this:

1. L1, …, Ln
2. C1, …, Cn
∴ 3. E

In this scheme, L1, …, Ln are statements of general laws, such as the laws of supply and demand in economics. C1, …, Cn are statements of background conditions, such as the actual price and quantity of a good in some market at some time. And E is a statement of the phenomenon to be explained, like a dramatic decrease in the number of people taking taxis over the past year. Hempel believed that knowing the law and background conditions would lead people to realize the phenomenon in question was to be expected. By rendering phenomena expectable, scientific explanations reveal our world to be ordered, proceeding in accordance with general laws.

Thus, if you want to explain why people are taking fewer taxis, you may begin by stating the law of demand: all other factors being equal, as the price of a good increases, the quantity of goods demanded by consumers decreases, and as the price of a good decreases, the quantity demanded increases. Then you may point out an increase in rideshare programs and cycling incentives and the advent of companies like Uber and Lyft. (See Figure 8.2 for some relevant data.) From these background conditions and the law of demand, it follows that taxi rides have gotten comparatively more expensive. And so, as the law of demand predicts, many people who previously bought taxi rides are now

FIGURE 8.2 Ridership data for Yellow Taxis and Uber in New York City, 2015–2017 (trips per day, January 2015 through June 2017), based on data from reports by the New York City Taxi & Limousine Commission


doing so less often; they can instead use other, cheaper forms of transportation. That’s why fewer people take taxis.

Hempel thought that some nomological explanations were valid deductive arguments, while others were strong inductive arguments. (See Chapter 4.) As in the preceding explanation scheme, the premises must include at least one statement of a scientific law—a general pattern or regularity. The premises also must have empirical content, so they can be tested.

Many scientific explanations fit this nomological conception of explanation. Consider how scientists might explain the increase in the average global temperature of Earth’s atmosphere. One can begin by pointing out that atmospheric density changes in proportion to the permeability of the atmosphere to solar radiation and that the permeability of the atmosphere to radiation is directly correlated with average surface temperature. These are law-like generalizations that describe patterns and regularities in nature. Next, note that the atmospheric density on Earth has increased (because of greenhouse gases). This is a background condition, a fact about current circumstances. Together, these claims deductively imply the conclusion that the Earth’s average temperature has increased. This argument is deductively valid with all true premises, so we have a simple nomological explanation of global warming.
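To make the direction of derivation in the nomological schema vivid, here is a deliberately toy sketch in code (our illustration, not Hempel’s): a function stands in for the law L, a variable for the background condition C, and the explanandum E is derived from the two.

```python
# Toy deductive-nomological "explanation" (illustrative only):
# the explanandum E is derived from a law L plus a background condition C.

def law_of_demand(relative_price_rose):
    """L: all else being equal, if a good's relative price rises,
    the quantity demanded falls; if it falls, demand rises."""
    return "demand falls" if relative_price_rose else "demand rises"

# C: cheaper alternatives (rideshares, cycling incentives) made
# taxi rides comparatively more expensive.
taxis_relatively_more_expensive = True

# E: the phenomenon to be explained follows from L and C.
E = law_of_demand(taxis_relatively_more_expensive)
print(E)  # demand falls
```

The point is just the structure: given the law and the background condition, the phenomenon to be explained was to be expected.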

Box 8.1 Scientific Laws


Laws play an important but controversial role in science. Historically, scientists have taken the purpose of science to be formulating laws, which can in turn be used to provide explanations and make predictions. Examples include Newton’s law of gravitation in physics, Arrhenius’s equation in chemistry, and the laws of supply and demand in economics. Laws often take the form of universal generalizations: rules for inferring what, in general, follows from some set of conditions. The law of supply, for example, is a way to infer the relative price of goods from the quantity of goods supplied. Newton’s law of gravitation is a way to infer the force between two bodies on the basis of their masses and the distance between them. However, some scientific statements of regularities do not seem to qualify as laws. Genuine laws of nature are frequently said to possess most or all of the following features (among others):

non-trivial
general
true
exceptionless
based on evidence
explanatory
systematic
precise
predictive

Although many scientific laws satisfy many of these criteria, few if any satisfy them all. Philosophers thus debate what is required for something to count as a law and whether all scientific inquiry involves discovering laws.

Just as phenomena can be explained by laws, scientific laws themselves can be explained by appealing to other, more comprehensive laws. For example, consider Galileo’s law that bodies fall toward Earth at a constant acceleration. This law can be deductively derived from the Newtonian law of gravitation. The Newtonian force of gravity explains the


constant acceleration of bodies falling toward Earth. Newtonian laws, in turn, can be explained by appealing to the principles of the more comprehensive general theory of relativity developed by Einstein. The Earth’s gravity is explained as a distortion of space caused by the Earth’s mass. Objects speed up as they fall toward Earth, just as a ball rolling from the edge to the center of a bowl speeds up.

The idea of explaining scientific laws with reference to other, more general laws introduces a second conception of explanation. According to the pattern conception, explanations fit particular statements about phenomena into a more general framework of laws and principles. This has been called a unification conception of explanation, since the number of assumptions and beliefs required to explain phenomena decreases when an explanation is provided. Phenomena, and laws as well, are unified by uncovering the basic patterns that govern them. One advantage of the pattern conception over the nomological conception is that there’s no requirement of citing laws. Pattern explanations can cite regularities that may not qualify as laws. In place of the law requirement, there’s an emphasis on fitting the phenomenon to be explained into a wider pattern, to see it as one instance of a more general regularity of the world that has been identified.

Earlier, we described the simple explanation of decreased taxi ridership as a nomological explanation, but it can also be construed as a pattern explanation. The phenomenon of decreased taxi ridership is explained as one instance of the general pattern whereby higher prices drive decreased demand and vice versa, a pattern that also applies to purchases as different from taxis as pizza, pomegranates, and tickets to the cinema.

Many scientific explanations fit the pattern conception of explanation rather well. Consider evolutionary theory.
This theory explains a great many phenomena involving the traits of organisms and the relationships among them with reference to a simple pattern that plays out in a multitude of ways. The pattern at the heart of evolutionary theory is that natural selection acts on variation among organisms to produce cumulative change in species. The theory of evolution is not a single, general law of nature; it recognizes many different influences on evolution besides natural selection and random variation, which proceed in various ways depending on various factors. Many evolutionary explanations are thus not productively viewed as nomological explanations. But they do fit the pattern conception rather well.

The ideas behind the nomological and pattern conceptions of explanation—that explanations make phenomena less surprising by referencing laws or by showing how they fit into a wider pattern—are undoubtedly important. These ideas describe important features of many scientific explanations. But both also face significant objections. A significant problem for both is that they neglect a key feature of explanation: asymmetry. If one thing explains another, then this explanatory relation does not seem to hold in reverse.

Consider the following example. Your mobile phone sends you a weather alert, and you explain this both with reference to the fact that a storm is approaching and the generalization that weather alerts are sent out when storms are approaching. But, it seems, you can’t explain the approaching storm by citing the weather alert you received, along with the generalization that weather alerts are sent out when storms are approaching. This mixes things up: the storm isn’t approaching because you received the weather alert. The weather alert gives you evidence of the storm, but it can’t explain the storm.


Yet, the nomological and pattern conceptions of explanation don’t recognize this asymmetry. Consider the nomological account. Suppose that, in general, weather alerts are sent out when storms are approaching. Then, from this generalization and the premise that a storm is approaching, you could explain why you received a weather alert. This is a valid argument as required for nomological explanation. But the generalization about weather alerts being sent when storms are approaching and the premise that you received a weather alert can also be used to deductively infer that a storm is approaching. There is a valid deductive argument whether the weather alert or the approaching storm plays the part of background information. And yet, the storm is a good explanation for why you received a weather alert, but the weather alert is no explanation for why the storm is approaching. You can do a lot with your mobile phone, but you can’t usually bring about a storm.

The problem is similar for the pattern conception of explanation. There is a general pattern relating weather alerts to approaching storms. What’s to say that this pattern can explain weather alerts but can’t explain storms? That difference isn’t accounted for by the pattern conception.

A second problem for the nomological conception is that many good explanations don’t appeal to any laws. We have already suggested that some evolutionary explanations are successful without appealing to laws. Here’s another example. Why dinosaurs went extinct some 65 million years ago is explained by one of two hypotheses: either there was a massive bout of volcanism or an enormous asteroid hit the Earth. Either event would have had dire consequences for Earth’s climate and for dinosaurs’ ecosystems, and whichever occurred caused dinosaurs’ extinction because of those consequences. But neither of these explanations involves a general law of nature.
We can’t say, in general, what to expect on the basis of a volcano or asteroid collision. This depends on numerous circumstances related to the nature of the catastrophic event, the organisms in question, and other factors.

There is a related second problem for the pattern conception. The pattern conception focuses on explanations that fit a phenomenon into a wider pattern. But some explanations seem to be highly specific. Consider the explanation for how the human heart pumps blood. This explanation may not apply to the function of the hearts of other kinds of organisms. This is because hearts and other organs vary across species, and their differences are more significant the more distantly related organisms are. Something similar is true for the explanation for why dinosaurs disappeared. Both the volcanism explanation and the asteroid explanation are highly specific. They rely on particular conditions on Earth over 65 million years ago to help account for this extinction event. Nothing guarantees that these circumstances will ever recur; the explanation might account for only this one phenomenon, ever. So, they do not describe general patterns. Still, it seems like whichever of these is true is a good explanation.

Here’s one final concern with the nomological and pattern conceptions. Discussion of laws has been decreasing in science. The decline is perhaps most evident in the psychological sciences. Psychologists are spending less and less time discovering and appealing to laws in their research. A bibliometric study of abstracts from the PsycLit database—indexing psychological research papers and journals—during the last century (1900–1999) looked at over 1.4 million abstracts and found 3,093 citations of law—an average of 22 citations per 10,000 entries (Teigen, 2002). As shown in Figure 8.3, the average number of such references significantly dwindled over time. Further, the laws psychologists are most


FIGURE 8.3 Occurrence of the word law in PsycLit abstracts per 10,000 entries, by decade, 1900–1999 (Teigen, 2002)

concerned about or familiar with were discovered long ago, with the most commonly cited laws discovered from 1834 to 1957. If psychology is any guide, the nomological conception of explanation is in trouble. And the pattern conception’s emphasis on broad patterns might be plagued with similar difficulties. Over this same period of time, psychologists have established very few general relationships between empirically measured variables—that is, very few general patterns.


Causal Explanation

Many laws and patterns in phenomena are also called effects. For example, in psychology, there is the Garcia effect: an aversion to a particular taste or smell associated with a negative reaction. This is why you might have trouble ever again eating whatever food you had right before a bout of stomach flu. There are the primacy and recency effects, according to which people recall more easily items at the beginning (primacy) and items at the end (recency) of a list. And there is the self-explanation effect, described earlier in this chapter. This is where explaining something to yourself boosts your learning and helps you integrate new knowledge with existing knowledge.

The convention of referring to certain patterns as effects isn’t limited to psychology. Consider the Larsen effect in acoustics. A public-address (PA) system has at least several major components: microphone, mixing console and soundperson, amplifier, and loudspeaker. If the soundperson registers that the broadcast is inaudible to the audience, she can adjust the volume level via the mixing console to increase the microphone’s input sensitivity, so it can pick up the speaker’s vocalizations more effectively, which the loudspeaker puts out via the amplifier. This system is a basic homeostatic mechanism involving feedback. But if volume levels increase beyond optimal values, the loudspeaker


can emit an unpleasant, high-pitched, runaway squeal. This feedback pattern is called the Larsen effect.

The Larsen effect can be invoked to explain why there’s a squeal when a soundperson adjusts the volume on a PA system. But this effect is also something that stands in need of explanation. The explanation of the Larsen effect is that the microphone pickup locks on to, or couples with, the natural vibration produced by the loudspeaker, which causes them to begin resonating together. This pure tone resonance, or ‘ring’, causes the loudspeaker to further increase in efficiency, and the microphone picks it up again and relays it back to the loudspeaker. The coupling process is repeated at the speed of sound, and as the set-points for minimum and maximum volume are exceeded, the resonance seizes the system with abnormal levels of gain. This transition occurs very suddenly, temporarily arrests the broadcast, and is dangerous to the system (including the ears of people in the audience) if left unattended.

Consider a second pattern that also involves homeostasis, that is, a stable equilibrium among interdependent elements. The scientific explanation of an organism’s regulation of blood sugar appeals to homeostatic systems that use pancreatic endocrine hormones to maintain blood sugar within a certain range (≈70–110 milligrams of glucose per 100 milliliters of blood). If blood sugar decreases below this range, pancreatic alpha cells secrete glucagon, which causes the liver to release stored glucose. If blood sugar increases above the range, pancreatic beta cells secrete insulin, which causes adipose tissue to absorb glucose from the blood. This explanation is also part of the explanation of diabetes, which is a disorder characterized by the pancreas producing insufficient amounts of insulin. These patterns, or effects, seem to be explained by describing their causes.
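The contrast between the runaway positive feedback of the Larsen effect and the stabilizing negative feedback of blood-sugar regulation can be sketched with two toy models (the loop gain and rate constant below are made up for illustration; they are not acoustic or physiological values):

```python
# Two toy feedback loops (illustrative parameters only).

def larsen(amplitude, loop_gain, passes):
    """Positive feedback: each pass through microphone -> amplifier ->
    loudspeaker multiplies the signal by loop_gain. A gain above 1
    means the signal grows without limit: the runaway squeal."""
    for _ in range(passes):
        amplitude *= loop_gain
    return amplitude

def regulate_glucose(glucose, steps, rate=0.5):
    """Negative feedback: nudge blood sugar back toward the 70-110
    mg/dL range. Above the range, 'insulin' pulls it down; below
    the range, 'glucagon' pushes it up."""
    for _ in range(steps):
        if glucose > 110:
            glucose -= rate * (glucose - 110)
        elif glucose < 70:
            glucose += rate * (70 - glucose)
    return glucose

print(larsen(1.0, 1.2, 20))       # gain above 1: the signal grows explosively
print(regulate_glucose(180, 20))  # settles back near the normal range
```

With loop gain above 1, the deviation grows on every pass; with the corrective pancreatic response, the deviation shrinks on every step. The same feedback idea with opposite sign yields opposite behavior.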
The Larsen effect is caused by a coupling between the microphone pickup and the loudspeaker’s vibration, and this explains the volume feedback. According to the causal conception, explanations appeal to causes that bring about the phenomenon to be explained. The causal conception seems to account well for many explanations in science, including especially in fields that do not deal with laws. As we emphasized in Chapter 7, knowledge of causes enables prediction and manipulation of phenomena, via intervention on causal factors. It’s plausible that explaining those causal factors is also central to understanding.

In one variety of causal explanation, the focus concerns how causal factors regularly combine into complex systems that produce the target phenomenon. The blood sugar regulation example exhibits this nicely. Pancreatic hormones, liver tissue, and blood sugar ordinarily work together in complex ways to maintain blood sugar levels within a narrow range. Some call this variety of causal explanation mechanistic. The search for causal mechanisms seems to play an especially important role in some parts of the social and life sciences.

A causal conception of explanation can address the concerns raised earlier with the nomological and pattern conceptions. First, causal explanations are automatically asymmetric: causes explain their effects, but effects cannot explain their causes. This solves the symmetry problem of nomological and pattern accounts. The reason why appealing to the storm explains your mobile phone’s weather alert, but appealing to the weather alert doesn’t explain the storm, is that the storm’s approaching is a causal factor in producing your mobile phone’s alert; but the alert didn’t cause the storm.

Second, some causal relationships occur in very general patterns or are law-like in nature, but others do not. If you heat ice, it will melt or evaporate. There are virtually no


exceptions to this. If you heat chocolate, it will usually melt—but if you heat it too quickly, it gets thick and lumpy instead. This is a general pattern, but it has some exceptions. In contrast, perhaps the volcanic or asteroid episode that led to the dinosaurs’ extinction was an event that will happen only once in the Earth’s history. Perhaps background conditions had to be just right for such an event to cause a major extinction. But all of these, from the law-like to the highly particular, are still cause-effect relationships. This resolves the second concern with the nomological and pattern accounts: causal explanations can range from the highly general to the highly specific.

The third concern raised with the other conceptions of explanation is that laws and general patterns seem to be of decreasing importance in science. In contrast, we suggested in Chapter 7 that causal reasoning is absolutely central to science.

Yet the causal conception of explanation faces its own difficulties. First, as we also surveyed in Chapter 7, there is no consensus about the nature of causation. So, there are sometimes disagreements about whether a given explanation captures genuine causes. For example, are we sure that the economic law of demand is the kind of thing that can causally explain a decrease in taxi use? Some people respond to this concern by adopting a very inclusive view of causation. Others think that some explanations cite causes, and others cite other kinds of regularities, like mathematical regularities.

A second difficulty with the causal conception of explanation stems from the observation that phenomena often have many causes. For this reason, causal explanations may come too easily. Causal explanations often cite only one or a few causal influences, when we know there are many causal influences on the phenomenon that’s explained. How is this enough to explain the phenomenon?
Some respond to this challenge by saying that the more causal information you can give, the better explanation you have. Others seek another principle to decide what causal information belongs in an explanation.

A third difficulty for the causal conception of explanation results from simply pointing out its difference from the nomological and pattern conceptions. If it is right that general patterns and scientific laws help us understand the world, at least sometimes, then the causal conception of explanation is lacking. This is because the causal conception doesn’t give us a way to recognize the explanatory value of general laws or patterns.

So far, we have talked about these three conceptions of explanation as if one is right and the others wrong. But it’s possible that each conception captures certain elements of what helps us understand the world. One initial reason to think this might be so is that each of these conceptions of explanation aptly characterizes some, but not all, of the examples of successful explanation we have discussed. Perhaps laws, patterns, and causes all can contribute to our understanding, and so any of these can be an ingredient of explanation.

EXERCISES

8.1 First, rate your knowledge and familiarity with bicycles on a scale from 1 (‘I know little or nothing about how bicycles work’) to 7 (‘I have a complete understanding of how bicycles work’). Figure 8.4 is a partial sketch of a bike; you will notice that it’s missing some parts. Try to finish the drawing, adding in your own sketch of the pedals, chain, and the missing pieces of the frame.


FIGURE 8.4 Partial sketch of a bicycle, with labels for the frame, pedals, and chain


Rate once again your knowledge and familiarity with bicycles on a scale from 1 (‘I know little or nothing about how bicycles work’) to 7 (‘I have a complete understanding of how bicycles work’). Did your rating go up, down, or stay the same? (Adapted from Lawson, 2006.)

8.2 Describe the illusion of explanatory depth in your own words. Then, think through possible explanations for this illusion. Describe the possible explanation that you think is most promising, and say what might help you avoid the illusion of explanatory depth if your explanation is correct. Finally, describe how you might be able to test that explanation to see whether it’s correct.

8.3 From your background knowledge and the information provided in this section, do your best to answer each of the following questions.
a. Why is the sky blue?
b. Why is December cold in Sweden but warm in Australia?
c. How do earthquakes happen?
d. How does cancer kill an organism?
e. Why do objects fall when dropped?
What are the common features of the explanations you’ve given? What are some differences, and what do you think accounts for them?

8.4 For each of your explanations in 8.3, identify what conception(s) of explanation fits it best and say why. Then reflect on all of your answers together, and describe what you notice. For instance, if you answered in the same way about all or most of the explanations, why do you think that’s so? If you answered in different ways, what do you think accounted for the difference? Is there any general form—any common features—to your explanations?

8.5 After looking back at the box on scientific laws, consider the following argument: if criteria for lawfulness are necessary criteria, then something must satisfy them all to count as a genuine law of nature. The generalizations found in psychology, biology, and other disciplines do not satisfy all these criteria. So, there are no genuine laws in psychology, biology, and other disciplines. But if scientific explanation is nomological, it requires genuine laws. Thus, there are no explanations in psychology, biology, and perhaps other fields of science. We’re pretty confident this conclusion is false, but the argument is deductively valid. So, at least one of its premises must be false. Decide which premise you think is mistaken and develop an argument defending your view.
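To see where the weight of this argument falls, it can help to set it out more formally. Here is one first-order sketch; the predicate names and labels are ours, not the book’s:

```latex
\begin{align*}
\text{P1:}\quad & \forall x\,\bigl(\mathrm{GenuineLaw}(x) \rightarrow \mathrm{SatisfiesAllCriteria}(x)\bigr)
  && \text{the criteria are necessary for lawfulness}\\
\text{P2:}\quad & \forall x\,\bigl(\mathrm{InPsychOrBio}(x) \rightarrow \neg\,\mathrm{SatisfiesAllCriteria}(x)\bigr)
  && \text{generalizations in these fields fall short}\\
\text{C1:}\quad & \forall x\,\bigl(\mathrm{InPsychOrBio}(x) \rightarrow \neg\,\mathrm{GenuineLaw}(x)\bigr)
  && \text{from P1, P2 by modus tollens}\\
\text{P3:}\quad & \text{Every (nomological) explanation cites a genuine law.}\\
\text{C2:}\quad & \text{There are no explanations in psychology or biology.}
  && \text{from C1, P3}
\end{align*}
```

Since the form is valid, rejecting the conclusion C2 means rejecting P1, P2, or P3; the exercise asks which premise you would give up, and why.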


8.6 Choose one conception of explanation: nomological, pattern, or causal. Find a novel example of a scientific explanation that seems to conform to that conception. Describe the example, making clear how an explanation is given. Then describe why this example should be seen as conforming to the conception of explanation you chose.

8.7 Construct a chart or table listing the strengths and weaknesses of each of the three conceptions of explanation discussed in this section. Decide which conception(s) of explanation is the most promising, and support your answer with an argument.

8.9 In your own words, describe why explanation is important to science.

8.10 Thinking broadly about the topics you’ve encountered in this book, describe what you think is distinctive about scientific knowledge compared to other forms of knowledge and everyday information. Then, describe what features of science and scientific knowledge are similar to other forms of knowledge and everyday investigation.

8.2 THEORIZING AND THEORY CHANGE

After reading this section, you should be able to do the following:
• Describe the role of theorizing in science
• Define scientific breakthrough and give an example
• Outline Kuhn’s view of the four stages of science
• Articulate two challenges for scientific knowledge from scientific breakthroughs and at least one response to those challenges


Scientific Theories

Consider the ground we’ve covered in the chapters of this book so far. In Chapter 1, we considered what is distinctive about science. Chapter 2 focused on experiments and other ways of testing hypotheses with observation. Chapter 3 looked at modeling, another way of investigating hypotheses. Chapters 4–7 have all been about aspects of this same process of subjecting hypotheses to empirical tests: deductive, inductive, and abductive patterns of reasoning in scientific arguments; the role of statistics in representing data and testing hypotheses; and the significance of causal hypotheses. All of this fits in some way with the basic ingredients of recipes for science we laid out in Chapter 1: developing a hypothesis, formulating expectations on the basis of the hypothesis, and testing expectations against observations. At the same time, there is also remarkable variation in how science proceeds—recipes, not a single recipe—and we have tried to give a sense of that in how each of these topics has been addressed.

Still, recipes focused on hypotheses, expectations, and observations are not all there is to science. We have already seen in this chapter that a central aim of science is explaining our world. Scientists aren’t simply accumulating a list of confirmed hypotheses, the facts we know about our world and ourselves. The project is bolder: scientists are charged with helping us understand why and how things happen. And scientists are asked to furnish us with tools for predicting and changing the world around us. Scientists also create scientific
theories, which are large-scale systems of ideas about natural phenomena, more general and more elaborate units of knowledge than individual hypotheses typically are, and with much more evidence to support them. Scientific theories thus provide bigger and more powerful insights into the world.

Theories often go beyond what is readily observable. The Darwinian theory of evolution by natural selection is a grand theory about the origins of all the diverse life forms on Earth, and Einstein’s theory of relativity is a grand theory about the very nature of space and time. To be clear, empirical evidence has been central to testing and confirming both of these theories. But the content of these and other theories is usually taken to be larger than their readily observable implications. Evolutionary theory, for example, indicates what happened in the earliest years of life on Earth. Relativity theory tells us what would happen if we travelled at the speed of light, and it also gives us a reason for believing nothing but light will ever travel that fast.

In common usage, calling an idea a ‘theory’ sometimes indicates that it hasn’t been tested out. Scientific theories are not like that. Quite the opposite: they are important human accomplishments, as both the Darwinian theory of evolution and Einstein’s theory of relativity illustrate. Yet, because scientific theories have implications that are not immediately observable, they are never taken to be true beyond a shadow of a doubt, no matter how much empirical data support them. Scientists have excellent justification for the theories of evolution, relativity, and, say, the atomic composition of matter. Even so, the possibility is held open that someday one or another of these theories, or another theory among our prized scientific achievements, will be replaced by a better theory. This possibility is intrinsic to the open and self-correcting nature of science.
Just as scientific theories go beyond the readily observable, theories also come about not simply by extrapolating, or generalizing, from observations. Instead, there is usually a significant conceptual shift, some feat of imagination, that gives rise to a new way of thinking about observations. Darwin wondered whether the similar forms of life he observed across continents might not suggest they dispersed from some ancient, common ancestor. And he was inspired by an economist, Thomas Robert Malthus (1766–1834), who wrote about the pressures to survive created by population increases. Einstein was inspired to reconsider the very nature of space and time by the puzzle of how to synchronize distant clocks and by how observers’ experiences vary depending on whether they are in motion. In both cases, extensive observations were subsequently obtained to empirically support the theories. But the initial idea was a kind of spark of insight, a different way of thinking about what was already known about the world.

Scientific Breakthroughs

No scientific theory is set in stone, and theories are sometimes replaced by successors. The differences between a theory and its successor can be minor, or they can amount to radical shifts. The most significant scientific breakthroughs have been changes in worldview; they involve comprehensive revision to how background or auxiliary assumptions, data, and ideas are combined, and thus which scientific theory is supported.

Consider again theories of our universe and the bodies within it. The worldview that arose with Aristotle (384–322 BCE) had great scope and logical coherence. The Aristotelian theory of falling bodies claims that heavy bodies fall faster than light ones,
and its geocentric conception of the universe placed Earth at the center, which fit with most common observations of how the world was. But over time, observations were made that the Aristotelian worldview couldn’t easily accommodate. Eventually, it was replaced by a Copernican conception of the universe, followed by a Newtonian conception, with the Earth not a fixed center but a planet in motion around the Sun. Because of the dramatic change in worldview, astronomers from the 4th century BCE and the 17th century would have agreed about the positions of the stars in the sky, but they would have had radically different interpretations of those observations. Similar observations provided clues to constructing explanatory theories, but the differences between those theories were vast. This was the Scientific Revolution of the 16th and 17th centuries, also discussed in Chapter 1.

Additional radical shifts followed on the heels of the rejection of the Aristotelian worldview, and with these changes came radical revisions to accepted ideas about the position and movement of Earth, the shape of orbits, and the nature of universal forces. In general, each new theory accounted for some body of evidence better than its predecessor. Still, most were rather radical changes in perspective. The same is true of the later replacement of Newtonian mechanics with Einstein’s theory of relativity, when universal forces were replaced by non-Euclidean geometries of space-time.

Scientific breakthroughs have periodically occurred in other fields of science as well. This is as you’d expect if scientists are truly open to revising or replacing any theory when doing so is warranted by the available evidence. And breakthroughs seem rewarding and significant; there’s a sense that, after a scientific breakthrough, we more clearly understand our world.
An initial spark of insight leads to a conceptual shift that reinterprets existing data to support a new theory, and then more data are discovered that confirm this new theory.

From another perspective, though, the idea of scientific breakthroughs is also troubling. What happened to our scientific knowledge from before the breakthrough—were scientists just altogether wrong? How do we know that our current best theories won’t suffer the same fate and also be rejected for new and better theories? Can we trust our current scientific theories at all, then? These are deep and troubling questions that strike right at the heart of science. But let’s postpone that discussion until later in this section, after we have a fuller picture of what scientific breakthroughs are like and how and why they occur.


Kuhn’s Scientific Revolutions

The series of scientific breakthroughs in the 17th century suggests we might think about scientific breakthroughs in general in terms of revolution. Revolutions are pretty dramatic; think of political revolutions like the French Revolution at the end of the 18th century, the fall of the Soviet Union two hundred years later, or the more recent Arab Spring. A scientific revolution is a radical change in which a reigning theory is overturned in favor of a new theory, often involving an alternative worldview. Scientific revolutions don’t just change which scientific theories are accepted; they also influence the fundamentals of science itself, such as how to interpret evidence, which scientific procedures are accepted, and often the social and institutional structure of science, such as who is accepted as a scientific authority.

Thomas Kuhn (1922–1996), an American physicist, historian, and philosopher of science, wrote a famous book called The Structure of Scientific Revolutions, first published in 1962. In this book, Kuhn advanced an influential model of scientific change based on
the notion of revolution. He suggested that scientific revolutions have occurred and will continue to occur periodically as an important part of science. In his view, this prevents science from proceeding in a straight line, steadily accumulating an increasing body of knowledge and an expanding store of explanations. Kuhn thinks science instead proceeds in phases. We’ll first describe these phases; then we’ll work through how they apply to a specific scientific revolution.

Kuhn called the earliest phase of science pre-paradigmatic. This phase is characterized by the existence of different schools of thought that debate very basic assumptions, including research methods and the nature and significance of data. Data collection is unsystematic, and it’s easy for theories to accommodate new observations because the theories are inchoate, or undeveloped. Such theories can easily be adapted in different ways to accommodate new observations. There are many puzzles and problems but not very many solutions.

Kuhn’s second phase is the normalization of scientific research. One school of thought begins to solve puzzles and problems in ways that seem successful enough to draw adherents away from other approaches. Kuhn called this period normal science, because widespread agreement about basic assumptions and procedures allows scientific research to become stable. Scientific practices become organized. Laboratories or other workspaces may be set up, experimental techniques and methods become widely accepted, and agreed-upon measurement devices are improved.

During normal science, scientific developments are driven by adherence to what Kuhn called a paradigm. Broadly conceived, a paradigm is just a way of practicing science. It supplies scientists with a stock of assumptions about the world, concepts, and symbols that they can use to communicate more effectively.
It also involves methods for gathering and analyzing data, as well as habits of scientific research and reasoning more generally. Science students learn and come to tacitly accept the paradigm associated with a period of normal science from textbooks. Containing little historical insight into the dynamics of scientific change, textbooks encapsulate the tenets of the paradigm and provide students with shared exemplars of good science. Kuhn thought that, during a period of normal science, each field of science is governed by a single paradigm.

But scientists in the grip of some paradigm have often ended up with observations that are at odds with the paradigm or that lead to worrying puzzles called anomalies: deviations from expectations that resist explanation by the reigning theory. Usually, anomalies are just noted and set aside for future research. But anomalies can accumulate, and this creates increasing tension for the accepted scientific theory. Scientists begin to worry that the theory might not be right after all. The accumulation of anomalies sets science up for a crisis. A crisis occurs when more and more scientists lose confidence in the reigning theory in the face of mounting anomalies.

For Kuhn, a paradigm is only rejected if a critical mass of anomalies has led to crisis and there’s also a rival paradigm to replace it. Another theory has been developed by some renegade scientists, and the problems with the existing paradigm mean that this new theory—together with its auxiliary assumptions, methods, and so on—can finally get attention. If this is so, a crisis might be followed by a scientific revolution. In this period of science, all the elements of the accepted paradigm are up for negotiation. Data, interpretations of data, auxiliary assumptions, methods, and technical apparatus—any and all might be rejected, replaced, or reinterpreted from the perspective of the new paradigm.
This four-stage view of scientific change is summarized in Table 8.1.


TABLE 8.1  Thomas Kuhn’s four-stage view of scientific change

Stage                         Features
1. Pre-paradigmatic science   Different schools of thought debate basic assumptions
2. Normal science             A paradigm is accepted, and research is devoted to puzzle-solving
3. Crisis                     Scientists lose confidence in the reigning theory in the face of anomalies
4. Revolution                 One paradigm is rejected in favor of a new one


The Chemical Revolution

The Scientific Revolution began when geocentrism was replaced with heliocentrism in astronomy, that is, when the Earth was no longer seen as the central heavenly body but instead taken to revolve around the Sun. According to Kuhn, this episode in the history of science perfectly fits his description of scientific revolution. Another abrupt revolutionary change in science that Kuhn recognized as a scientific revolution involved sweeping changes in the field of chemistry in the 18th century.

Two of the protagonists of the chemical revolution were the French chemists Antoine-Laurent Lavoisier (1743–1794) and Marie-Anne Paulze Lavoisier (1758–1836). When they began their work, scientific understanding of matter and its transformations was still grounded in the Aristotelian worldview. Aristotle had believed that all earthly materials are composed of the elements air, earth, fire, and water. This theory of the four elements had been slowly modified by the medieval alchemists, who aimed to turn base metals into gold and to produce an elixir of immortality. By the 18th century, alchemists believed all things were compounds of sulfur, mercury, and salt.

In the early 18th century, one pressing scientific question was: what happens when something burns? Alchemists thought that when materials changed into slag, rust, or ash by heating, they lost sulfur. The German physician and chemist Georg Ernst Stahl (1659–1734) modified this idea, developing the theory that every combustible material contains a universal fire-like substance, which he named phlogiston (from Greek, meaning flammable). Combustible materials, like wood, tend to lose weight when burned, and Stahl explained this change in terms of the release of phlogiston from the combustible material to the air. When the air becomes saturated with phlogiston or when a combustible material releases all its phlogiston, the burning stops.
Stahl believed that the residual substance left behind after a metal burns is the true substance of the original metal, which lost its phlogiston during combustion. This residue, which was called metal calx (what we now know to be an oxide), has the form of a fine powder. Both metal calx and the gases produced during combustion could be captured, measured, and experimentally manipulated.


Unlike gases and calx, though, phlogiston was an utter mystery. Nobody had isolated it, and nobody had found a way to experimentally manipulate it. In fact, phlogiston seemed to have properties that were inconsistent with Stahl’s theory. Stahl believed phlogiston had a positive weight. When you burn a piece of wood, the remaining ash has lost phlogiston, and it weighs less than the original log. But in other cases, for example, when magnesium or phosphorus burn, the residue left behind weighs more than the original material. If phlogiston was released during the burning process, why was there a gain in weight in these cases? This is an anomaly.

Intrigued by this, the Lavoisiers experimented with a variety of metals and gases to investigate why and how things burn. They observed that lead calx releases air when it is heated. This suggested that combustion and air were, somehow, linked. Explaining the link was a difficult task, however, because at that time, little was known about the composition and chemistry of air. Meeting the English theologian and polymath Joseph Priestley (1733–1804) helped. Priestley had discovered a gas he called dephlogisticated air, which was released by heated mercury calx. This gas was thought to greatly facilitate combustion because, being free from phlogiston, it could absorb a greater amount of the phlogiston released by burning materials. Candles burning in a container with dephlogisticated air would burn for much longer, for example. This gas, Priestley observed, facilitated respiration too: mice in containers with dephlogisticated air lived longer than mice placed in containers without this gas.

The Lavoisiers tried to replicate Priestley’s experiments, and based on their own results and observations, they elaborated a new theory of combustion. The central idea was that combustion was the reaction of a metal or other material with the ‘eminently respirable’ part of air.
Believing (incorrectly) that this kind of air was necessary to form all sour-tasting substances, or acids, Lavoisier called it oxygène (from the two Greek words for acid generator). According to this new theory, combustion did not involve the removal of phlogiston from the burning material but, rather, the addition of oxygen. This emerging rival paradigm set the basis for the revolution from which modern chemistry emerged.

In the 1780s, the Lavoisiers and other scientists adopted the idea of chemical elements and of chemical compounds composed of simpler elements. This new system of chemistry was set out by Antoine-Laurent Lavoisier in a textbook in 1789. As Kuhn would expect, this book described not just the theory but also the other elements of a paradigm. The book explained the effects of heat on chemical reactions, the nature

FIGURE 8.5  Scientists of the chemical revolution


of gases, and how acids and bases react to form salts. It also described the technological instruments used to perform chemical experiments. And it contained a ‘Table of Simple Substances’—the first modern listing of chemical elements. After the publication of this textbook, most young chemists adopted Lavoisier’s theory and abandoned phlogiston. ‘All young chemists’—Lavoisier wrote in 1791—‘adopt the theory, and from that I conclude that the revolution in chemistry has come to pass’ (Donovan, 1993, p. 185). From a Kuhnian perspective, the next phase of normal science had begun.


Non-Revolutionary Scientific Change

Kuhn’s notion of scientific revolution seems to accurately characterize some episodes in the history of science—the times of especially radical transformation in accepted scientific knowledge. But this is a particularly extreme form of scientific change. Other episodes of scientific change don’t seem to be so dramatic, and there’s also a question about whether ordinary scientific activity fits Kuhn’s characterization of normal science. It appears that small, incremental changes in science are far more common than the abrupt shifts Kuhn’s account emphasizes.

Consider, for example, the Darwinian revolution in the 19th century. Darwin’s theory of evolution has had a dramatic and lasting impact on our understanding of the nature of life forms, the relationships among different species, and how species have changed over time. The Darwinian revolution was arguably a scientific revolution. But Darwin’s theory was not the first evolutionary theory; nor has evolutionary theory remained exactly the same as what Darwin first described. Changes in the field of biology, both before and after Darwin’s revolutionary breakthrough, have been more gradual than Kuhn’s ideas would suggest.

The general idea of evolution is that whole species—not just individuals—can change over time, and this idea is many centuries old. The nature of biological change as a scientific research program can be traced to the work of French, English, and Scottish naturalists over a half century before the publication of Darwin’s Origin of Species in 1859. Even Darwin’s specific ideas about evolution were significantly influenced by other scientific work; earlier, we mentioned the influence of the political economist Thomas Robert Malthus. And another scientist working at the same time as Darwin, Alfred Russel Wallace, was independently developing a theory of evolution by natural selection strikingly similar to Darwin’s.
So, although Darwin’s ideas were a tremendous breakthrough, they did build upon existing scientific work, and they were inspired by and related to concurrent scientific work by others. Furthermore, the science of biology since the Darwinian revolution has not simply consisted in the application of Darwin’s ideas, as Kuhn would have us expect for a period of normal science. Rather, our understanding of evolution has been in continual development. The so-called Modern Synthesis in the early 20th century integrated the existing knowledge of genetics and Darwinian evolution, which had previously been seen as competing theories. Other elements of evolutionary theory have been revised since, like the recognition of non-genetic influences on traits and of how significantly organisms shape their environment, thereby affecting how natural selection acts on themselves and nearby organisms of other species.

Another point in support of non-revolutionary scientific change is that theory change doesn’t always involve the rejection of existing theories. Sometimes, it comes from the
joining of theories, as in the Modern Synthesis, and other times, it can come from new methods. American biologist James Watson and English physicist Francis Crick, for example, reached their groundbreaking conclusion that the DNA molecule exists in the form of a double helix by applying a new modeling approach to data that had been gathered by Rosalind Franklin. Using cardboard cutouts to represent the chemical components of DNA, Watson and Crick tried out different arrangements, as though they were solving a jigsaw puzzle. Through this concrete model-building, the double-helical structure of DNA was identified. This had enormous consequences for subsequent biological research.

Mathematics and even philosophy can drive scientific theory change too. The development of a new kind of geometry, non-Euclidean Riemannian geometry, paved the way for Einstein’s theory of relativity. Einstein’s theory adopted this geometry as a description of physical space-time. One basic difference between Euclidean and non-Euclidean geometry concerns the nature of parallel lines. In Euclidean geometry, there’s only one line through a given point that is parallel to another line. In some non-Euclidean geometries, there are infinitely many lines through a point that are parallel to another line, and in others, there are no parallel lines. This development in mathematics made it possible for Einstein to wonder whether the geometry of our own universe might actually be non-Euclidean.
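The contrast over parallels can be put in a single line. Using Playfair’s formulation of the parallel postulate (a standard rendering in geometry, not the book’s wording): given a line $\ell$ and a point $P$ not on $\ell$, the number of lines through $P$ that never meet $\ell$ is

```latex
\#\{\, m : P \in m \ \text{and}\ m \parallel \ell \,\}
= \begin{cases}
1      & \text{Euclidean geometry}\\
\infty & \text{hyperbolic geometry}\\
0      & \text{elliptic geometry (as on the surface of a sphere)}
\end{cases}
```

Einstein’s relativity describes space-time with a geometry of this non-Euclidean kind, in which curvature, and so the behavior of ‘straight’ paths, can vary from place to place.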


Scientific Progress

Earlier, we raised worries about how scientific breakthroughs may undermine our confidence in scientific theories. If some well-confirmed theories were eventually rejected, who’s to say our current theories won’t also be rejected? And do such scientific breakthroughs make it so that science isn’t really making progress at all? Let’s consider these questions in a bit more depth and isolate a few important considerations. But we’ve entered deeper philosophical waters now, and this discussion won’t be decisive. There are lots of interesting questions here about the nature and significance of science, even if science is unquestionably our best way to gain knowledge about our world.

When scientific theories change, do we have reason to think that the new theory is an improvement on the last one and that science is progressing toward truth? This question is complicated by two features of theories and theory change. First, theories often appeal to phenomena that cannot be directly observed. Examples we have encountered in this book include the Higgs boson, the first moments of the universe’s existence after the big bang, and the original common ancestor of all life on Earth. How can we ever be sure we are right about these and other phenomena like them? Second, at least some instances of theory change have been radical: scientists rejected phlogiston, decided they were wrong about the placement of Earth in the universe, and much more recently decided Pluto wasn’t a planet after all. How can we ever be sure that our scientific findings are on a path to truth, when the next radical revision could be right around the corner?

There’s at least one influential argument suggesting that, despite all this, we have reason to believe that our best scientific theories are true. This argument—sometimes called the no miracles argument—is an abductive inference, or inference to the best explanation, from the success of science.
It begins with the observation that our best scientific theories are extraordinarily successful: they enable scientists to make empirical predictions, to explain phenomena, and to design and build powerful technologies. What could explain this success? One possible explanation is that our best scientific theories are approximately true. If these theories were not approximately true, the fact that they are so successful would be astonishing. So, it seems, the best explanation for the success of science is that our best theories are true—or at least approximately true and getting closer.

Yet some believe that this conclusion is overly optimistic. Here’s an inductive argument for the opposite conclusion. If we examine the history of scientific theories in any given field, we find a regular turnover of older theories rejected in favor of newer ones. So, most past theories have turned out to be false. Generalizing from these cases, most scientific theories are false—presumably including our current theories. This suggests our current theories stand a good chance of eventually being replaced and regarded as false. The upshot of this argument—sometimes called the pessimistic meta-induction—is that we do not have strong reason to think our current best scientific theories are true.

This argument raises questions about how certain we can be about our current scientific theories. But we want to emphasize that science is the single most successful project for generating knowledge that humans have ever embarked on. Science as a set of methods for investigating our world has persisted for centuries and is unlikely to be surpassed, even if individual scientific theories are sometimes abandoned.

EXERCISES

8.11 Write a list of the primary features of scientific theories based on the discussion from early in this section. How do theories differ from hypotheses and laws? What features do they have in common?


8.12 What do scientific theories add to science, beyond the processes of hypothesis-testing we have mostly focused on in this book? How does theorizing relate to hypothesis-testing?

8.13 Look back at the Higgs boson discovery discussed in Chapter 6. This discovery was additional confirmation of the so-called Standard Model in physics. Investigate this theory, then answer the following questions about it as best you can.
a. What is the theory a theory of—that is, what phenomena is it supposed to be about?
b. What are the central concepts featuring in the theory?
c. Are some claims made about things that we can’t directly observe? What kinds of things?
d. What do scientists explain, predict, and describe with the theory?
e. What are some of the considerations that sparked the development of the theory?
f. Has the theory undergone any changes over time? Which one(s) and why?

8.14 Describe the features of each of Kuhn’s expected stages in your own words: pre-paradigm science, normal science, crisis, and scientific revolution. Illustrate each stage by describing how it applies to the chemical revolution.

8.15 Consider again the Copernican revolution, the chemical revolution, and the Darwinian revolution. Evaluate the merits of Kuhn’s view of scientific change. What do you think are strengths of his view? Do you think there are any weaknesses or ways it is limited? Support your points by referencing these episodes of scientific change. (You’re also welcome to appeal to other scientific breakthroughs discussed earlier in this book.)



8.16 How does the existence of scientific breakthroughs, or revolutions, challenge the ideas of scientific truth and scientific progress? Motivate the concern as well as you can. Then, evaluate the merits of the concern, thinking back to all you’ve read in this book.

8.17 Think of the case of Lavoisier and the chemical revolution and the case of Darwin’s evolutionary theory. In each of these episodes, in what sense did the breakthrough represent progress? In what ways did chemistry and biology improve? More generally, what standards do you think we should use to identify progress and advances in science?

8.3 SCIENCE, SOCIETY, AND VALUES

After reading this section, you should be able to do the following:

• Describe three examples of how science has been influenced by its social and historical context
• Articulate how exclusion and marginalization based on race and ethnicity, nationality, gender, sexuality, and other factors have negatively influenced both society and science
• Define the value-free ideal for science and give an example of when values have influenced science in a problematic way
• List five ways in which values influence science in legitimate ways and give an example of each
• Characterize the main contemporary challenges to science’s trust and objectivity


Science in a Social Context

Because scientific reasoning is a fundamentally human endeavor, it always occurs in sociohistorical contexts. Scientists make observations, elaborate theories, make discoveries, and interact with one another all within specific interpersonal, institutional, and cultural circumstances. Science is embedded in institutional structures like universities, laboratories, hospitals, museums, journals, and publishing companies. As we stressed in Chapter 1, science is also a social practice, involving different people variously collaborating and competing.

The social and historical context of scientific activity significantly influences the nature of science. Even as science aspires to produce knowledge that is not limited by a specific perspective, scientific theories are also creatures of the times, places, and people who created them. Recall how Darwin’s ideas about evolution were influenced by the economist Malthus, for instance. Some have also suggested that how Darwinian evolutionary theory dealt with sexual reproduction and the differences between male and female animals was strongly influenced by Victorian moral sentiment (Knight, 2002). Darwin took it for granted that, throughout the animal kingdom, male animals are promiscuous and aggressive while female animals are ‘coy’ and selective. This looks suspiciously like human gender norms—in Darwin’s Victorian England and, to some extent, in many cultures today. While Darwinian evolutionary theory was certainly a tremendous step forward for biology, it was also influenced by the time and place of its creation and perhaps by features of the individuals who created it. So, science seems to be shaped by its social and historical context.

Science is also regularly used to promote particular social aims. The difficult truth is that, throughout history, science has regularly been used to promote objectionable social aims and, at times, has even been pursued in ways that incorporate morally repugnant social views. Science has been used to expand power over others, to invent nuclear and chemical weapons designed for mass casualties, and to amass wealth for the few, as with research for the fossil fuel industry. Science has also been used to promote misinformation, as when the Ethyl Corporation paid Robert Kehoe to vouch for the safety of lead in gasoline (recall from Chapter 1) or when tobacco corporations paid scientists to present cancer research in a way calculated to mislead the public. Science has also directly abused people from marginalized groups, as when the Nazis ran deeply cruel experiments on the prisoners of concentration camps and when the US Public Health Service ran the Tuskegee Syphilis Experiment. In this clinical study, researchers withheld treatment from 399 impoverished, rural African-American men who had syphilis. The researchers never informed the participants that they had syphilis or that there was a cure for the disease.

In this section, we take a hard look at the relationship between science and society. We will consider how the participants in science and the social context of science influence the development of science. We will investigate the roles moral and political values should and should not play in science. And we will raise some of the most pressing challenges to science and scientific authority in the contemporary world.


Participation in Science

Let’s first explore the idea that the traits of scientists might influence the nature of the scientific endeavor itself. Here’s one way in which this seems to be so. A negative social influence on science is the exclusion or marginalization of individuals from the scientific community because of their gender, race and ethnicity, sexuality, or social and cultural background.

The English polymath Alan Turing (1912–1954) did groundbreaking research in computer science, formal logic, mathematics, cryptography, and morphogenesis. During World War II, he helped crack the code used by the Nazis to protect their military communication, an achievement that many historians believe was the single greatest contribution to the Allied victory. Turing was also a visionary of artificial intelligence. You may have heard of the Turing machine and Turing test, which he invented; he anticipated that human intelligence would one day be matched by machines. Turing was also gay, and at the time, this was illegal in Britain. Despite his groundbreaking scientific contributions, Turing was arrested and chemically castrated by the British government. Humiliated and resentful, he killed himself at the age of 41.

Being outed as gay in mid-20th-century Britain was dangerous; matters were also grim for women in science for most of history. British-American astronomer Cecilia Payne-Gaposchkin’s 1925 dissertation, Stellar Atmospheres, became a cornerstone of modern astrophysics, and for this, she was rewarded with low-paying adjunct teaching work for the next 20 years. Rosalind Franklin (1920–1958) was an English chemist and x-ray crystallographer, whom we mentioned earlier for her important contributions to the understanding of DNA’s structure. Among other contributions, Franklin was responsible for an x-ray diffraction image that was shared with Watson and Crick without her knowledge (pictured in Figure 8.6b). After seeing that image, Watson and Crick developed their physical model of DNA. They went down in history as having discovered DNA’s double-helix structure, eventually winning the Nobel Prize for this work. In contrast, Franklin’s enormous contributions to the discovery were recognized only after her death.

FIGURE 8.6 (a) Rosalind Franklin; (b) Franklin’s x-ray diffraction image that famously inspired Watson and Crick’s double-helix model of DNA

A similar story is that of British neuroscientist Kathleen Montagu (est. 1877–1966), who published her discovery of the neurochemical dopamine in the human brain in 1957. Her work was largely overshadowed by a very similar discovery three months later by Swedish neuropharmacologist and Nobel Prize winner Arvid Carlsson and colleagues. This is a common enough phenomenon that it has been named: the Matilda effect is the bias against recognizing women scientists’ achievements. These women’s work is often uncredited or else attributed to their male colleagues instead.

Societal prejudice has made it more difficult not only for women but also for racial and ethnic minorities, people from developing nations, and other marginalized groups to be supported in their scientific work and even to become scientists in the first place. When only certain kinds of people participate in science, the value of science is lower. For one thing, science squanders or loses out entirely on the contributions of people who couldn’t become scientists or who were marginalized in their participation in science. A second way in which science suffers is that there are fewer role models for aspiring scientists. When scientists like Turing or Franklin are dishonored or not acknowledged for their research breakthroughs, they are not available for younger people to look up to. When groups of people are systematically underrepresented and marginalized in science, young people may not see themselves as potential participants in science.
Third, who participates in science and who is excluded can even affect the nature of scientific knowledge. The variable features of scientists—nationality, gender, socioeconomic background, race, religious belief, political affiliation—all help determine what questions scientists are interested in investigating, what bold conjectures they come up with, and perhaps also which experimental or modeling approaches they use. When scientists are diverse, all of the differences among them contribute to the range of questions, richness of ideas, and ultimately the success of science. If instead only certain kinds of people dominate science, like middle-class and wealthy white men from developed countries, the kinds of questions posed and ideas generated may reflect the limitations of those scientists.

As an illustration of how diversity contributes to scientific success, consider Temple Grandin, an American researcher in animal behavior. Grandin explicitly brought her experiences as an autistic woman to bear in ways that led to dramatic improvements in the efficiency and ethical treatment of animals in slaughterhouses. Another example is Barbara McClintock (1902–1992), who significantly advanced our understanding of genetic inheritance by discovering ‘jumping genes’, or transposons—segments of DNA that can change their position within a chromosome or move between chromosomes. McClintock’s great insight arose from a simple decision about how to study genes, and it has been suggested that this decision was motivated in part by her outsider status in science and her identity as a woman (Keller, 1983). McClintock departed from the well-established practice of focusing on fruit fly genes by instead studying the genes of corn, or maize. Fruit flies are targeted in such studies because they are genetically simple, with only four chromosome pairs. Maize, in contrast, has ten chromosome pairs. This added complexity makes maize more difficult to study, and McClintock was criticized for her decision. But it is also what made it possible to observe jumping genes in action.

The issue is more than just who is recognized for what discovery, how breakthroughs are received, and who gets to be a scientist. Different people bring different styles of reasoning to the table, and scientific progress often demands creativity and seeing things anew.
For these reasons, the inclusion of diverse people—with different personalities and backgrounds—in science doesn’t just benefit those individuals and society; it also makes science itself more successful.


Values and the Value-Free Ideal

In Chapter 1, we emphasized how science has been developed to control for and overcome human limitations in reasoning. One feature of this is that science is supposed to provide a way to subject our pre-existing ideas to rigorous testing. Wanting something to be true isn’t good grounds for believing it is true, and science provides methods of hypothesis-testing, reasoning, and theorizing to evaluate the grounds for our beliefs. This suggests what has been called the value-free ideal for science: the idea that science should be free from the influence of our values—our moral, social, and political beliefs. Scientific theories and hypotheses should be accepted only when evidence and reasoning favor them, not because we want them to be true.

This is an ideal because it’s not something that actually always happens. Science has a spotty track record in this regard: it has too often been used to further racist or sexist views, to support one nation’s dominion over others, or to contribute to corporate profits. But the value-free ideal says that this shouldn’t happen, that any science influenced by moral or political views is bad science. Ideally, science will be governed by evidence and reasoning and not by the values of scientists, their funding sources, or societal trends. Because the desirability of scientific theories—whether we want them to be true—makes no difference as to whether they are true, the desirability of a theory also shouldn’t make a difference as to whether the theory is accepted as true. Questions about the reality of climate change, about evolutionary change in species, about whether GMO foods are safe, or about how male and female brains differ may evoke emotional reactions from people, but those emotions aren’t part of answering the questions. These questions can only be answered by conducting experiments or observational studies, constructing models, evaluating evidence, and applying statistical reasoning—by applying the recipes for science.

Still, there are challenges to the value-free ideal for science. First, science’s spotty track record in achieving freedom from the influence of political and economic values might inspire skepticism about the ideal itself. Careful historical studies of episodes in science reveal the influence of values. We mentioned how Darwin’s evolutionary theory encodes Victorian morality and how science has sometimes directly abused marginalized people. In Chapter 5, we mentioned Galton’s work on eugenics, which was just the tip of the iceberg of racist and sexist uses of science aimed to affirm white male superiority over others. If the ideal science is value-free, it’s unclear that science has ever come close to that ideal.

Second, even if we just think about what we want science to be like—the ideal—it’s unclear that values should be entirely absent. Is discovering a vaccine for the Zika virus, which is easily transmitted to humans by mosquito bites and leads to serious birth defects when pregnant women are infected, more important than discovering new facts about quasars, pulsars, supernovas, or other astral phenomena? If you think so, this reflects a value you hold. And if you think Zika research should be prioritized over astronomical research, then you think this value should influence science.
You probably agree that the US Public Health Service should not have withheld syphilis treatment from 399 impoverished African-American men without their knowledge. This too is wishing for people’s shared values to influence science.

The value-free ideal suggests that science is simply a source of objective facts about the world, immune to influence by human values. On the other extreme, some believe science only serves pre-existing values. The right view of how values should influence science lies somewhere between these extremes. Scientists are human beings with different moral, political, and religious values, and we suggested just a moment ago that the social context of science and who participates in science can both influence scientific findings. And yet the recipes for science are designed to limit the kinds of influence social and individual values have on science. Science is not, and should not be, value-free. Nonetheless, there are ways in which our values should influence science, and there are ways in which science should limit or eliminate the influence of values.

How Values Shape Science

Scientists, in their capacity as scientists, cannot avoid making value judgments. Values guide scientists’ judgments about what types of research to pursue, as well as which studies to perform and which to put on the back burner. Funding agencies also regularly employ values in determining which research to support. Guidelines for experiments on animals and humans are grounded in ethical values, and these kinds of experiments are carefully regulated. And values also influence how scientific results are communicated and employed.


None of these roles for values seems to interfere with science’s objectivity. We might think of these, then, as legitimate influences of values on science. Other ways in which values have sometimes influenced science seem to be illegitimate. This includes, for example, endorsing a scientific theory not because evidence and reasoning support it, but because you want it to be true. It also includes doctoring experimental results so that the data support a hypothesis you want to believe is true.

Reflecting on these legitimate and illegitimate influences of values suggests a dividing line. When scientists’ values lead them to violate the recipes for science—acceptable approaches to data collection and modeling, to hypothesis-testing and abductive reasoning, to statistical analysis, to causal reasoning, and so on—this is illegitimate. When scientists use values to supplement or guide the use of the recipes for science, this can be legitimate.

In his book A Tapestry of Values, philosopher Kevin Elliott (2017) divides the legitimate roles values can play in science into five categories, as helping to answer five different questions. These questions are summarized in Table 8.2. Answers to these questions are needed in order to know which scientific methods to employ, on which phenomena, and to what end. These roles for values thus align with our suggestion that legitimate uses of values supplement or guide the methods of science but do not violate those methods.

TABLE 8.2 Five questions that arise when doing science that our values help us answer (Elliott, 2017)

1. What should we study?
   Example: What kind of research is prioritized for funding
2. How should we study it?
   Example: How the initial hypothesis and assumptions about the causal background both guide experimental design
3. What are we trying to accomplish?
   Example: Getting the most accurate information versus accurate-enough information quickly enough to guide policy
4. What if we are uncertain?
   Example: How much evidence to require before accepting or rejecting a given hypothesis
5. How should we talk about the results?
   Example: The level of certainty conveyed to the public about some scientific finding

To begin, scientists’ individual values and societies’ collectively held values help answer the first question, about what to study. Individually, a researcher’s interests and values surely shape what field of science she pursues, which lab she works in, and what problems she tackles. Collectively, we choose what kinds of scientific research to support when funding agencies, including tax-supported governmental agencies, designate the areas of research they fund and which specific research projects in those areas to fund.

Beyond what to study, values also inform decisions about how phenomena should be studied. Scientists can bring different questions, methods, and auxiliary assumptions to bear on any given topic, and these choices in how research is pursued reflect researchers’ and society’s values. One instance of this influence is how the initial hypothesis and assumptions about the causal background both guide experimental design, including the nature of the intervention and which extraneous variables to control. As with the initial choice of what to study, funding agencies can also influence how phenomena are studied. For example, research into depression can focus on the efficacy of cognitive therapy; the role of sleep, diet, and exercise; or the efficacy of pharmaceutical intervention. The choice to strongly prioritize pharmaceutical intervention to the exclusion of other focuses reflects the outsized influence drug companies have had on medical science (Elliott, 2017).

The third question that values help answer is what, exactly, scientists are trying to accomplish in studying some phenomenon. This is an even more fine-grained decision than what to study and how to study it. In climate research, for example, scientists might prioritize making information about climate trends available quickly so that it can guide policy, or they might prioritize getting information that is as accurate as possible, no matter how long that takes. This decision about the aim of research is influenced by values, including views about the social role the scientific research is expected to have.

Fourth, values influence how scientists proceed in the face of uncertainty. Scientific results are never free from uncertainty. There’s the basic problem of measurement error. We’ve also seen that, if observations don’t match expectations, it could be the fault of the hypothesis, or it could be the fault of some auxiliary hypothesis. In an experiment, no matter how perfectly controlled, there’s always the chance that an unexpected confounding variable has interfered with the finding. In statistical hypothesis-testing, scientists choose whether to reject the null hypothesis, and either choice could be wrong. These are just a few examples of the unavoidable uncertainty in science and the need to choose how to proceed in the face of it. These kinds of uncertainty are all forms of underdetermination.
Recall from Chapter 2 that underdetermination is when the evidence available to scientists is insufficient to determine which of multiple theories or hypotheses is true. Some believe that there is even permanent underdetermination in science: that there will never be enough evidence to conclusively decide in favor of one hypothesis or theory and against all possible alternatives. When scientists face underdetermination, they must choose what to believe or whether to suspend judgment.

Because of this unavoidable uncertainty, scientists must decide how much evidence to require before endorsing a theory or hypothesis (or before rejecting one). Safety is very important to us, whether for drinking water or medications, so toxicology tests must clear a high bar for success. There is a lower bar for deciding whether a new drug is more effective than an already available drug.

Scientists also must decide how to represent scientific uncertainty to the public. In 1988, climate scientist James Hansen declared that climate change, global warming, was occurring. He described that as a decision based on weighing ‘the costs of being wrong to the costs of not talking’ (Weart, 2014, referenced in Elliott, 2017). There was already enough evidence for Hansen to be relatively confident in his choice to speak up. Decades later, of course, there is incontrovertible evidence of climate change.

This introduces a fifth category of values’ legitimate influence on science: how scientists—and journalists and others who communicate scientific findings to the broader public—should talk about those findings. As Elliott stresses, this isn’t just a decision about accuracy. Scientific findings can also be discussed in their relationship to previous findings, their potential social effects, or—picking up on the previous point—their level of certainty.
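For readers comfortable with a little programming, the value judgment involved in setting an evidential bar can be made concrete with a small simulation. This is our illustration, not from the book: all names and numbers here (the cutoffs, sample sizes, and effect size) are arbitrary choices for the sketch. It simulates many experiments, half where a hypothesized effect is absent and half where it is present, and compares a lenient and a strict evidence threshold.

```python
# Illustration: choosing how much evidence to require trades off two errors.
# A strict threshold (as in toxicology) yields fewer false alarms but more
# missed real effects; a lenient one does the reverse. Values, not data
# alone, determine which error matters more.
import random
import statistics

random.seed(0)  # make the simulation repeatable

def run_trials(effect, n_trials=2000, n_samples=25):
    """Return one sample mean per simulated experiment, centered on `effect`."""
    return [
        statistics.mean(random.gauss(effect, 1.0) for _ in range(n_samples))
        for _ in range(n_trials)
    ]

null_means = run_trials(effect=0.0)  # experiments where there is no effect
real_means = run_trials(effect=0.5)  # experiments where the effect is real

def error_rates(cutoff):
    """False-positive and false-negative rates for a given evidence cutoff."""
    false_pos = sum(m > cutoff for m in null_means) / len(null_means)
    false_neg = sum(m <= cutoff for m in real_means) / len(real_means)
    return false_pos, false_neg

lenient_fp, lenient_fn = error_rates(cutoff=0.2)  # low bar for 'effect found'
strict_fp, strict_fn = error_rates(cutoff=0.4)    # high bar for 'effect found'

print(f"lenient: false alarms {lenient_fp:.3f}, misses {lenient_fn:.3f}")
print(f"strict:  false alarms {strict_fp:.3f}, misses {strict_fn:.3f}")
```

Running this shows the strict threshold produces far fewer false alarms but many more misses than the lenient one. No statistical method dictates which cutoff to use; that choice depends on how we weigh the costs of each kind of error, which is exactly where values enter.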


This framing influences whether and how the public will engage with research, and it is a choice dictated not by scientific methods but by scientists’ and society’s priorities.

So, what scientists should study, how they should study it, what they aim to accomplish, how much evidential support should be required, and how scientists should communicate their results all depend on moral considerations—on values. These are legitimate influences of values on science. Recognizing these roles for values in science is crucial. It enables us, as a society, to critically assess what values are employed at each of these junctures. The influence of values on science can be problematic or even nefarious if the wrong values are employed at any one of these stages. Figuring out the right and wrong values is tricky, and it is not a matter for scientists alone to decide. Instead, this is an issue that needs to be engaged with broadly in our society.

Examples of problematic values influencing science are, unfortunately, very easy to come by. Here’s one. In 2017, US President Donald Trump proposed that NASA resources should be dedicated to exploring the solar system instead of to climate change research. This research priority—a decision about what to study—reflects values endorsed by a small but vocal contingent of the Republican Party. Choosing not to fund climate change research amounts to deciding that knowledge about the rate and impact of climate change is relatively unimportant. But because climate change is already having disastrous effects on populations, the environment, and economies across the world, and because the long-term costs of ignoring it will be disastrous, this was arguably the wrong decision on moral grounds. Pulling funding from NASA’s Earth science division in order to avoid investigating climate change and its effects upheld the wrong values. (Notice this doesn’t mean that space exploration is unimportant! It too should be funded.)
Other examples of problematic values influencing science through the proper channels include the outsized influence the pharmaceutical industry has on medical research, the continuing exploitation of at-risk communities due to the approaches used to study them, and powerful corporations controlling what messages the public gets about climate change and the risks of fossil fuel extraction. We’ll engage with some of these problems in the next section.

We have suggested that science doesn’t have to be free from values to be trustworthy and objective. What matters is that values influence science in the right ways and that science effectively resists the problematic influence of values. Values, even good values, shouldn’t play the wrong roles in science; we should never decide a theory is true simply because we wish it were true. Further, the wrong values shouldn’t influence science, even through the proper channels. To better understand how science earns its trust and objectivity, it’s important to acknowledge the many roles of sociopolitical and moral values in scientific reasoning and to critically examine the values that influence our science. By doing these things, we can clarify what values should influence science and in what ways.

Trust and Objectivity: Challenges Facing Science

Science achieves objectivity, and is worthy of trust, on the basis of the characteristics outlined in Chapter 1, especially its capacity for self-correction, and the ways in which these characteristics play out in the methods described in the rest of this book. The capacity for self-correction requires scientists’ openness to criticism and dissent, their sincere and transparent communication of their results and uncertainties, and scientific communities’

Potochnik, Angela, et al. Recipes for Science : An Introduction to Scientific Methods and Reasoning, Taylor & Francis Group, 2018. ProQuest Ebook Central, http://ebookcentral.proquest.com/lib/purdue/detail.action?docID=5584122. Created from purdue on 2021-08-29 21:53:53.

Copyright © 2018. Taylor & Francis Group. All rights reserved.


welcoming of diverse perspectives. Objectivity in science can occur when scientists’ judgments are critically and openly assessed in light of other data and investigations, as well as competing interpretations and alternative possibilities. However, this intersubjective process, and thus science’s capacity for self-correction, faces significant challenges. Some of the most significant relate to the incentive structure of science and how it shapes scientific findings in ways that undermine trust and objectivity. Facing up to these challenges requires us to think carefully about the scientific process, the role of incentives in shaping it, and what values are thereby finding their way into science.

Let’s back up. What is science’s incentive structure, and how does it create challenges for trust and objectivity? As we have seen, science is a social practice that occurs in institutions like universities and national research centers. Scientists are professionals who get paid for teaching and for doing research. But university salaries are, in most cases, not enough to fund scientific research. Scientists need extra money to pay for scientific instruments and lab equipment, for experimental participants, and for their assistants. This extra money is generally awarded by public agencies like the ERC (European Research Council) in Europe and the NSF (National Science Foundation) and NIH (National Institutes of Health) in the US. The competition is fierce; every year, the number of applicants for funding grows, while, partly due to budget cuts, the number of available awards shrinks. Scientists’ ability to secure grants determines their career prospects, and their chance of securing grants depends on the quantity and quality of their publications, the frequency of citations, previously awarded grants, and the public attention they are able to attract.
‘Publish or perish’ is the phrase coined to capture the increasing pressure in science to rapidly and continually publish work in order to sustain one’s career. The competition for space in prestigious journals is also fierce; many have rejection rates greater than 90%. Because scientific production has increased dramatically over the years, journal editors usually prefer to publish novel results that support an exciting hypothesis rather than robust, well-documented negative results. Consequently, scientists have a strong incentive to produce surprising, positive results. Other types of scientific research are harder to place in top journals. These include the negative result that a hypothesis wasn’t supported by the evidence, studies that replicate or assess previously published results, and preliminary, exploratory investigations that are not decisive. The tendency to reward only one form of scientific finding is called publication bias. This is common across all scientific journals but especially strong in the most prestigious ones.

Publication bias, coupled with the scarcity of resources and employment opportunities, generates a challenge to science’s capacity for self-correction (Ioannidis, 2005). For one thing, openness to criticism depends on researchers attempting to replicate existing studies to see if the evidence holds up. Replication of previously published studies can increase the credibility of scientific claims when the supporting evidence for these claims is reproduced. When the supporting evidence is not reproduced, replication can instead foster innovation and improve previous experimental designs and data analysis. Since publication bias works against replication studies, scientists have little incentive to perform them. This undermines self-correction and the accumulation of trustworthy knowledge across science and can have negative social consequences in some fields.
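The inflationary effect of publication bias can be made vivid with a small simulation. This is a hedged sketch, not a model of any real literature: the true effect size, the per-study sample size, and the acceptance rule are all invented for illustration. If journals publish only positive results that clear a conventional significance threshold, then every published estimate of a small true effect must exceed that threshold, so the published literature overstates the effect.

```python
import random

random.seed(7)

TRUE_EFFECT = 0.1   # a small real effect, in standard-deviation units
N = 20              # subjects per simulated study
SE = 1 / N ** 0.5   # standard error of each study's mean estimate

published = []
for _ in range(2000):
    # Each 'study' estimates the effect from N noisy observations.
    estimate = sum(random.gauss(TRUE_EFFECT, 1) for _ in range(N)) / N
    # Suppose journals accept only positive results roughly 1.96
    # standard errors above zero (about the conventional p < 0.05 cutoff).
    if estimate > 1.96 * SE:
        published.append(estimate)

mean_published = sum(published) / len(published)
print(f"true effect: {TRUE_EFFECT}")
print(f"mean published estimate: {mean_published:.2f}")
```

Because every published estimate here exceeds roughly 0.44 standard deviations, the average published estimate is guaranteed to be more than four times the true effect of 0.1. Publishing replication studies and null results would pull the literature’s average back toward the truth.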

The literature on associations between specific foods and cancer risk, for example, may be seriously distorted. Statistically significant associations with cancer risk have been claimed for most food ingredients, from beef to tea. Careful analysis of this literature highlights that many published studies report implausibly large effects, even when the actual evidence is weak and effect sizes are small (Schoenfeld & Ioannidis, 2013). Dissent and work toward replicability would improve the reliability and validity of claims about the role of food in cancer risk.

The incentive structure of current science also negatively impacts scientists’ communication of their results. The emphasis on producing exciting, publishable results may lead scientists to cut corners in how experiments are designed and how data are analyzed and presented. Whether consciously or not, scientists may fail to randomize their experiments or to control for known confounding factors. Another common shortcut is data dredging, where data mining techniques are used to uncover patterns in data sets that support a hypothesis not under investigation. This makes it more likely that a claimed pattern is actually a type I error (see Chapter 6) and the supported hypothesis is false. Relatively few studies report effect sizes and measures of uncertainty in a transparent way, so it’s often hard for others to assess the quality of a study and the soundness of its methods.

Fierce competition in science also leads more and more scientists to abandon academia for jobs in industry. The IT, AI, pharmaceutical, chemical, and agricultural industries have been hiring more and more scientists. This raises another worry about sincere and transparent communication of scientific results. Being funded by a private company to carry out research may pose conflicts of interest, which introduces funding bias.
Scientific studies are more likely to have findings supporting the interests of the study’s financial sponsor. This can happen because of how values influence science: what to study and how, what the aim is, how to handle uncertainty, and how to present the findings. It can also happen via intended or unintentional improper influence on data or methods. Regardless, this leads to corporations having outsized influence on the nature of our scientific knowledge and, in some cases, unknowingly or even knowingly misleading the public with bad information.

Another challenge concerns communication: too much science is inaccessible to the general public and even to many other scientists. Scientific studies get published by for-profit journals, and these journals typically put articles behind pricey paywalls. Academic institutions can pay for their faculty and students to have journal access, but not all academic institutions can afford subscriptions to these journals. By the time science is reported in newspapers and popular magazines, it is often characterized inaccurately and misleadingly, full of hype and exaggeration. This too is due to an incentive structure, this one for journalism: media outlets are rewarded for splash and clicks, not for careful accuracy. Bad science journalism, along with the seductive allure of celebrities’ (often misinformed) opinions on issues like nutrition and vaccinations, can fuel serious misunderstanding of scientific findings.

Diversity in scientific approaches fosters science’s capacity for self-correction. But the institutional apparatus of contemporary scientific inquiry has reduced the incentives for undertaking, and the freedom to pursue, research that challenges existing scientific ideas or develops wholly new theories (Stanford, 2015). This has fostered specialization in the sciences, and it has also shielded popular theories and methods from being challenged
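The data dredging shortcut described above can also be simulated. In this hedged sketch (the sample size, the number of candidate predictors, and the significance cutoff are all invented for illustration), the outcome and all forty candidate predictors are pure noise, yet scanning the whole set for ‘significant’ correlations will often turn up spurious hits, i.e., type I errors.

```python
import random

random.seed(1)

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# One 'study': an outcome and 40 candidate predictors, all pure noise.
n_subjects = 30
outcome = [random.gauss(0, 1) for _ in range(n_subjects)]
predictors = [[random.gauss(0, 1) for _ in range(n_subjects)]
              for _ in range(40)]

# Dredging: scan every predictor and keep any 'significant' correlation.
# For n = 30, |r| > 0.36 roughly corresponds to two-sided p < 0.05.
hits = [i for i, pred in enumerate(predictors)
        if abs(pearson_r(pred, outcome)) > 0.36]
print(f"spurious 'significant' associations found in noise: {len(hits)}")
```

With 40 independent tests at a 5% false-positive rate, about two spurious ‘findings’ turn up on average. A hypothesis selected this way, after the fact, has much weaker support than its p-value suggests, which is why pre-registering hypotheses helps.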


by competing, and perhaps better, theories and methods. Demographic diversity and diversity in political views are also important for science’s capacity for self-correction (Duarte et al., 2015), and yet science has historically been, and remains, limited in both of these kinds of diversity.

In this book, we have painted a picture of science as fallible but with the tools to reliably generate knowledge. Some scientific knowledge has had dramatic practical importance; just think about the outstanding progress of the medical sciences and of computer science and AI. Other knowledge concerns fascinating aspects of the faraway universe and the strange behavior of microparticles. The value of science in producing knowledge requires openness to criticism and dissent, the pursuit of meaningful questions, and the communication of results in a sincere and transparent way. Only then can science live up to its self-correcting ideal, generate objective knowledge, and thus deserve our trust. Some features of science, including its current incentive structure, apparently need to change to promote these ends.

EXERCISES

8.18 Define diversity in your own words. Choose three characteristics of people (for example, gender, nationality, and political views), and, for each, describe how scientists’ diversity in that characteristic might contribute positively to science. You can think about any field(s) of science that will help in answering this question.

8.19 Describe two or three steps that you think could be taken to increase diversity in science. Mention also any concerns or downsides you can think of for each of these steps.

8.20 Describe in detail an example of when values have influenced science in an illegitimate way. Then diagnose what went wrong. What was wrong about the values or the nature of their influence, and what was the detrimental effect to science?

Copyright © 2018. Taylor & Francis Group. All rights reserved.

8.21 State the value-free ideal of science. Then, summarize the view of how values can legitimately factor into science outlined by Kevin Elliott. In your view, does that view of values’ influence violate the value-free ideal, or is it consistent with that ideal? Give an argument in support of your answer.

8.22 Suppose you are working for an NGO (non-governmental organization) on the task of measuring poverty levels across countries. For each of the following decisions, describe at least two ways to proceed, and say how values are relevant to making the decision.
a. Which countries will you include in the study?
b. How will you define and measure poverty?
c. What extraneous variables will you take into account?
d. How will you make comparisons across countries?
e. How will your results be publicized?

8.23 List several potential ethical problems arising from scientific research funded by the pharmaceutical industry. For at least three of these problems, describe a concrete action to address that ethical problem that could be taken by governments, pharmaceutical companies, or some other party.


8.24 Describe a real example of when scientists need to act in the face of uncertainty. Describe the nature of the uncertainty and explain how social, economic, and moral considerations might factor into the decision of how to proceed.

8.25 Choose three of the main contemporary challenges to science’s objectivity described in this section, and rank them in importance from 1 to 3, where 1 is the most important. For each, describe why it is a problem, including some considerations not provided in the text; then suggest one step that you think could help address the challenge. You should also assess how practical it is to implement each of your suggestions.

8.26 In recent years, there have been several retractions of published scientific articles that have captured the world’s attention. In 2015, it was the retraction of a paper about gay marriage that was initially published in the prominent scientific journal Science. Read the description of this case on Retraction Watch (https://retractionwatch.com), and then answer the following questions.
a. What risks do people who report misconduct in science (whistleblowers) face?
b. Were human subjects ‘harmed’ in this case, and if so, how?
c. Describe how data management issues influenced this case.
d. Describe how authorship issues influenced this case.
e. Does this case raise any conflict of interest?
f. What issues does the case raise about collaborating with others?
g. Describe how replication issues influenced this case.

8.27 Look back at the case of climate change discussed in Chapter 1. Identify at least five ways in which values are likely to have affected that research, and describe how those values have impeded or promoted scientific knowledge of climate change.


FURTHER READING

For an influential overview of philosophical conceptions of scientific explanation, see Salmon, W. (1989). Four decades of scientific explanation. Minneapolis: University of Minnesota Press. See also Psillos, S. (2006). Past and contemporary perspectives on explanation. In T. Kuipers (Ed.), Handbook of the philosophy of science: Focal issues (pp. 121–196). Amsterdam: Elsevier.

For more on explanatory reasoning, see Lombrozo, T. (2012). Explanation and abductive inference. In K. J. Holyoak and R. G. Morrison (Eds.), Oxford handbook of thinking and reasoning (pp. 260–276). Oxford: Oxford University Press.

For Kuhn’s view on theory change, see Kuhn, T. (1962). The structure of scientific revolutions. Chicago: University of Chicago Press.

For more on the relationship between social institutions, values, and objectivity, see Paul Feyerabend’s Against method (1975) and Science in a free society (1978). London: Verso.

For an introduction to the roles values play in science, see Elliott, K. (2017). A tapestry of values: An introduction to values in science. Oxford: Oxford University Press.

For an account of how values factor into science, see Longino, H. E. (1990). Science as social knowledge: Values and objectivity in scientific inquiry. Princeton: Princeton University Press.


For an exploration of how social conditions influence science, see Merton, R. K. (1973). The sociology of science: Theoretical and empirical investigations. Chicago: University of Chicago Press.

For an account of values in science focused especially on underdetermination, see Douglas, H. (2000). Inductive risk and values in science. Philosophy of Science, 67, 559–579.

For an overview of objectivity in science, see Reiss, J., & Sprenger, J. (2014). Scientific objectivity. In Stanford encyclopedia of philosophy. Retrieved from https://plato.stanford.edu/archives/win2017/entries/scientific-objectivity/

Glossary

abductive inference: a commonly used type of ‘backward’ scientific inference that attributes special status to explanatory considerations; also called inference to the best explanation
abstraction: leaving out or ignoring known features of a system from a representation or account of it
accuracy: the extent to which a model correctly represents the true value of a target system
addition rule: the probability that one of a number of mutually exclusive outcomes will occur is the sum of their individual probabilities
affirming the antecedent: using the truth of a conditional statement and its antecedent as grounds for concluding the consequent is also true; a deductively valid form of inference
affirming the consequent: using the truth of a conditional statement and its consequent as grounds for concluding the antecedent is also true; a deductively invalid form of inference
algorithm: step-by-step procedure for obtaining some outcome
alternative hypothesis: in statistical hypothesis-testing, a bold and risky conjecture that, contrary to the null hypothesis, the variables in question are statistically dependent
ampliative inferences: when conclusions express content that, in some sense, goes beyond what is present in the premises
analogical models: physical or abstract objects with features analogous to focal features of a target phenomenon, used to model the phenomenon
anomaly: a deviation from expectations that resists explanation by the reigning scientific theory; (Kuhnian) motivation for scientific revolution and paradigm shifts
antecedent: the left side of a conditional claim; a condition that guarantees some consequence; logically prior
appeal to ignorance: an informal fallacy; concluding that a certain statement is true because there is no evidence proving that it is not true
appeal to irrelevant authority: an informal fallacy; appealing to the views of an individual who has no expertise in a field as evidence for some claim
applied research: scientific knowledge used to develop some product, like techniques, software, patents, pharmaceutical drugs, or new materials; often, a central motivation is to generate products for profit
argument: a set of statements in which some of the statements, the premises, are intended to provide rational support or empirical evidence in favor of another statement, the conclusion
assumption: a specification that a target system must satisfy for a given model to be similar to it in the expected way
auxiliary assumptions: a set of assumptions about how the world works that often go unnoticed but are needed for a hypothesis or theory to have the expected implications; also called background assumptions
average: see mean

axioms: statements that are accepted as self-evident truths about some domain, used as a basis for deductively inferring other truths (theorems) about the domain
bar chart: visual representation of statistical outcomes in which bars of different heights are used to show the frequency of different values for some discrete variable
basic research: scientific research that aims at knowledge for its own sake; also called pure research
Bayes factor: a compact, numerical way of measuring the statistical evidence for a hypothesis H0 with respect to alternative H1. It is defined by the formula: B01(E) = [Pr(H0|E) × Pr(H1)] / [Pr(H1|E) × Pr(H0)] = Pr(E|H0) / Pr(E|H1)
Bayes nets: causal Bayes networks, or nets; a kind of probabilistic model that provides a compact, visual representation of the causal relationships in a system and the strength of those relationships by using joint probability distributions
Bayes’s theorem: a mathematical formula used for calculating conditional probabilities. It is defined by the formula: Pr(H|O) = [Pr(O|H) × Pr(H)] / Pr(O). Another form of Bayes’s theorem, generally encountered when comparing two competing hypotheses H and not-H, is: Pr(H|O) = [Pr(O|H) × Pr(H)] / [Pr(O|H) × Pr(H) + Pr(O|not-H) × Pr(not-H)]; the heart of Bayesian statistics
Bayesian conditionalization: a probabilistic rule of inference. It says that, upon observing new evidence O, the new degree of belief in a hypothesis H ought to be equal to the posterior probability of H: Prnew(H) = Pr(H|O)
bell curve: see normal distribution
biased variable: a random variable that is not fair, that is, for which some outcomes are more likely than others
big data: very large data sets that cannot be easily stored, processed, analyzed, and visualized with standard statistical methods
bimodal distribution: two values in a range are the most common; in a histogram, there are two peaks
blind experiment: an experiment or study designed so that the scientists recording or taking measurements don’t know which subjects are in the control group and which are in the experimental group
calibration: comparing the measurements of one instrument with those of another to check the instrument’s accuracy so it can be adjusted if needed
case study: a detailed examination of a single individual or system in a real-life context
causal background: the other factors that in fact do or in principle might causally influence two events, thereby also potentially affecting the causal relationship between the two events
causal conception of explanation: the view that explanation involves appealing to causes that brought about the phenomenon to be explained
central limit theorem: a statistical theorem that samples with a large enough size will have a mean approximating the mean of the population
central tendency: a distribution with one peak at the center, corresponding to the most common group of values of a variable
cluster indicators: identify several markers of some trait in order to more precisely define the trait while not oversimplifying it
cohort study: a study in which researchers select a group of subjects and track them over time, at set intervals, to observe the effects of some condition they experience
collecting data: gathering and measuring information about variables
collectively exhaustive outcomes: when at least one outcome of a set of outcomes must occur at any given time
common cause: when neither of two covarying types of events causes the other but a third event causes both
computer simulation: a program run on a computer using algorithms to explore the dynamic behavior of a target system; also called computer model
conclusion: in an argument, a statement that is supported by the premises
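The formulas in the Bayes factor and Bayes’s theorem entries can be checked with a short numeric sketch. The prevalence and test accuracies below are invented for illustration: a rare condition H with an imperfect diagnostic test, where the observation O is a positive test result.

```python
# Hypothetical numbers: a condition with 1% prevalence and a test with
# 90% sensitivity and a 5% false-positive rate.
pr_h = 0.01                  # prior Pr(H)
pr_o_given_h = 0.90          # Pr(O|H), sensitivity
pr_o_given_not_h = 0.05      # Pr(O|not-H), false-positive rate

# Total probability of a positive result:
# Pr(O) = Pr(O|H) * Pr(H) + Pr(O|not-H) * Pr(not-H)
pr_o = pr_o_given_h * pr_h + pr_o_given_not_h * (1 - pr_h)

# Bayes's theorem: Pr(H|O) = Pr(O|H) * Pr(H) / Pr(O)
posterior = pr_o_given_h * pr_h / pr_o
print(f"posterior Pr(H|O) = {posterior:.3f}")   # about 0.154

# Bayes factor comparing H to not-H: Pr(O|H) / Pr(O|not-H)
bayes_factor = pr_o_given_h / pr_o_given_not_h
print(f"Bayes factor = {bayes_factor:.1f}")     # 18.0
```

Even with a Bayes factor of 18 favoring H, the posterior stays below 16% because the prior Pr(H) is so low; Bayes’s theorem keeps the base rate in view.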

conditional probability: the probability of an event’s occurrence given that some other event has occurred; expressed Pr(X|Y), where X is conditional on Y
conditional statements: statements in which one circumstance, the antecedent, is given as a condition for another circumstance, the consequent; the antecedent guarantees the occurrence of the consequent
confederate: in an experiment, an actor who pretends to be a subject
confirmation: the observation matches the expectation based on the hypothesis, providing evidence in favor of the hypothesis
confirmation bias: the tendency we all have to look for, interpret, and recall evidence in ways that confirm and do not challenge our existing beliefs
conflicts of interest: financial or personal gains that may inappropriately influence scientific research, results, or publications; scientists are obligated to disclose any potential conflicts of interest
confounding variables: extraneous variables that have varied in an uncontrolled way and influenced the dependent variable under investigation
consequent: the right side of a conditional claim; the condition that arises from, or is guaranteed by, the antecedent
contributing cause: a cause that is neither necessary nor sufficient to bring about an effect; also called a partial cause
control group: a group that is similar to the experimental group but experiences other value(s) of the independent variable, i.e., does not receive the intervention
correlated variables: the value of one variable raises or lowers the probability of the other variable taking on some value
correlation coefficient: describes the direction and strength of correlation; a positive or negative sign indicates positive or negative correlation, and a number between 0 and 1 indicates the strength of the correlation
correlation strength: how predictable the values of one variable are based on the values of the other variable
counterexamples: situations you can describe, whether real or imagined, in which the premises of an argument are true but the conclusion false; shows that a deductive argument is invalid
crisis: a period of widespread failure of confidence in the ability of a (Kuhnian) paradigm to fulfill its scientific function
cross-sectional study: a study in which different individuals are measured for some property or condition at a single, given time; helpful in investigating relationships among a number of different variables
crucial experiment: an experiment that decisively adjudicates between two hypotheses, settling once and for all which is true
curve fitting: extrapolating from a data set to the expected data for measurements that weren’t actually taken by fitting a continuous line through a data plot; there are always multiple different lines consistent with the data
data: public records produced by observation or by some measuring device; allow observations to be recorded and compared
data cleansing: identifying and correcting errors in a data set by deciding which data are questionable and should be eliminated
data dredging: using data mining techniques to uncover patterns in data that support some hypothesis that one did not set out to test in advance
data model: a regimented representation of some data set, often with the aim of highlighting whether or not the data count as evidence for a given hypothesis
deception: when researchers actively misinform participants about some aspect of an experiment or study

deductive inference: an inference in which the relationship between premises and conclusion is purported to be one of necessitation; in a valid deductive argument, the truth of the premises necessitates the conclusion; in an invalid deductive argument, it does not
denying the antecedent: using the falsity of an antecedent and the truth of a conditional as grounds for concluding the consequent is false; a deductively invalid form of inference
denying the consequent: using the falsity of a consequent and the truth of a conditional as grounds for concluding the antecedent is false; a deductively valid form of inference
dependent variable: a variable that is expected to depend on, or be the effect of, the independent variable
descriptive claim: a statement about how things are without making any value judgments
descriptive statistics: tools for summarizing, describing, and displaying data in a meaningful way
difference-making: the idea that if the occurrence of one event makes a difference to the occurrence of a second event, the first is a cause of the second
direct correlation: greater values for one variable increase the probability of greater values for a second variable; also referred to as positively correlated
direct variable control: when all extraneous variables are held at constant values during an intervention
directed acyclic graphs: graphs in which all the causal relationships are one-directional (none of a cause’s effects are also among its causes) and do not move in a circle (following a series of cause-effect relationships will not lead you back to an earlier cause as a later effect)
distal causes: causes that occurred further back in time from the effect and perhaps further away as well
double-blind experiment: an experiment or study in which both scientists and subjects are unaware of which subjects are in which group (control or experimental) because of randomization
Duhem-Quine problem: the idea that scientific hypotheses can never be tested in isolation; instead, scientific hypotheses are tested only against the background of auxiliary assumptions
ecological validity: the degree to which experiment circumstances are representative of real-world circumstances
effect size: a quantitative, scale-free measure of the strength of a phenomenon
empirical evidence: information gathered through the senses, including with the use of technology to extend the reach of the senses, that weighs in favor of or against some hypothesis
estimation: predicting properties of a population on the basis of a sample
eugenics: the idea that a human population can be improved by controlling breeding; historically linked to racist and classist science that threatened human liberties and human dignity
evidence: fact or information that makes a difference to what one is justified in believing
evidentialism: the idea that a belief’s justification is determined by how well the belief is supported by evidence
exemplar: a model that is one of the target systems it is used to represent
expectations: conjectural claims about observable phenomena based on some hypothesis; expectations should be true if the hypothesis is true, false if the hypothesis is false
experiment: a method of testing hypotheses that involves intervening on one or more variables of interest and observing what effects this has
experimental group: a group that receives the intervention on the independent variable or otherwise experiences the intended value of the independent variable
explanatory knowledge: generating answers to questions about how things work and why things are the way they are
exploratory experiment: an experiment that does not rely on existing theory and may not be aimed to test a specific hypothesis; used to suggest novel hypotheses or to assess whether a poorly understood phenomenon actually exists


external experimental validity: the extent to which experimental results generalize from the experimental conditions to other conditions—especially to the phenomena the experiment is supposed to yield knowledge about
extraneous variables: other variables besides the independent variable that influence the value of the dependent variable; if uncontrolled, these may become confounding variables
fair variable: a random variable that has independent outcomes and is unbiased, that is, its outcomes are all equally likely
faithfulness: the requirement that probabilistically independent variables are not directly causally related; an assumption of causal Bayes nets
falsifiable: evidence can be described that, if found, would show the claim to be false; a key feature of scientific claims
falsificationism: the idea, due to Karl Popper, that scientific reasoning proceeds by attempting to disprove ideas rather than to prove them right
field experiment: an experiment conducted outside of a laboratory, in the experimental subjects' everyday environment
frequency distribution: how often a variable takes on each range of values in a data set
frequentist interpretation: the idea that the probability of an outcome is the limit of its relative frequency; an element of classical statistics
full control: creating the conditions such that no variables other than the target independent variable and the dependent variable change as a result of an intervention
funding bias: when a scientific study is more likely to support the interests of its financial sponsor(s)
gambler's fallacy: fallacious reasoning from a past variation from the expected frequency of outcomes that there will be a future variation from the expected frequency in the opposite direction; erroneously supposing statistical dependence of outcomes
Gaussian distribution: see normal distribution
generality: a desirable feature of models; a model's ability to apply to a greater number of target systems
Hawthorne effect: a confounding variable in experiments involving human participants, where experimental participants change their behavior, perhaps unconsciously, in response merely to being observed; see also observer bias
histogram: visual representation of statistical outcomes in which bars of different heights are used to represent the frequency of different values of a continuous variable
hypothesis: a conjectural statement based on limited data; a guess about what the world is like, which is not (yet) backed by sufficient, or perhaps any, evidence
hypothetico-deductive method: a method of hypothesis-testing; an expectation is deductively inferred from a hypothesis and compared with an observation; violation of the expectation deductively refutes the hypothesis, while a match with the expectation non-deductively boosts support for the hypothesis
idealization: an assumption made without regard for whether it is true, often with full knowledge that it is false
illusion of explanatory depth: believing that one understands the world more clearly and in greater detail than actually is the case
illusion of understanding: a lack of genuine understanding of some topic linked to a lack of appreciation for the depth of one's ignorance about the topic
independent outcomes: the probability of the outcome of one trial is not conditional on the outcomes of any other trials; e.g., numbers rolled on two different dice rolls are independent of one another
independent variable: a variable that is changed or observed at different values in order to investigate the effect of the change
indirect correlation: greater values for one variable increase the probability of smaller values for a second variable; also referred to as negatively correlated
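A frequency distribution and its relative form can be tabulated directly from data. This minimal Python sketch uses made-up die rolls (the data and names are illustrative only):

```python
from collections import Counter

rolls = [1, 3, 3, 4, 6, 6, 6, 2]   # hypothetical data set

# Frequency distribution: how often each value occurs
freq = Counter(rolls)

# Relative frequency distribution: proportion of occurrences of each value
rel = {value: count / len(rolls) for value, count in freq.items()}

print(freq[6])   # 6 occurred three times
print(rel[3])    # 3 occurred in a quarter of the rolls: 0.25
```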


indirect variable control: causing the influence of extraneous variables to vary in a way that is independent from the value of the independent variable
inductive: an inferential relationship from premises to conclusion that is one of probability, not necessity
inductive generalization: inference to a general conclusion about the properties of a class of objects based on the observation of some number of objects in the same class
inductive projection: inference to a conclusion about the feature of some object that has not been observed based on the observation that some objects of the same kind have that feature
inductive strength: the probabilistic extent to which the conclusion of an inductive inference is true given that its premises are all true
inference: a logical transition from one thought to another that can be characterized in terms of abstract rules
inference to the best explanation: see abductive inference
inferential statistics: using statistical reasoning to draw broader conclusions on the basis of limited data
informal fallacies: inference patterns that involve a problem with the content of an inference; a deductive argument that commits an informal fallacy may be valid, but it will not be sound
instruments: technological tools or other kinds of apparatus used in experiments
intelligent design: the idea that life forms are so complex that they couldn't possibly have come about without the help of an intelligent designer (such as the Judeo-Christian God)
internal experimental validity: the degree to which scientists can draw accurate conclusions about the relationship between the independent and dependent variables
intervention: a direct manipulation of the value of the independent variable
isomorphism: one idea of the relationship a model bears to its target system(s); a one-to-one correspondence between each part or feature of the model and of the target
joint method of agreement and difference: one of Mill's methods; considering cases where the suspected effect occurs to see what they have in common (method of agreement), as well as considering cases where the suspected effect does not occur to see what those have in common (method of difference)
joint probability distribution: the probability distribution for each of a set of variables, taking into account the probability of the other variables in the set
justification: reasons for belief; one requirement for a belief to qualify as knowledge
knowledge: traditionally, a belief that is at least both true and sufficiently justified
laboratory experiments: experiments conducted in a laboratory, giving scientists control over interventions performed and direct and indirect control of many extraneous variables
likelihood: often used as a synonym for 'probability', or to refer to the probability of observed data given the truth of a specific hypothesis; more precisely, a likelihood is a function of the parameters of a statistical model given observed data
logic: the study of the rules and patterns of good and bad inference
longitudinal study: a study in which the same subjects are measured (for some property or condition) repeatedly over a period of time, sometimes many years, allowing the researchers to track a subject's change
Markov condition: the requirement that causal variables, conditional on their parent causes, are probabilistically independent of all their other ancestors; an assumption of causal Bayes nets
Matilda effect: the bias against recognizing the achievements of women scientists, whose work is often uncredited or else attributed to their male colleagues instead
material conditional: a conditional statement (with an antecedent and consequent) that is false only if the antecedent is true while the consequent is false
mathematical models: mathematical formulas that relate variables, parameters, and constants to one another to represent one or more target systems


mean: a measure of the central tendency of a data set; the sum of all values in the data set divided by the number of instances; also called average
measurement error: the difference between a measured value of a quantity and its true value
mechanisms: complex hierarchical systems consisting of component parts and operations that are organized so as to causally produce a phenomenon
mechanistic conception of explanation: the view that phenomena are explained by showing how they are produced by mechanisms, that is, by the organized operations of those mechanisms' component parts
median: a measure of the central tendency of a data set; the middle value in a distribution when the values are arranged from the lowest to the highest
method of agreement: one of Mill's methods; considering cases where the suspected effect occurs to see what they have in common
method of concomitant variations: one of Mill's methods; using the observation that the value of one variable changes in tandem with changes to the value of a second variable to infer that the two are causally related
method of difference: one of Mill's methods; considering cases where the suspected effect does not occur to see what those have in common
method of residues: one of Mill's methods; comparing cases in which a set of causes brings about a set of effects to cases in which some of those causes bring about some of those effects and inferring, on that basis, that the absent cause(s) are responsible for the absent effect(s)
methodological naturalism: the idea that scientific theories shouldn't postulate supernatural or other spooky kinds of entities
mode: a measure of the central tendency of a data set; the most frequent value in the data set
modularity: the assumption that interventions on some causal relationship will not change other causal relationships in the system
modus ponens: see affirming the antecedent
modus tollens: see denying the consequent
monotonic: the addition of new information never invalidates the inference
multiplication rule: the probability that two independent events both occur is the result of multiplying their individual probabilities
mutually exclusive outcomes: a set of outcomes, only one of which can occur in a given trial; e.g., rolling a one and a three are mutually exclusive outcomes
natural experiments: interventions on independent variables that occur naturally, without experimenters influencing the system
natural explanations: explanations that invoke features of the world to account for the phenomena under investigation
natural phenomena: objects, events, regularities, and processes that are sufficiently uniform to make them susceptible to systematic study
necessary cause: a cause that must occur in order for the effect to come about
necessary condition: a condition that must be satisfied in order for a specified outcome to occur
negatively correlated: greater values for one variable decrease the probability of greater values for a second variable; also known as indirect correlation
nodes: used to represent variables in causal Bayes nets
nomological conception of explanation: the idea that a phenomenon is explained by deductively inferring it from a scientific law and some initial conditions
non-ampliative: an inference in which the conclusion doesn't add any new content beyond what's explicitly or implicitly contained in the premises; a property of valid deductive inference
non-monotonic: the addition of new information can invalidate the inference
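The three measures of central tendency defined above, and the multiplication rule for independent events, are easy to compute with Python's standard library; the data set here is invented for illustration:

```python
import statistics

data = [2, 3, 3, 5, 7, 9, 13]    # hypothetical data set

mean = statistics.mean(data)      # sum of values / number of values -> 6
median = statistics.median(data)  # middle value of the sorted data -> 5
mode = statistics.mode(data)      # most frequent value -> 3

# Multiplication rule: probability that two independent events both occur
p_two_sixes = (1 / 6) * (1 / 6)   # two sixes on two fair dice = 1/36

print(mean, median, mode)  # 6 5 3
```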


normal distribution: a symmetric, unimodal distribution with the most common values at the middle and decreasingly common outcomes as the values get higher and lower; also called a bell curve or Gaussian distribution
normal science: the most common (Kuhnian) phase of science, within which scientific research is stable and based on widespread agreement about basic assumptions; this follows either pre-paradigm science or scientific revolution
normative claim: a statement about how things ought to be, which might or might not correspond to how they in fact are
null hypothesis: a reasonable default assumption about how the world is, which is not a bold and risky conjecture; in statistical hypothesis-testing, the null hypothesis generally states that the variables in question are statistically independent
observable: capable of being perceived or detected with the use of one's senses under appropriate circumstances; observability is relative to specific epistemic communities, their scientific theories, and technical apparatus
observation: any information gained from your senses—not only what you see but also what you hear, smell, touch, and sense in any other way you can experience the world
observational study: collecting and analyzing data without performing interventions or, often, aiming to control extraneous variables
observer bias: see Hawthorne effect
observer-expectancy effect: when a scientist's expectations lead her to unconsciously influence the behavior of experimental subjects
ontological naturalism: the idea that no supernatural entities exist
openness to falsification: the willingness to abandon any claim or theory when the preponderance of evidence suggests it's wrong; a key feature of science
operational definition: a specification of the conditions when some concept applies, enabling measurement or other kinds of precision
outcome space: the set of all values a random variable can take on; also called sample space
outliers: measured values for a variable that are notably different from the other values in the data set
p-value: the probability of the observed data assuming the null hypothesis is true
paradigm: according to Kuhn, a way of practicing science; provides scientists with a stock of assumptions about the world, concepts and symbols for effective communication, methods for gathering and analyzing data, and other habits of research and reasoning
parameter: a quantity whose value can change in different applications of a mathematical equation but that only has a single value in any one application of the equation
pattern conception of explanation: the idea that a phenomenon is explained by fitting it into a more general framework of laws and principles
perfectly controlled experiment: an experiment in which all variables are controlled except for the independent variable, an intervention is performed on the independent variable, and the effects on the dependent variable are measured; no confounding variables are possible
Persian Golden Age: period of rapid intellectual achievements in science, philosophy, literature, and art spanning from Central Asia to the Arabian Peninsula between the 8th and 13th centuries, which was the core part of the so-called Islamic Golden Age more generally; arguably the most important period in the development of science prior to the Scientific Revolution
phenomena: things or processes as we experience them; appearances of objects, events, regularities, or processes that exist or occur
philosophy of science: the investigation of science, focused especially on questions of what science should be like in order to be a trustworthy route to knowledge and to achieve the other ends we want it to have, such as usefulness to society
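As a concrete illustration of a p-value (the scenario and numbers are invented): under the null hypothesis that a coin is fair, the probability of a result at least as extreme as 8 heads in 10 tosses can be computed exactly from the binomial counts:

```python
from math import comb

n, k = 10, 8   # hypothetical: 8 heads observed in 10 tosses

# Pr(at least k heads in n fair tosses) = sum of binomial counts / 2^n
p_value = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

print(p_value)   # 56/1024 = 0.0546875, just above a 0.05 significance level
```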


physical constants: quantities that are universal and unchanging over time
physical process: an account of causation in which causation consists in some continuous physical process, such as energy transfer
pie chart: visual representation of statistical outcomes in which a circle is divided into slices that are used to show the relative frequencies of the different values for some variable
placebo effect: when an experimental subject's expectations lead to the outcome the subject expects; this can be an extraneous variable
plagiarism: stealing somebody else's ideas, data, or words by presenting them as one's own work and failing to give appropriate credit
population: a collection of entities that are grouped together, often in virtue of exhibiting common features
population validity: the degree to which experimental entities are representative of the broader class of entities of interest; for experiments with human subjects, this is the broader population
positively correlated: greater values for one variable increase the probability of greater values for a second variable; also known as direct correlation
post hoc, ergo propter hoc: the mistaken conclusion that one event causes another simply because the events occur in succession close to each other; translated from Latin, 'after this, therefore because of this'
posterior probability: the probability of a hypothesis conditional on an observation that has been made; Bayes's theorem can be used to calculate this
power: the probability that a test will reject a false null hypothesis
precision: the extent to which a model finely specifies features of a target system
premises: statements that provide support for some conclusion; the starting points for an inference
pre-paradigmatic: the earliest phase of science according to Kuhn; characterized by the existence of different schools of thought that debate very basic assumptions, including research methods and the nature and significance of data
prior probability: the rational degree of belief in a hypothesis before making a given observation
probability distribution: how often a variable is expected to take on each of a range of values
probability theory: a mathematical theory developed to deal with random variables, or outcomes that are individually unpredictable but that behave in predictable ways over many occurrences
problem of induction: the idea that inductive inference cannot be logically justified, since any possible justification would need to employ inductive reasoning and would thus be circular
prospective study: a study in which researchers identify a group of subjects with some property or condition and track their development forward in time
proximate causes: causes that occur closely in time and perhaps in space to their effect
pseudoscience: a non-scientific activity that masquerades as science; often designed to deceive people into believing it has scientific legitimacy
publication bias: the tendency to publish surprising, new results more often than negative results, replication studies, and exploratory work
qualitative data: information that is non-numerical and without some other standard that makes it easily comparable, such as diary accounts, unstructured interviews, and observations of animal behavior
qualitative variables: variables with values that are not numerical but descriptive, such as the variable sport, with the values basketball, hockey, and so on
quantitative analysis: the use of mathematical techniques to measure or investigate phenomena
quantitative data: data that are easily comparable, often in numerical form, such as numbers, vectors, or indices
quantitative variables: variables with numerical values, such as height or percent correct on an exam
random sampling: a sampling method in which the individuals composing the sample are selected randomly from the population
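Bayes's theorem relates the prior and posterior probabilities defined above: Pr(H|E) = Pr(E|H) · Pr(H) / Pr(E). A sketch with invented numbers for a diagnostic test shows how a low prior keeps the posterior modest even after a positive result:

```python
prior = 0.01             # Pr(hypothesis): base rate of the condition
likelihood = 0.99        # Pr(evidence | hypothesis): test sensitivity
false_positive = 0.05    # Pr(evidence | not hypothesis)

# Total probability of the evidence across both possibilities
pr_evidence = likelihood * prior + false_positive * (1 - prior)

# Posterior probability via Bayes's theorem
posterior = likelihood * prior / pr_evidence

print(round(posterior, 3))   # 0.167: still low despite a positive test
```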


random variables: variables that take on different values that are individually unpredictable but predictable in the aggregate
randomization: randomly assigning experimental entities to experimental and control groups
rational degree of belief: the interpretation of posterior probability in Bayesian statistics; believing a hypothesis to the same degree as the probability it is true given the observations that have been made
range: a measure of variability; the difference between the smallest and largest values in a data set
reasoning: psychological processes leading to beliefs; could be inferential or not
refutation: one possible outcome of the H-D method; the observation contradicts the expectation deductively inferred from the hypothesis, so the hypothesis is deductively proven to be false
regression analysis: finding the best-fitting line through the points on a scatterplot
regression to the mean: the tendency for outlier values to be related to less extreme values in the future or past
relative frequency distributions: frequency distributions that record proportions of occurrences of each value of a variable rather than absolute numbers of occurrences
replication: performing an experiment again—often with some modification to its design—in order to check whether the result remains the same
representative: when the experimental entities studied do not vary in any systematic way from the general population
retrospective study: a study in which researchers first identify a group of subjects who have the target property or condition, and then investigate their past in an attempt to isolate the cause of the condition
robustness: a desirable feature of models; a measure of a model's insensitivity to features that differ from the target
robustness analysis: analyzing multiple models or different versions of a model to determine whether and to what extent their results are consistent
sample: a subset of a population about which data are gathered
sample data: data about individuals in a sample
sample mean: the most likely average value of the trait of interest in a population
sample size: the number of individual sources of data in a study, often the number of experimental entities or subjects
sample space: see outcome space
sample standard deviation: an estimate of the spread of the probability distribution for the random variable; s = √[∑(value − mean)² / (n − 1)]
sampling error: an incorrect conclusion due to a non-representative sample
scale model: a concrete physical object that serves as a representation of one or more target systems
scatterplot: visual representation of statistical outcomes in which the values of one variable are plotted against the values of another variable
science: an inclusive social project of developing natural explanations for natural phenomena; these explanations are tested using empirical evidence and should be subject to additional open criticism, testing, refinement, or even rejection; science regularly, but not always, employs mathematics in both the formulation and testing of its explanations
scientific breakthrough: a radical shift in the theories in some field of science
scientific revolution: a radical change in which a reigning theory is overturned in favor of a new theory, often involving an alternative worldview; Kuhn's view of the nature of scientific change
Scientific Revolution (the): beginning with the work of Copernicus and ending with the work of Newton; a fundamental transformation in ideas about how knowledge claims ought to be justified, which led to the development of the scientific method
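The sample standard deviation formula above, with its n − 1 divisor, matches what Python's `statistics.stdev` computes; the data below are invented for illustration:

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample
n = len(data)
m = sum(data) / n                  # sample mean

# s = sqrt( sum((value - mean)^2) / (n - 1) )
s = math.sqrt(sum((x - m) ** 2 for x in data) / (n - 1))

# statistics.stdev applies the same n - 1 formula
assert abs(s - statistics.stdev(data)) < 1e-12
```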


scientific theory: a large-scale system of ideas about a natural phenomenon supported by a variety of evidence
self-explanation effect: the observation that generating explanations to oneself or to others can facilitate the integration of new information into existing bodies of knowledge and can lead to deeper understanding
set: a grouping of objects (called elements)
significance level: how improbable, given the null hypothesis, an experimental result must be to warrant rejecting the null hypothesis
Simpson's paradox: a correlation between two events that disappears, or is reversed, when data are grouped in a different way
68–95–99.7 rule: the percentages of values that lie within one, two, and three standard deviations around the mean of a normal distribution
soundness: a property that deductive arguments have when they are both valid and have all true premises
spurious correlations: correlations between two events that aren't causally related in any obvious way
standard deviation: the square root of the variance; for a population, s = √[∑(value − mean)² / n]
standard error: the standard deviation of the sampling distribution of the mean; SE = s/√(sample size)
statistical description: summarizing, describing, and displaying data in a meaningful way
statistically independent: two events for which the occurrence of one does not increase or decrease the probability of the other; that is, when Pr(Y|X) = Pr(Y) and Pr(X|Y) = Pr(X)
statistically significant: data with a p-value below the chosen significance level; grounds for rejecting the null hypothesis
strawman fallacy: an informal fallacy; caricaturing an argument in order to criticize the caricature rather than the actual view
subjects: humans, non-human animals, or inanimate objects in an experiment or non-experimental study; also called experimental entities
subtraction rule: the probability that some outcome doesn't occur is the result of subtracting the probability of that outcome from the total probability (Pr = 1)
sufficient causes: causes that always bring about the effect
sufficient condition: a condition that, if met, guarantees a specified outcome will occur
super-observational: enhancement of our powers of observation far beyond what they ordinarily include through the use of tools or other implements
target system: a selected part of the structure of the world, about which scientists aim to gain knowledge; the phenomenon intended to be represented by a model
theorems: statements deductively inferred from a set of axioms
theoretical claims: claims made about entities, properties, or occurrences that are not directly observable
thought experiments: devices of the imagination that scientists can use to learn about possible effects of an intervention; may supplement or replace empirical evidence
total probability: the whole set of values in an outcome space for some random variable; always Pr = 1
tractability: the degree of ease in developing or using a model
type I error: a false positive; the erroneous rejection of the null hypothesis when it is true
type II error: a false negative; the erroneous acceptance of the null hypothesis when it is false
underdetermination: when evidence is insufficient to determine which of multiple theories or hypotheses is true
understanding: grasping why or how something came about or is the way it is
uniform distribution: a distribution in which all values in a range are equally likely; a histogram shows a flat line
unimodal distribution: a distribution in which one value in a range is the most common; in a histogram, there is one peak
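The population standard deviation and standard error formulas above can be computed in a few lines; the sample values here are invented:

```python
import math

sample = [4, 8, 6, 5, 3, 7]   # hypothetical measurements
n = len(sample)
mean = sum(sample) / n

# Population formula: s = sqrt( sum((value - mean)^2) / n )
s = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)

# Standard error of the mean: SE = s / sqrt(sample size)
se = s / math.sqrt(n)
```

Note that the standard error shrinks as the sample size grows, which is why larger samples yield more precise estimates of the population mean.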


valid: a property of deductive inference in which the truth of the premises logically guarantees or necessitates the truth of the conclusion
value of a variable: the particular state or quantity that a variable has taken on in some instance
value-free ideal: the idea that good science should not rely on moral and political beliefs in assessing the evidence for scientific models, theories, or hypotheses
variability: the distribution of values in a data set; measures of variability like standard deviation and variance indicate how spread out the data set is; also called spread
variable: anything that can vary, change, or occur in different states and that can be measured
variance: a measure of how far a set of data is spread out from the average value of the data set; the average of the squared differences of the values of a random variable from its mean

References

Ahmed, M., Anchukaitis, K., Asrat, A., Borgaonkar, H., Braida, M., Buckley, B., …, & Curran, M. (2013). Continental-scale temperature variability during the past two millennia. Nature Geoscience, 6, 339–346. Al-Khalili, J. (2015). In retrospect: Book of optics. Nature, 518(7538), 164–165. American Association for the Advancement of Science. (2001). Designs for science literacy. New York: Oxford University Press. Anderegg, W. R. L., Prall, J. W., Harold, J., & Schneider, S. H. (2010). Expert credibility in climate change. Proceedings of the National Academy of Sciences, 107, 12107–12110. Arrhenius, S. (1908). Worlds in the making: The evolution of the universe. London: Harper & Brothers. Axelrod, R. (1984). The evolution of cooperation. New York: Basic Books. Bao, X., & Eaton, D. W. (2016). Fault activation by hydraulic fracturing in western Canada. Science, 354, 1406–140. Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E. J., Berk, R., .  .  . & Cesarini, D. (2018). Redefine statistical significance. Nature Human Behaviour, 2, 6–10. Blackawton, P. S., Airzee, S., Allen, A., Baker, S., Berrow, A., Blair, C., .  .  . & Hackford, C. (2011). Blackawton bees. Biology Letters, 7, 168–172. Broca, P. (1861). Remarques sur le siège de la faculté du langage articulé, suivies d’une observation d’aphémie (perte de la parole). Bulletins de la Société d’anatomie, 2e serie, 6, 330–357. Callaway, E. (2017). Oldest Homo sapiens fossil claim rewrites our species’ history. Nature News, 8 June 2017. Callendar, G. S. (1939). The composition of the atmosphere through the ages. Meteorological Magazine, 74(878), 33–39. Camerer, C. F. (1997). Taxi drivers and beauty contests. Engineering and Science, 60(1), 10–19. Camerer, C. F., Babcock, L., Loewenstein, G., & Thaler, R. (1997). Labor supply of New York City cabdrivers: One day at a time. Quarterly Journal of Economics, 112, 407–441. Capra, F. (1975). The Tao of physics. Boston: Shambhala Publications. 
Cartwright, N. (1989). Nature’s capacities and their measurement. Oxford: Oxford University Press. Chatrchyan, S., Khachatryan, V., Sirunyan, A. M., Tumasyan, A., Adam, W., Aguilo, E., . . . & Friedl, M. (2012). Observation of a new boson at a mass of 125 GeV with the CMS experiment at the LHC. Physics Letters B, 716(1), 30–61. Chattopadhyay, R., & Duflo, E. (2004). Women as policy makers: Evidence from a randomized policy experiment in India. Econometrica, 72(5), 1409–1443. Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003. Cumming, G. (2013). Understanding the new statistics: Effect sizes, confidence intervals, and metaanalysis. New York: Routledge. Darley, J. M., & Latane, B. (1968). Bystander intervention in emergencies: Diffusion of responsibility. Journal of Personality and Social Psychology, 8, 377–383.


Darwin, C. (1872). On the origin of species by means of natural selection, or the preservation of favoured races in the struggle for life (6th ed.). London: John Murray.
Dockery, D. W., Pope, C. A., Xu, X., Spengler, J. D., Ware, J. H., Fay, M. E., . . . & Speizer, F. E. (1993). An association between air pollution and mortality in six US cities. New England Journal of Medicine, 329(24), 1753–1759.
Donovan, A. (1993). Antoine Lavoisier: Science, administration, and revolution. Oxford: Blackwell.
Duarte, J. L., Crawford, J. T., Stern, C., Haidt, J., Jussim, L., & Tetlock, P. E. (2015). Political diversity will improve social psychological science. Behavioral and Brain Sciences, 38, 1–13.
Dyson, F. W., Eddington, A. S., & Davidson, C. R. (1920). A determination of the deflection of light by the sun's gravitational field, from observations made at the solar eclipse of May 29, 1919. Philosophical Transactions of the Royal Society A, 220, 571–581.
Eberhardt, F. (2009). Introduction to the epistemology of causation. The Philosophy Compass, 4(6), 913–925.
Eddington, A. (1935/2012). New pathways in science: Messenger Lectures (1934). Cambridge: Cambridge University Press.
Elliott, K. C. (2017). A tapestry of values: An introduction to values in science. Oxford: Oxford University Press.
Enten, H. (2017). What Harry got wrong in 2016. FiveThirtyEight. Retrieved from http://fivethirtyeight.com/features/what-harry-got-wrong-in-2016/
Fisher, R. A. (1956). Mathematics of a lady tasting tea. In J. R. Newman (Ed.), The world of mathematics (pp. 1512–1521). New York: Simon & Schuster. (Original work published in Fisher, R. A. (1935). The design of experiments. Edinburgh: Oliver & Boyd.)
Fizeau, H. (1849). Sur une expérience relative à la vitesse de propagation de la lumière. Comptes rendus, 29, 90–92.
Floridi, L. (2012). Big data and their epistemological challenge. Philosophy and Technology, 25, 435–437.
Galton, F. (1889). Natural inheritance. London: Macmillan.
Gelman, A., & Hennig, C. (2017). Beyond subjective and objective in statistics (with discussion). Journal of the Royal Statistical Society, 180(4), 967–1033.
Gelman, A., & Stern, H. (2006). The difference between "significant" and "not significant" is not itself statistically significant. The American Statistician, 60(4), 328–331.
Gillham, N. W. (2001). Sir Francis Galton and the birth of eugenics. Annual Review of Genetics, 35, 83–101.
Glymour, C. (2007). When is a brain like the planet? Philosophy of Science, 74(3), 330–347.
Gopnik, A. (1998). Explanation as orgasm. Minds and Machines, 8(1), 101–118.
Guéguen, N., Jacob, C., Le Guellec, H., Morineau, T., & Lourel, M. (2008). Sound level of environmental music and drinking behavior: A field experiment with beer drinkers. Alcoholism: Clinical and Experimental Research, 32(10), 1795–1798.
Güth, W., Schmittberger, R., & Schwarze, B. (1982). An experimental analysis of ultimatum bargaining. Journal of Economic Behavior and Organization, 3, 367–388.
Haddad, D., Seifert, F., Chao, L. S., Possolo, A., Newell, D. B., Pratt, J. R., . . . & Schlamminger, S. (2017). Measurement of the Planck constant at the National Institute of Standards and Technology from 2015 to 2017. Metrologia, 54, 633–641 (arXiv: 1708.02473).
Harlow, J. M. (1848). Passage of an iron rod through the head. Boston Medical and Surgical Journal, 39, 389–393.
Harlow, J. M. (1868). Recovery from the passage of an iron bar through the head. Publications of the Massachusetts Medical Society, 2, 327–347.
Hempel, C. G. (1966). Philosophy of natural science. Englewood Cliffs: Prentice-Hall.
Herschel, W. (1801). Observations tending to investigate the nature of the sun, in order to find the causes or symptoms of its variable emission of light and heat: With remarks on the use that may possibly be drawn from solar observations. Philosophical Transactions of the Royal Society of London, 91, 265–318.
Hesslow, G. (1976). Two notes on the probabilistic approach to causality. Philosophy of Science, 43(2), 290–292.
Hodges, J., & Tizard, B. (1989). Social and family relationships of ex-institutional adolescents. Journal of Child Psychology and Psychiatry, 30, 77–97.
Hubble, E. (1929). A relation between distance and radial velocity among extra-galactic nebulae. Proceedings of the National Academy of Sciences, 15(3), 168–173.
Hublin, J. J., Ben-Ncer, A., Bailey, S. E., Freidline, S. E., Neubauer, S., Skinner, M. M., . . . & Gunz, P. (2017). New fossils from Jebel Irhoud, Morocco and the pan-African origin of Homo sapiens. Nature, 546(7657), 289–292.
Hume, D. (1738/2007). A treatise of human nature (D. F. Norton & M. J. Norton, Eds.). Oxford: Clarendon Press.
Hume, D. (1748/1999). An enquiry concerning human understanding (T. L. Beauchamp, Ed.). Oxford and New York: Oxford University Press.
Huygens, C. (1690/1962). Treatise on light (S. P. Thompson, Trans.). New York: Dover Publications.
Intergovernmental Panel on Climate Change (IPCC). (2014). Climate change 2014: Synthesis report. Retrieved from www.ipcc.ch/news_and_events/docs/ar5/ar5_syr_headlines_en.pdf
Ioannidis, J. P. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124.
Kahneman, D. (2011). Thinking, fast and slow. New York: Farrar, Straus, & Giroux.
Keller, E. F. (1983). A feeling for the organism: The life and work of Barbara McClintock. San Francisco: W. H. Freeman and Co.
Khang, Y.-H. (2013). Two Koreas, war and health. International Journal of Epidemiology, 42, 925–929.
Knight, J. (2002). Sexual stereotypes. Nature, 415, 254–256.
Korb, K., & Nicholson, A. (2010). Bayesian artificial intelligence (2nd ed.). Boca Raton: Chapman & Hall/CRC Press.
Kragh, H., & Smith, R. W. (2003). Who discovered the expanding universe? History of Science, 41(2), 141–162.
Kuhn, T. (1962/1970). The structure of scientific revolutions. Chicago: University of Chicago Press (1970, 2nd ed., with postscript).
Lakens, D., Adolfi, F. G., Albers, C. J., Anvari, F., Apps, M. A. J., Argamon, S. E., . . . & Zwaan, R. A. (2018). Justify your alpha. Nature Human Behaviour, 2, 168–171.
Lawson, R. (2006). The science of cycology: Failures to understand how everyday objects work. Memory & Cognition, 34(8), 1667–1675.
Lazer, D., Kennedy, R., King, G., & Vespignani, A. (2014). The parable of Google Flu: Traps in big data analysis. Science, 343(6176), 1203–1205.
Le Cam, L. (1986). The central limit theorem around 1935. Statistical Science, 78–91.
Lee, T. M., Markowitz, E. M., Howe, P. D., Ko, C. Y., & Leiserowitz, A. A. (2015). Predictors of public climate change awareness and risk perception around the world. Nature Climate Change, 5(11), 1014–1020.
Levins, R. (1966). The strategy of model building in population biology. American Scientist, 54, 421–431.
Levitt, S., & Dubner, S. J. (2005). Freakonomics: A rogue economist explores the hidden side of everything. New York: William Morrow.
Lindley, D. V. (1993). The analysis of experimental data: The appreciation of tea and wine. Teaching Statistics, 15, 22–25.
Lord, C. G., Ross, L., & Lepper, M. R. (1979). Biased assimilation and attitude polarization: The effects of prior theories on subsequently considered evidence. Journal of Personality and Social Psychology, 37(11), 2098–209.


Manson, M. (1893). Geological and solar climates: Their causes and variations. San Francisco: G. Spaulding & Co.
McMullin, E. (1985). Galilean idealization. Studies in the History and Philosophy of Science, 16, 247–273.
Mendel, G. (1865/1996). Experiments in plant hybridization (W. Bateson, Trans.). Electronic scholarly publishing project. (Original work published as Versuche über Pflanzenhybriden. Verhandlungen des naturforschenden Vereines in Brünn, Bd. IV für das Jahr 1865, Abhandlungen, 3–47). Retrieved from www.esp.org/foundations/genetics/classical/gm-65.pdf
Michotte, A. (1962). The perception of causality. Andover, MA: Methuen.
Milgram, S. (1963). Behavioral study of obedience. Journal of Abnormal and Social Psychology, 67(4), 371–378.
Mill, J. S. (1893). A system of logic, ratiocinative and inductive: Being a connected view of the principles of evidence and the methods of scientific investigation. New York: Harper & Brothers.
Morgan, M., & Boumans, M. J. (2004). Secrets hidden by two-dimensionality: The economy as a hydraulic machine. In S. de Chadarevian & N. Hopwood (Eds.), Models: The third dimension of science (pp. 369–401). Stanford: Stanford University Press.
National Research Council. (1979). Carbon dioxide and climate: A scientific assessment. Washington, DC: National Academies Press.
Newton, I. (1671/1672). A letter of Mr. Isaac Newton, Professor of the Mathematicks in the University of Cambridge; containing his new theory about light and colors: Sent by the author to the publisher from Cambridge, Febr. 6. 1671/72; In order to be communicated to the R. Society. Philosophical Transactions, 6, 3075–3087.
Newton, I. (1704/1998). Opticks: Or, a treatise of the reflexions, refractions, inflexions and colours of light: Also two treatises of the species and magnitude of curvilinear figures. Commentary by Nicholas Humez (Octavo ed.). Palo Alto: Octavo.
Oreskes, N. (2004). The scientific consensus on climate change. Science, 306(5702), 1686.
Oreskes, N., & Conway, E. (2010). Merchants of doubt. New York: Bloomsbury.
Parsons, H. M. (1974). What happened at Hawthorne? Science, 183(4128), 922–932.
Pashler, H., & Wagenmakers, E. J. (2012). Editors' introduction to the special section on replicability in psychological science: A crisis of confidence? Perspectives on Psychological Science, 7(6), 528–530.
Peirce, C. S. (1903/1904) (1931–1936). The collected papers (Vols. 1–6, C. Hartshorne & P. Weiss, Eds.). Cambridge: Harvard University Press.
Pfungst, O. (1911). Clever Hans (The horse of Mr. von Osten): A contribution to experimental animal and human psychology (C. L. Rahn, Trans.). New York: Henry Holt. (Originally published in German, 1907.)
Popper, K. (1963). Conjectures and refutations: The growth of scientific knowledge. London: Routledge and Kegan Paul.
Pukelsheim, F. (1994). The three sigma rule. The American Statistician, 48(2), 88–91.
Rapoport, A., Seale, D. A., & Colman, A. M. (2015). Is tit-for-tat the answer? On the conclusions drawn from Axelrod's tournaments. PLoS One, 10(7), e0134128.
Reichenbach, H. (1938). Experience and prediction. Chicago: University of Chicago Press.
Retraction Watch. Tracking retractions as a window into the scientific process. Retrieved from http://retractionwatch.com/
Rozin, P., Fischler, C., & Shields-Argelès, C. (2012). European and American perspectives on the meaning of natural. Appetite, 59, 448–455.
Rudder, C. (2014). Dataclysm: Who we are when we think no one's looking. New York: Crown Publishers.
Schaffer, S. (1989). Glass works: Newton's prisms and the uses of experiment. In D. Gooding, T. Pinch, & S. Schaffer (Eds.), The uses of experiment: Studies in the natural sciences (pp. 67–104). Cambridge: Cambridge University Press.
Schelling, T. C. (1969). Models of segregation. American Economic Review, 59, 488–493.
Schelling, T. C. (1971). Dynamic models of segregation. Journal of Mathematical Sociology, 1, 143–186.
Schoenfeld, J. D., & Ioannidis, J. P. (2013). Is everything we eat associated with cancer? A systematic cookbook review. The American Journal of Clinical Nutrition, 97(1), 127–134.
Semmelweis, I. (1861/1983). The etiology, the concept and the prophylaxis of childbed fever (K. C. Carter, Trans.). Madison: University of Wisconsin Press.
Simon, V. (2005). Wanted: Women in clinical trials. Science, 308(5728), 1517.
Snow, J. (1855). On the mode of communication of cholera. London: John Churchill.
Squire, P. (1988). Why the 1936 Literary Digest poll failed. Public Opinion Quarterly, 52(1), 125–133.
Stanford, P. K. (2015, online first). Unconceived alternatives and conservatism in science: The impact of professionalization, peer-review, and big science. Synthese, 1–18.
Stanziani, A. (2008). Defining natural product between public health and business, 17th to 21st centuries. Appetite, 51, 15–17.
Teigen, K. H. (2002). One hundred years of laws in psychology. The American Journal of Psychology, 115, 103–118.
Thorgeirsson, T. E., Gudbjartsson, D. F., Surakka, I., Vink, J. M., Amin, N., Geller, F., . . . & Gieger, C. (2010). Sequence variants at CHRNB3–CHRNA6 and CYP2A6 affect smoking behavior. Nature Genetics, 42(5), 448–453.
Ullman, A. (2007). Pasteur-Koch: Distinctive ways of thinking about infectious diseases. Microbe, 2, 383–387.
United States Environmental Protection Agency. (2015). High lead levels in Flint, Michigan. Retrieved from www.epa.gov/sites/production/files/2015-11/documents/transmittal_of_final_redacted_report_to_mdeq.pdf
Volterra, V. (1928). Variations and fluctuations of the number of individuals in animal species living together. Journal du Conseil. Conseil Permanent International pour l'Exploration de la Mer, 3, 3–51.
Walton, D. (1989/2008). Informal logic: A pragmatic approach. Cambridge: Cambridge University Press.
Watson, J. D. (1968). The double helix. New York: Atheneum Press.
Weart, S. (2014). The public and climate change (since 1980). Retrieved from https://history.aip.org/climate/public2.htm
Wegener, A. (1929/1966). The origin of continents and oceans. New York: Dover Publications.
Weisberg, D. S., Keil, F. C., Goodstein, J., Rawson, E., & Gray, J. R. (2008). The seductive allure of neuroscience explanations. Journal of Cognitive Neuroscience, 20(3), 470–477.
Weisberg, M. (2013). Simulation and similarity: Using models to understand the world. Oxford: Oxford University Press.
Woodruff, G., & Premack, D. (1979). Intentional communication in the chimpanzee: The development of deception. Cognition, 7(4), 333–362.
Woodward, J. (2016). The problem of variable choice. Synthese, 193(4), 1047–1072.


Index


Page numbers in italics indicate figures and in bold indicate tables on the corresponding pages.

abductive inference 156–159, 157; distinctive characteristics of 159–161, 162
abductive reasoning 156–161
abstraction 118
accuracy 119–120, 121
addition rule 174–175, 176
affirming the antecedent 133
affirming the consequent 135
al-Bīrūnī, ibn Aḥmad 19
alchemy 292
algorithms 113
al-Khwārizmī, ibn Mūsā 19
allergies, peanut 2
American Association for the Advancement of Science (AAAS) 31
ampliative inferences 153
analogical models 108–109, 109
Andromeda Nebula 127
anomaly 80
antecedents 130, 130; affirming of 133; denying the 135
anti-vaccination advocacy 28
appeal to ignorance 136–137
appeal to irrelevant authority 136
applied research 14
archaeology 161, 162
arguments 129; characteristics of inductive 153–155, 154; uncovering bad 134–137
Aristotle 19, 86–87, 125, 129, 289–290, 292
Arrhenius, Svante August 8, 9, 281
'Artificial Production of Carbon Dioxide and Its Influence on Temperature, The' 8
assumptions 99; auxiliary 58–59, 146–147
astrology 16, 28

asymmetric distribution 187, 188
atmospheric CO2 11, 11
autism 265–266
auxiliary assumptions 58–59, 146–147
average 191
Axelrod, Robert 113–114, 116, 122
axiomatic methods 147–148
axioms 147
background conditions 56
Bacon, Francis 93
bacteria: streptococcus 168, 170
bad arguments, uncovering 134–137
bar charts 184, 185, 186–187
basic research 14
Bayes, Thomas 234, 235
Bayes factor 236
Bayesian inference: Bayesian belief updating in 236–238; Bayesian conditionalization in 236–238; Bayes's theorem and 234–235; comparing support for different hypotheses using 235–236; problems with 238–239
Bayes nets 266, 266–271, 267–269, 269
Bay Model 89–90, 90, 94, 97, 102, 117, 128; analysis of 100; construction of 98; as scale model 106–107, 108
bell curve 187
bias: confirmation 33, 38; controlling for 68–70; funding 306; observer 50; publication 305; random variables and 173
bibliometric study 283
Big Bang theory 137
big data 84–85, 104
bimodal distribution 186, 186, 186–187


Blackawton Bees project 36
black holes 39, 40–41
blind experiments 69
brain: activity 42, 270; areas or regions 82, 270; damage 66, 82; metabolism 60
Broca, Paul 82
calibration of instruments 57, 60
Callendar, Guy 8–9, 9
California 89–91, 253
Cal Tech 35
calx 292
cancer 218, 236–238, 246–247, 251, 258, 269–272, 269
cannon thought experiment 86, 87
carbon dioxide, atmospheric see climate change
Carlsson, Arvid 14, 299
Cartwright, Nancy 248
case studies 80–82
causal background 251
causal Bayes nets 266, 266–271, 267–269, 269
causal conception 285
causal conception of explanation 284–286
causal hypotheses: germ theory of disease and 259–260; intervention and difference-making 255–257; Mill's methods of testing 257–258, 259; testing 255–260
causal modeling 262–272; approaches to 263–266, 264; assumptions of 271–272; causal Bayes nets in 266, 266–271, 267–269, 269
causation 242–253; correlation as guide to 247–249, 248; fracking and 242–245, 243; nature of 249–250; necessary and sufficient causes 250–251; probability and 251–253; scientific reasoning about 245; skepticism about 246; spatiotemporal contiguity as guide to 246–247
Centers for Disease Control and Prevention (CDC) 153, 248
central limit theorem 210–212
central tendency 187–191, 188–189, 190, 192
CERN (European Organization for Nuclear Research) 55–56, 67, 84, 162, 221, 224; developing a probability distribution 225–227, 227; using statistics to test hypotheses 221–230
Cepheid variable 127

Chattopadhyay, Raghabendra 76
chemical revolution 292–294, 293
cholera 78–80, 79
classical statistics 233–234
cleansing, data 104
Clever Hans 33–34, 34
climate change 7–11, 9–11, 28–29, 278–279; laboratory experiments on 74–75
Clinton, Hillary 207, 215, 217
cluster indicators 65
cohort studies 83
collaborative experiments 55
collecting data 56–58
collectively exhaustive outcomes 173
color 36, 47–52, 47, 54, 56, 58–61, 94–95, 168, 208–209, 209
common cause 248
computer models 113–114
computer simulations 85
conclusions of arguments 129
conditional probability 177–180, 179
conditionals 131
conditional statements 130, 130–132
confederates 70
confirmation bias 33, 38
conflicts of interest 35
confounding variables 49
consequent 130, 130; affirming the 135; denying the 133
continental drift 158
contributing cause 252
control groups 68, 78
Copernicus, Nicolaus 18, 19–20, 93, 290
correlated variables 196
correlation 195–197, 196; coefficient of 200; as guide to causation 247–249, 248; measures of 197, 197–201, 198–199; spurious 248; strength of 198
counterexamples 135
counterfactual statements 249
Craik, Kenneth 93
creationism and intelligent design 16–17, 28–29
Crick, Francis 102, 107, 107, 108, 295, 299
crisis 291, 292
cross-sectional studies 84
crucial experiments 58–60
curve-fitting 104, 105


Darwin, Charles 137, 197, 289, 294, 297, 301
data 41; big 84–85, 104; collection and analysis of 56–58; curve-fitting 104, 105; models of 103–105, 104; overfitting 104, 106; qualitative 57; quantitative 57, 183–184; questionnaire 57–58; sample 169; visualization of 84
data cleansing 104
data dredging 306
deception 69–70
deductive arguments 129
deductive reasoning: on age of the universe 125–128, 126; in case of puerperal fever 142–146, 143–144, 145; conditional statements in 130, 130–132; Flint, Michigan, water crisis and 150–151, 151; in hypothesis-testing 141–148; hypothetico-deductive (H-D) method 141–142; inference, argument and 128–129
defining science: by its history 18–21, 19; by its methods 23–26, 31–32; by its subject matter 21–23; tricky work of 16–17
denying the antecedent 135
denying the consequent 133
dependent variables 49–50, 66
Descartes, René 24
descriptive statistics 169–170; correlation in 195–201, 196, 197, 198–199; generalizing from 207–217; measures of central tendency in 187–191, 188–189, 190, 192; measures of variability in 191–195, 192, 193, 195; variables and their values in 182–184; visual representation of values of variables in 184–187, 185–187
de Vlamingh, Willem 154
Dianetics 136
difference-making 249–250; intervention and 255–257
Digges, Thomas 93
direct correlation 196
directed acyclic graphs 268–269
direct variable control 67
disease: germ theory of 259–260, 265; heart 77, 248, 253, 258, 272; hereditary 48, 95, 200; Parkinson's 14, 116; sexually transmitted 1; syphilis 66, 298, 301
distal causes 247


DNA (deoxyribonucleic acid) 11, 21, 94, 102, 295, 298–299, 299; analogical models of 108; scale model of 107, 107
Doppler, Christian 127
Doppler effect 127
double-blind experiments 69
drinking water 8, 76, 91, 150, 303
Dubner, Stephen 83
Du Châtelet, Émilie 53, 55
Duflo, Esther 76
Duhem, Pierre 147
Duhem-Quine problem 147, 156
DuPont 36
dyspnoea 268–269, 269
Early Childhood Longitudinal Study 83
earthquakes 158, 242, 244–246, 251, 287; and fracking 242, 244–247, 251, 260
ecological validity 75
economics 17, 20, 30, 71, 76, 266, 275, 280–281
Eddington, Arthur 64, 65, 146
Edwards, Marc 151, 162
Edwards v. Aguillard 151, 162
effect size 230
Einstein, Albert 64, 65, 145–146, 289, 290
electromagnetic radiation 61
Elements of Geometry 147–148
Elliott, Kevin 302–304
empirical evidence 23–25
empiricism 24
Environmental Protection Agency (EPA) 151
errors, sampling 216–217
estimating from samples 212–215, 213, 214
Ethyl Corporation 35–36, 298
Euclid 147–148, 290, 295
eugenics 201, 301
Europe 18–19, 21, 28, 154, 305
European Organization for Nuclear Research: see CERN (European Organization for Nuclear Research)
evidence: definition of 25; empirical 23–25; falsification of 25–26
evidentialism 23–25
evolution, theory of 289, 294, 297, 301
exemplar 95
exemplification 95


expectancy bias 38
expectations 40–41; in perfectly controlled experiments 63–66, 65
experimental groups 68; choices in 77–78
experimentation, modeling as 115–116
experiments: blind 69; case studies and natural 80–83; collaborative 55; contributing to science 46–48, 47; crucial 58–60; double blind 69; experimental setup of 55–56; exploratory 61; field 75–76; intervention 49, 66–67; laboratory 74–75; on light 51–54, 52, 54–55; other roles for 60–61; perfectly controlled 63–70; replication of 37–38, 59–60; thought 85–86; variables in 48–51, 51
explanation: causal 284–286; natural 22–23; nomological 279–284, 280; as pattern-fitting 282–284, 284; and understanding 275–286
explanatory knowledge 14, 277
exploratory experimentation 61
external experimental validity 75
extraneous variables 49–50
faithfulness 272
false positives 236–238
falsifiable claims 25
falsification 25, 26, 25–26, 154–155; openness to 26
field experiments 75–76
'final theory of everything' 25
Fisher, Ronald 225, 225
FiveThirtyEight 215, 216–217
Fizeau, Hippolyte 61
Flint, Michigan, water crisis 150–151, 151, 153, 162, 163
food allergies 2
fracking 242–245, 243
Franklin, Rosalind 295, 298–299, 299
Freon 36
frequency distributions 208–212, 209, 211
frequentist interpretation 233
Freud, Sigmund 63
fruit flies (Drosophila melanogaster) 95–96
functional magnetic resonance imaging (fMRI) 60
funding bias 306
Gage, Phineas 80–82, 81
Galilei, Galileo 20, 86–87

Gallup polls 216
Galton, Francis 197–201, 199, 200, 301
gambler's fallacy 180
game theory 64
Gauss, Carl Friedrich 210
Gaussian distribution 187, 210, 211
generality 119, 120
generalizations, inductive 152
General Motors 36
genetically modified organisms (GMOs) 22, 301
geocentrism 43
geometry 147–148, 295
Gianotti, Fabiola 222
glaciers 8, 29, 55–56
global warming 8, 13, 15, 23, 138, 163, 245, 278, 281, 303
Google Flu Trends 84
Gopnik, Alison 278
Grandin, Temple 300
greenhouse gases 7–8, 27, 29, 128, 281
Hansen, James 9
Harlow, John 81
Harvard Six Cities Study 78, 82, 83
Harvard University 35
Hauser, Marc 35
Hawthorne effect 50–51
Heezen, Bruce 157, 158
heliocentrism 20–21, 42, 43, 93
Hempel, Carl 142, 279–280
heredity 200
Herschel, William 53, 54, 55, 61
Hesse, Mary 94
Higgs boson 222, 222–223
Hindu-Arabic numeral system 18–19
histograms 186, 186, 186–187, 186–187, 193
history: of modeling 93–94; of science 18–21, 19
Homo sapiens 161, 162
Hubbard, L. Ron 136
Hubble, Edwin 126, 126–128, 131, 159
human reasoning, flaws in 33–34, 34
Hume, David 24, 155, 246, 249, 255
Huygens, Christian 159
hypotheses 39–40; alternative 223; deductive reasoning in testing 141–148; null 223–224, 226, 228, 229, 260; testing causal 255–260; underdetermination of 58, 59; using statistics to test 221–230
hypothetico-deductive (H-D) method 141–142, 159–160, 223; auxiliary assumptions in 146–147; in case of puerperal fever 142–146, 143–144, 145
Ibn al-Haytham 19, 51–52, 53
Ibn Rushd 19
Ibn Sina 19
ice cores 9, 10
idealizations, model 99, 118
illusion of explanatory depth 278, 279–280
illusion of understanding 12–13
importance of science 13–14
incentive structure in science 305
independent outcomes 174
independent variables 49–50, 66, 77, 256–257
indirect correlation 196
indirect variable control 67–68, 78
induction, problem of 155–156
inductive arguments 153–155, 154
inductive generalizations 152
inductive inference 151–152, 170
inductive projections 152
Industrial Revolution 11, 11
inferences: abductive 156–161; ampliative 153; bad reasons to reject 137–138; deductive reasoning 125–148; definition of 129; evaluating 132–134; inductive 151–152, 170; non-ampliative 150; problem of induction and 155–156; sound 134; strength of 153; testimony and 162–163
inference to the best explanation 158
inferential statistics 169–170; Bayesian inference 234–239; classical statistics and its problems in 233–234; considerations in designing statistical tests in 229–230; definition of 208; estimating from samples in 212–215, 213, 214; frequency distributions and probability distributions in 208–212, 209, 211; generalizing from descriptive statistics 207–217; representative samples in 215–216; used to test hypotheses 221–230
informal fallacy 136
ingenuity 36
Inhofe, James 278, 279


institutional care for children 83
instruments 55, 56–57; calibration of 57, 60
Intergovernmental Panel on Climate Change (IPCC) 12, 29
internal experimental validity 74, 76
intervention 49, 66–67; computer simulations 85; difference-making and 255–257; thought experiments 85–86
investigators, norms of 35–36
isomorphism 117
James, LeBron 168, 174, 179–180, 183–184, 208
Jebel Irhoud (Morocco) 161, 162
Jenner, Edward 265
joint method of agreement and difference 258, 259
joint probability distributions 266
justification 13
Kahlo, Frida 1
Kahneman, Daniel 32, 128
Keeling, C. David 9, 9
Keeling Curve 9, 10
Kehoe, Robert A. 36, 298
Kekulé, Friedrich August 31–32, 128
Kepler, Johannes 20
Kibble balance 57
Kitab al-Manazir (Book of Optics) 51–52
Kitzmiller v. Dover Area School District 29
knowledge 13; explanatory 14, 277; pure 13–14; scientific 13–14, 276–279
Koch, Robert 259–260
Kolletschka, Jakob 144–145
Korea 82
Kuhn, Thomas 290–291, 292, 293–294
Kyoto Protocol 7
laboratory experiments 74–75
La Divina Commedia 28
Landon, Alfred 216
Large Hadron Collider 42, 55, 67, 84, 221–222
Larsen effect 284–285
Lavoisier, Antoine-Laurent 292–294, 293
Lavoisier, Marie-Anne Paulze 292–294, 293
lead (Pb) 35–36, 150–151, 293, 298
Leborgne, Louis 82
Leibniz, Gottfried Wilhelm 24
Levitt, Steven 83


life sciences 17
light 51–54, 52, 54–55, 61; speed of 127
limitations of science 14
Literary Digest 216
Locke, John 24
logic 132, 134, 138, 147, 186, 245, 298, 315
longitudinal studies 83
Lotka, Alfred 98
Lotka-Volterra Model 98–99, 99, 102, 108; abstraction in 118; analysis of 100; as mathematical model 110–111; as theoretical use of modeling 116
lung cancer 269, 269–270
McClintock, Barbara 300
Malthus, Thomas Richard 289, 294, 297
mammography 236–237
Manson, Marsden 8
Markov condition 271–272
material conditionals 131
mathematical models 110–113, 112
Matilda effect 299
Mauna Loa Observatory (Hawai'i) 9
Maxwell, James 61
mean 191–192, 195; regression to the 200
measles, mumps, and rubella (MMR) vaccine 265
measurement error 57
mechanistic explanation 285
mechanistic models 109–110, 110
median 191
Mendel, Gregor 47–48, 95
mental models 93–94
meteorology 200
Mill's methods: of agreement 257, 259; of concomitant variations 257, 259; of difference 257, 259; joint 258, 259; of residues 258, 259
methodological naturalism 22
methods: axiomatic 147–148; defining science by its 23–26; explanation 40–41; hypotheses 39–40; myth of the scientific method and 31–32; observation 41–42; in science 38–39
Michotte, Albert 246
Milgram, Stanley 69–70, 75
Mill, John Stuart 257–258, 259
mode 186, 190

models: accuracy in 119–120, 121; analogical 108–109, 109; analysis of 100–101; assumptions in 99; Bay Model 89–90, 90; causal 262–272; characteristics of good 118–122, 121; computer 113–114; construction of 97–100, 99; of data 103–105, 104; as experimentation and theorizing 115–116; generality of 119, 120; history of 93–94; idealizations in 99, 118; mathematical 110–113, 112; mechanistic 109–110, 110; of phenomena 105–106; precision of 119, 120, 121; robustness of 119, 122; role of 90–93, 92; scale 106–107, 107; similarity and difference 93–96, 96; specification of target system(s) 96–97; three features shared by all 117–118; tractability of 119, 121–122; trade-offs in building 122; types of 102–103
Modern Synthesis 294–295
modularity 271
modus ponens 133
modus tollens 133
Monetary National Income Analogue Computer (MONIAC) 108, 109, 110
monotonicity 132–133
Montagu, Kathleen 14, 299
Möstlin, Michael 20
Mount Wilson Observatory (California) 126, 127
multiplication rule 175–176, 176
mutually exclusive outcomes 173
NASA (National Aeronautics and Space Administration) 73, 162, 304
National Institutes of Health 153
National Institute of Standards and Technology (NIST) 57
National Research Council 9
natural experiments 82–83
natural explanations 22–23
naturalism 22–23
naturalistic inquiry 22–23
natural phenomena 21–22
natural selection 294
nature of science 26–29, 27
Nazi Germany 298
necessary and sufficient causes 250–251

Potochnik, Angela, et al. Recipes for Science : An Introduction to Scientific Methods and Reasoning, Taylor & Francis Group, 2018. ProQuest Ebook Central, http://ebookcentral.proquest.com/lib/purdue/detail.action?docID=5584122. Created from purdue on 2021-08-29 21:54:38.

Index

Copyright © 2018. Taylor & Francis Group. All rights reserved.

necessary condition 130, 130
negative correlation 196–197
Neptune 11
Newton, Isaac 18, 52, 55, 56, 60, 145–146, 290; cannon thought experiment 86, 87; controlling variables and 67; light experiments 52, 52–53, 55, 61, 67; scientific laws and 281–282; on space 64
nodes 263, 264
nomological conception of explanation 279–284, 280
non-ampliative inferences 150
non-experimental studies: case studies and natural experiments 80–83; cholera outbreak of 1854 78–80, 79; extending over time 83–84; variation from the perfect experiment 73
non-monotonic arguments 153
non-revolutionary scientific change 294–295
normal distribution 187, 194–195, 195
normal science 291, 292
normative versus descriptive claims in science 32
norms: of investigators 35–36; social 37–38, 38
objectivity 238, 297, 302, 304–305, 308
observable phenomena 21–22
observational studies: case studies and natural experiments 80–83; cholera outbreak of 1854 78–80, 79; definition of 79; extending over time 83–84
observations 41–42; statistical significance of 227–229
observer bias 50
observer-expectancy effect 33–34
OKCupid 84, 85
Oklahoma 242, 243, 244–246, 251, 278, 278
ontological naturalism 22
openness to falsification 26
operational definitions 65
Opticks 60, 61
Oreskes, Naomi 12
Origin of Species 294
outcome space 172
outliers 191
overfitting 104, 106
oxygen 60, 250, 293; as dephlogisticated air 293


Pangaea 156–158, 157
paradigms 291; pre-paradigmatic phase of science 291, 292
paradox of inquiry 97
parameters 98
Paris Agreement 7
partial cause 252
participation in science, women’s 298–300
Pasteur, Louis 259, 265
pattern conception of explanation 282–284, 284
Patterson, Clair 35–36
Payne-Gaposchkin, Cecilia 298
payoff matrix 112, 112
Peano, Giuseppe 148
peanut allergies 2
Peirce, Charles Sanders 159
perfectly controlled experiments 63–70; controlling for bias in 68–70; controlling variables in 67–68; defining expectations in 63–66, 65; intervention in 66–67; variation from 73
Persian Golden Age 18, 18, 18–19, 20, 51
Pfungst, Oskar 34
phenomena 14; models of 105–106; natural 21–23; observable 21–22
Phillips, William 108
Phillips machine (MONIAC) 108, 109, 110, 117
Philosophiæ Naturalis Principia Mathematica 18
philosophy of science 3
phlogiston 292–293
phrenology 33
physical constants 60–61
physical processes 249–250
physical sciences 17
pie charts 184, 185
placebo effect 69
plagiarism 35
Planck, Max 57
plant fertilization 47, 47–48
plate tectonics 158
polio 1–2
pollution 78, 104, 268–271, 268–269
Popper, Karl 24, 26
populations and samples 169
population validity 75
positive correlation 196


posterior probability 234–235
post hoc, ergo propter hoc fallacy 247
power 230
precision of models 119, 120, 121
predictions 168
premises of arguments 129
Priestley, Joseph 293, 293
prior probability 234–235
prisoner’s dilemma 111–113
probability and causation 251–253
probability distributions 208–212, 209, 211, 225–227, 227
probability theory 170–171; addition rule 174–175, 176; conditional probability 177–180, 179; multiplication rule 175–176, 176; random variables in 172–174; subtraction rule 176, 176–177
problem of induction 155–156
projections, inductive 152
prospective studies 83
proximate causes 247
pseudoscience 16–17, 28–29
psychoanalytic theory 63, 64
psychology 17, 20, 28, 32, 60, 93, 134, 284, 287
PsycLit database 283–284, 284
Ptolemy 19–20
publication bias 305
puerperal fever 142–146, 143–144, 145
pure knowledge 13–14
p-value 228–229, 233
qualitative data 57
qualitative variables 183
quantitative analysis 26
quantitative data 57, 183–184
quantitative variables 183
questionnaires 57–58
Quine, Willard Van Orman 147
randomization 68, 69, 77–78
random sampling 216
random variables 172–174
range 192
rapid strep test 168–169
Rapoport, Anatol 114
rational degree of belief 234

rationalism 24
reasoning: abductive 156–159; causal 242–272; deductive 125–148; definition of 128–129; statistical (see statistics)
Reber, John 91, 92, 93–95, 97, 100, 116–117, 123, 128, 277
Reber Plan 91–93, 92, 116, 117, 128; analysis of 100
recipes for science 3–4, 31–32, 39
reciprocal altruism 114
redshift 127
regression analysis 197–201, 198
regression to the mean 200
relative frequency distributions 208
relativity, theory of general 64, 65, 289, 295
Renaissance 17, 125
replication 37–38, 59–60
representative samples 215–216
Retraction Watch 35
retrograde motion 19, 19, 20
retrospective studies 83
robustness analysis 101
robustness of models 119, 122
role of science 11–13
Rømer, Ole 61
Roosevelt, Franklin D. 1, 216
Royal Society 56, 60
Safe Drinking Water Act 150
Salk, Jonas 1
sample data 169
samples: estimating from 212–215, 213, 214; populations and 169; representative 215–216
sample size 68; choices in 77
sample space 172
sample standard deviation 213–214, 214
sampling, random 216
sampling distribution 215
sampling errors 216–217
San Francisco Bay Model see Bay Model
scale models 106–107, 107
scatterplots 196, 196, 196–198, 198, 199, 200
science: climate change 7–11, 9–11; contributions of experiments to 46–48, 47; defined by its history 18–21, 19; defined by its methods 23–26, 31–32; defined by its subject matter 21–23; effects on daily life 1–2; expectations in 40–41; flaws in human reasoning and 33–34, 34; hypotheses in 39–40; illusion of understanding in 12–13; importance of 13–14; limitations of 14; methods in 38–39; models in (see models); nature of 26–29, 27; normative versus descriptive claims in 32; norms of investigators in 35–36; observations in 41–42; philosophy of 3; recipes for 3–4, 31–32, 39; role of 11–13; self-correction in 306–307; in social context 297–298; tricky work of defining 16–17; trust and objectivity challenges facing 304–307; value-free ideal in 300–301; values shaping 301–304, 302; why learn about 2–3; women in 298–300, 299
scientific breakthroughs 289–290
scientific law 281
scientific method 31–32
scientific progress 295–296
Scientific Revolution 18, 20–21, 290; chemical revolution and 292–294, 293; data collection during 56; Kuhn and 290–291, 292; scientific methods during 23
scientific theories 288–289
self-correction 306–307
self-explanation effect 277
self-interest 64
Semmelweis, Ignaz 142–145, 145
Seoul National University 35
significance level 227
similarity and difference 93–96, 96
Simpson’s paradox (Edward Simpson) 253
68–95–99.7 rule 214, 214
skepticism 37–38; about causation 246
Snow, John 79–80, 80
Snyder, Rick 162
social context, science in 297–298
social norms 37–38, 38
social sciences 17
sound inferences 134
space exploration 73, 74
spatiotemporal contiguity as guide to causation 246–247
spurious correlations 248
Stahl, Georg Ernst 292–293, 293
standard deviation 194–195, 195, 227; sample 213–214, 214


standard error 215
Stapel, Diederik 35
State Research Centre of Virology and Biotechnology 153
statistical description 217
statistical evidence 168
statistically independent variables 177
statistical significance 227–229, 230
statistical thinking, importance of 167–169
statistics: descriptive (see descriptive statistics); importance of 167–169; inferential (see inferential statistics); populations and samples in 169; probability theory in 170–180
Stellar Atmospheres 298
strawman fallacy 136
strength, inference 153
string theory 25
Structure of Scientific Revolutions, The 290
Stumpf, Carl 34
subject matter, defining science by its 21–23
subjects, experimental 55
subtraction rule 176, 176–177
sufficient causes 250–251
sufficient condition 130, 130
supernatural entities and occurrences 22
super-observational access 42
surgical intervention 66–67, 76, 78
survey data 57–58
Tapestry of Values, A 302
target systems 93–96, 96, 103; specification of 96–97
taxi drivers 275–276, 280, 280–281, 282
Tertullian 93
testimony 162–163
Tharp, Marie 157, 158
theology/religion 14, 19–21, 82
theorems 147
theoretical claims 39
theories, scientific 288–289
theorizing and theory change: chemical revolution and 292–294; Kuhn’s scientific revolutions and 290–291, 292; non-revolutionary scientific change and 294–295; scientific breakthroughs and 289–290; scientific progress and 295–296; scientific theories and 288–289


thought experiments 85–86
time, studies extending over 83–84
Tit-for-Tat 114, 116
Tolman, Edward 93
total probability 173
total trihalomethane (TTHM) 150
tractability of models 119, 121–122
Trinity College, Cambridge 56
Trump, Donald 207, 215–216, 304
trust 11–12, 15–16, 37–38, 38, 42, 60, 100, 117, 162, 290, 297, 304–305, 307
truth 28, 40–41, 58, 132–135, 137, 141–142, 150, 152–154, 158, 161, 163, 233, 249, 295, 297–298
Turing, Alan 298–299
Tuskegee Syphilis Experiment 298, 301
Tversky, Amos 32
type I error 229, 233
type II error 229, 233


underdetermination 58, 59
understanding 276–279; definition of 277; illusion of 12–13; illusion of explanatory depth and 279–280
unification conception of explanation 282
uniform distribution 187, 188–189
uniformity of nature 155–156
UC Berkeley 253
US Army Corps of Engineers 91, 93
US Department of Agriculture (USDA) 248
US Public Health Service 298, 301
vaccinations 1–2, 28; causal modeling of immunity and 263–266, 264
validity: deductive reasoning and 132; ecological 75; population 75
value-free ideal 300–301
value of a variable 49, 182–184; visual representation of 184–187, 185–187

values: shaping science 301–304, 302; trust and objectivity 304–307; value-free ideal and 300–301
variability 188; measures of 191–195, 192, 193, 195
variables 48–51, 51, 66–67; choices in 76–78; controlling 67–68; correlated 196; definition of 183; in descriptive statistics 182–202; qualitative 183; quantitative 183; random 172–174; value of 49, 182–187, 185–187
variance 192–194
variation 167
virus: cowpox 265; ebola 83, 229; human immunodeficiency virus (HIV) 11; human papilloma virus (HPV) 1–2; influenza 273; smallpox (variola) 153–154, 265; Zika 301
visualization, data 84
visual representation of values of variables 184–187, 185–187
Vitruvius 93
Volterra, Vito 98
von Osten, Wilhelm 33–34, 34
Wallace, Alfred Russel 32, 294
water crisis, Flint, Michigan 150–151, 151, 153, 162, 163
Watson, James 102, 107, 107, 108, 295, 299
Wegener, Alfred 156–158
Wells, Herbert George 167
Western Electric Hawthorne Factory 50, 51, 75
‘Women as policy makers’ 76
women in science 298–300, 299
Woo-suk, Hwang 35
World Health Organization (WHO) 153
World War 8, 113, 298
Zakariyya al-Razi, Abu Bakr Muhammad ibn 19
