
SAMPLING IN JUDGMENT AND DECISION MAKING

Sampling approaches to judgment and decision making are distinct from traditional accounts in psychology and neuroscience. While these traditional accounts focus on limitations of the human mind as a major source of bounded rationality, the sampling approach originates in a broader cognitive-ecological perspective. It starts from the fundamental assumption that in order to understand intrapsychic cognitive processes one first has to understand the distributions of, and the biases built into, the environmental information that provides input to all cognitive processes. Both the biases and restrictions, but also the assets and capacities, of the human mind often reflect, to a considerable degree, the irrational and rational features of the information environment and its manifestations in the literature, the Internet, and collective memory. Sampling approaches to judgment and decision making constitute a prime example of theory-driven research that promises to help behavioral scientists cope with the challenges of replicability and practical usefulness.

Klaus Fiedler is a Full Professor at Heidelberg University, Germany. He is a member of the German National Academy of Sciences Leopoldina, and a recipient of several science awards. Currently, he is chief editor of Perspectives on Psychological Science. His recent research has concentrated on judgment and decision making from a cognitive-ecological perspective.

Peter Juslin is Professor of Psychology at Uppsala University, Sweden, and a member of the Royal Swedish Academy of Sciences. His research primarily concerns judgment and decision making. He has published extensively in prominent psychology journals on topics related to subjective probability judgment, overconfidence, multiple-cue judgment, and risky decision making.

Jerker Denrell is Professor of Behavioral Science at Warwick Business School, University of Warwick, UK. He previously held positions at the University of Oxford, UK, and Stanford University, USA. His work focuses on how the biased experiences available to people lead to systematic biases in choices and judgment. He has published numerous articles in Science, Proceedings of the National Academy of Sciences (PNAS), and Psychological Review.


SAMPLING IN JUDGMENT AND DECISION MAKING

Edited by

KLAUS FIEDLER
University of Heidelberg

PETER JUSLIN
Uppsala University

JERKER DENRELL
University of Warwick


Cambridge University Press & Assessment
Shaftesbury Road, Cambridge, United Kingdom

Cambridge University Press is part of Cambridge University Press & Assessment, a department of the University of Cambridge. We share the University’s mission to contribute to society through the pursuit of education, learning and research at the highest international levels of excellence.

www.cambridge.org

© Cambridge University Press & Assessment

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press & Assessment.

First published 2023

A catalogue record for this publication is available from the British Library.

Library of Congress Cataloging-in-Publication Data
Names: Fiedler, Klaus, editor. | Juslin, Peter, editor. | Denrell, Jerker, editor.
Title: Sampling in judgment and decision making / edited by Klaus Fiedler, University of Heidelberg, Peter Juslin, Uppsala Universitet, Sweden, Jerker Denrell, University of Warwick.
Description: First Edition. | New York, NY : Cambridge University Press. | Includes bibliographical references and index.
Subjects: LCSH: Judgment–Psychological aspects. | Cognition. | Decision making.
ISBN 9781316518656 (hardback)

Cambridge University Press & Assessment has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.


Contents

List of Figures
List of Tables
List of Contributors

Part I  Historical Review of Sampling Perspectives and Major Paradigms

1  The Theoretical Beauty and Fertility of Sampling Approaches: A Historical and Meta-Theoretical Review
Klaus Fiedler, Peter Juslin, and Jerker Denrell

2  Homo Ordinalus and Sampling Models: The Past, Present, and Future of Decision by Sampling
Gordon D. A. Brown and Lukasz Walasek

3  In Decisions from Experience What You See Is Up to Your Sampling of the World
Timothy J. Pleskac and Ralph Hertwig

4  The Hot Stove Effect
Jerker Denrell and Gaël Le Mens

Part II

5  The J/DM Separation Paradox and the Reliance on the Small Samples Hypothesis
Ido Erev and Ori Plonsky

6  Sampling as Preparedness in Evaluative Learning
Mandy Hütter and Zachary Adolph Niese

7  The Dog That Didn’t Bark: Bayesian Approaches to Reasoning from Censored Data
Brett K. Hayes, Saoirse Connor Desai, Keith Ransom, and Charles Kemp

8  Unpacking Intuitive and Analytic Memory Sampling in Multiple-Cue Judgment
August Collsiöö, Joakim Sundh, and Peter Juslin

Part III

9  Biased Preferences through Exploitation
Chris Harris and Ruud Custers

10  Evaluative Consequences of Sampling Distinct Information
Hans Alves, Alex Koch, and Christian Unkelbach

11  Information Sampling in Contingency Learning: Sampling Strategies and Their Consequences for (Pseudo-) Contingency Inferences
Franziska M. Bott and Thorsten Meiser

12  The Collective Hot Stove Effect
Gaël Le Mens, Balázs Kovács, Judith Avrahami, and Yaakov Kareev

Part IV

13  Sequential Decisions from Sampling: Inductive Generation of Stopping Decisions Using Instance-Based Learning Theory
Cleotilde Gonzalez and Palvi Aggarwal

14  Thurstonian Uncertainty in Self-Determined Judgment and Decision Making
Johannes Prager, Klaus Fiedler, and Linda McCaughey

15  The Information Cost–Benefit Trade-Off as a Sampling Problem in Information Search
Linda McCaughey, Johannes Prager, and Klaus Fiedler

Part V

16  Heuristic Social Sampling
Thorsten Pachur and Christin Schulze

17  Social Sampling for Judgments and Predictions of Societal Trends
Henrik Olsson, Mirta Galesic, and Wändi Bruine de Bruin

18  Group-Motivated Sampling: From Skewed Experiences to Biased Evaluations
Yrian Derreumaux, Robin Bergh, Marcus Lindskog, and Brent Hughes

19  Opinion Homogenization and Polarization: Three Sampling Models
Elizaveta Konovalova and Gaël Le Mens

Part VI

20  An Introduction to Psychologically Plausible Sampling Schemes for Approximating Bayesian Inference
Jian-Qiao Zhu, Nick Chater, Pablo León-Villagrá, Jake Spicer, Joakim Sundh, and Adam Sanborn

21  Approximating Bayesian Inference through Internal Sampling
Joakim Sundh, Adam Sanborn, Jian-Qiao Zhu, Jake Spicer, Pablo León-Villagrá, and Nick Chater

22  Sampling Data, Beliefs, and Actions
Erik Brockbank, Cameron Holdaway, Daniel Acosta-Kane, and Edward Vul

Index




Figures

- Two stages of information transmission from a cognitive-ecological perspective.
- Expected (population) probability p as a function of observed sample proportion P, at two sample sizes n, 5 and 20 (assuming a uniform prior for p). An agent that myopically reports the sample proportion P as their estimate of probability p will make too extreme estimates, as identified by the deviation from the identity line. An agent who takes the sample size into account dampens the observed proportion according to n.
- Increasing likelihood of sampling at t + 1 as a function of valence experienced at t.
- Illustration of relative rank effects in valuation. Panel A: Two distributions of quantities (of bags of mixed sweets). Panel B: Amounts willing to pay (normalised).
- Histogram of incomes (left panel) and theoretically predicted rank-based evaluation of incomes from the same distribution (right panel).
- Left panel: probability weighting function. Right panel: cumulative distribution of relative subjective frequencies.
- (a) Experimental conditions and prices of stocks in thousands of euros (i.e., index fund) across … monthly periods. At the bottom of the panel, there are four arrows. Solid arrow segments indicate periods of investment. Dotted arrow segments indicate periods of learning from descriptive sources. The four conditions were compared over the evaluation window from period … to period …. (b) Percentages invested in stocks by condition. Dots indicate individuals’ allocations. The thin lines show the mean percentages; the thicker lines show the data smoothed by local polynomial regression fitting.
- Examples of studies of judgment and decision making with and without J/DM separation.
- Examples of studies of decisions from sampling without and with explicit presentation of the rare outcomes.
- The list of stimuli used by Erev, Shimonowitch, et al. ().
- A thought experiment (following Plonsky et al., ).
- Illustration of the experimental task, and main results, in the study of decisions from description with feedback conducted by Erev et al. ().
- Evaluative shift as a function of US valence, autonomy, and number of samples.
- Property generalization ratings demonstrating the “frames effect.” Test items S…–S… were small rocks drawn from the training sample. T…–T… are novel rocks of increasing size. A Bayesian mixed-model analysis of variance (N = …) found strong evidence of an effect of sampling frame, BF = ….
- The generalization space (A) and property generalization results (B) for the environmental contamination study with category and property framing. A Bayesian analysis of variance (N = …) found strong evidence of an effect of sampling frames, BF = ….
- Property generalization with a consistent category frame (No Switch) and a combination of category and property frames (Switch). S…–S… were small rocks drawn from the training sample. T…–T… are novel rocks of increasing size. A Bayesian mixed-model analysis of variance (N = …) found moderate evidence of a difference between no-switch and switch conditions, BF = ….
- The characteristic quantitative predictions by each of the four cognitive process models summarized in Table …. The identity lines in the graphs represent correct judgments and the rectangles in the left-most panels identify the rare but potentially larger errors that are predicted by Analysis(B). The predictions in the graphs are stylized examples of error-free predictions by the models that have either been perturbed by a few larger errors (left-side panels for Analysis(B)) or perturbed by a ubiquitous Gaussian noise (right-side panels for Intuition(B)).
- The error distributions from simulations with adaptations of the Generalized Context Model when applied to a multiple-cue judgment task with a continuous criterion. From left to right, a GCM applied to a continuous criterion with extremely high specificity parameter that makes a judgment by sampling the … most similar exemplars from memory, sampling only the most similar exemplar.
- Distributions of lambda (λ) for participants best fit by an exemplar-based memory model, EBM (right side panels) and a cue-abstraction model, CAM (left side panels) performing an additive task (top panels) or a nonadditive task (bottom panels).
- The responses by ID … in an additive multiple-cue learning task (squares) and the prediction by a standard multiple regression model (the solid line) plotted against the correct criterion value. Predictions by the PNP model implementing a cue-abstraction model coincide with the dotted identity line (x = y). Multiple data points overlap.
- The responses by ID … in a nonadditive multiple-cue learning task (squares) and the prediction by a standard multiple regression model (the solid line) plotted against the criterion value. Predictions by the PNP model implementing a cue-abstraction model coincide with the dotted identity line (x = y). Multiple data points overlap.
- The responses (squares) by participants ID … and ID …, both best fitted by an EBM model. The exemplars that are new in the test phase and require extrapolation are the filled symbols. Except for the extrapolation items, the predictions of the PNP model coincide with the dotted identity line (x = y). Multiple data points overlap.
- Distribution of lambda (λ) for participants receiving deterministic (left panel) or probabilistic (right panel) feedback, best fit by CAM (top row) or EBM (bottom row).
- Percentage of participants sampling the frequent option per trial (left: Experiment …a, right: Experiment …b).
- Percentage of participants sampling the frequent option per trial (left: Experiment …, right: Experiment …).
- Degrees of distinctiveness among features and entities.
- Distinctiveness of positive and negative features and entities.
- Experiment: Scatterplot and regression lines of the errors in average ratings as a function of the number of ratings received by a picture. The black dots correspond to the pictures for which the final score is outside of the …% CI for the quality q_i (because our measure of quality is based on a finite number of ratings, it approximates the true quality and thus we computed the …% CI on the true quality). The grey dots correspond to the pictures for which the final score is within the …% CI for quality.
- Scatterplot and regression line of the association between number of judgments and collective evaluation in the two-country data set. Each circle corresponds to one product. For visual clarity, the graph is based on observations such that … ≤ ΔP_i ≤ … (N = …).
- The distribution of sample sizes from the human data and IBL model for (a) all problems; (b) Risky–Risky and Risky–Safe problems; (c) Gains, Losses, and Mixed domain problems.
- Correlation for each of the … paper-problem pairs as predicted by the IBL model and the observed sample size in the human data: (a) all problems; (b) Risky–Risky and Risky–Safe problems; (c) Gains, Losses, and Mixed domain problems.
- An example of the marginal value δGap_t in human and model data up to … samples; (a) for all problems; (b) Risky–Risky and Risky–Safe problems; (c) Gain, Loss, and Mixed domain problems.
- Empirically observed judgment strength J plotted against sample size n for (a) externally determined sample size (truncation was determined by the software), (b) self-truncated sampling, and (c) yoked controls who received exactly the samples of (b). Judgment strength denotes the extremity of the likability judgment towards the population direction (i.e., likability as it is judged for predominantly positive targets and likability judgments with reversed sign for predominantly negative targets). Thin grey lines connect individual averages per sample size and the solid black line averages per sample size of these individual averages of judgment strength J; error bars indicate corresponding standard errors.
- Schematic diagram of the self–other yoked controls design. The time axis is vertically oriented from top to bottom.
- Empirically observed judgment strength J plotted against sample size n for (A) self-truncated sampling in a first block, (B) yoked controls who received their own samples in a second block, and (C) yoked controls who received samples that were truncated by other participants, with yoked control participants of both (B) and (C) receiving samples from (A).
- Graphical illustration of a fund-investment choice task to investigate speed–accuracy trade-offs in sample-based choice tasks.
- Simulation results of the proportion of correct decisions as a function of mean sample size n based on an algorithm that always samples n observations.
- Schematic illustration of one of the sequences of cost parameters used in the sample-based decision task for Blocks … to …, with displayed information costs and payoff in the middle row and the standardised ratio in the bottom row.
- Graph indicating participants’ mean sample sizes (grey squares with dashed bars) and corresponding mean optimal sample size (black points) for each cost ratio. Bars indicate standard deviations.
- Illustration of the distribution of subjective value x. The two shaded areas under the curve represent the probabilities that event A or event B, respectively, are judged to be more frequent; in both cases, search is stopped and no further circles are inspected. The nonshaded area under the curve represents the case in which the subjective value is not large enough to discriminate between the events; in this case another circle is selected for inspection.
- Illustration of some possible trajectories that social sampling with the social-circle model can take when sequentially inspecting in memory a person’s social circles.
- Domain specificity of social sampling (Panel a) and available social information (Panel b). Panel a: Proportion of adult participants whose circle weight parameters indicated that the self, family, friends, or acquaintance circle was most likely to be probed first across three different judgment domains. Panel b: Mean proportions of items (vacation destinations, sports, and first names) for which participants recalled at least one instance, across the self, family, friends, or acquaintance circle and for each of three judgment domains.
- Environmental properties manipulated in the computer simulations: frequency distribution (Panel a) and spatial clustering (Panel b); and the resulting average sample size (Panel c) and accuracy (Panel d) achieved by exhaustive and heuristic sampling.
- (a) Simulated example of the social sampling process when the population includes two levels of the target characteristic (voters of red and blue parties) and two levels of homophily. Left (right) panel shows an example of a social environment characterized by high (low) homophily. (b) Empirical examples of the social sampling process for populations characterized by right-skewed, left-skewed, and symmetrical distributions.
- (a) Average of social-circle estimates tracks population distributions of different attributes well, and better than the average of people’s population estimates. Absolute errors in brackets. (b) The social-circle question produced better predictions of elections in France (N = …), the Netherlands (N = …), and Sweden (N = …) than own-intention and (in the Netherlands) election-winner questions. Absolute errors in brackets. (c) The social-circle question produced overall the lowest error of predictions in three recent US elections, across several survey waves (N > …).
- A visual schema of where motivations may influence information processing. First, motivations constrain samples by guiding attention toward goal-relevant information (Path A). Second, group-based motivation may lead to motivated interpretations of the sampled information (Path B). Finally, people may employ different sampling strategies over time that capitalize on skewed experiences (Path C).
- A visual representation of how information was sampled over time. Below the arrow is a generic representation of the paradigm flow. Above the arrow is a representation of how information was sampled in Studies … and ….
- Evaluations of ingroups and outgroups as a function of valence of initial impressions (Panels a and b) or real-group differences (Panels c and d) in the political context and minimal group context. Worse, Same, and Better is stated in reference to the ingroup. Vertical bars denote standard error of the mean.
- Probability of sampling from the ingroup when the ingroup was de facto worse. Dashed line denotes sampling as a function of a negative first sample, whereas the solid line denotes sampling as a function of a positive first sample. Error bars denote standard error of the mean.
- Social network structures analyzed in the simulations: a) network of two agents; b) network of two groups, … agents each, with … between-group links; c) network of two groups, … agents each, with … between-group link; d) network of two groups, … agents each, with … between-group links; e) network of two groups with distinct identities, … agents each, with … between-group links.
- A “family tree” of sampling algorithms where parent nodes represent more generalized concepts for specific algorithms in the leaf nodes. Circled algorithms require global or approximate global knowledge while squared ones require local knowledge.
- Schematic illustration of the internal sampling process: Information is sampled from the environment, which shapes an internal distribution. Inferences are based on a small number of samples drawn from the internal distribution.
- The expected utility framework for sample-based models in decision making.

Tables

- The original and reversed fourfold pattern.
- Maximization rate in studies that examine a choice between “… with certainty” and “… with p = …, … otherwise” under different conditions.
- Four cognitive processes that can be identified by the experimental design and the modeling reported in Chapter … (the central four cells of Table …). The cognitive algorithm employed is either rule-based or exemplar-memory based, as identified by extrapolation beyond the training range, and the processing may either involve an Analysis(B) or an Intuition(B) process as identified by the judgment error distributions.
- Compilation of factors that are varied across the experiments in the database used for the presented analyses.
- Number of participants best fitted by exemplar-based memory (EBM) and rule-based cue-abstraction models (CAM), as well as the number of noncategorized participants and participants best fitted by the null model. Additionally, median BIC-difference to the 2nd best fitting model and median lambda (λ) for participants best fitted by each model.
- Example contingency table with joint frequencies and marginal frequencies of the variables option and outcome.
- Possible joint frequencies (a, b, c, d) and associations (φ) given the marginal frequencies of the example in Table ….
- Example trivariate contingency table with joint frequencies and marginal frequencies of the variables option and outcome per time.
- Experiment results.
- PMax in sampling and final choice phases for human and model data.
- Overview of sample information for all … empirical studies.
- Sampling algorithms and their statistical and psychological implications.
- Examples of how various psychological phenomena are explained either by a prior on responses or alternative sampling algorithms.

Contributors

Daniel Acosta-Kane, University of California at San Diego, USA
Palvi Aggarwal, The University of Texas at El Paso, USA
Hans Alves, Ruhr-University Bochum, Germany
Judith Avrahami, The Hebrew University of Jerusalem, Israel
Robin Bergh, Uppsala University, Sweden
Franziska M. Bott, University of Mannheim, Germany
Erik Brockbank, University of California at San Diego, USA
Gordon D. A. Brown, University of Warwick, UK
Wändi Bruine de Bruin, University of Southern California, USA
Nick Chater, University of Warwick, UK
August Collsiöö, Uppsala University, Sweden
Ruud Custers, Utrecht University, The Netherlands
Jerker Denrell, Warwick Business School, University of Warwick, UK
Yrian Derreumaux, University of California at Riverside, USA
Saoirse Connor Desai, University of New South Wales, Australia
Ido Erev, Technion, Israel
Mirta Galesic, Santa Fe Institute, USA
Cleotilde Gonzalez, Carnegie Mellon University, USA
Chris Harris, Utrecht University, The Netherlands
Brett K. Hayes, University of New South Wales, Australia



Ralph Hertwig, Max Planck Institute for Human Development, Germany
Cameron Holdaway, University of California at San Diego, USA
Brent Hughes, University of California at Riverside, USA
Mandy Hütter, Eberhard Karl University of Tübingen, Germany
Yaakov Kareev, The Hebrew University of Jerusalem, Israel
Charles Kemp, University of Melbourne, Australia
Alex Koch, University of Chicago, USA
Elizaveta Konovalova, University of Warwick, UK
Balázs Kovács, Yale University, USA
Gaël Le Mens, Pompeu Fabra University, Spain
Pablo León-Villagrá, Brown University, USA
Marcus Lindskog, Uppsala University, Sweden
Linda McCaughey, Heidelberg University, Germany
Thorsten Meiser, University of Mannheim, Germany
Zachary Adolph Niese, Eberhard Karl University of Tübingen, Germany
Henrik Olsson, Santa Fe Institute, USA
Thorsten Pachur, Max Planck Institute for Human Development, Germany
Timothy J. Pleskac, University of Kansas, USA
Ori Plonsky, Technion, Israel
Johannes Prager, Heidelberg University, Germany
Keith Ransom, University of Melbourne, Australia
Adam Sanborn, University of Warwick, UK
Christin Schulze, Max Planck Institute for Human Development, Germany
Jake Spicer, University of Warwick, UK
Joakim Sundh, Uppsala University, Sweden



Christian Unkelbach, University of Cologne, Germany
Edward Vul, University of California at San Diego, USA
Lukasz Walasek, University of Warwick, UK
Jian-Qiao Zhu, University of Warwick, UK


 

Part I

Historical Review of Sampling Perspectives and Major Paradigms

1  The Theoretical Beauty and Fertility of Sampling Approaches: A Historical and Meta-Theoretical Review

Klaus Fiedler, Peter Juslin and Jerker Denrell

Work on this chapter was supported by grants awarded to the first author by the Deutsche Forschungsgemeinschaft (Fi / and ) and to the second author by the Marcus and Amalia Wallenberg Foundation (MAW .).

1.1 Introduction

The topic of the present volume, sampling approaches to judgment and decision-making (JDM) research, is ideally suited to illustrate the power and fertility of theory-driven research and theorizing in a flourishing area of behavioral science. The last two decades of rationality research, in psychology, economics, philosophy, biology, and computer science, are replete with ideas borrowed from statistical sampling models that place distinct constraints on information transmission processes. These sampling approaches highlight the wisdom gained from Kurt Lewin and Egon Brunswik that in order to understand cognitive and motivational processes within the individual, it is first of all essential to understand the structure and distribution of the environmental stimulus input that impinges on the individual’s mind. This is exactly the focus of sampling-theory approaches. The environmental input triggers, enables, constrains, and biases the information transmission process before any cognitive processes come into play. Because the information offered in newspapers, TV, the Internet, textbooks, and literature, or through personal communication is hardly ever an unbiased representative sample of the world, but is inevitably selective and biased toward some and against other topics and sources, a comprehensive theory of judgment and decision-making must take the ecology into account. Importantly, the information input is not only reflective of existing biases of a wicked environment. It is also empowered by the statistical strength and reliability of a distributed array of observations, the statistical properties of which are well understood. So, the challenges of a potentially biased “wicked” environment (Hogarth et al., ) come




Figure .

Two stages of information transmission from a cognitive-ecological perspective

(Fiedler, ; Fiedler & Kutzner, ; Fiedler & Wänke, )

along with normative instruments for debiasing and separating the wheat from the chaff. For a comprehensive theory of judgments and decisions in a probabilistic world, the cognitive stage of information processing cannot be understood unless the logically antecedent stage of environmental sampling is understood in the first place. Figure . illustrates this fundamental notion. The left box at the middle level reflects the basic assumption that the distal constructs that constitute the focus of judgment – such as health risks, student ability, a defendant’s guilt, or the profitability of an investment – are not amenable to direct perception. We do not have sense organs to literally perceive risk, ability, or guilt. We only have access to samples of proximal cues (in the middle box) that are more directly assessable and that allow us to make inferences about the distal entities, to which they are statistically related. Samples of accident rates or expert advice serve to infer risk; students’ responses to knowledge questions allow teachers to infer their ability in math or languages; samples of linguistic truth criteria in eyewitness protocols inform inferences of a defendant’s guilt (Vrij & Mann, ). A nice feature of these proximal stimulus distributions is that normative rules of statistics allow us to monitor and control the process, inferring the reliability from the sample’s size and internal consistency and – when the proximal data are representative of a domain – even the validity of the given stimulus information.


Regardless of how valid or reliable the environmental input is, it constrains and predetermines the subsequent cognitive judgment and decision process. The accuracy of a health expert’s risk estimate, a teacher’s student evaluation, and a judge’s guilt assessment depend on the diagnostic value of the cue samples used to infer risk, ability, or guilt. The accuracy and confidence of their judgments and decisions depend primarily on the quality of the sampled data. Resulting distortions and biased judgments need not reflect biases in human memory or reasoning; such biases may already be inherent in the environmental sample with which the cognitive process was fed. Indeed, the lessons taken from the entire research program of the Kahneman–Tversky tradition can be revisited and revised fundamentally from a sampling-theoretical perspective. Illusions and biases may not, or not always, reflect deficits of human memory or flawed heuristic processes within the human mind. They may rather reflect an information transmission process that is anchored in the environment, prior to all cognitive operations. Samples of risk-related cues may be deceptive or lopsided; too small a sample of student responses may be highly unreliable; the defendant’s sample of verbal utterances may be faked intentionally. Considered from a broader cognitive-ecological perspective, bounded rationality is not merely limited by memory restrictions or cognitive heuristics reflecting people’s laziness. Judgments and decisions in the real world are restricted, and enabled, by cognitive as well as ecological limitations and capacities. For instance, risk estimations – concerning the likelihood of contracting Covid-19 or being involved in car accidents – are not just restrained by wishful thinking or ease of retrieval (Block et al., ; Combs & Slovic, ). They also depend on a rational answer to the question: What sample affords an unbiased estimate of my personal risk of a disease or accident? Should it be a sample of the entire world population, a sample of people in my subculture, or a biographical sample of my own prior behavior? As the example shows, there is no alternative to devising a heuristic algorithm for risk estimation. Heuristics are sorely needed indeed, not just for the human mind but also for machine learning, and expert and robot systems (Fiedler et al., ).

1.2 Historical Review of Origins and Underpinnings of Sampling Approaches

The information transmission process that underlies judgments and decisions can be decomposed into two stages (see Figure 1.1): an ecological sampling


stage and a cognitive processing stage. While traditional cognitive research was mainly concerned with the processing of stimulus cues within the individual’s mind (attention, perception, encoding, storage, retrieval, constructive inferences), the ecological input to the cognitive–decision stage reflects a logically antecedent sampling stage, which takes place in the environment. Judgment biases and decision anomalies that were traditionally explained in terms of retrieval or reasoning biases during the cognitive–decision stage may already be inherent in the stimulus input, as a consequence of biased sampling in the environment, before any cognitive operations come into play. Biased judgments and decisions can thus result from fully unbiased mental operations applied to biased sampling input. Conversely, unbiased and accurate estimates may reflect the high quality of information from certain environments.

1.2.1 Methodological and Meta-Theoretical Assets

The causal sequence (of sampling as an antecedent condition of cognitive processing) and the normative-statistical constraints imposed on the sampling stage jointly explain the beauty and fertility, and the theoretical success of sampling approaches. As in psychophysics, an analysis of the samples of observations gathered in the information search process imposes strong constraints on the judgments and decisions informed by this input. Statistical sampling theory imposes distinct normative constraints (in terms of sample size, stochastic independence, etc.) on how inferences from the sample should be made. Both sources of constraints together lead to refined hypotheses that can be tested experimentally. Because the causal and statistical constraints are strong and clear-cut, the predictions tested in such experiments are cogent and nonarbitrary, and, not by coincidence, empirical findings often support the a priori considerations. Indeed, replication and validation do not appear to constitute serious problems for sampling research (Denrell & Le Mens, ; Fiedler, ; Galesic et al., ).

1.2.1.1 Recording the Sampled Input

Having a measure of the sampling input in addition to the judgments and decisions in the ultimate dependent measure offers a natural candidate for a mediational account of cognitive inferences relying on the sample. Comparing the recorded sample to the ultimate cognitive measure provides a way to disentangle the two processes. A causal origin of a judgment or decision effect that is already visible in the actuarial sample must


originate in the environment, before cognition comes into play. Evidence for a genuine cognitive influence (e.g., selective retrieval or an anchoring bias) requires demonstrating a tendency in the cognitive process that is not yet visible in the recorded sample. Let us illustrate the methodological advantage of having a record of the sample with reference to recent research on sample-based impression judgments. Prager et al. () had participants provide integrative likeability judgments of target persons described by trait samples of three different sizes n, drawn at random from a universe defined by an experimentally controlled distribution of positive and negative traits. Each participant provided 36 impression judgments, nine based on random samples drawn from each of four universes of extremely positive, moderately positive, moderately negative, and extremely negative sets of traits, selected in careful pilot testing. Across all participants and trials, impression judgments were highly predictable from the recorded samples of traits. Not only the positive versus negative valence and extremity of the universe from which the stimulus traits were drawn, but also the deviations of the random samples from the respective universe strongly predicted the ultimate impression judgments. Consistent with Bayesian updating principles, impression extremity increased with increasing n. Altogether, these findings provided strong and regular support for the (actuarial) stimulus sample as a major determinant of person impressions (Asch, ; Norton et al., ; Ullrich et al., ). However, in spite of their close fit to the sampled input, the impression judgments were also highly sensitive to the structure of the environment, specifically, the diagnosticity of the information. The diagnosticity of a trait is determined by the covariation of features in the environment and can be defined in the same way as a likelihood ratio in Bayesian updating; a trait is diagnostic for a hypothetical impression (e.g., for the hypothesis: likable person) to the extent that it is more likely to occur in a likable than a nonlikable person. Holding the valence scale value of the sampled traits constant, diagnostic traits exerted a stronger influence on person judgments than nondiagnostic traits. Diagnosticity was enhanced if a trait was negative rather than positive (Rothbart & Park, ); if a trait referred to negative morality or positive ability rather than positive morality or negative ability (Fiske et al., ; Reeder & Brewer, ); if a trait was infrequent rather than

frequent (Prager & Fiedler, ); or if a trait’s distance from other traits in a semantic network was high (Unkelbach et al., ). (In Bayesian notation, a trait is diagnostic to the extent that the likelihood ratio LR = p(trait | H_likeable) / p(trait | H_not-likeable) exceeds 1.) However diagnosticity was operationalized, the resulting impression of a target person was not fully determined by the average valence scale value of the traits recorded in a sample but depended on the diagnosticity of the sampled traits. Adding a diagnostic trait had a stronger impact on a growing impression than adding a nondiagnostic trait of the same valence. Further evidence of how people actively interpret the observed samples will be provided later. Suffice it here to point out the advantage of a research design with a twofold measure: for the sampling input on one hand and for the cognitive process output on the other hand.

Let us now turn to the second major asset of the sampling-theory approach, namely, the existence of normative constraints imposed by statistical sampling theory on the information transmission process. To the extent that judgments and decisions are sensitive to such distinct normative constraints, which often exceed intuition and common sense, this would provide cogent evidence for the explanatory value of sampling theories.

1.2.1.2 Impact of Sampling Constraints

The keywords in the lower left of Figure 1.1 refer to a number of subtle sampling constraints, which are firmly built into the probabilistic environment. For instance, in a world in which many frequency distributions are inherently skewed, probability theory constrains the probability that a sample reveals a dominant trend, for instance, that a sample reflects the relative frequency of lexical stimuli, animals, or causes of death. Skewed distributions are highly indicative of moral and material value. Rare objects tend to be more precious than common things (Pleskac & Hertwig, ); scarcity increases the price of economic goods. Abnormal or norm-deviant behaviors are less frequent than normal or norm-abiding behaviors. Likewise, skewness is indicative of psychological distance. Frequently encountered stimuli more likely belong to temporally, spatially, or socially close and probable origins than infrequent stimuli, which are indicative of distant origins (Bhatia & Walasek, ; Fiedler et al., ; Trope & Liberman, ). In any case, normal variation in distance, density, resolution level, and perspective can open up a variety of environmental information.

Small samples from skewed distributions are often unrepresentative of the underlying distribution and this can lead to seemingly biased judgments. Suppose, for example, that the population probability of a success is .7. In a small sample of five trials, an agent will most often observe a


proportion larger than .7; the probability of observing five successes in five trials is .17. It is .33 if the probability of a success is .8. Thus, if judgments are sensitive to experienced proportions, most agents will overestimate the success probability. To be sure, agents may be more sophisticated and understand that small samples can be unrepresentative. Suppose an agent believes that all probabilities between zero and one are equally likely (a uniform prior distribution) and uses this information in combination with the observed proportion. Such a Bayesian agent will estimate the true success probability to be lower than the observed sample proportion of 1.0. Having observed five successes in five trials, this Bayesian agent will estimate the success probability to be only .86 (according to the Bayesian rule of succession (Costello & Watts, ), the underlying probability of the dominant outcome is p = (n_dominant + 1) / (n_total + 2); thus, observing n_dominant = 5 dominant outcomes in a sample of n_total = 5 implies p = .86, while observing the same proportion with n_dominant = 20 in a sample of n_total = 20 implies p = .95). Thus, normative-statistical laws not only justify that sample proportions can deviate from true probabilities in the population but also specify predictions of how sample-dependent estimations can be expected to deviate from population parameters. It is no wonder then that decisions about risk-taking differ substantially between settings where the winning probability of a lottery is described numerically versus when a sample of outcomes is experienced extensionally – the so-called description–experience gap (Hertwig et al., ). Statistical sampling theory as an integral part of a cognitive-ecological approach can therefore offer a viable explanation of many findings related to the description–experience gap (Fox & Hadar, ; Rakow et al., ).

The importance of skewed sampling distributions also inspired a prominent finding by Kareev (). Assuming an actually existing positive (population) correlation ρ, the majority of observed correlations r in restricted samples from this population is higher than ρ. (Undoing this asymmetry of the sampling distribution of r-statistics is the purpose of the common Fisher z transformation.) Kareev () showed that the tendency of r to exaggerate existing correlations reaches a maximum at n = 7 ± 2, suggesting that evolution may have prepared Homo sapiens with a memory span that maximally facilitates the extraction of existing regularities. Regardless of the viability of Kareev’s vision (see Juslin & Olsson, , for a critical note), it clearly highlights the fascinating ability of sampling theories to inform creative theorizing in a cognitive-ecological context.
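A minimal Python sketch of this arithmetic (our illustration, not code from the chapter) contrasts the "myopic" agent, who reports the raw sample proportion, with the Bayesian agent, who dampens it by the rule of succession:

```python
from math import comb

def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def rule_of_succession(k, n):
    """Posterior mean of p under a uniform prior (Laplace's rule)."""
    return (k + 1) / (n + 2)

p_true = 0.7  # population success probability from the text's example
n = 5

# A small sample most often overshoots p: P(observed proportion > .7) ~ .53
overshoot = sum(binom_pmf(k, n, p_true) for k in range(n + 1) if k / n > p_true)
print(f"P(proportion > {p_true}) = {overshoot:.2f}")

# Myopic vs. dampened estimates after observing nothing but successes
for n_total in (5, 20):
    myopic = 1.0                                  # raw sample proportion
    bayes = rule_of_succession(n_total, n_total)  # 6/7 ~ .86 and 21/22 ~ .95
    print(n_total, myopic, round(bayes, 2))
```

The same posterior mean, (k + 1) / (n + 2), evaluated over all possible sample proportions yields the two dampening curves plotted in Figure 1.2 below.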


Let us now move from unsystematic sampling error (around p or ρ) derived from statistical sampling theory to systematic sampling biases lying outside the domain of statistics. (Note, however, that statistical rules remain essential to distinguish systematic biases from unsystematic, merely stochastic, error.) Some behavioral laws are so obvious and universal that one hardly recognizes their statistical consequences. For example, Thorndike’s () law of effect states that responses leading to pleasant outcomes are more likely repeated than responses leading to unpleasant outcomes. In other words, organisms are inclined to sample more from pleasant than from unpleasant sources. A hot-stove effect motivates organisms to stop sampling from highly unpleasant sources (e.g., a restaurant where one got sick). Such a simple and self-evident preference toward hedonically positive stimuli was sufficient to inspire a series of highly influential simulations and experiments that opened up completely novel perspectives on behavior regulation (Denrell, ; Denrell & Le Mens, , ; Fazio et al., ). The tendency to stop sampling from negative targets and more likely continue sampling from positive targets implies that negative first impressions are less likely corrected than positive first impressions. Long-term negativity biases may be the result of such a simple and incontestable hedonic bias.

An example of another effect that has been prematurely taken for a cognitive bias refers to the seminal work on heuristics and biases by Tversky and Kahneman (). In their famous availability heuristic, they postulate a cognitive bias to overestimate the frequency or probability of easily retrievable events. Thus, a bias in frequentist judgments is attributed to a cognitive bias to overrate information that easily comes to one’s mind. Hardly anybody ever contested that the bias may already be apparent in the sampling stage, well before a retrieval bias comes into play, although this possibility was discussed from the beginning. For instance, the erroneous tendency to rate murder as more frequent than suicide, to overrate lightning and to underrate coronary disease as causes of death, need not reflect a retrieval bias but a bias in newspaper coverage (Combs & Slovic, ). Thus, prior to cognitive retrieval processes, newspapers or the information environment are more likely to report on murder than on suicide, on lightning than coronary disease, and this preexisting sampling bias may account for availability effects. Even when every cause of death reported in the media is equally likely to be retrieved, biased media coverage may well account for biased probability estimates. A critical examination of the literature reveals, indeed, that countless experiments on the availability heuristic have provided little evidence for memory retrieval proper.

1.2.2 Properties of Proximal Samples

So far, we have seen that merely analyzing the statistical properties or the hedonic appeal of the environment opens up alternative explanations of various psychological phenomena as well as genuine innovations that would never have been discovered without the sampling perspective. The following discussion of the properties of the proximal samples implanted by the distal world leads to further insights about the beauty, fertility, and explanatory power of the sampling approach.

One important and common property of observations based on noisy samples is regression to the mean. If the observed value X′ is higher than Mean_X, then the true value X is likely lower than X′, but if the observed value X′ is lower than Mean_X, then the true value X is likely higher than X′ (see Figure 1.2; for precise definitions see Samuels () and Schmittlein ()). (Regression to the mean does not hold for all distributions, however; there may be regression to the median or sometimes even regression to the extremes; see Schmittlein ().) This property holds for many distributions and implies that observed values diverge regularly and in predictable ways from true values. Regressiveness increases with the amount of noise, or error variance. To illustrate this, consider two normally distributed random variables, X′ and X, where X′ is a noisy observation of X (i.e., X′ = X + e, where e is an error term). Suppose, for simplicity, that the variables are standardized to z scores with zero mean and unit variance: z_X = (X − Mean_X) / SD_X. Then the expected z_X given an observed standardized value z_X′ is E[z_X | z_X′] = r_X,X′ z_X′. Whenever the correlation between X′ and X is less than 1, the best estimate of z_X given z_X′ is less extreme than z_X′. Specifically, if the correlation is r_X,X′ = .5, the expected population values are only half as extreme as the observed values; if r_X,X′ = .75, the expected population values shrink by one-fourth, to 75 percent of the observed deviation from the mean. Thus, observed values diverge from expected values in predictable ways.

While regression is not a bias but a reflection of noise in the probabilistic world, it can create what appears to be a bias. An agent that reports the raw observed sample value as their estimate – ignoring the effects implied by regression to the mean – will make systematically too extreme judgments, as compared to the long-run expected value. Alternatively, an agent may take expectable regression effects into account and make

Figure 1.2  Expected (population) probability p as a function of observed sample proportion P, at two sample sizes n, 5 and 20 (assuming a uniform prior for p). An agent that myopically reports the sample proportion P as their estimate of probability p will make too extreme estimates, as identified by the deviation from the identity line. An agent who takes the sample size into account dampens the observed proportion according to n.

estimates that are systematically less extreme than the observed sample values. Because regression effects decrease with increasing reliability, and reliability increases with sample size n, it follows that small samples should inform more regressive (“dampened”) estimates than larger samples drawn from the same population. Figure 1.2 illustrates this by plotting the expected population probability E(p | P), conditional on the observed sample proportion P at sample size n, assuming that any value of p is equally likely prior to the observation (a uniform prior for the population probability p). If an agent naïvely takes the observed sample proportions P as estimates of p, the resulting estimates of p will be profoundly too extreme, and more so for smaller samples. A more sophisticated agent may take the regression effect into account and make estimates of p that are less extreme than P, as captured by the function for the relevant sample size in Figure 1.2, making the estimates a function both of the observed sample proportion P and the sample size n. Both the “naïve” or “myopic” use of the sample and the more sophisticated


use of the sample content P make clear and identifiable a priori predictions for the judgments. The regression slopes …

2  Homo Ordinalus and Sampling Models: The Past, Present, and Future of Decision by Sampling

Gordon D. A. Brown and Lukasz Walasek

… P_z > P_min. Its value would change, however, under normalisation as in the equation above. Divisive normalisation is used by Webb et al. () to account for violations of the axiom of independence in a model of decision-making, although their denominating summation term includes a power transformation not included in the equation above. Despite the success of divisive normalisation models, however, it does not appear that such models can (at least in their present form) naturally capture the relative rank effects illustrated and discussed above. The third approach to efficient coding is more directly relevant to our rank-based focus. Heng et al. () provide a general efficient-coding framework for modelling decision-making processes. First, as in DbS, internal representations of value are assumed to be based on binary signals.


Second, the conception of efficiency is extended beyond just maximising information between external stimuli and their internal representations. Instead, efficiency is defined in terms of the probability of making a correct choice (as opposed to accurately representing external stimuli). Third, and crucially, the concept of efficient representation is considered in environments in which the frequency distribution can change. More specifically, Heng et al. () conclude that DbS implements efficient sampling, as it allows cognitively cheap adaptation to changing frequency distributions. These results offer insight into why people may rely on sampling and rank-based processes in choice. Such mechanisms achieve high levels of efficiency in a world where we must form accurate representations while, at the same time, allowing for those representations to be revised in the face of changing context. The relation between rank-based processes, sampling, and efficient coding has also been explored by Bhui and Gershman () (see also Frydman & Jin, , for an application to risky choice). These authors show that the rank-based coding of DbS offers an efficient information channel in the absence of communication noise. However, noise can be introduced to the finite samples. Bhui and Gershman propose a smoothing process based on kernel density estimation that occurs prior to rank transformation in order to maintain efficient coding. This smoothing process has an effect similar to that of the range effect in RFT. Thus Bhui and Gershman allow representations to vary continuously between 0 and 1. The model proposed by Bhui and Gershman shares some features of other extensions of DbS discussed elsewhere in this chapter (Brown & Matthews, ; Ronayne & Brown, ). What makes their particular model so relevant in the present context, however, is their novel and important demonstration that RFT-like, rather than just DbS-like, behaviour can maximise coding efficiency when the noise in a stimulus sample is taken into account. We note however that neural codes will ultimately have evolved to maximise fitness, rather than just coding efficiency, and hence that constraints related to reward maximisation may also impact internal representations (Schaffner et al., ). Finally, what evidence exists for rank-based coding in the brain? Although most research has focused on predictions of range normalisation models, Mullett and Tunney () show that distinct brain regions encode rank information reflecting both local and global reward context. More specifically, they find that the rank of monetary amounts in the entire experiment (global) is reflected in brain activation of the ventromedial prefrontal cortex. Activation in regions such as the caudate and




thalamus, in contrast, seems to reflect more local context-dependent rank encoding. In summary, the need for coding efficiency may explain both the ubiquity of rank-based coding and the fact that range effects are sometimes also observed. We note two qualifications. First, information-theoretic approaches necessarily exclude considerations about the importance (e.g., fitness-relevance) of information to the organism, and hence broader notions of ‘best coding’ may be relevant. Second, and relatedly, the best way to represent information depends on the goals that the representation is intended to support. Thus, the goal ‘respond differently to every stimulus’ may require representations of similar stimuli to be made less (relatively) similar than the stimuli themselves in order to make the best use of available capacity. However, if it is desirable that similar stimuli should evoke similar responses, as is plausible in many ecologically realistic contexts (Shepard, ), different considerations will apply.
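As promised above, here is a minimal sketch of the contrast between plain rank-based coding and its kernel-smoothed variant; this is an illustrative reading of the idea, not Bhui and Gershman's actual implementation, and the comparison sample and the bandwidth h are assumptions made for the example.

```python
import math

def dbs_rank(x, sample):
    """Plain DbS: proportion of sampled comparators below x (a step function)."""
    return sum(s < x for s in sample) / len(sample)

def smoothed_rank(x, sample, h=5.0):
    """Kernel-smoothed rank: each comparator contributes a Gaussian
    cumulative curve, so the representation varies continuously in [0, 1]."""
    z = [0.5 * (1 + math.erf((x - s) / (h * math.sqrt(2)))) for s in sample]
    return sum(z) / len(z)

sample = [10, 20, 25, 30, 80]          # hypothetical comparison sample
for x in (24, 26, 50):
    print(x, round(dbs_rank(x, sample), 3), round(smoothed_rank(x, sample), 3))
```

Note how the smoothed rank, unlike the step-function rank, moves gradually between neighbouring values (here, 24 versus 26), which is the RFT-like behaviour discussed above.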

.

Theoretical Implications, Applications, and Future Directions

In this final section we review the key theoretical implications of DbS and consider ongoing and likely future developments of the approach.

Theoretical Issues

We highlight two closely related theoretical claims made by DbS. First, DbS claims that it is possible to explain the shape of psychoeconomic functions, such as utility curves and probability weighting functions, rather than simply assuming them or effectively viewing them as a summary of choices. A related, and rather deeper, theoretical claim is that there simply are no underlying internal value scales of the type that most models assume. Instead, values are constructed on the basis of comparison samples and hence are highly context-dependent. How should we interpret these claims? We make three observations. First, we note that arguments about underlying value scales parallel similar, and earlier, debates in psychophysics. Thus, Parducci () argues that the existence of context effects on subjective magnitude judgements precludes the existence of simple psychophysical laws, while others, such as Birnbaum () and Poulton (), instead regard context effects more as ‘nuisance factors’ that need to be removed by ingenious design and analysis. This latter perspective is akin to the dominant approach in behavioural economics, which we refer


to as ‘preference purification’. According to this view, underlying preferences do exist but are obscured by the context-dependence of choice and hence are difficult to elicit (e.g., Infante et al., ; Rubinstein & Salant, ). Full resolution of these ontological debates lies outside the scope of this chapter. Second, and relatedly, models that assume context-dependent valuations constructed on the basis of small samples can often be seen as special cases of more general models which also allow the possibility of stable psychoeconomic valuation scales. Imagine that rank-based valuations were context-dependent in the way suggested by DbS, but that the context-giving sample tends towards infinity, effectively capturing all the relevant information about the stimuli/attributes/environment. The element of ‘sampling’ would then have effectively disappeared, the resulting value functions would be stable and context-independent, and the model would simply incorporate the idea that ‘ranks matter’ (see Ronayne & Brown, , for discussion). Third, we note that the claim that there are no underlying socioeconomic value scales should not be confused with the claim that people do not have stable underlying preferences; instead the claim is that any such preferences are inaccessible in any better-than-ordinal way (Brown & Walasek, ).
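The second observation can be made concrete with a toy simulation, a sketch under assumed conditions (a lognormal ‘income’ environment and a fixed target value, both hypothetical): as the context-giving sample grows, the DbS relative rank of the target stabilises at its population percentile, and the context-dependence induced by sampling washes out.

```python
import random

random.seed(1)
target = 30000.0                                   # hypothetical to-be-valued income
income = lambda: random.lognormvariate(10.2, 0.5)  # hypothetical income environment

for n in (5, 50, 500, 50000):
    sample = [income() for _ in range(n)]
    rank = sum(s < target for s in sample) / n     # DbS relative rank of the target
    print(f"sample size {n:6d}: relative rank = {rank:.3f}")
```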

Developments and Future Directions

Finally, we note some recent and ongoing extensions to DbS. The first concerns multi-attribute decisions. While the discussion above has focused on valuation of single attributes, such as prices or wages, most everyday choices involve multiple attributes. There are three extensions of the DbS approach to choices where more than one attribute is relevant. In sequential sampling models, choice-relevant values are integrated and accumulated over time. Noguchi and Stewart () proposed a sequential sampling model in which a decision-maker accumulates the results of (DbS-like) pairwise ordinary comparisons on a single attribute. Inspired by evidence about attention allocation during choice (Noguchi & Stewart, ), their Multi-alternative Decision by Sampling model (MDbS) extends DbS by assuming that the probability of comparing two alternatives increases with their similarity. Noguchi and Stewart show that their model can account for a wide array of context effects in multi-attribute choice (compromise, similarity, and attraction effects) as well as other phenomena (e.g., effects of time pressure, attribute spacing, and




alignability). Ronayne and Brown’s () Multi-Attribute Decision by Sampling (MADS) model assumes that the value of a choice option is determined by the number of sampled comparators that the option dominates, drawn from an inferred market distribution which is itself influenced by the set of choice options. MADS offers an alternative account of the attraction, compromise, and similarity effects. Finally, while not presenting a formal model of choice, Achtypi et al. () interpret endowment effects within a rank-based framework in which an item is seen as a ‘good deal’ to the extent that its rank in a price distribution is lower than its ranking in a quality or ‘preferredness’ distribution. For example, imagine that a coffee mug whose quality sits at a high percentile (i.e., high in DbS terms) is available at a low-percentile (i.e., low in DbS terms) price. The difference between these relative ranks is therefore a decision-relevant quantity. These three approaches all illustrate ways in which rank-based valuations may be used in multi-attribute choice.

The idea that attribute-specific rank-based evaluations can be combined is central to a further extension to DbS: Relative Rank Theory (RRT: Brown & Walasek, ). In the discussion above, we have focused on valuation of quantities where ‘more is better’ (e.g., wages) or ‘more is worse’ (e.g., prices). In such cases, models of magnitude judgement can be reinterpreted as models of value judgement simply by treating judged magnitude as judged value. However, in many cases ‘more’ is not simply better or worse; many preferences (e.g., for curry spices) are single-peaked. RRT accommodates such cases, and also extends DbS to allow for uncertainty in estimates of relative ranked position (such uncertainty being greater for smaller samples). In DbS, the subjective value of a stimulus, S_i, is just its relative ranked position within its comparison context; that is, it is a single-point estimate (Equation .). Uncertainty in the relative rank estimates (which will be greater when the estimates are based on smaller samples) is not represented, and the estimate of relative rank will be the same (at .5) whether n_lower and n_higher are both small or both large. RRT incorporates uncertainty into the estimates by using beta distributions to characterise uncertain estimates of relative rank:

S_i = beta(n_lower + 1, n_higher + 1)    (2.6)

Further increasing n_lower and n_higher is equivalent to incorporating additional observations, as when the sample is larger. The addition of 1 to n_lower and n_higher is equivalent to assuming a uniform prior probability distribution on the estimate.
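A minimal numerical sketch of Equation (2.6), with illustrative counts: the point estimate of relative rank (the mean of the beta distribution) is identical whenever n_lower and n_higher are equal, but the spread of the distribution, and hence the represented uncertainty, shrinks as the sample grows.

```python
# Relative rank as a beta(n_lower + 1, n_higher + 1) distribution,
# as in Equation (2.6); the counts below are illustrative.
def rank_belief(n_lower, n_higher):
    a, b = n_lower + 1, n_higher + 1          # +1 encodes a uniform beta(1, 1) prior
    mean = a / (a + b)                        # point estimate of relative rank
    sd = (a * b / ((a + b) ** 2 * (a + b + 1))) ** 0.5
    return mean, sd

for n in (2, 20, 200):                        # equal counts above and below
    mean, sd = rank_belief(n, n)
    print(f"n_lower = n_higher = {n:3d}: mean = {mean:.2f}, sd = {sd:.3f}")
```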


We have emphasised rank-based judgement and valuation, but note that DbS is a special case of a more general model. In DbS, the effect of S_j on the valuation of S_i is independent of the distance between them. We can represent that effect in terms of that distance raised to a power γ (i.e., as |S_j - S_i|^γ), with DbS corresponding to γ = 0. If instead γ < 0, a nearby S_j will have a larger effect on the valuation of S_i than will a more distant S_j. If γ > 0, in contrast, a more distant S_j will have a greater effect than a nearby S_j. Different values of γ, sometimes combined with different weightings for upwards and downwards comparisons, turn out to be equivalent to different existing models (Brown et al., ). When the relevant dimension is income, for example, setting γ = 1 on upward comparisons gives the widely used model of relative deprivation, according to which an individual’s deprivation depends on the sum, not just the number, of incomes higher than theirs (Hounkpatin et al., ). Ongoing research is therefore using model-based comparison to estimate γ in various contexts.
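The following sketch is an illustrative reading of this family of models rather than the authors’ own code: it computes an aggregate comparison score for a target value under different γ, with optional separate weights for upward and downward comparisons. The income values and the additive way the two sums are combined here are assumptions made for the example; γ = 0 reduces to DbS-style rank counting, while γ = 1 on upward comparisons yields a relative-deprivation-style sum of income shortfalls.

```python
def comparison_score(s_i, sample, gamma, w_up=1.0, w_down=1.0):
    # Each comparator S_j influences the valuation of S_i in proportion
    # to |S_j - S_i| ** gamma; gamma = 0 makes every comparator count
    # equally (pure rank), gamma = 1 weights comparators by distance.
    up = sum((s_j - s_i) ** gamma for s_j in sample if s_j > s_i)
    down = sum((s_i - s_j) ** gamma for s_j in sample if s_j < s_i)
    return w_down * down - w_up * up          # higher = more favourable standing

incomes = [18, 22, 25, 31, 48, 90]            # hypothetical comparison incomes
for gamma in (0.0, 0.5, 1.0):
    print(f"gamma = {gamma}: {comparison_score(31, incomes, gamma):+.2f}")
```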

Conclusion

In summary, DbS is a process-level model of how people arrive at valuations of both economic and non-economic quantities. It differs from conventional economic approaches in assigning a central role to the context provided by comparison samples, in specifying the relevant psychological processes (binary ordinal comparison and summation), and in assuming that valuations depend on the relative ranked position of to-be-judged stimuli within a comparison context. DbS is consistent with a wide range of experimental evidence, offers a potential explanation of why psychoeconomic functions have the form that they do, and may reflect adaptive considerations regarding efficient coding in the brain.

References

Achtypi, A., Ashby, N. J. S., Brown, G. D. A., Walasek, L., & Yechiam, E. (). The endowment effect and beliefs about the market. Decision, , –. Aldrovandi, S., Brown, G. D. A., & Wood, A. M. (). Social norms and rank-based nudging: Changing willingness to pay for healthy food. Journal of Experimental Psychology: Applied, (), –. Aldrovandi, S., Wood, A. M., & Brown, G. D. A. (). Sentencing, severity, and social norms: A rank-based model of contextual influence on judgments of crimes and punishments. Acta Psychologica, (), –.




Aldrovandi, S., Wood, A. M., Maltby, J., & Brown, G. D. A. (). Students’ concern about indebtedness: A rank-based social norms account. Studies in Higher Education, (), –. Alempaki, D., Canic, E., & Mullett, T. L. et al. (). Reexamining how utility and weighting functions get their shapes: A quasi-adversarial collaboration providing a new interpretation. Management Science, (), –. Alessie, R. J. M., & Kapteyn, A. (). Preference formation, incomes, and the distribution of welfare. The Journal of Behavioral Economics, (), –. Anderson, C., Hildreth, J. A. D., & Howland, L. (). Is the desire for status a fundamental human motive? A review of the empirical literature. Psychological Bulletin, (), –. Arrow, K. J. (). A difficulty in the concept of social welfare. Journal of Political Economy, (), –. Bak, P. (). How nature works: The science of self-organised criticality. New York: Copernicus Press. Barlow, H. B. (). Possible principles underlying the transformation of sensory messages. In W. A. Rosenblith (Ed.), Sensory communication. Cambridge, MA: MIT Press. Barron, G., & Erev, I. (). Small feedback-based decisions and their limited correspondence to description-based decisions. Journal of Behavioral Decision Making, (), –. Bhui, R., & Gershman, S. J. (). Decision by sampling implements efficient coding of psychoeconomic functions. Psychological Review, (), –. Birnbaum, M. H. (). Using contextual effects to derive psychophysical scales. Perception & Psychophysics, (), –. Bordalo, P., Gennaioli, N., & Shleifer, A. (). Salience and consumer choice. Journal of Political Economy, (), –. Bower, G. H. (). Adaptation-level coding of stimuli and serial position effects. In M. H. Appley (Ed.), Adaptation-level theory (pp. –). New York: Academic Press. Boyce, C. J., Brown, G. D. A., & Moore, S. C. (). Money and happiness: Rank of income, not income, affects life satisfaction. Psychological Science, , –. Brown, G. D. A., Gardner, J., Oswald, A. J., & Qian, J. (). Rank dependence in employees’ wellbeing. Paper presented at the Warwick-Brookings conference, Washington, DC, June . (). Does wage rank affect employees’ well-being? Industrial Relations, (), –. Brown, G. D. A., & Matthews, W. J. (). Decision by sampling and memory distinctiveness: Range effects from rank-based models of judgment and choice. Frontiers in Psychology, , . Brown, G. D. A., Neath, I., & Chater, N. (). A temporal ratio model of memory. Psychological Review, (), –.


Brown, G. D. A., & Walasek, L. (). Relative rank theory: The inaccessibility of preferences and the incommensurability of values. Unpublished manuscript. Brown, G. D. A., Wood, A. M., Ogden, R. S., & Maltby, J. (). Do student evaluations of university reflect inaccurate beliefs or actual experience? A relative rank model. Journal of Behavioral Decision Making, (), –. Bushong, B., Rabin, M., & Schwartzstein, J. (). A model of relative thinking. Review of Economic Studies, (), –. Chang, R. (). Making comparisons count. London and New York: Routledge. Chater, N., & Brown, G. D. A. (). Scale-invariance as a unifying psychological principle. Cognition, (), B–B. Clark, A. E., & Oswald, A. J. (). Satisfaction and comparison income. Journal of Public Economics, (), –. Cohen, D., & Teodorescu, K. (). On the effect of perceived patterns in decisions from sampling. Decision, (), –. Fennell, J., & Baddeley, R. (). Uncertainty plus prior equals rational bias: An intuitive Bayesian probability weighting function. Psychological Review, (), –. Fiedler, K. (). Beware of samples! A cognitive-ecological sampling approach to judgment biases. Psychological Review, (), –. Fiedler, K., & Juslin, P. (Eds.). (). Information sampling and adaptive cognition. Cambridge, UK: Cambridge University Press. Frank, R. H. (). Luxury fever: Weighing the cost of excess. Princeton: Princeton University Press. Frydman, C., & Jin, L. J. (). Efficient coding and risky choice. Quarterly Journal of Economics, (), –. Gershoff, A. D., & Burson, K. A. (). Knowing where they stand: The role of inferred distributions of others in misestimates of relative standing. Journal of Consumer Research, , –. Hayden, B. Y., & Niv, Y. (). The case against economic values in the orbitofrontal cortex (or anywhere else in the brain). Behavioral Neuroscience, (), –. Heng, J. A., Woodford, M., & Polania, R. (). Efficient sampling and noisy decisions. eLife, , e. Hertwig, R., Barron, G., Weber, E. U., & Erev, I. (). Decisions from experience and the effect of rare events in risky choice. Psychological Science, (), –. Hounkpatin, H. O., Wood, A. M., & Brown, G. D. A. (). Comparing indices of relative deprivation using behavioural evidence. Social Science & Medicine, , . Infante, G., Lecouteux, G., & Sugden, R. (). Preference purification and the inner rational agent: A critique of the conventional wisdom of behavioural welfare economics. Journal of Economic Methodology, (), –. Janiszewski, C., & Lichtenstein, D. R. (). A range theory account of price perception. Journal of Consumer Research, , –.




Juslin, P., Winman, A., & Hansson, P. (). The naive intuitive statistician: a naive sampling model of intuitive confidence intervals. Psychological Review, (), –. Kahneman, D., & Tversky, A. (). Prospect theory: An analysis of decision under risk. Econometrica, , –. Kamenica, E. (). Contextual inference in markets: On the informational content of product lines. American Economic Review, (), –. Kapteyn, A., & Wansbeek, T. (). The individual welfare function: A review. Journal of Economic Psychology, , –. Kapteyn, A., Wansbeek, T., & Buyze, J. (). Dynamics of preference formation. Economics Letters, (), –. Kello, C. T., Brown, G. D. A., & Ferrer-i-Cancho, R. et al. (). Scaling laws in cognitive sciences. Trends in Cognitive Sciences, (), –. Kornienko, T. (). Nature’s measuring tape: A cognitive basis for adaptive utility. University of Edinburgh. Köszegi, B., & Szeidl, A. (). A model of focusing in economic choice. Quarterly Journal of Economics, (), –. Laming, D. (). The measurement of sensation. Oxford: Oxford University Press. Layard, R., Mayraz, G., & Nickell, S. (). The marginal utility of income. Journal of Public Economics, (–), –. Lim, R. G. (). A range-frequency explanation of shifting reference points in risky decision-making. Organizational Behavior and Human Decision Processes, (), –. Louie, K., & Glimcher, P. W. (). Normalization principles in computational neuroscience. In S. M. Sherman (Ed.), Oxford research encyclopedia of neuroscience (pp. –). Oxford: Oxford University Press. Macchia, L., Plagnol, A. C., & Powdthavee, N. (). Buying happiness in an unequal world: Rank of income more strongly predicts well-being in more unequal countries. Personality and Social Psychology Bulletin, (), –. Maltby, J., Wood, A. M., Vlaev, I., Taylor, M. J., & Brown, G. D. A. (). Contextual effects on the perceived health benefits of exercise: The Exercise Rank Hypothesis. Journal of Sport & Exercise Psychology, (), –. Mazumdar, T., Raj, S. P., & Sinha, I. (). Reference price research: Review and propositions. Journal of Marketing, (), –. Mellers, B. A. (). Fair allocations of salaries and taxes. Journal of Experimental Psychology: Human Perception and Performance, (), –. Melrose, K. L., Brown, G. D. A., & Wood, A. M. (). Am I abnormal? Relative rank and social norm effects in judgments of anxiety and depression symptom severity. Journal of Behavioral Decision Making, (), –. Mullett, T. L., & Tunney, R. J. (). Value representations by rank order in a distributed network of varying context dependency. Brain and Cognition,  (), –. Murdock, B. B. (). The distinctiveness of stimuli. Psychological Review, (), –.


Niedrich, R. W., Sharma, S., & Wedell, D. H. (). Reference price and price perceptions: A comparison of alternative models. Journal of Consumer Research, (), –. Noguchi, T., & Stewart, N. (). In the attraction, compromise, and similarity effects, alternatives are repeatedly compared in pairs on single dimensions. Cognition, (), –. (). Multi-alternative decision by sampling: A model of decision making constrained by process data. Psychological Review, (), –. Olivola, C. Y., & Sagara, N. (). Distributions of observed death tolls govern sensitivity to human fatalities. Proceedings of the National Academy of Sciences of the United States of America, (), –. Pachur, T., Hertwig, R., & Rieskamp, J. (). Intuitive judgments of social statistics: How exhaustive does sampling need to be? Journal of Experimental Social Psychology, , –. Padoa-Schioppa, C. (). Range-adapting representation of economic value in the orbitofrontal cortex. Journal of Neuroscience, (), –. Parducci, A. (). Category judgment: A range-frequency model. Psychological Review, (), –. (). The relativism of absolute judgments. Scientific American, (), –. (). Scale values and phenomenal experience: There is no psychophysical law. In H.-G. Geissler & P. Petzold (Eds.), Psychophysical judgment and the process of perception (pp. –). Amsterdam: North Holland. (). Elaborations upon psychophysical contexts for judgment: Implications of cognitive models. In H. G. Geissler, S. W. Link, & J. T. Townsend (Eds.), Cognition, information processing, and psychophysics: Basic issues (pp. –). Hillsdale, NJ: Lawrence Erlbaum Associates. (). Happiness, pleasure and judgment: The contextual theory and its applications. Mahwah, NJ: Lawrence Erlbaum Associates. Parducci, A., & Perrett, L. F. (). Category rating scales: Effects of relative spacing and frequency of stimulus values. Journal of Experimental Psychology, (), –. Poulton, E. C. (). Models for biases in judging sensory magnitude. Psychological Bulletin, (), –. Prelec, D., Wernerfelt, B., & Zettelmeyer, F. (). The role of inference in context effects: Inferring what you want from what is available. Journal of Consumer Research, (), –. Quiggin, J. (). A theory of anticipated utility. Journal of Economic Behavior and Organization, , –. Rablen, M. D. (). Relativity, rank and the utility of income. Economic Journal, (), –. Rangel, A., & Clithero, J. A. (). Value normalization in decision making: Theory and evidence. Current Opinion in Neurobiology, (), –. Rigoli, F. (). Reference effects on decision-making elicited by previous rewards. Cognition, , .




Robson, A. J. (). The biological basis of economic behavior. Journal of Economic Literature, (), –. Ronayne, D., & Brown, G. D. A. (). Multi-attribute decision by sampling: An account of the attraction, compromise and similarity effects. Journal of Mathematical Psychology, , –. Rubinstein, A., & Salant, Y. (). Eliciting welfare preferences from behavioural data sets. Review of Economic Studies, (), –. Rustichini, A. (). Neuroeconomics: What have we found, and what should we search for. Current Opinion in Neurobiology, (), –. Rustichini, A., Conen, K. E., Cai, X. Y., & Padoa-Schioppa, C. (). Optimal coding and neuronal adaptation in economic decisions. Nature Communications, . Schaffner, J., Tobler, P., Hare, T., & Polania, R. (). Neural codes in early sensory areas maximize fitness. bioRxiv (...). Schulze, C., Hertwig, R., & Pachur, T. (). Who you know is what you know: Modeling boundedly rational social sampling. Journal of Experimental Psychology: General, (), –. Shenoy, P., & Yu, A. J. (). Rational preference shifts in multi-attribute choice: What is fair? In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the th Annual Conference of the Cognitive Science Society (pp. –). Austin, TX: Cognitive Science Society. Shepard, R. N. (). Toward a universal law of generalization for psychological science. Science, (), –. Sher, S., & McKenzie, C. R. M. (). Options as information: Rational reversals of evaluation and preference. Journal of Experimental Psychology: General, (), –. Simonson, I. (). Will I like a ‘medium’ pillow? Another look at constructed and inherent preferences. Journal of Consumer Psychology, (), –. Singh, S. K., & Maddala, G. S. (). Function for size distribution of incomes. Econometrica, (), –. Soltani, A., De Martino, B., & Camerer, C. (). A range-normalization model of context-dependent choice: A new model and evidence. PLOS Computational Biology, (), e. Stevens, S. S. (). On the psychophysical law. Psychological Review, (), –. Stevenson, B., & Wolfers, J. (). Subjective well-being and income: Is there any evidence of satiation? American Economic Review, (), –. Stewart, N. (). Decision by sampling: The role of the decision environment in risky choice. Quarterly Journal of Experimental Psychology, , –. Stewart, N., & Brown, G. D. A. (). Similarity and dissimilarity as evidence in perceptual categorization. Journal of Mathematical Psychology, , –. Stewart, N., Brown, G. D. A., & Chater, N. (). Sequence effects in categorization of simple perceptual stimuli. Journal of Experimental Psychology: Learning, Memory and Cognition, (), –.


(). Absolute identification by relative judgment. Psychological Review,  (), –. Stewart, N., Canic, E., & Mullett, T. L. (). On the futility of estimating utility functions: Why the parameters we measure are wrong, and why they do not generalize. Unpublished manuscript. Stewart, N., Chater, N., & Brown, G. D. A. (). Decision by sampling. Cognitive Psychology, (), –. Stewart, N., Reimers, S., & Harris, A. J. L. (). On the origin of utility, weighting, and discounting functions: How they get their shapes and how to change their shapes. Management Science, (), –. Taylor, M. J., Vlaev, I., Maltby, J., Brown, G. D. A., & Wood, A. M. (). Improving social norms interventions: Rank-framing increases excessive alcohol drinkers’ information-seeking. Health Psychology, (), –. Tripp, J., & Brown, G. D. A. (). Being paid relatively well most of the time: Negatively skewed payments are more satisfying. Memory & Cognition,  (), –. Ungemach, C., Stewart, N., & Reimers, S. (). How incidental values from our environment affect decisions about money, risk, and delay. Psychological Science, , –. Van Praag, B. M. S. (). Individual welfare functions and consumer behavior: A theory of rational irrationality. Amsterdam: North-Holland. Van Praag, B. M. S., & Kapteyn, A. (). Further evidence on the individual welfare function of income: An empirical investigation in The Netherlands. European Economic Review, (), –. Volkmann, J. (). Scales of judgment and their implications for social psychology. In J. H. Rohrer & M. Sherif (Eds.), Social psychology at the crossroads (pp. –). New York: Harper & Row. Walasek, L., & Brown, G. D. A. (). Incomparability and incommensurability in choice: No common currency of value? Unpublished manuscript. Walasek, L., & Stewart, N. (). Context-dependent sensitivity to losses: Range and skew manipulations. Journal of Experimental Psychology: Learning Memory, and Cognition, (), –. Watkinson, P., Wood, A. M., Lloyd, D. M., & Brown, G. D. A. (). Pain ratings reflect cognitive context: A range frequency model of pain perception. Pain, (), –. Webb, R., Glimcher, P. W., & Louie, K. (). The normalization of consumer valuations: Context-dependent preferences from neurobiological constraints. Management Science, , –. Wedell, D. H. (). Distinguishing among models of contextually induced preference reversals. Journal of Experimental Psychology: Learning Memory, and Cognition, (), –. (). Testing models of trade-off contrast in pairwise choice. Journal of Experimental Psychology: Human Perception and Performance,  (), –.




Wedell, D. H., Parducci, A., & Geiselman, R. E. (). A formal analysis of ratings of physical attractiveness: Successive contrast and simultaneous assimilation. Journal of Experimental Social Psychology, (), –. Wedell, D. H., Santoyo, E. M., & Pettibone, J. C. (). The thick and the thin of it: Contextual effects in body perception. Basic and Applied Social Psychology, (), –. Wernerfelt, B. (). A rational reconstruction of the compromise effect: Using market data to infer utilities. Journal of Consumer Research, (), –. Wood, A. M., Brown, G. D. A., & Maltby, J. (). Social norm influences on evaluations of the risks associated with alcohol consumption: Applying the rank-based decision by sampling model to health judgments. Alcohol and Alcoholism, (), –. Wood, A. M., Brown, G. D. A., Maltby, J., & Watkinson, P. (). How are personality judgments made? A cognitive model of reference group effects, personality scale responses, and behavioral reactions. Journal of Personality, , –. Wort, F., Walasek, L., & Brown, G. D. A. (). Rank-based alternatives to mean-based ensemble models of satisfaction with earnings: Comment on Putnam-Farr and Morewedge (). Journal of Experimental Psychology: General. (), –. Zou, D., Brown, G. D. A., Zhao, P., & Dong, S. (). 概率权重函数形状的 成因:二元比较任务中的发现. [The shape of the probability weighting function: Findings from binary comparison.] 营销科学学报 [Journal of Marketing Science; Tsinghua University], –.


In Decisions from Experience What You See Is Up to Your Sampling of the World

Timothy J. Pleskac and Ralph Hertwig

Sometimes when people make decisions, all the information – the risks, the rewards, the losses, the delays – is right in front of them, nicely packaged in accessible descriptions. When thus presented with symbolic representations of options to decide between, it seems that the human mind constructs a preference from the information in front of it, even if it is often limited, noisy, and unreliable (Kahneman & Tversky, ; Pleskac et al., ; Wakker, ). As a result, Kahneman () has suggested that a guiding principle for understanding how the mind makes decisions is “What you see is all there is.” That is, the descriptions of the options suffice for the mind to jump to a conclusion based on the limited information in front of it. Structurally, this view of the human mind treats it, in the words of the philosopher Susan Hurley, “as a subject and an agent standing, so to speak, back-to-back” (Hurley, , p. ), with perception as input from the world, action as the output, and the cognition that lies in between translating perception to action (Hurley, ). There is no further interaction and engagement with the world beyond the initial input into the mind. This input–output view of decision making has emerged largely by studying a specific set of decisions people face: decisions made based on explicit descriptions of the options, or decisions from description (DFD). By focusing on these types of decisions, researchers obtain a view of the mind that is static and one where there is no interaction between the different layers of the mind. However, people often, and we suspect in most cases, make decisions by mustering information about the options from experience. These samples of experience may stem from people’s explorations of the world, such as what might occur when we taste wine by swirling the glass gently, inhaling its aroma, and taking a sip. Sometimes these slices of experience arrive via shared experiences with others, as when a friend raves about the taste of a wine. And in still other cases, samples of experience accompany descriptions, such as when studying the wine bottle label while




tasting the wine. What we have found is that there is much more than meets the eye when people make decisions from experience (DFE) (e.g., Camilleri & Newell, ; Erev et al., ; Gonzalez & Dutt, ; Hau et al., ; Hertwig et al., ; Ludvig et al., ; Pleskac, ; Plonsky et al., ; Ungemach et al., ; Wulff et al., ). In making DFE, the decision maker is not simply a subject and an agent standing back-to-back. Instead, what the person sees and experiences depends on how the person acts, and how the person acts depends on what they have already seen and experienced. In this chapter, we explore how this dynamic interplay between perceiving and acting during DFE is central to understanding people’s decisions. We will illustrate how understanding this dynamic system’s properties can help explain when and why the decisions people make from experience can differ from the decisions people would make from descriptions; why in some cases the adaptive explorer appears to be risk-seeking and in other cases risk-averse; and how the system may shape behaviors beyond risky decision making in a laboratory. To do this, we break the system down by asking two questions. First, in what ways during DFE do people’s actions shape what they experience? To answer this question, we review how people’s use of small samples, their adaptive methods of learning, and their similarity-based sampling process all influence what they observe. The second question takes the opposite perspective and asks: how do people’s experiences shape their actions? Answering this question highlights how the mere presentation of outcomes and their saliency shape people’s actions. Finally, we examine how understanding this dynamic interplay between perceiving and acting can help us understand how macroeconomic shocks shape people’s risk preferences, adolescent risk-taking behavior, and even human rationality.

How Do People’s Actions Shape What They See?

People Take Small Samples

Our first illustration of the dynamic interplay between sensing and acting during DFE comes from people’s sampling behavior. It turns out that people tend to make DFE using objectively and functionally small samples. Doing so creates a systematic difference between DFD and DFE in the rate of choosing the highest expected value option among objectively the same gambles (Barron & Erev, ; Hertwig et al., ; Weber et al., ). This difference between DFD and DFE has become known as the




 . ,   Table .. The original and reversed fourfold pattern

Problem    

Domain

Risky

Safe

Description

Experience

Sample Size

Gain Gain Loss Loss

€ , .* € , .* € , . € , .

€ , . € , . € , .* € , .*

   

   

   

Note. The risky and safe gambles are simple gambles of the format x with a probability p otherwise  and are noted as x, p. The higher expected value option is denoted by *. The proportion of people that prefer the risky gamble in the Description and Experience conditions are provided. Sample sizes used during the experience conditions are average sample sizes across both options. Based on data from Wulff et al. ().

description–experience gap (Hau et al., ; Hertwig, ; Hertwig & Erev, ). One place where this gap has been exposed is with decisions among gambles used to exemplify the fourfold pattern of risk attitudes (Tversky & Kahneman, ; Tversky & Fox, ), as summarized in Table .. Focusing first on decisions from description, in the “gain” domain (top two rows of Table .), people behaved in a risk-averse way, preferring the safe option when the probability of winning was high (.). However, when the gamble had the same expected value but the probability of winning was low (.), people chose the risky option, thus revealing the reversed preference. The fourfold aspect of this choice pattern emerged when people were presented with the same options but now with outcomes in the loss domain. In this case, the set of preferences flipped, as shown in the bottom two rows of Table .. People chose in a risk-averse way when the stated probability of losing was low, but in a risk-seeking way when it was high. Looking at the lotteries and choice proportions in Table . reveals how this fourfold pattern has been explained in terms of probability weighting. Probability weighting refers to the concept that people do not weight outcomes by their objective probabilities but rather by decision weights that reflect the psychological impact of events on the desirability of prospects. In particular, in description-based choices people appear to choose as if they overweight rare events (e.g., overweighting the probability of . of obtaining a  in problems  and ), creating a fourfold pattern (Tversky & Kahneman, ). Indeed, as Barberis () summarized in his review of prospect theory, the theory’s assumed probability weighting “leads the individual to overweight the tails of any distribution – in other words, to overweight unlikely extreme outcomes” (p. ).
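For readers who want the weighting idea in computable form, here is the standard one-parameter weighting function from Tversky and Kahneman’s (1992) cumulative prospect theory; this is a generic illustration of overweighting of small probabilities, not a reanalysis of the data in Table .. The curvature value used below (0.61, an often-cited estimate for gains) is an assumption for the example.

```python
# Tversky-Kahneman (1992) probability weighting: with gamma < 1, small
# probabilities are overweighted and large ones underweighted.
def tk_weight(p, gamma=0.61):
    return p**gamma / (p**gamma + (1 - p)**gamma) ** (1 / gamma)

for p in (0.01, 0.10, 0.50, 0.90, 0.99):
    print(f"p = {p:4.2f} -> w(p) = {tk_weight(p):.3f}")
```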




Before we turn to how this fourfold pattern flips with DFE, let us emphasize that just as there are different experience-based decisions outside the lab, there are different ways to experimentally study these decisions (Hertwig & Erev, ). In general, DFE occur when people sample the choice options and learn from samples of experienced outcomes, such as coming across rewards and losses when sampling monetary gambles. Experimentally, this type of decision has been implemented by replacing the stated monetary gambles with money-machines of sorts (i.e., bandit problems: Berry & Fristedt, ). Typically, these machines are computerized buttons where clicking a button instantiates a random draw from the respective payoff distribution (gamble). In the case of the gamble of winning € with a probability of  percent (and a  percent chance of winning nothing), either a € or a € would be drawn. This sampled experience could be akin to trial-and-error learning, with each sample being consequential to the decision maker, and the learning being restricted to outcomes from the chosen option (the partial feedback paradigm; Barron & Erev, ; Erev & Barron, ). Alternatively, the sampled experience may be observational, such that the samples themselves are not consequential and only the final choice is consequential (the sampling paradigm; Hertwig et al., ). Sampled experiences could even come from memory, where a decision maker learns a set of outcomes and then later must decide by retrieving these instances, or information about the instances, from memory (Hotalling et al., ). From there, one can modify the tasks further, providing outcome information about both the sampled and nonsampled options (the full feedback paradigm; Yechiam & Busemeyer, ). One may even show a running record of past samples (the record paradigm; Hau et al., ). Last but not least, one can also model the situation where people have access to both descriptions of the options and the opportunity to sample the options (Erev et al., ; Jessup et al., ).

Regardless of how and what kind of experience is implemented in these paradigms, one typically finds a discrepancy between DFE and DFD: the description–experience gap. In the case of the fourfold pattern and returning to Table ., the choice pattern reverses. In gains, when the probability of winning is high, people prefer the risky option, but when the probability is low, people prefer the safe option. And vice versa for losses. This reversal suggests the opposite weighting of rarity in DFE compared with DFD. In DFE, choices appear as if rare events receive too little rather than too much weighting (measured against their objective probabilities).


This pattern is robust across gambles. For instance, a meta-analysis synthesizing  datasets using the sampling paradigm has shown that the choice proportions for the rare event-underweighted option differ by . percent on average between description and experience, with the magnitude of the gap varying considerably across problem types and the probability magnitude of the rarest event (Wulff et al., ). The trend also holds up once the heterogeneity in people’s individual choice behavior is accounted for (Hertwig & Pleskac, ; Regenwetter & Robinson, ) (though see Kellen et al., ). A similar pattern, and if anything a larger gap, appears when people make DFE with the feedback paradigm (Barron & Erev, ), and even when they have access to the descriptions of the monetary gambles (Erev et al., ). Yet, the paradigms also produce some differences. Studies that use the sampling paradigm tend to find clear underweighting of rare events only when three additional conditions hold: (1) the decision makers do not simultaneously receive a description of the possible outcomes (Abdellaoui et al., ; Erev et al., ); (2) the choice problem involves a safe and a risky option (Glöckner et al., ; Fiedler, ; Wulff et al., ); and (3) the value of the specific outcome is known at the time of sampling (as opposed to learning about the value at the conclusion of sampling) (Hotalling et al., ).

One key reason why a description–experience gap emerges is the small sample size of sampled experience people base their choice on. One way to see that people rely on small samples is via the sampling paradigm, where we observe the number of samples people take to make a decision, or the objective sample size. For instance, across the sort of gambles shown in Table . (involving a risky and a safe option), the median sample size in Wulff et al.’s () meta-analysis was , implying a sample size of about  samples per option (assuming sampling effort is symmetrically allocated). The objective sample size is not the only way people end up relying on small samples in DFE. They also base their decision on a functionally small sample when more weight is placed on recent than on earlier experience (Barron & Erev, ; Hertwig et al., ), when they retrieve a small sample of experiences from memory (Gonzalez & Dutt, ; Plonsky et al., ), or for strategic reasons when people cannot afford to keep sampling because other competitors may come and swipe the options (Hintze et al., ; Phillips et al., ).

In choices between safe and risky gambles – the problem type often used to measure people’s risk preferences – the difference amounts to about  percentage points. In choices between two risky gambles, in contrast, it is about  percentage points.




Returning to the description–experience gap, basing a decision on small samples affects how rare events can impact the decision. In the most extreme case, small samples could mean that individuals will not even get to experience the rare event. For instance, if people base their decision on  samples from an option offering $ with a probability of . and $ with a probability of ., then the samples will not include the rare event ($) in  percent of the cases. A small sample also means that most samples will contain the rare event less frequently than expected, given its objective probability. In the example above, the expected number of experiences of the rare event in  samples is close to , with  percent of individuals expected to see the rare event exactly twice and  percent to see it more often. But a larger proportion of cases –  percent – are expected to contain the rare event only once or not at all. This is because the binomial distribution of samples is right-skewed for events with a probability smaller than .5, implying more mass below the expected value than above (the short simulation at the end of this section makes this arithmetic concrete). Due to this statistical regularity, small samples can result in DFE appearing as if rare events are underweighted, resulting in a description–experience gap in terms of rare events (Table .). Never observing a rare event or observing it less often than expected can have profound consequences on how people evaluate the attractiveness of different options.

Why do people tend to break off their search so soon, foregoing the opportunity to gain a more accurate picture of the available options? Indeed, relying on small samples is often seen as erroneous reasoning (Tversky & Kahneman, ), and in fact can lead to a tendency to probability match (Plonsky et al., ). Yet, small samples themselves are not necessarily detrimental to decision making. In many cases, across a representative set of decision problems, a decision based on a very small set of samples is nearly as good as an optimal decision based on a full probability distribution (Hertwig & Pleskac, ; Vul et al., ). In fact, once the mental or material costs of sampling are figured into the equation, a policy of making decisions based on very few samples (even just one) can be globally optimal. Moreover, small samples can also support the decision maker by amplifying the difference between the expected payoffs of the options, rendering choice easier (Hertwig & Pleskac, ).

We should emphasize that description–experience gaps that arise via the sample sizes people use are not limited to risky decisions. Modes of learning about the probabilistic texture of the world – via sampling and experience versus via symbolic descriptions – also appear to matter in numerous domains of choice, judgment, and reasoning. For instance, there is evidence for description–experience gaps in intertemporal choice


(Dai et al., ), social interaction in strategic games (Martin et al., ), ambiguity aversion (Dutt et al., ; Güney & Newell, ), consumer choice (Wulff et al., ), financial risk-taking (Lejarraga et al., ), medical judgments and decisions (Armstrong & Spaniol, ; Fraenkel et al., ; Lejarraga et al., ; Wegier & Shaffer, ), adolescent risk-taking (van den Bos & Hertwig, ), categorization (Nelson et al., ), visual search (Zhang & Houpt, ), and causal reasoning (Rehder & Waldmann, ).
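As promised above, here is a small computational check of the binomial argument, a sketch under assumed values (a rare outcome with probability p = 0.1 and samples of size n = 20; the chapter’s own figures may differ).

```python
# Exact binomial probabilities for the number of times a rare event
# (probability p) appears in n samples; the distribution is
# right-skewed, so most samples contain the rare event less often
# than its expected frequency of n * p.
from math import comb

n, p = 20, 0.1                                   # assumed illustrative values
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

print(f"expected number of rare events: {n * p:.1f}")
print(f"P(rare event never sampled)    = {pmf[0]:.3f}")
print(f"P(sampled once or never)       = {pmf[0] + pmf[1]:.3f}")
print(f"P(sampled exactly twice)       = {pmf[2]:.3f}")
print(f"P(sampled more than twice)     = {sum(pmf[3:]):.3f}")
```

Under these assumed values, roughly 12 percent of samples never contain the rare event at all, and about 39 percent contain it once or never, illustrating how the skew pushes most experienced samples below the rare event’s expected frequency.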

People Learn Adaptively

The amount of information, that is, the sample sizes that people take, is only one of the dimensions by which people’s actions during DFE shape what they see. How a person learns from experience can also shape what they see. To appreciate this, consider a person faced with learning from experience while making consequential choices, who can only observe the outcome of the option they choose (i.e., a partial feedback paradigm). In this case, the person faces an exploration–exploitation dilemma between exploring alternatives that are perceived as potentially inferior in hopes of learning more about them and exploiting the alternative the decision maker believes has the highest expectancy at that point in time (Kaelbling et al., ). A good approach to solving this dilemma is to implement a strategy where the probability of sampling from an alternative depends on the history with that option, so that good experiences with the option increase the chances of sampling from it and bad experiences decrease the chances. Such a strategy aligns with Thorndike’s law of effect (Thorndike, ) and is also consistent with reinforcement learning models (Bush & Mosteller, ; Pleskac, ; Sutton & Barto, ). Overall, this adaptive process of learning from past successes and failures has the consequence of reducing the chances of choosing an alternative again if past outcomes were poor. It can also be optimal in terms of addressing the exploration–exploitation dilemma (Berry & Fristedt, ; Burnetas & Katehakis, ). However, by asymmetrically sampling from presumably good payoff distributions, people lose the chance to accumulate information from other alternatives. This asymmetry in the accumulation of information can have the unfortunate consequence of introducing a bias into the decision maker’s expectations about the average return from these alternatives (Denrell, , ; Denrell & March, ; March, ). For instance, when rare positive outcomes are unlikely to occur from a given alternative, it is unlikely that the decision




maker will experience them. As a result, the decision maker is likely to form and retain a low expectation for this alternative. This low expectation, in turn, makes it unlikely that the decision maker will visit and revisit the respective option. This behavior will make the decision maker appear risk-averse by choosing a sure thing over a risky option. Thus, adaptive sampling can be sufficient to account for risk-averse choices in gains and risk-seeking choices in losses with low-probability events (Denrell, ). Asymmetrical sampling is also one reason for the emergence of a description–experience gap when people make DFE with partial feedback (Barron & Erev, ; Lejarraga & Gonzalez, ). For instance, when faced with a choice involving a low-probability, big-gain gamble like Problem  in Table ., this adaptive learning strategy will tend to favor the safe, certain option, in contrast to making a decision from description with the same option. The opposite is true for a low-probability, low-payoff gain.
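To see how adaptive sampling alone can generate apparent risk aversion, consider the following toy simulation, a sketch in the spirit of the ‘hot stove’ analyses cited above rather than a fitted model; the payoff scheme, exploration rate, and trial counts are all illustrative assumptions. The agent values each option by the mean of its own observed outcomes and mostly chooses the currently higher-valued option.

```python
import random

random.seed(7)

def draw(option):
    if option == "safe":
        return 0.0                                    # sure thing
    return 3.0 if random.random() < 0.25 else -1.0    # risky: expected value 0

def run(trials=100, eps=0.1):
    obs = {o: [draw(o)] for o in ("safe", "risky")}   # one forced try of each
    risky = 0
    for _ in range(trials):
        means = {o: sum(v) / len(v) for o, v in obs.items()}
        if random.random() < eps:                     # occasional exploration
            choice = random.choice(("safe", "risky"))
        else:                                         # otherwise exploit
            choice = max(means, key=means.get)
        obs[choice].append(draw(choice))
        risky += choice == "risky"
    return risky / trials

# Although both options have the same expected value, the agent chooses
# the risky option well under half the time: bad early draws suppress
# further sampling, so good news about the risky option goes uncollected.
print(sum(run() for _ in range(2000)) / 2000)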

People Engage in Similarity-Based Sampling

There is still another factor that can contribute to how people’s actions shape what they see during DFE. In experience-based decision making, people typically must sample past experiences from memory. Even if people have just now sampled from an option, memory is needed to represent this episode of experience. One key property of how people sample from their memories in these situations seems to be a contingency-based retrieval process, wherein they retrieve memories of outcomes that occurred under similar situations (Plonsky et al., ). Such a retrieval process is consistent with Skinner’s () assertion that behavior is selected by contingencies of reinforcement. It is also consistent with how basic memory systems operate, where the likelihood of retrieving an item from memory is a function of how similar the item is to the current situation or context (Anderson et al., ; Dougherty et al., ; Gonzalez & Dutt, ; Gonzalez et al., ). Though we are still learning how to model this process, a reasonable representation of this contingency-based sampling process is as follows. People monitor the last k outcomes from an option. To make a choice, the decision maker retrieves all the samples that match the same sequence of k outcomes, averages across those samples, and selects the option with the highest average. Such a strategy recreates human choice behavior reasonably well, including capturing some of the oscillations that arise in people’s choice behavior as they experience a rare event (Plonsky et al., ). This strategy is interesting for at least two reasons. First, it is highly effective in


dynamic settings where there are repeated regularities. Such a situation might occur when a person is deciding whether to go to a local restaurant on a particular night. Their experience with the restaurant has been good, but occasionally, on Wednesday nights, they have had some poor meals. Deciding based on an average impression across these experiences would not reflect the fact that different cooks work in the kitchen on different nights. In these situations, a contingency-based rule (e.g., one that draws on experiences other than Wednesday nights) will be more effective than one that relies on all the samples, or even one that discounts outcomes via a decay process. Second, the contingency-based strategy highlights another means by which people can functionally decide based on a small sample of outcomes. Thus, in many cases, this mechanism can create a similar effect of underweighting rare events.
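A rough sketch of this contingency-based retrieval rule follows; it is an illustrative reading of the verbal description above rather than Plonsky et al.’s exact model, and the value of k, the outcome coding, and the fallback to a plain average are assumptions made for the example.

```python
def contingent_value(history, k=2):
    """Value an option by averaging only those past outcomes that
    followed the same sequence of k most recent outcomes.
    history: past outcomes for one option, oldest first."""
    if len(history) <= k:
        return sum(history) / len(history)       # too little data: plain average
    context = tuple(history[-k:])                # the k most recent outcomes
    matches = [history[i + k]                    # outcomes that followed the same
               for i in range(len(history) - k)  # k-outcome pattern earlier on
               if tuple(history[i:i + k]) == context]
    if not matches:
        return sum(history) / len(history)       # no precedent: plain average
    return sum(matches) / len(matches)

history = [1, 1, 0, 1, 1, 1, 0, 1, 1]            # 1 = good meal, 0 = poor meal
print(contingent_value(history, k=2))            # value given the recent (1, 1)
```

Choosing between options then amounts to computing this contingent value for each option and selecting the highest; because only the matching subsample is consulted, the effective sample is small even when the full history is long.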

How Does What People See Shape How They Act?

We just reviewed how people’s exploratory actions shape, among other factors, what they perceive. Naturally, what is being perceived shapes subsequent actions, leading to choices consistent with underweighting of rare events, with risk aversion and risk seeking, or even with systematic dynamic changes in preferences. However, there are more direct and immediate ways by which people’s perceptions and experiences shape their choices, and vice versa. We turn to these next.

The Mere Presentation of an Outcome Impacts Choices

Another approach to thinking about the psychological impact of rare events is not to focus on as-if underweighting of rare events in experience, but on the assumed as-if overweighting of rare events in description. Why, once stated, does an event appear to receive too much psychological weight? A truly convincing answer to this question is still missing, but we have some candidate explanations, such as the mere-presentation effect. Hertwig et al. () and Erev et al. () suggested that the mere mentioning of a rare event () in terms of the propositional (symbolic) representations of options in decisions from description – for instance, “ with probability .;  otherwise” (choice Problem  in Table .) – lends weight to it (a mere-presentation effect). Furthermore, presenting the rare event () on a par with the common event () may direct a more equal allocation of attention to both events than is warranted by their actual probabilities. To the extent that attention translates into decision weight, as some research




suggests (Pachur et al., ; Weber & Kirsner, ; Zeigenfuse et al., ), the weights of rare and common events will draw closer to one another than they should. In contrast, DFE gives rise to an analogical representation. For instance, ten draws, distributed across time, from the option “ with probability .;  otherwise” may cause this experience: , , , , , , , , , . In this stream of experience, the frequency of the option’s events can be read off directly. Moreover, to the extent that attention is allocated as a function of experienced frequency, the resulting decision weights may reliably reflect the sample probabilities (and, by extension, the objective probabilities). The power of mere presentation becomes visceral if one pictures a conversation with a physician who mentions the possibility of a rare but grave disease after being told about an odd but unspecific symptom, or with an insurance salesperson who mentions that a malfunctioning water heater can cause extensive water damage. In both cases, the mere mention of the rare event and its potentially severe associations and consequences inevitably draws the person’s attention. It turns out that the mere presentation of an outcome can have a substantial effect on the actions people take. For instance, in the sampling paradigm, mentioning the possible outcomes associated with each option reinstates the certainty effect, or the tendency to select the safer of two options when the safer option offers a good outcome with certainty (Table .). That is, the simple presentation of a possible outcome can make DFE look more like DFD (Erev et al., ) (see also Hotalling et al., ). Interestingly, while merely mentioning an outcome impacts choice, the mere presentation of the possible outcomes does not appear to substantially change the number of samples people take. In the aforementioned study, people who were merely presented with all options sampled on average  times, barely different from the average sample size of  in the study’s no-description, “pure” DFE condition.

The Saliency of Experienced Outcomes Shapes Actions

A second and related pathway through which people’s experience impacts their actions is via the saliency of experienced outcomes relative to their context. It is well established that context matters in perception: colors can appear different because of the other colors around them (Lotto & Purves, ) and lines can appear shorter or longer depending on how other lines are configured around them (Müller-Lyer, ). Similar effects extend to many areas of decision making (Huber et al., ; Simonson & Tversky,
; Trueblood, ; Trueblood et al., ; Tversky & Simonson, ). There are many instances demonstrating how the saliency of experienced outcomes shapes experience-based choices (Ludvig et al., ; Ludvig & Spetch, ; Pleskac et al., ; Spektor et al., ; Tsetsos et al., ; Zeigenfuse et al., ). Specifically, at least in some cases the salience of experienced outcomes can give rise to another description–experience gap, one that is the result not of rarity but of its opposite. Consider, for instance, the choice between a guaranteed $ or a / chance at $. When making a decision between these two stated options, people are usually risk averse for gains but risk seeking for losses – a reflection effect similar to the results in Table .. However, the opposite choice pattern appears for DFE in the partial-feedback paradigm (Ludvig et al., ; Ludvig & Spetch, ). The pattern reverses because the best (or worst) possible outcome in a sample has a disproportionate impact on people's ultimate choice, even for options that do not involve a rare event; indeed, in the present case the two outcomes were equally likely.

The Explanatory Power of Studying the Dynamic Interplay between Perceiving and Acting

This dynamic interplay between perceiving and choosing that unfolds in DFE is key to understanding and predicting choice behavior. As we have shown, there are several different ways in which how people act impacts what they sense, and what they sense shapes how they act. These features of this dynamic system give rise to specific phenomena during DFE, such as the underweighting of rare events and particular risk preferences. Equally important, studying this interplay may help us better understand and predict human behavior beyond risky choice. We turn now to a few illustrations.

Experiencing Macroeconomic Shocks and Risk-Taking in the Stock Market

Using historical survey data, Malmendier and Nagel (2011) asked: are people in the USA who experienced a macroeconomic shock like the Great Depression more risk averse than other cohorts? They found that individuals who had experienced low stock market returns throughout their lives reported lower willingness to take financial risk, were less likely to participate in the stock market, invested a lower fraction of their liquid
assets in stocks if they participated, and were more pessimistic about future stock returns. Experiencing a macroeconomic shock, however, is not the only way to learn about a catastrophic event. People can, of course, learn about profound economic downturns from records of the past such as books, films, or media coverage (Lejarraga et al., ). That is, to simplify greatly, some people learn about historic macroeconomic shocks from experience whereas others learn about them from description. Traditional financial theory assumes that, holding wealth and income constant, personally experiencing unfavorable (or favorable) financial outcomes should not lead to different behavior than learning about them through description. But Malmendier and Nagel's analysis suggests otherwise. Indeed, Lejarraga et al. (2016) experimentally tested whether the mode of learning about shocks impacts investment behavior. They used actual stock data from the Spanish stock index, the IBEX-35, across  monthly periods (from July  to September ). During this period, the IBEX-35 went through two shocks: a massive  percent drop in price from  to , and then, after an upturn, another massive drop of  percent between  and  (Figure .a). One group of participants “lived” through the experimentally simulated crisis, making investments before the dot-com crisis, whereas others learned about the crises from description, via a graph showing the price of stocks over time. Investors who learned about the market from personal experience took less financial risk than those who learned from graphs, echoing the description–experience gap observed in risky choice (Figure .b). The result is not limited to economic downturns: reversing the market, thus turning the busts into booms, resulted in investors who experienced the boom taking more risk than those who did not. Altogether, these results suggest that understanding how people learn about economic or societal shocks (e.g., economic depressions or global pandemics) may help us better understand how such shocks change people's preferences, beliefs, or both, and how those changes impact eventual choices (see also Benartzi & Thaler, ). In particular, the fact that Lejarraga et al. (2016) established a description–experience gap raises the possibility that the dynamic interplay between perceiving and acting plays an important contributing role in these financial swings. An intriguing possibility is that some subset of the factors that we have highlighted in this chapter can help explain and (partly) predict them. For instance, could it be that relying on the most recent events to make decisions leads some people to largely underweight rare events, whereas experiencing some sort of
[Figure .: (a) Experimental conditions and prices of stocks, in thousands of euros (i.e., the index fund), across the monthly investment periods. Four arrows at the bottom of the panel mark the conditions (shock experience, no-shock experience, shock description, no-shock description): solid arrow segments indicate periods of investment; dotted arrow segments indicate periods of learning from descriptive sources. The four conditions were compared over the evaluation window. (b) Percentages invested in stocks by condition. Dots indicate individuals' allocations; the thin lines show the mean percentages; the thicker lines show the data smoothed by local polynomial regression fitting. Based on Lejarraga et al. ().]

exogenous shock could lead to a sudden overweighting of rare events as in Erev et al. ()? Financial risk-taking is only one dimension of risk-taking behavior, but the impact of the dynamic interplay of sensing and acting would seem to
extend to other risk-taking domains, including social (Denrell, ), health (Pleskac, ), and across the lifespan (Mata et al., ). More generally, Hertwig and Wulff () have proposed a description–experience framework of risk in which they distinguish between two powerful but imperfect teachers of risk: description and experience. Based on this framework, they derived distinct behavioral predictions from the psychological dynamics associated with these two teachers.

Measuring Risk Preferences in Adolescence through the Act of Sampling

Adolescence is often portrayed as a turbulent period of wildly erratic behavior. The empirical data support this portrayal to some extent. Mortality and morbidity caused by risky and impulsive behaviors increase in adolescence. In the USA, for instance, nearly  percent of deaths in the second decade of life result from unintentional injuries incurred in circumstances such as car accidents and drug overdoses; in addition, there is a notable rise in criminal behavior in adolescence (van den Bos et al., ). There are a number of possible explanations for these patterns; one prominent explanation is that adolescence is characterized by a temporary reduction in anxiety about the unknown, fostering a higher, and adaptive, intensity of exploratory and risk-taking behavior. Is there evidence for this proposition? Surprisingly few studies of adolescent behavior have presented partially known or even unknown options that need to be explored. Instead, most studies have presented adolescents with choice problems in which outcomes and probabilities are explicitly stated – that is, DFD or business as usual. This is unfortunate: when adolescents engage in potentially criminal behaviors, use drugs, or engage in unprotected sex, they may have only a vague idea of the possible consequences of their actions because they lack personal experience. Van den Bos and Hertwig (2017) examined what happens when adolescents have the opportunity to “look before they leap,” that is, when they are empowered to sample information, hence reducing the level of uncertainty, before embarking on a choice. Specifically, they asked people ranging in age from  to  years to make risky (fully described monetary gambles), ambiguous (incomplete information on probabilities), and uncertain (unknown payoff distributions that, however, can be sampled) choices. Two major results emerged. In the canonical DFD (risky choices), children took more risks than adolescents, consistent with past findings. In contrast, under ambiguity and uncertainty a nonlinear developmental
trend emerged, with tolerance for ambiguity and tolerance for small samples peaking in adolescence. Specifically, adolescents searched for markedly less information before making a consequential decision than either children or adults. This suggests that adolescents have different ambiguity and uncertainty preferences than children and adults. It also means that, depending on a person's developmental stage, a preference may be best revealed not through the eventual choice but through the sampling behavior that gives rise to the choice. In other words, experiential sampling may offer a much more transparent window on adolescents' developing risk preferences than their decisions from description. A question that follows is whether other aspects of adolescents' decision making under DFE differ beyond the samples of experience. For instance, are their adaptive learning processes different? Or does the mere presentation of outcomes have a different impact on adolescents? These are just some of the questions that follow from applying a description–experience framework to adolescent decision making.

Experimenting with a ‘Descriptive’ or ‘Experiential’ Methodology and Human Competence and Rationality

How capable are people of sound statistical intuitions and inferences? Surprisingly, psychological research has given two essentially opposite answers. In 1967, Peterson and Beach reviewed more than 160 experiments concerned with people's statistical intuitions. Invoking the Brunswikian metaphor of the mind as an intuitive statistician, they concluded that “probability theory and statistics can be used as the basis for psychological models that integrate and account for human performance in a wide range of inferential tasks” (p. ). Yet in a 1974 Science article, Tversky and Kahneman rejected this conclusion, arguing that “people rely on a limited number of heuristic principles which reduce the complex tasks of assessing probabilities and predicting values to simple judgmental operations” (p. ). With that, they introduced the heuristics-and-biases research program, which to this day has profoundly shaped how psychology, and the behavioral sciences more generally, view the mind's competences and rationality. Based on this research, scholars have suggested that people “lack the correct programs for many important judgmental tasks” (Slovic et al., , p. ) or that human minds “are not built (for whatever reason) to work by the rules of probability” (Gould, , p. ). How was this radical transformation from the mind as an intuitive statistician to
the mind lacking the correct software for appropriate statistical intuitions altogether possible? Recently, Lejarraga and Hertwig (2021) highlighted a previously neglected driver of this transformation: The heuristics-and-biases program established a default experimental protocol in behavioral decision research that relied predominantly on described scenarios (e.g., framing problems, or statistical reasoning problems such as the Linda problem, the engineer–lawyer problem, and the maternity-ward problem) rather than on learning and experience. They demonstrated this shift with an analysis of  experiments, which shows that the descriptive protocol has dominated post- research. Specifically, they examined four lines of research – Bayesian reasoning, judgments of compound events, framing, and anchoring and adjustment – all of which were mostly based on description-based experimental protocols in which learning, sampling, feedback, or the physical instantiation of the experimental stimuli receded into the background, and experimental stimuli were described in vignettes or other written representations. In contrast, the research reviewed by Peterson and Beach (1967) was mostly based on an experiential protocol in which people often had to sample information (e.g., in Bayesian reasoning), received feedback, stimuli were physically present rather than invoked descriptively, and behavior was measured across many trials. Turning to another, equally puzzling contradiction in research on statistical intuitions and probabilistic reasoning, Schulze and Hertwig (2021) suggested that one reason why babies and nonhuman primates, relative to adults (tested in the 1970s and later), have been found to be smart, intuitive statisticians who are surprisingly capable of statistical learning and inference is, again, the experimental protocol: babies' and primates' statistical intuitions are tested and measured not with symbolic, abstract descriptions but on the basis of direct experience of statistical information. Taken together, Lejarraga and Hertwig's (2021) and Schulze and Hertwig's (2021) analyses converge on the conclusion that how the human mind's competence for statistical reasoning is investigated is one important source of the conflicting views on that competence and, by extension, on rationality: description-based protocols are more likely to support the view that statistical intuitions are profoundly error-prone, whereas experience-based protocols, with their emphasis on learning and repetition, are more likely to support the view that nonhuman primates, babies, and adults do not lack the cognitive software for good statistical intuitions.

It is tempting to speculate that one reason for these different conclusions about competence may have to do with the operation of our decision-making system and the dynamic interplay between perceiving and acting. Perhaps experience-based inferences, judgments, and choices represent the ancestral habitat to which our cognitive system is adapted. Our cognitive algorithms and learning faculties may therefore be better adapted to such environments than to environments with symbolic representations of (statistical) information. Indeed, human symbolic behaviors are relatively recent developments when measured in evolutionary terms – estimates suggest these behaviors emerged roughly 100,000 years ago (see Tylén et al., ). Furthermore, the mind's cognitive algorithms for statistical inference are unlikely to be tuned to symbolically described probabilities or percentages, because these are recently invented concepts and notations (Gigerenzer et al., ). Finally, in a world in which uncertainty abounds, reliance on symbolic description is often not an option, as no information is readily available that could be abstracted into symbolically stated probabilities or percentages. If this line of reasoning were correct, one would expect experience-based learning and decision making to be the default mode of human cognition. This, in turn, would mean that humans have had ample opportunity and pressure to develop these experience-based faculties. In contrast, the processing of symbolic representations and reasoning requires explicit instruction (e.g., from parents or schools).

Conclusion

Philosophers have argued that experience, as opposed to deduction, is the ultimate source of our knowledge (Dewey, ; Locke, /). Regardless of these deeper philosophical debates, when faced with a choice under incomplete knowledge, people can turn to the pragmatic option of actively collecting information (Hertwig et al., ). This is what people do when they must make DFE. They collect information to learn about the outcomes, the probabilities, the potential delays in receiving the outcomes, and other features associated with the different possible options. In doing so, they must adjust how they sample the environment to their goals, their cognitive abilities, and, in some cases, their past experiences. This adaptive exploration creates a linkage between how people explore and the choices they make, and vice versa. It is this active collection of information that gives rise to a dynamic interplay between perceiving and acting during DFE. There have been some
promising inroads into this problem of explaining the process by which people make DFE (e.g., Erev et al., ; Kellen et al., ; Markant et al., ; Regenwetter & Robinson, ). By and large, however, these approaches have focused solely on the choices people make, taking the search and sampling processes during DFE as a given. We contend that a truly successful model of DFE needs to capture how people search, how people choose, and the interaction between search and choice. Only then will we be able to explain and predict DFE across a wide variety of situations.

REFERENCES

Abdellaoui, M., L’Haridon, O., & Paraschiv, C. (). Experienced vs. described uncertainty: Do we need two prospect theory specifications? Management Science, (), –. https://doi.org/./mnsc..
Anderson, J. R., Bothell, D., Byrne, M. D., Douglass, S., Lebiere, C., & Qin, Y. (). An integrated theory of the mind. Psychological Review, (), –. https://doi.org/./-X...
Armstrong, B., & Spaniol, J. (). Experienced probabilities increase understanding of diagnostic test results in younger and older adults. Medical Decision Making, (), –. https://doi.org/./X
Barberis, N. C. (). Thirty years of prospect theory in economics: A review and assessment. Journal of Economic Perspectives, (), –.
Barron, G., & Erev, I. (). Small feedback-based decisions and their limited correspondence to description-based decisions. Journal of Behavioral Decision Making, (), –. https://doi.org/./bdm.
Benartzi, S., & Thaler, R. H. (). Myopic loss aversion and the equity premium puzzle. Quarterly Journal of Economics, (), –.
Berry, D., & Fristedt, B. (). Bandit problems. London: Chapman & Hall.
Burnetas, A. N., & Katehakis, M. N. (). On the finite horizon one-armed bandit problem. Stochastic Analysis and Applications, , –.
Bush, R. R., & Mosteller, F. (). Stochastic models for learning. New York: John Wiley.
Camilleri, A. R., & Newell, B. R. (). When and why rare events are underweighted: A direct comparison of the sampling, partial feedback, full feedback and description choice paradigms. Psychonomic Bulletin & Review, (), –. https://doi.org/./s---
Dai, J., Pachur, T., Pleskac, T. J., & Hertwig, R. (). What the future holds and when: A description–experience gap in intertemporal choice. Psychological Science, (), –. https://doi.org/./
Denrell, J. (). Why most people disapprove of me: Experience sampling in impression formation. Psychological Review, (), –. https://doi.org/./-X...

(). Adaptive learning and risk taking. Psychological Review, (), –. https://doi.org/./-X...
Denrell, J., & March, J. G. (2001). Adaptation as information restriction: The hot stove effect. Organization Science, (), –.
Dewey, J. (). Studies in logical theory (Vol. ). Chicago: University of Chicago Press.
Dougherty, M. R. P., Gettys, C. F., & Ogden, E. E. (). MINERVA-DM: A memory processes model for judgments of likelihood. Psychological Review, (), –. https://doi.org/./-X...
Dutt, V., Arló-Costa, H., Helzner, J., & Gonzalez, C. (). The description–experience gap in risky and ambiguous gambles. Journal of Behavioral Decision Making, , –. https://doi.org/./bdm.
Erev, I., & Barron, G. (). On adaptation, maximization, and reinforcement learning among cognitive strategies. Psychological Review, (), –. https://doi.org/./-X...
Erev, I., Ert, E., Plonsky, O., Cohen, D., & Cohen, O. (). From anomalies to forecasts: Toward a descriptive model of decisions under risk, under ambiguity, and from experience. Psychological Review, (), –. https://doi.org/./rev
Erev, I., Glozman, I., & Hertwig, R. (). What impacts the impact of rare events. Journal of Risk and Uncertainty, , –. https://doi.org/./s---z
Fraenkel, L., Peters, E., Tyra, S., & Oelberg, D. (). Shared medical decision making in lung cancer screening: Experienced versus descriptive risk formats. Medical Decision Making, (), –. https://doi.org/./X
Gigerenzer, G., Swijtink, Z., Porter, T., Daston, L., Beatty, J., & Krüger, L. (). The empire of chance: How probability changed science and everyday life. Cambridge, UK: Cambridge University Press.
Glöckner, A., Hilbig, B. E., Henninger, F., & Fiedler, S. (). The reversed description–experience gap: Disentangling sources of presentation format effects in risky choice. Journal of Experimental Psychology: General, (), –. https://doi.org/./a
Gonzalez, C., & Dutt, V. (). Instance-based learning: Integrating sampling and repeated decisions from experience. Psychological Review, (), .
Gonzalez, C., Lerch, J. F., & Lebiere, C. (). Instance-based learning in dynamic decision making. Cognitive Science, (), –.
Gould, S. J. (). Bully for Brontosaurus: Further reflections in natural history. Penguin.
Güney, S., & Newell, B. R. (). Overcoming ambiguity aversion through experience. Journal of Behavioral Decision Making, (), –. https://doi.org/./bdm.
Hau, R., Pleskac, T. J., & Hertwig, R. (). Decisions from experience and statistical probabilities: Why they trigger different choices than a priori probabilities. Journal of Behavioral Decision Making, (), –. https://doi.org/./bdm.

Hau, R., Pleskac, T. J., Kiefer, J., & Hertwig, R. (). The description–experience gap in risky choice: The role of sample size and experienced probabilities. Journal of Behavioral Decision Making, (), –. https://doi.org/./bdm.
Hertwig, R. (). Decisions from experience. In G. Keren & G. Wu (Eds.), Blackwell’s handbook of judgment & decision making (Vol. , pp. –). Hoboken, NJ: Wiley Blackwell.
Hertwig, R., Barron, G., Weber, E. U., & Erev, I. (2004). Decisions from experience and the effect of rare events in risky choice. Psychological Science, (), –. https://doi.org/./j.-...x
(). Decisions from experience: Sampling and updating of information. Cambridge, UK: Cambridge University Press.
Hertwig, R., & Erev, I. (). The description–experience gap in risky choice. Trends in Cognitive Sciences, , –. https://doi.org/./j.tics...
Hertwig, R., & Pleskac, T. J. (). Decisions from experience: Why small samples? Cognition, , –.
(). The construct–behavior gap and the description–experience gap: Comment on Regenwetter and Robinson (). Psychological Review, (), –. https://doi.org/./rev
Hertwig, R., Pleskac, T. J., Pachur, T., & Center for Adaptive Rationality (). Taming uncertainty. Cambridge, MA: MIT Press. https://doi.org/./mitpress/..
Hertwig, R., & Wulff, D. U. (). A description–experience framework of the dynamic response to risk. Perspectives on Psychological Science, (), –. https://doi.org/./
Hintze, A., Phillips, N., & Hertwig, R. (). The Janus face of Darwinian competition. Scientific Reports, , . https://doi.org/./srep
Hotaling, J. M., Jarvstad, A., Donkin, C., & Newell, B. R. (). How to change the weight of rare events in decisions from experience. Psychological Science, (), –. https://doi.org/./
Huber, J., Payne, J. W., & Puto, C. (). Adding asymmetrically dominated alternatives: Violations of regularity and the similarity hypothesis. Journal of Consumer Research, (), –.
Hurley, S. L. (). Consciousness in action. Cambridge, MA: Harvard University Press.
Hurley, S. (). The shared circuits model (SCM): How control, mirroring, and simulation can enable imitation, deliberation, and mindreading. Behavioral and Brain Sciences, (), –. https://doi.org/./SX
Jessup, R. K., Bishara, A. J., & Busemeyer, J. R. (). Feedback produces divergence from prospect theory in descriptive choice. Psychological Science, (), –. https://doi.org/./j.-...x
Kaelbling, L. P., Littman, M. L., & Moore, A. W. (). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, , –. https://doi.org/./jair.

Kahneman, D. (). Thinking, fast and slow. New York: Farrar, Straus & Giroux.
Kahneman, D., & Tversky, A. (). Choices, values, and frames. American Psychologist, (), –.
Kellen, D., Pachur, T., & Hertwig, R. (). How (in)variant are subjective representations of described and experienced risk and rewards? Cognition, , –. https://doi.org/./j.cognition...
Lejarraga, T., & Gonzalez, C. (). Effects of feedback and complexity on repeated decisions from description. Organizational Behavior and Human Decision Processes, , –. https://doi.org/./j.obhdp...
Lejarraga, T., & Hertwig, R. (2021). How experimental methods shaped views on human competence and rationality. Psychological Bulletin, (), –. https://doi.org/./bul
Lejarraga, T., Pachur, T., Frey, R., & Hertwig, R. (). Decisions from experience: From monetary to medical gambles. Journal of Behavioral Decision Making, (), –. https://doi.org/./bdm.
Lejarraga, T., Woike, J. K., & Hertwig, R. (2016). Description and experience: How experimental investors learn about booms and busts affects their financial risk taking. Cognition, , –. https://doi.org/./j.cognition...
(). Experiences and descriptions of financial uncertainty: Are they equivalent? In R. Hertwig, T. J. Pleskac, & T. Pachur (Eds.), Taming uncertainty (pp. –). Cambridge, MA: MIT Press. https://doi.org/./mitpress/..
Locke, J. (/). An essay concerning human understanding. New York: Dover.
Lotto, R. B., & Purves, D. (). An empirical explanation of color contrast. Proceedings of the National Academy of Sciences USA, (), –. https://doi.org/./pnas.
Ludvig, E. A., Madan, C. R., & Spetch, M. L. (). Extreme outcomes sway risky decisions from experience. Journal of Behavioral Decision Making, (), –. https://doi.org/./bdm.
Ludvig, E. A., & Spetch, M. L. (). Of black swans and tossed coins: Is the description–experience gap in risky choice limited to rare events? PLoS ONE, (), e. https://doi.org/./journal.pone.
Malmendier, U., & Nagel, S. (2011). Depression babies: Do macroeconomic experiences affect risk-taking? Quarterly Journal of Economics, (), –. https://doi.org/./qje/qjq
Markant, D., Pleskac, T. J., Diederich, A., Pachur, T., & Hertwig, R. (). Modeling choice and search in decisions from experience: A sequential sampling approach. In R. Dale, C. Jennings, P. Maglio, T. Matlock, D. Noelle, A. Warlaumont, et al. (Eds.), Proceedings of the th annual conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society.

March, J. G. (). Learning to be risk averse. Psychological Review, (), –. https://doi.org/./-X...
Martin, J. M., Gonzalez, C., Juvina, I., & Lebiere, C. (). A description–experience gap in social interactions: Information about interdependence and its effects on cooperation. Journal of Behavioral Decision Making, , –. https://doi.org/./bdm.
Mata, R., Josef, A. K., Samanez-Larkin, G. R., & Hertwig, R. (). Age differences in risky choice: A meta-analysis. Annals of the New York Academy of Sciences, , –. https://doi.org/./j.-...x
Müller-Lyer, F. C. (). Optische Urteilstäuschungen. Archiv für Physiologie, Supp., –.
Nelson, J. D., McKenzie, C. R. M., Cottrell, G. W., & Sejnowski, T. J. (). Experience matters: Information acquisition optimizes probability gain. Psychological Science, (), –. https://doi.org/./
Pachur, T., Schulte-Mecklenbeck, M., Murphy, R. O., & Hertwig, R. (). Prospect theory reflects selective allocation of attention. Journal of Experimental Psychology: General, (), .
Peterson, C. R., & Beach, L. R. (1967). Man as an intuitive statistician. Psychological Bulletin, (), –. https://doi.org/./h
Phillips, N. D., Hertwig, R., Kareev, Y., & Avrahami, J. (). Rivals in the dark: How competition influences search in decisions under uncertainty. Cognition, (), –. https://doi.org/./j.cognition...
Pleskac, T. J. (). Decision making and learning while taking sequential risks. Journal of Experimental Psychology: Learning, Memory, and Cognition, (), –. https://doi.org/./-...
(). Learning models in decision making. In G. Keren & G. Wu (Eds.), Blackwell’s handbook of judgment & decision making (pp. –). Chichester, UK: Wiley Blackwell.
Pleskac, T. J., Diederich, A., & Wallsten, T. (). Models of decision making under risk and uncertainty. In J. Busemeyer, J. Wang, J. Townsend, & A. Eidels (Eds.), The Oxford handbook of computational and mathematical psychology (pp. –). Oxford: Oxford University Press.
Pleskac, T. J., Yu, S., Hopwood, C., & Liu, T. (). Mechanisms of deliberation during preferential choice: Perspectives from computational modeling and individual differences. Decision, (), –. https://doi.org/./dec
Plonsky, O., Teodorescu, K., & Erev, I. (). Reliance on small samples, the wavy recency effect, and similarity-based learning. Psychological Review, (), –. https://doi.org/./a
Regenwetter, M., & Robinson, M. M. (). The construct–behavior gap in behavioral decision research: A challenge beyond replicability. Psychological Review, (), –. https://doi.org/./rev
Rehder, B., & Waldmann, M. R. (). Failures of explaining away and screening off in described versus experienced causal learning scenarios.

Memory & Cognition, (), –. https://doi.org/./s---
Schulze, C., & Hertwig, R. (2021). A description–experience gap in statistical intuitions: Of smart babies, risk-savvy chimps, intuitive statisticians, and stupid grown-ups. Cognition, , . https://doi.org/./j.cognition..
Simonson, I., & Tversky, A. (). Choice in context: Tradeoff contrast and extremeness aversion. Journal of Marketing Research, (), –.
Skinner, B. F. (). Science and human behavior. New York, NY: Macmillan.
Slovic, P., Fischhoff, B., & Lichtenstein, S. (). Cognitive processes and societal risk taking. In J. S. Carroll & J. W. Payne (Eds.), Cognition and social behavior (pp. –). Potomac, MD: Erlbaum.
Spektor, M., Gluth, S., Fontanesi, L., & Rieskamp, J. (). How similarity between choice options affects decisions from experience: The accentuation-of-differences model. Psychological Review, , –. https://doi.org/./rev
Sutton, R. S., & Barto, A. G. (). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.
Thorndike, E. L. (). The law of effect. American Journal of Psychology, (/), –.
Trueblood, J. S. (). Multialternative context effects obtained using an inference task. Psychonomic Bulletin & Review, (), –. https://doi.org/./s---
Trueblood, J. S., Brown, S. D., Heathcote, A., & Busemeyer, J. R. (). Not just for consumers: Context effects are fundamental to decision making. Psychological Science, (), –. https://doi.org/./
Tsetsos, K., Chater, N., & Usher, M. (). Salience driven value integration explains decision biases and preference reversal. Proceedings of the National Academy of Sciences, (), –.
Tversky, A., & Fox, C. R. (). Weighing risk and uncertainty. Psychological Review, (), –. https://doi.org/./-X...
Tversky, A., & Kahneman, D. (1971). Belief in the law of small numbers. Psychological Bulletin, (), –. https://doi.org/./h
(1974). Judgment under uncertainty: Heuristics and biases. Science, (), –. https://doi.org/./science...
(1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, (), –. https://doi.org/./BF
Tversky, A., & Simonson, I. (). Context-dependent preferences. Management Science, (), –.
Tylén, K., Fusaroli, R., Rojo, S., Heimann, K., Fay, N., Johannsen, N. N., et al. (). The evolution of early symbolic behavior in Homo sapiens. Proceedings of the National Academy of Sciences, (), –.
Ungemach, C., Chater, N., & Stewart, N. (). Are probabilities overweighted or underweighted when rare outcomes are experienced (rarely)? Psychological Science, (), –. https://doi.org/./j.-...x

Van den Bos, W., & Hertwig, R. (2017). Adolescents display distinctive tolerance to ambiguity and to uncertainty during risky decision making. Scientific Reports, , . https://doi.org/./srep
Van den Bos, W., Laube, C., & Hertwig, R. (). How the adaptive adolescent mind navigates uncertainty. In R. Hertwig, T. J. Pleskac, & T. Pachur (Eds.), Taming uncertainty (pp. –). Cambridge, MA: MIT Press. https://doi.org/./mitpress/..
Vul, E., Goodman, N., Griffiths, T. L., & Tenenbaum, J. B. (). One and done? Optimal decisions from very few samples. Cognitive Science, (), –. https://doi.org/./cogs.
Wakker, P. (). Prospect theory for risk and ambiguity. Cambridge: Cambridge University Press.
Weber, E., & Kirsner, B. (). Reasons for rank-dependent utility evaluation. Journal of Risk and Uncertainty, (), –.
Weber, E. U., Shafir, S., & Blais, A. R. (). Predicting risk sensitivity in humans and lower animals: Risk as variance or coefficient of variation. Psychological Review, (), –.
Wegier, P., & Shaffer, V. A. (). Aiding risk information learning through simulated experience (ARISE): Using simulated outcomes to improve understanding of conditional probabilities in prenatal Down syndrome screening. Patient Education and Counseling, (), –. https://doi.org/./j.pec...
Wulff, D. U., Hills, T. T., & Hertwig, R. (). Online product reviews and the description–experience gap. Journal of Behavioral Decision Making, (), –. https://doi.org/./bdm.
Wulff, D., Markant, D., Pleskac, T. J., & Hertwig, R. (). Adaptive exploration: What you see is up to you. In R. Hertwig, T. J. Pleskac, & T. Pachur (Eds.), Taming uncertainty (pp. –). Cambridge, MA: MIT Press. https://doi.org/./mitpress/..
Wulff, D. U., Mergenthaler-Canseco, M., & Hertwig, R. (). A meta-analytic review of two modes of learning and the description–experience gap. Psychological Bulletin, (), –. https://doi.org/./bul
Yechiam, E., & Busemeyer, J. (). The effect of foregone payoffs on underweighting small probability events. Journal of Behavioral Decision Making, , –.
Zeigenfuse, M. D., Pleskac, T. J., & Liu, T. (). Rapid decisions from experience. Cognition, (), –. https://doi.org/./j.cognition...
Zhang, H., & Houpt, J. W. (). Exaggerated prevalence effect with the explicit prevalence information: The description–experience gap in visual search. Attention, Perception, & Psychophysics, (), –. https://doi.org/./s---

4

The Hot Stove Effect

Jerker Denrell and Gaël Le Mens

4.1 Introduction

On February 5 and 6, 2014, the workers on the London Underground network went on strike. Many stations were closed, and Londoners had to find alternative travel routes. Did Londoners discover new, preferable travel routes? And if so, did they change their travel habits? A team of researchers examined these issues using data from Londoners' use of the Oyster card, which electronically registers all of its owner's journeys, not only on the underground but also on buses and boats (Larcom et al., 2017). The researchers found that the strike had long-term effects on Londoners' travel habits. Some commuters discovered alternative underground routes that were faster than their previously preferred routes and kept using these routes after the end of the strike. Finding the fastest route on the London Underground network is difficult, partly because the official underground map provides a misleading rendition of the real distances between stations. To make the map readable, its designers had to shift the locations of stations on the map relative to their actual physical locations. As a result, some stations that are geographically close to each other appear relatively far apart on the map. The underground study reveals that a likely cause of people's continued use of a suboptimal alternative is that they are not aware of the existence of superior alternatives. If people believe route A is the best, they may not experiment and try another route B. Such avoidance of experimentation is not necessarily irrational. Experimentation is costly, and if the current route seems satisfactory, commuters may not want to try alternatives that could

G. Le Mens benefited from financial support from grant PID–GB-I/AEI/./ from the Spanish Ministerio de Ciencia, Innovacion y Universidades (MCIU) and the Agencia Estatal de Investigacion (AEI), ERC Consolidator Grant # from the European Commission, and BBVA Foundation Grant GQ.
result in their arriving late at their workplace. Absent such experimentation, however, mistaken beliefs about the superiority of the chosen route may remain unchallenged.

Interestingly, persistent mistakes are characterized by a systematic asymmetry. Suppose a commuter uses route A and believes that route B would take longer, when route B is in fact faster. As a result, she does not try route B and keeps using route A every day. By avoiding route B, the commuter does not discover that route B is faster. The commuter believed that B was worse (slower) than it actually was, and this mistake became persistent because the belief led to avoidance. Contrast this with a situation in which the commuter believes that route B is faster than route A, when route B is actually slower. Acting on this belief, she tries route B. Doing so, she learns that it takes longer than expected and longer than route A. In this case, the commuter believed that route B was better (quicker) than it actually was, and this mistake was corrected because the belief that B was better than A made her try this route, which enabled her to discover her error.

This example is only one instance of a much more general phenomenon: errors that consist of overestimating the attractiveness of an alternative are more likely to be corrected than errors that consist of underestimating it. Learners who overestimate the attractiveness of an alternative will likely try this alternative again, and in doing so they may discover their error. Learners who underestimate the attractiveness of an alternative, in contrast, will avoid the alternative, and by doing so they will not be exposed to information that could correct their mistake. Denrell and March (2001) named this asymmetry in error correction the “Hot Stove Effect,” in deference to Mark Twain's observation about the cat and a hot stove: “We should be careful to get out of an experience only the wisdom that is in it and stop there; lest we be like the cat that sits down on a hot stove lid. She will never sit down on a hot stove lid again and that is well; but also she will never sit down on a cold one” (Twain, 1897).

The Hot Stove Effect has interesting implications for our understanding of judgment biases because it points to a particular mechanism that has received scant attention in behavioral science. Even if a judgment were based on all available evidence, and the evidence were accurately interpreted, judgments could still be biased as a result of the Hot Stove Effect. This is because, when learners select alternatives that produced positive experiences, they also avoid alternatives that produced negative experiences.

Imagine that you accurately remember all interactions with everybody you have ever met, and that your impression of each individual is an average of the impressions you formed at each encounter with that individual. Despite this arguably unbiased processing of information, your impressions will be subject to a negativity bias: the proportion of people about whom you hold negative impressions will be higher than it would have been had you interacted a fixed number of times with each individual. As we will show later, this result holds even if impression formation is rational in the sense of adhering to Bayes' rule. This chapter explains the basic logic behind the Hot Stove Effect and reviews research in psychology and other fields on how the Hot Stove Effect can explain a variety of biases in impression formation and judgment.

4.2 The Basic Logic behind the Hot Stove Effect

4.2.1 Negativity Bias Resulting from Avoidance

To explain when and why avoidance generates a negativity bias, suppose a learner samples, in period one, a random variable X_1 with expected value equal to zero (E[X_1] = 0). Suppose the impression after the first observation is simply that observation: X_1. The probability that the learner samples the alternative again in period two is P(X_1). If the learner samples the alternative again in period two, the learner draws another value, X_2, independent of X_1, and the impression after the second period is the average, 0.5(X_1 + X_2). If the alternative is not sampled in period two, the impression remains at X_1. The expected impression after the second period is

E[(1 − P(X_1))·X_1 + P(X_1)·0.5(X_1 + X_2)] = E[X_1] − 0.5·E[X_1·P(X_1)] + 0.5·E[X_2]·E[P(X_1)].

Because E[X_1] = E[X_2] = 0, this simplifies to −0.5·E[X_1·P(X_1)]. This expectation is negative whenever P(X_1) is an increasing function of X_1 (Denrell, 2005). Thus, if the learner is more likely to sample an alternative she has a positive impression of (P(X_1) increasing in X_1), the expected impression after the second period is negative, that is, below E[X] = 0. More generally, if E[X] = u, the expected impression after the second period is below u, implying a negativity bias. If, in contrast, P(X_1) is a decreasing function (the learner avoids alternatives with positive impressions), then the expected impression is larger than E[X] = u.
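This two-period result is easy to check numerically. The following sketch is ours, not part of the original derivation; the standard normal payoff and the logistic sampling rule P(X_1) = 1/(1 + e^{−X_1}) are illustrative assumptions that merely satisfy the requirement that P(·) be increasing.

```python
import numpy as np

# Minimal sketch of the two-period model above; the standard normal payoff
# and the logistic sampling rule are illustrative assumptions.
rng = np.random.default_rng(0)
n = 1_000_000                       # number of simulated learners
x1 = rng.standard_normal(n)         # first observation, E[X_1] = 0
x2 = rng.standard_normal(n)         # second observation, independent of X_1

p_sample = 1.0 / (1.0 + np.exp(-x1))      # P(X_1): increasing in X_1
sampled = rng.random(n) < p_sample        # does the learner sample again?

# Impression: average of both draws if sampled again, otherwise X_1 alone.
impression = np.where(sampled, 0.5 * (x1 + x2), x1)

print(impression.mean())              # negative, although E[X] = 0
print(-0.5 * np.mean(x1 * p_sample))  # analytical value from the text
```

With these choices the simulated mean impression comes out at roughly −0.1, even though the payoff itself has mean zero; replacing the logistic rule with a decreasing function flips the sign of the bias, as stated above.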

4.2.2 A Multi-period Learning Model

Consider now a learner who repeatedly chooses between an unknown and uncertain alternative, which generates a payoff drawn from a normal distribution f(x) with mean zero and variance σ², and a known alternative with expected payoff equal to zero. We assume that the impression of the unknown alternative is a function of the past observed payoffs. Specifically, suppose the impression of the alternative in period t, z_t, is a weighted average of the past impression and the newly observed payoff, if any. That is, if the learner chooses the unknown alternative in period t, the impression at the beginning of period t + 1 is z_{t+1} = (1 − b)·z_t + b·x_t, where b ∈ (0, 1] is the learning rate and x_t is the observed payoff. If the learner does not choose the unknown alternative, no new information about its payoff is observed, and the impression stays the same. We set the initial impression, in period one, to zero, z_1 = 0 (thus, the initial impression is unbiased). Finally, we assume that the probability that the learner chooses the unknown alternative (instead of the known one) is an increasing function of the impression. Specifically, the probability that the learner chooses the unknown alternative in period t is P_t = g(z_t). We assume initially that g(·) is the Logit choice rule:

P_t = 1 / (1 + e^{−s·z_t}).

In this formula, s denotes the sensitivity of the choice to the impression. When s is higher, a small variation in impressions implies larger changes in the choice probability. Conversely, if s equals zero, the choice does not depend on the impression. How does the distribution of the impression change over time? The answer is that it becomes increasingly concentrated on the negative side. The reason is the Hot Stove Effect: negative impressions lead to lower probabilities of choosing the unknown alternative (Denrell, 2005, 2007). To explain this, consider first the case in which the learner always chooses the unknown alternative. What would the distribution of the impression be in that case? It would not be biased: it would have an average of zero (the mean of the payoff distribution), and 50 percent of all impressions would be above zero. The variance, however, would not be σ² but would depend on the learning rate: the long-run variance would be bσ²/(2 − b). Let f_Always(z) denote the density of this normal distribution with mean zero and variance bσ²/(2 − b).
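Both properties are easy to verify by simulation. The sketch below is illustrative rather than canonical: the parameter values (b = 0.5, s = 5, σ = 1, the horizon, and the number of simulated learners) are arbitrary choices of ours.

```python
import numpy as np

def simulate(always_sample, n_agents=100_000, n_periods=300,
             b=0.5, s=5.0, sigma=1.0, seed=1):
    """Simulate the impression-updating model described in the text."""
    rng = np.random.default_rng(seed)
    z = np.zeros(n_agents)                        # initial impressions z_1 = 0
    for _ in range(n_periods):
        p = 1.0 / (1.0 + np.exp(-s * z))          # Logit choice rule P_t
        chosen = np.logical_or(always_sample, rng.random(n_agents) < p)
        x = rng.normal(0.0, sigma, n_agents)      # payoffs of the unknown option
        z = np.where(chosen, (1.0 - b) * z + b * x, z)  # update only if chosen
    return z

b, sigma = 0.5, 1.0
z_forced = simulate(always_sample=True)           # learner forced to sample
z_free = simulate(always_sample=False)            # learner free to avoid

print(z_forced.std(), np.sqrt(b * sigma**2 / (2.0 - b)))  # matches f_Always s.d.
print(z_free.mean(), (z_free < 0).mean())  # negative mean; most impressions < 0
```

Under forced sampling, the standard deviation of the impressions matches sqrt(bσ²/(2 − b)) and about half the impressions are positive; under the Logit rule, the mean impression is clearly negative and well over half of the impressions end up below zero – the Hot Stove Effect derived more formally below.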

Let us return to the case in which the probability that the unknown alternative is chosen increases with the impression of this alternative, as per the logistic choice rule described above. It can be shown that, in the long run, the distribution of the impression is a weighted version of f_Always(·):

h(z) = [ (1 + e^{−s·z}) · f_Always(z) ] / [ ∫_{−∞}^{∞} (1 + e^{−s·z}) · f_Always(z) dz ].

Because (1 + e^{−s·z}) is larger for negative values of z than for positive values, negative values of z are weighted more heavily than positive values. This generates a negativity bias. It can also be demonstrated that (i) more than 50 percent of all impressions are negative and (ii) the expected value of the impression is negative. Finally, there is a simple formula for the probability that the learner chooses the unknown alternative in the long run (as t → ∞):

P_∞ = 1 / (1 + e^{s²bσ²/(2(2 − b))})     (4.1)

Inspecting Equation 4.1 reveals that this long-run choice probability again takes a logistic form. Moreover, it is a decreasing function of the variance of the unknown alternative, σ². Hence, the learner will behave as if she were risk averse: she will be less likely to choose the unknown alternative in any given period t if its payoff is more variable. Stated differently, the learner will learn to become (or behave as if she were) risk averse, even though the probability that she chooses the uncertain alternative given her impression of that alternative does not depend on payoff variability. These results can be generalized considerably. Suppose the payoff of the unknown alternative is a random variable with density f, expected value u, and positive variance. We continue to assume that the impression is a weighted average of the previous impression and the new payoff, if any, with learning rate b. Finally, the probability that the learner chooses the unknown alternative in period t is an increasing function of her impression at the beginning of the period: P_t = g(z_t), where g(·) is an increasing function. As before, we first consider the case where the learner always selects the uncertain alternative, and we use the same notation, f_Always(z), to denote the density of the long-run impression. When the learner does not always select the unknown alternative but instead is more likely to select it when her impression is higher, the long-run distribution of the impression is a weighted version of f_Always(·):

h(z) = [ (1/g(z)) · f_Always(z) ] / [ ∫ (1/g(z)) · f_Always(z) dz ].


The crucial result is that the expected value of this long-run impression is below the expected value of the payoff, u. Let z_∞ denote the long-run impression. We have: E[z_∞] < u.

This holds for any increasing choice function g(·). Moreover, in the long run, the probability that the unknown alternative is chosen is:

P_∞ = 1 / [ ∫ (1/g(z)) · f_Always(z) dz ].

Suppose the learning rule is risk neutral in the following sense:

∫ g(z) f_Always(z) dz = 0.5.

That is, suppose the learner is forced to choose the unknown alternative for many periods and then chooses the unknown alternative with probability g(z); across realizations of this process, the learner would choose the unknown alternative 50 percent of the time. This condition holds, for example, in the above case with normally distributed payoffs and the Logit choice rule. It would not hold for a skewed payoff distribution or for a choice rule biased against or in favor of the unknown alternative. If this condition is satisfied, it can be demonstrated that

P_∞ < 0.5,

that is, the learner will learn to be risk averse.

4.2.3 Changes in Sample Size

A Hot Stove Effect can emerge even if negative experiences lead not to outright avoidance but to a reduced sample size. A manager who has a negative experience with graduates from university B may reduce, but not eliminate, the number of graduates hired from B, because there may not be enough graduates from university A who accept the offer. The Hot Stove Effect holds even in this context, as recently demonstrated (Denrell, ). If a manager decreases the sample size after a negative experience, the result is a negativity bias: the final belief will, on average, be lower than the expected value of the random variable the learner is learning about (see also Le Mens et al. [] for an application to online ratings).
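A minimal sketch of this sample-size mechanism, under assumptions of our choosing (payoffs with mean zero, and illustrative sample sizes of ten observations after a positive belief versus two after a negative one): the belief is the plain running mean of all observations, so nothing is forgotten or distorted, yet the final beliefs are biased downward.

```python
import numpy as np

# Sketch: Hot Stove Effect through sample size rather than avoidance.
# The belief is the plain running mean of all observations (true mean: 0),
# but a negative belief leads to a smaller sample in the next period.
rng = np.random.default_rng(2)
n_agents, n_periods = 200_000, 5
n_large, n_small = 10, 2            # illustrative, assumed sample sizes

total = rng.standard_normal(n_agents)      # period-1 sample of size one
count = np.ones(n_agents)
for _ in range(n_periods):
    belief = total / count
    n_next = np.where(belief >= 0.0, n_large, n_small)
    draws = rng.standard_normal((n_agents, n_large))
    mask = np.arange(n_large) < n_next[:, None]    # keep only n_next draws
    total += (draws * mask).sum(axis=1)
    count += n_next

belief = total / count
print(belief.mean())        # negative: final beliefs are biased downward
print((belief < 0).mean())  # more than half of the beliefs are negative
```

The bias arises purely because positive estimation errors are diluted by large follow-up samples faster than negative errors are diluted by small ones.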

4.2.4 Even Bayesian Algorithms are Subject to the Hot Stove Effect

Does the Hot Stove Effect occur only when people fail to understand that their past experiences are biased? Or would it occur even if decision makers were rational in the sense that their updating follows Bayes' rule and they understood how their choices about when to sample shape the information available to them? The answer matters even if people are not Bayesian, because decision-making algorithms that are increasingly used in practice often try to approximate Bayesian payoff-maximizing behavior. Interestingly, the Hot Stove Effect remains even if the decision makers are Bayesian expected-payoff maximizers (Denrell, , ). To illustrate this, consider a risk-neutral decision maker who can choose, in three periods, between an alternative with known payoff equal to zero and an uncertain alternative with a normally distributed payoff with mean u. The value of u is not known but is known to be drawn from a normally distributed prior with mean zero. The decision maker can only observe the payoff generated by the uncertain alternative if the decision maker chooses this alternative. Even if the decision maker chooses the strategy that maximizes the total expected payoff during the three periods, has correct priors, and updates his or her beliefs about u using Bayes' theorem, the Hot Stove Effect emerges in the sense that most decision makers believe, after the third period, that the uncertain alternative is worse than the known alternative. An expected payoff-maximizing decision maker should of course recognize the value of exploring uncertain alternatives with negative initial payoffs. Still, exploration is not optimal if the payoff is sufficiently negative. Thus, even a Bayesian expected payoff-maximizing decision maker will avoid alternatives with sufficiently negative initial payoffs. Because beliefs about these alternatives are not updated, the negative impression will remain, which implies that more than 50 percent of beliefs will be negative after the third period. (Note that in this case the mean of the Bayesian posteriors will be unbiased – the average will be equal to zero – but the distribution of Bayesian posteriors will be skewed.) The implication is that a tendency for systematic underestimation resulting from the Hot Stove Effect is not necessarily an irrational bias that should be eliminated. Rather, in some settings, it is a consequence of payoff-maximizing behavior and the sequential nature of experiential learning (Denrell, ; Denrell & March, 2001). Avoiding the underestimation tendency would be maladaptive in such settings.
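The following sketch illustrates this point for a simplified, myopic version of the Bayesian decision maker: beliefs are updated with the standard normal–normal conjugate rule, and the uncertain alternative is sampled whenever its posterior mean is at least zero. The fully optimal three-period strategy would explore slightly more than this myopic rule (it attaches positive value to information), but the qualitative asymmetry is the same; all parameter values are illustrative assumptions.

```python
import numpy as np

# Sketch: a myopic Bayesian learner over three periods.
# Prior: u ~ N(0, tau2); payoffs: x ~ N(u, sigma2); the known option pays 0.
rng = np.random.default_rng(3)
n, tau2, sigma2 = 1_000_000, 1.0, 1.0
u = rng.normal(0.0, np.sqrt(tau2), n)     # each learner faces its own true u

mean = np.zeros(n)                        # posterior means (prior mean 0)
var = np.full(n, tau2)                    # posterior variances
for _ in range(3):
    choose = mean >= 0.0      # myopic rule: sample iff posterior mean >= 0
    x = rng.normal(u, np.sqrt(sigma2))    # payoff, observed only if chosen
    new_mean = (var * x + sigma2 * mean) / (var + sigma2)  # conjugate update
    new_var = var * sigma2 / (var + sigma2)
    mean = np.where(choose, new_mean, mean)
    var = np.where(choose, new_var, var)

print(mean.mean())        # approximately 0: posterior means unbiased on average
print((mean < 0).mean())  # clearly above 0.5: most final beliefs are negative
```

The average posterior mean stays at the prior mean of zero (posterior means form a martingale), but the distribution is skewed: most learners end up with a negative belief, exactly the pattern described in the text.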



4.2.5 When the Hot Stove Effect Does Not Occur

The model that produces the Hot Stove Effect relies on several assumptions. The effect might not hold if these are violated:

1. Impressions of alternatives impact the probability that people approach or avoid an alternative.
2. If an alternative is avoided, no new information is provided about it.
3. Impressions change with new experiences.

Sometimes people are exposed to an alternative regardless of whether they like it or not, violating assumption 1, which states that the impression should impact exposure. If the probability of sampling is independent of the impression, there is no Hot Stove Effect. Sometimes people get to hear about alternatives or people they avoid, violating assumption 2. For example, one can check the returns of stocks one did not invest in, or friends may tell you about concerts you did not attend. If information about foregone payoffs (payoffs of alternatives you did not choose) is available, there is no asymmetry in error correction and no Hot Stove Effect. Finally, sometimes the first impression is resistant to change, violating assumption 3. If the first impression does not change, there is no asymmetry in error correction (because no errors are corrected) and no Hot Stove Effect. More generally, whether a Hot Stove Effect occurs, and how large it is, depends on how beliefs and impressions are updated. Predictions about when the Hot Stove Effect does not hold can also have interesting implications when contrasted with settings in which it does hold. For example, suppose people avoid alternatives believed to have negative payoffs, and suppose some people hear about the payoffs of these avoided alternatives from friends who try them. The prediction is that those who hear about foregone payoffs from friends will have a more positive evaluation of these alternatives (because of the absence of the Hot Stove Effect) than those who do not have friends choosing these alternatives (and for whom the Hot Stove Effect holds) (Denrell, ; Le Mens & Denrell, ).
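Assumption 2 can be relaxed directly in the earlier simulation. In the sketch below (again with illustrative parameters of our choosing), a learner who also observes the foregone payoff of the unchosen alternative updates in every period, and the negativity bias disappears.

```python
import numpy as np

def final_impressions(foregone_feedback, n_agents=100_000, n_periods=300,
                      b=0.5, s=5.0, seed=4):
    """Impression dynamics with or without feedback about foregone payoffs."""
    rng = np.random.default_rng(seed)
    z = np.zeros(n_agents)
    for _ in range(n_periods):
        p = 1.0 / (1.0 + np.exp(-s * z))          # approach what looks good
        chosen = rng.random(n_agents) < p
        x = rng.standard_normal(n_agents)         # payoff, mean zero
        observed = np.logical_or(chosen, foregone_feedback)
        z = np.where(observed, (1.0 - b) * z + b * x, z)
    return z

print(final_impressions(foregone_feedback=False).mean())  # negative: Hot Stove
print(final_impressions(foregone_feedback=True).mean())   # about 0: no effect
```

This mirrors the contrast described above: identical payoffs and identical updating, with the bias traceable entirely to whether avoided alternatives go unobserved.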

. Empirical Evidence for a Hot Stove Effect in Human Behavior

The Hot Stove Effect is a statistical phenomenon, akin to regression-to-the-mean, and not a hypothesis about human behavior. Still, because past impressions frequently (but not always) impact the probability that people


choose an alternative again, the Hot Stove Effect can help to explain biases in impressions, beliefs, or attitudes. The Hot Stove Effect can lead to either a negativity bias (if people avoid alternatives with poor past outcomes) or a positivity bias (if people approach alternatives with poor past outcomes). In the following, we will mainly focus on how the Hot Stove Effect can explain a negativity bias.

A negativity bias can of course occur for reasons other than the Hot Stove Effect (negative information being more salient, memorable, diagnostic, etc.). Sometimes such alternative explanations can be controlled for or excluded by design. Even when this is not possible, a Hot Stove Effect can be identified if the negativity bias is mediated by avoidance and lack of information. For example, suppose a negativity bias occurs only when people (a) avoid an alternative and (b) do not get further information about this alternative if they avoid it, but does not occur if people get to observe alternatives a fixed number of times. This pattern is consistent with the Hot Stove Effect but not with many other accounts. In the rest of this chapter, we only discuss empirical studies in which researchers (a) attribute their finding to the Hot Stove Effect, and (b) offer evidence that this is the mechanism that explains their findings.

Several experimental studies have demonstrated how a negativity bias can result from avoidance. The most illustrative is the "BeanFest" study by Fazio et al. (). Participants had to learn to identify "beans" with positive energy levels and to avoid beans with negative energy levels. They were presented with beans one by one, and their score went down in every period unless they "ate" a bean with a positive energy level. Participants with a score below a threshold did not "survive" and hence could not continue with the game. Participants had to learn from their own experience which characteristics of beans were associated with positive and negative energy levels. By "eating" a bean, they could observe whether its energy value was positive or negative. If they did not eat the bean, they did not observe its energy value. At the end of the game, participants were asked to predict whether different types of beans were of the positive or the negative type.

The Hot Stove Effect implies that errors will be asymmetric: errors of underestimation of a bean's energy level will be more likely than errors of overestimation. This is because beans believed to have positive energy levels will be "eaten" while those believed to be negative will not. Therefore, if a participant mistakenly believes that a bean has positive energy, they will discover their error. However, if a participant mistakenly believes a bean has negative energy, they will avoid this bean and will not discover their error. Participants' estimates of beans' energy level were consistent with




these predictions. Errors of mistakenly believing a bean has positive energy occurred  percent of the time, whereas errors of mistakenly believing that a bean has negative energy occurred  percent of the time. Additional experimental evidence collected by Fazio et al. reveals that this asymmetry emerged because of the Hot Stove Effect. They ran another study in which participants could observe the energy level of the beans they did not "eat." This experiment was thus designed to eliminate the possibility of sampling asymmetry. As expected, errors were no longer asymmetric: errors of mistakenly believing a bean to be negative were as likely as errors of mistakenly believing a bean to be positive.

A recent field study of classical concertgoers also provides a nice illustration of the Hot Stove Effect in consumer choices (Kim, ). The basic idea was that concertgoers with a negative initial impression will generalize this experience, avoid future concerts, and thus fail to discover that they might experience different concerts as more enjoyable or suitable for them. To examine this, Kim develops a model that predicts whether a concert is a good match for a given consumer. He shows that this measure predicts whether a consumer will avoid future concerts: avoidance is more likely after attending a concert with a poor match. He also shows that the experience of attending a concert with a poor match is extrapolated to many other concerts: consumers with initially poor experiences tend to avoid the concert hall for the foreseeable future. The result is several missed opportunities for the consumer to attend concerts that are superior matches, and lower revenues for the concert hall. To avoid this, firms should ensure that initial experiences are positive or try to lure consumers back, as Ford tried to do with the advertising slogan "If you have not tried Ford lately, try again," which was designed to attract consumers who avoided Ford due to quality problems that had largely been fixed.

. Impact on Risk Taking

Risk aversion is usually attributed to individual preferences or cognitive biases. The Hot Stove Effect offers an alternative approach to explaining the tendency to choose a safe alternative over a variable alternative with the same expected value: people may learn to believe that highly variable alternatives have a low expected value. The intuition is that highly variable alternatives can generate both very high and very low outcomes. The low outcomes lead to avoidance, which implies that the decision maker will not find out about the possibility of high outcomes (Denrell & March, ;


March, ). Indeed, as illustrated in Equation ., a simple reinforcement learning model indeed leads to risk averse behavior. More generally, Denrell () demonstrated formally that risk-averse behavior occurs for a broad class of learning procedures, although risk-seeking behavior can also be compatible with reinforcement learning if the payoff distribution is skewed. ..

Evidence on the Impact of the Hot Stove Effect on Risk Taking

The learning explanation of risk aversion has two distinct empirical implications. First, it implies that the propensity to choose a risky alternative should decline over time as people learn about a novel alternative. Second, the propensity to choose a risky alternative should be higher in settings where people can observe the payoffs of alternatives they did not choose (the foregone payoffs). There exists substantial evidence for these predictions from experiments on individual decision making (Yechiam & Busemeyer, ; Zion et al., ).

To give an example, suppose an individual is asked to repeatedly choose between two uncertain alternatives: one always generates a payoff of 3 while the other generates a payoff equal to 4 with probability .8 and a payoff equal to zero with probability .2. Participants in the experiment do not know these payoff distributions. They can only learn about the payoffs by choosing the alternatives and observing their payoffs. Now compare two ways of learning about the alternatives. First, participants are asked to sample each of the two alternatives to learn about them. After this, they are asked to choose one of them. The only payoff that matters is the payoff they get in the final choice. The probability of choosing the risky alternative (which generates 4 or 0) in the final choice was then  percent (Hertwig et al., ). Why so high? The reason is that some people do not even observe the zero payoff when they sample this alternative a few times. Thus, they overestimate it. Consider now a different procedure. Suppose participants choose, in each of  periods, between the two alternatives. Their payoff is the total payoff of all chosen alternatives. Finally, they are only told the payoff of the alternative they chose, not the payoff of the alternative they did not choose. In this case, the probability of choosing the risky alternative towards the end was  percent (Barron & Erev, ). The propensity to choose the risky alternative is now much reduced.
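Both procedures are easy to mimic in a simulation (a sketch under assumed parameters: a free sample of size 5 in the first procedure, and a recency-weighted learner with occasional exploration in the second):

```python
import random

rng = random.Random(11)
risky_payoff = lambda: 4.0 if rng.random() < 0.8 else 0.0   # 4 w.p. .8, else 0
AGENTS = 20_000

# Procedure 1: free sampling (5 draws assumed), then one choice by sample mean.
risky = sum(sum(risky_payoff() for _ in range(5)) / 5 > 3.0
            for _ in range(AGENTS))
print("sampling procedure, risky choice rate:", risky / AGENTS)

# Procedure 2: 100 repeated choices, feedback only for the chosen option;
# the impression of the risky option is recency weighted (step .4 assumed),
# and 5 percent of choices are random exploration.
risky = 0
for _ in range(AGENTS):
    impression = 3.2            # start at the true mean (assumed)
    last = False
    for _ in range(100):
        explore = rng.random() < 0.05
        last = (rng.random() < 0.5) if explore else (impression > 3.0)
        if last:
            impression += 0.4 * (risky_payoff() - impression)
    risky += last
print("partial-feedback procedure, risky choice rate:", risky / AGENTS)
```

In the first procedure most small samples miss the zero payoff, so the risky choice rate is high; in the second, a single observed zero pushes the impression below the safe payoff and the option is then rarely sampled again.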




which generates , she is likely to continue choosing the risky alternative. By choosing it, however, she may get the zero payoff. At this point, she may believe the mean of the risky alternative is lower than the payoff from the safe alternative. If so, she may avoid the risky alternative. However, in doing so, she may not correct her negative impression of the risky alternative. Another illustration is provided by the Iowa Gambling task, introduced by Bechara et al. () to examine the impact of brain damage on the tendency to avoid and approach alternatives with poor outcomes, and which has become popular among both neuroscientists and psychologists. In the task, participants can choose between four alternatives (“decks”) and they only found out about the payoff of the alternative they chose. One alternative, “B,” usually generates a good payoff but can generate a rare large loss. People are typically initially attracted to this alternative, but when they observe a loss, they start to avoid it. Wright et al. () designed a “loss” version of this task, in which participants were told to maximize the losses and minimize the gains. In the loss version of the task, most participants now avoided alternative B, since it usually generated a gain, which was now viewed negatively. Because participants avoided alternative B, most never observed the large loss (which now was positively evaluated). Wright et al. note that this prediction is exactly in line with the Hot Stove Effect: “In the standard (Win) IGT, Deck B is like Twain’s ‘stove’ – most of the time it is a good option, but getting ‘badly burned’ once or twice teaches you not to go there (figure .). In the Lose IGT, Deck B contains ‘rare treasures’ (Teodorescu & Erev, ) – if you don’t persevere, you will never know what bounty awaits you (figure .). This neatly explains the patterns of preference we observe . . .” (p. ). Experiments with animals have also demonstrated that they can “learn to be risk averse.” Ilan et al. () allowed sparrows to repeatedly choose between two feeding options, one which generated a constant number of seeds every time the bird tried it and one which had a higher expected value but generated a variable number of seeds. The motivation for this experiment was to examine whether risk taking was impacted by learning and exploitation as predicted by reinforcement learning models, including the Hot Stove Effect prediction by Denrell and March (). The findings showed that sparrows increased the probability of choosing the constant feeding option over time. Further analysis showed that this occurred when a bird tried the variable option and received a low number of seeds, a result the authors note is consistent with the Hot Stove Effect but is not predicted by alternative accounts such as prospect theory or


maximization of the probability of survival. Ilan et al. () conclude that their findings are consistent with predictions of reinforcement learning models, although group dynamics (birds observing others) can also play a role.

These experiments examine individual behavior, but Denrell and March () postulated that a similar process would impact the risk-taking behavior of organizations. To the extent that organizations avoid alternatives associated with poor past outcomes, such avoidance behavior would lead to a reduction in risk taking over time. Inspired by Denrell and March (), Dittmar and Duchin () examined how the experiences of managers impact risk-taking behavior using data on the experience and risk-taking behavior of , CEOs and CFOs. Using data on the past careers of these managers, the researchers examined whether the managers had experience of being employed in firms that filed for bankruptcy or firms that experienced other adverse financial shocks. Managers with such adverse experiences were significantly more risk averse, in the sense that they held less debt and invested less. These effects remained when experiences of economy-wide financial distress (such as the Great Depression) were controlled for, and when a sample of CEOs with exogenous turnover events (due, for example, to death) was used to control for potential endogeneity. The authors noted that, consistent with behavioral models of the Hot Stove Effect, the most recent experiences had the largest effect and positive experiences had no effect on risk-taking tendencies.

Myopic Loss Aversion

The concept of "myopic loss aversion" (cited in the scientific background for Richard Thaler's Nobel Prize in economics) was introduced to explain the equity premium puzzle, which is the surprisingly large risk premium observed in the stock market, suggesting excessive risk aversion by investors (Benartzi & Thaler, ). The explanation by Benartzi and Thaler focused on how loss aversion and frequent observations of stock returns could lead to excessive risk aversion: traders who frequently check stocks will frequently be exposed to losses as well as gains, but the observed losses will have a larger impact on their willingness to trade than the observed gains. Thaler et al. tested this idea experimentally by manipulating how often traders received feedback in an investing task, where participants had to learn about the risks and returns of two alternative investment opportunities and decide how much to invest in each. The experiment supported




the prediction that more frequent observations led to less risk taking, thus offering evidence supporting myopic loss aversion. Yechiam and Yakobi (), however, show that the Hot Stove Effect largely explains why risk taking decreased with more frequent feedback in this experiment. In the experimental task, no information was provided about alternatives not chosen. As a result, risk aversion can emerge through the Hot Stove Effect. Less frequent feedback will attenuate the Hot Stove Effect because losses, which lead to avoidance, are less likely. Consistent with the Hot Stove Effect account, participants avoided the risky option after it generated losses. When participants were provided with information about the returns to investment opportunities not chosen, the tendency for risk taking to decline over time was much attenuated.
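A toy simulation conveys this account (our own setup with assumed parameters, not the original task: the per-period returns, the loss-aversion weight, and the evaluation block sizes are all illustrative):

```python
import random

# Per-period returns are Normal(0.25, 1); the safe outside option returns 0.
# Losses loom larger (weight 2.25, assumed). Aggregating several periods
# into one evaluation makes observed losses rarer, which weakens the
# avoidance-and-freeze dynamic of the Hot Stove Effect.

LOSS_WEIGHT, STEP, EXPLORE = 2.25, 0.4, 0.05
rng = random.Random(21)

def final_invest_rate(block, evaluations=40, agents=20_000):
    investing = 0
    for _ in range(agents):
        impression, last = 0.5, False       # mildly optimistic start (assumed)
        for _ in range(evaluations):
            invest = (impression > 0.0) if rng.random() > EXPLORE \
                     else (rng.random() < 0.5)
            if invest:
                r = sum(rng.gauss(0.25, 1.0) for _ in range(block))
                felt = r if r >= 0 else LOSS_WEIGHT * r   # loss aversion
                impression += STEP * (felt - impression)
            last = invest                   # no update when staying out
        investing += last
    return investing / agents

print("feedback every period:   ", final_invest_rate(1))   # less risk taking
print("feedback every 8 periods:", final_invest_rate(8))   # more risk taking
```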

. Implications and Applications

Trust

A trust-game experiment by Fetchenhauer and Dunning () provides an illustration of how the Hot Stove Effect can offer an alternative, learning-based explanation of an important judgment bias. Their starting point is that people tend to underestimate the proportion of people who can be trusted in the so-called trust game. Following the predictions of Denrell (), Fetchenhauer and Dunning () argue that an important reason for mistrust is the asymmetry in error correction in learning about whether another individual is trustworthy. To examine this experimentally, they require participants to play the trust game repeatedly with and without access to foregone payoffs, that is, with or without information about what would have happened if they did not decide to trust someone. In the trust game of the study, a player is given . dollars. The person can keep the money or give it to another individual. If the money is given to the second individual, it is increased to . At this stage, the second individual can return half of it () or simply keep it all. The game abstractly represents a favorable trade that requires trust. To decide whether to hand over the money to the second individual, participants have to estimate how trustworthy the person is: how likely is it that the second person will return any money? Most people underestimate how trustworthy others are. People believe that about  percent will return any money, but in fact  percent of all people return money. What explains


this "cynicism bias"? Various motivational explanations have been offered in the social psychological literature, but Fetchenhauer and Dunning () note that the asymmetric nature of feedback about trustworthiness offers an alternative explanation, as suggested by Denrell (). If an individual mistakenly decides to trust an untrustworthy person, the individual will likely find out about his or her mistake. If an individual decides not to trust a person, mistakenly believing that the person is untrustworthy, the individual will seldom be able to find out about his or her mistake. Because of this asymmetry in error correction, individuals who have to learn about the trustworthiness of others from experience will be more likely to end up underestimating than overestimating the trustworthiness of others.

To explore whether this asymmetry in error correction can explain the "cynicism bias," Fetchenhauer and Dunning () had participants play several trust games with different opponents. In one condition, participants found out whether the second individual would return any money only if they decided to trust the second individual. In the second condition, Fetchenhauer and Dunning provided participants with information about whether the second individual would have returned any money even in those cases where a participant decided not to trust the second individual. The results show that most people came in believing that only about  percent of individuals could be trusted, whereas in fact about  percent can be trusted (similar to previous studies). By repeatedly playing the game, participants had a chance to learn that the second individual could be trusted. However, the results show that such learning occurred only in the second condition, when information about foregone payoffs was provided. There was hardly any learning in the first condition, in which no information about trustworthiness was provided unless an individual decided to trust the second person. Overall, the results illustrate how the belief that others cannot be trusted can end up being a "self-perpetuating prophecy" (Gilovich, ).
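The asymmetry can be reproduced in a stylized simulation (all numbers below are our assumptions rather than the study's parameters):

```python
import random

# 80 percent of partners are trustworthy. Trusting is worthwhile only if the
# believed rate of trustworthiness exceeds .5, and players start out too
# cynical (believed rate .45, held with some inertia). Beliefs are updated
# by the observed proportion of trustworthy partners.

rng = random.Random(5)

def mean_final_belief(foregone_visible, rounds=30, agents=20_000):
    total = 0.0
    for _ in range(agents):
        belief, weight = 0.45, 10.0     # cynical prior with inertia (assumed)
        for _ in range(rounds):
            trust = (belief > 0.5) if rng.random() > 0.05 \
                    else (rng.random() < 0.5)   # occasional exploration
            if trust or foregone_visible:
                returned = 1.0 if rng.random() < 0.8 else 0.0
                weight += 1.0
                belief += (returned - belief) / weight
        total += belief
    return round(total / agents, 3)

print("feedback only when trusting:", mean_final_belief(False))  # stays cynical
print("foregone outcomes revealed: ", mean_final_belief(True))   # rises toward .8
```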

Negotiation

Imagine you are trying to sell your used car to a potential buyer. You are willing to sell it for any price at or above , euros; this is your reservation price. The buyer is willing to pay at most , euros; this is his reservation price. The interval between , euros (the seller's reservation price) and , euros (the buyer's reservation price) is often called the negotiation "pie." Both the seller and the buyer typically want to




get as large a share of this pie as possible. Negotiators will evaluate their success in a negotiation in terms of how much of the pie they have obtained. An interesting implication of the Hot Stove Effect is that negotiators who learn from experience to estimate the size of negotiation pies will typically underestimate them. Sellers will underestimate buyers' reservation prices, and buyers will overestimate sellers' reservation prices.

To understand why, let us return to our car sales example. As the seller, you are aware of your reservation price (, euros), but you might be mistaken about the buyer's reservation price. Suppose you overestimate the buyer's reservation price and believe it is , euros. You make your first offer at , euros. Yet, the response is not the one you expected. The buyer not only declines to purchase the car at your asking price, but walks away from the negotiation, complaining that you asked for an "outrageously high" price. You conclude that you overestimated the buyer's reservation price. Then comes a second buyer. Based on the first experience, you adjust your estimate of the buyer's reservation price down to , euros. Given this, you make an offer just above , euros. Suppose you offer , euros. The buyer declines but stays at the negotiation table, and you settle for a price of , euros. If you believe the buyer's reservation price is , euros, you should be satisfied, as you got a very good deal and almost the whole negotiation pie.

The crucial aspect of this story is that you could correct your overestimation of the first buyer's reservation price, but you could not know that you actually underestimated the second buyer's reservation price. This asymmetry in error correction implies that errors of underestimation will persist, and most sellers will learn to underestimate buyers' reservation values. Moreover, because a seller's overestimation of a buyer's reservation price can lead to negotiations breaking down, and to an unpleasant feeling of having wasted one's time and having been rejected, sellers will often lower their asking price in subsequent negotiations after a failed negotiation. If they decrease the asking price substantially, sellers will not discover their mistake. In other words, experience will teach sellers to be overly conservative in setting the asking price. A similar dynamic applies to learning by buyers about their opening offer: experience will teach them to make high first bids, because low bids will be rejected while high bids will be accepted, and the buyer may not realize that the seller might have sold for a lower price. Overall, both the seller and the buyer are likely to believe they got a larger slice of the pie than was actually the case.
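A stylized simulation of sellers learning from repeated negotiations illustrates the drift (every number below is an illustrative assumption):

```python
import random
import statistics

# Each seller meets 20 buyers whose reservation prices are drawn from
# Normal(12000, 1500). The seller asks a bit below their current estimate of
# the typical reservation price. A breakdown clearly signals that the ask
# was too high and is corrected downward; an accepted ask reveals only that
# the buyer's reservation price was at least the ask, a weak upward signal.

rng = random.Random(9)
TRUE_MEAN, TRUE_SD = 12_000.0, 1_500.0

estimates = []
for _ in range(20_000):
    estimate = rng.gauss(TRUE_MEAN, TRUE_SD)   # initially unbiased guess
    for _ in range(20):
        buyer_reservation = rng.gauss(TRUE_MEAN, TRUE_SD)
        ask = estimate - 200.0
        if ask > buyer_reservation:
            estimate -= 500.0   # breakdown: clear downward correction (assumed)
        else:
            estimate += 100.0   # acceptance: weak, censored good news (assumed)
    estimates.append(estimate)

# Sellers end up systematically underestimating the typical reservation price.
print("true mean:", TRUE_MEAN,
      " mean learned estimate:", round(statistics.mean(estimates)))
```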


This is exactly what Larrick and Wu () found in experimental investigations of negotiators' beliefs about the other party's reservation price: most sellers underestimated buyers' reservation prices, most buyers overestimated sellers' reservation prices, and both parties felt they had obtained a larger slice of the "negotiation pie" than was actually the case. Therefore, they were more satisfied than they should have been based on an accurate perception of the size of the pie. These errors were particularly strong when the initial estimates of the other party's reservation price were more conservative. At the same time, they were partly corrected when the initial estimates were more extreme and could be disconfirmed during the negotiation.

. Behavioral Nuances That Impact the Hot Stove Effect

Recency, Primacy, and Pattern Identification

Many models of the Hot Stove Effect assume that the impression of an alternative is some weighted average of the experienced payoffs, with the most recent payoff given most weight. Such a model is not always realistic. The first payoff experienced can have a large impact on impressions (a primacy effect); see Shteingart et al. (). If the initial outcome is a loss, such a primacy effect reinforces the tendency to avoid a risky alternative. If the initial outcome is positive, however, negative subsequent outcomes may not change the impression much, attenuating the tendency for risk taking to decline over time, as people with positive initial impressions may be less likely to change their impressions from positive to negative. If the primacy effect is strong in person impressions, this also attenuates or eliminates the impact of "forced sampling" via social connections: if you have already made up your mind about some individual, you may not change your impression from negative to positive when you later meet this person because she is a friend of your friend.

Recency might also fail to hold because people search for patterns in sequences of outcomes (Plonsky & Erev, ; Plonsky et al., ). Suppose you try an alternative three times and observe two positive outcomes followed by a loss. Will you avoid this alternative in the next period? Perhaps not, if you believe there is a pattern to the outcomes: it generates sequences of two positive outcomes followed by a loss. If you are convinced this is the pattern, you will want to choose this alternative again. Only later, after two positive outcomes, may you avoid this alternative, because now you believe a loss is imminent. Plonsky et al. () showed that such pattern




identification is common and implies a weighting of experiences that departs systematically from recency. The implication for the Hot Stove Effect is that people may not avoid an alternative immediately after a loss; the reaction may be delayed. Nevertheless, Plonsky and Erev () showed that in problems where risky alternatives may generate a large loss, and information about foregone payoffs is not observed, there is still a sizeable Hot Stove Effect. Specifically, losses lead to avoidance, and the avoidance implies that people do not update their beliefs about the risky alternative, generating long sequences of avoidance.

Impact of Descriptions

People learn not only from their own and others' experiences, but also from descriptions, that is, verbal summaries or graphs of the possible payoffs of alternatives. Weiss-Cohen et al. () showed that such descriptions can induce a Hot Stove Effect that leads to avoidance of risky alternatives, even when the descriptions accurately outline the payoff distributions of the alternatives. Specifically, when participants were shown the payoff distributions of risky alternatives that could generate rare losses, they avoided such alternatives. By avoiding them, they did not get to experience how infrequent these losses were. With more experience with such an alternative, people often become less risk averse (even if they have been told the correct payoff distribution). The "overreaction" induced by the description, however, caused them to avoid the risky alternative, so they never got used to it. Consistent with the Hot Stove Effect, this tendency for descriptions to reduce the propensity to choose the risky alternative did not occur when people were informed about foregone payoffs.

. Future Prospects: Generalization

Models of the Hot Stove Effect usually assume that if people choose an alternative different from B, and get no more information about B, they do not update their beliefs about B. This assumption may not be correct: people generalize from experiences with one alternative to another. Suppose you have a bad experience with a quantitative cognitive science student from university B. Does this lead to avoidance of any student from this university, only of cognitive science students, only of quantitative cognitive science students, or only of this student? Understanding how generalization affects avoidance behavior and analyzing the impact of such

"generalized avoidance" on beliefs, attitudes, and choices is the next stage in research on the Hot Stove Effect.

Using computer simulations and an experiment, Woiczyk et al. () have shown that when choice alternatives belong to a set of categories and people generalize more within categories than across categories, avoidance of risky alternatives (and, conversely, a preference for less risky alternatives) will generally be stronger when there are fewer categories. This is because a negative experience with one member of a category leads to an inference that other category members have low values. When there are few categories, such generalization affects many choice alternatives. But when there are many categories, categorical boundaries limit generalization. Hence, the overall avoidance of risky alternatives is less strong in this latter case. Computer simulations also show that when categories are consistent with differences in alternative qualities (e.g., good alternatives belong to one category and bad alternatives to another), within-category generalization leads to a higher overall payoff. But if categories are heterogeneous, in the sense that each category contains good and bad alternatives, generalization decreases performance.

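The category logic can be sketched in a toy model (our illustration, not Woiczyk et al.'s simulations; the payoff distribution, update rule, and category assignment are assumptions):

```python
import random

# Twelve alternatives with identical true mean payoffs (0) are assigned to
# categories. An agent considers each alternative once, in random order,
# skipping any alternative whose category currently has a negative
# impression; observed payoffs update the category impression. With fewer
# categories, one bad draw taints more alternatives.

rng = random.Random(13)

def share_avoided(n_categories, n_alts=12, agents=20_000):
    avoided = 0
    for _ in range(agents):
        cat_impression = [0.0] * n_categories
        order = list(range(n_alts))
        rng.shuffle(order)
        for alt in order:
            cat = alt % n_categories            # fixed category assignment
            if cat_impression[cat] < 0:         # generalized avoidance
                avoided += 1
                continue
            payoff = rng.gauss(0.0, 1.0)
            cat_impression[cat] += 0.5 * (payoff - cat_impression[cat])
    return avoided / (agents * n_alts)

print("2 broad categories:  share avoided =", share_avoided(2))
print("6 narrow categories: share avoided =", share_avoided(6))
```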



Whereas Woiczyk et al. () focus on the effect of categories on generalization, a different perspective focuses on how people generalize based on visible features of the alternatives. How people integrate past experience to make inferences about new objects can change how the Hot Stove Effect operates and its implications (Elwin et al., ). Consider the case of hiring, and suppose that a manager interviews potential employees, observes their interview performance, and decides whether or not to hire them. The manager can only observe the performance of those who were hired. Based on this outcome, the manager tries to learn how to predict job performance from interview performance. A classical version of the Hot Stove Effect may operate in this setting: the manager may have negative experiences with most hires and mistakenly avoid hiring; in this case, the mistake is not corrected. Absent such avoidance, however, learning in this setting can lead to a positivity rather than a negativity bias (Elwin et al., ). This can occur, for example, if the manager relies on an exemplar model to learn about the attributes associated with adequate job performance. The reason is that if the manager tends to choose reasonably good people, those who are hired tend to have high performance. Because there are many exemplars with positive performance, and few with negative, generalization from the available data will lead to overestimation, unless the manager corrects for this bias. Experimentally, however, such a positivity bias is not observed; instead, managers learn to be conservative. This can be explained by a tendency to impute negative outcomes when there is missing feedback (Elwin et al., ). Conservatism can also be explained as a result of a Hot Stove Effect when people are uncertain about the functional form linking interview and job performance and only generalize in limited regions of the attribute space (Denrell et al., ).

Finally, in many domains, people are not presented with a discrete set of alternatives. Rather, people can make choices along a continuum. For example, people decide on the amount of time they spend on tasks, how much money to invest, and so on. Understanding how people generalize their experiences when the action space is continuous is necessary to understand how the Hot Stove Effect will unfold in many real-life settings. Shepard's "universal law of generalization" suggests that a crucial element resides in the slope of the generalization gradient (Shepard, ). It is likely related to the scope of the negativity bias, in the sense that when people generalize discrete experiences to broad areas of the action space (a low generalization gradient), they will tend to underestimate the value of actions in large domains of the action space. Conversely, when people generalize to narrow areas of the space, the negativity bias implied by the Hot Stove Effect might be less prevalent. A number of questions remain unexplored, such as whether people generalize positive and negative experiences in the same way, how to conceptualize "foregone payoffs" in continuous action spaces, and what contextual factors, such as domain-specific experience, affect how people generalize positive and negative experiences.

References

Barron, G., & Erev, I. (). Small feedback-based decisions and their limited correspondence to description-based decisions. Journal of Behavioral Decision Making.
Bechara, A., Damasio, A. R., Damasio, H., & Anderson, S. W. (). Insensitivity to future consequences following damage to human prefrontal cortex. Cognition.
Benartzi, S., & Thaler, R. H. (). Myopic loss aversion and the equity premium puzzle. Quarterly Journal of Economics.
Denrell, J. (). Why most people disapprove of me: Experience sampling in impression formation. Psychological Review.
Denrell, J. (). Adaptive learning and risk taking. Psychological Review.


Denrell, J. (). Adaptive sampling policies imply biased beliefs: A generalization of the hot stove effect. In S. Denison, M. Mack, Y. Xu, & B. C. Armstrong (Eds.), Proceedings of the Annual Conference of the Cognitive Science Society.
Denrell, J., & March, J. G. (). Adaptation as information restriction: The hot stove effect. Organization Science.
Denrell, J., Sanborn, A., & Spicer, J. (). Learning from feedback: Exemplar versus rule-based algorithms. Working paper.
Dittmar, A., & Duchin, R. (). Looking in the rearview mirror: The effect of managers' professional experience on corporate financial policy. Review of Financial Studies.
Elwin, E., Juslin, P., Olsson, H., & Enkvist, T. (). Constructivist coding: Learning from selective feedback. Psychological Science.
Fazio, R. H., Eiser, J. R., & Shook, N. J. (). Attitude formation through exploration: Valence asymmetries. Journal of Personality and Social Psychology.
Fetchenhauer, D., & Dunning, D. (). Why so cynical? Asymmetric feedback underlies misguided skepticism regarding the trustworthiness of others. Psychological Science.
Gilovich, T. (). How we know what isn't so: The fallibility of human reason in everyday life. New York: Simon & Schuster.
Hertwig, R., Barron, G., Weber, E. U., & Erev, I. (). Decisions from experience and the effect of rare events in risky choice. Psychological Science.
Ilan, T., Katsnelson, E., Motro, U., Feldman, M. W., & Lotem, A. (). The role of beginner's luck in learning to prefer risky patches by socially foraging house sparrows. Behavioral Ecology.
Kim, Y. (). Customer retention under imperfect information. Working paper.
Larcom, S., Rauch, F., & Willems, T. (). The benefits of forced experimentation: Striking evidence from the London underground network. Quarterly Journal of Economics.
Larrick, R. P., & Wu, G. (). Claiming a large slice of a small pie: Asymmetric disconfirmation in negotiation. Journal of Personality and Social Psychology.
Le Mens, G., & Denrell, J. (). Rational learning and information sampling: On the "naivety" assumption in sampling explanations of judgment biases. Psychological Review.
Le Mens, G., Kovács, B., Avrahami, J., & Kareev, Y. (). How endogenous crowd formation undermines the wisdom of the crowd in online ratings. Psychological Science.
March, J. G. (). Learning to be risk averse. Psychological Review.




Plonsky, O., & Erev, I. (). Learning in settings with partial feedback and the wavy recency effect of rare events. Cognitive Psychology.
Plonsky, O., Teodorescu, K., & Erev, I. (). Reliance on small samples, the wavy recency effect, and similarity-based learning. Psychological Review.
Shepard, R. N. (). Toward a universal law of generalization for psychological science. Science.
Shteingart, H., Neiman, T., & Loewenstein, Y. (). The role of first impression in operant learning. Journal of Experimental Psychology: General.
Teodorescu, K., & Erev, I. (). On the decision to explore new alternatives: The coexistence of under- and over-exploration. Journal of Behavioral Decision Making.
Thaler, R. H., Tversky, A., Kahneman, D., & Schwartz, A. (). The effect of myopia and loss aversion on risk taking: An experimental test. Quarterly Journal of Economics.
Twain, M. (). Following the equator: A journey around the world. Hartford, CT: American Publishing Co.
Weiss-Cohen, L., Konstantinidis, E., & Harvey, N. (). Timing of descriptions shapes experience-based risky choice. Journal of Behavioral Decision Making.
Woiczyk, T. K. A., Lauenstein, F., & Le Mens, G. (). The hot kitchen effect: Categories, generalization, and exploration. Working paper.
Wright, R. J., Rakow, T., & Russo, R. (). Go for broke: The role of somatic states when asked to lose in the Iowa Gambling Task. Biological Psychology.
Yechiam, E., & Busemeyer, J. R. (). The effect of foregone payoffs on underweighting small probability events. Journal of Behavioral Decision Making.
Yechiam, E., & Yakobi, O. (). Loss attention and the Equity Premium Puzzle: An examination of the myopic loss aversion hypothesis. Working paper.
Zion, U. B., Erev, I., Haruvy, E., & Shavit, T. (). Adaptive behavior leads to under-diversification. Journal of Economic Psychology.


Sampling Mechanisms


The J/DM Separation Paradox and the Reliance on the Small Samples Hypothesis

Ido Erev and Ori Plonsky

This research was supported by a grant from the Israel Science Foundation (grant /).

The rational model of individual decision making under uncertainty (Savage, ) implies that people behave "as if" they form beliefs concerning the payoff distributions associated with the feasible actions, and select the action that maximizes expected utility given these beliefs. To clarify human decision processes and their relationship to the rational model, behavioral decision research tends to separate the study of beliefs from the study of the decision processes. The leading studies of beliefs focus on human judgment; they examine how people estimate the probabilities of different events based on their past experiences (e.g., Phillips & Edwards, ; see example in the top panel of Figure .). The leading studies of decision making (Allais, ; Kahneman & Tversky, ) focus on "decisions from description": the way people decide when they are presented with a description of the payoff distributions (see example in middle panel of Figure .). The separation between studies of judgment and of decision making (J/DM) has led to many useful insights, but the results presented by Barron and Erev (; lower panel of Figure .) suggest that it can also lead to an incorrect conclusion concerning the impact of rare (low probability) events. Studies of judgment highlight robust overestimation of the probability of rare events (Erev et al., ; Phillips & Edwards, ), and studies of decisions from description document overweighting of low-probability outcomes (Kahneman & Tversky, ); thus, it is natural to conclude that oversensitivity to rare events is a general tendency (Fox & Tversky, ). In sharp contradiction to this natural conclusion, Barron and Erev show that in tasks in which judgment and decision making are not separated, and people decide based on past experience from which they can form judgments, their behavior reflects underweighting of rare events. That is, separately both judgment and decision making reflect




Figure . Examples of studies of judgment and decision making with and without J/DM separation.

Judgment: Two urns (Rapoport et al., 1990; following Phillips & Edwards, 1966). Instructions: "Urn A includes 30 Black balls and 70 White balls. Urn B includes 70 Black balls and 30 White balls. One of the two urns was randomly selected. The experimenter will sample (with replacement) balls from that urn. After observing each ball, you will be asked to estimate the probability that the selected urn is A." Typical results: after experiencing a sequence with three White balls and one Black ball, when the objective probability of A (computed with Bayes' rule) is 0.06, the mean judgment is 0.24. That is, the probability of the rare event (Urn A) is overestimated.

Decisions from description (Erev et al., 2017; following Kahneman & Tversky, 1979). Instructions: "Choose between the following two options." Typical results: the maximization rate (Right-rate) was only 41%. This and similar results are captured, in prospect theory, with the assertion that the low-probability outcome (the payoff 0) is over-weighted.

J/DM without separation: decisions from experience with partial feedback (Barron & Erev, 2003). Instructions: "The current experiment includes many trials. Your task, in each trial, is to click on one of the two keys presented on the screen. Each click will be followed by the presentation of the keys' payoffs. Your payoff for the trial is the payoff of the selected key." Typical results: when one option provided "3 with certainty" and the alternative provided "4 in 80% of the trials, 0 otherwise," the choice rate of the risky, EV-maximizing option was 65%. In contrast, when the risky option provided "32 in 10% of the trials, 0 otherwise," the maximization rate was only 30%. This and similar results suggest insufficient sensitivity to rare events.

oversensitivity to rare events, but without the separation these processes often lead to the opposite bias. We call this puzzle the J/DM separation paradox. One of the implications of this paradox is the description–experience gap of risky choice (Hertwig & Erev, ): higher sensitivity to rare events in decisions from description than in decisions from experience. However, the observation that the separated judgment (from experience) also reflects oversensitivity to the rare events suggests that there is something more general than the description–experience gap. Most previous efforts to explain the differences between the distinct phenomena summarized in Figure . assume that the different tasks trigger different cognitive processes and biases. The current chapter reviews three classes of observations that imply a different explanation. First, we clarify the existence of a robust behavioral tendency that can capture the




differences between Figure .'s tasks without assuming different cognitive processes. It is enough to assume a mere presentation effect (Erev, Glozman, et al., ): higher sensitivity to rare events when these events are explicitly presented (the upper two panels in Figure .). Importantly, we show that this effect can also capture variations within each of Figure .'s paradigms. Second, we highlight that in the absence of an explicit presentation of the rare events (the lower panel in Figure .), people behave as if they underweight these events. Comparison of alternative explanations of such behavior highlights the descriptive value of the hypothesis that people tend to rely on small samples of past experiences (Hertwig et al., ). We review several potential drivers of the tendency. Specifically, reliance on small samples can be the product of cognitive limitations, but can also reflect sophisticated processes. Third, we clarify the descriptive value of the mere presentation effect and the tendency to rely on small samples by considering the role of feedback. The results suggest that feedback reduces the impact of the mere presentation effect, but has a limited effect on the descriptive value of the reliance-on-samples hypothesis. Our analysis concludes with a discussion of the implications of the results for descriptive models of judgment and decision making.

The current explanation of the J/DM separation paradox has at least two advantages over the popular explanations that assume different cognitive processes govern behavior in each of the tasks from Figure .. First, our explanation can clarify the impact of rare events. It not only captures the known indications of the description–experience gap, but also captures the fact that certain manipulations trigger overweighting of rare events in decisions from experience and underweighting of rare events in decisions from description. A second advantage of the current explanation is the fact that it is more parsimonious, as it does not assume task-specific processes. This advantage facilitates the derivation of clear and useful predictions, as demonstrated by recent choice-prediction competitions (e.g., Erev et al., ).

. The Mere Presentation Effect

Evaluation of the studies illustrated in Figure . suggests oversensitivity to the rare events in the separated tasks (top panels), but underweighting of rare events when the two tasks are not separated (lower panel). Erev et al. (a) presented the mere presentation effect: oversensitivity to the rare events is observed when they are explicitly presented (as in the separated


Figure . Example of studies of decisions from sampling without and with explicit presentation of the rare outcomes (Erev, Glozman, et al., a).

Condition Blank. Instructions: "The current experiment includes many trials. Your task, in each trial, is to select one of two options. To inform your choice you can draw samples from each of the options by clicking on them."

Condition Mere presentation. Instructions: "The current experiment includes many trials. Your task, in each trial, is to select one of two options, where one option provides 3 with certainty, and the second option yields 4 or 0. To inform your choice you can draw samples from each of the options by clicking on them."

Main results: when one option provided "3 with certainty" and the alternative provided "4 in 80% of the trials, 0 otherwise," the choice rate of the risky, EV-maximizing option was 72% in Condition Blank, and only 40% in Condition Mere presentation. Thus, the mere presentation triggers deviations from maximization that suggest oversensitivity to the rare outcome.

tasks), and the opposite bias emerges in the absence of explicit presentation of the rare events. To test the mere presentation hypothesis, Erev et al. compared the two conditions presented in Figure .. Both conditions involved decisions based on free sampling of the payoff distribution. The participants did not receive a description of the incentive structure, but they could sample each option by clicking on the option's key on the computer screen as many times as they wished. Clicking involved no cost. While neither condition involved J/DM separation, the two conditions differed with respect to the information presented on the options' keys. In Condition Blank, the keys were blank, whereas in Condition Mere Presentation the keys presented the possible outcomes (e.g., the information above the risky key was "4 or 0"). The results show that the mere presentation of the rare outcome increased deviations from maximization that reflect overweighting of the rare outcome; it reduced the maximization rate from 72 percent to 40 percent. Similar results were observed by Abdellaoui et al. (). Table . compares these results to the results of studies that examined the same problem using different experimental paradigms. The comparison suggests that the choice rate with mere presentation is similar to the choice rate in




Table .. Maximization rate in studies that examine a choice between “ with certainty” and “ with p ¼ .,  otherwise” under different conditions.

Sampling (Erev, Glozman, et al., a) Sampling with mere presentation (Erev et al., a) Description (Erev et al., ) Partial feedback (Barron & Erev, ) Full feedback (unpublished data) Description & feedback (Erev et al., )

Maximization rate

Rate of deviations from maximization suggesting oversensitivity to the rare outcome

.

.

.

.

. .a .b .a .c

. . . . .

Mean over the first  trials. Mean over trials  to . c Mean over trials  to . a

b

Figure . The list of stimuli used by Erev, Shimonowitch, et al. (b). Main results: when asked to memorize the list, only 12% interpreted the central stimulus as the number 13. With mere presentation (of the event "number in the list"), when asked if the central stimulus is 13, 44% answered "Yes."

decisions from description without feedback, and the choice rate without mere presentation is similar to the choice rate in situations in which the decision makers receive feedback concerning the outcome of past choices.

In another demonstration of the significance of the mere presentation effect, Erev, Shimonowitch, et al. (b) examined reactions to a -second presentation of the series of stimuli presented in Figure .. Their analysis compared two groups. Participants in Group Memory were only asked to memorize the list. Participants in Group Memory & Decision were asked to memorize the list, and to decide whether the stimulus in the center of the series represented the letter "B" or the number "13." Notice that the rare event in this case is the possibility of a number in a sequence of letters. Without explicit presentation of this possibility


(Condition Memory), only 12 percent interpreted the central stimulus as the number 13; the mere presentation of this possibility (Condition Memory & Decision) increased this rate to 44 percent.

A clear demonstration of the mere presentation effect in the context of probability judgment is presented by Fischhoff et al. (). In one of the conditions they examined, the participants were asked to judge the probability that the reason for the observation that "a car won't start" is "fuel system defective." The mere presentation of a list of possible fuel system problems increased the mean estimate from . to ..

It is important to emphasize that we do not assert that the mere presentation of rare events is the only contributor to overestimation of low-probability events. The stochastic nature of human judgment, which implies regressive mean estimates (Erev et al., ; Juslin & Olsson, ), is another important contributor. Yet, Fischhoff et al.'s findings cannot be explained with the noisy judgment explanation. We acknowledge the stochastic nature of judgment, and of decision making, in Section ..

..

The Sampling Explanation of the Mere Presentation Effect

The mere presentation effect can be explained by assuming that (1) judgment and decision making are based on samples of subjectively similar past experiences (see similar ideas in Gonzalez et al., ; Skinner, ), and (2) the explicit presentation of a rare outcome increases the probability of sampling past experiences in which this outcome occurred. That is, the mere presentation of a rare outcome increases the subjective similarity of situations in which this outcome occurred to the current choice task. When these situations are not truly similar to the current task but only appear similar, this effect implies overgeneralization (Marchiori et al., ). For example, consider decision makers in the process of renting a car, who can choose to add unnecessary car insurance (in addition to the mandatory insurance that provides optimal coverage). The decision makers are assumed to recall similar past car-renting experiences, and to select the option that provided the best outcomes in those experiences. The mere presentation of the fact that the additional insurance will "cover damage up to  million Euro" might lead them to recall past experiences in which people suffered large losses that were not covered by insurance, and can increase the probability of buying the unnecessary insurance.
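A speculative sketch of these two assumptions (our own illustration; the memory contents and the cue weight are arbitrary):

```python
import random

# Memory holds 100 past experiences, 5 of which ended in a large loss.
# Judgment is the mean of a small retrieved sample. Mere presentation of
# the rare outcome is modeled as a retrieval cue that multiplies the
# sampling weight of loss experiences (weight 10, assumed).

rng = random.Random(17)
memories = [1.0] * 95 + [-10.0] * 5

def mean_judgment(loss_cue_weight, sample_size=5, agents=50_000):
    weights = [loss_cue_weight if m < 0 else 1.0 for m in memories]
    total = 0.0
    for _ in range(agents):
        sample = rng.choices(memories, weights=weights, k=sample_size)
        total += sum(sample) / sample_size
    return round(total / agents, 2)

print("no explicit presentation:", mean_judgment(1.0))    # near the true mean
print("rare outcome presented:  ", mean_judgment(10.0))   # much more negative
```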




. Reliance on Small Samples

While the mere presentation effect explains why rare events receive more weight when they are explicitly presented, it does not explain the tendency to underweight rare events in the absence of explicit presentation. Hertwig et al. () note that the tendency to underweight rare events in decisions from experience, in the absence of explicit presentation, can be captured by assuming reliance on small samples of past experiences. To see why reliance on small random samples implies underweighting of rare events, note that the probability that a small sample will not include a rare event (one that occurs with probability p) tends to be larger than .5. Specifically, most samples of size k will not include the rare event when the following inequality holds: P(no rare) = (1 − p)^k > .5. This inequality implies that k < log(.5)/log(1 − p). For example, when p = .1, k < 6.58; that is, when k is 6 or smaller, most samples do not include the rare event (Teodorescu et al., ). Therefore, if people draw small samples from the true payoff distributions and choose the option with the higher sample mean, in most cases most of them will choose as if the rare event is unlikely to occur.
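The inequality can be checked directly; the following computes, for a few values of p, the largest sample size at which most samples still miss the rare event:

```python
import math

# For each p, find the bound k < log(.5)/log(1 - p) and verify that a sample
# of the largest integer size below the bound misses the rare event with
# probability greater than .5, i.e., (1 - p)^k > .5.

for p in (0.05, 0.1, 0.2):
    k_bound = math.log(0.5) / math.log(1.0 - p)
    k = math.floor(k_bound)
    print(f"p={p}: k < {k_bound:.2f}; "
          f"P(no rare event in a sample of {k}) = {(1 - p) ** k:.2f}")
```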

https://doi.org/10.1017/9781009002042.007 Published online by Cambridge University Press



 ,  

focus on a choice between a safe and a risky option) is only  (Wulff et al., ). When the decision makers must decide without explicitly taking samples, the reliance on small samples hypothesis implies the recall of a small sample of past experiences drawn from memory. While this hypothesis can be captured with the natural assumption that the reliance on small samples reflects the impact of cognitive and memory limitations (see Kareev, ), experimental research suggests that reliance on small samples is also likely to reflect four additional processes. ..
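Before turning to these processes, the arithmetic behind the reliance-on-small-samples account can be made concrete with a small simulation. The sketch below is our own illustration, not a model from the literature; the payoff values (a safe 3 versus a risky 32 with p = .1, expected value 3.2) and the sample sizes are assumptions chosen only for demonstration.

    import random

    def choose_risky(k, p=0.1, rare_payoff=32.0, safe=3.0):
        # Draw k experiences of the risky option (rare_payoff with
        # probability p, else 0) and choose by the higher sample mean.
        sample = [rare_payoff if random.random() < p else 0.0 for _ in range(k)]
        return sum(sample) / k > safe

    def risky_choice_rate(k, n=20_000):
        # Proportion of simulated choosers who prefer the risky option.
        return sum(choose_risky(k) for _ in range(n)) / n

    for k in (1, 5, 20, 100):
        # Second number: probability that a size-k sample contains no
        # rare event at all, (1 - p)**k.
        print(k, round(risky_choice_rate(k), 3), round(0.9 ** k, 3))

With k = 5, for instance, about 59 percent of samples contain no rare payoff at all (.9^5 ≈ .59), so most simulated choosers prefer the safe option even though the risky option has the higher expected value: the rare event is underweighted.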

Opportunity Cost

Rational decision makers are expected to rely on small samples, even when sampling is free of direct costs, when they believe that the cost of sampling in terms of lost time is higher than the expected benefit from the added information that sampling allows. The impact of this factor is particularly clear when the decision maker can explicitly sample information (e.g., by clicking on the computer screen) before drawing samples from memory (as in the sampling paradigm illustrated in Figure .). In accordance with this prediction, Hau et al. () found that increasing payoffs by an order of magnitude can triple the size of the explicitly drawn sample. (Notice that the sample taken from memory before the choice, after the experimental sampling stage, can differ from the sample taken by clicking on the screen. Yet, in the current context, the accuracy of the information provided by the memory sample cannot exceed the accuracy of the information provided by the observed sample.)

The Amplification Effect

Hertwig and Pleskac () highlight another advantage of small samples that can affect data collection by sophisticated, but lazy, decision makers. When comparing a safe and a risky option with similar expected returns, the difference between the sample means tends to decrease with the sample size. That is, reliance on small samples amplifies the difference, and appears to facilitate an easy decision.
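A quick simulation conveys the amplification effect. The payoff values here (a risky 4 with p = .8 versus a safe 3, nearly equal expected values) are our assumptions for illustration, not taken from Hertwig and Pleskac's materials.

    import random

    def experienced_gap(k, n=20_000, p=0.8, high=4.0, safe=3.0):
        # Mean absolute difference between the sample mean of the risky
        # option (high with probability p, else 0) and the safe payoff.
        total = 0.0
        for _ in range(n):
            successes = sum(1 for _ in range(k) if random.random() < p)
            total += abs(high * successes / k - safe)
        return total / n

    for k in (1, 2, 5, 20, 100):
        # The experienced difference shrinks toward |3.2 - 3.0| = 0.2 as
        # k grows; tiny samples make one option look clearly better.
        print(k, round(experienced_gap(k), 2))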

The Hot Stove Effect

Denrell and March () analyzed decisions from feedback when the feedback is limited to the obtained payoff (and the payoff from the unselected options is not revealed, as in the lower panel in Figure .). Their analysis shows that the natural assumption that individuals tend to avoid options after obtaining bad outcomes (e.g., avoiding the stove after getting burned by it) implies that a sequence of bad outcomes from one of the options will reduce the tendency to try this option again, and can lead to reliance on small samples. Importantly, this effect triggers risk aversion: the probability of a sequence of bad outcomes from an option with high expected value increases with payoff variance. The hot stove effect is particularly important in decisions among multiple alternatives. In this context it can trigger behavior that appears to reflect learned helplessness (Teodorescu & Erev, ).

Reaction to Patterns

Plonsky et al. () demonstrate that behavior that appears to reflect reliance on small samples can be the product of a sophisticated attempt to approximate the optimal strategy. Specifically, under the assumption that the environment is dynamic (e.g., the probability of a gain is determined by a Markov chain), the optimal strategy can be approximated by relying on a small sample of the most similar past experiences. The thought experiment presented in Figure . illustrates this prediction. It is easy to see that in Figure .'s example the intuition (of intelligent decision makers who have not taught statistics for too many years) is to base the decision in Trial 16 on only three of the 15 past experiences: the past experiences that seem most similar to Trial 16. Plonsky et al.'s results

(a) Task: In each trial of the current study, you are asked to choose between "Top" and "Bottom," and earn the payoff that appears on the selected key after your choice. The following table summarizes the results of the first 15 trials. What would you select in trial 16?

Trial:    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
Top:     –1  –1  –1  +2  –1  –1  –1  +2  –1  –1  –1  +2  –1  –1  –1   ?
Bottom:   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   ?

(b) Implications: In trial 16, intuition favors "Top" despite the fact that the average payoff from "Top" over all 15 trials is negative (–0.4). This intuition suggests a tendency to respond to a pattern, and implies that only 3 of the 15 trials (Trials 4, 8, and 12) are used to compute the value of "Top" in trial 16.

Note. (a) the task; (b) the implications of the likely behavior.

Figure . A thought experiment (following Plonsky et al., ).


suggest that people behave in a similar way even when the intuition is less clear.
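The intuition in Figure .'s thought experiment can be stated in a few lines of code. This is our minimal sketch of similarity-based evaluation, under the assumption that "similarity" means occupying the same position in the four-trial cycle; it is an illustration, not Plonsky et al.'s model.

    # Payoffs of "Top" in the first 15 trials of the thought experiment
    # (Figure .); "Bottom" always pays 0.
    top = [-1, -1, -1, 2, -1, -1, -1, 2, -1, -1, -1, 2, -1, -1, -1]

    # Global mean over all 15 trials: -0.4, which favors "Bottom."
    overall_mean = sum(top) / len(top)

    # Similarity-based evaluation of trial 16: rely only on past trials
    # in the same position of the four-trial cycle as trial 16
    # (trials 4, 8, and 12).
    similar = [payoff for trial, payoff in enumerate(top, start=1)
               if trial % 4 == 0]
    pattern_mean = sum(similar) / len(similar)  # +2.0, favoring "Top"

    print(overall_mean, pattern_mean)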

. The Impact of Feedback

Early studies of decisions from experience documented similar reactions to feedback (e.g., the outcomes of the previous choices in Barron & Erev's study illustrated in Figure .) and to the outcomes observed during prechoice free sampling (as in Hertwig et al., , and Condition Blank in Figure .). Both sources of experience were found to trigger underweighting of rare events. However, subsequent research shows that when the rare events are explicitly presented, experiencing feedback and experiencing prechoice samples have very different effects: Explicit presentation of the possible outcomes triggers overweighting of rare events in decisions from sampling (as demonstrated in Figure .), but does not change the tendency to underweight rare events in decisions from feedback.

One indication of the effect of feedback on the mere presentation effect is provided by Schurr (). In the first part of this study, the participants faced  decisions-from-feedback trials (as in Barron & Erev, ). All  trials involved a choice between two unmarked keys: one key was a safe prospect that led to "loss of  for sure," and the second was a risky prospect that led to "loss of  with p = ., and no loss otherwise." The choice rate of the risky prospect (risk-rate) was  percent. The second part of the study was conducted (with the same participants) six months after the first part. The participants were informed that they would face the same payoff distribution as in the first part, with the exception that the possible outcomes were listed on the keys. The labels on the keys were "loss of " or "loss of  or ." The initial risk-rate was only  percent, but it increased to  percent after four trials. Thus, the results suggest that the mere presentation effect can trigger overweighting of rare events in decisions based on old past experiences, but recent feedback eliminates this effect and triggers underweighting of rare events.

A similar pattern was documented in studies that examine the joint impact of description and feedback (see Cohen et al., ; Erev et al., ; Jessup et al., ; Lejarraga & Gonzalez, ; Marchiori et al., ; Yechiam et al., ; and the illustration of a typical study and its results in Figure .). These studies show an initial tendency to overweight rare events, and a reversal of this tendency after several trials (typically fewer than ) with feedback. The lower/left panel in Figure . (and the last row in




Figure .  Illustration of the experimental task, and main results, in the study of decisions from description with feedback conducted by Erev et al. (). [The figure shows the instructions and main screens ("Please select one of the two options.") alongside the main results: the choice rate of the risky option, P(B), over 5 blocks of 5 trials in two experimental tasks (Task 1: A: "2 for sure" vs. B: "4 with p = .8; 0 otherwise"; Task 2: A: "3 for sure" vs. B: "101 with p = .01; 1 otherwise"). Feedback was provided after the first block. In the first 5 trials (before receiving feedback) the results reflect overweighting of the rare outcomes; feedback reversed this bias and triggered underweighting of rare events.]

Table .) illustrates these results using a study that focuses on a choice between “ for sure” and “ with p ¼ :,  otherwise.” The results show that the initial choice rate in this study is similar to the rate observed in the mere presentation condition in the study of the same problem using free sampling (Figure .). Yet, feedback triggered underweighting of rare events. Table . demonstrates that decisions from description with feedback are similar to decisions from feedback without description. The results suggest that feedback eliminates the impact of the description (and the implied mere presentation effect). Marchiori et al. () note that the large impact of feedback on the mere presentation effect can be explained by a model that rests on two assumptions: The first states (as suggested above) that at least part of the mere presentation effect reflects overgeneralization. Specifically, the presentation of the rare outcomes increases the tendency to overgeneralize from past experiences in which


the probability of the low-probability outcomes was relatively high. The second assumption states that feedback reduces overgeneralization from old past experiences because it increases the availability of recent past experiences that are truly similar to the next choice. When overgeneralization is low, the impact of the tendency to rely on small samples of past experiences is clearer, and the choice rate reflects underweighting of rare events.

Another indication of the strong impact of feedback is presented by Erev et al.'s () study of decisions from sampling. This study examined repeated decisions from sampling with feedback concerning the final payoff. The results reveal high initial sensitivity to the content of the sample. Specifically, when the explicitly drawn sample included a rare event, the initial behavior did not exhibit underweighting of these events (as in the studies reviewed in Wulff et al., ). However, after ten trials with feedback (each time drawing an explicit sample before making a single choice), the participants tended to underweight the rare events even when the rare event was over-represented in the explicitly drawn samples relative to its objective probability (see also Plonsky & Teodorescu, ). Erev et al. show that their results can be captured with a "face or cue" model assuming that repeated performance of decisions-from-sampling tasks with feedback concerning the final outcomes changes the way people use the explicitly drawn free samples. While the initial behavior suggests that people take these samples at face value (relying on the mean outcome), feedback leads them to treat the sample as a "recall cue" and to select the option that led to the best outcomes in past situations with similar cues.

. Implications for Descriptive Models

Classical research that separates the study of judgment from the study of decision making proposes very different models to capture the underlying processes. The leading descriptive models of choice behavior add psychological biases to the computations required to derive expected values, and the leading models of judgment add biases to the computations assumed by Bayes' theorem. In contrast, the current analysis suggests that the difference between judgment and decision processes may not be large. Both judgment and decision making tasks call for the combination of past experiences and newer information to select one of several responses. While the number of possible responses tends to be larger in judgment studies, it is not clear that this fact should change the underlying processes. Indeed, many natural decisions (including shopping and investing) also involve a large number of possible responses.




The main judgment and decision making phenomena reviewed above can be captured with models that share three general assumptions: () People tend to select the strategies that led to the best outcomes in a small sample of similar past experiences; the sampling process is likely to include a stochastic element (as in Erev et al., ; Juslin & Olsson, ). () Explicit presentation of rare events triggers overgeneralization, which increases the probability that past experiences in which similar events occurred will be included in the sample. () Feedback reduces overgeneralization, but does not appear to decrease the tendency to rely on small samples. One demonstration of the value of these assumptions is provided by the  choice-prediction competition summarized by Erev et al. (). This competition focused on decisions from description with and without feedback. The winner (the model Best Estimate and Sampling Tools, or BEAST) assumes that the initial deviations from maximization reflect reliance on small samples plus three forms of overgeneralization, and that feedback reduces overgeneralization but does not reduce the tendency to rely on small samples.
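To see how the three assumptions interact, consider the following deliberately stripped-down sketch. It is our toy construction, not BEAST or any published model; the payoff distribution, sample size k, and overgeneralization weight w are assumptions chosen for illustration.

    import random

    def sampled_value(history, k=5, presented=None, w=0.0):
        # Assumption 1: value = mean of a small random sample of past
        # experiences. Assumption 2: with probability w per draw, the
        # sample is contaminated by an overgeneralized experience in
        # which the explicitly presented rare outcome occurred.
        # Assumption 3: feedback lowers w but leaves k small.
        draws = [presented if (presented is not None and random.random() < w)
                 else random.choice(history)
                 for _ in range(k)]
        return sum(draws) / len(draws)

    # Risky option: lose 20 with p = .01, else gain 1 (expected value 0.79).
    random.seed(1)
    history = [-20.0 if random.random() < 0.01 else 1.0 for _ in range(1000)]

    for w, label in ((0.3, "description only"), (0.0, "after feedback")):
        values = [sampled_value(history, presented=-20.0, w=w)
                  for _ in range(10_000)]
        negative = sum(v < 0 for v in values) / len(values)
        print(label, "share of negative sampled values:", round(negative, 2))

With overgeneralization present (w = .3), most small samples contain the presented rare loss and the option looks bad (apparent overweighting); once feedback removes the contamination (w = 0), most small samples contain no loss at all and the option looks safe (underweighting), while the tendency to rely on small samples itself is unchanged.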

. Summary

Previous research demonstrates a large difference between decisions from description and decisions from experience, as well as a large difference between decisions and probability judgments from experience. Comparison of decisions from description and from experience reveals a description–experience gap (Hertwig & Erev, ): higher sensitivity to rare events in decisions from description. Comparison of judgment and decisions from experience reveals the coexistence of overestimation and underweighting of rare events (Barron & Yechiam, ). The current review suggests that both sets of differences are examples of the J/DM separation paradox: While separated studies of judgment and decision making reveal oversensitivity to rare events, without the separation these processes often lead to the opposite bias. Our analysis shows that the J/DM paradox can be the product of the fact that the separation of judgment from decision making requires an explicit presentation of the rare events, and this presentation increases the apparent weighting of these events. Yet, our analysis also suggests that the mere presentation effect is not necessarily the main contributor to the impact of rare events in natural settings. Its impact diminishes when people receive feedback, whereas the tendency to rely on small samples


of past experience – which implies underweighting of rare events – appears to be more robust. Thus, experimental studies that separate judgment from decision making and focus on feedback-free environments are at risk of overestimating the impact of rare events in many natural settings. In addition, our results suggest that the differences between the processes that underlie judgment and decision making, and between decision making with different sources of information, may not be large. It is possible that people always rely on small samples of similar past experiences, and that the main differences between the distinct paradigms reflect initial biases that diminish with feedback.

References

Abdellaoui, M., L'Haridon, O., & Paraschiv, C. (). Experienced vs described uncertainty: Do we need two prospect theory specifications? Management Science, (), –.
Allais, M. (). Le comportement de l'homme rationnel devant le risque: Critique des postulats et axiomes de l'école américaine. Econometrica, (), –.
Barron, G., & Erev, I. (). Small feedback-based decisions and their limited correspondence to description-based decisions. Journal of Behavioral Decision Making, (), –.
Barron, G., & Yechiam, E. (). The coexistence of overestimation and underweighting of rare events and the contingent recency effect. Judgment and Decision Making, (), –.
Cohen, D., Plonsky, O., & Erev, I. (). On the impact of experience on probability weighting in decisions under risk. Decision, (), –.
Denrell, J., & March, J. G. (). Adaptation as information restriction: The hot stove effect. Organization Science, (), –.
Erev, I., Ert, E., Plonsky, O., Cohen, D., & Cohen, O. (). From anomalies to forecasts: Toward a descriptive model of decisions under risk, under ambiguity, and from experience. Psychological Review, (), –.
Erev, I., Ert, E., & Roth, A. E. (a). A choice prediction competition for market entry games: An introduction. Games, (), –.
Erev, I., Ert, E., Roth, A. E., Haruvy, E., Herzog, S. M., Hau, R., Hertwig, R., Stewart, T., West, R., & Lebiere, C. (b). A choice prediction competition: Choices from experience and from description. Journal of Behavioral Decision Making, (), –.
Erev, I., Glozman, I., & Hertwig, R. (a). What impacts the impact of rare events. Journal of Risk and Uncertainty, (), –.




Erev, I., & Roth, A. E. (). Maximization, learning, and economic behavior. Proceedings of the National Academy of Sciences, (Supplement ), –. Erev, I., Shimonovich, D., Schurr, A., & Hertwig, R. (b). Base rates: How to make the intuitive mind appreciate or neglect them. In Intuition in judgment and decision making (pp. –). Hillsdale, NJ: Erlbaum. Erev, I., Wallsten, T. S., & Budescu, D. V. (). Simultaneous over- and : underconfidence: The role of error in judgment processes. Psychological Review, (), –. Erev, I. Yakobi, O., Ashby, N. J. S., & Chater, N. (). The impact of experience on decisions based on pre-choice samples, and the face-or-cue hypothesis. Theory and Decisions, (), –. Fiedler, K., Brinkmann, B., Betsch, T., & Wild, B. (). A sampling approach to biases in conditional probability judgments: Beyond base rate neglect and statistical format. Journal of Experimental Psychology: General, (), . Fiedler, K., & Juslin, P. (). Information sampling and adaptive cognition. http://books.google.com/books?hl=en&lr=&id=-vNh-YNVgC&oi=fnd&pg= PR&dq=Information+Sampling+and+adaptive+cognition&ots=ZJhKclvj& sig=MtQKmyqMFOZodurPF-TusOUgA Fischhoff, B., Slovic, P., & Lichtenstein, S. (). Fault trees: Sensitivity of estimated failure probabilities to problem representation. Journal of Experimental Psychology: Human Perception and Performance, (), –. Fox, C. R., & Tversky, A. (). A belief-based account of decision under uncertainty. Management Science, (), –. Gonzalez, C., Lerch, J. F., & Lebiere, C. (). Instance-based learning in dynamic decision making. Cognitive Science, (), –. https://doi .org/./S-()- Hau, R., Pleskac, T. J., Kiefer, J., & Hertwig, R. (). The description– experience gap in risky choice: The role of sample size and experienced probabilities. Journal of Behavioral Decision Making, (), –. Hertwig, R., Barron, G., Weber, E., & Erev, I. (). Decisions from experience and the effect of rare events in risky choice. Psychological Science, (), –. https://doi.org/./j.-...x Hertwig, R., & Erev, I. (). The description–experience gap in risky choice. Trends in Cognitive Sciences, (), –. https://doi.org/./j.tics ... Hertwig, R., & Pleskac, T. J. (). Decisions from experience: Why small samples? Cognition, (), –. https://doi.org/./j.cognition ... Jessup, R. K., Bishara, A. J., & Busemeyer, J. R. (). Feedback produces divergence from prospect theory in descriptive choice. Psychological Science, (), –. https://doi.org/./j.-...x Juslin, P., & Olsson, H. (). Thurstonian and Brunswikian origins of uncertainty in judgment: A sampling model of confidence in sensory discrimination. Psychological Review, (), –.


Juslin, P., Winman, A., & Hansson, P. (). The naïve intuitive statistician: A naïve sampling model of intuitive confidence intervals. Psychological Review, (), –.
Kahneman, D., & Tversky, A. (). Prospect theory: An analysis of decision under risk. Econometrica, (), –.
Kareev, Y. (). Seven (indeed, plus or minus two) and the detection of correlations. Psychological Review, (), –.
Lejarraga, T., & Gonzalez, C. (). Effects of feedback and complexity on repeated decisions from description. Organizational Behavior and Human Decision Processes, (), –.
Marchiori, D., Di Guida, S., & Erev, I. (). Noisy retrieval models of over- and undersensitivity to rare events. Decision, (), –.
Phillips, L. D., & Edwards, W. (). Conservatism in a simple probability inference task. Journal of Experimental Psychology, (), –.
Plonsky, O., Apel, R., Ert, E., et al. (). Predicting human decisions with behavioral theories and machine learning. arXiv preprint.
Plonsky, O., & Teodorescu, K. (). The influence of biased exposure to foregone outcomes. Journal of Behavioral Decision Making, (), –.
Plonsky, O., Teodorescu, K., & Erev, I. (). Reliance on small samples, the wavy recency effect, and similarity-based learning. Psychological Review, (), –.
Rapoport, A., Wallsten, T. S., Erev, I., & Cohen, B. L. (). Revision of opinion with verbally and numerically expressed uncertainties. Acta Psychologica, (), –.
Savage, L. J. (). The foundations of statistics. New York: John Wiley.
Schurr, A. (). Peak or freq? The effect of unpleasant extreme experiences. Haifa, Israel: Technion–Israel Institute of Technology.
Skinner, B. F. (). Science and human behavior. Free Press.
Teodorescu, K., Amir, M., & Erev, I. (). The experience–description gap and the role of the inter-decision interval. In N. Srinivasan & V. S. C. Pammi (Eds.), Progress in brain research (Vol. , pp. –). Amsterdam: Elsevier.
Teodorescu, K., & Erev, I. (). Learned helplessness and learned prevalence: Exploring the causal relations among perceived controllability, reward prevalence, and exploration. Psychological Science, (), –.
Wulff, D. U., Mergenthaler-Canseco, M., & Hertwig, R. (). A meta-analytic review of two modes of learning and the description–experience gap. Psychological Bulletin, (), –.
Yechiam, E., Barron, G., & Erev, I. (). The role of personal experience in contributing to different patterns of response to rare terrorist attacks. Journal of Conflict Resolution, (), –.


Sampling as Preparedness in Evaluative Learning

Mandy Hütter and Zachary Adolph Niese

The preparation of this chapter was supported by an Emmy Noether grant (HU /-) and a Heisenberg grant (HU /-) awarded to Mandy Hütter by the German Research Foundation.

Research into evaluative learning has largely relied on paradigms that present passive participants with carefully controlled stimulus arrangements. For instance, in the evaluative conditioning (EC) paradigm, participants view initially neutral stimuli (conditioned stimuli; CSs) that are consistently paired with either positive or negative stimuli (unconditioned stimuli; USs). In line with this general approach, research into moderators of EC effects has typically focused on the nature of the stimuli or features of the learning phase. Accordingly, a widely cited meta-analysis on EC (Hofmann et al., ) mainly focused on properties of the CSs and USs and on procedural aspects of their pairing. Complementing this traditional focus, an increasing number of papers demonstrate that the disposition of the learner (e.g., mood; Walther & Grigoriadis, ) and the context of the pairing procedure (e.g., stimulus–valence contingencies in the context of a pairing; Bar-Anan & Dahan, ) influence the size and direction of the effects that can be observed (Hütter & Fiedler, ). Together these findings suggest that the processing of a pairing is never "simple," but is influenced by people's knowledge, affect, and motivation. The size and the direction of the EC effect depend on people's readiness to process a pairing at a certain point in time in a certain way. In the present chapter, we provide an integrative theoretical framework in the form of a constructivist conceptualization of preparedness and argue that having participants sample the CS–US pairings offers an interesting and promising way of making people's readiness to process stimulus pairings visible.

. Traditional Conceptualization of Preparedness

The notion that the efficiency of CS–US pairings depends on certain boundary conditions is well established in classical conditioning. As


Seligman () noted, some classes of stimuli show a higher associability than others. That is, associations between those stimuli are possibly established within only a few trials and show a high resistance to extinction and higher-level cognitive information. Seligman () attributed this higher associability to the organism’s “preparedness.” Preparedness was conceptualized as a phylogenetic phenomenon, that is, as the result of evolutionary pressures. For instance, humans and many other species are said to be prepared to associate tastes with nausea or other forms of sickness. The concept of preparedness is also referred to as an explanation for the genesis of phobias (Seligman, ). For instance, phobias primarily occur for stimuli that were better avoided throughout the development of humankind such as snakes and spiders. The avoidance of such stimuli increased the likelihood that humans survived. In this traditional view, preparedness is a stable characteristic of biologically relevant stimuli. Alternative theoretical approaches turned away from evolutionary selection pressures as an explanation of preparedness but explained differences in associability and extinction rates by reference to expectations formed on the basis of social norms or other cultural influences (e.g., Delprato, ). Irrespective of the assumed evolutionary or cultural basis of preparedness, previous definitions ignore crucial short-term influences of the ecology and the individual that allow evaluative learning to be flexible. .. A Constructivist Perspective on Preparedness Traditional views on evaluative learning regard observed effects as stimulus-driven and, consequently, rather inflexible (Gawronski & Bodenhausen, ). Moreover, many publications refer to classical and evaluative conditioning as “simple” procedures involving “simple” pairings. However, the simplicity of the procedure should not be mistaken for simple mental operations performed in response to a pairing. First, it occurs in the context of other regularities in the environment that could influence its interpretation. For instance, research into affective meaning shows that valence is strongly related to arousal and dominance (Marchewka et al., ). That is, highly negative stimuli are likely also high in arousal and dominance, whereas highly positive stimuli are likely low in arousal and dominance. Second, the pairing is received by an individual who is equipped with certain processing capacities, intentions, motivations, and a history of previous learning. From this perspective, preparedness is a function of the environment that has certain regularities and affordances, but it is also a function of the individual.




Accordingly, the notion of preparedness need not be confined to phylogenetic influences on learning. Instead, our constructivist conceptualization of preparedness assumes that the strength and direction of the learning effects of stimulus pairings can vary flexibly between environments and individuals. Individuals are not merely passive recipients of stimulus pairings. Rather, they interpret and enrich these pairings with self-generated information. For instance, Unkelbach and Fiedler () showed that a concurrent judgment task can induce an assimilative or contrastive mindset that has the power to moderate the direction of the EC effect. They thereby demonstrated that the default predicate assumed by conventional EC research does not always hold. The interpretation of a pairing can also depend on context pairings that comprise identical or opposite stimuli (Hughes et al., ; see also Bar-Anan & Dahan, ). Finally, the interpretation of stimulus pairings can be influenced by the stimuli's usual relationship in the ecology. Specifically, Fan et al. () showed that pairing vaccine brands with images representing sickness can induce positive evaluations of these brands. All these examples are notable illustrations of the learner interpreting the pairing, leading to a reduction or reversal of standard EC effects when relationships in the environment suggest a contrastive relation between the CS and the US.

In sum, we use preparedness as an umbrella term by which we refer to the influence of features of the stimuli, the learner, and the learning environment that have the potential to alter the processing of stimulus pairings and, as a consequence, their effects on evaluations. Importantly, we do not assume a unitary cognitive mechanism underlying these influences on learning. Rather, we call for future research to document these interesting effects on the one hand and to identify the mental processes underlying them on the other.

Previous research demonstrating that characteristics of the ecology and the individual lay the foundation for the effectiveness of a stimulus pairing has focused on the traditional EC paradigm, which involves passively presented CS–US pairings. Although it breaks with fundamental principles of conditioning, a sampling approach to this basic paradigm offers a novel perspective that has the power to reveal important insights into evaluative learning. It is conceivable that the influence of predispositions (whether they lie in the ecology or in the individual) becomes even more apparent when people create their own samples of pairings to begin with. Allowing participants to create their own sample of CS–US pairings reveals which pairings they prefer to receive. Moreover, given that these samples may reflect only a biased image of the information ecology, the impression that


people form of the CSs may be biased as a function of the sample they produced. For instance, previous research has demonstrated that people's samples can be biased by the value of the outcomes of the sampled options. In particular, people tend to sample more often those options that they expect to produce positive outcomes (Denrell, ; Fazio et al., ; Le Mens & Denrell, ). Samples can also indicate which information people find diagnostic. For instance, Prager et al. () showed that participants' impressions of target persons were more extreme the fewer pieces of information they had sampled. That is, participants given the opportunity to truncate sampling do so early when their impression is sufficiently clear. In sum, endowing participants with autonomy over the size of the stimulus samples allows the researcher to observe which pairings participants are willing or ready to process.

The sampling approach to EC also offers an opportunity to test the role of autonomy in the processing of CS–US pairings. It is conceivable, and even likely, that participants sample pairings for those CSs that they are ready to process at a given point in time (cf. Prager et al., ). From this perspective, it may be expected that EC effects are generally larger for participants who actively engage in sampling. This notion resonates with at least two accounts relevant to evaluative change. First, propositional accounts of evaluative learning may expect sampling participants to engage more in reasoning about the CS–US pairings (De Houwer, , ; Gawronski & Bodenhausen, , ). As propositional learning is assumed to produce EC effects by means of inferences about these pairings, an increase in deliberate reasoning should also produce larger EC effects. Second, in line with an amplification account of processing fluency (Albrecht & Carbon, ; Landwehr & Eckmann, ), it may be expected that being prepared to process a pairing increases subjective feelings of ease, which in turn increase the effects of the valence of the US on the evaluation of the CS. In particular, comparing the EC effects of sampling participants with those of participants who are passively presented with the sample that another participant produced allows investigating whether the impact of pairings changes as a function of autonomy over the frequency and timing with which a certain pairing is presented. In the research we discuss in this chapter, we introduced such a yoked design to test for these particular effects of a short-lived, fluctuating preparedness. Prager and colleagues (Chapter  this volume; see also Prager et al., ) introduced a yoked design in their impression formation paradigm and showed that while small samples coincided with extreme impressions in sampling participants, this was not the case in their yoked counterparts who were




passively presented with the samples generated by the sampling participants. This finding strongly suggests that the extreme impressions are not a function of the stimulus samples drawn, but of the internally generated reactions to those samples, which can vary between individuals, contexts, and points in time. Prager and colleagues (Chapter  this volume; see also Prager et al., ) use the term "Thurstonian sampling" to refer to this phenomenon, to discriminate it from "Brunswikian sampling," which refers to stimulus samples drawn from the ecology (cf. Juslin & Olsson, ). As we will show below, evaluative learning can be codetermined by such effects of sampling in addition to effects of the sample, which can be considered a prime instance of preparedness.
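The sampling asymmetry discussed above, whereby people oversample options they expect to be positive (Denrell, ; Fazio et al., ), is easy to reproduce in a toy agent. The sketch below is our own illustration with assumed parameters (12 options, 120 trials, a simple weight-update rule); it is not the paradigm reported later in this chapter.

    import random

    # 6 options are paired with positive USs (+1), 6 with negative
    # USs (-1). Selection weights rise after positive and fall after
    # negative experiences.
    POS, NEG = range(6), range(6, 12)
    weights = [1.0] * 12
    counts = [0] * 12

    random.seed(0)
    for _ in range(120):
        cs = random.choices(range(12), weights=weights)[0]
        counts[cs] += 1
        outcome = 1.0 if cs in POS else -1.0
        weights[cs] = max(0.1, weights[cs] + 0.5 * outcome)

    print("positive CS samples:", sum(counts[i] for i in POS))
    print("negative CS samples:", sum(counts[i] for i in NEG))

Because negative experiences cut an option's selection weight, negatively paired options are sampled less and less over trials, yielding the positively skewed samples that autonomous participants produce.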

. Introduction of Sampling to Evaluative Learning

Building from the insights outlined in the previous section, we have developed a novel paradigm designed to explore how people's readiness to process information influences evaluative learning by incorporating autonomous sampling into an evaluative conditioning procedure. Here, we present the procedure of this paradigm, as well as data from an experiment, to highlight the value of incorporating sampling into EC paradigms and the insights it provides into the role of preparedness in evaluative learning more broadly.

As in other EC tasks, participants in this paradigm view pairings of CSs and positive or negative USs. However, unlike other EC tasks, this paradigm provides participants with a degree of autonomy over the pairings they view. Specifically, on each trial, participants are shown an overview of the entire set of CSs and select one to view. Upon selection, they are shown a pairing of the CS with a US. Then, participants return to the CS overview and again select a CS to view the next pairing. Participants view a predetermined number of pairings in total, but how they distribute the number of pairings they view across the CSs is up to them.

Methodology

To illustrate, we present a recent experiment employing this paradigm with data from  participants who completed the experiment in exchange for compensation at the University of Tübingen, Germany. (A total of  participants completed the experiment; however, data from  were excluded due to technical or human error.) Participants first provided preratings, on a -point scale, of a pool of  faces, and for each participant, the  most neutrally rated faces were used as CSs.

Half the participants were randomly assigned to the high-autonomy condition. These participants were told that the task would involve interactions with the faces, that they would complete  interactions in total, and that on each trial, they should simply select whom they wanted to interact with next. The  CSs were randomly arranged in a three-by-four grid, and participants were asked to click on a CS to start an interaction. Upon clicking the CS, the CS picture appeared next to a US image drawn from the IAPS database (Lang et al., ). This pairing was shown for  ms. The US pool consisted of  positive and  negative images. Valence differed clearly between the positive and negative US images, t() = ., p < ., but arousal was equated between the sets, t(.) = ., p = .. Then, participants returned to the grid of CSs and selected another to have an interaction with. Six of the CSs were always paired with clearly positive USs and six were always paired with clearly negative USs. The position of the CSs in the grid was fixed throughout the learning phase. There were no constraints on stimulus selection; in principle, participants could have chosen one CS  times. After completing the  interactions, participants provided postratings of the CSs (again on -point scales), as well as a few memory measures, including a measure of their memory for whether each face was paired with positive or negative images.

Importantly, only half of the participants were given autonomy. The other half of the participants were each yoked to the sampling pattern of one of the autonomous participants. Participants in the yoked condition viewed a presentation that alternated between the grid of their  most neutrally rated CSs and a CS–US pairing automatically every , ms. That is, for each participant in this condition, the task mirrored that of one of the high-autonomy participants (for each pair of yoked and autonomous participants, the procedure held constant the number of positive vs. negative pairings, the number of times each CS was presented, the order of positive and negative pairings, etc.), with the critical difference being that participants in this condition passively viewed the pairings rather than being afforded the autonomy to select them. Participants in this condition were not informed that the presentation was based on another participant's sampling decisions.

The procedure of this experiment is identical to Experiment  presented in Hütter et al. () with the exception that the yoking procedure




regards only US valence and sampling frequency, and randomly draws images from the positive or negative US sets. In Experiment  of Hütter et al. (), the yoked participant was shown the identical US images that the sampling participant saw.

Results

We were first interested in investigating how the pairings shaped sampling behavior. To test this, we analyzed whether sampling trial (–) was a significant predictor of the mean sampled valence in a multilevel logistic regression analysis (i.e., with  indicating negative US valence and + indicating positive US valence) that included participant and CS identity as random factors, among participants in the high-autonomy sampling condition. As expected, we found that sampling trial was a significant predictor of sampled valence, B = ., SE = ., z = ., p < .. In particular, whereas sampling a positively or a negatively paired CS was roughly equally likely at the beginning of the task (intercept = ., SE = ., z = ., p = .), participants' preference for positively paired CSs increased by roughly  percent (e^.) with each additional trial. This highlights that when participants are given autonomy over the pairings they view in an EC procedure, they are likely to create positively skewed samples that differ from the equally balanced samples often utilized in passive paradigms.

Thus, an important next question is to test whether typical EC effects replicate in situations where the sample of pairings is determined by individuals' sampling behavior. To do so, we predicted evaluative shift (postrating minus prerating) for each CS from paired valence (. = negative, . = positive), number of samples (person-mean-centered), autonomy condition (. = low autonomy; . = high autonomy), and their interactions in a multilevel regression model that included participant and CS identity as random intercepts. (Effect sizes are reported in the form of standardized regression coefficients, β.) Doing so revealed a significant three-way interaction between US valence, autonomy condition, and number of samples, B = ., SE = ., t(.) = ., p = ., β = .. The nature of this interaction can be seen in Figure .. In the low-autonomy, yoked condition, there was an effect of US valence, B = ., SE = ., t(.) = ., p < ., β = ., such that positively paired (vs. negatively paired) CSs showed a more positive


Figure .

Evaluative shift as a function of US valence, autonomy, and number of samples. Note: Twenty-two values with number of pairings > are not shown in the graph to make it more readable. Regression lines are calculated using all values, but without accounting for random effects. Grey areas indicate  percent confidence intervals. Points are offset in dense areas to reduce overplotting.

evaluative shift. This effect became stronger with more samples, B = ., SE = ., t(.) = ., p = ., β = .. In the high-autonomy, sampling condition, there was also a main effect of US valence, B = ., SE = ., t(.) = ., p < ., β = .. However, unlike in the yoked condition, this effect was not moderated by the number of samples, B = ., SE = ., t(.) = ., p = ., β = .. Thus, these results highlight the robustness of evaluative conditioning effects: they appear even in situations with unbalanced samples and in conditions where individuals actively sample the pairings they view. Indeed, in both conditions, we found evidence of clear and strong effects of the paired US valence on evaluative shift. Further, this effect strengthened with the number of samples in the low-autonomy condition, but not in the high-autonomy sampling condition.
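For readers who want to see the shape of such an analysis, the following sketch shows how a linear multilevel model with crossed random intercepts for participants and CSs can be specified in Python with statsmodels (mirroring the statsmodels pattern of treating the whole dataset as a single group and declaring crossed factors as variance components). All column names and the file name are hypothetical; the chapter's analyses were not published as code, and the coefficients reported above cannot be reproduced from this sketch.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical data layout: one row per participant x CS, with
    # columns shift (post - pre), valence (effect-coded), nsamples
    # (person-mean-centered), autonomy (effect-coded), participant, cs.
    df = pd.read_csv("ec_sampling.csv")
    df["all"] = 1  # single "group" so the two random factors are crossed

    model = smf.mixedlm(
        "shift ~ valence * nsamples * autonomy",
        data=df,
        groups="all",
        vc_formula={"participant": "0 + C(participant)",
                    "cs": "0 + C(cs)"},
    )
    print(model.fit().summary())

Note that the binary memory-accuracy analysis reported below is logistic rather than linear; this sketch covers only the evaluative-shift model.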




In addition, this experiment provided evidence of an effect of sampling itself on evaluative shift. In particular, there was a significant interaction between number of samples and autonomy condition, B = ., SE = ., t(.) = ., p < ., β = .. In the yoked condition, there was no main effect of number of samples, B = ., SE = ., t(.) = ., p = ., β = .. As noted above, the effect of number of samples instead interacted with paired valence in this condition. In contrast, in the sampling condition, there was a main effect of the number of samples, such that an increased number of samples was related to a more positive evaluative shift, B = ., SE = ., t(.) = ., p < ., β = .. And, as noted above, there was no significant interaction between US valence and number of samples in this condition. That is, in the high-autonomy sampling condition, number of samples positively predicted evaluative shift regardless of US valence. Indeed, further analyses indicated that within the high-autonomy sampling condition, number of samples predicted a positive evaluative shift for both positively paired CSs, B = ., SE = ., t(.) = ., p < ., β = ., and negatively paired CSs, B = ., SE = ., t(.) = ., p = ., β = .. That is, this experiment provides evidence that sampling a stimulus more (vs. less) frequently predicts a positive effect on evaluation even when sampling the stimulus reliably results in a negative outcome.

Finally, we investigated what effects sampling had on participants' memory for whether each CS was paired with positive or negative USs. For each CS that was viewed at least once, we predicted memory accuracy ( = correct,  = incorrect) from paired valence (. = negative, . = positive), number of samples (person-mean-centered), autonomy condition (. = low autonomy; . = high autonomy), and their interactions in a multilevel regression model that included participant and CS identity as random intercepts. Doing so revealed a significant three-way interaction between US valence, autonomy condition, and number of samples, B = ., SE = ., z = ., p = .. For positively paired CSs, there was a significant interaction between autonomy condition and the number of samples, B = ., SE = ., z = ., p = .. In particular, a higher number of samples was related to better memory among participants in the sampling condition, B = ., SE = ., z = ., p = ., but not in the yoked condition, B = ., SE = ., z = ., p = .. In contrast, for negatively paired CSs, the interaction between autonomy condition


and the number of samples was not significant, B = ., SE = ., z = ., p = .. There was also no main effect of either the number of samples, B = ., SE = ., z = ., p = ., or autonomy condition, B = ., SE = ., z = ., p = ..

. Insights Gained From the Sampling Approach

A sampling approach to EC diverges from the traditional EC paradigm in one of its central aspects, namely the passive presentation of carefully balanced stimulus samples. Thereby, this adaptation of a conventional research paradigm provides an opportunity to gain various novel insights into evaluative learning and extends the paradigm to a class of learning situations that it has not spoken to until now. For instance, in their daily lives, people often have autonomy over deciding what and whom they interact with. Traditional, passive evaluative learning paradigms do not capture this natural area of attitude learning. Consistent with previous research, autonomous participants in this experiment showed a strong preference for positively paired CSs (e.g., Fazio et al., ; Le Mens & Denrell, ), ultimately creating skewed distributions unlike the perfectly balanced arrangements used in most traditional EC procedures. Indeed, in other experiments we have conducted using this paradigm (Hütter et al., ), autonomous participants routinely create positively biased samples, and they do so even when they are explicitly provided with an epistemic goal to learn as much as possible about all CSs.

Despite the imbalanced distributions of CS–US pairings created by participants' sampling behavior, the experiments we have conducted using this adapted paradigm consistently provide evidence of an EC effect in both the high- and low-autonomy conditions. Thus, although numerous EC paradigms are characterized by passivity (and some paradigms even attempt to heighten this passivity by distracting participants, using brief presentation times, etc.; Gawronski & Bodenhausen, ; Jones et al., ; Olson & Fazio, ; Sweldens et al., ), passivity should not be viewed as a necessary or enabling condition for observing EC effects. Indeed, the overall strength of the effect of paired valence typically does not depend on autonomy condition in these experiments. Interestingly, this also means that we do not find evidence that actively engaging in the learning task (i.e., as required in the autonomous sampling condition) creates




stronger effects of paired valence, which seems at odds with propositional explanations of EC effects (e.g., De Houwer, , ). Instead, this finding suggests that sampling per se does not facilitate reasoning about the nature and meaning of the pairings. When participants are instructed to learn as much as possible about each CS, however, they show increased EC effects (Hütter et al., ).

Additionally, the current experiment speaks to the role that the number of pairings has in EC effects. Some theoretical models of EC effects in the traditional paradigm predict larger EC effects with increasing numbers of conditioning trials (e.g., Baeyens et al., ; Davey, ; Field & Davey, ; Levey & Martin, ; Martin & Levey, , ). Interestingly though, previous attempts to systematically vary the number of presentations yielded inconclusive results as to whether doing so strengthens EC effects (Hofmann et al., ). Indeed, in this and other experiments using this paradigm, we obtained evidence consistent with the notion of an increased number of pairings strengthening the EC effect in the low-autonomy condition. Although effects in the high-autonomy condition do not consistently differ from effects in the low-autonomy condition, and meta-analyses do not provide support for a significant difference, the relationship between sample size and EC effects seems to be more variable when participants draw their own samples (Hütter et al., ). In this particular experiment, the three-way interaction is significant, and number of samples is not related to the size of the EC effect in the high-autonomy condition.

The fact that number of samples predicts EC effects less reliably in the high-autonomy condition may be explained by two competing forces determining sampling in the adapted EC paradigm. On the one hand, autonomous participants may stop sampling as soon as their impression of the CS is sufficiently clear (Prager et al., ). On the other hand, they may continue sampling the positive stimuli although, or because, their impression is sufficiently clear. As we used IAPS pictures that are designed to induce affective reactions, and as we required participants to sample a certain number of pairings, participants might have sampled positive CSs to avoid sampling CSs that induce negative affect. Indeed, the current experiment found evidence that better valence memory was associated with an increased number of samples for positively paired CSs in the high-autonomy sampling condition, whereas this relationship was not significant for negatively paired CSs or within the low-autonomy yoked condition.


Finally, in addition to the value of considering the role sampling processes play in shaping the information people base their judgments and decisions on, this work highlights the role sampling itself plays in evaluative learning. In particular, participants in the high-autonomy condition showed a positive evaluative shift the more frequently they sampled a CS. This finding is consistent with various other studies we have conducted using this paradigm (Hütter et al., ). Most intriguingly, this effect does not appear to be moderated by valence, and indeed, we tend to find a significant positive effect of number of samples on evaluative shift for negatively paired CSs in the high-autonomy condition. This is particularly surprising given that participants receive objectively more negative information about these CSs the more often they sample them. This finding suggests that negatively paired CSs in the traditional, passive EC paradigm may generate two sources of negativity: one from the negative pairing itself, and another from the experience of being forced to continue viewing a stimulus that one does not wish to view.

From the research program we conducted along these lines (Hütter et al., ), we have some additional indication that this effect is driven by sampling, as opposed to the possible alternative that people more frequently sample the CSs they already like. First, most of the experiments using this paradigm find no effect of the preratings on the number of times CSs are sampled. (Note that the differences between CSs are minimized by the procedure that selects the  most neutral faces as CSs according to the preratings of each participant.) Second, we obtained consistent results when using the absolute postratings as the criterion in our analyses while controlling for the absolute preratings. That is, an analysis that controls for preexisting differences between CSs also documents a positive relationship between sampling frequency and evaluative shift. Finally, in a variation of the paradigm (Hütter et al., , Experiment ), participants were allowed to sample only half of the CSs. Specifically, participants could interact with any of the  CSs at the beginning, but once they had sampled six different CSs, they were no longer able to interact with the six unselected CSs. After the learning phase, participants provided evaluative postratings for all  CSs, that is, including the unselected faces. Interestingly, the unselected faces showed a negative evaluative shift. Thus, the decision not to sample a stimulus exerts an effect on liking in the absence of any information provided about this stimulus (i.e., in the form of US images). This result would not be expected if sampling behavior merely tracked preexisting attitudes. Moreover, this finding resonates with constructivist




views on learning from selective feedback, which assume that the avoidance of a stimulus is due to the anticipation of a negative pairing (Elwin et al., ). From this perspective, these self-generated pairings have the potential to influence the evaluation of the CSs similarly to actually perceived pairings.

Several variants of the experiment presented here highlight the importance of autonomous sampling decisions in generating this effect. In particular, one experiment used a version of the task that incorporated a second yoked condition in which participants were told which face to sample on each trial (Hütter et al., ). That is, the task was similar to the passive yoked condition in that participants did not have autonomy over the number of times they viewed each CS, but it was similar to the high-autonomy sampling condition in that participants had to actively sample the CSs by clicking on the designated face in each trial in order to cause its pairing to appear. The results of the two yoked groups were highly similar: while number of samples predicted a positive evaluative shift regardless of valence in the high-autonomy sampling condition, this was not the case in either of the yoked groups (where number of samples interacted with US valence). In this way, the current findings connect with research on the role of approach and avoidance behavior in evaluative learning. Previous research demonstrates that approaching (avoiding) a stimulus can positively (negatively) impact evaluations of the stimulus (e.g., Hütter & Genschow, ; Wiers et al., ). That is, if sampling the CS is interpreted as approaching the stimulus, it makes sense that doing so more (vs. less) often would be associated with a positive boost in evaluation. This prediction fits the broader literature on Self-Perception Theory (Bem, ), according to which people use their behavior to infer their attitudes toward a stimulus when there are no strong internal or external cues to guide behavior (e.g., decisions to sample a neutral stimulus). Cognitive Dissonance Theory (Festinger, ) provides another reason why sampling decisions might lead to increased liking, particularly for negatively paired CSs. Indeed, when people feel personally responsible for a negative outcome, they can decrease their dissonance by increasing their attitude toward the chosen option (e.g., liking a negatively paired CS more after repeatedly choosing to sample it; Cooper, ). Thus, this finding connects with a rich literature that predicts that decisions to sample a CS might lead to positive evaluative shifts regardless of whether it is paired with a positive or negative US, while decisions not to sample a CS might lead to no shifts or even negative evaluative shifts.


. Sampling as Preparedness in Evaluative Conditioning

We argued above that giving learners autonomy over the stimulus pairings they see allows us to investigate aspects of preparedness in evaluative learning. In a sampling approach to evaluative learning, we can observe the stimuli that participants sample as well as the impact that these stimuli have on evaluations as a function of autonomy. In this section, we discuss the current results of the sampling approach to EC from the perspective of our constructivist view on preparedness and illustrate how this perspective inspires and informs future research. Our sampling data offer an illustration of the strong preferences that participants have for positively paired CSs. First, we obtained a strong positive asymmetry in sampling that develops over time. That is, participants naturally start out sampling positive and negative CS–US pairs at roughly equal rates. The more they learn about the stimuli, the less they sample stimuli that are paired negatively, focusing instead on the ones that are paired positively. This interpretation is supported by the relationship between valence memory and sampling frequency. In particular, the positive correlation between memory accuracy and sampling frequency for positively paired CSs suggests that participants sample the stimuli they remember to be paired positively. In the yoked condition, this correlation is reduced to nonsignificance, suggesting that the alternative interpretation, that more pairings lead to better memory, does not apply. This observation also implies that the US images are perceived as consequential. Given that participants sample positively paired CSs and avoid negative ones, it can be concluded that participants perceive positive images from the IAPS as rewards and negative ones as punishments. Importantly, we compare a sampling condition with a yoked condition to investigate whether the impact of stimulus pairings on CS evaluations changes as a function of autonomy. Interestingly, in the studies we have conducted so far, we did not obtain systematic effects of autonomy on the absolute difference between positively and negatively paired CSs in evaluative ratings. Hence, it appears that the opportunity to sample does not in itself increase EC effects. One might have reasoned that stimulus pairings exert a stronger effect on liking if participants are prepared to process them (e.g., in line with an amplification account of processing fluency; Albrecht & Carbon, ; Landwehr & Eckmann, ). By contrast, given that we habitually sample instances that we like or expect to generate positive outcomes (e.g., Denrell, ; Fazio et al., ), one may assume that sampling serves as a catalyst for positive valence in the encoding, consolidation, and




retrieval of evaluative information, so that positive EC effects are particularly strong. Indeed, we found a positive relationship between sampling frequency and evaluative shift even for negatively paired CSs. This perspective is at odds with major theorizing in the domain of another important passive learning paradigm, classical conditioning. According to prominent accounts of classical conditioning such as the Rescorla–Wagner model, unexpected pairings should exert a particularly strong effect on learning (e.g., Pearce & Hall, ; Rescorla & Wagner, ). Hence, if negative pairings are unexpected, they should exert a particularly strong negative effect on liking. From this perspective, it is surprising that negatively paired CSs become more positive with each additional pairing in our research. This effect nicely illustrates the degree to which a sampling opportunity has the potential to alter established effects and theorizing in the domain of evaluative learning. In particular, it invites theorizing about the manifold boundary conditions that established research paradigms impose on learning. For instance, given that EC paradigms often employ stimuli that elicit an affective reaction in the viewer (e.g., Lang et al., ), it is likely that a hedonic motivation dominates in our paradigm, meaning that participants try to maximize the number or magnitude of positive experiences rather than their learning about the specific CSs. The notion that a hedonic goal could counteract the advantages that sampling creates for learning (cf. Prager et al., ) has not been considered so far. Our preparedness perspective inspires several possible explanations for the relationship between sampling and evaluative change. The first is a constructive coding explanation formulated for learning from selective feedback (Elwin et al., ). This account explains why participants’ estimates of the success rates of sampled and unsampled options are quite accurate although participants do not receive feedback on the options they do not sample (as in the present paradigm). Elwin et al. () assume that participants consider whether the outcome of an option, if sampled, would be positive or negative. If they anticipate a positive outcome, they sample the option; if they anticipate a negative outcome, they refrain from sampling it. The internally generated feedback fills the gaps when external feedback is not available. This explains why knowledge about options often reflects their actual values quite accurately even when they are sampled infrequently or not at all, but it is also in line with the notion that false negative evaluations of choice options are not easily corrected because those options are not sampled (Denrell, ). Applied to our paradigm, it is possible that participants anticipate positive or negative pairings and use these anticipations to inform their decision whether or not to sample a CS. The


anticipation of a pairing may itself act as an internally generated US that contributes to the evaluation of the CS. Thus, imagining a positive US contributes to a positive evaluative shift in the CS and to the decision to sample that CS, leading to a positive correlation between sampling frequency and evaluative shift. This account can also explain the negative evaluative shift observed in unsampled CSs. Specifically, the expectation that a pairing will be negative leads to both a negative evaluative shift and the decision not to sample the CS. The second explanation rests on the assumption that the interpretation of the pairings can be altered as a function of sampling. That is, given that participants decide to sample a stimulus, their interpretation of the US valence may be more positive than for participants who did not request to see the pairing. For instance, if participants are shown a negative pairing for a CS they actively sampled, they may justify their decision to sample the stimulus by reevaluating the negative US (e.g., an image of a cemetery may be perceived as peaceful rather than threatening) or the relationship between the CS and the US (e.g., the CS may prevent rather than cause a certain negative event). This is in line with dissonance principles of attitude change (Festinger, ), which assume that discrepancies between cognitions are aversive and require resolution. An open research question concerns whether such influences on the interpretation of the US also occur when there is a fit between the sampling decision and US valence (e.g., when the decision to sample a CS results in a positive pairing). In line with approach-avoidance learning (e.g., Hütter & Genschow, ; Wiers et al., ), it is possible that (under default conditions) the decision to sample is in itself a positive act that colors the evaluation of the CS and the interpretation of the pairing. Approach-avoidance learning could also be responsible for the negative evaluative shift observed in stimuli that were never sampled. Finally, being prepared to receive (i.e., requesting) a pairing may increase the subjective ease with which the CS–US pairing is processed, giving rise to positive feelings about the pairing episode which may become associated with the CS. This explanation is similar to effects of processing fluency demonstrated in various domains of judgment and decision making and often invoked as an explanation for effects of mere exposure on liking. For instance, Winkielman and Cacioppo () argued that high processing fluency leads to positive affective reactions, including feelings of familiarity and safety. Indeed, fluency effects were demonstrated to contribute to standard EC effects (Landwehr et al., ). These positive




effects of processing fluency may be particularly strong in a paradigm that allows participants to sample their own set of pairings, in line with the notion of Thurstonian sampling (Juslin & Olsson, ; Prager and colleagues, Chapter , this volume; see also Prager et al., ). People’s sampling behavior can be shaped by various motivations and goals. In addition to hedonic goals to maximize positivity, epistemic goals to learn about one’s environment can exert an influence on the sample that participants create. Thus, an interesting class of questions to consider is not only how autonomous sampling affects evaluative learning processes, but also what impact the motivations driving people’s sampling behavior have on evaluative learning. One experiment provided initial insight into this question by instructing different goals during the sampling phase (Hütter et al., , Experiment ). Specifically, participants were given a hedonic goal, an epistemic goal, or no explicitly instructed goal. In line with previous work on the role of processing goals (e.g., Corneille et al., ), an epistemic goal resulted in stronger EC effects, thereby documenting a processing advantage of being ready to process a given stimulus pair. Interestingly though, sampling a CS more frequently resulted in a more positive evaluative shift in all three motivation conditions, suggesting that approach (vs. avoidance) behavior is a universal cue to liking (vs. disliking). Thus, this study provides a nuanced picture regarding the role of sampling goals in evaluative learning. However, future research could benefit from including yoked passive conditions that receive the same goal instructions. For instance, is having an epistemic goal alone enough to produce stronger EC effects, or is it important that participants actually have the autonomy to use this goal to inform their sampling behavior? Similarly, future research could explore how providing participants with sampling goals unrelated to the pairings (instructing participants to try to sample the faces evenly, providing an additional cost or incentive for sampling certain faces, varying the extent to which features of groups of CSs tend to predict positive or negative pairings, etc.) influences the current effects. Thus, the sampling approach provides a rich opportunity for exploring not only how sampling shapes evaluative learning, but also the effects that the motivations underlying sampling might have. In EC research, it has long been acknowledged that different processes may lead to EC effects. Thus, the present view raises the question as to whether preparedness could favor certain mechanisms underlying EC effects. The field has seen much debate about which processes lead to evaluative shifts in the CS, and in particular whether CS evaluations could be acquired in an automatic manner (see Corneille & Stahl, , for an


overview of this debate). It seems that sampling makes some processes more likely than others. For instance, it seems unlikely that a process like implicit misattribution of affect could lead to EC effects in this setting, where the CS precedes the US and the US is explicitly indicated as the source of affect (Jones et al., ). By contrast, the paradigm allows for the explicit testing of hypotheses about the CSs and possibly facilitates reasoning about the relationship between the CS and US valence. This latter notion is akin to a propositional process that encodes the relationship between the CS and US valence in a way that can be evaluated against criteria of truth (e.g., Gawronski & Bodenhausen, ; De Houwer, ). Importantly, however, the preparedness perspective we take in this chapter should not be considered a variant of the propositional account of evaluative learning. Propositional accounts, such as the Integrated Propositional Model (De Houwer, ), assume that pairings need to be processed deliberately (and outside the scope of automatic processing) in order to exert effects on evaluations (cf. De Houwer, , ). We believe that flexibility in learning might not be restricted to propositions, and we also do not exclude the possibility of automatic evaluative learning. To the degree that learning is well prepared, depending on affordances of the environment and the individual, learning could exhibit features of automaticity, in line with the original conceptualization of preparedness (Seligman, ).

. Conclusion and Outlook

Our preparedness view assumes that stimulus pairings are not mere pairings but are enriched with self-generated information. Giving participants the opportunity to sample stimulus pairings at will carries this notion further, making visible which stimuli participants are ready and willing to receive, and whether the impact of these stimuli changes as a function of sampling. We provide evidence that the traditional passive paradigm and a paradigm that gives participants autonomy over the stimulus pairings they want to see at a certain point in time diverge in a number of ways. In many cases, research profits from connecting to other fields. The present work offers an integrative perspective based on the sampling approach on the one hand and a preparedness account of evaluative learning on the other (see also Hütter & Fiedler, ). That is, our adapted EC paradigm connects to various basic learning paradigms, going beyond traditional EC procedures. For instance, parallels can be drawn to evaluative learning paradigms that rely on participants executing




approach and avoidance behaviors toward presented stimuli (e.g., Fazio et al., ). Moreover, by default, sampling a stimulus can be construed as approaching this stimulus. Unlike in typical approach-avoidance training tasks (e.g., Kawakami et al., ), however, approaching a stimulus is here followed by positive or negative consequences, depending on the US valence assigned to the CS (cf. Van Dessel et al., ). Giving participants a chance to sample also introduces an element of operant conditioning (Skinner, ). From this perspective, positive USs serve as rewards that strengthen the association between a CS and approach behavior, while negative USs function as punishments that weaken this association. Moreover, under the perspective of preparedness, the sampling approach creates points of contact with the literature on processing fluency (e.g., Landwehr & Eckmann, ; Winkielman & Cacioppo, ). At the present stage, the sampling approach to evaluative learning may raise more questions than it answers. However, we find these questions intriguing and highly relevant for theory and practice. The sampling approach relates evaluative conditioning to a new class of real-world situations in which people choose the entities they interact with, and it creates numerous links to established psychological theory. Further, it constitutes a vehicle for future theoretical and methodological development in the field of evaluative learning.

REFERENCES

Albrecht, S., & Carbon, C. C. (). The fluency amplification model: Fluent stimuli show more intense but not evidently more positive evaluations. Acta Psychologica, , –.
Baeyens, F., Eelen, P., Crombez, G., & Van den Bergh, O. (). Human evaluative conditioning: Acquisition trials, presentation schedule, evaluative style and contingency awareness. Behaviour Research and Therapy, (), –.
Bar-Anan, Y., & Dahan, N. (). The effect of comparative context on evaluative conditioning. Social Cognition, , –.
Bem, D. J. (). Self-perception theory. Advances in Experimental Social Psychology, (), –.
Cooper, J. (). Personal responsibility and dissonance: The role of foreseen consequences. Journal of Personality and Social Psychology, (), –.
Corneille, O., & Stahl, C. (). Associative attitude learning: A closer look at evidence and how it relates to attitude models. Personality and Social Psychology Review, (), –.
Corneille, O., Yzerbyt, V. Y., Pleyers, G., & Mussweiler, T. (). Beyond awareness and resources: Evaluative conditioning may be sensitive to processing goals. Journal of Experimental Social Psychology, , –.


Davey, G. C. L. (). Is evaluative conditioning a qualitatively distinct form of classical conditioning? Behaviour Research and Therapy, (), –.
De Houwer, J. (). The propositional approach to associative learning as an alternative for association formation models. Learning and Behavior, , –.
De Houwer, J. (). Propositional models of evaluative conditioning. Social Psychological Bulletin, (), e.
Delprato, D. J. (). Hereditary determinants of fears and phobias: A critical review. Behavior Therapy, , –.
Denrell, J. (). Why most people disapprove of me: Experience sampling in impression formation. Psychological Review, , –.
Elwin, E., Juslin, P., Olsson, H., & Enkvist, T. (). Constructivist coding: Learning from selective feedback. Psychological Science, , –.
Fan, X., Bodenhausen, G. V., & Lee, A. Y. (). Acquiring favorable attitudes based on aversive affective cues: Examining the spontaneity and efficiency of propositional evaluative conditioning. Journal of Experimental Social Psychology, , Article .
Fazio, R. H., Eiser, J. R., & Shook, N. J. (). Attitude formation through exploration: Valence asymmetries. Journal of Personality and Social Psychology, , –.
Festinger, L. (). A theory of cognitive dissonance. Palo Alto, CA: Stanford University Press.
Fiedler, K. (). What constitutes strong psychological science? The (neglected) role of diagnosticity and a priori theorizing. Perspectives on Psychological Science, , –.
Field, A. P., & Davey, G. C. L. (). Reevaluating evaluative conditioning: A nonassociative explanation of conditioning effects in the visual evaluative conditioning paradigm. Journal of Experimental Psychology: Animal Behavior Processes, (), –.
Gawronski, B., & Bodenhausen, G. V. (). Associative and propositional processes in evaluation: An integrative review of implicit and explicit attitude change. Psychological Bulletin, , –.
Gawronski, B., & Bodenhausen, G. V. (). Evaluative conditioning from the perspective of the associative-propositional evaluation model. Social Psychological Bulletin, , e.
Hofmann, W., De Houwer, J., Perugini, M., Baeyens, F., & Crombez, G. (). Evaluative conditioning in humans: A meta-analysis. Psychological Bulletin, (), –.
Hughes, S., Ye, Y., & De Houwer, J. (). Evaluative conditioning effects are modulated by the nature of contextual pairings. Cognition & Emotion, (), –.
Hütter, M., & Fiedler, K. (). Editorial: Conceptual, theoretical, and methodological challenges in evaluative conditioning research. Social Cognition, (), –.
Hütter, M., & Genschow, O. (). What is learned in approach-avoidance tasks? On the scope and generalizability of approach-avoidance effects. Journal of Experimental Psychology: General, (), –.




Hütter, M., Niese, Z. A., & Ihmels, M. (). Bridging the gap between autonomous and predetermined paradigms: The role of sampling in evaluative learning. Journal of Experimental Psychology: General, (), –.
Hütter, M., & Sweldens, S. (). Dissociating controllable and uncontrollable effects of affective stimuli on attitudes and consumption. Journal of Consumer Research, , –.
Jones, C. R., Fazio, R. H., & Olson, M. A. (). Implicit misattribution as a mechanism underlying evaluative conditioning. Journal of Personality and Social Psychology, , –.
Juslin, P., & Olsson, H. (). Thurstonian and Brunswikian origins of uncertainty in judgment: A sampling model of confidence in sensory discrimination. Psychological Review, , –.
Kawakami, K., Phills, C. E., Steele, J. R., & Dovidio, J. F. (). (Close) distance makes the heart grow fonder: Improving implicit racial evaluations and interracial interactions through approach behaviors. Journal of Personality and Social Psychology, , –.
Landwehr, J. R., & Eckmann, L. (). The nature of processing fluency: Amplification versus hedonic marking. Journal of Experimental Social Psychology, , .
Landwehr, J. R., Golla, B., & Reber, R. (). Processing fluency: An inevitable side effect of evaluative conditioning. Journal of Experimental Social Psychology, , –.
Lang, P. J., Bradley, M. M., & Cuthbert, B. N. (). International affective picture system (IAPS): Affective ratings of pictures and instruction manual. Technical Report A-. Gainesville: University of Florida.
Le Mens, G., & Denrell, J. (). Rational learning and information sampling: On the “naivety” assumption in sampling explanations of judgment biases. Psychological Review, (), –.
Levey, A. B., & Martin, I. (). Classical conditioning of human “evaluative” responses. Behaviour Research and Therapy, (), –.
Marchewka, A., Żurawski, Ł., Jednoróg, K., & Grabowska, A. (). The Nencki Affective Picture System (NAPS): Introduction to a novel, standardized, wide-range, high-quality, realistic picture database. Behavior Research Methods, (), –.
Martin, I., & Levey, A. B. (). The evaluative response: Primitive but necessary. Behaviour Research and Therapy, (), –.
Martin, I., & Levey, A. B. (). Evaluative conditioning. Advances in Behaviour Research and Therapy, (), –.
Olson, M. A., & Fazio, R. H. (). Implicit attitude formation through classical conditioning. Psychological Science, , –.
Pearce, J. M., & Hall, G. (). A model for Pavlovian learning: Variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychological Review, , –.
Prager, J., Fiedler, K., & McCaughey, L. (). Thurstonian uncertainty in self-determined judgment and decision making. In Klaus Fiedler, Peter Juslin, &


Jerker Denrell (Eds.), Sampling in Judgment and Decision Making (pp. –). Cambridge: Cambridge University Press.
Prager, J., Krueger, J. I., & Fiedler, K. (). Towards a deeper understanding of impression formation: New insights gained from a cognitive-ecological perspective. Journal of Personality and Social Psychology, , –.
Rescorla, R. A., & Wagner, A. R. (). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. –). New York: Appleton-Century-Crofts.
Seligman, M. E. P. (). On the generality of the laws of learning. Psychological Review, , –.
Seligman, M. E. P. (). Phobias and preparedness. Behavior Therapy, (), –.
Skinner, B. F. (). The behavior of organisms: An experimental analysis. Oxford: Appleton-Century.
Sweldens, S., van Osselaer, S. M. J., & Janiszewski, C. (). Evaluative conditioning procedures and the resilience of conditioned brand attitudes. Journal of Consumer Research, , –.
Unkelbach, C., & Fiedler, K. (). Contrastive CS–US relations reverse evaluative conditioning effects. Social Cognition, , –.
Van Dessel, P., Hughes, S. J., & De Houwer, J. (). Consequence-based approach-avoidance training: A new and improved method for changing behavior. Psychological Science, (), –.
Walther, E., & Grigoriadis, S. (). Why sad people like shoes better: The influence of mood on the evaluative conditioning of consumer attitudes. Psychology & Marketing, , –.
Wiers, R. W., Eberl, C., Rinck, M., Becker, E. S., & Lindenmeyer, J. (). Retraining automatic action tendencies changes alcoholic patients’ approach bias for alcohol and improves treatment outcome. Psychological Science, , –.
Winkielman, P., & Cacioppo, J. T. (). Mind at ease puts a smile on the face: Psychophysiological evidence that processing facilitation increases positive affect. Journal of Personality and Social Psychology, , –.


 

The Dog that Didn’t Bark: Bayesian Approaches to Reasoning from Censored Data

Brett K. Hayes, Saoirse Connor Desai, Keith Ransom and Charles Kemp

Police Detective: “Is there any other point to which you would wish to draw my attention?”
Sherlock Holmes: “To the curious incident of the dog in the night-time.”
Police Detective: “The dog did nothing in the night-time.”
Sherlock Holmes: “That was the curious incident.”
Silver Blaze, Sir Arthur Conan Doyle ()

Inductive reasoning involves observing samples of evidence and inferring whether sample properties generalize to novel cases. Samples can take the form of past experience (e.g., when you visit your neighborhood café on your way to work you always get an excellent latte) or facts obtained from others (e.g., learning that kangaroos, koalas, and wombats are all Australian marsupials). A long tradition of induction research, dating back to the seminal works of Rips () and Osherson et al. (), explains the inference process in terms of the sample contents. Such work has revealed many important principles that guide inductive generalization. These include the similarity of the sample to the target population, the size of the sample, sample diversity, and the typicality of sample instances (see Feeney, , for a review). We argue that this is only part of the story. A further crucial component is knowledge of the sampling process – how the evidence in the sample was selected. Like many others (e.g., Denrell, ; Fiedler et al., ; Hogarth et al., ), we believe that much of the evidence that we observe is subject to some sort of selection mechanism or data censoring – increasing the likelihood of observing some types of instances but excluding others. Your morning visits to the local café, for example, may coincide with the work shift of the café’s most talented barista, leading to an overly positive impression of coffee quality at the venue. Likewise, the conspicuous absence of evidence from a sample can have implications for the inferences we draw, as in our opening Holmesian example.


The key question addressed in this chapter is under what circumstances people consider such selection mechanisms when making inferences from samples of evidence. Previous work has suggested that people often fail to adjust for sampling biases when learning contingencies between events and making probabilistic judgments (Fiedler, ). The research reviewed in this chapter focuses primarily on the impact of sample selection on inferences in property induction. In property induction tasks, people learn a novel property of sample instances (e.g., lions, chimps, and dolphins have property X) and are asked to judge the likelihood that the property generalizes to other instances (e.g., mammals). As will be apparent, this work leads to a somewhat more optimistic conclusion about human sensitivity to the impact of sampling mechanisms. The chapter is organized as follows. The first section reviews research on property induction where people are presented with a given sample of evidence but provided with different beliefs or “sampling assumptions” about how the evidence was generated. A key finding is that consideration of these assumptions can change the way people generalize properties from the sample. We show that this process is well described by a Bayesian model of inference that incorporates a sample selection process. The second section describes recent work that tests the boundaries of these conclusions. We examine whether sensitivity to sample selection extends to more complex types of induction (e.g., where more than one sampling mechanism needs to be considered). This section also addresses important questions for the development of process models of inference, such as whether beliefs about sampling processes impact the encoding or retrieval of sample information. We also consider how individuals may differ in their sensitivity to sampling processes in induction. In the final section, we discuss how our approach may be extended beyond property-induction tasks. We conclude with a consideration of future targets and challenges for research on reasoning with biased samples.

. The Role of Sampling Assumptions in Property Induction

The key question addressed in this section is whether people factor in sample selection when making property inferences. Two related lines of research bear on this issue. The first examines sampling processes as a form of social cognition: do people reason differently when they believe that a sample of data was selected by an intentional agent (with either helpful or deceptive motives) rather than generated randomly? The second line of research considers how inferences are impacted by “sampling frames”;




environmental mechanisms that determine which instances are included or excluded from a sample of evidence.

.. Inductive Inference as Social Cognition: Assumptions about Intentionality

One approach to examining such issues is to present identical reasoning problems to different groups but manipulate beliefs about whether the samples were selected by an intentional agent. Such studies have most frequently contrasted the effects of “strong sampling” by an agent who knows how far some interesting property extends with “weak sampling,” where sample contents are generated independently of the property in question. To illustrate the basic idea, imagine that you are playing a game in which you have to reason about a concept given a set of positive examples (e.g., , , ) and to infer whether other numbers (e.g.,  and ) are also positive examples. The three examples were either chosen as positive examples by a teacher (strong sampling) or chosen by a student who does not know the concept and are subsequently labeled as positive by the teacher (weak sampling). Strong sampling suggests that the concept is probably “powers of ” and provides a clear basis for making an inference (“ is probably a positive example but  is not”). In contrast, weak sampling supports no clear conclusion, and the underlying concept could equally well be “powers of ,” “even numbers,” or “all numbers between  and .” In the domain of property induction, several studies have examined how well-known inferential heuristics (see Hayes & Heit, , for a review) are impacted when such sampling assumptions are manipulated. One such heuristic is the “diversity principle” (e.g., Feeney & Heit, ; Liew et al., ; Osherson et al., ), whereby samples composed of relatively dissimilar items (e.g., lions and dolphins) that share some novel property are generally seen as a stronger basis for generalizing the property (e.g., to other mammals) than samples composed of similar instances (e.g., lions and tigers). A study carried out in our laboratory, however, showed that the strength of this effect depends on beliefs about how the sample was generated (Hayes, Navarro et al., ). In this study, all learners were presented with samples that varied in diversity. On some trials, participants learned that a relatively diverse sample of items (e.g., octopi, eels, trout) shared a novel biological property. On other trials, the sample of evidence was less diverse (e.g., sardines, herring, anchovies). In each case, participants were asked to


judge the likelihood that the novel property generalized to a broader category (e.g., sea creatures). Crucially, alternative explanations of the sampling process were provided to different groups. A strong-sampling group were told that the samples of evidence were deliberately chosen “to best illustrate the variety of living things that have the property.” A weak-sampling group were told that the sample items were selected via a quasi-random process (“We asked a student to open a book on plants and animals at three random pages and note the three living things they came across and whether or not those living things have the property in question”). Although diverse samples were still generally viewed as a stronger basis for property generalization, the effect of sample diversity was greatly attenuated in the weak-sampling group. This result suggests that the inference process involves a consideration of both the sample contents and the circumstances under which the sample was obtained. Such effects of strong as compared to weak sampling assumptions have been shown to have considerable generality. Learners’ assumptions about strong or weak sampling have also been shown to mediate the effects of sample size (Navarro et al., ; Ransom et al., ) and negative evidence (Voorspoels et al., ) in property inference tasks. More broadly, applying different sampling assumptions to an evidence sample can lead to shifts in epistemic trust (Mascaro & Sperber, ), changes in pragmatic implicatures (Goodman & Frank, ), and promotion or suppression of exploratory learning (Shafto et al., ). Another interesting finding from this work is that when no information about sample generation is supplied, people often make inferences that reflect a strong sampling assumption. It appears that experimental participants often make the default assumption that training stimuli have been selected by the experimenter to illustrate the features most relevant to the inference that needs to be made (cf. Medin et al., ). This work reinforces the view that inductive reasoning typically occurs in a social context. Hence, the perceived intentions of those who supply us with samples of evidence are as important as the sample contents. Of course, not all of those who present us with evidence intend to be helpful. Some may select the evidence with the goal of persuading us about the truth or falsity of a particular claim (Mercier, ; Mercier & Sperber, ) or even of deceiving us (Franke et al., ; Rhodes et al., ). So far there has been relatively little work on the effects of “deceptive” sample selection on inductive inference. Preliminary work on related tasks (e.g., Ransom et al., ), however, suggests two interesting conclusions. First, in many contexts, those tasked with deceiving others




will often employ samples of evidence that are partially consistent with the actual data (referred to as “misleading samples”) rather than samples that contradict other available evidence or that provide no relevant information. Second, the success of this deception is amplified if the person receiving the information believes that it comes from a helpful source (e.g., someone who is believed to be a member of the same team in a competitive game).

.. Sampling Frames: When Is Absence of Evidence, Evidence of Absence?

Although communicative and social intentions play an important role in sample selection, in other cases it is simply the structure of the learning environment that produces biased samples. Consider, for example, a human resources manager who wants to identify the positive attributes of potential employees that should be targeted in future hiring. One strategy the employer might follow would be to examine the attributes of past or current employees who have performed well. Although this strategy has some appeal, some caution is needed. The inferences based on the current sample may well be affected by limitations on the original sample selection process; it may have been subject to gender, age, or locality biases that skewed the sample composition. Borrowing from the statistics literature, we refer to such environmental constraints as “sampling frames” (cf. Jessen, ). There is a wide variety of such frames – they could reflect biases that lead to the selection of instances with certain attributes, or they could reflect resource limitations (e.g., there was insufficient time to obtain a large and representative sample). We recommend David Hand’s book Dark Data () for an extensive catalogue of frames that constrain sample selection in both everyday reasoning and in scientific inference. A common consequence of such frames is that they lead some types of instances to be included in a sample but exclude others. A key question for research on human reasoning is to determine how much consideration is given to these constraints when drawing inferences from the sample. In particular, do people see the absence of certain types of evidence as consequential for inference, as in the case of our titular dog? Some researchers (e.g., Hearst, ; Kahneman, ) have expressed skepticism about our ability to use such absent evidence to draw meaningful conclusions. Hearst (), for example, suggests that “human beings and other animals have trouble using the mere absence of something as a basis for efficient and appropriate processing of information” (pp. –).


We (Hayes et al., , ) and others (Hsu et al., ; Lawson & Kalish, ) have studied this issue by examining whether people’s sample-based inferences depend on their knowledge of frames that constrain sample selection. In the paradigm employed in many of our studies, participants are asked to imagine that they are exploring an extraterrestrial planet, searching for rocks that contain the valuable mineral “plaxium.” Prior to sampling, they are informed that there is a wide distribution of rock sizes on the planet. In the subsequent sampling phase, however, they see only samples of small plaxium-positive rocks. After viewing the sample, participants are asked to infer whether plaxium is likely to be found in other rocks of varying sizes. Crucially, prior to the sampling phase, we vary instructions about frame constraints on sample selection. Those in a category frame condition are told that only certain types of rocks could be sampled (e.g., because the robot that collected them had a small claw, only small rocks could be collected). Those in a property frame condition, however, are told that the sample was collected on the basis of a positive screening test for the novel property (e.g., the robot had a plaxium-detecting camera and collected any samples that showed traces of the mineral). The different frames provide different explanations for the absence of larger rocks from the sample. In the category condition, this absence is entirely attributable to the frame; hence, the absence of large rocks from the sample is not informative regarding property generalization. The property frame, however, does permit large rocks to be sampled. The fact that they do not appear is therefore informative – it suggests that plaxium may be a property of small rocks only. In this case, the absence of certain types of evidence from the sample is evidence of absence. To test these predictions, a recent study presented different groups with either a category or a property frame. Each group then observed the same sample of small rocks with plaxium. Finally, participants rated the likelihood that the target property generalized to other rocks of varying sizes (see Figure .). As in most previous studies of human inference, property generalization in both frame conditions was a decreasing function of the similarity between the test items and the observed sample (cf. Osherson et al., ). Crucially, however, sampling frames affected these generalization gradients. As predicted, those in the property sampling group were less likely to generalize to large rocks than those in the category sampling condition. Hayes et al. () showed that the magnitude of this “frames effect” depends on the size of the sample observed under the relevant frame. Increasing the size of the observed sample leads to a further “tightening” or


[Figure: line plot of mean generalization ratings (1–10) for test items S1, S2, and T1–T4, comparing the Category Frame and Property Frame conditions.]

Figure . Property generalization ratings demonstrating the “frames effect.” Test items S1–S2 were small rocks drawn from the training sample. T1–T4 are novel rocks of increasing size. A Bayesian mixed-model analysis of variance (N = ) found strong evidence of an effect of sampling frame, BF = .

reduction in generalization under property as compared with category sampling. This result makes intuitive sense. Under category sampling, seeing more data of the same type is uninformative. Under property sampling, discovering that additional instances with a novel property belong to the same category (e.g., small rocks) is further evidence that the property is restricted to this category. In the next section we show that these intuitions can be formalized in a Bayesian model of inference.

.. A Bayesian Model of Inference with Biased Samples

Bayesian models of human inference and decision making are not new – they date back at least to the seminal work of Edwards () and Peterson and Beach (). In the past two decades, such models have undergone something of a renaissance, being applied to a wide variety of “high-level” cognition tasks including deductive reasoning (Oaksford & Chater, ), causal inference (Griffiths & Tenenbaum, ), categorization (Sanborn et al., ), social cognition (Baker et al., ), and decision making


(Vul et al., ). It is only relatively recently however, that such models have addressed the question of how human inference is affected by mechanisms that affect sample selection. Bayesian models assume that human inference can be described as a form of statistical inference. The learner approaches a property induction task with a prior distribution of possible hypotheses P(h) about how far a property p extends (e.g., only small rocks have plaxium, small and medium rocks have plaxium, all rocks have plaxium). They observe whether the property is present in sample instances x and apply a theory about the world that specifies the probability P(x|h) of observing data x if hypothesis h were true. Tenenbaum and Griffiths () highlighted the fact that this Bayesian approach naturally incorporates the learner’s beliefs about the sampling process. Specifically, they suggested that different likelihood functions apply depending on whether observed data was sampled via weak or strong sampling. Notably, under strong sampling, a “size principle” applies such that smaller or more specific hypotheses that are consistent with a given sample of data are generally more probable than broader hypotheses. In our earlier number game example, the hypotheses “powers of two,” and “even numbers” both fit the data. The size principle captures the intuition that the narrower hypothesis seems more likely given this particular set of numbers. Similar to the approach taken with weak and strong sampling, we assume that the probability of observing sample data x given hypothesis h is dependent on a frame or censoring function s, which allows some types of instances to be observed in the sample but excludes others (Hayes et al., ). Hence, after observing the sample, the posterior probability of a hypothesis about property extension, can be represented by Equation .. P ðhjx; s Þ / P ðxjh; s ÞP ðhÞ

P(h | x, s) ∝ P(x | h, s) P(h)    (7.1)
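To make the role of the likelihood term concrete, the following minimal sketch (ours, not part of the original chapter; the hypothesis sets and example numbers are invented purely for illustration) applies Equation (7.1) to a number-game-style task, contrasting a strong-sampling likelihood that embodies the size principle with a weak-sampling likelihood that merely checks consistency.

```python
# Toy illustration (ours; hypotheses and data are invented for
# demonstration) of how the likelihood in Equation (7.1) carries the
# sampling assumption. Under strong sampling each consistent
# observation contributes P(x|h) = 1/|h| (the "size principle");
# under weak sampling any consistent hypothesis fits equally well.

hypotheses = {
    "powers of two": {2, 4, 8, 16, 32, 64},
    "even numbers": set(range(2, 101, 2)),
}
data = [2, 8, 16]  # positive examples consistent with both hypotheses

def posterior(sampling):
    scores = {}
    for name, h in hypotheses.items():
        consistent = all(x in h for x in data)
        if not consistent:
            like = 0.0
        elif sampling == "strong":
            like = (1.0 / len(h)) ** len(data)
        else:  # weak sampling: consistency is all that matters
            like = 1.0
        scores[name] = 0.5 * like  # uniform prior
    z = sum(scores.values())
    return {name: s / z for name, s in scores.items()}

print(posterior("strong"))  # powers of two ~0.998, even numbers ~0.002
print(posterior("weak"))    # both 0.5: the data cannot discriminate
```

Under strong sampling the narrow hypothesis absorbs nearly all of the posterior mass after only three observations, whereas under weak sampling the same data cannot discriminate between the two hypotheses – the formal counterpart of the intuitions described above.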

To capture the effects of different sampling frames, we apply different likelihood functions. Under category sampling, the fact that all observations belong to the same category (e.g., small rocks) is of no evidentiary value for hypotheses about property extension to other categories, because of the sampling frame. This does not change as additional category members are observed in the sample. However, in this sampling regime we need to explain why all sample observations are positive. We assume that the

We present a simplified formal version of our model. More complex versions incorporate a function learning mechanism that makes more precise predictions about property generalization over continuous dimensions (Hayes et al., ).




probability that a sample instance x from the relevant category has the property is θ (e.g., the probability that a small rock has plaxium), with this value assumed to be greater than .5. The resulting likelihood function for category sampling is given in Equation (7.2).

P(x | h, s_CATEGORY) = θ          if x ∈ h
                       (1 − θ)    otherwise    (7.2)

Under property sampling, however, the pattern is reversed: the sampling frame means that only observations with the target property p can be observed, and no explanation for this is required. Instead, what the learner must explain is the fact that all observed items belong to the same category. Assuming the same value of θ, the likelihood function can be described by Equation (7.3), where |h| represents the “size” of the target hypothesis. In this case, for a hypothesis that predicts m categories to have the target property (“plaxium positive”) and n categories to lack the property (“plaxium negative”), |h| is given by Equation (7.4). Larger values of |h| reflect hypotheses that many categories (e.g., small, medium, and large rocks) share the target property; small values of |h| reflect hypotheses that the property is restricted to a subset of these.

P(x | h, s_PROPERTY) ∝ θ / |h|          if x ∈ h
                       (1 − θ) / |h|    otherwise    (7.3)

|h| = θm + (1 − θ)n    (7.4)

This model predicts the differences in generalization under the respective frames shown in Figure .. It also predicts that, under property sampling, further tightening of generalization will take place as more instances belonging to a single category are observed (see Hayes et al., ; Ransom et al., , for further details). An extended version of the model (Hayes et al., ) leads to other predictions that have been confirmed in empirical studies. For example, the model predicts that the frames effect will be modulated by knowledge of the base rates of the instances observed during sampling (e.g., small rocks) and the instances that are missing from the sample (e.g., large rocks). If instances of the unobserved category are rare, then, under property sampling, this rarity provides an alternative explanation for their absence. The result is an attenuation of the differences in generalization between category and property sampling frames.
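For readers who prefer running code to equations, the following self-contained sketch (ours, not the authors’ published implementation; the three-category hypothesis space, the uniform prior, and θ = 0.9 are assumptions made for illustration) implements Equations (7.1)–(7.4) and reproduces the two qualitative predictions just described: no change under category sampling, and progressive tightening under property sampling.

```python
# Minimal runnable sketch of Equations (7.1)-(7.4) (ours, not the
# authors' published code). Assumptions for illustration: three size
# categories, nested hypotheses about property extension, theta = 0.9,
# a uniform prior, and all observations being property-positive
# "small" rocks.

THETA = 0.9
CATEGORIES = ["small", "medium", "large"]
HYPOTHESES = [{"small"}, {"small", "medium"}, {"small", "medium", "large"}]

def size(h):
    # Equation (7.4): |h| = theta*m + (1 - theta)*n
    m = len(h)
    n = len(CATEGORIES) - m
    return THETA * m + (1 - THETA) * n

def likelihood(x, h, frame):
    # Equation (7.2) for the category frame; Equation (7.3) for the
    # property frame (up to a constant that cancels in the posterior).
    base = THETA if x in h else (1 - THETA)
    return base / size(h) if frame == "property" else base

def p_generalize(target, n_obs, frame):
    # Equation (7.1): posterior over hypotheses after n_obs positive
    # "small" observations, then the probability that `target` rocks
    # share the property.
    scores = [likelihood("small", h, frame) ** n_obs for h in HYPOTHESES]
    z = sum(scores)
    return sum(s / z for s, h in zip(scores, HYPOTHESES) if target in h)

for n_obs in (2, 10):
    print(n_obs,
          round(p_generalize("large", n_obs, "category"), 3),
          round(p_generalize("large", n_obs, "property"), 3))
# Generalization to "large" stays at its prior level (0.333) under the
# category frame, but tightens (0.111 -> 0.0) as the sample grows under
# the property frame.
```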


A further model prediction concerns the effects of observing negative evidence (e.g., large rocks with no plaxium). If such evidence is observed after initial observation of property-positive instances (e.g., small rocks with plaxium), the model correctly predicts that there will be more tightening of generalization under category sampling than under property sampling (Hayes et al., ). Overall, Hayes et al. () found a correlation of over . between the Bayesian model predictions and the property generalization results of their experiments.

. Testing the Boundaries of the Bayesian Approach

The previous section highlights the success of the Bayesian approach in modeling how people make inferences under various types of selective sampling. So far, this work suggests that people generally conform to Bayesian principles, giving due consideration to sampling mechanisms when judging how far a property generalizes. However, given the many published cases of failure to adjust for biased sampling in probabilistic inference (e.g., Fiedler et al., , ; Hamill et al., ), it is important to test the boundaries of the Bayesian approach. This section describes recent work that attempts to do this in two ways. First, we add elements to the sampling frames paradigm that increase the complexity of the task. This represents a modest initial step toward simulating the complexity of the inference process faced by reasoners outside the laboratory. Second, we examine more fine-grained aspects of inference based on selective sampling that are not currently part of the Bayesian model – this work examines when sensitivity to sampling frames arises and who shows such sensitivity.

.. Inferences in More Complex Sampling Environments

A question that arises from the earlier work on sampling frames is whether people will show sensitivity to frames in an inference task that more closely resembles conditions outside the laboratory. We therefore carried out an experiment that followed the structure of previous sampling frames studies but used a more concrete and realistic scenario involving forensic investigation. Participants were asked to imagine that they were investigators examining the extent of environmental contamination following the leakage of a dangerous chemical (“Chemical X”) from a factory (see Figure .A). They observed evidence of contamination under one of two sampling conditions.




In the category sampling condition, only inhabitants of Town A, which was in close proximity to the factory, were sampled. All of those sampled showed evidence of chemical contamination. In the property sampling condition, an identical sample of contaminated individuals from Town A was observed. In this case, however, the frame instructions indicated that towns at all distances from the factory could have been sampled and that those included in the observed sample were “people who already showed signs of exposure to Chemical X.” The question was whether those given different frames would differ in their inferences about whether more distal towns were likely to have been contaminated. Figure .B shows that the frames did affect generalization in a similar way to our earlier studies. For those in the property frame condition, the absence of people from outside Town A in the sample was conspicuous, leading them to be less likely to infer contamination of more remote towns. The contamination study suggests that people understand the different implications of category and property frames and apply this insight when making inferences across a range of contexts. In everyday reasoning, however, people may have to deal with more than one mechanism that affects the sample selection process. For example, a person moving to a new city and searching online for information about available apartments may begin filtering their search based on location, and then switch to a different filter based on price. Because the search criteria are not perfectly correlated, each search will produce somewhat different samples, which may license different conclusions about the actual distribution of available accommodation. The question arises whether people remain sensitive to the implications of sampling frames for generalization when they have to shift between frames. To examine this issue, we carried out a “frame switching” study in which all participants first observed a sample under category sampling and then a similar sample under either category or property sampling. The cover story resembled the earlier extraterrestrial exploration scenario except that, in this case, participants observed a sample of ten rocks collected by the robot that could collect only small rocks, and then another ten rocks collected either by the same robot (no-switch condition) or by the robot with the plaxium-sensitive camera (switch condition). Recall that under property but not category sampling, observing a sample of small rocks should lead to tightening of property generalization. The question is whether the switch group would show this effect after being exposed to the less informative category frame.


Figure . The generalization space (A) and property generalization results (B) for the environmental contamination study with category and property framing. A Bayesian analysis of variance (N ¼ ) found strong evidence of an effect of sampling frames, BF ¼ ..




[Figure: line plot of mean generalization ratings (1–10) for test items S1, S2, and T1–T4, comparing the Category-Category (No Switch) and Category-Property (Switch) conditions.]

Figure . shows the results for a property generalization test carried out after both samples had been observed. The figures show that the switch group showed tighter generalization than the no-switch group, although the Bayes factor indicates that the size of this effect was smaller than in previous studies. Nevertheless, it appears that at least some people are capable of switching between frames – when switching from a less informative to a more informative sample, people update their intuitions about property generalization accordingly. .. Sensitivity to Sampling Frames: When and Who? The studies reviewed so far have shown that intuitions about generalization from samples collected under different frames often conform to Bayesian predictions. In this section we describe recent work which highlights the need for further theory development – dealing with issues that fall outside the traditional purview of Bayesian models. The first of these concerns the timing of information about sampling frames and the

https://doi.org/10.1017/9781009002042.009 Published online by Cambridge University Press



 . ,    .

resulting effects on property generalization. The Bayesian model described earlier implements frames as different likelihood functions but says nothing about when these are applied during the inference process. At least two alternative accounts are possible. If the relevant frame is presented before observing sample data (as in all studies described so far) this could affect the way that data is encoded. A property sampling frame, for example, could trigger a hypothesis-testing mechanism such that as each sample instance is observed it is used as evidence to evaluate rival hypotheses about property extension, with such evidence accumulating over sampling trials. Under the Bayesian model however, it is also possible that the likelihood function representing the frame could be applied after the data are observed but before generalization judgments are made – that is, the frame could affect data retrieval. These alternatives (which are not mutually exclusive) correspond roughly to the distinction suggested by Fiedler et al. () between “monitoring” and “control” of sample biases. In the former case, adjustments for sample biases occur online during learning. In the latter case, such adjustment is retrospective, taking place after the data have been observed. Ransom et al. () examined this question by varying the timing of sample frames vis-á-vis the observed sample. After observing a small sample of data without any frame, different groups were presented with either a category or property frame. Frames were presented before or after participants observed additional sample instances. Subsequent generalization judgments in the frames-before conditions replicated the effect found in previous studies – with tighter generalization in the property frame as compared with the category frame condition. In the frames-after conditions however, there was no clear evidence of generalization differences between category and property frames. This does not necessarily mean that people are incapable of retrospectively adjusting for biases (cf. Ecker et al., ). It does suggest however, that awareness of such biases before learning takes place is likely to have a more profound effect on generalization judgments. These results highlight the need for further extension of the Bayesian model. They suggest that the processing that corresponds to application of Bayesian likelihoods primarily affects the encoding of sample instances and that accumulation of evidence for rival hypotheses operates throughout the sampling process. All of the sampling frames results discussed so far were based on analyses of generalization judgments made by groups presented with different frames. But it is reasonable to ask what sensitivity to sampling frames looks like at the individual level. Moreover, if individual differences in
Moreover, if individual differences in sensitivity to frames exist, what other variables might predict these differences? Again, these are issues that lie outside the remit of standard Bayesian models of cognition, including our model of sampling frames. Such models often assume some form of Bayesian reasoning is universal, and rarely consider the reasons why some individuals may deviate from Bayesian norms (but see Hemmer et al., ; Zhu et al.,  for exceptions). To examine individual differences in sensitivity to sampling frames, we modified our induction task so that participants made judgments under both category and property sampling frames (Hayes et al., ). Unlike the earlier “switch” study, each learner was asked to apply category and property sampling frames to different scenarios (e.g., sampling extraterrestrial rocks, sampling animals on an unexplored island), with different target properties attached to each sample. Allocation of scenarios to frames and order of presentation of category and property blocks was counterbalanced across participants, with separate generalization tests after each block. Generalization ratings to novel items in the category and property blocks were averaged over scenario and order. We measured individual sensitivity to sampling frames by subtracting mean generalization ratings for novel items in the property block from corresponding ratings in the category block. A positive score indicates greater tightening of generalization under the property frame as compared with the category frame. The group data (N = ) in this experiment replicated the frames effects found in previous studies, with tighter generalization to novel items under property as compared with category sampling. However, sensitivity to the implications of sampling frames showed considerable individual variation (category-property difference score: median = .; range =  to ). While many individuals were quite sensitive, a substantial proportion showed little or no sensitivity to the frames. This impression was reinforced by a cluster analysis which produced two reliable clusters; the largest contained individuals who showed only weak sensitivity to frames effects (median category-property difference = .;  percent of the sample); those in the smaller cluster showed strong sensitivity to frames (median category-property difference = .;  percent of the sample). These individual data suggest that the group-level differences in generalization following category or property sampling may sometimes be “carried” by the responses of fewer than half the individuals sampled. Clearly, it would be useful to know what factors predict such marked individual differences. As an initial step toward this goal, we asked our participants to complete tests of working memory capacity (Lewandowsky
et al., ) and the three-item version of the Cognitive Reflection Test (CRT, Frederick, ). We found a positive relationship between working memory capacity and frames sensitivity – those who showed more sensitivity to sampling frames had significantly higher composite scores on the working memory battery than those with low sensitivity. There was a weaker positive relationship between CRT performance and frames sensitivity, which disappeared when we controlled for shared variance between CRT scores and working memory performance. Clearly, further work is necessary to clarify the sources of such individual differences in consideration of sampling constraints when making inferences. A related question is the stability of such differences. Can those who do not spontaneously factor in sampling constraints be trained to do so? As noted, these findings also point to the need for further development of the Bayesian model of inference. Rather than assuming that people share the same priors about property extension, and always modulate their likelihoods as a function of the relevant frame, it may be better to see these as parameters that vary between individuals (see Navarro et al.,  for a related approach that addresses individual differences in responding to weak and strong sampling). Future work could then examine the extent to which experimental manipulations affect the values of these parameters.

. Future Directions

The work we have reviewed shows that when people are provided with explicit information about sample generation or constraints on sample selection, many (but not all) factor these into their inferences. Moreover, many do so in a manner consistent with a Bayesian model that incorporates a mechanism for data censoring. In some respects, our work complements previous research showing that people go beyond experienced information in learning and decision making. In particular, when environmental feedback about the success of one’s decisions is missing or withheld, people actively construct such feedback, which in turn guides future decision making (Elwin et al., ; Henriksson et al., ). However, as noted in the introductory comments, there are clearly cases both inside and outside the laboratory where people fail to take adequate account of sampling biases (e.g., Fiedler, ; Feiler et al., ; Hamill et al., ). The discrepancies between such results and those reviewed in this chapter need to be examined carefully. One important issue to resolve in future work is just how much explanation of a selective sampling process
is required for people to factor this into their judgments. Previous work has shown that when people are given no explicit cue or warning about the possibility of sampling bias, then such bias is ignored (e.g., Koehler & Mercer, ). Likewise, some types of warnings or explanations (e.g., just telling people that a sample contains information that is biased) have been shown to be ineffective in getting people to adjust for sampling biases (e.g., Hamill et al., ; Ross et al., ) or ignore data that is known to be misleading (Lewandowsky et al., ). By contrast, the studies reviewed here used relatively simple causal explanations for sample selection, involving social or physical mechanisms that would be familiar to most participants (e.g., the size of a claw limits what objects can be collected). A fruitful avenue for future work would be to examine what types of explanations for data censoring people find easiest to comprehend (cf. Lombrozo, ), and whether such transparency predicts whether people adjust for sampling bias in inferences and judgments. Another potentially important factor that differentiates the current work from much previous research on adjustment to sampling biases involves the role of the learner in sampling. In the current work, learners passively observed the same data sample, but applied different beliefs about the mechanisms for data selection. In contrast, much previous work has involved situations where learners make active decisions about which instances they want to sample (e.g., Fiedler et al., ; Le Mens & Denrell, ) – so that learners are exposed to different samples of evidence. For example, Fiedler et al. () describe several studies based on the well-known mammogram problem (Eddy, ; Hayes et al., ), where the goal is to estimate the conditional probability p(cancer | positive mammogram), given information about the base rate of cancer and the conditional probabilities p(positive mammogram | cancer) and p(positive mammogram | no cancer). From a sampling point of view, a good way to address this problem is to follow a strategy of criterion sampling, where you take a sample of women who have a positive mammogram and examine the prevalence of cancer in these cases. In contrast, predictor sampling, which involves sampling cases conditionalized on having cancer and tabulating positive mammograms, should lead to biased estimates. Fiedler et al. () confirmed these predictions – with those who used predictor sampling making inflated estimates of cancer probability. Unlike the current work, the respective sampling strategies in Fiedler et al. () meant that participants were exposed to different data samples and those doing predictor sampling had to adjust for biases after the sample was observed. The failure to do so effectively is consistent with
the Ransom et al. () studies which showed that people have more difficulty in applying sampling assumptions retrospectively. That said, it is not immediately clear how the Bayesian model described earlier could be adapted to explain inferences based on different samples of evidence that also involve the interaction of multiple causal variables (e.g., cancer, mammograms) subject to censoring under criterion or predictor sampling. Recently, though, we have begun to explore how a related formal framework involving causal Bayes nets (cf. Pearl, ; Sloman, ) can be used to understand these more complex examples of sampling bias. Bayes nets are graphical models that specify probabilistic relations between causes and effects, represented as nodes in the network. When at least some information about the relevant probabilities is known (e.g., the base rate of the cancer node, and the likelihood of a positive mammogram given cancer), Bayesian inference can be used to derive other relations in the network. Traditionally, Bayes net approaches have not focused on the problem of selection bias. Building on recent developments in Bayesian modeling (Bareinboim et al., ; Rohrer, ), however, we have begun to develop Bayes net models that incorporate data-censoring mechanisms that only permit sample observations based on a selected value of a network node (e.g., you can only observe cases with cancer or with a positive mammogram). A potential strength of this approach is the breadth of reasoning problems that can be modeled. As well as the mammogram problem, Bayes nets can be used to represent problems such as Berkson’s Paradox (Pearl, ). The paradox arises when two variables A (e.g., high school grades) and B (e.g., exceptional musical talent), that are independent in a population, are found to be correlated in a subset of the data. This can occur when the sample data is selected on the basis of values of either variable A or variable B (but not both). Imagine, for example, that a college admits students based on good grades or outstanding musical talent. Among those who have been accepted, grades and musical talent are likely to be negatively correlated; students admitted because of their grades are not likely to be those with the best musical ability (and vice versa). The paradox can be modeled as a common effect network with the two selection criteria as causes and censoring on the effect node (only students admitted to the college are observed). It is also relatively easy to construct Bayes nets that represent category and property sampling. These can be captured in a net with two nodes: the category (rock type) and the property (plaxium). In category sampling the censoring mechanism is applied to the category node (so only small rocks
can be observed). In property sampling the censoring mechanism is applied to the property node (only plaxium-positive instances can be observed). Our aim is to use this “censored Bayes net” approach as a framework for representing both simpler and more complex inferences involving selective sampling. In both cases, the censored Bayes net will allow us to derive normative predictions about what should be inferred from a given sample. It should also provide a basis to examine why such normative responses are more common for some types of reasoning than others and why such responding varies markedly between individuals.
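The selection effect behind Berkson’s Paradox is easy to verify by simulation. The sketch below is a hedged illustration of censoring on a common-effect node, not our modeling framework itself; the admission thresholds are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
grades = rng.normal(size=n)   # independent in the full population
music = rng.normal(size=n)
print(f"population r = {np.corrcoef(grades, music)[0, 1]:+.3f}")   # close to 0

# Common-effect ("collider") structure: admission depends on either cause.
# Censoring on the effect node means only admitted students are observed.
admitted = (grades > 1.0) | (music > 1.0)
g, m = grades[admitted], music[admitted]
print(f"admitted   r = {np.corrcoef(g, m)[0, 1]:+.3f}")   # clearly negative
```

The same censoring logic, applied instead to the category or the property node of the two-node net described above, yields category and property sampling.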

. Conclusions

Most of the samples that we use as a basis for inference are subject to some sort of selection mechanism. This may involve the actions of an intentional agent who chooses what sample data we see or environmental constraints that filter sample contents. Our work shows that many people are sensitive to the implications of these selection mechanisms when making inferences about how far a property generalizes. We have also shown that a Bayesian model of inference with a selective sampling mechanism provides a good account of group-level judgments in such tasks. Future model development, however, will require adding mechanisms that accommodate important recent discoveries. Adjustment of one’s inferences in response to sampling constraints depends on the availability of working memory resources. Such adjustment primarily involves a change in the way that sample instances are encoded and represented during learning. The other challenge for our Bayesian approach is one of scale. So far, this approach has been successful in explaining inferences involving relatively simple causal structures. The next step is to expand the approach so that it encompasses inference over more complex structures, where the data generated from multiple cause and effect variables may be subject to selective sampling.

REFERENCES

Baker, C. L., Jara-Ettinger, J., Saxe, R., & Tenenbaum, J. B. (). Rational quantitative attribution of beliefs, desires and percepts in human mentalizing. Nature Human Behaviour, , . https://doi.org/./s–-
Bareinboim, E., Tian, J., & Pearl, J. (). Recovering from selection bias in causal and statistical inference. In Proceedings of the twenty-eighth AAAI conference on artificial intelligence, pp. –.
Denrell, J. (). Why most people disapprove of me: Experience sampling in impression formation. Psychological Review, (), –. https://doi.org/./-X...
Ecker, U. K., Lewandowsky, S., & Tang, D. T. (). Explicit warnings reduce but do not eliminate the continued influence of misinformation. Memory & Cognition, (), –.
Eddy, D. (). Probabilistic reasoning in clinical medicine: Problems and opportunities. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases (pp. –). Cambridge: Cambridge University Press. https://doi.org/./CBO.
Edwards, W. (). Bayesian and regression models of human information processing: A myopic perspective. Organizational Behavior and Human Performance, (), –.
Elwin, E., Juslin, P., Olsson, H., & Enkvist, T. (). Constructivist coding: Learning from selective feedback. Psychological Science, (), –.
Feeney, A. (). Forty years of progress on category-based inductive reasoning. In L. J. Ball & V. A. Thompson (Eds.), The Routledge international handbook of thinking and reasoning (pp. –). New York: Routledge/Taylor & Francis.
Feeney, A., & Heit, E. (). Properties of the diversity effect in category-based inductive reasoning. Thinking & Reasoning, (), –. https://doi.org/./..
Feiler, D. C., Tong, J. D., & Larrick, R. P. (). Biased judgment in censored environments. Management Science, (), –.
Fiedler, K. (). Meta-cognitive myopia and the dilemmas of inductive-statistical inference. In B. Ross (Ed.), The Psychology of Learning and Motivation (Vol. , pp. –). San Diego: Elsevier. https://doi.org/./B–---.-
Fiedler, K., Ackerman, R., & Scarampi, C. (). Metacognition: Monitoring and controlling one’s own knowledge, reasoning and decisions. In R. J. Sternberg & J. Funke (Eds.), The psychology of human thought: An introduction (pp. –). Heidelberg: Heidelberg University. https://doi.org/./heiup..c
Fiedler, K., Brinkmann, B., Betsch, T., & Wild, B. (). A sampling approach to biases in conditional probability judgments: Beyond base rate neglect and statistical format. Journal of Experimental Psychology: General, (), –. https://doi.org/./-...
Fiedler, K., Hütter, M., Schott, M., & Kutzner, F. (). Metacognitive myopia and the overutilization of misleading advice. Journal of Behavioral Decision Making, (), –. https://doi.org/./bdm.
Franke, M., Dulcinati, G., & Pouscoulous, N. (). Strategies of deception: Under-informativity, uninformativity, and lies: Misleading with different kinds of implicature. Topics in Cognitive Science, (), –. https://doi.org/./tops.
Frederick, S. (). Cognitive reflection and decision making. Journal of Economic Perspectives, (), –. https://doi.org/./
Goodman, N. D., & Frank, M. C. (). Pragmatic language interpretation as probabilistic inference. Trends in Cognitive Sciences, (), –. https://doi.org/./j.tics...
Griffiths, T. L., & Tenenbaum, J. B. (). Structure and strength in causal induction. Cognitive Psychology, (), –. https://doi.org/./j.cogpsych...
Hamill, R., Wilson, T. D., & Nisbett, R. E. (). Insensitivity to sample bias: Generalizing from atypical cases. Journal of Personality and Social Psychology, (), –. https://doi.org/./-...
Hand, D. J. (). Dark data. Princeton, NJ: Princeton University Press.
Hayes, B. K., Banner, S., Forrester, S., & Navarro, D. J. (). Selective sampling and inductive inference: Drawing inferences based on observed and missing evidence. Cognitive Psychology, , Article . https://doi.org/./j.cogpsych...
Hayes, B. K., Banner, S., & Navarro, D. J. (). Sampling frames, Bayesian inference and inductive reasoning. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the th annual meeting of the Cognitive Science Society (pp. –).
Hayes, B. K., Hawkins, G. E., Newell, B. R., Pasqualino, M., & Rehder, B. (). The role of causal models in multiple judgments under uncertainty. Cognition, (), –. https://doi.org/./j.cognition...
Hayes, B. K., & Heit, E. (). Inductive reasoning .. Wiley Interdisciplinary Reviews: Cognitive Science, (), –, e. https://doi.org/./wcs.
Hayes, B. K., Navarro, D. J., Stephens, R. G., Ransom, K. J., & Dilevski, N. (). The diversity effect in inductive reasoning depends on sampling assumptions. Psychonomic Bulletin & Review, (), –. https://doi.org/./s–--
Hayes, B. K., Wen, Y. Y., Connor Desai, S., & Navarro, D. J. (). Who is sensitive to selection biases in inductive reasoning? Journal of Experimental Psychology: Learning, Memory, and Cognition. Advance online publication. https://doi.org/./xlm.
Hearst, E. (). Psychology and nothing. American Scientist, (), –.
Hemmer, P., Tauber, S., & Steyvers, M. (). Moving beyond qualitative evaluations of Bayesian models of cognition. Psychonomic Bulletin & Review, (), –. https://doi.org/./s–--z
Henriksson, M. P., Elwin, E., & Juslin, P. (). What is coded into memory in the absence of outcome feedback? Journal of Experimental Psychology: Learning, Memory, and Cognition, (), –.
Hogarth, R., Lejarraga, T., & Soyer, E. (). The two settings of kind and wicked learning environments. Current Directions in Psychological Science, , –.
Hsu, A. S., Horng, A., Griffiths, T. L., & Chater, N. (). When absence of evidence is evidence of absence: Rational inferences from absent data. Cognitive Science, (Suppl ), –. https://doi.org/./cogs.
Jessen, R. J. (). Statistical survey techniques. New York: Wiley.
Kahneman, D. (). Thinking, fast and slow. New York: Macmillan.
Koehler, J., & Mercer, M. (). Selection neglect in mutual fund advertisements. Management Science, (), –.
Lawson, C. A., & Kalish, C. W. (). Sample selection and inductive generalization. Memory & Cognition, (), –. https://doi.org/./MC...
Le Mens, G., & Denrell, J. (). Rational learning and information sampling. Psychological Review, , –.
Lewandowsky, S., Ecker, U. K., Seifert, C. M., Schwarz, N., & Cook, J. (). Misinformation and its correction: Continued influence and successful debiasing. Psychological Science in the Public Interest, (), –.
Lewandowsky, S., Oberauer, K., Yang, L.-X., & Ecker, U. K. H. (). A working memory test battery for MATLAB. Behavior Research Methods, (), –. https://doi.org/./BRM...
Liew, J., Grisham, J. R., & Hayes, B. K. (). Inductive and deductive reasoning in obsessive-compulsive disorder. Journal of Behavior Therapy and Experimental Psychiatry, , –. https://doi.org/./j.jbtep...
Lombrozo, T. (). The structure and function of explanations. Trends in Cognitive Sciences, (), –.
Mascaro, O., & Sperber, D. (). The moral, epistemic, and mindreading components of children’s vigilance towards deception. Cognition, (), –. https://doi.org/./j.cognition...
Medin, D. L., Coley, J., Storms, G., & Hayes, B. K. (). A relevance theory of induction. Psychonomic Bulletin & Review, , –.
Mercier, H. (). The argumentative theory: Predictions and empirical evidence. Trends in Cognitive Sciences, (), –. https://doi.org/./j.tics...
Mercier, H., & Sperber, D. (). Why do humans reason? Arguments for an argumentative theory. Behavioral and Brain Sciences, (), –. https://doi.org/./SX
Navarro, D., Dry, M., & Lee, M. (). Sampling assumptions in inductive generalization. Cognitive Science, (), –. https://doi.org/./j.-...x
Oaksford, M., & Chater, N. (). New paradigms in the psychology of reasoning. Annual Review of Psychology, , –. https://doi.org/./annurev-psych--
Osherson, D. N., Smith, E. E., Wilkie, O., Lopez, A., & Shafir, E. (). Category-based induction. Psychological Review, (), –. https://doi.org/./-X...
Pearl, J. (). Causality: Models, reasoning and inferences. San Francisco, CA: Morgan Kaufman.
Peterson, C. R., & Beach, L. R. (). Man as an intuitive statistician. Psychological Bulletin, (), –. https://doi.org/./h
Ransom, K., Perfors, A., Hayes, B. K., & Connor Desai, S. (). What do our sampling assumptions affect: How we encode data or how we reason from it? Journal of Experimental Psychology: Learning, Memory and Cognition. Advance online publication. https://doi.org/./xlm
Ransom, K., Perfors, A., & Navarro, D. (). Leaping to conclusions: Why premise relevance affects argument strength. Cognitive Science, (), –. https://doi.org/./cogs.
Ransom, K., Voorspoels, W., Perfors, A., & Navarro, D. (). A cognitive analysis of deception without lying. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. J. Davelaar (Eds.), Proceedings of the th annual conference of the Cognitive Science Society (pp. –). Austin, TX: Cognitive Science Society.
Rhodes, M., Bonawitz, E., Shafto, P., Chen, A., & Caglar, L. (). Controlling the message: Preschoolers’ use of information to teach and deceive others. Frontiers in Psychology, , Article . https://doi.org/./fpsyg..
Rips, L. J. (). Inductive judgments about natural categories. Journal of Verbal Learning & Verbal Behavior, (), –. https://doi.org/./S-()-
Rohrer, J. M. (). Thinking clearly about correlations and causation: Graphical causal models for observational data. Advances in Methods and Practices in Psychological Science, (), –. https://doi.org/./
Ross, L., Lepper, M. R., & Hubbard, M. (). Perseverance in self-perception and social perception: Biased attribution processes in the debriefing paradigm. Journal of Personality and Social Psychology, , –.
Sanborn, A. N., Griffiths, T. L., & Navarro, D. J. (). Rational approximations to rational models: Alternative algorithms for category learning. Psychological Review, (), –. https://doi.org/./a
Shafto, P., Goodman, N. D., & Frank, M. C. (). Learning from others: The consequences of psychological reasoning for human learning. Perspectives on Psychological Science, (), –. https://doi.org/./
Sloman, S. (). Causal models: How people think about the world and its alternatives. Oxford: Oxford University Press. https://doi.org/./acprof:oso/..
Tenenbaum, J. B., & Griffiths, T. L. (). Generalization, similarity and Bayesian inference. Behavioral and Brain Sciences, (), –. https://doi.org/./SX
Voorspoels, W., Navarro, D. J., Perfors, A., Ransom, K., & Storms, G. (). How do people learn from negative evidence? Non-monotonic generalizations and sampling assumptions in inductive reasoning. Cognitive Psychology, , –. https://doi.org/./j.cogpsych...
Vul, E., Goodman, N., Griffiths, T. L., & Tenenbaum, J. B. (). One and done? Optimal decisions from very few samples. Cognitive Science, (), –. https://doi.org/./cogs.
Zhu, J. Q., Sanborn, A. N., & Chater, N. (). The Bayesian sampler: Generic Bayesian inference causes incoherence in human probability judgments. Psychological Review, (), –. http://doi.org/./rev


Unpacking Intuitive and Analytic Memory Sampling in Multiple-Cue Judgment

August Collsiöö, Joakim Sundh and Peter Juslin

The research was funded by the Swedish Research Council and the Marcus and Amalia Wallenberg Foundation.

. Introduction

Responding to a new situation by sampling similar situations with known properties from memory is arguably one of the most generic ways to produce a judgment. As made famous in Roger Shepard’s “law of generalization” (Shepard, ), responding to novel stimuli as an exponential function of their perceived similarity to known stimuli may be one of the most deeply engrained adaptations to the world in humans and other animals. It is no surprise, then, that we find a long history of theories in Cognitive Psychology that postulate that judgments for new stimuli are made by similarity-based sampling of our “internal world” of previously encountered “instances” (Hintzman, ; Logan, ), “cases” (Schank, ), or “exemplars” (Medin & Schaffer, ; Nosofsky, ), ranging from the earliest learning theories to the latest Bayesian sampling theories (see Chapters  and ). Yet, while few would question that making judgments by comparing the new situation to memory representations of concrete and sometimes unique previous experiences is one important route to make a judgment, the cognitive substrate of these processes often remains elusive. Indeed, what defines and delimits a “case,” “exemplar,” or “instance” is often left surprisingly under-specified, as is the nature of the memory processes involved when these memory representations are sampled and subsequently applied to the query. On the one hand, it is clear that people often consciously compare current situations with past situations when making judgments and decisions; for example, choosing a restaurant might entail actively looking back at previous visits to restaurants and choosing the one that one has the
fondest memories of. On the other hand, comparisons might not always be quite so direct; if encountering an unfamiliar restaurant, one might compare it to previously visited restaurants and decide based on the relative similarity of the current restaurant to the best (or worst) experiences in the past, perhaps without ever being conscious of the process. However, one might not be able to apply all available previous data (e.g., every restaurant one has ever visited); in fact, it seems psychologically implausible that one would be able to do so outside of narrowly defined contexts. It is more likely that, just as when processing information from the environment, one is limited to samples of data rather than the complete distribution. In this chapter, we explore the processes of memory sampling in exemplar-based models of multiple-cue judgment (Hoffman et al., ; Juslin et al., , ; Karlsson et al., ; Pachur & Olsson, ; Platzer & Bröder, ; von Helversen & Rieskamp, ), in turn, typically based on the Generalized Context Model (GCM) of category learning (Nosofsky, , ). We use the Precise/Not Precise (PNP) model (Sundh et al., ) to demonstrate that one can empirically identify not only more or less intuitive vs analytical rule-based processes, but also more or less intuitive vs analytical memory sampling. As illustrated below, depending on the parameters, the GCM implements two qualitatively distinct processes that draw on memory for concrete exemplars to produce a response, one that involves recall of individual exemplars and one that involves similarity-based inferences from previously encountered exemplars. The first process is exemplified when you encounter a person and retrieve that he or she has the profession of being a lawyer. The second process is exemplified when you encounter a person and recognize that he or she looks like previous people with the profession of lawyer that you have come across. In this chapter, we demonstrate that these two parametrizations of the GCM naturally produce the empirical hallmarks of analysis and intuition that are identified by the PNP framework. In the following, we first describe how rule-based and exemplar (memory)-based cognitive algorithms can be identified in multiple-cue judgment and we then introduce the PNP model as a new way to identify if a given cognitive process is intuitive or analytic. Thereafter, we describe how combining these two approaches allows us to investigate if intuitive or analytic cognitive processes implement the rule-based and exemplar-based algorithms observed in multiple-cue judgment. Based on a large database with multiple-cue judgments we lastly show how the PNP model allows us to empirically identify intuitive and analytic processes of exemplar
sampling, as captured by the GCM, and we discuss how these results can inform the current debate on dual-systems (or processes) of cognition.

. Identification of the Cognitive Processes in Multiple-Cue Judgment

Much of the previous research on multiple-cue judgment has modeled the interplay between direct use of memory, as typically captured by exemplar memory models involving similarity-weighted estimates based on known exemplars, and cue-abstraction models: rule-based judgments involving intuitive integration of explicit beliefs about the cue-criterion relations (e.g., fever is associated with pneumonia). A large body of literature by now demonstrates that people shift between these processes as a function of both task properties (e.g., Hoffman et al., ; Juslin et al., , ; Karlsson et al., ; Pachur & Olsson, ; Platzer & Bröder, ; von Helversen & Rieskamp, ) and properties of the decision maker (e.g., Hoffman et al., ; Little & McDaniel, ; von Helversen et al., ). To distinguish empirically between use of an exemplar-based process and a cue-abstraction process is a nontrivial problem, because in virtually all experimental designs they yield similar or even identical predictions. One key instrument for identifying the processes is to use extrapolation designs, where the training phase in the experiment involves a limited stimulus range, but the test phase requires the participants to extrapolate the knowledge acquired in training outside of the training range (DeLosh et al., ; see also Figure .). While exemplar models offer no way to extrapolate the judgments outside of the training range (without auxiliary processes), cue-abstraction – which involves inducing the underlying rule-based structure of the task – allows for such extrapolation. Extrapolation designs have been successfully used in standard multiple-cue paradigms, where both sorts of processes have been empirically identified (e.g., Juslin et al., , ).
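The extrapolation signature that separates the two algorithms can be illustrated with a toy simulation. The sketch below is not any of the cited designs; the training range, the one-cue linear task, and the parameter values are assumptions chosen only to show the qualitative pattern.

```python
import numpy as np

rng = np.random.default_rng(2)
train_x = rng.uniform(30, 70, size=50)   # training confined to the mid-range
train_c = train_x.copy()                 # criterion = cue (a simple linear task)

def exemplar_pred(x, beta=1.0):
    w = np.exp(-beta * np.abs(train_x - x))   # similarity-weighted average of
    return (w * train_c).sum() / w.sum()      # stored criterion values (GCM-style)

def rule_pred(x):
    slope, intercept = np.polyfit(train_x, train_c, 1)  # abstracted linear rule
    return slope * x + intercept

for probe in (10, 50, 90):   # probes below, inside, and above the training range
    print(probe, round(exemplar_pred(probe), 1), round(rule_pred(probe), 1))
# The exemplar predictions level off near 30 and 70 (no extrapolation), while
# the abstracted rule tracks the criterion across the whole range.
```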

. Returning to Brunswik: The PNP Model

In terms of the standard dual-systems distinction between System 1 (intuitive) processes and System 2 (analytic) processes (Evans, ; Evans & Stanovich, ) it may seem natural to align System 1 with the exemplar processes and System 2 with the rule-based cue-abstraction processes. As we discuss below, however, it turns out that upon scrutiny the rule-based cue processes described by cue-abstraction models can be
implemented either by analytic or intuitive cognitive processes, in a way that is identifiable with the PNP model (Sundh et al., ). In the following, we elaborate on the PNP model, arguing that the exemplar processes captured by the GCM also come in both analytic and intuitive forms.

[Figure .: four panels, (A)–(D), plotting predicted judgment (0–100) against criterion (0–100); the panel titles include “Analysis(B) Rule-based” and “Intuition(B) Rule-based.”]

Figure . The characteristic quantitative predictions by each of the four cognitive process models summarized in Table .. The identity lines in the graphs represent correct judgments and the rectangles in the left-most panels identify the rare but potentially larger errors that are predicted by Analysis(B). The predictions in the graphs are stylized examples of error-free predictions by the models that have either been perturbed by a few larger errors (left-side panels for Analysis(B)) or perturbed by a ubiquitous Gaussian noise (right-side panels for Intuition(B)).




Inspired by Brunswik (), in Sundh et al. () we introduced a new conceptualization of analysis and intuition, not defined by hypothesized conceptual correlates (Evans, ), but by empirical response distributions. To this end, we defined Analysis(B) as deterministic cognitive processes, as illustrated by application of explicit integration rules to exact, symbolic representations of cues that almost always produce the same result. To exemplify, consider an experiment where participants assess their willingness to pay (WTP) for lotteries of the form: probability p of obtaining a sum of money V, otherwise nothing. A possible Analysis(B) process would be to analytically “number-crunch” the Expected Value (EV) by multiplying p and V, and report this as the WTP. Because this is essentially a rule-based deterministic process, the errors will be relatively rare and typically involve misinterpretation of the symbols or misremembered facts. To the extent that the integration required by the algorithm is feasible to perform mentally within the constraints of working memory, these errors can be expected to be rare, yielding a leptokurtic distribution. Intuition(B) processes, typically honed by evolution or extensive training, are approximate and characterized by ubiquitous and normally distributed errors (as defined in Sundh et al.,  based on the ideas by Brunswik, ). In the WTP task above, an Intuition(B) process could be to subjectively assess one’s strength of preference, and map this assessment to a monetary scale, an inherently variable process. Intuition(B) processes depart from the deterministic procedures of Analysis(B) in one (or both) of two ways. First, there might be error-perturbed encoding of cues, for example due to neural noise, such as in “natural assessments” of the similarity between an object and a prototype (Tversky & Kahneman, ). Second, there might be an inability to consistently apply the cue integration rule in the same way on all trials, leading to different judgments for the same stimulus (cf. “inconsistency” or “lack of cognitive control,” see Brehmer, ; Karelaia & Hogarth, ). In multiple-cue judgment, the participants often have rule-like beliefs about how the cues relate to a criterion (e.g., fever is a cue to pneumonia), but the cues are informally integrated rather than “number-crunched” according to a formula. In short, Analysis(B) is characterized by few but potentially large errors and Intuition(B) is characterized by ubiquitous and normally distributed

To distinguish our use of the terms “intuition” and “analysis,” which are operationally defined in terms of the observed judgment error distributions, from other uses in the literature we from here on refer to them as “Intuition(B)” and “Analysis(B),” where the “B” is honoring their origin in the ideas of Egon Brunswik ().


errors. In order to distinguish between these two types of processes, we created the Precise/Not Precise (PNP) model. The PNP model is based on the premise that, for each actualization of a given cognitive algorithm, there is some probability (denoted λ) that an error will occur and otherwise the algorithm will be executed without error. Thus, in the case of Analysis(B), λ is presumably a small number, while in the case of Intuition(B), λ = 1. Given the assumption that errors follow a Gaussian distribution, Analysis(B) will produce a leptokurtic mixed distribution with a peak on 0 (error-free execution) and comparatively long tails (Gaussian distributed errors), and Intuition(B) will reduce to the standard assumption of a Gaussian error distribution usual in many computational cognitive models. The WTP task described above was investigated in Sundh et al. (), and the responses and modeling of ID  is a good example of the benefits of the PNP model when the underlying process bears the hallmarks of Analysis(B). Most of the WTP judgments () of ID  were exactly equal to the expected values, but there were seven large errors. The PNP model thus indicates that most responses are perfect computations of the EV but there is a probability (λ = .) for error with a high standard deviation. A standard regression model assuming Gaussian noise will erroneously specify the process, implying that the participant computes a value . times larger than the EV that is always perturbed by random noise. The better fit to data of the PNP model identifies (correctly we believe) deterministic calculation of the EV, marred by occasional errors. Maximum Likelihood Estimation identifies this state because, whereas  out of  WTPs equal to the EV is likely under the PNP model, that many exactly correct EVs is unlikely if Gaussian noise is added to each judgment. Formally, the PNP model is defined as:

$$y \mid (B = b) = \begin{cases} g(x \mid \theta) + N(0, \sigma^2), & b = 1 \\ g(x \mid \theta), & b = 0 \end{cases} \tag{8.1}$$

where g(x|θ) is any cognitive algorithm, for example the EV algorithm described above, mapping a stimulus (x) and parameter (θ) vector into predicted judgments (y). Each estimate is either affected by Gaussian error 



These error distributions are centered on the model predictions rather than the criterion of the task. Thus, the PNP model will distinguish precise from not-precise responses according to some given cognitive model, but is agnostic in regard to whether these responses are “correct” in any normative sense. Note that this differs from other “robust” modeling in that we focus specifically on the proportion of precise responses rather than the departures from the regular Gaussian distribution as such.




(b ¼ ) or not affected by error (b ¼ ). B is a Bernoulli random variable with probability λ, describing the probability that Gaussian noise is added to the output of the algorithm (b ¼ ) and inversely (  λ) the probability that a response is an error-free execution of the algorithm (b ¼ ). In the WTP example λ ¼ :, thus in seven cases (. percent) the EValgorithm is executed with Gaussian noise added (b ¼ ) and in  cases (. percent) it is executed without error (b ¼ ). The default assumption in cognitive modeling corresponds to λ ¼ : to add a Gaussian noise to every predicted response. In Sundh et al. () we demonstrate that (i) in model recovery analyses the PNP model recovers the parameters of the process better than a standard regression model for a mix of error-free and error-perturbed responses (i.e., when there exists a subset of deterministic responses computed by an exact algorithm, the PNP model correctly recovers this algorithm); (ii) distinguishes between data from analytical numbercrunching tasks and from intuitive perceptual processes; (iii) identifies processes as Analysis(B) or Intuition(B) also in multiple-cue judgment tasks that involve no sensory uncertainty (see Sundh et al., ).

. Combining Multiple-Cue Integration and the PNP Model

An important aspect of the PNP model is that it does not specify a particular cognitive algorithm or operation; rather, it represents an additional layer that is added to a given cognitive algorithm. As such, the PNP model can be used for modeling both the rule-based cue-abstraction and exemplar-memory algorithms described above, and both of them can therefore, in principle, be expressed by either an Analysis(B) or Intuition(B) process. When we combine these two cognitive algorithms with the PNP model we get four (rather than two) potential cognitive processes that people might, in principle, engage in during a multiple-cue judgment task (see Table . for a summary and Figure . for illustration of stylized quantitative predictions). Memory may be engaged either by rote memory (Panel A, Figure .) of the criterion values of specific exemplars or by similarity-based inference (Panel B) that draws on the relative similarity to several exemplars stored in long-term memory. As further discussed in Section ., both of these processes involve previously observed exemplars, as captured by the GCM, but the first process is best conceptualized as involving recall of the criterion of a single exemplar, while the latter process involves similarity-based inference about a likely criterion value. The
inability to extrapolate judgments outside of the training range (that is, e.g., the inability to infer a criterion value of , when the highest criterion value previously encountered was ) with the memory processes is illustrated by the nonlinearities at the extremes of the criterion range in Panels A and B, Figure .. The rule-based processes captured by cue-abstraction models, which emphasize abstraction of explicit cue-criterion relations, may likewise either involve explicit reasoning (Panel C) with declarative rules and facts, or the ad hoc integration of vague verbal beliefs by a more informal cue-integration (Panel D). The linearity of the functions at the extremes of the criterion range illustrates the ability for extrapolation with the rule-based processes (Panels C and D, Figure .). These processes, and the graphical illustrations of them, are stylized extremes; complex cognitive processes likely involve a mix of several of them. First, in tasks with a deterministic criterion and recurring exemplars, a participant may engage in Rote Learning of the criterion value for the exemplars. Because this process involves retrieval of the same symbolically represented criterion (e.g., when you know that an exemplar has criterion ) and therefore (almost) always yields exactly the same response, it is consistent with Analysis(B). Likewise, if you know the age of a person, there is no reason to expect this estimate to be perturbed by a ubiquitous Gaussian noise. While relying on memory for exemplars, this is a special case where the criterion of a single exemplar is retrieved. A second use of memory for judgment is when you make an estimate for a new unknown situation based on its relative similarity to a number of previous exemplars with known criterion values, a process of similarity-based inference routinely captured by exemplar-based models of categorization (e.g., Nosofsky, ; Nosofsky & Johansen, ) and judgment (e.g., Juslin et al., , ). This is often how good fit of an exemplar-based model is conceptually interpreted. Because this process is theorized to capture an inference based on automatic processes of similarity-perception, recall, and informal integration across exemplars, it is likely to be perturbed by inconsistencies and neural noise in the processes. In terms of the PNP model, this yields a ubiquitous random error that affects each judgment, the characteristic of Intuition(B). A third cognitive process refers to judgments by explicit Reasoning, where the participants produce judgments by elaborating in working memory with declarative rules and facts. For example, if one number-crunches the expected value of a lottery, one executes a deterministic sequence of mental operations that always produce the same output, save those occasions where the algorithm is erroneously executed. In terms of
the PNP model, these are Analysis(B) processes because they produce deterministic responses that are only occasionally marred by errors. The fourth process is the informal cue-integration process that is often assumed to take place in complex multiple-cue judgment tasks. This process is rule-based in the sense that it involves sequential consideration of explicit rules for cue-criterion relations (e.g., apartments in the city center are expensive), but intuitive in the sense that the aggregation is an informal sequential belief updating that involves no explicit integration rule (e.g., Einhorn & Hogarth, ; Juslin et al., ). For example, one may assess the reasonable price of an apartment by sequentially pondering its values on cues that are known to predict the price of apartments. Because this integration typically operates on subjective evaluations of the cues and informal sequential updating of a belief about the criterion, this implements an Intuition(B) process. This ubiquitous variability in the judgments is known as inconsistency, or lack of cognitive control (Brehmer, ; Cooksey, ).

Table .. Four cognitive processes that can be identified by the experimental design and the modeling reported in this chapter (the central four cells of the table). The cognitive algorithm employed is either rule-based or exemplar-memory based, as identified by extrapolation beyond the training range, and the processing may either involve an Analysis(B) or an Intuition(B) process, as identified by the judgment error distributions.

Cognitive algorithm: Exemplar-memory based
– Analysis(B): Rote memory. Estimate by rote learning of the criterion value, which is retrieved and reported.
– Intuition(B): Similarity-based inference. Estimates inferred from knowledge of the criterion values of a sample of known similar exemplars.
– Identifier (algorithm), extrapolation: inability to extrapolate beyond the training range.

Cognitive algorithm: Rule-based
– Analysis(B): Explicit reasoning. Estimate by explicit mental number-crunching within working memory based on declarative rules and facts.
– Intuition(B): Informal cue-integration. Estimate by integration of separate explicit beliefs about cue-criterion relations by informal additive weighting of cues.
– Identifier (algorithm), extrapolation: ability to extrapolate according to the (explicit or implicit) integration rule.

Identifier (process), error distribution:
– Analysis(B): leptokurtic error distribution with many exactly correct estimates but occasional errors; low λ.
– Intuition(B): a Gaussian error distribution with approximately correct estimates, ubiquitously perturbed by error; high λ.

. Analytic and Intuitive Sampling from Memory

As detailed in the previous section, we propose that processes of sampling from memory can come in two different guises: rote-memorization of individual exemplars and inference based on consideration of similar known exemplars. In this section, we show that adaptations of the GCM (Nosofsky, ), applied to a continuous criterion (e.g., Juslin et al., ), can instantiate both of these processes and that they naturally exhibit the properties of Analysis(B) and Intuition(B). For a continuous criterion, the GCM is:

$$\hat{c} = \frac{\sum_{j=1}^{k} \exp\left(-\beta \sum_{i=1}^{n} \left| x_i - x^{*}_{ij} \right|\right) C_j}{\sum_{j=1}^{k} \exp\left(-\beta \sum_{i=1}^{n} \left| x_i - x^{*}_{ij} \right|\right)} \tag{8.2}$$

The model predicts the criterion (ĉ) of the test-item based on a weighted average of the criterion Cj for each of the k exemplars. The criterion values of the exemplars are weighted in accordance with how similar each of the n cue values x*ij of the previously encountered exemplars is to the corresponding n cue values xi of the test-item. β describes how much the relative similarity between the cue values of an exemplar and the cue values of the test-item affects the weight put on each exemplar (with a high




value resulting in more weight on the exemplar(s) most similar to the test-item relative to other less similar exemplars). The standard GCM is thus based on the assumption that each previous exemplar is weighted according to its relative similarity to the test-item. Of course, as previously noted, it is not psychologically plausible that, literally speaking, “each previous exemplar” is retrieved (Juslin & Persson, ), nor is it necessarily obvious what such a set contains. Fortunately, the predictions of the GCM are consistent with a sampling-based account, as we will show with a simulation below, and, viewed through a PNP lens, the observed error distributions give us additional hints regarding the nature of these memory-sampling processes. Specifically, the similarity parameter (β in Equation .) defines the relative weights given to previous exemplars based on their similarity to the item being judged. The higher the parameter, the more relative importance is given to the most similar exemplar(s). This, in turn, defines how many exemplars have any practical importance for the output of the model. By extension, for a very high value of β, the output of the GCM is equivalent to a rote-memory process where only the most similar exemplar is considered; in this case so much more weight is given to the most similar exemplar that other exemplars might as well be ignored. Figure . illustrates this with the simulated predictions of exemplar-based models applied to a simple two-cue judgment task, where the two cues are generated from the uniform distribution U(, ) and the criterion is equal to the sum of the cues, using  exemplars to judge   probes with no added noise. The x-axis represents the prediction error. Note that while this technically depicts an error distribution, it pertains to deterministic prediction error that stems from the number of exemplars that are considered and the way that exemplars are chosen, as opposed to random noise. The leftmost panels show a regular GCM model using a similarity parameter of β = , meaning that exemplars are weighted according to the exponential of their similarity (upper panel), compared to a GCM model sampling only the ten most similar exemplars using a similarity parameter of β = 0, meaning that all exemplars are weighted equally (lower panel). The rightmost panels, by contrast, show a regular GCM model with β = , meaning that high relative weight is put on the most similar exemplar (upper panel), compared to a GCM model sampling only the most similar exemplar (lower panel). The error distributions (deviations from the correct value) from these different expressions of the GCM model (upper vs lower panels) are similar, indicating that the regular GCM
model, by way of the similarity parameter, can also approximate a computationally rational account that selectively samples over the exemplar space. This is because the similarity parameter in effect restricts the number of exemplars that will have a practical impact on model output. Moreover, the implementation of the GCM in the left-side panels, which captures similarity-based inference based on multiple exemplars (similarity-based inference in Table .), naturally produces a broader error distribution similar to what is implied by Intuition(B). Also, the implementation in the right-side panels, which effectively portrays the recall of a single (almost) identical exemplar (rote memory in Table .), reproduces the distinct peak that is implied by Analysis(B). We conjecture that this result captures an important psychological difference between two ways to use memory to produce a judgment: one analytic use of memory that involves explicit recall of a known value, and one more approximate intuitive use of memory that engages similarity-based inductive inference informed by the experience of multiple similar exemplars. Figure . illustrates two important results: (i) Comparing the upper and the lower panels, we see that the good fit of the GCM is consistent with a sampling account, where only a subset of exemplars are actually retrieved and considered in the computations. This amplifies the plausibility of exemplar memory as a psychological process. (ii) Comparing the left-side with the right-side panels of Figure . illustrates that the different parameter settings in the GCM – either capturing the similarity-based integration of multiple exemplars (left side), or approximating rote memory of individual exemplars (right side) – naturally reproduce the response distributions emphasized, respectively, by Intuition(B) (left side) and Analysis(B) (right side). The open question is whether we can confirm both processes in empirical data. A drawback of using the similarity parameter alone is that some proportion of errors will most probably arise also in an Analysis(B) process, which might bias the parameter. This can be avoided by using the PNP model, by which we can isolate the proportion of “precise” implementations of rote memory.
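The simulation contrast described above can be sketched in a few lines. The cue range, the numbers of exemplars and probes, and the specific β values below are illustrative assumptions (the values of the reported simulation are given in the text); only the qualitative equivalence between the parameterizations matters.

```python
import numpy as np

rng = np.random.default_rng(4)
n_ex, n_probe = 100, 1000                # sizes assumed here for illustration
ex = rng.uniform(0, 10, size=(n_ex, 2))  # stored exemplars (two cues)
ex_c = ex.sum(axis=1)                    # criterion = sum of the cues
probes = rng.uniform(0, 10, size=(n_probe, 2))

def gcm(probe, beta, k=None):
    d = np.abs(ex - probe).sum(axis=1)   # city-block distance to each exemplar
    w = np.exp(-beta * d)                # similarity weights (Equation 8.2)
    if k is not None:                    # sampling variant: use only the k most
        keep = np.argsort(d)[:k]         # similar exemplars, equally weighted
        w = np.zeros_like(w)
        w[keep] = 1.0
    return (w * ex_c).sum() / w.sum()

for label, kwargs in [("GCM, moderate beta", dict(beta=1.0)),
                      ("sample 10 nearest, beta = 0", dict(beta=0.0, k=10)),
                      ("GCM, very high beta", dict(beta=20.0)),
                      ("sample single nearest", dict(beta=0.0, k=1))]:
    err = [gcm(p, **kwargs) - p.sum() for p in probes]
    print(f"{label:28s} mean abs error = {np.mean(np.abs(err)):.3f}")
# Moderate beta behaves like averaging over a sample of similar exemplars
# (broad Intuition(B)-like errors); very high beta converges on the single
# most similar exemplar (rote-memory-like, peaked error distribution).
```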

. Identification of the Sampling Mechanisms from Judgment Data

In the following, we will draw on a database containing multiple-cue judgment data from several experiments using the same basic multiple-cue judgment task that requires either additive or nonadditive cue-integration. We explore under what circumstances, and to what extent,

https://doi.org/10.1017/9781009002042.010 Published online by Cambridge University Press




Figure . The error distributions from simulations with adaptations of the Generalized Context Model (GCM; Nosofsky, ), when applied to a multiple-cue judgment task with a continuous criterion (e.g., Juslin et al., ). From left to right: The GCM applied to a continuous criterion; a GCM with extremely high specificity parameter; a GCM that makes a judgment by sampling the  most similar exemplars from memory; and a GCM sampling only the most similar exemplar.

the memory-based and rule-based cue-abstraction algorithms identified by the model fitting can be empirically characterized as Analysis(B) or Intuition(B) processes.
.. Method
... Participants
All participants come from different, nonoverlapping samples from the same population, primarily students at Uppsala University. Participants were recruited through public advertisement at various places at Uppsala University and were compensated with a cinema voucher. The database consists of … participants (… females, … males,





… nonbinary, and … not reporting gender), ranging in age from … to … years (M = …, SD = …).
... Design and Material
All experiments used a 2 × 2 between-subjects design, where one independent variable was the task (additive vs nonadditive) and the other was the format of the cues and/or criterion (verbal vs numerical). The participants were presented with two cues, either on a numerical scale or on a verbal scale (very little, a little, average, a lot, very much), depending on the format condition. Their task was to learn, from feedback training, to judge a criterion either on a numerical scale or on a verbal scale (extremely low, very low, low, somewhat low, normal, somewhat high, high, very high, extremely high). The criterion was calculated from the numeric cue values and then mapped to the verbal counterparts for the verbal condition. A full factorial combination of cue values defined the item set. The training phase consisted of all but two of these items (the two items were omitted in order to distinguish between rule-based cue abstraction and memory-based sampling), repeated ten times. The test phase consisted of all items, presented twice.
The underlying cue–criterion relationship was the same for all experiments, with either an additive (Equation 8.3) or a nonadditive (Equation 8.4) cue-integration rule, depending on the task condition.

Criterion(additive) = α + β₁ · Cue₁ + β₂ · Cue₂   (8.3)

The normative values for inferring the criterion in the additive task were α = …, β₁ = …, and β₂ = ….

Criterion(nonadditive) = α + β₁ · (Cue₁ − β₂) · (Cue₂ − β₂)   (8.4)

The normative values for inferring the criterion in the nonadditive task were α = …, β₁ = …, and β₂ = ….
Across the experiments, the task was varied with regard to a number of factors, as described in Table .. When the cover story was medical, the task was to judge the concentration of a fictitious hormone in the bloodstream of an individual based on information about the amount of two other fictitious hormones in the individual's urine. When the cover story was social, the task was to judge

 

(Footnotes: Age data are missing for seven participants. The verbal cue and criterion values were presented in Swedish in the actual experiments.)





Table .. Compilation of factors that are varied across the experiments in the database used for the presented analyses. Experiment

Cover Story

   

Medical Medical Medical Social

Feedback Deterministic Deterministic Probabilistic Deterministic

Cue and Criterion Format Same format Different format Same Format Same Format

the politeness of an individual's communication with another person based on information about (i) the (social) power relationship between the individual and his/her conversation partner (e.g., a CEO has a power advantage over an employee) and (ii) the social distance between them (e.g., the distance is low between close friends but large for acquaintances). "Same format" indicates that both the cues and the criterion were presented either in a numerical or in a verbal format. "Different format" indicates that either the cues were verbal and the criterion numerical, or the other way around. The factors in Table . were not orthogonally crossed over the participants. We have thus noted in footnotes when an experiment has been excluded to avoid biased results.
... Procedure
The participants carried out the experiment in separate computer booths at the Department of Psychology, Uppsala University, under the supervision of an experiment leader. In each experiment, participants were randomly assigned to one of the four conditions. The training trials were presented in an independently randomized order for each training block, and the test trials were presented in an independently randomized order for each test block.
... Cognitive Modeling
The cognitive modeling investigated whether participants relied on a rule-based or a memory-based algorithm to solve the task, and whether these processes were applied as an Analysis(B) or an Intuition(B) process (see Table . for further description of these processes). We fit four cognitive algorithms within the PNP framework (see Equation .): an additive model (Equation 8.3), a nonadditive model (Equation 8.4), an exemplar-based memory model





(Equation . in the previous section), and a null model predicting that participants answer with their mean estimate for all trials. We fit both a standard configural version of the GCM and a nonconfigural version. In the latter version, only the identity of the cue values, not their position, matters for the similarity (e.g., cues [, ] and [, ] are coded as identical, see Collsiöö et al., , for details). Parameters were estimated with maximum likelihood and we relied on the Bayesian Information Criterion (BIC) to identify the best model fit (see Raftery, ). We categorized a model as supported for a participant, if the BICdifference between that model and all other models were C), assessed in paired likability comparisons. For instance, if in the simplest case A and B are represented as two overlapping normal distributions of equal variance, it is easy to see that the probability p(A>B) that A is preferred to B in paired comparisons can be translated into a standardized “discriminal difference” value x B x A that indicates the differential position of A and B on the latent scale x. Thus, once the scale has been set by pairwise comparison, the scale values of further stimuli can be determined from the available preference data, granting that the difference scores corresponding to all pairwise comparisons are compatible with the assumption of a transitive, one-dimensional likability scale.





While Thurstone's (1927) model of psychological scaling relied on paired-comparison data, his general notion of a distributive representation of target stimuli – conceived as a sample of variable values rather than a fixed scalar – is applicable to modeling judgments of all kinds. In forming person impressions from person traits, the cognitive task is to integrate traits rather than to compare them pairwise. Yet Thurstone's psychological scaling concept still holds: locating target stimuli on a common scale is the first step of integrating them with regard to the target dimension of likability. Likewise, the same principle of discriminal dispersion applies: the combined likability information conveyed by sampled traits fluctuates from one moment to another.
.. Brunswikian and Thurstonian Uncertainty
Thurstone's notion of ambiguity, fluctuation, and uncertainty has crucial implications for judgments and decisions inferred from incomplete and indirect stimulus samples. In the spirit of Thurstone's stimulus scaling approach, and the critical role it assigns to the dispersion of judgmental responses elicited by the same stimuli, we will present a cognitive-ecological interpretation of the concepts of "Thurstonian" and "Brunswikian" uncertainty, borrowed from Juslin and Olsson (1997). Brunswikian sources are environmentally determined, caused by the incompleteness and imperfect validity of a sample drawn from the actual target population. Therefore, Brunswikian incompleteness or uncertainty cannot be reduced by the cognitive system through more careful assessment or enhanced processing capacity; even unbiased, lossless, and flawless statistical procedures or perfectly operating algorithms are subject to Brunswikian uncertainty. Thurstonian sources of uncertainty, in contrast, are inherently cognitive and arise exclusively within the mind of the processing individual. The Thurstonian variance component of judgmental responses can reflect manifold internal causes, such as inherent variability of the nervous system, generative memory processes, and side effects of all kinds of ongoing mental activities. Thus, Thurstonian uncertainty can vary considerably between individuals, contexts, or times, even when Brunswikian uncertainty in the sample remains unchanged.
Although the details of Brunswikian and Thurstonian uncertainty can be hard to discern, their distinct impacts on resulting impression judgments can be isolated experimentally. Throughout this chapter, we will see that including Thurstonian in addition to Brunswikian uncertainty in a


theoretical framework leads to clear-cut theoretical constraints and distinct predictions of highly regular judgment and decision effects.
Aligned with the cognitive-ecological sampling approach, we will see that assessing and analyzing the Brunswikian aspects of samples of target behaviors can greatly help to illuminate the uncertainty of the information environment, whereas analyzing and manipulating Thurstonian uncertainty can elucidate the systematic influence of sources residing within the judge's mind. Although the two types of uncertainty in sampling are clearly separable at the conceptual level and refer to clearly distinct psychological influences, they are neither mutually independent nor reflective of purely additive or separate consecutive stages of an overarching process. Rather, the interplay of ecological and cognitive sampling processes reflects an intertwined, iterative process that prevents us from stopping at a two-stage conception of judgment and decision processes: we cannot simply split judgment tasks and stimulus domains into cases where either Brunswikian or Thurstonian sources of uncertainty are at work. Especially when judging individuals can themselves decide when to truncate a sequentially unfolding sample, we must take the recursiveness of such iterative sampling into account. When the judging individuals themselves direct their sampling behavior (e.g., through sample truncation or by sampling from another option), the initial stage of Brunswikian uncertainty can result in polarized initial impressions or primacy effects. Small samples are more likely to display extreme patterns and lead to quickly converging impressions (see Hadar & Fox, 2009; Hertwig & Pleskac, 2010). These can trigger Thurstonian associative and generative processes within the judge's mind, and the interim effect of these Thurstonian processes can influence the subsequent strategy of Brunswikian information sampling from the environment, which can in turn affect Thurstonian sampling again. This iterative process results in a genuine interaction of mind and environment as they jointly determine judgments and decisions.
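Before turning to these interactive dynamics, the additive core of the two-source view can be made concrete in a minimal simulation. In the sketch below, each judgment is the mean of a Brunswikian trait sample plus an independent Thurstonian fluctuation; all parameter values are illustrative assumptions, and the iterative interplay discussed above is exactly what this simplest additive picture omits.

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative values: a target with true likability mu, judged from a
# Brunswikian sample of n trait values plus a Thurstonian fluctuation.
mu, trait_sd, thurst_sd, n = 0.5, 1.0, 0.3, 8

def one_judgment():
    sample = rng.normal(mu, trait_sd, n)             # Brunswikian sampling
    return sample.mean() + rng.normal(0, thurst_sd)  # Thurstonian noise

judgments = np.array([one_judgment() for _ in range(100_000)])

# The two independent sources add: Var(J) = trait_sd**2 / n + thurst_sd**2
print(judgments.var())                  # empirical total variance
print(trait_sd**2 / n + thurst_sd**2)   # analytic decomposition
```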

.. Self-Truncated Sampling

The joint impact of both sources of uncertainty will become particularly apparent in the following discussion of self-truncated sampling in the context of an impression formation task. For the experiments reported below, which we conducted to investigate sample-based impression formation, we created a task setting modeled after the seminal work of Solomon Asch (1946). A target person is described by a series of sequentially presented traits, randomly drawn from the population of all traits of





that person, from which the participants have to form an evaluative judgment of the target person's likability. After each added trait in this sample-based impression task, participants can decide either to sample another trait word or to stop sampling and make a final impression judgment if they believe they have acquired sufficient information about the target person. But when does this happen? When will they decide to truncate the sampling process?
Unlike choices between two or more options, in which truncation usually occurs when a sufficiently strong difference between the options has accrued, the impression formation task outlined here relies on a single sample of traits and therefore calls for a different stopping rule. Excluding the possibility that an impression task entails a hidden choice (e.g., between the present job candidate and the predecessor), a sensible truncation rule relies on the convergence and settlement of an impression. When a growing trait sample settles on a sufficiently stable impression that cannot be expected to change much with newly added traits, it is reasonable to stop sampling and translate the impression into a final judgment. Thus, convergent and stable impressions trigger truncation, whereas conflicting and still-changing samples indicate the need to seek further information.
Stochastic indeterminacy and conflict in growing samples of target traits can be expressed by statistical indices of (Brunswikian) uncertainty, such as the standard error of the sample estimate (e.g., the average scale value of the sampled stimulus traits) of the current target person's likability. However, as sample truncation is a joint function of Brunswikian and Thurstonian uncertainty, the truncation decision is not solely determined by the statistical properties of the traits sampled so far. It also depends on the impact of Thurstonian oscillations within the judge's mind on the experience of information convergence. When the Brunswikian sample is statistically stable and, at the same time, that stimulus-inherent stability aligns with Thurstonian sampling that supports the perception of stability, sufficiency, coherence, and readiness to judge, then truncation is very likely. Otherwise, when internal mental sampling activities leave judges uncertain and insufficiently prepared, Thurstonian diffusion may obscure a statistically clear-cut Brunswikian trait sample and hinder truncation. In any case, the truncation decision offers a strong opportunity for Thurstonian fluctuations to enter the iterative information acquisition and integration process. Conversely, the judges' preparedness to make an impression judgment based on a strong primacy effect in a small sample depends crucially on Thurstonian processes.
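On the Brunswikian side, such a convergence-based stopping rule is easy to state formally. The sketch below stops sampling once the standard error of the mean trait value falls below a criterion; the threshold and sample-size limits are illustrative assumptions, not values from the experiments.

```python
import numpy as np

def self_truncate(trait_stream, se_threshold=0.25, max_n=30, min_n=3):
    """Sample trait scale values one at a time and stop once the
    standard error of the mean likability falls below a criterion,
    i.e., once the impression has settled (or max_n is reached)."""
    sample = []
    for value in trait_stream:
        sample.append(value)
        n = len(sample)
        if n >= min_n:
            se = np.std(sample, ddof=1) / np.sqrt(n)
            if se < se_threshold or n >= max_n:
                break
    return float(np.mean(sample)), len(sample)

# Usage with a hypothetical target whose traits scatter around +0.5:
rng = np.random.default_rng(0)
impression, n = self_truncate(rng.normal(0.5, 1.0, 100))
print(impression, n)
```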


Previous research on optional stopping in choice tasks has necessarily assessed a combination of Brunswikian and Thurstonian aspects of uncertainty, since Thurstonian uncertainty is part of all decisions based on Brunswikian samples (e.g., Busemeyer & Townsend, 1993; Pleskac & Busemeyer, 2010; Ratcliff, 1978). However, no prior studies have tried to differentiate systematically between Brunswikian and Thurstonian uncertainty. Especially for impression formation tasks, Thurstonian uncertainty has hardly ever been recognized or investigated. The remainder of this chapter is devoted to our deliberate attempt to fill this gap and delineate systematic influences of Thurstonian uncertainty within the judge's mind in addition to the Brunswikian uncertainty inherent in the sampling of target traits.
We elaborate on this extended perspective in a stepwise fashion. In the next sections, we introduce the yoked-control design we employed to demonstrate and measure the strength of Thurstonian sampling effects, defined as random sampling in a most simplified and agnostic manner. Then, in later sections, we go beyond this simplifying assumption and isolate systematic factors of Thurstonian uncertainty. Under either assumption, we provide ample evidence that impression judgments – like all sample-based judgments and decisions – are susceptible to both sources of uncertainty.

. Experimental Inquiries into Thurstonian Judgment Effects
As mental processes within the mind of the judging individual are manifold, dynamic, and variable, it does not seem too far-fetched to use the "black box" metaphor as guidance throughout the following sections. In this section, we leave the "black box" closed, conceptualizing Thurstonian uncertainty as random oscillation of unknown origin. However, as we shall see, even this abstract stochastic model allows us to demonstrate systematic results in experimental settings. In the following three sections, we will gradually open the "black box," ending up with precise properties and specific psychological origins of Thurstonian uncertainty.

.. Thurstonian Uncertainty Conceived as Random Oscillation

Starting from the closed "black box," we conceptualize Thurstonian uncertainty as random variance, or noise. Although we understand that mental processes are sensitive to a multitude of diverse causal origins (from neuronal dynamics to internal periodicities, perception, encoding, and retrieval activities), we can approximate these multicausal influences by a





random-variance component. The mental manifestations of Brunswikian trait samples in the judge's mind can be conceived as normally distributed random fluctuations scattered around the Brunswikian sample's expected mean (see Galton, 1889). This conceptualization of cognitive uncertainty as random noise is close to Thurstone's (1927a, 1927b) ideas of dispersion in psychological scaling. Thus, our preliminary attempt to understand the nature and consequences of Thurstonian uncertainty remains agnostic about the psychological origins of the oscillations of the human mind. Granting that multiple causal influences may be at work, and that there is no reason to believe that they are all (or even predominantly) biased in the same direction, we believe it is justifiable to start from Thurstonian oscillation conceived as a random process.
.. Using Yoked-Control Designs to Measure Thurstonian Uncertainty
In a series of sample-based impression formation experiments (Prager & Fiedler; Prager et al.), we applied the following tool to measure Thurstonian effects on impression judgments, treating Thurstonian influences as random variance between individual judges. We let a primary judge in a pair of participants sample traits and truncate the sample when they felt ready to form a judgment. The secondary participant received exactly the same trait sample, presented in the same order and limited by the primary judge's truncation decision, making them the yoked control. Although both yoked partners received an identical Brunswikian trait sample, in exactly the same format, they differ with regard to their Thurstonian processes. Only the primary, original sampler can be expected to be mentally ready to make a final judgment based on their self-truncated trait sample; the secondary, yoked control cannot be assumed to be similarly well prepared for an impression judgment. Thus, while the Brunswikian trait samples are controlled and held constant, the two yoked participants' Thurstonian mind states cannot be expected to entail the same level of settlement and convergence necessary for a confident sample truncation and impression judgment. Only the person in the pair who originally truncated the sample was mentally prepared to make sense of, and to form an integrative judgment of, the target described by the current trait sample. The yoked-control participant, in contrast, who could not truncate their own sample, cannot be assumed to be aligned with the Brunswikian input to the same degree. Yoked controls have to make do


with somebody else's sample truncation decision, which cannot be expected to be aligned with their own internal Thurstonian sampling processes.
Relying on different variants of this yoked-control paradigm, Prager and Fiedler as well as Prager et al. demonstrated a variety of novel and psychologically sensible findings that testify to the theoretical fertility of the Thurstonian sampling concept. As explained in the following sections, these findings are general and fundamental enough to carry over to more refined conceptions that go beyond the simplifying random-noise assumption. Convergent evidence from several experiments and experimental conditions demonstrates that, although Brunswikian uncertainty is equivalent for pairs of participants exposed to the very same trait samples, the impression judgments of the yoked controls differ systematically from the self-truncating partners' judgments. To explain the nature of, and the psychological reasoning underlying, this systematic pattern of Thurstonian effects, though, we must first take a closer look at what happens during self-truncated sampling from a Brunswikian sampling perspective.

.. Small-Sample Polarization in Impressions from Self-Truncated Sampling

First of all, it is necessary to explain the impact of self-truncation on impression judgments, compared to impressions informed by samples of experimentally controlled size. Indeed, one should not expect impression judgments informed by a self-truncated sample of a certain number of traits to be equivalent to judgments informed by a sample of equal size whose truncation was determined experimentally. With experimenter-determined sample truncation, impression judgments tend to become more extreme with increasing sample size (Prager et al.). In contrast, impressions based on self-truncated trait samples with self-determined sample size tend to be more extreme at small rather than at large sample sizes, because the truncation decision is influenced by the sample content, with small, clear-cut samples eliciting truncation. Thus, a distinct feature of self-truncated samples is that the resulting judgments are strongest and most confident when sample sizes are small rather than large. It is exactly with respect to this distinct feature that yoked-control judgments diverge from self-truncated samplers' judgments, to an extent determined by further aspects of Thurstonian oscillations.
Let us first describe the typical self-truncation effects in more detail. When a sample of traits is drawn successively from a population defined by







Figure . Empirically observed judgment strength J plotted against sample size n for (A) externally determined sample size (truncation was determined by the software), (B) for self-truncated sampling and (C) yoked controls who received exactly the samples of (B). Judgment strength denotes the extremity of the likability judgment toward the population direction (i.e., likability as it is judged for predominantly positive targets and likability judgments with reversed sign for predominantly negative targets). Thin grey lines connect individual averages per sample size and the solid black line averages per sample size of these individual averages of judgment strength J, error bars indicate corresponding standard errors.

a specific proportion and distribution of positive or negative traits, a typical finding is that the resulting judgments increasingly reflect the dominant trend or preponderance of positive or negative traits as sample size increases (de Finetti, 1937; Edwards, 1965; Kutzner & Fiedler). Convergent evidence shows that the same sample proportion of positive to negative observations is worth more if it is observed in a large than in a small sample.
Yet a strong reversal of these polarization effects is obtained when sample size becomes dependent on the sample content through self-truncation. Self-truncation serves to accentuate the informative value, the diagnosticity, and hence the extremity and confidence of small samples compared to larger samples (Coenen & Gureckis; Prager & Fiedler; Prager et al.). In a self-truncation condition, small samples remain so small precisely because the first few traits happened to be informative, diagnostic, and conducive to high confidence. Conversely, large samples were not truncated earlier precisely because the initial impression was unclear, equivocal, and conflict-prone. Even long extensions will typically not render large samples more clear-cut and conflict-free than small, early-truncated samples drawn from the same population. As evident from the downward slope of the curve in panel B of Figure ., smaller self-truncated samples lead to more extreme impression judgments. Self-truncation also results in a systematic negative correlation between sample size and judgment extremity.
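A small simulation makes the logic of this reversal transparent. Using a convergence-based stopping rule of the kind sketched earlier (all numbers again illustrative rather than experimental values), clear-cut trait streams truncate early and yield extreme impressions, whereas mixed streams run long and remain moderate, which produces the negative size–extremity correlation:

```python
import numpy as np

rng = np.random.default_rng(3)

def judge_target(p_pos, se_threshold=0.2, max_n=40):
    """Draw traits coded +1/-1 from a target population with proportion
    p_pos of positive traits; truncate once the standard error of the
    mean drops below the threshold (illustrative numbers throughout)."""
    sample = []
    while len(sample) < max_n:
        sample.append(1.0 if rng.random() < p_pos else -1.0)
        n = len(sample)
        if n >= 3 and np.std(sample, ddof=1) / np.sqrt(n) < se_threshold:
            break
    return np.mean(sample), len(sample)

pairs = [judge_target(0.7) for _ in range(5000)]
strength, size = map(np.array, zip(*pairs))

# Early-truncated samples are the clear-cut ones, so judgment strength
# falls with realized sample size: a negative size-extremity relation.
print(np.corrcoef(size, strength)[0, 1])
```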


  ,  ,   ..

.. Regressive Shrinkage of Truncation Effects due to Thurstonian Misalignment

In the yoked-control paradigm, the pattern of self-truncation effects carries over to some degree to the yoked partners, who are provided with the self-truncated partners' samples. However, misalignment in Thurstonian uncertainty between the original and the yoked participants is evident in a regular regressive shrinkage of all truncation effects. On the one hand, yoked controls' judgments also become less extreme with increasing sample size, and their correlations between sample size and extremity are negative across trials. This congruency reflects the Brunswikian sampling effects: after all, the samples were truncated at small sample sizes when early traits suggested strong impressions, and these primacy effects carried over from the self-truncating participants to their yoked partners. On the other hand, however, the negative relation of judgment strength to sample size was weaker in the yoked controls than in the original, truncating partners. Apparently, the yoked controls' Thurstonian sampling processes diverged from those of their self-truncating partners; they were less prepared to stop sampling and base an impression judgment on a sample of traits that was truncated by another person, who was probably in a different Thurstonian state of mind at the very moment of truncation. As a consequence, Thurstonian misalignment resulted not only in weaker impression judgments but also in weaker, more diluted truncation effects in yoked controls, compared to self-truncating samplers (see Figure ., panel C). Difference scores between the judgments of self-truncating participants and their yoked partners thus provide sensible measures of Thurstonian mental oscillations.
.. Isolating Systematic Factors of Thurstonian Uncertainty
Moving a first step beyond the simplifying conceptualization of Thurstonian processes, from random noise to a more systematic perspective, we started to consider the dependency between self-truncated originals and the yoked controls. We do not yet open the "black box," but merely introduce a distinction between temporal and interpersonal sources of Thurstonian variation. In an extended yoked-controls design (depicted in Figure .), Prager and Fiedler presented participants in an other-yoked condition with trait samples that other participants had truncated in a previous block of trials, as in the former experiments by Prager et al. In contrast, participants in a self-yoking condition received, in the second block, copies of their own trait samples, which they had









Figure . Schematic diagram of the self–other yoked controls design. The time axis is vertically oriented from top to bottom.

themselves truncated in the first block. Thus, in this extended paradigm, we decomposed the free-floating Thurstonian random process into (a) the impact of mere intertemporal oscillations in the mental representations of the same trait samples (self-yoking) versus (b) the joint impact of both intertemporal and interpersonal oscillations (other-yoking). As expected, the regressive shrinkage was stronger in the other-yoked than in the self-yoked condition (Figure .), consistent with the notion that Thurstonian uncertainty can be decomposed into different sources of asynchrony (between persons, and within persons across time). Obviously, asking the same person to form impressions from the same trait sample twice, separated only by the delay between blocks, causes less asynchrony in Thurstonian readiness to judge than asking different persons.
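In formal terms, this regressive shrinkage follows from classical attenuation: adding independent misalignment noise to a judgment dilutes its correlation with (log) sample size. The sketch below illustrates the ordering with made-up numbers; in particular, the two noise magnitudes for self- and other-yoking are assumptions chosen only to mirror the direction of the reported effects.

```python
import numpy as np

rng = np.random.default_rng(11)

# Stand-in data for self-truncated judgments: judgment strength J falls
# with log sample size n for the original samplers.  Yoked controls see
# identical samples, so their judgments are modeled as the original
# judgment plus misalignment noise: smaller for self-yoking (temporal
# asynchrony only) than for other-yoking (temporal plus interpersonal
# asynchrony).  All magnitudes are assumptions, not the reported data.
n = rng.integers(3, 31, size=5000)
strength = 1.2 - 0.3 * np.log(n) + rng.normal(0, 0.2, size=5000)

for label, extra_sd in [("original", 0.0),
                        ("self-yoked", 0.15),
                        ("other-yoked", 0.35)]:
    j = strength + rng.normal(0, extra_sd, size=5000)
    print(label, np.corrcoef(np.log(n), j)[0, 1])  # shrinks toward zero
```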

.. Diagnosticity

As we finally open the "black box," we consider specific causal origins of Thurstonian variation. In the context of an integrative impression judgment task, the same Brunswikian trait input can give rise to systematically different weighting and inference processes, resulting in different types of Thurstonian uncertainty. We refer to the task-dependent weighting of the sampled stimulus traits as "diagnosticity." The dispersion of the

The correlation between the natural logarithm of sample size and judgment strength diminished only slightly (from r = … for self-truncation to r = … for yoked controls) when samples were reevaluated in the "self"-yoked condition, whereas regressive shrinkage was considerably stronger when comparing self-truncation (r = …) with "other" yoked controls (r = …). The reported values are means of individually calculated correlation coefficients.





Figure . Empirically observed judgment strength J plotted against sample size n for (A) self-truncated sampling in a first block, (B) yoked controls who received their own samples in a second block, and (C) yoked controls who received samples that were truncated by other participants, with yoked control participants of both (B) and (C) receiving samples from (A).

Thurstonian variation, whether a sampling-truncation threshold is exceeded, and whether the final impression judgment is affected more strongly by positive or by negative information all depend crucially on the diagnosticity of the stimulus input.
Impression formation from self-truncated samples involves an integrative cognitive process that regularly deviates from simple statistical integration, such as the averaging of all sampled traits' likability scale values (Anderson) in the person impression formation paradigm presented above. Aside from random noise caused by immanent activities of the cognitive system, the deviations of updated impressions from average trait values follow distinct patterns. When a task calls for judgments of a person's honesty, for example, observing the target person telling a lie is more informative than observing them telling the truth (Rothbart & Park, 1986). The reason is that not all information is equally diagnostic. The diagnosticity of a sampled piece of information refers to its potential to change the current impression (Ajzen & Fishbein, 1975; Edwards, 1965; Reeder & Brewer, 1979; Trope & Bassok, 1982). Observing someone telling lies, for example, will shift our impression considerably toward "dishonesty," whereas observing them telling the truth will not change our impression much. Information of high diagnosticity has a strong impact on impression integration, whereas low diagnosticity is associated with modest or even no change.
Relatedly, impression judgments vary in the number and consistency of behavioral observations required to confirm a hypothetical impression. If an impression judgment relies on nondiagnostic, commonplace information,





such as observing someone telling the truth, a large amount of confirming evidence is required to confirm the hypothesis that the target person is honest. Moreover, very little diagnostic evidence (e.g., observing lying) is sufficient to disconfirm the honesty hypothesis. In contrast, little evidence of the diagnostic type is required to infer that somebody is dishonest, while it is very hard to get rid of the impression that someone is dishonest (Rothbart & Park, 1986).
Diagnosticity is a feature that redefines the properties of the Brunswikian stimulus sample by applying a situation- and task-dependent subjective Thurstonian interpretation of the stimulus input. Even when Brunswikian uncertainty remains unchanged, identical stimuli can change their meaning, and thus their informativeness and diagnosticity, considerably when they are processed or integrated under different sampling goals or in different contexts.
First, the diagnosticity and informative value of the stimuli in a sample differ between the already discussed impression-formation tasks and choice tasks that involve selecting one of two or more options. In a choice task, information that discriminates between options, sets them apart, and increases between-option variance must be expected to change our opinion the most, exerting a strong impact on the encoding and integration process. Choice tasks call for weighting or selecting information that helps to classify targets into relevant categories or that discriminates between choice options (Skowronski & Carlston, 1987). In contrast, for an impression or estimation task, information that helps to place the target reliably on the relevant dimension – that is, information which is typical of or even exclusive to certain target characteristics – is most diagnostic.
Not only the structure but also the goal of a task (i.e., the attribute dimension of judgments and decisions) is of central importance in determining which stimuli happen to be diagnostic and informative. Skowronski and Carlston (1987, 1989) demonstrated that the typical negativity effects found for impression formation in the morality domain (see the example of honesty ratings above) can reverse into positivity effects when we move from the morality to the ability domain. When forming impressions about ability, positive information (success) is often more diagnostic than negative information (failure). Whereas success (e.g., solving a difficult mathematical equation or running a sprint in near-record time) provides cogent evidence for high ability, failure can have many causes other than low ability (distraction, confusion, error, etc.). Thus, different inference schemes for the ability and morality domains (Reeder & Brewer, 1979) moderate the relative diagnosticity of positive and negative


information, and hence the shape and nature of Thurstonian oscillations. People give more weight to positive (i.e., success) than to negative (failure) information when ability is judged, which clearly reverses the typical effects found in morality-judgment contexts. This reversed positive–negative asymmetry in the diagnosticity of morality and ability information by no means reflects random variation, but theoretically deducible principles determined by the synergy of stimulus properties and task context.
Diagnosticity does not only impact final judgments through the differential weighting of distinct stimulus–context combinations; it also moderates the iterative feedback loop through which dynamically updated judgments trigger subsequent information search, particularly the decision either to continue sampling or to truncate information acquisition. In other words, diagnosticity determines both the strength of the resulting impression and the size of the sample. When diagnosticity is low, because added traits are not informative enough to establish a final judgment, sample truncation is delayed, since uncertainty is not sufficiently reduced by uninformative input. In contrast, highly diagnostic input likely leads to clear-cut changes in impressions and – when combined with convergence and freedom from conflict – results in early truncation and highly confident judgments.

. Thurstonian (Un-)certainty Determines the Decision to Truncate Sampling
Diagnosticity is not only an interaction between individual stimuli and the task context. How newly sampled stimuli change the current impression also depends on which stimuli have been observed before. Redundancy and similarity of stimuli are a convenient way to describe systematically the relation of a current stimulus to preexisting knowledge and to other stimuli in the same sample. High redundancy within a sample (i.e., stimuli that are all close and similar) undermines diagnosticity: little change in impressions can be expected from observing redundant or even repeated information.
High redundancy within a sample facilitates sample truncation. When the stimulus traits within a sample are redundant with each other, the sample conveys an impression of high convergence, settlement, and stability (Soll, 1999). This directly connects (un-)certainty to sample truncation: high levels of subjective uncertainty (i.e., high fluctuation of an impression within the mind of the beholder) are associated with the need to sample more, whereas certainty (i.e., little fluctuation) facilitates sample truncation. Accordingly, Dieckmann and Rieskamp (2007) showed that high cue redundancy makes early truncation (expressed as noncompensatory





strategies) more effective compared to less dependent cues in a dual-option cue-sampling paradigm.
Redundancy of stimuli can be suitably expressed in terms of multidimensional density (Alves et al.; Unkelbach et al., 2008). Through multidimensional scaling, one can extract the distance in multidimensional space between each stimulus and all other items in the list. High density, or low distance, between the stimuli of a sample reflects high redundancy and low diagnosticity, whereas high distance between pieces of information signals high diagnosticity. In many ecologically generated sets of stimuli, density is strongly related to valence: negative stimuli are typically more distant from each other than positive stimuli (Koch et al., 2016). Reversing this naturally emerging association between density and valence can even overwrite negativity effects in recognition memory: the deliberate selection of low-distance negative and high-distance positive stimuli makes the typical memory advantage of negative stimuli disappear (Alves et al.). In Prager and Fiedler (b), we manipulated density, here between the traits characterizing groups in a self-truncated group-impression formation task. Some target groups were characterized by samples of generally high within-sample density, others by samples of low density. Our results demonstrated that higher within-sample density indeed facilitates earlier truncation and is associated with weaker judgments (of group likability) and less perceived within-group homogeneity.
As mentioned above, Thurstonian oscillations have various causes. Therefore, further opening the black box, some researchers have experimentally isolated distinct determinants of Thurstonian uncertainty. As subjective (un-)certainty has direct consequences for sample truncation, experimental manipulation of these determinants is reflected in sampling behavior. Inducing uncertainty and perceived fluctuation typically induces the need to sample more, compared to perceived stability and high levels of certainty. This principle applies not only to impression formation, where the stability and settlement of the (current) impression determine the decision to truncate, but also to choice tasks, where perceived stability and settlement of the preference for one option over the alternatives signal low uncertainty. Accordingly, Lejarraga et al. (2012) showed that high

The literature cited here applies choice rather than impression-formation tasks (the vast majority of related literature exclusively examines choice rather than impression judgments). The argument remains the same: Choice-samples are perceived convergent and stable when the preference for one option does not vary much when new information is received, whereas impression-samples are convergent and stable when the resulting impression does not change much in the presence of newly added information.

https://doi.org/10.1017/9781009002042.018 Published online by Cambridge University Press

  ,  ,   outcome variability caused participants to sample more before they finally chose one out of two possible lotteries. Outcome variation is an index of perceived risk of losing money in economic scenarios, but also of the risk of unpleasant encounters in the social domain (see Levy, ). The decision to truncate sampling in both impression formation and choice tasks requires a truncation threshold, that is, the level of certainty or perceived stability and settlement at which the judging individual stops sampling. This truncation-threshold is systematically sensitive to several aspects of Thurstonian uncertainty. Granting a constant level of Brunswikian uncertainty in an information sample of environmental stimuli, the judges’ readiness to truncate can vary tremendously. Thurstonian uncertainty determines, in particular, how the threshold of maximally acceptable uncertainty is set in a given task setting and stimulus context (Peterson et al., ; Tanner & Swets, ). In signal detection terms, some set a liberal threshold, judging more readily and accepting low levels of (Brunswikian) evidence as a basis of their judgment; others are more cautious and stop information-seeking only when they have reached very low levels of uncertainty (which typically requires more extensive sampling). Different Thurstonian states of mind, causing measurable differences in sampling behavior have been induced experimentally. Desender et al. () manipulated outcome variability orthogonally to Brunswikian discriminability. They used a color-discrimination task (e.g., a forced choice whether a set of color shades was rather “red” or “blue”) with different levels of variability of color between stimuli. Even when stimulus variability (i.e., color variability) did not objectively affect Brunswikian discriminability, participants nevertheless solicited more evidence when stimulus variance was high rather than low. Again, holding Brunswikian uncertainty constant, this experiment demonstrated a distinct influence of Thurstonian uncertainty on sample truncation. Past research on over- and under-confidence has demonstrated that the subjective (Thurstonian) certainty is not always optimally calibrated to actual performance or what we can expect from the current Brunswikian sample before making a decision (for a summary see Erev et al., ). Dougherty () proposed a decision-making version of a memory model to integrate Brunswikian and Thurstonian aspects of how judging individual’s perceived (un-)certainty is calibrated to their actual performance (e.g., correctness of choice). He showed for example, that encoding quality (the accuracy with which the features of a memory trace are written to long-term memory) has a great impact on confidence calibration: The higher the encoding quality, the lower the Thurstonian noise in judging

https://doi.org/10.1017/9781009002042.018 Published online by Cambridge University Press

Brunswikian and Thurstonian Sampling



individuals' confidence. That reduced Thurstonian noise, in turn, allows for better calibration of subjective confidence in deciding correctly.
.. Affective States
Similarly, a strong and systematic impact of Thurstonian uncertainty on sampling behavior, with Brunswikian uncertainty held constant, can be observed in a line of research inspired by Schwarz and Clore's (1983) affect-as-information framework. In an advice-taking paradigm, Gino et al. (2012) showed that under experimentally induced anxiety, more advice was sought from the social environment. Additionally, this advice was weighted more heavily in a state of anxiety than in a neutral affective state. Relying on a lottery rather than an advice-taking paradigm, but following the same reasoning, Frey et al. (2014) found that participants in a fearful state engaged in more exploration (i.e., prolonged sampling) before making a choice than participants in other affective states. They used both naturally occurring and experimentally induced emotions. The larger samples gathered by participants in the fear condition enabled them to experience more rare events than people in other affective states. These findings nicely demonstrate the role of emotions in the construction of Thurstonian uncertainty. In both of the aforementioned studies, the extended information search caused by anxiety or fear took place in the same sampling environment, with constant Brunswikian uncertainty parameters. Participants in different mood states were exposed to the same objective evidence strength, but they evidently required different (Thurstonian) levels of certainty (i.e., they had different truncation thresholds) before a decision was made. Anxiety and fear, as compared to neutral or happy states, enhanced the need to stop only at high levels of certainty. Additionally, both studies highlight the systematic nature of the emotionally induced levels of Thurstonian uncertainty, going beyond the conceptualization of Thurstonian uncertainty as mere random variation between individual participants, or between times and situations within participants.
Wulff et al. (2015) relied on a similar lottery-choice task; they also kept Brunswikian uncertainty equivalent between experimental conditions. However, they contrasted participants' sampling and choice behavior in a single-play with a multiplay environment. Single-play environments are used to assess short-term aspirations, as the chosen lottery is played only once, whereas multiplay environments serve to assess long-term


aspirations – the chosen lottery is played multiple times. There is no difference between single- and multiplay environments with regard to Brunswikian uncertainty, since the lotteries' expected values are the same across both scenarios. However, playing a gamble once or repeatedly causes pronounced differences in the variance of expected outcomes, which can be supposed to be a core determinant of Thurstonian uncertainty. Playing only once results in a wider range of possible outcomes, whereas expectations center around the expected value when one considers the combined outcome of several rounds of the same gamble. Accordingly, participants sampled more in a multiplay than in a single-play environment. The salience of the very same expected loss can therefore vary systematically with short- versus long-term perspectives on the task. Again, identical contexts in terms of Brunswikian uncertainty can result in different levels of Thurstonian uncertainty.
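The statistical core of the single- versus multiplay contrast is that averaging over repeated plays shrinks the spread of outcomes while leaving the expected value untouched: Var(mean of k plays) = Var(single play)/k. The toy lottery below, an assumed payoff structure rather than one from the cited studies, makes the point:

```python
import numpy as np

rng = np.random.default_rng(5)

# Assumed two-outcome lottery: win 10 with p = .1, else 0 (EV = 1).
outcomes, probs = np.array([10.0, 0.0]), np.array([0.1, 0.9])

for k in (1, 10, 100):
    plays = rng.choice(outcomes, p=probs, size=(100_000, k))
    avg = plays.mean(axis=1)
    # Same expected value for every k, but the spread of the average
    # outcome shrinks as 1/sqrt(k) with repeated play.
    print(f"k = {k:>3}: mean = {avg.mean():.2f}, sd = {avg.std():.2f}")
```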

.

Conclusion

Starting from Thurstone's (1927) law of comparative judgment, we delineated and elaborated on the notion of a dispersed mental representation of a judgment target, conceived as a distribution of responses that can vary on an underlying continuum over time and contexts, and between individual judges. We used the term Thurstonian uncertainty to refer to the resulting degrees of freedom in impression judgments based on invariant sets of target stimuli. As we have shown in a review of our own research, as well as in a cursory sketch of the extant literature, a comprehensive theory of sample-based judgments and decisions has to take two distinct but intertwined sampling processes into account.
The first process, termed Brunswikian sampling, refers to the accrual of proximal-cue observations from which a distal entity that is not itself amenable to direct perception can be inferred or construed. This sampling process takes place in the information environment. Prior to all cognitive processes within the judge's mind (e.g., perception, encoding, memory organization, forgetting), the environment provides the judge with samples of relevant observations or empirical data (e.g., traits in impression formation; responses to test items in diagnostics; arguments in political persuasion). Yet, regardless of what and how much information the environment renders available, judgments and decisions are not solely determined by this Brunswikian sampling input. They also depend in several fundamental ways on a second stage of Thurstonian sampling that takes place in the





judge's mind, reflecting all kinds of internal fluctuations: unsystematic or periodic oscillations within the cognitive system; cognitive extensions and elaborations of externally provided stimuli based on prior knowledge; autobiographical associations; or analogical and context-driven inferences. The effective cognitive representation of the judgment target, conceived as a distribution of varying responses to the target, constitutes a blend of both sampling stages. The same Brunswikian sample of target data (e.g., the same set of target traits) can lead to divergent judgments, depending on the manner in which Thurstonian sampling moderates the effective representation in the judge's mind.
While it seems obvious that internal mind states can cause strong biases in judges' bottom-up inferences (e.g., subjective interpretations of the same data) as well as top-down strategies (e.g., motivated hypothesis testing leading to confirmation biases), we first confined our theorizing on Thurstonian influences to the context of purely random sampling. Even in such an apparently unbiased information environment, the freedom to truncate sampling at the right moment offers an opportunity for Thurstonian processes to exert a strong influence on the final judgment. Self-truncated sampling creates a recursive process of seeking information, integrating it into the current impression of the target, and finally deciding whether to continue the cycle of sampling and integration or to stop and make a conclusive judgment of the target. This recursiveness of sampling and cognitive processing means that judgments, but also the decisions to truncate, result from an interactive combination of statistical sampling and cognitive mechanisms. Together, these incorporate Brunswikian (sample-side) and Thurstonian (cognitive) aspects of uncertainty into the overall process, and the two mutually depend on each other. Samples are truncated, and the resulting judgments are strong, when the sampled information converges. Converging information signals impression stability when both (Brunswikian) stochastic uncertainty and Thurstonian uncertainty are low.
Thurstonian uncertainty is a powerful concept, even when conceived as "black-box" randomness. Judgments formed by self-truncating samplers and by their yoked controls differ in substantial ways, owing to the unequal synchrony of self-truncated samples with the original samplers' – but not their yoked partners' – readiness to judge a target described by exactly those traits. Opening the "black box" of determinants of Thurstonian uncertainty puts us in a position to begin to understand where a specific stimulus is located on the psychological scale in a specific situation and task context. The aspects subsumed under the concept of diagnosticity determine how a certain stimulus is interpreted and weighted given a specific


task context and goal. Importantly, we noted that Thurstonian uncertainty does not only determine how information is integrated, weighted, and interpreted, but also when a sample is truncated. The basic rule that low levels of Brunswikian uncertainty and high perceived stability predict early sample truncation turned out to be moderated by Thurstonian influences on subjective confidence versus insecurity, and on a feeling of divergence versus convergence of the sampled stimuli, beyond their pretested scale values.
The conceptualizations and empirical examples presented in this chapter are not exhaustive; many other sampling approaches and models presented in this volume could contribute further insights to elucidate the Thurstonian-uncertainty framework. Returning to Thurstone's original idea of a dynamic psychological scale, Stewart et al.'s (2006) notion of decision by sampling or Parducci's (1965) seminal range-frequency hypothesis may suggest useful hints for how to construct a formalized model of Thurstonian uncertainty and scaling. In any case, we firmly believe that the concept of Thurstonian uncertainty can be fruitfully refined in future research by elaborating how information integration and the decision to stop sampling are determined in an emergent interplay of environmental (Brunswikian) and cognitive (Thurstonian) sampling processes.

References

Ajzen, I., & Fishbein, M. (1975). A Bayesian analysis of attribution processes. Psychological Bulletin.
Alves, H., Unkelbach, C., Burghardt, J., Koch, A. S., Krüger, T., & Becker, V. D. (). A density explanation of valence asymmetries in recognition memory. Memory & Cognition.
Asch, S. E. (1946). Forming impressions of personality. Journal of Abnormal and Social Psychology.
Busemeyer, J. R., & Townsend, J. T. (1993). Decision field theory: A dynamic-cognitive approach to decision making in an uncertain environment. Psychological Review.
Coenen, A., & Gureckis, T. M. (). The distorting effect of deciding to stop sampling. Proceedings of the Annual Conference of the Cognitive Science Society.
De Finetti, B. (1937). La prévision: Ses lois logiques, ses sources subjectives [Foresight: Its logical laws, its subjective sources]. Annales de l'Institut Henri Poincaré.
Denrell, J. (2005). Why most people disapprove of me: Experience sampling in impression formation. Psychological Review.





Desender, K., Boldt, A., & Yeung, N. (2018). Subjective confidence predicts information seeking in decision making. Psychological Science.
Dieckmann, A., & Rieskamp, J. (2007). The influence of information redundancy on probabilistic inferences. Memory & Cognition.
Dougherty, M. R. P. (2001). Integration of the ecological and error models of overconfidence using a multiple-trace memory model. Journal of Experimental Psychology: General.
Edwards, W. (1965). Optimal strategies for seeking information: Models for statistics, choice reaction times, and human information processing. Journal of Mathematical Psychology.
Erev, I., Wallsten, T. S., & Budescu, D. V. (1994). Simultaneous over- and underconfidence: The role of error in judgment processes. Psychological Review.
Fiedler, K. (2000). Beware of samples! A cognitive-ecological sampling approach to judgment biases. Psychological Review.
Fiedler, K., & Wänke, M. (2009). The cognitive-ecological approach to rationality in social psychology. Social Cognition.
Frey, R., Hertwig, R., & Rieskamp, J. (2014). Fear shapes information acquisition in decisions from experience. Cognition.
Galton, F. (1889). Natural inheritance. Macmillan.
Gino, F., Brooks, A. W., & Schweitzer, M. E. (2012). Anxiety, advice, and the ability to discern: Feeling anxious motivates individuals to seek and use advice. Journal of Personality and Social Psychology.
Hadar, L., & Fox, C. R. (2009). Information asymmetry in decision from description versus decision from experience. Judgment and Decision Making.
Hertwig, R., & Pleskac, T. J. (2010). Decisions from experience: Why small samples? Cognition.
Juslin, P., & Olsson, H. (1997). Thurstonian and Brunswikian origins of uncertainty in judgment: A sampling model of confidence in sensory discrimination. Psychological Review.
Juslin, P., Olsson, H., & Björkman, M. (1997). Brunswikian and Thurstonian origins of bias in probability assessment: On the interpretation of stochastic components of judgment. Journal of Behavioral Decision Making.


  ,  ,   Koch, A., Alves, H., Kru¨ger, T., & Unkelbach, C. (). A general valence asymmetry in similarity: Good is more alike than bad. Journal of Experimental Psychology: Learning, Memory, and Cognition, (), –. https://doi.org/./xlm Kutzner, F. L., & Fiedler, K. (). No correlation, no evidence for attention shift in category learning: Different mechanisms behind illusory correlations and the inverse base-rate effect. Journal of Experimental Psychology: General, (), –. https://doi.org/./a Lejarraga, T., Hertwig, R., & Gonzalez, C. (). How choice ecology influences search in decisions from experience. Cognition, (), –. https://doi.org/./j.cognition... Levy, L. H. (). The effects of variance on personality impression formation. Journal of Personality, (), –. https://doi.org/./j.- ..tb.x Parducci, A. (). Category judgment: A range-frequency model. Psychological Review, (), –. https://doi.org/./h Peterson, W. W. T. G., Birdsall, T., & Fox, W. (). The theory of signal detectability. Transactions of the IRE Professional Group on Information Theory, (), –. Pleskac, T. J., & Busemeyer, J. R. (). Two-stage dynamic signal detection: A theory of choice, decision time, and confidence. Psychological Review,  (), –. doi:./A Prager, J., & Fiedler, K. (a). Forming impressions from self-truncated samples of traits: Interplay of Thurstonian and Brunswikian sampling effects. Journal of Personality and Social Psychology, (), –. https://doi.org/./pspa.supp Prager, J., & Fiedler, K. (b). Small-group homogeneity: A crucial ingredient to inter-group sampling and impression formation. Unpublished manuscript, Heidelberg University. Prager, J., Krueger, J. I., & Fiedler, K. (). Towards a deeper understanding of impression formation: New insights gained from a cognitive-ecological perspective. Journal of Personality and Social Psychology, (), –. https://doi.org/./pspa Ratcliff, R. (). A theory of memory retrieval. Psychological Review, (), –. https://doi.org/./-X... Reeder, G. D., & Brewer, M. B. (). A schematic model of dispositional attribution in interpersonal perception. Psychological Review, (), –. https://doi.org/./-X... Rothbart, M., & Park, B. (). On the confirmability and disconfirmability of trait concepts. Journal of Personality and Social Psychology, (), –. https://doi.org/./-... Schwarz, N., & Clore, G. L. (). Mood, misattribution, and judgments of well-being: Informative and directive functions of affective states. Journal of Personality and Social Psychology, (), –. https://doi.org/./ -...

Skowronski, J. J., & Carlston, D. E. (). Social judgment and social memory: The role of cue diagnosticity in negativity, positivity, and extremity biases. Journal of Personality and Social Psychology, (), –.
Skowronski, J. J., & Carlston, D. E. (). Negativity and extremity biases in impression formation: A review of explanations. Psychological Bulletin, (), –.
Soll, J. B. (). Intuitive theories of information: Beliefs about the value of redundancy. Cognitive Psychology, (), –.
Stewart, N., Chater, N., & Brown, G. D. A. (). Decision by sampling. Cognitive Psychology, (), –.
Tanner, W. P., Jr., & Swets, J. A. (). A decision-making theory of visual detection. Psychological Review, (), –.
Thurstone, L. L. (a). A law of comparative judgment. Psychological Review, (), –.
Thurstone, L. L. (b). Psychophysical analysis. American Journal of Psychology, , –.
Trope, Y., & Bassok, M. (). Confirmatory and diagnosing strategies in social information gathering. Journal of Personality and Social Psychology, (), –.
Unkelbach, C., Fiedler, K., Bayer, M., Stegmüller, M., & Danner, D. (). Why positive information is processed faster: The density hypothesis. Journal of Personality and Social Psychology, (), –.
Wulff, D. U., Hills, T. T., & Hertwig, R. (). How short- and long-run aspirations impact search and choice in decisions from experience. Cognition, , –.

The Information Cost–Benefit Trade-Off as a Sampling Problem in Information Search

Linda McCaughey, Johannes Prager, and Klaus Fiedler

The search for information before arriving at a decision is a most natural activity. Moreover, the information acquired about options has a major influence on the decision outcome. Cognitive-ecological approaches emphasise the information sample as a major determinant of subsequent cognitive processing and decision outcomes, and they take into account that most entities that are the focus of judgements or decisions cannot be assessed directly – be it the risks involved in a certain therapy or the potential happiness derived from a consumer product. A sample of information – a subset of direct observations of proxies – is the only way to estimate said entities. While it may sometimes be realistic to assume that information is available in the environment and just needs to be processed to form the basis of our decisions, far more often the information sample is the result of an individual's active search process, for example when we choose whom or what to interact with and when to stop. Since cognitive processes necessarily influence (if not fully determine) active search, which in turn determines the information sample that forms the basis of further cognitive processing and decision making, investigating the determinants of active search is imperative. This perspective is emphasised by other recent theoretical developments outlined in other chapters of this volume: Denrell and Le Mens's concept of hedonic sampling, for example, describes how information search (sampling) is guided by active sampling strategies that serve to attain pleasure and to avoid displeasure and pain, as in the Hot Stove Effect (see Chapter  by Denrell & Le Mens in this volume; Denrell & March, ). Chapter  by Harris and Custers (this volume) illustrates how reward-rich versus reward-poor environments induce different active information search strategies concerning two options as sources. In the first case, in which both options are rewarding, search strategies emphasise

Corresponding author, email address: [email protected]



exploitation and thus serve to uphold biased initial beliefs about the options. In the second case, however, where both options are meagre, search strategies focus on exploration and thus successfully correct the biased initial belief. Apart from the influences on source selection in information search, a particularly interesting and important aspect to consider is that of information amount. Information amount, defined as sample size in the case of sampling approaches, tends to be tightly related to decision quality: Usually, the more information one has accumulated about the options, the more comprehensive and accurate one’s judgements of them will be – the bigger the (random) sample, the more accurate its estimate. However, information acquisition requires resources such as time or effort, and may even incur financial costs. Hence, one cannot simply act on the maxim ‘the more the better’. Instead, the cost of the information (whatever form it may take) needs to be weighed against the benefits of the information in a cost–benefit trade-off.
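To make this trade-off concrete, consider a minimal numerical sketch (our illustration, not taken from any of the studies reviewed below; the payoff of 100 points, the cost of 5 points per observation, and the true proportion p = .6 are hypothetical placeholders). It computes the expected net gain of deciding, after exactly n observations, which of two binary outcomes is in the majority:

```python
from math import comb

def p_correct(n, p=0.6):
    """Probability that the majority of n Bernoulli(p) observations
    points toward the correct conclusion; ties count as a coin flip."""
    probs = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    win = sum(pr for k, pr in enumerate(probs) if 2 * k > n)
    tie = sum(pr for k, pr in enumerate(probs) if 2 * k == n)
    return win + 0.5 * tie

payoff, cost = 100, 5  # hypothetical reward for a correct choice, price per observation
for n in (1, 3, 5, 11, 21):
    acc = p_correct(n)
    ev = acc * payoff - (1 - acc) * payoff - n * cost  # expected net gain
    print(f"n={n:2d}  accuracy={acc:.3f}  expected net gain={ev:6.1f}")
```

Under these particular assumptions the expected net gain already peaks at very small samples; under other cost and payoff parameters the peak shifts, which is precisely the point developed in the remainder of this chapter.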

. Benefits and Costs of Information: An Inherent Trade-Off

At every step of the information search, one has to decide whether it is worthwhile to carry on searching or whether one should make the final decision. Whether it is worthwhile depends on the cost–benefit trade-off, that is, on whether the information's benefit outweighs its cost. Information's costs are simply higher the more one has to expend to acquire it. Its benefits, on the other hand, increase with higher expected value, which can be increased either by a higher probability of correctly choosing the better option or by increasing the outcomes' values themselves. How well can we take all these aspects into account to avoid making rash decisions based on too little information, on the one hand, but also to avoid acquiring too much or overpriced information, on the other hand? Put in simpler terms: do we know how much information is worthwhile? In the remainder of this chapter, we first provide a review of previous attempts to assess information costs in the form of financial and time costs. As an extension of the literature on time costs, we will present our own experiments implementing a sample-based speed–accuracy trade-off that directly links speed and accuracy via sample size. The apparently robust bias towards accuracy found in those experiments will be discussed critically in light of evaluability and with reference to a recent review by Evans et al. (), who showed that the optimal strategy varies vastly depending on the specific task parameters and, hence, does not allow for

  ,  ,   general claims about the optimality of human performance. Our own experiments using financial information costs will be presented to corroborate this claim, showing that the investigation of specific aspects of adaptivity might be a more worthwhile research aim. In preparation for this mental journey, we will start out by looking at previous experiments investigating information costs in the following.

. Investigating Information Costs

.. Financial Costs

The question of whether people know how much information is worthwhile captivated researchers in the Bayesian approach to judgement, decision making, and cognition (Slovic & Lichtenstein, ) in the 1960s and 1970s, with mixed findings regarding the issue of information amount (for a review, see Connolly & Serre, ). The paradigm most commonly used to investigate the question is elegantly simple and relied on financial costs. The participant was shown two urns with different proportions of two colours of marbles, for example,  per cent red and  per cent white marbles versus  per cent red and  per cent white marbles. The urns represented two different Bernoulli distributions (distributions with binary outcomes and a certain probability p of one of the outcomes occurring). The experimenter would flip a coin to determine which urn, that is, which distribution, would be selected, without telling the participant, whose task it was to infer which of the two urns had been selected based on a sample of marbles drawn from it. One by one, the participant could then sample observations from the urn at a certain monetary cost, each observation being a marble randomly drawn from the selected urn. They could draw as many or as few items as they wished before making a final decision, the correctness of which was rewarded (and usually, its incorrectness punished).
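In Bayesian terms, each drawn marble updates the posterior odds of the two urns. The following minimal sketch shows this computation (our illustration, not code from the original studies; the urn proportions of .6 versus .4 and the equal priors are assumed placeholders):

```python
import math

def posterior_odds(draws, p_a=0.6, p_b=0.4):
    """Posterior odds that urn A rather than urn B generated the draws.

    draws: sequence of 1 (e.g., a red marble) and 0 (a white marble).
    p_a, p_b: probability of drawing a 1 from urn A and from urn B
    (hypothetical values); with equal priors of .5, the posterior
    odds equal the likelihood ratio.
    """
    log_odds = 0.0
    for x in draws:
        if x == 1:
            log_odds += math.log(p_a / p_b)
        else:
            log_odds += math.log((1 - p_a) / (1 - p_b))
    return math.exp(log_odds)

# Seven red and three white marbles favour urn A with odds of about 5:1.
print(posterior_odds([1] * 7 + [0] * 3))  # ≈ 5.06
```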

The factors manipulated in this paradigm were mostly those that influence the trade-off between the cost and benefit of information discussed above: the information cost, the reward and punishment (gain and loss) for correct and incorrect choices, and the discriminability of the choice options, that is, how similar the two urns were in terms of their distributions. Operationalising both information cost and payoff in monetary terms, such experiments demonstrated a certain sensitivity to costs and payoffs on the participants' side. When the information cost was low or the payoff for an accurate choice was high, they tended to search for more information. When the information cost was high or the payoff low, they tended to search less, investing less money in the information search. This sensitivity was, however, limited, meaning that participants' adjustments of the sample size pointed in the right direction but fell short of an optimal strategy specified in Bayesian terms (e.g., Edwards, ; Pitz, ; Pitz et al., ).

Next to information costs and payoffs, one main determinant of the optimal amount to sample is the evidence conveyed by the sampled information, which is determined by the discriminability of the two hypotheses or options and was also examined in the same paradigm. To attain different levels of discriminability for the two options, Fried and Peterson () varied the outcome probabilities of the two options (between proportions of . and . vs . and .), with sampled information consisting of certain light bulbs lighting up. The condition that could engage in optional stopping, deciding when to stop during the sampling procedure, was shown to acquire too little information compared to the optimal Bayesian strategy. Another condition, which had to indicate in advance how many observations they wanted to acquire, performed close to the respective optimal strategy.

Snapper and Peterson () used a very similar paradigm to manipulate the discriminability of the options and, hence, the evidence conveyed by the information. Again, a decision had to be made between two options as possible sources of a sequence of numbers. In this experiment, however, the two possible sources were not Bernoulli distributions with binary outcomes, but rather normal distributions with continuous values as observations. This introduced another way in which the two options could be more or less discriminable, namely via the difference between their distribution means. The further apart their means were, the less the two distributions overlapped and the more discriminable they were. And the more discriminable the two options, the less information was needed to discriminate them (to the same degree).





Since each new observation in the sample calls for a re-evaluation of the evidence in favour of one hypothesis in relation to the evidence in favour of the contrary hypothesis, it is commonly assumed that Bayesian updating of the conditional probabilities of the two hypotheses and their posterior odds is a suitable normative standard of comparison given suitable priors – an optimal strategy (Edwards, ). The best minimum odds ratio at which a decision should be made in favour of the leading option can be determined through simulations that take into account the information cost and payoff parameters.

We take evidence to mean the change in the posterior odds (in favour of one over the other of the two hypotheses) that the data leads to (understood in Bayesian terms).


  ,  ,   adjust their sampling behaviour enough in response to the levels of discriminability, which led to oversampling in this paradigm. In a variation of the paradigm, Hershman and Levine () posed a hypothetical military reconnaissance problem, in which participants had to determine which of two missile mixes was being used. After a first free reconnaissance flight had already yielded a sample of ten observations of missiles, participants were given the opportunity to buy a second reconnaissance flight that would yield another ten observations. The cost of the second sample was varied and through varying the proportion of the two missiles, the evidence or diagnosticity of the first sample also varied. Participants, again, mostly overpurchased information, meaning that they tended to buy additional samples even when those were overpriced compared to how little they increased the probability of choosing the correct answer. Overall, participants appeared to be somewhat sensitive to the different constraints of the tasks, adjusting the amount of sampled information to the payoff, the information cost and the discriminability of the options to some extent. Compared to the optimal Bayesian strategy, however, some experiments showed that participants sampled too much information, while others showed that participants sampled too little. How these mixed findings can be reconciled will be discussed later. Explicit financial costs are not the only type of cost we are confronted with when searching for information on options. Often, information is freely accessible, and the information cost most important to consider is that of the time one spends on information search into options. .. Time Costs The cost of time is more difficult to implement in an experimental setting than monetary costs. One way of making time or waiting costly is to implement a time limit after which the potential payoff is foregone (Madan et al., ). A variation of that is to make the payoffs decrease with the passage of time, which is usually implemented to induce time pressure (Hausfeld & Resnjanskij, ; Payne et al., ). Time passes whether we want it to or not; it cannot be spent or accumulated in the same way as money can. Hence, what is actually meant by ‘time cost’ is most often the opportunity cost of time – the cost of foregoing alternative uses for a certain amount or period of time, that is, being unable to use the time to do something else. This conception is

reflected more closely in a research design in which a total session time was allocated to work through, say,  decisions, and each minute taken longer would be punished (each minute shorter rewarded; Rieskamp & Hoffrage, ). Although fascinating, this research offered only limited insight into how well participants regulate the amount of information collected before deciding, since it was not related to a framework of optimal regulation. Its focus lay rather on the influence of time pressure on risk preferences (Madan et al., ) or on decision strategies (Payne et al., ; Rieskamp & Hoffrage, ; but see Hausfeld & Resnjanskij, ).

Another clever and fairly common way of implementing the opportunity cost of time is through a speed–accuracy trade-off. Speed is an inverse function of the time required relative to the total time allotted. When accuracy and speed need to be traded off against each other, emphasising accuracy at the expense of speed means that time is invested to achieve a certain degree of accuracy, accepting the opportunity cost of lower speed, which consists of the rounds or trials one could have worked through instead and which are irrevocably lost. Speed–accuracy trade-offs are very common in perceptual research, where they are often modelled by drift-diffusion models, mostly based on random dot motion tasks. Processing perceptions of moving dots to judge the direction that they tend to move in is not directly comparable to sampling as information search. Nevertheless, the typical finding that participants tend to overemphasise accuracy (waiting too long to make a decision; Evans et al., ) will prove relevant.

In one of the few instances that are not perceptual, Jarvstad et al. () constructed speed–accuracy trade-offs not only with perceptual but also with higher-order cognitive (simple arithmetic and mental rotation) tasks, of which as many as possible were to be completed correctly in the specified time period of two minutes. Both types of tasks required cognitive processing to be solved, meaning that spending more time on an individual task increased the chances of being correct and obtaining the reward, but it also incurred the cost of that time not being available for later decisions, thereby decreasing the overall number of tasks completed in the time period. Hence, there was a trade-off between the speed at which the tasks were completed and the (expected) accuracy with which they were completed. Even though information search was not the focus of those studies or, in fact, of any studies we know of investigating speed–accuracy trade-offs, they can be adapted in a way that makes them well suited to investigating the opportunity cost of time in information search.

. Linking Speed and Accuracy More Directly through Sample Size

Adapting a speed–accuracy trade-off for the investigation of time costs in information search first requires that the constituting task be a decision based on sampled information. Since sample size has a given relation to one dimension of the trade-off, accuracy, the relation to the other dimension, speed, has to be established next. This is achieved by tying the amount of acquired information (i.e., sample size) directly to the elapsing time by having the sample increase at a steady rate over time. This specification achieves a more direct (and also somewhat more objective) relation between accuracy and speed than is present in other speed–accuracy trade-offs, where the trade-off depends on the participants' processing, attention, reaction time, and many other factors.

Following these considerations, we (Fiedler et al., ) constructed and implemented a novel speed–accuracy trade-off in sample-based choices, which allowed us to operationalise speed and accuracy in terms of the same joint scale, namely sample size. To the best of our knowledge, this is the only experimental paradigm so far that combines information sampling with a trade-off task that links speed and accuracy to the same natural measure of information amount, namely the size of a binary sample of choice options' outcomes.

At a more concrete level, the paradigm took the form of an investment game that had participants decide between pairs of stocks. The stocks were described by samples of the changes in share price, which were explicitly described as being drawn randomly from the last  trading days. Each sampled piece of information was displayed in the form of an arrow that indicated a price increase when it pointed upwards or a price decrease when it pointed downwards (see Figure .; in the actual experiment, upward-pointing arrows were green and downward-pointing arrows were magenta). The aim was to identify and choose the stock that had the larger proportion of positive share price changes within the last  trading days. Identifying it correctly resulted in a reward of points, while making the wrong decision implied a loss of points. The price change information for the stocks was implemented by drawing samples from Bernoulli distributions with certain probabilities of a price increase, p(↑), that could range from . to . (in steps of .). The difference between the two probabilities (of price increases) of stock pairs A and B, Δp = pA(↑) − pB(↑), was manipulated and could take the values Δp = ., ., or .. This led to a dynamic difficulty parameter, with some

Figure . Graphical illustration of a fund-investment choice task developed by Fiedler et al. () to investigate speed–accuracy trade-offs in sample-based choice tasks.

trials being easier because the stocks were easier to discriminate (e.g., Δp = .) and other rounds being harder because the stocks' ps were closer together (e.g., Δp = .).

As indicated above, what made it a speed–accuracy trade-off was that participants had a limited total time (e.g.,  minutes in Experiment ) and that the sample information appeared at a rate of  ms per arrow. Hence, accuracy would increase with time spent waiting for more information, but the number of rounds accomplished, that is, speed, would decrease with time spent waiting for information in a particular round. Accuracy had to be traded off against speed, since both had advantages and disadvantages with respect to the points one aimed to win by making correct decisions.

Crucially, the number of sampled prior outcomes (sample size n) afforded a direct quantitative link between accuracy and speed, because n is functionally related to both trade-off components and this two-fold functional relationship is well defined and understood. Given a presentation speed of  ms per arrow, the time required for a sample of n arrows amounts to n/ seconds, so the total time period of , seconds ( minutes) suffices for , divided by n/ samples of average size n (plus an additive constant for choice execution). By comparison, statistical sampling theory tells us that the standard error SE of the sample mean (i.e., the expected inaccuracy) decreases with increasing n according to the formula SE = SD/√n. Thus, while the


  ,  ,   number of choices completed in the given time decreases linearly with sample size n, the corresponding increase in accuracy is clearly sublinear (i.e., the rate of inaccuracy shrinkage is √n rather than n). Sampling n more observations costs n more time units but only yields a standard error decrease of √n. Thus, for many parameter specifications (above-chance accuracy; reasonable information costs; fairly even payoff for correct and incorrect choices), maximising the expected total payoff (i.e., the average payoff gained from all completed choices) calls for a fast strategy, sacrificing accuracy. In other words, over a wide range of parametric conditions, speedy small-n strategies will be superior to accuracy-based large-n strategies. Although larger samples produce more accurate choices than smaller samples, after the first few items, additional items tend to decrease the number of completed choices more than they increase the average payoff per choice. ..

.. Substantial Oversampling

Yet, a whole series of experiments employing the speed–accuracy trade-off paradigm described above consistently demonstrated that participants collect far too much information. Owing to the different relations of speed and accuracy to sample size, the strategies that lead to the most points are very fast, meaning that they are based on very small samples. The reason is that each arrow requires the same amount of time to arrive, but it does not always bring the same increase in (expected) accuracy, as is illustrated in Figure ..

Figure . Simulation results of the proportion of correct decisions (y-axis) as a function of mean sample size n (x-axis, 2 to 16), based on an algorithm that always samples n observations, plotted separately for Δp = .70 − .30, .60 − .40, and .55 − .45.

Earlier arrows lead to a bigger increase in accuracy – they are 'worth more' – than later ones, but still incur the same cost in terms of time. Hence, instead of 'wasting' the time on additional arrows for one round, in most cases one should rather invest that 'arrow time' in a new round, where the arrows are 'worth more' again. Participants, however, pursued strategies that were much 'slower', focussed on accuracy, engaging in a degree of oversampling that substantially reduced the amount of financial reward at the end of the experiment. Moreover, they did not change their strategies over the course of the experiments as a function of learning by experience, nor were they responsive to various interventions that clearly highlighted the advantage of speed.

In a first experiment, there was no explicit outcome feedback to demonstrate to participants that choices based on smaller samples were almost as likely to be correct as choices based on larger samples; participants had to adjust the self-determined sample size to the experienced ease or clarity of smaller or larger samples (as in a calibration experiment calling for confidence judgements under uncertainty). However, in all following experiments, they received explicit trial-by-trial feedback, making it clear that accuracy hardly increased with markedly increasing sample size. Yet, although they experienced that investing time into gathering very large samples produced at best a negligible gain in accuracy, such regular and transparent feedback was not successful in helping participants to reduce their persistent oversampling.

Neither could the accuracy bias of almost all participants' strategies be reduced when sample size was limited experimentally, forcing them to experience the (hardly reduced) accuracy resulting from smaller samples (i.e., when maximal sample sizes were reduced from nmax =  and  to nmax =  and ). Moreover, to avoid anchoring effects and unwanted influences of externally imposed limitations, we also included an 'unlimited' condition, where no explicit limit restricted self-determined sample sizes. This intervention served to show that participants were, if anything, constrained by the limits rather than nudged to sample more by them. When all limits were removed, participants tended to sample even more.

Another intervention that forced participants to make do with smaller samples was procedural in nature. When every new arrow had to be solicited by an active keystroke (rather than continuing to sample for as long as the space bar was pressed), the resulting sample size was indeed reduced, from about Mn = . (SDn = .) in the normal passive-sampling condition to Mn = . (SDn = .), but to such a small


  ,  ,   extent that it stands to reason that the reduction was due to the procedural nudge rather than insight into the advantage of reduced sampling. In another experiment, indeed, the accuracy bias was shown to persist even when participants had no losses to fear because the payoff schema was such that correct choices were rewarded while incorrect changes were of no consequence. All they stood to lose was time; their only punishment was having foregone the opportunity of gaining a point. And still, they did not seem to be able to give up their fondness for accuracy, continuing to sample too much. .. Pseudo-adaptive Trade-Off Regulation While the results detailed above quite clearly demonstrate that participants’ behaviour is far from optimal, results from other experiments, which we will describe below, seem to present a more hopeful picture. This picture should rather be likened to an illusion, however. The results that may suggest, at first glance, that participants are solving the trade-off well, upon further examination reveal that it was merely external pressures or blatant demand characteristics that caused the behaviour. This cannot be counted as adaptive behaviour, which is usually understood as (sustainable) behaviour that matches the structure of the environment (e.g., Todd & Gigerenzer, ), as the match presumably occurred mainly because of the external pressure and not a response to the task structure itself. Taking the pressure away and leaving the task structure the same should result in the same behaviour if the previous behaviour was adaptive, but should revert to less well-matched behaviour if the change was for reasons of pressure alone. The latter is an instance of what we want to call pseudoadaptivity: behaviour that resembles adaptive behaviour but does not occur as a (suitable) response to the context, but rather, for other reasons or by happenstance. It is also important to differentiate between optimality and adaptivity or pseudo-adaptivity in this context. Here, a suitable definition of optimality refers to behaviour that maximises reward, whereas deviations from optimality are tantamount to behaviour that fails to maximise the reward. This can be and often is interpreted normatively as ‘bad’ or ‘irrational’, which we will criticise below. Adaptivity, on the other hand, requires a broader framework and cannot just be diagnosed through a simple (or even oneoff ) deviation from optimality. Since adaptivity implies flexibility and suitable responding to task structure and changes, it is not enough to observe behaviour that happens to be in line with the optimal strategy to

infer adaptive behaviour. One needs to make sure that the behaviour actually occurs in response to the task's characteristics and can be flexible (or is at least suitable for a range of different task characteristics or parameter values). Optimality comparisons can be very helpful, but they are only a tool to assess adaptivity.

Assessing adaptivity was precisely the aim of another experiment that used a variation of the speed–accuracy trade-off task. In this variation, inspired by Phillips et al. (), participants in one condition had to play against an opponent in an adapted version of the speed–accuracy trade-off task. In another condition, they watched a teammate play against the opponent before engaging in the standard speed–accuracy trade-off task themselves (McCaughey et al., ). To allow for a competitive version of the trade-off task, both players saw the same sample emerge concurrently, but only the player who decided to truncate the sampling process could make a decision and potentially win or lose, with the other player having to wait. The active condition required the participants to engage with the virtual opponent themselves, while in the passive condition participants watched a virtual teammate play against the opponent. In both conditions, the programmed rival's strategy was close to the optimal strategy, which the virtual teammate was programmed to match and which also forced participants to acquire smaller samples, thus exhibiting a strategy that was closer to optimal. Nevertheless, in the subsequent (single-player) standard trade-off task, both conditions reverted to overly accuracy-focussed strategies (although they differed significantly from the control group). This is a strong indication that the rival condition's apparent proximity to optimality was due to reasons unrelated to insight into the problem or other causes that could generate flexible, adaptive behaviour, but was rather caused by external pressure from the rival, and it can therefore be considered pseudo-adaptive. The pressure was such that if sampling was not truncated early enough, no chance at making a decision was attained at all. Pursuing a faster strategy can plausibly be assumed to be a superficial reaction to missing out, rather than recognition that this is a good strategy for the task, which would underlie adaptive behaviour.

. Avoiding Premature Conclusions

Thus, strong and persistent evidence from a series of several experiments seems to converge on the conclusion that speed–accuracy trade-off performance in sample-based choices is plagued by a systematic impairment or bias that cannot be easily overcome with the help of interventions.


  ,  ,   Against the background of the relation of speed and accuracy with sample size in our paradigm, this may appear very plausible. Given that speed decreases linearly with increasing sample size, whereas accuracy increases in a clearly sub-linear manner, the reported findings could be a reflection of the well-known difficulty to understand non-linear functional relationships (Larrick & Soll, ; Svenson & Eriksson, ). Apparently, in the context of a sample-based choice task, in which speed and accuracy are linked to each other via sample size, the optimal strategy calls for speed rather than accuracy, and relative to this optimum, almost all participants exhibit oversampling. However, there are two important reasons why these conclusions, as plausible and empirically justified as they might appear at first, would be premature and unwarranted. The first is that drawing a general inference about the optimality or sub-optimality of human performance from a comparison with an optimal strategy (for speed–accuracy trade-offs) on closer inspection turns out to be untenable. As Jarvstad et al. () have anticipated with respect to the alleged rationality of perceptual in contrast to conceptual trade-off tasks, and as later articulated explicitly by Evans et al. (), generalised inferences based on the comparison with an optimal strategy are highly problematic because they are far less general than they first appear. As Evans et al. demonstrated, the optimal strategy – defined as the strategy that maximises the payoff for the given task – depends very much on the specific parameter settings of a task. This means that even for a speed–accuracy trade-off task, one can specify parameters such that strategies favouring accuracy would be optimal (e.g., payoff with high stakes or large asymmetries, time delay as punishment for incorrect choices or very difficult choices j Δp j< :Þ. Hence, assuming that people generally oversample would definitely be a premature conclusion. An experiment would have to include a range of parameter settings implying different optimal strategies to support more general inferences about people’s performance. The second reason why a persistent bias against speed in the reported experiments may not reflect a fundamental cognitive or metacognitive deficit lies in the particular task structure and its implications for what Hsee and Zhang () call evaluability, that is, how easily a certain aspect or dimension can be evaluated. One reason why sensitivity to speed may have been inhibited, creating an overweighting of accuracy, is that accuracy is much more evaluable than speed. Whether a choice is correct or incorrect, leading to a positive or negative payoff, can be evaluated naturally and immediately for each trial, on a clearly defined categorical

(dichotomous) scale. However, speed differences between fast and slow choices are much less evaluable. Whether information sampling speed was 'too high' or 'too low' on a relative efficiency scale will at best be evident at the end of the experiment, when the total payoff is visible, or maybe from a comparison of two strategies played in two (consecutive or simultaneous) experiments. Because of this fundamental problem, the failure to solve the speed–accuracy trade-off and the apparent insensitivity to speed may, to an unknown degree, reflect a task-inherent imbalance in evaluability. Whether the choice on a single trial is correct or not is easily and naturally evaluable, but there is hardly a benchmark or enumerable scale for whether the present choice is fast or slow. To avoid this imbalance, it would be necessary to use an experimental task that operationalises the cost–benefit trade-off in dimensions that are both similarly evaluable.

. Information Cost Experiments: Context Changes and Normative Inferences

To substantiate and empirically test these two considerations concerning task-dependent optimality and evaluability, we decided to conduct further experiments with a different experimental task that closely resembles the urn task in the Bayesian tradition described above. To keep it similar to the speed–accuracy trade-off task, participants had to decide whether the underlying population, which could take a probability of . to . (excluding .), was above or below ., instead of judging which urn was the source of a sample. Similar to the difficulty manipulation in the speed–accuracy trade-off task, this adds a dynamic element to a series of trials with different p-parameters: a population with an extreme p = . will lead to samples that more quickly support the likelihood of one hypothesis over the other. A population with a moderate p = ., on the other hand, will tend to convey more conflicting observations and will take longer to reach the same level of support for one over the other hypothesis.

This task differs from the speed–accuracy trade-off task in a few notable ways. Most importantly, it specifies information cost in terms of financial costs instead of time, so the cost–benefit trade-off is no longer operationalised in different dimensions, speed and accuracy. This leads to a number of useful consequences with regard to the first consideration concerning the comparisons with optimal strategies. It allows for very easy changes in the cost and payoff parameters that imply different optimal strategies. Hence, it allowed us to expand the range of information costs relative to the payoffs to such a degree that optimal strategies called for


  ,  ,   either small, moderate, or quite large samples. Instead of implementing just one optimal strategy in an experiment, this means that participants’ behaviour can be compared to a whole range of different optimal strategies. Implementing this range in a repeated-measures design with varying cost– payoff ratios across different trial blocks is also useful because it may sensitise participants to this variation (Fischhoff et al., ). Specifying costs and payoffs and the resulting optimal strategy is also part of the Bayesian tradition. However, this was mostly done to compare human performance to Bayesian integration. Our focus lies on changes in the parameters with the aim of studying human adaptation. With regard to the second issue outlined above, evaluability, this new task likewise had advantages. We could quantify the information costs (for one additional observation sampled from a distribution) and the payoff for a correct choice in terms of the same measurement unit or ‘currency’, such that participants could see, on every trial, what portion of the entire payoff for a correct decision they had to pay for each new observation. For instance, when each observation cost  points and a correct choice was rewarded with  points, they presumably understood that a sample larger than  (n>) was more expensive than the profit to be gained from a correct choice. In addition to equal evaluability of information cost (as financial costs instead of time) and accuracy (payoff ) in terms of the same ‘currency’ with equal visibility on every trial, the salience of the information costs was further enhanced by letting participants ‘purchase’ every additional piece of information. This means that participants were reminded of the price of an increasing sample each time they actively solicited a new observation.

.

Research Design to Illustrate Task-Dependent Optimality and Evaluability

Figure . provides an example of how the sample-based choice task parameters were specified in recent experiments by McCaughey, Prager,

Figure . Schematic illustration of one of the sequences of cost parameters used in the sample-based decision task for Blocks  to , with displayed information costs and payoff in the middle row and the standardised ratio in the bottom row.

The four columns indicate the four blocks participants worked on, which consisted of  or  decisions each, depending on the experiment. Below each block label, the ratio of the information cost per sampled item to the payoff for a correct or incorrect choice is displayed. For instance, a ratio of : for B indicates that each sampled observation cost  units while the gain for a correct response and the loss for an incorrect response amounted to  units, with the standardised ratio of : indicated in the bottom row. Likewise, the (standardised) ratios were :, :, and : for B, B, and B, respectively, thus making information least expensive for B and most expensive for B.

The information that participants could sample at the indicated cost again consisted of arrows that were either green and upward pointing or magenta and downward pointing, similar to the previously described speed–accuracy trade-off task. However, a decision had to be made on the basis of only one sample instead of two. The task instructions indicated to participants that the arrows represented gains and losses that a fictitious slot machine had yielded in the past. Based on a sample of past outcomes, participants had to decide whether the majority of all past outcomes consisted of gains or of losses, that is, whether the dominant outcome of the slot machine was gains or losses. The slot machine's probability of yielding gains (green upward-pointing arrows) could take values from . to . (in steps of ., with the exception of .). Making the correct choice was rewarded with the addition of points to one's balance, while an incorrect choice incurred an equivalent loss of points. To obtain information, each individual arrow had to be bought at the information cost illustrated above.

As in previous experiments, larger random samples provided a better chance of making the right decision and receiving the reward instead of the loss. Yet, a larger sample also incurred higher information costs. Hence, there was a clear trade-off between the benefits and costs of information (higher expected payoff for correct choices, but more expensive sample costs), measurable in the same monetary unit at the level of each individual trial. This trade-off depended on the ratio of information cost to potential payoff (gain/loss): roughly speaking, the cheaper the information was relative to the potential gain, the more information tended to be worthwhile to pay for, with the increase in expected accuracy outweighing the direct cost of the information. When the payoff was low and the information expensive, the cost of information more quickly started to outweigh the additional benefit. Different ratios (of information cost to payoff) called for different strategies (in terms of sample size) in different


  ,  ,   blocks. Thus, the inclusion of a wider range of ratios in the longitudinal design allowed us to see whether the strong oversampling bias obtained in the aforementioned experiments generalises across a wider range of task conditions and, in particular, whether the oversampling bias disappears when speed (information cost) and accuracy are comparable in evaluability. For an empirical answer to both questions, we analysed the size of participants’ samples as main dependent measure for the sequence of tasks in Figure .. As a benchmark to evaluate participants’ performance, we specified a Bayesian model for each cost ratio. The model consisted of an iteratively determined threshold that maximised the payoff. The threshold was the posterior odds ratio that the evidence in a sample had to exceed in favour of one of the two hypotheses (i.e., p < . or p >.) before sampling was to be truncated and a decision made. Assessing individual participants’ performance and sampling strategy relative to these benchmarks of optimal performance allows for comparisons across participants, experiments, and blocks. A glance at Figure . shows that the average participant was indeed somewhat sensitive to the manipulated changes in cost ratios. From the smallest information–cost ratio on the left (:) to the highest ratio on the right (:), the average sample size decreases slightly but monotonically from . (for :), when sampling was least expensive, to . (for :), when information was most expensive (see light grey squares indicating average n). However, although the decline in average n with increasing cost ratios may be considered genuine evidence for some adaptive sensitivity, the extent of this sensitivity is conspicuously small. The mean of the individual correlation coefficients for the relation between sample size and ratio is M r ¼ : for Experiment (based on  individuals who completed one to two blocks of  decisions per ratio) and M r ¼ : for Experiment  (based on  individuals who completed one block of  decisions per ratio). Both means are reliably different from , but the variance accounted for by the ratio is surprisingly small, considering that the ratio should be the main systematic influences on the sample size for the optimal strategy. While these results testify to the basic possibility that participants at least understand the general demands implied by the cost ratios (i.e., that 

The standard deviation of the individual correlation coefficients was SDr = . for Experiment  and SDr = . for Experiment . T-tests showed that both means differed from ; for Experiment , t() = ., p < .; for Experiment , t() = ., p < ..
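A benchmark of this kind can be sketched as follows (a minimal illustration, not the published model: the grid of slot-machine probabilities, the uniform prior, and the cost and payoff values are assumptions, and the actual thresholds were determined iteratively for each cost ratio):

```python
import random

GRID = [p / 100 for p in range(25, 76, 5) if p != 50]  # assumed p values

def run_trial(threshold, cost, payoff):
    """One sequential trial: keep buying observations until the posterior
    odds for 'p > .5' versus 'p < .5' (uniform prior over GRID) exceed
    the threshold in either direction, then decide."""
    p_true = random.choice(GRID)
    post = {p: 1.0 / len(GRID) for p in GRID}
    spent = 0.0
    for _ in range(200):                       # hard cap on sample size
        x = random.random() < p_true           # buy one observation
        spent += cost
        for p in GRID:                         # Bayesian update
            post[p] *= p if x else (1 - p)
        z = sum(post.values())
        for p in GRID:
            post[p] /= z
        hi = sum(v for p, v in post.items() if p > 0.5)
        lead = max(hi, 1.0 - hi)
        if lead / max(1.0 - lead, 1e-12) >= threshold:
            break
    correct = (hi > 0.5) == (p_true > 0.5)
    return (payoff if correct else -payoff) - spent

def mean_payoff(threshold, cost=2, payoff=100, reps=4000):
    return sum(run_trial(threshold, cost, payoff) for _ in range(reps)) / reps

# Grid search over stopping thresholds: the payoff-maximising threshold
# (and hence the implied optimal sample size) shifts with the cost-to-payoff ratio.
for t in (2, 4, 9, 19, 49):
    print(t, round(mean_payoff(t), 1))
```

Rerunning the grid search under a different cost-to-payoff ratio moves the best threshold, which is what the optimal benchmarks in Figure . reflect.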

Figure . Graph indicating participants’ mean sample sizes (grey squares with dashed bars) and corresponding mean optimal sample size (black points) for each cost ratio. Bars indicate the standard deviations.

While these results testify to the basic possibility that participants at least understand the general demands implied by the cost ratios (i.e., that lower information costs allow for larger samples than higher costs), it is obvious at a quantitative level that this basic sensitivity is not sufficient for effective regulation of cost–benefit trade-offs. The sensitivity to the ratio of information cost to payoff, that is, the decline of the light-grey squares from left to right in Figure ., is smaller by orders of magnitude than the decline of the black points illustrating an optimal agent's sensitivity to cost ratios.

Crucially, however, a comparison of the black and light-grey bars in Figure . entails clear-cut answers to the twofold theoretical question that guided this investigation. The crossover pattern for the black and light-grey points highlights the insight gained from Evans et al. () that comparisons with an optimal strategy are determined more by the parameter-specific optimal strategy, which can vary greatly, than by the participants' behaviour, which shifts only mildly. In other words, as robust as findings such as our oversampling bias in speed–accuracy trade-offs may seem, any seemingly general result based on


  ,  ,   a single (or narrow) set of parameters, may only be unidirectional simply because the parameters yield an extreme optimal strategy as comparison standard. Different parameters, implying a different comparison standard, may lead to very different conclusions. Thus, rather than corroborating a general oversampling bias, Figure . shows a clear pattern of deviations from the optimal strategy in both directions that could be considered regressive: Whatever strategy is called for is apparently followed insufficiently, seemingly regressing to the mean or to some task-specific default that participants may be using. Participants sample too much information when information is expensive (ratio :) and small samples are called for. They sample approximately the optimal amount when information costs are moderate (ratios : and :), and finally, they sample too little when the information is the cheapest and call for large samples. If we had included only a subset of those ratios in our experiments, we might have been convinced that we have evidence for any one of those conclusions. That conclusion, however, would have been arbitrary, determined by our choice of parameters rather than participants’ behaviour. In any case, just as Jarvstad et al. () and Evans et al. () pointed out, optimality depends on the task and its parameters. Hence, comparisons with optimality should always be interpreted carefully, and are only of limited meaning in and of themselves. With regard to evaluability, in particular, the regressive pattern of Figure . lends support to the notion that obtaining a one-sided oversampling bias in previous research was at least partially due to unequal evaluability. Quantifying the negative payoff to be paid as information costs in the same currency as the positive payoff for a correct choice, and making this ratio jointly visible on every trial sensitised participants for the relative advantage of both speed and accuracy and presumably supported the two-sided (regressive) pattern.

. Concluding Remarks

Thus, as a result of various experiments and simulation studies we have conducted on information cost–benefit trade-offs in sample-based decision making, we have learned that the joint regulation of costs and benefits, or speed and accuracy, is an intricate, highly demanding metacognitive task. Mapping speed and accuracy onto the same quantitative index, sample size, highlights the different relations that speed and accuracy have with sample size, with speed decreasing more rapidly with increasing sample size

than accuracy increases. Although responses to the post-experimental questionnaires indicate that people seem to understand that they have to weigh speed against accuracy, they fail to appreciate the specific relations of each with sample size, and especially the fact that those relations differ. Hence, they cannot translate their basic understanding of the trade-off into a quantitative sampling strategy that maximises the product of the number of completed choices (speed) and the expected correctness of the average choice (accuracy) in a given period of time.

However, more thorough tests of the range of possible task parameters reveal that the unequal relation of sample size to speed (or information costs) and accuracy does not necessarily imply an optimal strategy that emphasises speed (or very low expenditure on information costs) over accuracy. Rather, the optimal strategy depends substantially on the specific task parameters, and so does the classification of participants' strategies as oversampling, undersampling, or 'about right'. Hence, the most important superordinate insight to be gained from the reported research is that comparisons with optimality should be interpreted with great care and are only meaningful when they are embedded in a broader framework.

Our suggestion for such a framework is the study of adaptivity. Assessing humans' adaptivity to their environments and to changes thereof can be aided greatly by specifying an optimal strategy for comparison. Importantly, deviations from optimality would not automatically be bad or irrational. Their interpretation would depend on what they imply for adaptivity. In our financial-cost experiments, the comparison with the changes required by an optimal strategy helps to quantify the extent of the participants' change in response to the changing task parameters and demonstrates how small those changes are. The deviations are not meaningful in themselves, but they become meaningful in the context of what one expects from adaptive behaviour. Why people's adaptivity seems rather limited in this task warrants further investigation, since changing behaviour in response to changes in one's environment and task is a cornerstone of adaptive behaviour. Just as research into covertly dynamic environments (e.g., Navarro et al., ) is fascinating and important, because many aspects of the world are far from stable, research into responses to overt and obvious changes in the environment complements and completes the study of human adaptivity. An additional aspect of interest in the latter category is that humans might not have to adapt to such changes without assistance. They need the metacognitive insight to recognise that they are in a situation that requires


  ,  ,   adaptation and that they might not perform well on their own, which would enable them to use technological tools or ask for (expert) advice. Whether or not they are able to recognise their need for help is a worthwhile question for further research – the investigation of which has already begun (Fiedler et al., in press). R E F E R EN C E S Balci, F., Simen, P., & Niyogi, R., et al. (). Acquisition of decision making criteria: Reward rate ultimately beats accuracy. Attention, Perception, and Psychophysics, (), –. https://doi.org/./s--- Connolly, T., & Serre, P. (). Information search in judgment tasks: The effects of unequal cue validity and cost. Organizational Behavior and Human Performance, (), –. https://doi.org/./-() -X Denrell, J. & Le Mens, G. () The hot stove effect. In Klaus Fiedler, Peter Juslin, & Jerker Denrell (Eds.), Sampling in judgment and decision making (pp. –). Cambridge: Cambridge University Press. Denrell, J., & March, J. G. (). Adaptation as information restriction: The hot stove effect. Organization Science, (), –. https://doi.org/ ./orsc.... Dhami, M. K., Hertwig, R., & Hoffrage, U. (). The role of representative design in an ecological approach to cognition. Psychological Bulletin, (), –. https://doi.org/./-... Edwards, W. (). Optimal strategies for seeking information: Models for statistics, choice reaction-times, and human information-processing. Journal of Mathematical Psychology, (), . https://doi.org/./ -()- Evans, N. J., Bennett, A. J., & Brown, S. D. (). Optimal or not: Depends on the task. Psychonomic Bulletin and Review, (), –. https://doi .org/./s--- Evans, N. J., & Brown, S. D. (). People adopt optimal policies in simple decision-making, after practice and guidance. Psychonomic Bulletin and Review, (), –. https://doi.org/./s--- Fiedler, K., McCaughey, L., Prager, J., Eichberger, J., & Schnell, K. (). Speed–accuracy trade-offs in sample-based decisions. Journal of Experimental Psychology: General, (), –, https://doi.org/./ xge Fiedler, K., Prager, J., & McCaughey, L (in press). Metacognitive Myopia: A Major Obstacle on the Way to Rationality. Current Directions in Psychological Science, –. Fischhoff, B., Slovic, P., & Lichtenstein, S. (). Subjective sensitivity analysis. Organizational Behavior and Human Performance, (), –. https:// doi.org/./-()-





Fried, L. S., & Peterson, C. R. (). Information seeking: Optional versus fixed stopping. Journal of Experimental Psychology, (), –.
Gigerenzer, G. (). Striking a blow for sanity in theories of rationality. In M. Augier & J. G. March (Eds.), Models of a man: Essays in memory of Herbert A. Simon (pp. –). Cambridge, MA: MIT Press.
Gigerenzer, G., & Goldstein, D. G. (). Reasoning the fast and frugal way: Models of bounded rationality. Psychological Review, , –.
Goldstein, D. G., & Gigerenzer, G. (). Models of ecological rationality: The recognition heuristic. Psychological Review, (), .
Harris, C. A., & Custers, R. (). Biased preferences through exploitation. In K. Fiedler, P. Juslin, & J. Denrell (Eds.), Sampling in judgment and decision making (pp. –). Cambridge: Cambridge University Press.
Hausfeld, J., & Resnjanskij, S. (). Risky decisions and the opportunity costs of time. Ifo Working Paper No. . Munich: Ifo Institute.
Hershman, R. L., & Levine, J. R. (). Deviations from optimum information-purchase strategies in human decision-making. Organizational Behavior and Human Performance, (), –.
Hsee, C. K., & Zhang, J. (). General evaluability theory. Perspectives on Psychological Science, (), –.
Jarvstad, A., Rushton, S. K., Warren, P. A., & Hahn, U. (). Knowing when to move on: Cognitive and perceptual decisions in time. Psychological Science, (), –.
Larrick, R. P., & Soll, J. B. (). The MPG illusion. Science, , –.
Madan, C. R., Spetch, M. L., & Ludvig, E. A. (). Rapid makes risky: Time pressure increases risk seeking in decisions from experience. Journal of Cognitive Psychology, (), –.
McCaughey, L., Prager, J., & Fiedler, K. (). Rivals reloaded: Adapting to sample-based speed–accuracy trade-offs through competitive pressure. Manuscript submitted for publication.
McCaughey, L., Prager, J., & Fiedler, K. (). Adapting to information search costs in sample-based decisions. Manuscript in preparation.
Navarro, D. J., Newell, B. R., & Schulze, C. (). Learning and choosing in an uncertain world: An investigation of the explore–exploit dilemma in static and dynamic environments. Cognitive Psychology, , –.
Payne, J. W., Bettman, J. R., & Luce, M. F. (). When time is money: Decision behavior under opportunity–cost time pressure. Organizational Behavior and Human Decision Processes, (), –.


  ,  ,   Phillips, N. D., Hertwig, R., Kareev, Y., & Avrahami, J. (). Rivals in the dark: How competition influences search in decisions under uncertainty. Cognition, (), –. https://doi.org/./j.cognition.. . Pitz, G. F. (). Information seeking when available information is limited. Journal of Experimental Psychology, (), –. https://doi.org/./ h Pitz, G. F., Reinhold, H., & Scott Geller, E. (). Strategies of information seeking in deferred decision making. Organizational Behavior and Human Performance, (), –. https://doi.org/./-()- Rieskamp, J., & Hoffrage, U. (). Inferences under time pressure: How opportunity costs affect strategy selection. Acta Psychologica, (), –. https://doi.org/./j.actpsy... Sedlmeier, P., Hertwig, R., & Gigerenzer, G. (). Are judgment of the positional frequencies of letters systematically biased due to availability? Journal of Experimental Psychology: Learning Memory, and Cognition, (), –. https://doi.org/./-... Slovic, P., & Lichtenstein, S. (). Comparison of Bayesian and regression approaches to the study of information processing in judgment. Organizational Behavior and Human Performance, (), –. https:// doi.org/./-()-X Snapper, K. J., & Peterson, C. R. (). Information seeking and data diagnosticity. Journal of Experimental Psychology, (), –. https://doi.org/ ./h Svenson, O., & Eriksson, G. (). Mental models of driving and speed: Biases, choices and reality. Transport Reviews, , –. http://dx.doi.org/ ./.. Todd, P. M., & Gigerenzer, G. (). Environments that make us smart: Ecological rationality. Current Directions in Psychological Science, (), –. https://doi.org/./j.-...x


Sampling as a Tool in Social Environments


16 Heuristic Social Sampling

Thorsten Pachur and Christin Schulze

In , Leon Festinger pioneered the idea that social judgment is shaped by one’s personal social environment. For instance, when forming an opinion about the hazards of eating tomatoes (Festinger, ) or assessing one’s ability to play tennis, an important step is to consider the people one knows and gauge their views on tomatoes or their ability to swing a racket. Knowledge about others can also drive behavior and guide perceived social norms. Tunçgenç et al. () found that people were more likely to adhere to social distancing guidelines during the SARS-CoV- pandemic when they reported that their close social circle did. Observations of the people one knows can be key to making an inference about the broader social environment. For example, when judging mortality from various causes of death in one’s country, people take into account how many individuals in their immediate social circles have died from different risks (Hertwig et al.,; Lichtenstein et al., ). While numerous studies have demonstrated the impact of one’s close social environment on judgment (see also Brown et al., ; Wood et al., ), the cognitive mechanisms underlying access to knowledge about one’s social environment have received much less attention. How does social sampling work? This chapter describes the social-circle model, a computational account of social sampling (Schulze et al., ) that integrates insights about the structure of social memory (Hills & Pachur, ) and the boundedly rational nature of information sampling (Fiedler & Juslin, ; Gigerenzer et al., ; Simon, ). Memory representations about the people one knows seem to be organized by fundamental social categories (e.g., self, family, friends, acquaintances) and these categories are used as cues during retrieval from social memory. This process might also be exploited to guide heuristic social judgment. The social-circle model parameterizes several aspects of information processing during social sampling, such as the weight given to different social circles, noise during evidence evaluation, and discrimination threshold; it can therefore be used 




 ,  

to reveal individual differences in social sampling as well as differences between domains.

In the following, we first argue that social sampling might be governed by a process that sequentially probes memories of subgroups of people in an individual's social environment, and that sampling during this sequential search process might be truncated by a simple stopping rule. We also contrast this view with other ideas about the nature of social sampling. Then we describe the social-circle model as a process account of heuristic, boundedly rational social sampling and summarize empirical tests of the model. After that, we examine applications of the social-circle model for investigating individual differences in social sampling, show how the model can map differences in social sampling between judgment domains, and outline future applications of the model, such as studying social sampling of virtual versus physical social contacts. We close by discussing analyses of the ecological rationality of heuristic social sampling.

The Structure of Social Memory and How It Might Shape and Constrain Social Sampling

Social sampling is commonly invoked to describe memory-based judgments about social statistics, such as when people are asked to estimate the proportion of people in the population who hold political views similar to their own, or the distribution of the number of friends people have (e.g., Dawtry et al., ; Galesic et al., ). A cognitive process account of social sampling should therefore be consistent with mechanisms of memory retrieval and the organization of memory. Galesic et al. () developed the social sampling model, a computational account of social sampling that describes how knowledge about instances in a person's social memory is activated to judge the frequency distribution of social events (see Chapter  by Olsson et al. in this volume). Galesic et al.'s model elegantly explains various prominent phenomena of human judgment, such as self-enhancement and self-depreciation effects; moreover, it correctly predicts conditions (specifically, different frequency distributions in the environment) under which false consensus or false uniqueness should occur. Following proposals in memory research, the social sampling model assumes a cue-based, similarity-graded activation of instances in memory; allows for the possibility that only a portion of the sampling space is inspected for relevant instances; and can also accommodate imperfect recall of instances.





As in other computational models assuming retrieval of instance or exemplar knowledge (e.g., Dougherty et al., ; Nosofsky, ; Smith & Zarate, ), in the social sampling model the relevant instances in the sampling space are activated in one go. That is, sampling proceeds through the sampling space unconditionally on the amount of evidence that has already been accumulated. Yet social memory is highly structured (e.g., according to social categories such as self, family, friends, acquaintances; Bond & Brockett, ; Fiske, ). Schafer and Schiller () argued that the mental representation of the "social space" – which guides social inferences and decisions – is structured by dimensions such as power and affiliation, and that it rests on neuronal mechanisms similar to those underlying cognitive maps of physical space (specifically, the hippocampal formation). If social memory is highly structured, recall from social memory might not follow the homogeneous and comprehensive process assumed by the social sampling model. Instead, elements might be retrieved in clusters – as has been found for spontaneous recall from semantic memory (e.g., Troyer et al., ).

Hills and Pachur () found support for clustered recall from social memory. They asked participants to freely recall the people they knew personally and stopped them after they had provided  names. Participants then indicated the social category (partner, family, friends, acquaintances) of each person they had recalled. An analysis of the order in which the people were recalled suggested that participants often retrieved their social network members in clusters: Family members tended to be recalled around the same time, as were friends and acquaintances. Arguably, such sequential probing of subspaces in a person's social memory might also operate in social sampling – that is, when social memory is probed to make inferences about social statistics in the world.

If social sampling proceeds by sequential, clustered retrieval guided by the structures in social memory, those structures might also be used to limit search. In multi-attribute decision making from memory, such as judging which of two job candidates is more promising, people tend to rely on strategies that inspect attributes sequentially, compare the objects on the attribute, and stop search as soon as an attribute discriminates between the objects (i.e., when the objects have different values on the attribute; Bröder & Schiffer, ; Einhorn, ; Gigerenzer et al., ; Payne et al., ). Analogously, simple stopping rules could also operate when people access instance knowledge in memory during social sampling, such that during the sequential inspection of social circles, people truncate search as soon as a given circle allows them to make a decision. In other words, instead of unconditionally and comprehensively activating relevant




 ,  

instances in the sampling space in social memory – as is assumed in existing models of social sampling (Galesic et al., ; Smith & Zarate, ) – search might be expanded to another circle only if the current circle does not provide sufficient information to make a decision.

The notion of "lazy" sampling from one's social environment – adjusting sample size depending on the evidence found and aiming to reduce search effort where possible – has also been discussed by Epstein (). He conducted computer simulations to investigate a mechanism that, in order to assess which of two behaviors is the social norm (i.e., is shown by the majority of people), checks the relative prevalence of the behaviors among the people in the immediate spatial vicinity. The prevalence is first assessed within a particular, randomly drawn radius r in the agent's social network, and then in a slightly larger radius, r + 1. If the two assessments differ, the decision maker adopts the majority behavior observed with the slightly larger radius and adjusts their sampling effort for the next round to r + 1. If the two assessments do not differ, the decision maker checks the behaviors' prevalence in a slightly smaller radius, r − 1. If the assessments under radius r and radius r − 1 do not differ either, the majority behavior in the sample is adopted and the sampling effort for the next round is adjusted to r − 1. The mechanism thus expands the agent's sampling effort only if it pays off to do so (i.e., if it yields new information) and relies on a smaller sample size if doing so does not affect the inference. In Epstein's simulations of the mechanism, the average radius adopted by the simulated decision makers quickly shrank over several rounds, in particular in stable environments, where the prevalence of behaviors is relatively homogeneous. Epstein's mechanism is mute, however, on which agents specifically are selected, and it does not take into account the specific structures, typically consisting of subgroups of different sizes, in social memory.

We next propose the social-circle model as a way to implement the notions that social sampling proceeds by sequentially probing in memory a person's social circles for relevant instances, and that sampling is constrained dynamically by a simple stopping rule that is sensitive to the amount of evidence sampled in a circle.
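A minimal sketch may help make the adjust-the-radius rule concrete. The one-dimensional ring of agents, the helper names, and the random tie-breaking below are our own simplifying assumptions for illustration, not Epstein's implementation:

```python
import random

def majority(behaviors, i, r):
    """Majority behavior (0 or 1) among the agents within distance r of
    agent i on a one-dimensional ring; ties are broken at random."""
    n = len(behaviors)
    sample = [behaviors[(i + d) % n] for d in range(-r, r + 1) if d != 0]
    ones = sum(sample)
    if 2 * ones == len(sample):
        return random.randint(0, 1)
    return int(2 * ones > len(sample))

def epstein_step(behaviors, i, r):
    """One round of the 'lazy' sampling rule sketched above: expand the
    radius only if doing so changes the assessment; shrink it if a
    smaller radius would have yielded the same answer."""
    at_r = majority(behaviors, i, r)
    at_r_plus = majority(behaviors, i, r + 1)
    if at_r != at_r_plus:
        return at_r_plus, r + 1      # new information: adopt it and expand
    if r > 1 and majority(behaviors, i, r - 1) == at_r:
        return at_r, r - 1           # a smaller sample suffices: shrink
    return at_r, r                   # keep the current sampling effort

# Example: a small population; agent 0 assesses the norm with radius 2.
population = [1, 1, 1, 0, 1, 1, 0, 0, 1, 1]
choice, radius = epstein_step(population, i=0, r=2)
```

Iterating `epstein_step` over rounds lets the radius drift downward in stable neighborhoods, mirroring the shrinking average radius reported in Epstein's simulations.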

The Social-Circle Model

The social-circle model is a cognitive process model of how people infer which of two events, A or B, is more frequent in the broader population. For instance, the question may be whether more people play tennis or table tennis in one's country. Such statistics are often not known directly and thus need to be inferred. It is assumed that to make a decision, people





engage in social sampling and access their knowledge of the people in their social environment to see how many of them are instances of the events in question (e.g., how many of them play tennis or table tennis). Critically, the model acknowledges that a person's memory of their social environment is structured, for instance, by social category (e.g., self, family, friends, acquaintances; Bond & Brockett, ; Fiske, ). Other factors may also contribute to this structure, such as the frequency of contact with a person (e.g., Hills & Pachur, ; Pachur et al., ) or the social context in which people are encountered (e.g., Bond et al., ; Fiske, ). Building on a proposal by Pachur et al. (), it is assumed in the social-circle model that people rely on social categories to structure their search for relevant instances in memory, such that the social circles representing these categories are inspected sequentially; if the current circle provides sufficient evidence to make a decision, search is stopped. Arguably, search in social memory may often rest on more than one type of retrieval cue, and the social-circle model can, in principle, be modified to capture these different structuring elements of social memory. We will discuss extensions of the social-circle model with different circle definitions below. We next provide a step-by-step tutorial on the social-circle model, describing how social sampling proceeds through the social circles, how the evidence accumulated in the circles is evaluated, how search is stopped, and how a decision is made. The tutorial also shows how to compute the model's quantitative predictions.

Search

We first describe the search process in the social-circle model by which a person's social circles are sequentially probed for relevant instances. It is assumed that each of the I circles has a weight $w_i$, with the constraint that $\sum_{i=1}^{I} w_i = 1$. Assuming the circles self, family, friends, and acquaintances, I equals 4. The circle weights $w_i$ are free parameters. For a set of circle weights, the probability that a given, not yet inspected circle j is inspected is defined as

$$p(\mathrm{inspect}_j^{\psi}) = \frac{w_j}{\sum_{k \in \psi} w_k} \qquad (16.1)$$

Note the ψ in Equation 16.1, which is the set of J circles not inspected so far. That is, the inspection probability of a circle j depends on the current




 ,  

constellation of not-yet inspected circles, as the denominator in Equation . is adjusted accordingly. In general, the higher the weight of a circle, the higher its probability of being selected for inspection. For illustration, assume circle weights of ., ., ., and . for the self, family, friend, and acquaintance circles, respectively. Before any circle has been inspected – that is, when ψ ¼ {,,,} – using Equation . yields inspection probabilities of ., ., ., and ., for the self, family, friend, and acquaintance circles, respectively. If the self circle is inspected first but the evidence in it does not discriminate between the two events in questions (e.g., tennis vs table tennis), another circle is selected probabilistically from the J ¼  remaining circles; in this example (i.e., when ψ ¼ f, , g), the inspection probabilities for the family, friend, and acquaintance circles are then ., ., and ., respectively. Next, we turn to the probability of orders in which circles are inspected. The probability of an inspection order is based on the inspection probabilities of the individual circles; importantly, the probability of each circle in the order is calculated with Equation . in the context of the set of circles ψ that have not been selected for inspection yet. The probability of inspection order π is given by pðπÞ ¼

I Y

ψ

pðinspecti i Þ:

(16.2)

i¼

Consider the probability of inspection order π ¼ 〈, , , 〉 – that is, where search proceeds as follows: self, family, friends, and acquaintances circles. For this order, the calculation of the inspection probabilities for the self, family, friends, and acquaintances circles are based on ψ ¼ f, , , g, ψ ¼ f, , g, ψ ¼ f, g, and ψ ¼ fg, respectively. Adjusting the denominator in Equation . for each circle accordingly and assuming circle weights of w ¼ :, w ¼ :, w ¼ :, and w ¼ :, the inspection probability for order π ¼ 〈, , , 〉 equals :  :  :   ¼ :. This is the most likely of the I ! ¼  possible inspection orders (e.g., the probability of inspection order π ¼ 〈, , , 〉 is only :  :  :   ¼ :Þ. ..

Evaluation of the Evidence

If a circle is selected for inspection, all relevant instances of the events A and B in that circle are retrieved (in the studies reported below, the assumed knowledge base consisted of the instances that people reported in





Figure . Illustration of the distribution of subjective value x. The two shaded areas under the curve represent the probabilities that event A or event B, respectively, are judged to be more frequent; in both cases, search is stopped and no further circles are inspected. The nonshaded area under the curve represents the case in which the subjective value is not large enough to discriminate between the events; in this case another circle is selected for inspection.

their social circles). The number of instances in the circle is the basis for the decision. Specifically, the amount of evidence in circle i, Δi , is defined as the difference between the number of instances for events A and B, niA and niB , relative to the total number of relevant instances in the circle: Δi ¼

niA  niB : niA þ niB

(16.3)

For instance, if there are two instances of event A (nA ¼ ) and one instance of event B (nB ¼ ), then Δ ¼ ð  Þ=ð þ Þ ¼ :. A positive value of Δ is evidence in favor of event A; a negative value is evidence in favor of event B. The subjective evaluation of the evidence is based on a noisy process; to formalize this, the subjective value xi of the evidence is represented as a stochastic variable, drawn from a normal distribution with a mean that equals Δi (i.e., the evidence) and a standard deviation of σ, x i  &ðΔi , σ  Þ:

(16.4)

σ ð Þ is estimated from the data and can be conceptualized as a noise parameter. Figure . shows an illustration of the distribution of x i , assuming Δ ¼ :, a noise parameter of σ ¼ :, and a discrimination threshold (see next subsection) of d = ..




 ,   .. Stopping Search

To acknowledge limits in the discrimination ability of the cognitive system, it is assumed in the social-circle model that the subjective value $x_i$ derived from the evidence sampled in the current circle is compared to a discrimination threshold d (0 < d < 1). If $x_i$ falls into the interval [−d, d] (the nonshaded area under the curve in Figure 16.1), the subjective value is not large enough to discriminate between the events and a next circle is selected for inspection. d is a free parameter. If $x_i$ exceeds the discrimination threshold, search is stopped and no further circles are inspected. The probability that $x_i$ exceeds d – either positively or negatively – and search is stopped corresponds to the sum of the two shaded areas under the curve. This probability can be determined by using integration to find the areas under the curve of a normal distribution with mean Δ and standard deviation σ where $x_i \le -d$ and where $x_i \ge d$:

$$p(\mathrm{stop}_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{-d} e^{-\frac{(x-\Delta_i)^2}{2\sigma^2}}\, dx \;+\; \frac{1}{\sqrt{2\pi\sigma^2}} \int_{d}^{\infty} e^{-\frac{(x-\Delta_i)^2}{2\sigma^2}}\, dx \qquad (16.5a)$$

An alternative and simpler way to express this is (see Schulze et al., ):

$$p(\mathrm{stop}_i) = \Phi\!\left(\frac{-\Delta_i - d}{\sigma}\right) + \Phi\!\left(\frac{\Delta_i - d}{\sigma}\right) \qquad (16.5b)$$

where Φ(·) is the standard normal cumulative distribution function. The first element on the right-hand side of Equations 16.5a and 16.5b represents the probability that $x_i$ falls to the left of the discrimination area, that is, it is smaller than −d and thus supports event B. The second element on the right-hand side of Equations 16.5a and 16.5b represents the probability that $x_i$ is larger than d and thus supports event A. Note that because $x_i$ is a random variable, it is possible that event B is inferred to be more frequent even if the evidence $\Delta_i$ points to event A. With $\Delta_i = .33$, d = .2, and σ = .7, search is stopped at circle i with probability $p(\mathrm{stop}_i) = .80$; conversely, with probability $1 - p(\mathrm{stop}_i) = .20$, search proceeds to another circle.
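As a numerical sanity check, the integral form (Equation 16.5a) and the CDF form (Equation 16.5b) can be evaluated side by side; they agree up to integration error. A minimal sketch, using the parameter values of the running example:

```python
import math
from scipy.integrate import quad
from scipy.stats import norm

def p_stop_integral(delta, d, sigma):
    """Equation 16.5a: mass of N(delta, sigma^2) outside [-d, d]."""
    density = lambda x: math.exp(-(x - delta) ** 2 / (2 * sigma ** 2)) \
        / (sigma * math.sqrt(2 * math.pi))
    left, _ = quad(density, -math.inf, -d)   # area below -d (supports B)
    right, _ = quad(density, d, math.inf)    # area above d (supports A)
    return left + right

def p_stop_cdf(delta, d, sigma):
    """Equation 16.5b: the same probability via the standard normal CDF."""
    return norm.cdf((-delta - d) / sigma) + norm.cdf((delta - d) / sigma)

print(p_stop_integral(1/3, 0.2, 0.7))  # ~0.80
print(p_stop_cdf(1/3, 0.2, 0.7))       # ~0.80, identical up to quadrature error
```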

Decision

Once search is stopped, a decision is made. If the subjective evaluation of the evidence found in the circle, $x_i$, is higher than the discrimination threshold d, event A is chosen as the more frequent one. The probability of this occurring, $p_i(A \mid A,B)$, corresponds to the shaded upper part of the





distribution in Figure 16.1 and – based on the simplified expression in Equation 16.5b – is calculated as

$$p_i(A \mid A,B) = \Phi\!\left(\frac{\Delta_i - d}{\sigma}\right) \qquad (16.6)$$

$p_i(A \mid A,B)$ equals .58 in the example case. Event B is chosen if $x_i$ is smaller than −d. The probability of this occurring corresponds to the shaded area of the lower part of the distribution in Figure 16.1 and is calculated as

$$p_i(B \mid A,B) = \Phi\!\left(\frac{-\Delta_i - d}{\sigma}\right) \qquad (16.7)$$

It equals .22 in the example.
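The decision probabilities follow directly from Equations 16.6 and 16.7; together with Equation 16.5b they exhaust the three possible outcomes at a circle (choose A, choose B, or continue search). A minimal sketch, again using the running example's values:

```python
from scipy.stats import norm

def p_choose_A(delta, d, sigma):
    """Equation 16.6: probability that A is chosen when search stops here."""
    return norm.cdf((delta - d) / sigma)

def p_choose_B(delta, d, sigma):
    """Equation 16.7: probability that B is chosen when search stops here."""
    return norm.cdf((-delta - d) / sigma)

delta, d, sigma = 1/3, 0.2, 0.7
print(round(p_choose_A(delta, d, sigma), 2))  # 0.58
print(round(p_choose_B(delta, d, sigma), 2))  # 0.22
# Their sum equals p(stop_i) from Equation 16.5b (~0.80).
```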

To determine not just the probability of choosing A or B at a particular circle i but for a given set of circle weights more generally, one needs to consider all the possible ways that can lead to a particular decision. This includes the cases where search is stopped at the other circles, given a specific inspection order, as well as all the other inspection orders that can occur. Figure 16.2 gives an overview of some of the possible search and decision trajectories. Let us first consider the probability that A is chosen over B under a specific inspection order. For the inspection order π, the probability $p_\pi(A \mid A,B)$ follows from summing up the probabilities of choosing A across all circles, with the choice probability of each circle weighted by the probability that search was not stopped before that circle. For instance, the probability of choosing A under the inspection order π = ⟨1, 2, 3, 4⟩ is

$$\begin{aligned}
p_{\langle 1,2,3,4\rangle}(A \mid A,B) ={} & p_1(A \mid A,B)\\
&+ p_2(A \mid A,B)\,\big[1-p(\mathrm{stop}_1)\big]\\
&+ p_3(A \mid A,B)\,\big[1-p(\mathrm{stop}_1)\big]\big[1-p(\mathrm{stop}_2)\big]\\
&+ p_4(A \mid A,B)\,\big[1-p(\mathrm{stop}_1)\big]\big[1-p(\mathrm{stop}_2)\big]\big[1-p(\mathrm{stop}_3)\big]\\
&+ .5\,\big[1-p(\mathrm{stop}_1)\big]\big[1-p(\mathrm{stop}_2)\big]\big[1-p(\mathrm{stop}_3)\big]\big[1-p(\mathrm{stop}_4)\big]
\end{aligned} \qquad (16.8a)$$

The last line in Equation 16.8a represents the case where none of the circles discriminate between the events, in which case a decision is made by guessing. To provide a concrete example for the calculation of the choice probability under inspection order π = ⟨1, 2, 3, 4⟩ and the circle weights assumed above, let us assume that there are no instances of event A and event B in the first circle (Δ₁ = 0), two instances for event A and one instance for event B in the second circle (Δ₂ = .33), no instances in the third circle (Δ₃ = 0), and one instance for event A and three instances for event B in the fourth circle (Δ₄ = −.5). Computing the respective probabilities based on Equations 16.5, 16.6, and 16.7 (with σ = .7 and d = .2) and entering them into Equation 16.8a yields

$$\begin{aligned}
p_{\langle 1,2,3,4\rangle}(A \mid A,B) ={} & .388\\
&+ .576\,[1-.775]\\
&+ .388\,[1-.775][1-.799]\\
&+ .159\,[1-.775][1-.799][1-.775]\\
&+ .5\,[1-.775][1-.799][1-.775][1-.825]\\
={} & .537
\end{aligned} \qquad (16.8b)$$

Figure 16.2 Illustration of some possible trajectories that social sampling with the social-circle model can take when sequentially inspecting in memory a person's social circles.

The total probability that event A is inferred to be more frequent than event B can be calculated by summing the probabilities of choosing A across all I! = 24 possible inspection orders of circles, with each inspection order weighted by its probability of occurrence (which is a function of the set of circle weights):

$$p(A \mid A,B) = \sum_{n=1}^{I!} p_{\pi_n}(A \mid A,B)\; p(\pi_n) \qquad (16.9)$$

p(π) is as defined in Equation 16.2.
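Putting the pieces together, Equations 16.8a and 16.9 can be computed by enumerating all inspection orders. In the sketch below, the inner accumulation reproduces the .537 of Equation 16.8b for the order ⟨1, 2, 3, 4⟩; the weight vector is again purely illustrative, so the aggregated probability printed at the end should not be read as the chapter's figure:

```python
from itertools import permutations
from scipy.stats import norm

def choice_prob(deltas, w, d, sigma):
    """Equation 16.9: overall probability of choosing A, summed over all
    inspection orders, each weighted by its probability (Equation 16.2)."""
    total = 0.0
    for order in permutations(range(len(w))):
        # Probability of this inspection order (Equation 16.2).
        psi, p_pi = set(range(len(w))), 1.0
        for j in order:
            p_pi *= w[j] / sum(w[k] for k in psi)
            psi.remove(j)
        # Probability of choosing A under this order (Equation 16.8a).
        p_a, reach = 0.0, 1.0   # 'reach' = prob. search has not yet stopped
        for j in order:
            p_a += reach * norm.cdf((deltas[j] - d) / sigma)
            reach *= 1 - (norm.cdf((-deltas[j] - d) / sigma)
                          + norm.cdf((deltas[j] - d) / sigma))
        p_a += 0.5 * reach      # no circle discriminated: guess
        total += p_pi * p_a
    return total

deltas = [0.0, 1/3, 0.0, -0.5]  # circle evidence from the worked example
w = [0.4, 0.3, 0.2, 0.1]        # illustrative circle weights only
print(choice_prob(deltas, w, d=0.2, sigma=0.7))
```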

In our example, this calculation yields the overall probability that event A is chosen. Note that event A is more likely to be chosen than event B – p(A|A,B) > .5 – despite the fact that the total number of instances across all circles is larger for event B (4 vs 3). The reason is that, based on the circle weights, Circle 2 (which has more instances of event A than of event B) is more likely to be selected for inspection, and thus to determine the choice, than Circle 4 (which has more instances of event B).

We have now provided a full formal description of the key cognitive steps during judgment formation as assumed in the social-circle model: searching through one's social circles, evaluating the instance knowledge retrieved from a circle, stopping search, and making a decision. The model represents the probabilistic nature of various elements of the cognitive process – in particular the inspection order through a person's social circles and the subjective evaluation of the evidence – and how this gives rise to probabilistic choice. The social-circle model's six parameters enable the model to map individual differences in various aspects of social sampling: The four circle weights $w_i$ can accommodate differences in the prioritization of the social circles during sampling; the parameter σ can measure individual differences in noise during evidence evaluation; and the threshold parameter d can capture differences in the amount of evidence a decision maker requires before stopping search and making a decision.

In light of the specific conceptual assumptions that underlie the parameters of the social-circle model, one may ask to what extent the behavioral effects resulting from variation of the parameters are specific enough to allow for an accurate estimation of the parameters. Moreover, some of the model parameters may formally interact with each other, making it difficult to attribute behavioral variation to a specific parameter. To address these issues, Schulze et al. () conducted parameter recovery analyses. In these analyses, different sets of parameter values were used to generate decisions; when the simulated decisions were then modeled with the social-circle model, the key question was to what extent the estimated

Technically, only five of the parameters can vary freely: Because the four circle weights must sum to 1, one circle-weight parameter is necessarily determined by the values of the other three.




 ,  

model parameters would coincide with those that generated the data. The results showed that the generating parameter values could be recovered with high accuracy. As a further test of the validity of the model parameters, an important task in future research will be to also submit the social-circle model to selective influence tests (e.g., Heathcote et al., ). For instance, one could explicitly instruct decision makers to use the social-circle model and follow a particular inspection order; this manipulation should be reflected primarily in the pattern of the estimated circle-weight parameters, but leave the other parameters unaffected. Conversely, when people are, for instance, put under time pressure, this could specifically affect the noise and the threshold parameters.

Another important concern that might arise from the complexity of the social-circle model is that its multiple free parameters render it prone to overfitting (Pitt & Myung, ). In the next section, we describe empirical tests of the social-circle model, including a model comparison that pitted it against a constrained model variant in which the inspection order of the circles was fixed, to see whether the flexibility of the social-circle model is warranted relative to its descriptive fit.

Empirical Tests of the Social-Circle Model

How well does the social-circle model account for people's decisions? And how does its performance fare relative to strategies that make different assumptions about the sampling from social circles? To address these questions, Schulze et al. () submitted the social-circle model to a rigorous model comparison (using Bayesian latent mixture modeling). The comparison involved four candidate models. One competitor was the social-circle heuristic (Pachur et al., ). It represents a simplified variant of the social-circle model and assumes that social sampling proceeds in a fixed order through the social circles: The self circle is always inspected first, followed by the family circle, the friends circle, and finally the acquaintances circle. In this order, which reflects altruism considerations (Henrich & Henrich, ), the typical circle size increases with each circle: The number of friends one has is typically larger than the number of family members, and the number of acquaintances usually exceeds the number of friends (Hills & Pachur, ). This order may also be plausible given that it reflects how people access (at least on average) clusters of these social categories in spontaneous recall from social memory (Hills & Pachur, ). Another difference between the social-circle heuristic and the social-circle model is that the former's stopping rule





is based on the absolute difference in the number of recalled instances (rather than the proportional difference) and that it has a fixed discrimination threshold (of zero). The second competitor to the social-circle model that Schulze et al. () tested was availability-by-recall (Hertwig et al., ; Pachur et al., ). This strategy assumes that to make an inference, all instances in a person's social circles are tallied; like the social sampling model (Galesic et al., ), availability-by-recall thus assumes social sampling to be comprehensive rather than sequential and limited. In contrast to the social-circle model, availability-by-recall does not have a stopping rule, nor does it prescribe a particular order in which a person's social circles are inspected.

Schulze et al. () applied the three models as well as a baseline model assuming random guessing to data from Pachur et al. (), where each participant judged, for a total of  pair comparisons, which of two sports was more popular. Popularity was defined in terms of the number of sports club members. In a separate task, participants indicated for each sport whether they were a club member themselves as well as how many people in their social circles were club members, and whether these people were family members, friends, or acquaintances. Based on this information, the authors derived the predictions of the models for each participant and used them to model the decisions in the pairwise judgment task.

The results of the model comparison were unanimous: Of the  participants,  ( percent) were best described by the social-circle model (based on the posterior model probabilities in the latent mixture analysis), and only two ( percent) and four ( percent) participants were best described by the social-circle heuristic and availability-by-recall, respectively (for a further two participants, model performance was not better than the guessing baseline model). The social-circle model was also better than availability-by-recall in two other data sets that examined social sampling in different domains. In Study  of Schulze et al. (), participants were asked to judge the popularity of vacation destinations, and in Schulze et al. () the judgments concerned the relative popularity of first names. In



For clarification, participants were provided with rough definitions of these social categories. "Family" was defined as people to whom one is genetically or legally related. As regards the "friend" category, it was highlighted that there are different definitions of "friendship" and that participants should consider whom they would themselves call a friend. "Acquaintances," finally, were defined as people not falling into one of the other categories, whom one knows by name and whom one has met several times (e.g., old schoolmates).




 ,  

both analyses, the social-circle model outperformed the model assuming exhaustive search.

As described above, due to its stopping rule, the social-circle model predicts that there can be situations in which people make judgments that systematically contradict the frequency distribution of instances aggregated across their social circles. For instance, assume that a person knows a total of five tennis players but only three soccer players and two basketball players. If the tennis players are in a circle with a lower circle weight (or are distributed across circles), but the soccer and basketball players are in a circle with a higher circle weight, the social-circle model predicts that soccer and basketball will be inferred to be more frequent than tennis – despite there being more instances of the latter overall. In the data sets analyzed by Schulze et al. (), such a pattern was predicted in .–. percent of all cases. Participants best described by the social-circle model judged against the aggregate frequency of instances they knew of .–. percent of the time – and instead followed the frequency distribution in the first social circle that discriminated between the events.

To summarize, the results of the model comparisons indicate that the complexity of the social-circle model is warranted and that the model's flexibility is thus necessary to provide a good description of people's intuitive judgments of social statistics. The results further suggest that social sampling is usually not based on exhaustive search through people's social circles, but is limited depending on the evidence accumulated during search. Finally, sequential search does not seem to proceed in the fixed order of self, family, friends, and acquaintances that one may expect based on altruistic considerations (Henrich & Henrich, ) and that is assumed in the social-circle heuristic.

The tests of the social-circle model also provided insights into specific aspects of people's sampling policies, by virtue of the estimated values of the model's parameters. All applications of the social-circle model consistently obtained estimates for the discrimination threshold d in the range of d = .–., with modest variability across people. This indicates that search is typically stopped once the number of instances retrieved for the events differs by at least about a fourth of the total number of instances in a circle. Another aspect of social sampling parameterized in the social-circle model is the order in which the social circles are inspected. One consistent finding across the analyses of the social-circle model is that at the aggregate level more peripheral circles – in particular the friends circle – seem to play an important role, and that the self circle often receives relatively little weight (Schulze et al., , ). This result deviates





from the common conclusion in research on the false consensus effect (Ross et al., ) that people’s judgments are overly swayed by knowledge of their own characteristics (but see Galesic et al., , for a demonstration of how homophily can give rise to a false consensus effect even when information about oneself does not have priority during sampling). Finally, the estimated circle parameters indicated considerable variability across people in circle weighting, covering a wide range of inspection orders. In the next section we apply the model to investigate developmental differences in social sampling as well as differences between judgment domains.

. ..

Applications of the Social-Circle Model Individual Differences in Social Sampling

The social-circle model offers a tool for disentangling and quantifying aspects of the cognitive operations assumed to underlie the search and decision processes in social sampling. On that basis, individual differences, differences across age groups, and differences across judgment domains can be assessed and evaluated. Previous work (Schulze et al., , ) has investigated whether children engage in the type of structured and limited search captured by the social-circle model that appears to be prevalent in adults' social sampling: Adults and children aged – years were asked to judge the relative popularity of vacation destinations (Schulze et al., , Study ) and first names (Schulze et al., ). In both studies, a substantial subset of children were best described by the social-circle model. In fact, when judging the popularity of vacation destinations, more children ( percent) than adults ( percent) were best described by the social-circle model. Importantly, the differences in strategy selection were not driven by the children responding more randomly – in both studies, only a few children and adults were best described by a guessing strategy.

How, and how extensively, did adults and children search for instances in social memory? In general, there was considerable individual variability across participants in their social sampling schemes (Schulze et al., , ). At the group level, children and adults showed striking similarities in the search and decision processes: Children and adults weighted the circles in their social network in similar ways, applied similar difference thresholds, and did not differ in their level of response noise. These results echo previous findings from risky choice, suggesting that children do not




 ,  

necessarily differ from adults in their search efforts (van den Bos & Hertwig, ). At the individual level, however, there were also some differences between the age groups in the relative weighting of instances in different social circles. One way to assess this heterogeneity in search orders is to look at which circle received the highest weight in children's and adults' inferences and was therefore most likely to be inspected first. The majority of children aged – years were most likely to prioritize instances among their friends when judging the popularity of holiday destinations and first names, irrespective of the inference domain. This primacy of social information stemming from peers may be specific to the particular age group of early adolescence that was examined. For adults, by contrast, prioritization was more variable and sensitive to the reliability and density of instance knowledge across inference domains. We next turn to these domain-specific differences in people's social sampling schemes.

Differences in Social Sampling across Domains

Looking at the distributions of individual participants' social-circle weight parameters in different judgment domains – including the popularity of sports (Study  in Schulze et al., ), holiday destinations (Study  in Schulze et al., ), and first names (Schulze et al., ) – suggests that adult participants weighted the various sources of their instance knowledge somewhat differently across these domains. To assess this heterogeneity in search orders, we examined the circle that was most likely to be probed first in each judgment domain. Figure 16.3a shows the proportion of adult participants whose circle weight parameters indicated that the self, family, friends, or acquaintance circle, respectively, was most likely to be probed first across three judgment domains (Schulze et al., , ). When judging the popularity of vacation destinations, participants' own experiences were factored in most strongly and most participants tended to probe the self circle first. By contrast, when judging the popularity of sports, instance knowledge of family and friends was given priority; for the popularity of first names, participants were most likely to sample among their friends or acquaintances first.

This differential weighting of instance knowledge across judgment domains might reflect a trade-off between the relative reliability of instance knowledge on the one hand and its availability on the other, thus indicating an adaptive search process in social memory. That is, although a person's knowledge of their own preferences, opinions, and activities is likely to be most reliable, more distant relations may be the only available





Figure . Domain specificity of social sampling (Panel a) and available social information (Panel b) in Schulze et al. (, ). Panel a: Proportion of adult participants whose circle weight parameters indicated that the self, family, friends or acquaintance circle was most likely to be probed first across three different judgment domains. Panel b: Mean proportions of items (vacation destinations, sports, and first names) for which participants recalled at least one instance, across the self, family, friends, or acquaintance circles and for each of three judgment domains.

source of social information when instance knowledge is sparse. Indeed, the distribution of the availability of any instances in people's social circles across the three domains, as shown in Figure 16.3b, indicates that the prioritization of the different sources of knowledge (Figure 16.3a) reflects this trade-off to some extent. In domains in which instance information was rich – such as the popularity of vacation destinations – many participants prioritized proximate social circles over more peripheral ones. In domains in which instance knowledge was rather sparse – such as the popularity of first names – information from friends and acquaintances was prioritized most frequently. Yet despite these general regularities in people's weighting of instance knowledge, there is usually heterogeneity in the orders in which people probe their social memories. An important avenue for future research is therefore to better understand the origins of these individual differences in search (e.g., in terms of expertise) and to relate them to cognitive abilities (e.g., executive control or working memory; Hills & Pachur, ).

Alternative Definitions of Social Circles

The social-circle model assumes that people structure their internal search in social memory according to social category, thereby exploiting




 ,  

regularities in their external social environment (see, e.g., Hill & Dunbar, ). This assumption is supported by research on social recall, which has shown that social contacts are recalled in clusters of individuals that share a common feature (Bond & Brockett, ; Brewer, ; Fiske, ; Hills & Pachur, ). But social category is not the only plausible retrieval cue contributing to clustering during social sampling. What might these further cues be?

One alternative factor structuring memory representations of one's social environment is the frequency of contact with a person (Hills & Pachur, ). Analyses of how people distribute social interactions across their network members have shown a clear differentiation of subgroups of social network members according to typical contact frequency (Hill & Dunbar, ; Pachur et al., ). In addition, frequency of contact might determine the organization of social memory because frequency is a key determinant of the retrieval probability of memory records (Anderson & Lebiere, ). To examine the possibility of search along social circles defined based on frequency of contact, both Pachur et al. () and Schulze et al. () asked participants to indicate for each recalled instance how frequently they typically had contact with the person (on a scale from  = "less than once every  months" to  = "several times per week"). These ratings were used to define social circles based on frequency of contact. The authors then compared model variants where the social circles were either defined by social category or by contact frequency. In both comparisons, the model variant assuming social circles defined by contact frequency was clearly outperformed. It cannot be ruled out, however, that frequency of contact plays a more important role in other judgment domains.

Another cue to social memory is the social context in which people are encountered. While it may be rather uncommon in daily life to recall everyone one knows, there are

many contexts in which a person wants to bring to mind everyone who occupies some specific role to them: it is important to announce the new baby to all of one's friends, and to consult with all co-workers about the office party. So people must develop the capacity to recall people according to the social situation in which they encounter them, and their social roles. (Fiske, , p. )

One may suspect that the different social categories are associated with different contact frequencies. Indeed, in the data sets analyzed by Schulze et al. (), people tended to indicate a higher contact frequency for family members than for friends, and contact frequency was, on average, lowest for acquaintances (see also Figure  in Hills & Pachur, ). The differences across social categories in average contact frequency were, however, not very pronounced, and contact frequency also varied considerably within each category. As a consequence, there were several cases for which the two circle definitions led to different predictions of the social-circle model, allowing for a discrimination of the model variants.

Social category captures part of this demand, but social context may also matter. One social context that has, to date, received relatively little attention in research on social sampling is the digital online environment. In an increasingly digitalized world, where people have access to unprecedented amounts of information, the people with whom an individual predominantly communicates online may exert a particular influence on that person's social judgments. Indeed, information proliferation through social media has been suggested to result in outcomes such as polarization, herding, and misinformation (Hills, ). Thus, an important direction for future research is to examine whether the mode of contact – offline vs online – impacts the judgments people make about social statistics and the cognitive processes that underlie them (even if offline and online networks have similar structural characteristics; Dunbar et al., ). By defining the social circles of the social-circle model based on whether network members are predominantly encountered offline (e.g., through face-to-face interaction) or online (e.g., via social media platforms), the model can be used to investigate whether social sampling is contingent on the mode of contact (see Hecht et al., ). Moreover, this extension makes it possible to address which weight is assigned to instance knowledge recruited via online social media channels in people's judgments.

Finally, people might also take into account during social sampling how representative or typical an instance (or a group of instances) is of the reference class to which the inference refers (Hewstone & Lord, ; Rips, ). The social sampling model (Galesic et al., ) allows the sampling space to be adjusted such that proximal instances are less likely to be sampled than more distant ones. Such a mechanism can shield against the distorting influence of homophily – that is, when a characteristic occurs in clusters and people having the characteristic are more likely to know each other than people who do not have the characteristic (see also the next section). In such situations, instances close to oneself are less likely to be representative of the larger reference class than more distant ones. Mapped onto the processes assumed by the social-circle model, if people believe that the instances in a social circle are rather atypical and thus unrepresentative, they might give that circle a low weight.




 ,  

The Ecological Rationality of Heuristic Social Sampling

Limiting search to personally experienced instances or even a subset of one's personal social network, as implemented by the social-circle model, violates the statistical dictum that more information is better. But what is the price of limiting search to one's close social circles? Might it depend on statistical properties of the environment? In other words, what is the ecological rationality (Todd et al., ) of heuristic social sampling? To address this issue, Pachur et al. () conducted computer simulations in which the accuracy of both heuristic, dynamically expanding sampling and exhaustive sampling was tested in environments that varied on two important properties of real-world social environments.

One ecological property was the skewness of the frequency distribution across the events whose relative frequency was to be judged. Many environmental quantities measured in diverse biological, social, and physical systems follow highly skewed distributions, in which a select few events dominate all others (e.g., Clauset et al., ; Newman, ). An example is the distribution of personal wealth in a population: Typically, a small fraction of people owns a large fraction of the total wealth, and most people have relatively modest fortunes. The other characteristic of social environments investigated in the simulations was spatial clustering of instances. Such clustering arises because people tend to interact and form social ties with others who have similar sociodemographic, behavioral, and attitudinal characteristics, such as ethnicity, religion, level of education, or political opinions – a phenomenon known as homophily (e.g., McPherson et al., ). Does such clustering improve or hamper the performance of a heuristic, dynamically expanding sampling scheme with a stopping rule, relative to a fixed, exhaustive sampling scheme?

In the simulations by Pachur et al. (), populations of agents were represented as two-dimensional grids, with each cell representing an agent. Each agent had a social network consisting of  neighboring agents on the grid, and the network was divided into four discrete social circles. Each circle was defined based on its spatial distance from the central agent. Across the agents in the social circles, instances of ten events were distributed; each agent was an instance of at most one event. The task of the sampling strategies (heuristic vs exhaustive) was to infer which of two events occurred more frequently in the population (i.e., the entire grid), given different environmental characteristics. To manipulate the frequency distribution, in one set of conditions the frequencies of the ten events

Figure . Environmental properties manipulated in the computer simulations by Pachur et al. (): frequency distribution (Panel a) and spatial clustering (Panel b); and the resulting average sample size (Panel c) and accuracy (Panel d) achieved by exhaustive and heuristic sampling.

decreased evenly from the most to the least frequent event (flat frequency distribution; Figure .a); in a second set of conditions, the frequencies decreased steeply (skewed frequency distribution). Clustering of event occurrence was manipulated on three levels, such that instances of the same event had a high, medium, or low probability of occurring in close spatial vicinity to each other on the grid. Crossing the two types of frequency distribution (flat vs skewed) and the three types of spatial
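To make the simulation logic concrete, here is a minimal Python sketch of a grid environment and the two sampling strategies. It is not the authors' code: the grid size, the event pair, the evidence threshold, and the skewed weights are illustrative assumptions, the clustering manipulation is omitted for brevity, and every agent carries exactly one event.

```python
import random

GRID = 40                    # assumed 40 x 40 population grid, one agent per cell
N_EVENTS = 10                # ten events, as in the simulations
CIRCLES = [1, 2, 3, 4]       # four social circles, defined by spatial distance

def make_population(skewed):
    """Assign each agent one event; event frequencies fall off evenly (flat)
    or steeply (skewed). Spatial clustering is omitted for brevity."""
    weights = ([2.0 ** -k for k in range(N_EVENTS)] if skewed
               else [float(N_EVENTS - k) for k in range(N_EVENTS)])
    events = random.choices(range(N_EVENTS), weights=weights, k=GRID * GRID)
    return {(x, y): events[x * GRID + y] for x in range(GRID) for y in range(GRID)}

def circle_members(pop, home, radius):
    """Events of the agents at exactly `radius` (Chebyshev distance) from `home`."""
    x0, y0 = home
    return [ev for (x, y), ev in pop.items() if max(abs(x - x0), abs(y - y0)) == radius]

def heuristic_choice(pop, home, a, b, threshold=1):
    """Probe circles from near to far; stop as soon as the tallies for the
    two events differ by at least `threshold` (the stopping rule)."""
    tally, inspected = {a: 0, b: 0}, 0
    for radius in CIRCLES:
        members = circle_members(pop, home, radius)
        inspected += len(members)
        for ev in members:
            if ev in tally:
                tally[ev] += 1
        if abs(tally[a] - tally[b]) >= threshold:
            break                        # evidence suffices; truncate search
    return (a if tally[a] >= tally[b] else b), inspected

def exhaustive_choice(pop, home, a, b):
    """Fixed scheme: always inspect all four circles before deciding."""
    members = [ev for r in CIRCLES for ev in circle_members(pop, home, r)]
    return a if members.count(a) >= members.count(b) else b

random.seed(3)
pop = make_population(skewed=True)
home, a, b = (GRID // 2, GRID // 2), 0, 4
guess, n = heuristic_choice(pop, home, a, b)
truth = a if list(pop.values()).count(a) > list(pop.values()).count(b) else b
print(f"heuristic chose {guess} after {n} agents; exhaustive chose "
      f"{exhaustive_choice(pop, home, a, b)}; truth: {truth}")
```

Running this comparison over many agents and event pairs, and adding spatial clustering, would reproduce the kind of accuracy-versus-effort trade-off reported next.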

Crossing the two types of frequency distribution (flat vs skewed) and the three types of spatial distribution (no, medium, or high clustering) yielded a total of six environmental conditions; Figure .a–b illustrates the conditions. To evaluate the performance of the heuristic sampling strategy, which dynamically expands information search, relative to a strategy with fixed exhaustive sampling, the authors determined in how many of all pair comparisons the two sampling strategies chose the event that was actually more frequent in the entire population. In addition, they recorded the number of instances in the circles that were inspected before sampling was stopped and a decision was made.

Let us first look at the search effort incurred by the two sampling strategies. Figure .c shows that while the exhaustive strategy necessarily considered the maximum number of  agents in all environments, the sample size for heuristic sampling was considerably smaller (on average, – agents); search effort was also affected by environmental structure. Clustering in particular affected sample size: Stronger clustering increased the number of agents that were considered for a judgment in heuristic sampling. In all environments, heuristic sampling constrained search relative to exhaustive sampling (i.e., stopped search before reaching the last circle) in three quarters or more of cases.

Next, given that heuristic sampling considerably reduces search effort, what is the effect on accuracy? Does the price depend on properties of the environment, and if so, how? Answers to these questions can be found in Figure .d. For both strategies, performance was influenced by skewness and clustering in the environment. Without clustering (i.e., when events were distributed randomly), exhaustive sampling by far outperformed heuristic sampling; averaged across flat and skewed environments, the margin was . percentage points. However, with a high level of clustering, the advantage of exhaustive sampling disappeared, with the margin shrinking to . percentage points in the flat environments and . percentage points in the skewed environments. Despite its violation of the normative dictum to consider all available information, heuristic sampling got away with truncating search as soon as a decision could be made. This suggests that the stopping rule operating in heuristic sampling is highly efficient and, at least under some ecological conditions, tends to ignore information that is redundant anyway.



Visit https://taming-uncertainty.mpib-berlin.mpg.de/chapter--element- for an interactive element that illustrates the interplay between skewness of the frequency distribution and spatial clustering in shaping environmental structure.

Finally, for both sampling strategies, performance was higher in skewed than in flat frequency distributions (except for exhaustive sampling in an environment without clustering); however, the frequency distribution had a comparatively smaller impact on the strategies' accuracy than did clustering.

In sum, heuristic social sampling as implemented in the social-circle model – which starts with a small sample and sequentially expands search only if that sample does not yield sufficient evidence – is an efficient tool for making inferences about event frequencies in the population under conditions that are common in natural environments (in particular, event clustering). But like any tool it is subject to ecological rationality: It works well in some ecologies but not in others.

Conclusion

In statistics there are two visions of information sampling for making inferences about the world. In Neyman–Pearson statistical tests, sample size is fixed prior to data collection and follows from the desired probabilities of Type I and Type II errors. In sequential sampling, pioneered by Abraham Wald (), the amount of sampling is dynamically expanded depending on the amount of evidence accumulated so far, and sampling is truncated as soon as an evidence threshold is exceeded. Existing models of social sampling have followed the Neyman–Pearson notion of sampling. Yet as this chapter shows, it is possible that the intuitive social statistician searches for information in a way that is more in line with the Waldian notion of sequential sampling. Recall of instance knowledge from social memory is organized by social categories, and the social-circle model assumes that social categories are used to chunk instance knowledge and that these chunks are probed sequentially during sampling. This perspective, which has garnered promising empirical support, also makes it possible to link social sampling to existing models of bounded and ecological rationality. In that sense, research on the social-circle model also highlights the ecological flavor of Festinger's () seminal ideas about social sampling informing social judgment.

References

Anderson, J. R., & Lebiere, C. J. (). The atomic components of thought. New York: Psychology Press.
Bond, C. F., & Brockett, D. R. (). A social context-personality index theory of memory for acquaintances. Journal of Personality and Social Psychology, (), –.

Bond, C. F., Jones, R. L., & Weintraub, D. L. (). On the unconstrained recall of acquaintances: A sampling-traversal model. Journal of Personality and Social Psychology, (), –.
Brewer, D. D. (). Patterns in the recall of persons in a department of a formal organization. Journal of Quantitative Anthropology, , –.
Bröder, A., & Schiffer, S. (). Take the best versus simultaneous feature matching: Probabilistic inferences from memory and effects of representation format. Journal of Experimental Psychology: General, (), –.
Brown, G. D., Gardner, J., Oswald, A. J., & Qian, J. (). Does wage rank affect employees' well-being? Industrial Relations, (), –.
Clauset, A., Shalizi, C. R., & Newman, M. E. (). Power-law distributions in empirical data. SIAM Review, (), –.
Dawtry, R. J., Sutton, R. M., & Sibley, C. G. (). Why wealthier people think people are wealthier, and why it matters: From social sampling to attitudes to redistribution. Psychological Science, (), –.
Dougherty, M. R., Gettys, C. F., & Ogden, E. E. (). Minerva-DM: A memory processes model for judgments of likelihood. Psychological Review, (), –.
Dunbar, R. I., Arnaboldi, V., Conti, M., & Passarella, A. (). The structure of online social networks mirrors those in the offline world. Social Networks, , –.
Einhorn, H. J. (). The use of nonlinear, noncompensatory models in decision making. Psychological Bulletin, (), –.
Epstein, J. M. (). Learning to be thoughtless: Social norms and individual computation. Computational Economics, (), –.
Festinger, L. (). A theory of social comparison processes. Human Relations, (), –.
Fiedler, K., & Juslin, P. (Eds.). (). Information sampling and adaptive cognition. New York: Cambridge University Press.
Fiske, A. P. (). Social schemata for remembering people: Relationships and person attributes in free recall of acquaintances. Journal of Quantitative Anthropology, (), –.
Galesic, M., Olsson, H., & Rieskamp, J. (). Social sampling explains apparent biases in judgments of social environments. Psychological Science, (), –.
Galesic, M., Olsson, H., & Rieskamp, J. (). A sampling model of social judgment. Psychological Review, (), –.
Gigerenzer, G., Hertwig, R., & Pachur, T. (Eds.). (). Heuristics: The foundations of adaptive behavior. New York: Oxford University Press.
Heathcote, A., Brown, S. D., & Wagenmakers, E.-J. (). An introduction to good practices in cognitive modeling. In B. U. Forstmann & E.-J. Wagenmakers (Eds.), An introduction to model-based cognitive neuroscience (pp. –). New York: Springer.

Hecht, M., Pachur, T., & Schulze, C. (). Does social sampling differ between online and offline contacts? A computational modeling analysis. In J. Culbertson, A. Perfors, H. Rabagliati, & V. Ramenzoni (Eds.), Proceedings of the th Annual Meeting of the Cognitive Science Society (pp. –). Cognitive Science Society.
Henrich, N., & Henrich, J. P. (). Why humans cooperate: A cultural and evolutionary explanation. New York: Oxford University Press.
Hertwig, R., Pachur, T., & Kurzenhäuser, S. (). Judgments of risk frequencies: Tests of possible cognitive mechanisms. Journal of Experimental Psychology: Learning, Memory, and Cognition, (), –.
Hewstone, M., & Lord, C. G. (). Intergroup behavior: The role of typicality. In C. Sedikides, J. Schopler, & C. A. Insko (Eds.), Intergroup cognition and intergroup behavior (pp. –). New York: Psychology Press.
Hill, R. A., & Dunbar, R. I. (). Social network size in humans. Human Nature, (), –.
Hills, T. T. (). The dark side of information proliferation. Perspectives on Psychological Science, (), –.
Hills, T. T., & Pachur, T. (). Dynamic search and working memory in social recall. Journal of Experimental Psychology: Learning, Memory, and Cognition, (), –.
Lichtenstein, S., Slovic, P., Fischhoff, B., Layman, M., & Combs, B. (). Judged frequency of lethal events. Journal of Experimental Psychology: Human Learning and Memory, (), –.
McPherson, M., Smith-Lovin, L., & Cook, J. M. (). Birds of a feather: Homophily in social networks. Annual Review of Sociology, (), –.
Newman, M. E. (). Power laws, Pareto distributions and Zipf's law. Contemporary Physics, (), –.
Nosofsky, R. M. (). Attention, similarity, and the identification–categorization relationship. Journal of Experimental Psychology: General, (), –.
Pachur, T., Hertwig, R., & Rieskamp, J. (). Intuitive judgments of social statistics: How exhaustive does sampling need to be? Journal of Experimental Social Psychology, (), –.
Pachur, T., Hertwig, R., & Steinmann, F. (). How do people judge risks: Availability heuristic, affect heuristic, or both? Journal of Experimental Psychology: Applied, (), –.
Pachur, T., Schooler, L. J., & Stevens, J. R. (). We'll meet again: Revealing distributional and temporal patterns of social contact. PLoS ONE, (), e.
Payne, J. W., Bettman, J. R., & Johnson, E. J. (). The adaptive decision maker. Cambridge, UK: Cambridge University Press.
Pitt, M. A., & Myung, I. J. (). When a good fit can be bad. Trends in Cognitive Sciences, (), –.
Rips, L. J. (). Inductive judgments about natural categories. Journal of Verbal Learning and Verbal Behavior, (), –.
Ross, L., Greene, D., & House, P. (). The "false consensus effect": An egocentric bias in social perception and attribution processes. Journal of Experimental Social Psychology, (), –.

Schafer, M., & Schiller, D. (). Navigating social space. Neuron, (), –.
Schulze, C., Hertwig, R., & Pachur, T. (). Who you know is what you know: Modeling boundedly rational social sampling. Journal of Experimental Psychology: General, (), –.
Schulze, C., Pachur, T., & Hertwig, R. (). How does instance-based inference about event frequencies develop? An analysis with a computational process model. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the th Annual Meeting of the Cognitive Science Society (pp. –). Cognitive Science Society.
Simon, H. A. (). Rational choice and the structure of the environment. Psychological Review, (), –.
Smith, E. R., & Zarate, M. A. (). Exemplar-based model of social judgment. Psychological Review, (), –.
Todd, P. M., Gigerenzer, G., & the ABC Research Group. (). Ecological rationality: Intelligence in the world. New York: Oxford University Press.
Troyer, A. K., Moscovitch, M., Winocur, G., Alexander, M. P., & Stuss, D. (). Clustering and switching on verbal fluency: The effects of focal frontal- and temporal-lobe lesions. Neuropsychologia, (), –.
Tunçgenç, B., El Zein, M., Sulik, J., et al. (). Social influence matters: We follow pandemic guidelines most when our close circle does. British Journal of Psychology, , –.
Van den Bos, W., & Hertwig, R. (). Adolescents display distinctive tolerance to ambiguity and to uncertainty during risky decision making. Scientific Reports, (), –.
Wald, A. (). Statistical decision functions. New York: Wiley.
Wood, A. M., Brown, G. D., & Maltby, J. (). Social norm influences on evaluations of the risks associated with alcohol consumption: Applying the rank-based decision by sampling model to health judgments. Alcohol and Alcoholism, (), –.


Social Sampling for Judgments and Predictions of Societal Trends

Henrik Olsson, Mirta Galesic, and Wändi Bruine de Bruin

How do we make judgments about our social worlds? And how do properties of our social worlds influence our judgments? To navigate our lives, we often estimate the frequencies of different behaviors, beliefs, preferences, and intentions of others in various relevant social environments. The capacity to make these social judgments, and to draw inferences about other individuals and groups, is crucial for comparing oneself with others and setting personal aspirations (Festinger, ; Suls et al., ), judging whom to learn from and with whom to cooperate (Hertwig & Hoffrage, ; Kendal et al., ), learning complex social network relations (Lynn & Bassett, ; Tompson et al., ), understanding descriptive norms (Cialdini, ), and assessing the frequency of environmental risks (Lichtenstein et al., ). Indeed, as we will describe in this chapter, people's social judgments reveal valuable information about actual societal trends such as voting intentions and vaccination behavior.

Since social judgments are important to human cognition, it might seem puzzling that decades of research in social psychology have produced a long list of apparent biases in social cognition (Jussim, ; Krueger & Funder, ), often including opposing effects such as false consensus and false uniqueness, whereby people with a particular attribute can judge this attribute to be either more or less common in the general population than it actually is. Other examples are self-enhancement and self-depreciation, where people can incorrectly judge themselves to be either better or worse than other people. As we will discuss later on, social psychologists have mostly focused on cognitive and motivational explanations for these biases.

The authors were supported in part by a grant from the National Science Foundation (MMS).



They have pointed to people's desire to feel good about themselves (Alicke & Govorun, ) and to the low cognitive ability of people who perform worse than others on a particular task (Kruger & Dunning, ).

This apparent contradiction between the importance of social judgments and the results indicating deficiencies in people's abilities to make these judgments can be resolved by investigating the scope of such judgments and how they interact with the properties of social and task environments. People seem to be relatively accurate in estimating frequencies of different attributes in their immediate social circles (Galesic et al., ; Nisbett & Kunda, ). However, estimating frequencies in broader populations with which they do not have direct experience is more difficult, as it requires making inferences from other sources of knowledge, including their social contacts. As we explain in this chapter, the network structure of these social circles, the statistical properties of broader populations, as well as task characteristics interact with unbiased basic cognitive processes underlying social judgments. When this complex social-cognitive system is not taken into account and cognition is studied in isolation from its social and task environment, social judgments can appear to be a product of motivational biases and cognitive errors (Galesic et al., ; Galesic & Olsson et al., ; Pachur et al., ; Schulze et al., ).

We first discuss how people's social judgments vary with their social environments and the task at hand (Structure of Social and Task Environments). We then describe our model of social judgments, which explicitly relates basic cognitive processes to the structure of social and task environments (Social Sampling Processes; Galesic et al., ). We show how it can explain empirically observed findings on biases in people's estimates of frequencies in broader populations, such as false consensus and self-enhancement, together with their opposites, false uniqueness and self-depreciation. We further show that these findings have important practical implications (Using Social Judgments to Describe and Predict Societal Trends). If people's knowledge about their social circles is relatively accurate, then, with appropriate sampling of respondents, asking people about their social circles can be used to predict and describe societal trends. We show that social-circle questions produce better predictions of elections than traditional questions about people's own intentions, provide good descriptions of various population attributes, and help predict people's future voting and vaccination behavior (Bruine de Bruin et al., ; Galesic et al., ; Galesic & Bruine de Bruin et al., ; Galesic & Olsson et al., ; Olsson & Bruine de Bruin et al., ). We end with a discussion of related and further research in the section Sampling Social Worlds: Outlook.

Structure of Social and Task Environments

People's judgments are not made in a vacuum, but are influenced by the surrounding environment (e.g., Anderson, ; Gigerenzer et al., , ; Simon, ). To fully understand how people form their social judgments, it is necessary to look outside the inner workings of the mind and investigate how their thinking interacts with their environment. Here we focus on three important environmental properties: the properties of people's immediate social networks, the frequency distributions of the attributes in the broader populations being judged, and the specific task requirements people are faced with.

Social Network Properties

Many aspects of social networks can affect how people experience and evaluate their worlds (Jackson, ). Examples are the number of contacts one has, the number of contacts one's friends have, the extent to which one connects otherwise disconnected parts of one's network, the strength and length of paths in one's network, and the balance of one's relationships (see the section Representing Social Environments for further discussion). Here we will focus on one property that is particularly relevant for social judgments: homophily, whereby people with similar attributes tend to live close to each other and move in similar social circles. This tendency can be driven by selective attraction to similar others, by mutual influence among people who interact with each other, as well as by common environmental factors influencing people living close to each other (Christakis & Fowler, , ; Shalizi & Thomas, ). Homophily is higher for some attributes and in some social circles than in others, for a variety of reasons (McPherson et al., ).

One immediate consequence of homophily is that people's reports of beliefs, intentions, and other attributes of people in their social circles will resemble their own. This homophily bias can contribute to explanations of false consensus and false uniqueness, as we show later on (see the section Explaining Social Judgment Phenomena). This bias also has methodological consequences when using people's social circles to describe broader populations. Unless people reporting about their social circles are selected to be representative of the overall population, the homophily biases in their social circles will lead to overall biased population estimates (see the section Using Social Judgments to Describe and Predict Societal Trends).

 ,    . .. Shape of Frequency Distributions

Attributes of social environments have different frequency distributions. Some, such as income and health problems, have highly skewed distributions, while others, such as number of friends or education, are more symmetrically distributed (Galesic et al., ; Galesic & Olsson et al., ; Nisbett & Kunda, ). It has long been recognized that the shape and other characteristics of frequency distributions have consequences for judgments, especially if paired with questions asking about certain properties of these distributions such as the mean or median (for an overview of the effects of characteristics of frequency distributions, see Gigerenzer et al., ).

In the social domain, perhaps the most prominent example is the better-than-average effect, where it appears that most people self-enhance and place themselves as being higher in skill than the average person (Chambers & Windschitl, ). At first glance, it might seem that it is "logically impossible for most people to be better than the average person" (Taylor & Brown, , p. ). Researchers have explored explanations based on biased processing in the forms of overconfidence, unrealistic optimism, and illusion of control (Chambers & Windschitl, ), as well as explanations based on differential information, whereby people typically have better information about themselves than they do about others (Moore & Small, ). Motivational and cognitive processes alone can certainly play a role in explaining this tendency to self-enhance. It has been argued, however, that part of this effect can be attributed to the skewness of population distributions and how people understand the concept of average (Lopes, ). For example, the distribution of safe drivers is left-skewed (e.g., most drivers are not involved in accidents; see Schwing & Kamerud, ); thus, it is possible that most drivers are safer than the average. In addition, as we will describe later, sampling processes typically lead to self-enhancement effects in left-skewed distributions, further contributing to the apparent overestimation of one's own driving skills (see the section on Self-Enhancement and Self-Depreciation).
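A toy example (with invented numbers) shows the arithmetic:

```python
# 100 hypothetical drivers' accident counts: most have none, a few have many.
accidents = [0] * 90 + [5] * 8 + [30] * 2
mean = sum(accidents) / len(accidents)      # 1.0 accidents per driver on average
safer = sum(a < mean for a in accidents)    # drivers with fewer accidents than the mean
print(f"mean = {mean}; {safer} of {len(accidents)} drivers are safer than average")
# -> 90 of 100 drivers really are "better than average", with no bias involved
```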

Task Requirements

The way a question is asked will influence what information people pay attention to and can therefore affect their social judgments. Such response format effects are well known in the survey methodology literature (Tourangeau et al., ), as well as in the judgment and decision making literature.

It has been shown that people can appear overconfident, well calibrated, or underconfident in the same task depending on the response format (Juslin et al., , ). Similarly, studies of social judgments use different response formats. For example, in investigations of the false-consensus effect, researchers might ask about the percentages of both people who exhibit a certain behavior and those who do not (e.g., "What % of your peers do you estimate would agree to carry the sandwich board around campus? What % would refuse to carry it?"; Ross et al., , p. ). Other studies ask only about those who exhibit a behavior (e.g., "What percentage of students do you think agreed to wear the sign?"; Krueger & Clement, , p. ). Because of memory retrieval processes, the relative size of the category that people are asked about can be underestimated, which will in turn affect the size of the observed false consensus effects (see the section on False Consensus and False Uniqueness).

Other format dependence effects are also possible. For example, in social comparison studies one of two types of questions is typically used: indirect or direct estimates of one's own position relative to that of others (Chambers & Windschitl, ). Direct questions ask people how they think they compare with others on a given attribute, and their answer is taken as their perceived social position. Indirect questions first ask people to self-report about some attribute, and then ask them to report about other people. Typically, the direct method tends to yield stronger evidence of systematic social comparison biases than the indirect method (Chambers & Windschitl, ). As we show later, a possible explanation for the difference between the two is that with the indirect method participants might search their memories more systematically, leading to an increase in overall accuracy (Galesic & Olsson et al., ). Next, we describe our work on a model that integrates different properties of social and task environments with cognitive processes underlying social judgments.

Social Sampling Processes

The social sampling model, illustrated in Figure ., assumes that people form judgments about frequencies in broader populations by recalling what they know about their social circles. In other words, people rely on social sampling from memory (Galesic et al., ; Galesic & Bruine de Bruin et al., ; Pachur et al., ; Schulze et al., ). According to the model, when asked about the prevalence of an attribute in a certain population, people sample instances from memory representations of their social circles that are similar to the members of that population (the reference class). These instances constitute their retrieved sample. A part of those instances is recalled as having the attribute in question and is used to derive an estimate of the frequency of people with that attribute in the population.

Figure . (a) Simulated example of the social sampling process when the population includes two levels of the target characteristic (voters of red and blue parties) and two levels of homophily. Left (right) panel shows an example of a social environment characterized by high (low) homophily. The level of homophily affects the over- or underestimation of population values. (b) Empirical examples of the social sampling process for populations characterized by right-skewed, left-skewed, and symmetrical distributions. Cumulative population estimates enable comparing their estimated population percentile with their true percentile, as indicated by the example guidelines for a hypothetical worse-off (left of center on the x-axis) and better-off (right of center) person. Depending on the distribution shape, the resulting patterns resemble self-depreciation, self-enhancement, or both effects. Better-off (worse-off) people are those who are positioned at one of the top (bottom) three categories of the population distribution.

More precisely, the judgment of the relative frequency p(C|R) of an attribute C in the population (or reference class) R is based on two levels of activation: the activation A_Ri of instance i due to it belonging to the reference class R, and the activation A_Ci due to it having the attribute C.

We assume a simple activation function where the activation is one if the instance belongs to the reference class or if it has the target attribute, and zero otherwise. The judgment of the relative frequency p(C|R) is then expressed as the ratio:

$$p(C|R) = \frac{\sum_{i=1}^{n} \alpha \, A_{C_i} A_{R_i}}{\sum_{i=1}^{n} A_{R_i}} \tag{17.1}$$

In Equation (17.1), the denominator is the retrieved sample, that is, all instances of the social circle that are activated in memory because of their similarity to the reference class. The numerator consists of those instances in the sample that are activated because they are recalled to have the target attribute. We assume that people do not have perfect recall: The recall probability parameter α determines the probability of retrieving those instances in the sample that have the attribute C. For example, if one is asked to estimate the proportion of the general population who will vote red in the next election (as in Figure .), one can activate the sample of all people in one's social circle who are similar to voters in the general population, and then try to recall roughly who among them seems likely to vote red.

Although the main mechanism in the social sampling model is based on sampling from people's social circles, this does not mean that people neglect to incorporate other information in their social judgments. Specifically, we assume that people are roughly aware of homophily in their social circles and know that people around them are likely more similar to them on various attributes than are people in a broader population. To solve this challenge and activate a sample from memory that is more similar to a broader population (reference class R) of interest, people can use cues that are correlated with membership in that class. These cues can be gender, age, education, or other beliefs, or even simple dissimilarity to self on the target attribute, with instances being more likely to be activated in the sample the higher their value on those cues. In the example above, to estimate the proportion of red voters in the next election, one can activate a sample of one's social contacts who seem in some ways similar to the overall population of voters (e.g., by age, gender, or location). Alternatively, one can use a self-similarity cue that is based on who seems to differ most from oneself in their voting preferences (see the section Tests of Model Assumptions for empirical support for the use of this cue).

We implement this sampling process in the social sampling model by assuming that the sample of instances is adjusted to include only the ρ most similar instances to the reference class. Thus, the activation A_Ri becomes

$$A_{R_i} = \begin{cases} 1 & \text{if } pct_i \leq \rho \\ 0 & \text{otherwise,} \end{cases} \tag{17.2}$$

where pct_i refers to the percentile of instance i among all n instances sorted by their similarity to the reference class R from highest to lowest. The parameter ρ determines the percentile of the least similar instance that is still included in the sample. This means that the 1 − ρ proportion of instances that are least similar to the reference class are not activated. Higher values of ρ mean that a larger proportion of all the instances of the social environment that are stored in memory are activated and included in the sample. Both α and ρ are free parameters that can vary between zero and one depending on the specifics of the task and social environments and on individual differences.

To illustrate how the social sampling processes in our model interact with social and task environments, consider a task where people are asked to judge the percentage of people in the country that they believe are going to vote for either a red or a blue party in the national elections (Figure .). Assume that a voter who intends to vote for the red party is asked to estimate the percentage of people that they believe will also vote for the red party. Likewise, a blue voter is also asked to estimate the percentage of people that they believe will vote for the red party. In this example, the actual election result showed that  percent voted for the red party. The target attribute C is the intention of voting for the red party, and the reference class R is the voting population of the country.

Further assume that the red and blue voters have social circles characterized by high homophily ( percent vs  percent red social circle voters; high homophily in Figure .). Here, homophily is measured by the Coleman Index, which is defined as the difference between the percentage of red voters in the social circle of red voters and the percentage of red voters in an average social circle, divided by the percentage of blue voters in an average social circle (Coleman, ; Signorile & O'Shea, ). To arrive at a population estimate, both the red and blue voters discard (for example)  percent of the social circle instances (ρ = .) most similar to the voter. All of these discarded instances have the intention of voting red, so the resulting sample percentage is  percent for the red voter and  percent for the blue voter (sample bars, high homophily panel in Figure .). The voters' recall, however, is not perfect (e.g., α = .), resulting in a population estimate of  percent for the red voter and  percent for the blue voter (population estimate, high homophily column in Figure .). Note that in both the social circle and in the sample the percentages of red and blue must sum to 100 percent. The population estimates, however, do not need to sum to 100 percent because there are two groups of people, red and blue, that make judgments of the percentage of red in the population.

If the homophily in the social circles is lower ( percent vs  percent red social circle voters; low homophily in Figure .), the same social sampling processes (in this example, ρ = α = .) lead to population estimates where the red voter underestimates the election result more than the blue voter ( percent vs  percent; population estimate, low homophily in Figure .). The process works the same way when there are several levels of the attribute, for example, several parties. It is assumed that the voter first activates the level with the highest frequency (typically the level where the asked voter is located) and ends with the level with the lowest frequency (see Galesic et al., , figure , for evidence that people indeed use this order of activation).
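As an illustration, the following sketch implements Equations (17.1) and (17.2) directly. It is not the authors' implementation, and the contact list, α, and ρ values are invented for illustration.

```python
def social_sampling_estimate(instances, alpha, rho):
    """Estimate p(C|R) from social-circle instances (Equations 17.1 and 17.2).

    instances: list of (pct, has_attr) pairs, where pct is instance i's
    percentile when all instances are sorted by similarity to the reference
    class R from highest to lowest (so low pct = most similar to R).
    """
    sample = [has_attr for pct, has_attr in instances if pct <= rho]  # Eq. (17.2)
    return alpha * sum(sample) / len(sample) if sample else 0.0       # Eq. (17.1)

# Hypothetical red voter: 20 contacts, 16 of them red (a homophilous circle).
# Under a self-similarity cue, the contacts most like the voter (red) count as
# *least* similar to the broad reference class "all voters" (high percentiles).
n = 20
contacts = [(i / n, True) for i in range(5, n + 1)]    # 16 red contacts
contacts += [(i / n, False) for i in range(1, 5)]      # 4 blue contacts
print(social_sampling_estimate(contacts, alpha=0.9, rho=0.8))  # 0.675
```

With ρ = 0.8, the four discarded instances all vote red, and imperfect recall (α = 0.9) lowers the estimate further: the red voter reports about 68 percent red in the population even though 80 percent of the circle votes red.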

Tests of Model Assumptions

The social sampling model has three main assumptions. The first is that people use their knowledge about their social circles as a basis for their judgments of the frequencies of different attributes in broader populations. We therefore expect that differences in individual social circles should be reflected in people's population estimates. Specifically, people whose social circles are more representative of the overall population should give more accurate population judgments than people whose social circles are less representative of the general population. We tested this assumption on a large national probabilistic sample of the Dutch population (N = ,; described in detail in Galesic et al., ). As expected, there was a positive relationship (r = .) between the representativeness of the social circle (measured as the correlation of individual social circle distributions across ten different properties with true population distributions) and the accuracy of population estimates (measured as the correlation between individual population estimates and true population distributions). There was also a positive correlation (r = .) between deviations of participants' social circle distributions from true population distributions on the one hand, and deviations of their population estimates from true population distributions on the other (Galesic & Olsson et al., ). We also found support for this assumption in our US presidential election study (Olsson & Bruine de Bruin et al., ).

Specifically, we found that participants whose social circles are more similar to the state population in terms of age and education also have voting expectations in their social circles that are more representative of the voting intentions in the state population.

The second main assumption of the social sampling model is that social-circle estimates represent people's immediate social environments reasonably well. Averaged across probabilistic national samples, they should therefore describe the overall population relatively accurately. In contrast, people's population estimates will show specific biases resulting from the interaction of sampling processes and environmental structures. In support of this assumption, we found that the average of the social-circle estimates was in good agreement with true population distributions across ten different attributes, both qualitatively (see examples in Figure .a) and quantitatively (median r = . and median RMSD = .; Galesic et al., ). As a comparison, the average of participants' population estimates showed larger deviations from the population values (median r = . and median RMSD = .). This difference in accuracy between the social-circle estimates and the population estimates is also present in predictions of the means and standard deviations of the population distributions (Galesic & Olsson et al., ). In addition to providing evidence for this main assumption of the social sampling model, these results also suggest that it is better to use people's social-circle estimates than direct population estimates in other contexts, such as forecasting societal trends. Motivated by this assumption, we find that social-circle questions can usefully describe and predict societal trends such as election results (see the section Using Social Judgments to Describe and Predict Societal Trends).

The third main assumption of the social sampling model is that people sample their social circles according to different cues, an important one of which might be the self-similarity cue. The application of this cue is based on the similarity of social-circle instances in memory to oneself on the target attribute. If people discard similar instances when making population estimates, we would expect a negative difference between social-circle and population estimates for their own category and positive differences for more distant categories. For example, people with a high income will on average have a relatively large proportion of high-income people in their social circles. Because they discard instances that are similar to themselves, their estimates of the proportion of high-income people in the general population will be lower than their estimates of high-income people in their social circles.

Figure . (a) Average of social-circle estimates tracks population distributions of different attributes well, and better than the average of people’s population estimates. Absolute errors in brackets. Data from the Netherlands (N = ,)

Figure . (b) Social-circle question produced better predictions of elections in France  (N ¼ ,), Netherlands  (N ¼ ,), and Sweden  (N ¼ ,) than own-intention and (in the Netherlands) election-winner questions. Absolute errors in brackets.

Figure . (c) The social-circle question produced overall the lowest error of predictions in three recent US elections ( and  presidential elections and  midterm elections), across several survey waves (N > ,).

This also means that the relative frequencies estimated from the remaining instances in the sample will lead them to give higher population estimates for other income levels than they do for their social circles. We find this pattern empirically across a number of different target attributes (figure  in Galesic & Olsson et al., ).

These assumptions reflect the social aspects of the social sampling model in two ways. First, they involve judgments about broader populations of people, and these judgments are based on knowledge about an individual's immediate social environment. Second, and more importantly, the social sampling model applies to a particular structure of social environments, characterized by homophily. People are surrounded by similar others, and to make judgments about broader social worlds they need to use different cues to discern how representative a sample is of the population of interest, for example by using the self-similarity cue described above. These inherent environmental structures are not necessarily present for other types of nonsocial stimuli.

There are many ways in which the social sampling model is similar to any judgment model that uses memory instances as a basis for frequency judgments, for example the naïve sampling model (Juslin et al., ). The three assumptions described above could be restated in terms of nonsocial knowledge, making the social sampling model applicable to a wider range of judgments about broader populations. For example, if a stamp collector is asked about the frequency of attributes of stamps in general, she might base some of those judgments on her own stamp collection. This corresponds to the assumption that people use knowledge about their social circles as a basis for their judgments of the frequencies of different attributes in broader populations. Further, the stamp collector might be assumed to have a quite accurate representation in memory of her collection. This would correspond to the assumption that people represent their immediate social environment reasonably well. Finally, if a collector who owns mostly stamps from the s is asked to make judgments about stamps from other decades of the twentieth century, she might use cues to differentially sample instances of her own stamps, sampling from memory those instances of her collection that are least similar to the s stamps. This would correspond to differentially sampling social circles using similarity or other cues.

While the social sampling model can be applied to nonsocial inputs, the role of sampling cues such as self-similarity and others that help make the sample more representative, as in our third assumption, might be less important for nonsocial inputs.

Compared to social circles, which are typically at least partially self-selected in accordance with one's own characteristics, many nonsocial samples are sampled irrespective of one's own characteristics. For example, the stamp collector might like s stamps even though she was born in the s. It might also be easier for a stamp collector to find and collect rare stamps that are completely dissimilar to all the other stamps she has than it is for a person to find and befriend others who are completely dissimilar to them. Socializing with very different others can also have coordination and reputation costs, which are probably larger than the cost of collecting an unusual stamp. Moreover, nonsocial samples typically do not form themselves purposively to be similar to the sampler, while social samples do. For example, our friends have often chosen us just as much as we have chosen them, in contrast to, for example, apples in the supermarket or stamps in our collection. For all these reasons, social samples might be more likely to trigger some sort of adjustment process, as in the third assumption described above, while the naïve sampling assumption (Juslin et al., ; Lindskog et al., ) might be more appropriate for inferences from nonsocial samples. Life-long experience with various samples might teach people that some samples need to be corrected more than others. This relationship between the self-similarity of different samples in the real world and the corrective cognitive processes used to make inferences from those samples could be an interesting area for future research.

The assumptions of the social sampling model do not make any reference to the process that created the memory representations of people's social circles or to how estimates of social-circle attributes are formed. In the social sampling model, we are modeling the processes of making judgments about broader populations, not social-circle judgments. In the applications of the model so far, we have taken the social-circle judgments as given and used them as one part of the model. However, a full understanding of both social-circle and population judgments would require a process model of how people make judgments of attributes in their social circles. There are several possibilities here, including a structured search through social memory as in the social-circle heuristic (Pachur et al., ) and the social-circle model (Schulze et al., ), coupled with specific ego-projection, frequency-based, memory, and inference strategies for making social-circle estimates (Olsson & Barman-Adhikari et al., ). We next show how the social sampling model can explain pairs of prominent inconsistent biases in human social cognition.

 ,    . .. Explaining Social Judgment Phenomena

False Consensus and False Uniqueness

The false consensus effect (Ross et al., ) refers to the phenomenon that people who endorse a particular view or exhibit a certain behavior believe that this view or behavior is more common overall than do people with different views or behaviors. False consensus has been reported so often in the literature that it has been considered an automatic response (Krueger & Clement, ). However, an opposite bias, called false uniqueness, has also been documented (Frable, ; Mullen et al., ): People holding a particular view sometimes believe that their view is less popular than do people holding a different view. A wide variety of explanations have been proposed for false consensus, including selective exposure, salience influences, motivational effects, and Bayesian reasoning (for a review, see Marks & Miller, ). None of these, however, can simultaneously explain false consensus and false uniqueness.

Both effects can be explained, to some extent, solely by the homophily or heterophily of people's social circles. When people have homophilous social circles, in which the proportion of people like them is larger than in the overall population, false consensus will always occur. If people have heterophilous social circles, in which they are in a relative minority, false uniqueness follows (Lee et al., ). If people with different attributes have different levels of homophily (e.g., in some contexts Democrats might be more likely to socialize with Democrats than Republicans with Republicans), these patterns can be altered, but as long as there is on average homophily (heterophily) across groups, social judgments should show false consensus (false uniqueness) effects.

This simple network-based explanation cannot, however, explain some additional empirically documented effects. Specifically, false uniqueness can occur even when people have homophilous social circles (see figure  in Lee et al., , and figure  in Galesic & Olsson et al., ). In the social sampling model, this is explained by the parameter ρ < 1, which helps to retrieve samples from social circles that are more similar to the broader population than one's own social circle, thus counteracting homophily. Figure . illustrates how the sampling process interacts with homophily to produce either false consensus or false uniqueness. When homophily is high, the red voter estimates that there are more red voters in the population ( percent) than the blue voter does ( percent). This difference ( percent vs  percent) in estimates resembles false consensus.

When homophily is lower, the blue voter will still encounter fewer red voters ( percent) than the red voter does ( percent), but the difference from the population value ( percent) is smaller than before. This can reverse the direction of the effect and give the appearance of false uniqueness, where the blue voter's estimate of the red voters in the population is now higher than the red voter's estimate ( percent vs  percent).

Furthermore, we have found (Galesic et al., ) that the size of false consensus is smallest when people are asked to report the overall population prevalence of people like them (e.g., if Democrats are asked about Democrats and Republicans about Republicans), and largest when people are asked to report about people who are not like them (e.g., if Democrats are asked about Republicans and vice versa). In the social sampling model, this is explained by the parameter α. As long as the parameter α is smaller than one, the model predicts that people will underestimate the prevalence of the category they are asked about. If they are not explicitly asked about the other category and their answer is inferred by deducting their estimate for the first category from 100 percent, then the prevalence of the alternative category will be overestimated. Consequently, when people are asked about people who are like them, they will underestimate their own group more, leading to smaller false consensus effects than when they are asked about people who are not like them. If they are asked both about people like themselves and about people not like themselves, then the social sampling model predicts no bias in the absence of further assumptions about sampling processes (specifically, ρ < 1) and the presence of homophily.
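A toy calculation (with invented numbers, and ignoring the ρ mechanism) shows why the probed category matters:

```python
# Assumed: recall parameter alpha = 0.8; circles with 70% in-group members;
# a 50/50 population. FC = Democrats' minus Republicans' estimate of Democrats.
alpha, own = 0.8, 0.7

# Each group is asked about its *own* group; the other group's share of the
# population is inferred as the complement of what they report.
d_est = alpha * own               # Democrats' estimate of Democrats: 0.56
r_est = 1 - alpha * own           # Republicans report Republicans, inverted: 0.44
print("asked about own group:   FC =", round(d_est - r_est, 2))    # 0.12

# Each group is instead asked about the *other* group.
d_est = 1 - alpha * (1 - own)     # Democrats report Republicans, inverted: 0.76
r_est = alpha * (1 - own)         # Republicans' estimate of Democrats: 0.24
print("asked about other group: FC =", round(d_est - r_est, 2))    # 0.52
```

Because imperfect recall shrinks only the probed category, asking people about their own group underestimates it and yields a smaller false consensus effect (0.12) than asking them about the other group (0.52).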




 ,    .

Olsson et al., ). The same pattern of results was found in a study that investigated peoples’ population estimates of vaccination coverage and flu prevalence, where participants with less homophilous social circles showed less false consensus and tended toward false uniqueness (Bruine de Bruin et al., ). Taken together, the predictions from the social sampling model and our empirical results, alert to the possibility that false consensus is not such a robust effect as is typically assumed, but that it is shaped by social network characteristics and by the way questions are asked. ... Self-Enhancement and Self-Depreciation Self-enhancement effects such as the better-than-average and optimism biases are considered to be among the most robust findings in the literature on social comparison across a wide range of behaviors and attributes, including driving ability, intelligence, friendliness, and future prospects (e.g., Alicke & Govorun, ; Chambers & Windschitl, ; Roese & Olson, ). However, robust findings of the opposite effect have also been documented, namely of self-depreciation, in particular for people who otherwise show superior skills (Burson et al., ; Kruger, ; Moore & Small, ). Many accounts have been proposed to explain self-enhancement, but most of them cannot explain self-depreciation. These accounts include motivational biases (Alicke et al., ) and cognitive incompetence of people who overestimate their social position (Kruger & Dunning, ). An account that can explain both self-enhancement and self-depreciation is based on statistical regression effects (Fiedler, ; Krueger & Mueller, ; Moore & Small, ). This account typically assumes that people have an unbiased representation of the overall social environment, but that their reports contain some random noise that leads to underestimation of high performance and overestimation of low performance. However, a pure regression account cannot explain the finding that worse-off people (e.g., those with bad results on a task) make larger errors than better-off people (those with good results; e.g., Burson et al., ; Ehrlinger et al., ; Krueger & Mueller, ; Kruger & Dunning, ). The reason for this is that regression effects are typically assumed to be of the same magnitude for both worse-off and better-off people. The regression account can only explain these results if it is paired with biases that modify the regression effect, such as a general better-than-average bias (Krueger & Mueller, ).

In the social sampling model, the observed patterns of self-depreciation and self-enhancement are a consequence of the shape of the population distribution, people's position in that distribution, and social sampling processes (Figure .). Overall, the social sampling model predicts that people's population estimates will look like "smoothed" versions of their social-circle estimates, which is empirically supported by comparing the first two columns of Figure .. More specifically, the social sampling model predicts two general patterns of self-depreciation and self-enhancement.

The first is that the skew of the population distribution (over attributes ordered from least to most positive, e.g., from lowest to highest household wealth) will determine whether self-depreciation or self-enhancement occurs. When the population distribution is right-skewed, that is, when most people perform poorly (e.g., most people are low on household wealth in Figure .), self-depreciation will occur (everyone will appear to feel relatively poorer than they are). This is illustrated in the first row of the third column of Figure ., where the cumulative population estimates for both better-off (dotted line) and worse-off (dashed line) participants lie below the empirical population distribution (solid line). Cumulative population estimates allow for an easier detection of the patterns of self-enhancement biases that occur solely because of differences in the shape of the population distributions. When the population distribution is left-skewed, meaning that most people perform well (e.g., work stress in Figure .), people's estimates of the overall population will appear to be biased toward self-enhancement (everyone will appear to feel relatively richer than they are). Here, the cumulative population estimates shown in the second row of the third column of Figure . are both above the empirical population distribution. Finally, when the population distribution is symmetrical (e.g., number of friends in Figure .), people who perform well will show self-depreciation, and those who perform poorly will show self-enhancement: The cumulative population estimates are above the empirical population distribution for worse-off people, suggesting self-enhancement, but below it for the better-off ones, who appear to exhibit self-depreciation.

The second prediction is that worse-off people will show different error patterns in their population estimates compared to people who are better off. The size of the errors will depend on the shape of the underlying distribution and will sometimes resemble higher self-enhancement.

https://doi.org/10.1017/9781009002042.022 Published online by Cambridge University Press



 ,    .

than the errors of the better-off people (see second row of the third column in Figure .). The reason for this is that their social circles will tend to have more people who are also doing badly, and therefore they will overestimate the frequency of worse-off people in the general population. This will add to the smoothing effect that results from sampling processes, and therefore increase their apparent self-enhancement. The prediction is reversed when the underlying distribution is skewed right, with smaller errors for worse-off people than for better-off people (first row of the third column in Figure .). Empirical tests on a nationally representative sample showed that the social sampling model’s predictions can reproduce these two main patterns in empirical results reasonably well (the rightmost column of Figure .). In addition, we find the same patterns of empirical results and predictions in two different studies conducted in three countries, using both indirect and direct estimates of social circles and populations (Galesic & Olsson et al., ).
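The following stylized simulation illustrates this logic (it is our sketch rather than the fitted social sampling model; the homophily strength, circle size, and smoothing constant are arbitrary assumptions). It draws homophilous social circles from a right-skewed population and checks whether the cumulative population estimates of worse-off and better-off judges fall below the true cumulative distribution, the self-depreciation pattern described above:

```python
import numpy as np

rng = np.random.default_rng(1)
levels = np.arange(10)

# Right-skewed population over ten attribute levels: most people sit at
# the low end (as with household wealth).
p_true = np.exp(-0.5 * levels)
p_true /= p_true.sum()
population = rng.choice(levels, size=20_000, p=p_true)

def population_estimate(own_level, homophily=1.0, circle_size=15, alpha=1.0):
    """Draw a homophilous social circle, then report its (Laplace-)smoothed
    histogram as the estimate of the whole population."""
    weights = np.exp(-np.abs(population - own_level) / homophily)
    circle = rng.choice(population, size=circle_size, p=weights / weights.sum())
    histogram = np.bincount(circle, minlength=len(levels)) + alpha
    return histogram / histogram.sum()

true_cdf = np.cumsum(p_true)
for label, level in [("worse-off judge (level 1)", 1),
                     ("better-off judge (level 7)", 7)]:
    mean_estimate = np.mean([population_estimate(level) for _ in range(500)], axis=0)
    below = bool(np.all(np.cumsum(mean_estimate)[:5] < true_cdf[:5]))
    print(f"{label}: cumulative estimate below true CDF at low levels: {below}")
```

Both judges’ smoothed estimates shift probability mass from the crowded low end toward the sparse high end, so both cumulative curves fall below the empirical one – apparent self-depreciation without any motivated distortion.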

.

Using Social Judgments to Describe and Predict Societal Trends

Taken together, the results presented so far suggest that cognitive processes involved in social judgments are by themselves relatively unbiased, and that apparent cognitive and motivational biases might occur when studies of social judgments do not take into account the roles of social and task environments. This leads to the intriguing possibility that people’s social judgments can be used to improve descriptions and predictions of societal trends and to gain insights into how social influence impacts future beliefs and behaviors. In support of this possibility, we found that averages of social-circle estimates about attributes ranging from income and education to number of friends and work stress were in good agreement with population values estimated from a national probabilistic sample (see examples in Figure .a and Galesic et al., ). Furthermore, we find that asking people about their social circles can help predict societal trends in the domains of elections and vaccination, as described next.

Recent forecasting failures of national elections have highlighted limitations of traditional polling methods. For example, polling errors in several key states led to the prediction of an electoral win for Hillary Clinton in the  US presidential election. Several explanations have been proposed for this and other mispredictions, including inadequate adjustments for the underrepresentation of some population groups, reluctance among some voters to reveal their voting preferences before the election, and substantive changes in voting preferences during the final days before the election (Kennedy et al., ).




Our results suggest that asking about social circles might reduce biases arising from all of these concerns.

Most election surveys in the USA and elsewhere ask participants about their own intentions to vote in elections (e.g., “Who will you vote for in the election?”). In addition, pollsters sometimes ask participants about their expectations of who is going to win the elections (e.g., “What percentage of people do you think will vote for party X?”). Such election-winner questions have outperformed own-intention questions in many elections (Graefe, ; Lewis-Beck & Skalaban, ; Lewis-Beck & Tien, ; Murr, ), possibly because they allow respondents to summarize all of the potentially relevant information they have at hand (Bruine de Bruin et al., , , ; Hurd & McGarry, ; Rothschild & Wolfers, ). However, this can also be a drawback, as participants might rely too much on publicly available information such as candidates’ rankings in recent polls. For example, most people expected that Clinton would win the  US election (CNN, ). We show that asking people who their social circles will vote for (e.g., “What percentage of your social contacts do you think will vote for party X?”) might in many circumstances be better than asking about own intentions and election-winner expectations. We have compared the performance of the social-circle question to the own-intention question, and in some studies also to the winner-expectation question, in three recent US elections (Galesic & Bruine de Bruin et al., ; Olsson & Bruine de Bruin et al., ) as well as in three recent elections in European countries with larger numbers of political candidates and parties (the Netherlands and Sweden: Bruine de Bruin et al., ; France: Galesic & Bruine de Bruin et al., ). In each election we used probabilistic national samples. Figure .b summarizes the results for the European elections (predicted and observed percentages and absolute error of predictions) and Figure .c summarizes the results for the US presidential and House of Representatives elections (absolute error of predictions across different polling waves before the elections). In all elections, the social-circle question outperformed the own-intention and election-winner expectation questions.

At least three reasons could contribute to the usefulness of social-circle questions for describing and predicting broader populations. First, social-circle questions might implicitly improve the representativeness and size of the survey sample.
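The logic of this first reason can be demonstrated with a toy simulation (the nonresponse rates below are invented, and respondents’ perceptions of their contacts are assumed to be noisy but unbiased draws from the full electorate, ignoring homophily):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical electorate: 52% support party A, but A-supporters are half as
# likely to take part in the poll (a "hidden voter" nonresponse bias).
true_share = 0.52
n = 200_000
supports_a = rng.random(n) < true_share
responds = rng.random(n) < np.where(supports_a, 0.4, 0.8)

# Own-intention forecast uses only the respondents' own answers.
own_forecast = 100 * supports_a[responds].mean()

# Social-circle forecast: respondents report the share of A-supporters among
# their contacts, who are drawn from the full electorate (noisy perception).
circle_reports = np.clip(rng.normal(100 * true_share, 10, responds.sum()), 0, 100)
circle_forecast = circle_reports.mean()

print(f"true share:            {100 * true_share:.1f}%")
print(f"own-intention forecast: {own_forecast:.1f}%")
print(f"social-circle forecast: {circle_forecast:.1f}%")
```

Because supporters of one side are underrepresented among respondents, the own-intention forecast misses badly, whereas the social-circle reports still reflect the full electorate.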


When asked to report about their social contacts, people can provide information on people who are not included in the survey or who misrepresent their beliefs and behaviors. These “hidden” people might be those who are difficult to reach (e.g., some age and education groups), or who misreport or refuse to participate because of embarrassment, fear of harassment, or willing obstruction of pollsters (Enns et al., ). We investigated this possibility in a poll conducted before the  US election (sample from the Understanding America Study, N = ,; Olsson & Bruine de Bruin et al., ). We asked participants to estimate the percentage of their social circles who have different ages and education levels. The results showed that the percentages of younger people and people with lower education derived from social-circle estimates are closer to the population values than the percentages of these people among our actual poll participants (Olsson & Bruine de Bruin et al., ). This suggests that the social-circle question helps collect information about demographic groups that are often more difficult to reach by pollsters (Kennedy et al., ). We also asked the same sample of participants to estimate the percentage of their social circles who might be reluctant to reveal their opinions about Biden or Trump to pollsters due to embarrassment, fear of harassment, or intentional poll obstruction. As could be expected, participants whose social circles included only a few supporters of a particular candidate were more likely to report that their social contacts would be reluctant to reveal their opinion about that candidate. Furthermore, estimates of hidden voters in different states corresponded well to the average error of predictions of the respective state polls (as aggregated by .com).

Second, people might be more willing to report less socially desirable characteristics of their social contacts than of themselves (Sudman & Bradburn, ). A recognized concern with own-intention questions is that they might be biased by social acceptability (Lehrer et al., ). Hence, by asking about social circles instead, we might gain more accurate estimates of participants’ own beliefs and behaviors (Barton, ). Interestingly, we obtain only limited support for this explanation in our data. Own-intention questions sometimes underestimate election results of controversial political options (PVV in Figure .b; Trump in Galesic & Bruine de Bruin et al., ; Olsson & Bruine de Bruin et al., ) but at other times they overestimate them (Le Pen and Swedish Democrats in Figure .b). Social-circle questions are more accurate in both cases, but social desirability cannot explain both cases.

The third reason is that people’s estimates of their social circles today can provide hints about how they themselves will change in the future due to the processes of social influence.




Results from numerous studies suggest that people’s social circles, particularly their family and friends, are an important information source when forming their own political views (Huckfeldt & Sprague, ; Santoro & Beck, ; Sinclair, ). There are also some indications that people’s social circles influence their vaccination decisions (Parker et al., ). We investigated the possibility that people’s social circles influence their voting behavior and vaccination decisions with participants from two national probabilistic panels (Understanding America Study, N = , and RAND’s American Life Panel, N = ). For the  US presidential election we found that changes in social-circle estimates predicted later changes in own voting intentions and actual voting behavior over and above own-intention questions (Galesic & Bruine de Bruin et al., ). For vaccination behavior, we found that perceived vaccination in social circles helped predict own vaccination behavior in the next year, after controlling for past vaccination behavior and sociodemographics (Bruine de Bruin et al., ).
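In schematic form, such a longitudinal test amounts to checking whether the social-circle predictor survives a control for past behavior. A minimal sketch with synthetic data (the sample size, coefficients, and simple two-predictor specification are our assumptions, not the published analyses):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 5_000

# Synthetic panel: past own vaccination and the perceived share of vaccinated
# social contacts both raise the odds of vaccinating in the following year.
past_vax = rng.binomial(1, 0.45, n)
circle_pct = np.clip(rng.normal(40 + 30 * past_vax, 20, n), 0, 100)
log_odds = -2.0 + 1.0 * past_vax + 0.03 * circle_pct
next_vax = rng.binomial(1, 1 / (1 + np.exp(-log_odds)))

# Does the social-circle perception predict next year's behavior after
# controlling for past behavior?
X = sm.add_constant(np.column_stack([past_vax, circle_pct]))
fit = sm.Logit(next_vax, X).fit(disp=0)
print(dict(zip(["const", "past_vax", "circle_pct"], fit.params.round(3))))
```

In this setup the coefficient on the circle perception remains positive even with past behavior in the model, which is the pattern the longitudinal survey results describe.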

. Sampling Social Worlds: Outlook

Social sampling processes are a burgeoning area of research. Here we discuss related and possible further research directions.

.. Representing Social Environments

Only a few models of social judgments explicitly represent different aspects of the social environment as a key part of the model representations. The exemplar model PDIST (Linville et al., ) assumes that people form perceived distributions of characteristics of exemplars currently activated in memory, enabling the model to handle different shapes of frequency distributions. Similarly, in social judgment applications of Decision by Sampling (Brown et al., ; Stewart et al., ), the availability-by-recall heuristic (Hertwig et al., ; Pachur et al., ), as well as in the social-circle heuristic and the social-circle model (Pachur et al., ; Schulze et al., ), differently shaped frequency and spatial distributions in people’s environments can produce different valuations of a variety of attributes. Going beyond models of social judgments in social and cognitive psychology, there are a plethora of models in the belief dynamics literature that formally include the structure of social networks, often within general frameworks from statistical physics and epidemiology (Castellano et al., ). Such models can be more explicitly related to cognitive and social processes involved in social judgments and belief updating (Galesic & Olsson et al., ).
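As a concrete example of the rank-based idea in Decision by Sampling, an attribute’s subjective value can be computed as its relative rank within a small comparison sample retrieved from memory or the environment (a bare-bones sketch of the rank principle, not the full model):

```python
def relative_rank(target, comparison_sample):
    """Subjective value of `target` = proportion of sampled comparison
    values that it exceeds (its relative rank within the sample)."""
    return sum(value < target for value in comparison_sample) / len(comparison_sample)

# The same salary feels different depending on the sampled comparisons:
print(relative_rank(30_000, [18_000, 22_000, 25_000, 28_000]))  # 1.0: best in sample
print(relative_rank(30_000, [35_000, 40_000, 55_000, 90_000]))  # 0.0: worst in sample
```

The same objective value can feel excellent or poor depending on the sample it is compared against, which is how differently shaped distributions in people’s environments produce different valuations.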


There are many other characteristics of social environments that have been shown to influence the beliefs, behaviors, and intentions of people (for overviews, see Albert & Barabási, ; Easley & Kleinberg, ; Jackson, ; Newman, ; Watts, ). In this chapter, we focused on the effect of homophily on representations and judgments of social worlds. Another network property that is of particular relevance for social judgments is the friendship paradox. It describes the phenomenon that one’s friends, on average, have more friends than oneself (Feld, ). This occurs because people with more friends are more likely to appear in one’s friend group. This means that people’s reports about their social circles will over-represent individuals with many friends (Feld & McGail, ). The direction of the correlation between the attribute in question and the number of friends one has oneself, as well as the number of friends one’s friends have, will determine if the attribute will be over- or underestimated. For predicting societal trends, from the spread of disease to the spread of misinformation, individuals with more social connections might be a useful early indicator. For example, simply by virtue of having more social contacts, better connected people are more likely to be heard by others and drive beliefs, fashions, and behavioral intentions. Furthermore, in some cases better connected people are more likely to be “infected” with a characteristic of interest, such as a contagious disease or misinformation. Identifying those people could help predict and manage the spread of epidemics and misinformation (Alipourfard et al., ; Christakis & Fowler, ; Cohen et al., ).
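The friendship paradox is straightforward to reproduce in simulation; the sketch below uses a scale-free random graph as a stand-in for a real social network:

```python
import numpy as np
import networkx as nx

# Scale-free random graph: a few well-connected hubs, many low-degree nodes.
g = nx.barabasi_albert_graph(n=10_000, m=3, seed=0)
degree = dict(g.degree())

mean_friends = np.mean(list(degree.values()))
# For each person, the mean degree of their friends; then average over people.
mean_friends_of_friends = np.mean(
    [np.mean([degree[friend] for friend in g[node]]) for node in g])

print(f"average number of friends:         {mean_friends:.1f}")
print(f"average friends-of-friends degree: {mean_friends_of_friends:.1f}")  # larger
```

Because high-degree people appear in many other people’s circles, the average friends-of-friends count exceeds the average friend count, which is exactly why egocentric reports over-represent well-connected individuals.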




.. Modeling Social Sampling Processes

Different models of social judgment assume different social sampling processes, but most follow some version of exemplar-based processes in which exemplars are retrieved from memory, or from the immediate environment, based on their similarity with the exemplar that has to be judged. In exemplar models, these processes are usually modeled by forgetting and retrieval parameters (Linville et al., ). Other models postulate explicit search rules within different circles of one’s social network. For example, in the availability-by-recall heuristic, all instances of an event are tallied across the entire social network (Hertwig et al., ; Pachur et al., ), while in the social-circle heuristic (Pachur et al., ) it is assumed that people judge the relative frequency of two events by searching their social memory for relevant instances, sequentially probing their social circles and terminating search as soon as one circle distinguishes between the options. The social-circle model is a probabilistic extension of the social-circle heuristic that allows for individual differences in search and evaluation processes (Schulze et al., ).

In contrast to the assumption of a structured social memory in the social-circle heuristic and the social-circle model, the social sampling model described here makes no strong assumptions about how social memory is organized. Such assumptions, however, could be implemented in future versions of the model. Another way to account for differential search and retrieval processes in social memory is to investigate the possibility of integrating parts of the process assumptions in an exemplar-based sequential sampling model of categorization (Nosofsky & Palmeri, ), memory (Nosofsky et al., ), or estimation (Juslin & Persson, ). With multidimensional instances stored in memory, differences in search and retrieval from different social circles (e.g., family, friends, or acquaintances) could be modeled by including the social-circle categories as attributes that are weighted differently by the nature of the task or individual differences.

In some sampling approaches, it is assumed that people act as naïve intuitive statisticians (Fiedler, ; Fiedler & Juslin, ; Juslin et al., ), accurately summarizing the data but being naïve about the potential biases in the data available to them. The extent of this “cognitive myopia” might be limited, as in everyday life people are often aware that the instances they encounter are not a representative sample of the general population (Juslin et al., ). In addition, the naïvety assumption is not necessary for sampling-based explanations of judgments that appear to be biased. Even with full information and no cognitive limitations, asymmetries in the information available to people can lead to apparent biases in judgment (Le Mens & Denrell, ). The general premises of the naïve sampling model and the social sampling model are very similar: both assume that judgments of populations are based on a small selection of information retrieved from memory. In the social sampling model, however, we assume that people are aware of the discrepancy between their social circle and broader populations and that they try to correct for these differences by strategic sampling of instances from memory. We find empirical support for this assumption, in that people systematically disregard instances from their samples that are most similar to themselves when making population estimates (figure  in Galesic & Olsson et al., ).
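To make the contrast between search rules concrete, here is a minimal sketch of the social-circle heuristic’s sequential stopping rule described above (the circles and instances are hypothetical, and the published models additionally specify weighting and evaluation assumptions):

```python
def social_circle_heuristic(circles, option_a, option_b):
    """Probe circles in a fixed order (self, family, friends, acquaintances)
    and stop as soon as one circle discriminates between the two options."""
    for name, members in circles:
        count_a = sum(option_a in person for person in members)
        count_b = sum(option_b in person for person in members)
        if count_a != count_b:
            winner = option_a if count_a > count_b else option_b
            return winner, name  # search terminates here
    return "no decision", None

# "Which is more frequent: flu or a hiking injury?"
circles = [
    ("self",    [{"flu"}]),
    ("family",  [set(), {"flu"}]),
    ("friends", [{"hiking injury"}, {"flu"}]),
]
print(social_circle_heuristic(circles, "flu", "hiking injury"))
# -> ('flu', 'self'): the very first circle already discriminates, so the
#    later circles are never inspected.
```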


Another issue worthy of further exploration is the judgment of attributes that are rarely displayed in public (Jordan et al., ). People’s population estimates of these attributes are often less correlated with actual population distributions. For example, in Galesic et al. (), the errors for attributes such as conflicts with partner, number of dates, and health problems are all much larger than for other attributes that are more visible. To make judgments about less visible attributes, people might sample from a uniform distribution that reflects their ignorance or use their own attributes as a basis for their population estimates. These two possibilities could also be combined in a distribution with a peak on own position, which corresponds to a Bayesian model that assumes a uniform prior distribution and one’s own position as the only evidence (Dawes & Mulford, ). In any case, even when people’s population estimates are less accurate, this does not impair the social sampling model’s ability to predict people’s population estimates, because the model relies on experienced or perceived attributes rather than actual attributes of social environments.
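The Bayesian reading mentioned above is compact enough to state directly: with a uniform Beta(1, 1) prior over the proportion of the population sharing an attribute, observing only one’s own position yields a Beta(2, 1) posterior (a textbook calculation in the spirit of Dawes and Mulford’s argument, not their exact analysis):

```python
from scipy import stats

# Uniform Beta(1, 1) prior over the proportion who share the attribute;
# the only evidence is one's own position (a single "success").
posterior = stats.beta(a=1 + 1, b=1 + 0)

print(f"posterior mean: {posterior.mean():.2f}")   # 0.67 rather than 0.50
print("95% credible interval:",
      tuple(round(x, 2) for x in posterior.interval(0.95)))
```

The posterior mean of 2/3, rather than 1/2, shows why projecting one’s own position onto an unknown population need not be irrational under ignorance.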

.. Describing Hidden Societal Trends

Social scientists have never had access to as much data as today, but important parts of our society still remain unobserved because they are hard to reach or are intentionally kept from public view. People with controversial beliefs or behaviors that go against social norms might fear embarrassment or harassment from revealing them in public or to researchers. People’s reports about their social networks have previously been used to improve sampling and knowledge of difficult-to-reach populations, from drug users to jazz players (Heckathorn & Jeffri, ). Our results from election polls and vaccination surveys suggest that asking about people’s social circles can be used to gain knowledge about difficult-to-reach portions of the population. Importantly, when we ask about social circles, participants are never asked to reveal the identity of their specific social contacts. They are only asked to estimate relative frequencies of different characteristics of their social circles, typically defined as all adults they were in contact with within a specified time period. This gives researchers an opportunity to obtain some indication of the characteristics of hidden populations, while protecting their privacy.

. Concluding Remarks

Much of human sociality depends on people’s capacity to perceive, remember, and assess what their immediate social worlds think and do.




Although social psychology traditionally focused on potential biases in people’s social judgments, we show that people may actually be relatively good at judging the characteristics of others around them. This capacity allows people to fit into their immediate social worlds and learn from the experiences of others (Galesic & Bruine de Bruin et al., ). We even find that asking people how others around them will vote can improve the prediction of election outcomes and other behaviors, such as vaccination rates. Thus, new insights from psychology suggest that people’s social judgments are reasonably accurate and have important real-world applications.

REFERENCES

Albert, R., & Barabási, A.-L. (). Statistical mechanics of complex networks. Reviews of Modern Physics, (), –. https://doi.org/./RevModPhys..
Alicke, M. D., & Govorun, O. (). The better-than-average effect. In M. D. Alicke, D. Dunning, & J. I. Krueger (Eds.), Studies in self and identity: The self in social judgment. New York: Psychology Press.
Alicke, M. D., Klotz, M. L., Breitenbecher, D. L., Yurak, T. J., & Vredenburg, D. S. (). Personal contact, individuation, and the better-than-average effect. Journal of Personality and Social Psychology, (), –. https://doi.org/./-...
Alipourfard, N., Nettasinghe, B., Abeliuk, A., Krishnamurthy, V., & Lerman, K. (). Friendship paradox biases perceptions in directed networks. Nature Communications, (), . https://doi.org/./s---x
Anderson, J. R. (). The adaptive character of thought. Hillsdale: Lawrence Erlbaum.
Barton, A. H. (). Asking the embarrassing question. Public Opinion Quarterly, (), –. https://doi.org/./
Brown, G. D. A., Wood, A. M., Ogden, R. S., & Maltby, J. (). Do student evaluations of university reflect inaccurate beliefs or actual experience? A relative rank model. Journal of Behavioral Decision Making, (), –. https://doi.org/./bdm.
Bruine de Bruin, W., Downs, J. S., Murray, P., & Fischhoff, B. (). Can female adolescents tell whether they will test positive for chlamydia infection? Medical Decision Making, (), –. https://doi.org/./X
Bruine de Bruin, W., Galesic, M., & Bååth, R., et al. (). Asking about social circles improves election predictions even with many political parties. International Journal of Public Opinion Research, (), edac. https://doi.org/./ijpor/edac
Bruine de Bruin, W., Galesic, M., Parker, A. M., & Vardavas, R. (). The role of social circle perceptions in “False Consensus” about population statistics: Evidence from a national flu survey. Medical Decision Making, (), –. https://doi.org/./X


Bruine de Bruin, W., Parker, A. M., & Fischhoff, B. (). Can adolescents predict significant life events? Journal of Adolescent Health, (), –. https://doi.org/./j.jadohealth...
Bruine de Bruin, W., Parker, A. M., Galesic, M., & Vardavas, R. (). Reports of social circles’ and own vaccination behavior: A national longitudinal survey. Health Psychology, (), –. https://doi.org/./hea
Burson, K. A., Larrick, R. P., & Klayman, J. (). Skilled or unskilled, but still unaware of it: How perceptions of difficulty drive miscalibration in relative comparisons. Journal of Personality and Social Psychology, (), –. https://doi.org/./-...
Castellano, C., Fortunato, S., & Loreto, V. (). Statistical physics of social dynamics. Reviews of Modern Physics, (), –. https://doi.org/./RevModPhys..
Chambers, J. R., & Windschitl, P. D. (). Biases in social comparative judgments: The role of nonmotivated factors in above-average and comparative-optimism effects. Psychological Bulletin, (), –. https://doi.org/./-...
Christakis, N. A., & Fowler, J. H. (). The spread of obesity in a large social network over  years. New England Journal of Medicine, (), –. https://doi.org/./NEJMsa
(). The collective dynamics of smoking in a large social network. New England Journal of Medicine, (), –. https://doi.org/./NEJMsa
(). Connected: The amazing power of social networks and how they shape our lives. London: Harper Collins.
Cialdini, R. B. (). Descriptive social norms as underappreciated sources of social control. Psychometrika, (), –. https://doi.org/./s---
CNN. (). Poll: Most see a Hillary Clinton victory and a fair count ahead. www.cnn.com////politics/hillary-clinton--election-poll/index.html
Cohen, R., Havlin, S., & Ben-Avraham, D. (). Efficient immunization strategies for computer networks and populations. Physical Review Letters, (), . https://doi.org/./PhysRevLett..
Coleman, J. S. (). Relational analysis: The study of social organizations with survey methods. Human Organization, (), –.
Dawes, R. M., & Mulford, M. (). The false consensus effect and overconfidence: Flaws in judgment or flaws in how we study judgment? Organizational Behavior and Human Decision Processes, (), –. https://doi.org/./obhd..
Easley, D., & Kleinberg, J. (). Networks, crowds, and markets. New York: Cambridge University Press.
Ehrlinger, J., Johnson, K., Banner, M., Dunning, D., & Kruger, J. (). Why the unskilled are unaware: Further explorations of (absent) self-insight among the incompetent. Organizational Behavior and Human Decision Processes, (), –. https://doi.org/./j.obhdp...




Enns, P. K., Lagodny, J., & Schuldt, J. P. (). Understanding the  US presidential polls: The importance of hidden Trump supporters. Statistics, Politics and Policy, (), –. https://doi.org/./spp--
Feld, S. L. (). Why your friends have more friends than you do. American Journal of Sociology, (), –. https://doi.org/./
Feld, S. L., & McGail, A. (). Egonets as systematically biased windows on society. Network Science, (), –. https://doi.org/./nws..
Festinger, L. (). A theory of social comparison processes. Human Relations, (), –. https://doi.org/./
Fiedler, K. (). Explaining and simulating judgment biases as an aggregation phenomenon in probabilistic, multiple-cue environments. Psychological Review, (), –. https://doi.org/.//-X...
(). Beware of samples! A cognitive-ecological sampling approach to judgment biases. Psychological Review, (), –. https://doi.org/./-X...
Fiedler, K., & Juslin, P. (Eds.) (). Information sampling and adaptive cognition. New York: Cambridge University Press.
Frable, D. E. S. (). Being and feeling unique: Statistical deviance and psychological marginality. Journal of Personality, (), –. https://doi.org/./j.-..tb.x
Galesic, M., Bruine de Bruin, W., & Dalege, J., et al. (). Human social sensing is an untapped resource for computational social science. Nature, (), –. https://doi.org/./s---
Galesic, M., Bruine de Bruin, W., & Dumas, M., et al. (). Asking about social circles improves election predictions. Nature Human Behaviour, (), –. https://doi.org/./s---y
Galesic, M., Olsson, H., Dalege, J., van der Does, T., & Stein, D. L. (). Integrating social and cognitive aspects of belief dynamics: Towards a unifying framework. Journal of The Royal Society Interface, (), rsif... https://doi.org/./rsif..
Galesic, M., Olsson, H., & Rieskamp, J. (). Social sampling explains apparent biases in judgments of social environments. Psychological Science, (), –. https://doi.org/./
(). False consensus about false consensus. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the thirty-fifth annual conference of the Cognitive Science Society (pp. –). Berlin: Cognitive Science Society.
(). A sampling model of social judgment. Psychological Review, (), –. https://doi.org/./rev
Gigerenzer, G., Fiedler, K., & Olsson, H. (). Rethinking cognitive biases as environmental consequences. In P. M. Todd, G. Gigerenzer, & ABC Research Group (Eds.), Ecological rationality: Intelligence in the world (pp. –). New York: Oxford University Press.


Gigerenzer, G., Todd, P. M., & the ABC Research Group. (). Simple heuristics that make us smart. New York: Oxford University Press.
Graefe, A. (). Accuracy of vote expectation surveys in forecasting elections. Public Opinion Quarterly, (S), –. https://doi.org/./poq/nfu
Heckathorn, D. D., & Jeffri, J. (). Finding the beat: Using respondent-driven sampling to study jazz musicians. Poetics, (), –. https://doi.org/./S-X()-
Hertwig, R., & Hoffrage, U. (Eds.). (). Simple heuristics in a social world. New York: Oxford University Press.
Hertwig, R., Pachur, T., & Kurzenhäuser, S. (). Judgments of risk frequencies: Tests of possible cognitive mechanisms. Journal of Experimental Psychology: Learning, Memory, and Cognition, (), –. https://doi.org/./-...
Huckfeldt, R. R., & Sprague, J. (). Citizens, politics and social communication: Information and influence in an election campaign. New York: Cambridge University Press.
Hurd, M. D., & McGarry, K. (). The predictive validity of subjective probabilities of survival. Economic Journal, (), –. https://doi.org/./-.
Jackson, M. O. (). Social and economic networks. Princeton, NJ: Princeton University Press.
Jordan, A. H., Monin, B., & Dweck, C. S., et al. (). Misery has more company than people think: Underestimating the prevalence of others’ negative emotions. Personality and Social Psychology Bulletin, (), –. https://doi.org/./
Juslin, P., Olsson, H., & Björkman, M. (). Brunswikian and Thurstonian origins of bias in probability assessment: On the interpretation of stochastic components of judgment. Journal of Behavioral Decision Making, (), –. https://doi.org/./(SICI)-():..CO;-W
Juslin, P., & Persson, M. (). PROBabilities from EXemplars (PROBEX): A “lazy” algorithm for probabilistic inference from generic knowledge. Cognitive Science, , –. https://doi.org/./scog_
Juslin, P., Wennerholm, P., & Olsson, H. (). Format dependence in subjective probability calibration. Journal of Experimental Psychology: Learning, Memory, and Cognition, (), –. https://doi.org/./-...
Juslin, P., Winman, A., & Hansson, P. (). The naïve intuitive statistician: A naïve sampling model of intuitive confidence intervals. Psychological Review, (), –. https://doi.org/./-X...
Jussim, L. (). Social perception and social reality: Why accuracy dominates bias and self-fulfilling prophecy. New York: Oxford University Press.
Kendal, R. L., Boogert, N. J., Rendell, L., Laland, K. N., Webster, M., & Jones, P. L. (). Social learning strategies: Bridge-building between fields. Trends in Cognitive Sciences, (), –. https://doi.org/./j.tics...




Trends in Cognitive Sciences, (), –. https://doi.org/./j.tics ... Kennedy, C., Blumenthal, M., & Clement, S., et al. (). An evaluation of the  election polls in the United States. Public Opinion Quarterly, (), –. https://doi.org/./poq/nfx Krueger, J. I., & Clement, R. W. (). The truly false consensus effect: An ineradicable and egocentric bias in social perception. Journal of Personality and Social Psychology, (), –. https://doi.org/./- ... Krueger, J. I., & Funder, D. C. (). Towards a balanced social psychology: Causes, consequences, and cures for the problem-seeking approach to social behavior and cognition. Behavioral and Brain Sciences, , –. https:// doi.org/./SX Krueger, J. I., & Mueller, R. A. (). Unskilled, unaware, or both? The betterthan-average heuristic and statistical regression predict errors in estimates of own performance. Journal of Personality and Social Psychology, (), –. https://doi.org/.//-... Kruger, J. (). Lake wobegon be gone! The “below-average effect” and the egocentric nature of comparative ability judgments. Journal of Personality and Social Psychology, (), –. Kruger, J., & Dunning, D. (). Unskilled and unaware of it: How difficulties in recognizing one’s own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology, (), –. https://doi .org/./-... Le Mens, G., & Denrell, J. (). Rational learning and information sampling: On the “naivety” assumption in sampling explanations of judgment biases. Psychological Review, (), –. https://doi.org/./a Lee, E., Karimi, F., & Wagner, C., et al. (). Homophily and minority-group size explain perception biases in social networks. Nature Human Behaviour,  (), –. https://doi.org/./s--- Lehrer, R., Juhl, S., & Gschwend, T. (). The wisdom of crowds design for sensitive survey questions. Electoral Studies,  (September ), –. https://doi.org/./j.electstud... Lewis-Beck, M. S., & Skalaban, A. (). Citizen forecasting: Can voters see into the future? British Journal of Political Science, (), –. https:// doi.org/./SX Lewis-Beck, M. S., & Tien, C. (). Voters as forecasters: A micromodel of election prediction. International Journal of Forecasting, (), –. https://doi.org/./S-()- Lichtenstein, S., Slovic, P., Fischhoff, B., Layman, M., & Combs, B. (). Judged frequency of lethal events. Journal of Experimental Psychology: Human Learning and Memory, (), –. https://doi.org/./-... Lindskog, M., Winman, A., & Juslin, P. (). Naïve point estimation. Journal of Experimental Psychology: Learning, Memory, and Cognition, (), –. https://doi.org/./a


Linville, P. W., Fischer, G. W., & Salovey, P. (). Perceived distributions of the characteristics of in-group and out-group members: Empirical evidence and a computer simulation. Journal of Personality and Social Psychology, (), –. https://doi.org/./-...
Lopes, L. L. (). Risk perception and the perceived public. In D. Bromley & K. Segerson (Eds.), The social response to environmental risk (pp. –). New York: Springer. https://doi.org/./----_
Lynn, C. W., & Bassett, D. S. (). How humans learn and represent networks. Proceedings of the National Academy of Sciences, (), –. https://doi.org/./pnas.
Marks, G., & Miller, N. (). Ten years of research on the false-consensus effect: An empirical and theoretical review. Psychological Bulletin, (), –. https://doi.org/./-...
McPherson, M., Smith-Lovin, L., & Cook, J. M. (). Birds of a feather: Homophily in social networks. Annual Review of Sociology, (), –. https://doi.org/./annurev.soc...
Moore, D. A., & Small, D. A. (). Error and bias in comparative judgment: On being both better and worse than we think we are. Journal of Personality and Social Psychology, (), –. https://doi.org/./-...
Mullen, B., Dovidio, J. F., Johnson, C., & Copper, C. (). In-group–out-group differences in social projection. Journal of Experimental Social Psychology, (), –. https://doi.org/./-()-Q
Murr, A. E. (). The wisdom of crowds: What do citizens forecast for the  British general election? Electoral Studies, , –. https://doi.org/./j.electstud...
Newman, M. E. J. (). The structure and function of complex networks. SIAM Review, (), –. https://doi.org/./S
Nisbett, R. E., & Kunda, Z. (). Perception of social distributions. Journal of Personality and Social Psychology, (), –. https://doi.org/./-...
Nosofsky, R. M., Cao, R., Cox, G. E., & Shiffrin, R. M. (). Familiarity and categorization processes in memory search. Cognitive Psychology, , –. https://doi.org/./j.cogpsych...
Nosofsky, R. M., & Palmeri, T. J. (). An exemplar-based random walk model of speeded classification. Psychological Review, (), –. https://doi.org/./-X...
Olsson, H., Barman-Adhikari, A., Galesic, M., Hsu, H.-T., & Rice, E. (). Cognitive strategies for peer judgments. https://doi.org/./osf.io/shxj
Olsson, H., Bruine de Bruin, W., Galesic, M., & Prelec, D. (). Combining survey questions with a Bayesian bootstrap method improves election forecasts. https://doi.org/./osf.io/nqcgs
Pachur, T., Hertwig, R., & Rieskamp, J. (). Intuitive judgments of social statistics: How exhaustive does sampling need to be? Journal of Experimental Social Psychology, (), –. https://doi.org/./j.jesp...




Pachur, T., Hertwig, R., & Steinmann, F. (). How do people judge risks: Availability heuristic, affect heuristic, or both? Journal of Experimental Psychology: Applied, (), –. https://doi.org/./a
Parker, A. M., Vardavas, R., Marcum, C. S., & Gidengil, C. A. (). Conscious consideration of herd immunity in influenza vaccination decisions. American Journal of Preventive Medicine, (), –. https://doi.org/./j.amepre...
Roese, N. J., & Olson, J. M. (). Better, stronger, faster: Self-serving judgment, affect regulation, and the optimal vigilance hypothesis. Psychological Science, (), –.
Ross, L., Greene, D., & House, P. (). The “false consensus effect”: An egocentric bias in social perception and attribution processes. Journal of Experimental Social Psychology, (), –. https://doi.org/./-()-X
Rothschild, D. M., & Wolfers, J. (). Forecasting elections: Voter intentions versus expectations. SSRN Electronic Journal. https://doi.org/./ssrn.
Santoro, L., & Beck, P. A. (). Social networks and vote choice. In J. N. Victor, A. H. Montgomery, & M. Lubell (Eds.), The Oxford Handbook of Political Networks (pp. –). Oxford: Oxford University Press.
Schulze, C., Hertwig, R., & Pachur, T. (). Who you know is what you know: Modeling boundedly rational social sampling. Journal of Experimental Psychology: General, (), –. https://doi.org/./xge
Schwing, R. C., & Kamerud, D. B. (). The distribution of risks: Vehicle occupant fatalities and time of the week. Risk Analysis, (), –. https://doi.org/./j.-..tb.x
Shalizi, C. R., & Thomas, A. C. (). Homophily and contagion are generically confounded in observational social network studies. Sociological Methods & Research, (), –. https://doi.org/./
Signorile, V., & O’Shea, R. M. (). A test of significance for the homophily index. American Journal of Sociology, (), –. https://doi.org/./
Simon, H. A. (). Invariants of human behavior. Annual Review of Psychology, (), –. https://doi.org/./annurev.ps...
Sinclair, B. (). The social citizen: Peer networks and political behavior. Chicago: University of Chicago Press.
Stewart, N., Chater, N., & Brown, G. D. A. (). Decision by sampling. Cognitive Psychology, (), –. https://doi.org/./j.cogpsych...
Sudman, S., & Bradburn, N. M. (). Asking questions: A practical guide to questionnaire design. San Francisco: Jossey-Bass.
Suls, J., Martin, R., & Wheeler, L. (). Social comparison: Why, with whom, and with what effect? Current Directions in Psychological Science, (), –. https://doi.org/./-.
Taylor, S. E., & Brown, J. D. (). Illusion and well-being: A social psychological perspective on mental health. Psychological Bulletin, (), –. https://doi.org/./-...


Tompson, S. H., Kahn, A. E., Falk, E. B., Vettel, J. M., & Bassett, D. S. (). Individual differences in learning social and nonsocial network structures. Journal of Experimental Psychology: Learning, Memory, and Cognition, (), –. https://doi.org/./xlm
Tourangeau, R., Rips, L. J., & Rasinski, K. (). The psychology of survey response. Cambridge University Press.
Watts, D. J. (). Six degrees: The science of a connected age. New York: W. W. Norton.


Group-Motivated Sampling: From Skewed Experiences to Biased Evaluations

Yrian Derreumaux, Robin Bergh, Marcus Lindskog, and Brent Hughes

From planning daily activities with close others to passively overhearing a stranger’s conversation on the bus, humans spend roughly half of their waking life in the presence of other humans (US Bureau of Labor Statistics, –). These social interactions constitute the source of a large proportion of the information that individuals gather on a daily basis. At the same time, people are often limited to only a small portion of information, or samples, in relation to all the possible information made available during a social experience (Ambady & Rosenthal, ; Fiedler, ). The fundamental question guiding this chapter is how group-motivated sampling of social information facilitates biased evaluations. One feature unique to social interactions is that the categories or groups to which individuals belong or affiliate (e.g., gender, race, political, religious, sports affiliations) elicit motivations that fluctuate across time and social contexts (Tamir & Hughes, ). Whereas extant sampling models have accounted for many dimensions of both internal and external sampling behavior, less attention has been dedicated to the motivations that may guide sampling behaviors within these social contexts. We provide a framework to account for how motivations not only influence the social experiences people have in group settings, but also how motivations influence the interpretations and beliefs people form based on those social experiences.

Social contexts elicit different types of motivations that may influence sampling behavior. Interpersonal motivations arise as a function of an individual’s social roles and their surrounding needs, which give rise to motivations related to maintaining positive affect and achieving personal goals (e.g., avoiding unpleasant individuals; Denrell, ). Group-based motivations, on the other hand, relate to the different groups that people categorize themselves and others into (e.g., based on gender, race, political affiliation), and how these group-identity attachments give rise to motivations related to social belonging and positive group esteem (Kahan et al., ).


While interpersonal motivations have received some attention in previous work (e.g., Denrell, ), less attention has focused on the role of group-based motivations in sampling behavior. Given that individuals belong to multiple different categories (Brewer, ; Brewer & Pierce, ; Hogg et al., ; Onorato & Turner, ; Willer et al., ), and these social identities shape social cognitive processes (Bolsen et al., ; Hughes et al., ; Kahan, ; Kunda, ), how might belonging to a group impact sampling behavior?

There are many different social categories that divide us into different groups (Cikara et al., ; Onorato & Turner, ; Willer et al., ), which engender different motivations that can produce intergroup biases. In their purest form, intergroup motivations can be examined in minimal group contexts, where individuals are arbitrarily assigned to novel and experimentally created groups in which no preexisting knowledge can influence cognition and behavior (Tajfel & Billig, ). From a sampling perspective, the minimal group context provides a baseline for examining how group-based motives bias sampling behavior, as people have no prior knowledge about ingroup and outgroup members. One benefit of establishing a baseline of sampling biases in the context of minimal groups is that it provides an opportunity to examine if and how sampling biases diverge in the context of real groups. In contrast to minimal groups, people often maintain strong preexisting beliefs and attitudes about members of real groups that may introduce other motivations (Miller et al., ). For instance, real-group contexts defined by intergroup conflict may engender motivations to derogate and harm outgroup members, which are rarely seen in minimal group contexts (e.g., Mummendey et al., ). Information sampling may thus deviate based on whether people have prior knowledge about ingroup and outgroup members (i.e., real groups) or not (i.e., minimal groups).

Social group affiliations can lead to biased evaluations of social targets via several mechanisms (see Figure .). First, through motivated sampling (see Figure ., Path A), motivations may constrain the availability of samples by guiding attention toward particular information. Here, we are concerned with sampling choices that sacrifice accuracy in social perceptions in order to satisfy some other goal or wish of a person. Whether these motivations are rational or not, and whether they are associated with deliberate or “accidental” biases, is not the crux of the argument. Indeed, some goals can be rational (e.g., avoiding things that are unpleasant or harmful) and only generate accidental evaluative biases (e.g., Denrell, ). More important for our purposes, however, motivations may also be tied to groups, and these may generate biases that cannot easily be explained by actors solely attempting to navigate the environment at hand.




Figure . A visual schema of where motivations may influence information processing. First, motivations constrain samples by guiding attention toward goal relevant information (Path A). Second, group-based motivation may lead to motivated interpretations of the sampled information (Path B). Finally, people may employ different sampling strategies over time that capitalize on skewed experiences (Path C).

For example, if people want to find the best place to get food, and they are tasked to evaluate whether they get better recommendations from some groups than others, then it makes little sense to seek more information from one group – especially when the groups are novel and completely arbitrary (Bergh & Lindskog, ). If people were trying to maximize utility or information gain, they should gather the same amount of information from each group, given what the task and environment entail. A deviation from equal sampling from each group would constitute motivated sampling.
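On this definition, motivated sampling is detectable as a departure from the 50:50 split of information requests that equal sampling implies. For a single participant it could be tested as follows (the counts are hypothetical; an actual analysis would aggregate over participants and trials):

```python
from scipy import stats

# Hypothetical participant: 34 of 50 information requests went to the ingroup.
result = stats.binomtest(k=34, n=50, p=0.5)
print(f"ingroup share: {34 / 50:.2f}, two-sided p = {result.pvalue:.3f}")
```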


resources (Anderson & Putterman, ; Hewstone et al., ). It could be argued then to be a rational choice to avoid outgroups, at least in some cases. Avoidance of outgroups could also be accidental, because of a combination of interpersonal motivations (avoid unpleasant interactions) and differential likelihoods of encountering new people from certain groups (Denrell, ). We offer a third possibility: People may also have an intrinsic motivation to seek out information about ingroups to a greater extent than outgroups – even when both information sources are equally accessible. Specifically, we suggest that in both minimal and real group contexts, group-based identities may guide sampling behavior toward one group over another, giving rise to experiences that are not representative of the population. Critically, if people fail to account for their skewed experiences, these biased samples may lead to biased evaluations (i.e., data-driven biases; Denrell & Le Mens, ; Fiedler, ; Le Mens & Denrell, ; Lindskog et al., ; Lindskog & Winman, ). Second, through motivated interpretations (see Figure ., Path B), when a sample has been obtained, motivations may also guide the interpretation of sampled information, for instance by shifting attention and memory processes toward congenial information and away from uncongenial information. Here, we operationalize congenial as both information that is favorable to the ingroup and information that is unfavorable to the outgroup, and uncongenial as both information that is unfavorable to the ingroup and information that is favorable to the outgroup. In some cases, such as with real groups, congenial may also comprise favorable ingroup and unfavorable outgroup information that is congruent with expectations, as people believe a priori that their group is better than the outgroup. In social contexts, the same motivations that guide sampling behavior may also contribute to erroneous interpretations of information after samples are obtained, introducing a second stage of information processing where motivations can bias evaluations. From this perspective, motivations can lead people to form conclusions that are most desired rather than those that are most accurate. For instance, research has consistently found that people interpret congenial information or information from congenial sources (e.g., coming from the ingroup) more favorably and perceive it as more valid than uncongenial information (Flynn et al., ; Gampa et al., ; Hughes et al., ). Conversely, when confronted with uncongenial information, people often become skeptical and require more and stronger information before information is adopted (Ditto & Lopez, ; Kraft et al., ; Taber et al., ). Taken together, motivated




Taken together, motivated interpretations may interact with biased samples, giving rise to evaluations that favor the ingroup over the outgroup.

Third, motivations may influence the processing of group-based information by changing sampling behaviors over time, based on the congeniality of sampled information (see Figure ., Path C). That is, sampling to infer an unknown population parameter is a dynamic process: samples are gathered, interpreted, and then gathered again to reach a conclusion. When people are limited in their ability to interpret information in a favorable way (e.g., because their own group is worse than an outgroup on some attribute), they may capitalize on skewed experiences by adapting where they sample over time (hereinafter, sampling strategies). Sampling strategies can thus obfuscate real-group differences by steering sampling toward congenial or away from uncongenial information.

. Empirical Framework

To empirically test how motivations in social group contexts guide information sampling, and the extent to which skewed experiences interact with motivated interpretations to produce biased evaluations, we conducted seven studies in minimal and political groups (Bergh & Lindskog, ; Derreumaux et al., ). In all studies, participants actively sampled numerical information from ingroup and outgroup categories that represented some underlying attribute (e.g., trustworthiness, intellectual abilities), and stopped sampling when they felt confident that they could accurately evaluate each group. In each trial, a participant could freely choose to gather information from either the ingroup or outgroup category. The sampled information was subjected to two manipulations designed to examine how sampling behavior may interact with motivated interpretations to produce biased evaluations.

First, we manipulated the valence of the first sample to be either overly positive or overly negative to examine how first experiences influence evaluations and guide subsequent information sampling. If the majority of people begin sampling from their own group, this outlier will increase the variability of ingroup relative to outgroup experiences, as their overall ingroup samples will be biased upward or downward. If motivations only influence information processing in Path A (i.e., sampling first from the ingroup), but not Path B (i.e., no interpretive biases), then differences between participants’ ingroup and outgroup evaluations should be of similar magnitude based on their overly positive or negative initial experiences (i.e., a sampling-driven bias).


Conversely, if motivations only influence information processing in Path B (i.e., motivated interpretations), but not Path A (i.e., no motivated sampling), then we would expect participants to sample randomly from the ingroup and outgroup first, and both positive and negative initial samples would likely lead to significantly higher ingroup compared to outgroup evaluations. This is because in both situations people would interpret the information as more favorable than it is and arrive at similarly biased evaluations. However, we predicted that biased evaluations arise from sampling biases in Path A, as most participants sample first and most often from the ingroup (Brewer, ), which produces more variable ingroup experiences (Konovalova & Le Mens, ), and from their interaction with interpretive biases in Path B, which produces asymmetric updating of initial samples based on their congeniality (Hart et al., ; Hughes et al., ).

Second, participants were randomly assigned to real-group difference conditions (i.e., distributions of information that statistically differed in one direction or another), such that their ingroup was either () better than, () worse than, or () the same as the outgroup on some evaluative dimension (e.g., more or less trustworthy). Manipulating the real-group differences provides a second opportunity to examine whether biased evaluations interact with biased samples. If participants are not biased in their interpretation of information (Path B), and instead follow the data they encounter (data-driven), then their evaluations of ingroup and outgroup members should be similarly accurate (i.e., equally likely to report the direction and magnitude of the group difference when the ingroup is doing better and worse). However, we predict that participants will be capable of accurately representing the data they encounter (e.g., Denrell & Le Mens, ; Fiedler, ), but will only do so when the ingroup is de facto better. When the ingroup is de facto worse, we predict that participants will fail to pick up on any differences and will evaluate the ingroup and outgroup similarly (Gramzow et al., ; Howard & Rothbart, ). Thus, we predict that participants will selectively integrate samples and pick up on underlying distributions based on whether the environment is favorable (i.e., the ingroup is de facto better) or unfavorable (i.e., the outgroup is de facto better).

Next, we discuss how our research design addresses sampling biases in Path A, their interaction with interpretive biases in Path B, and changes in sampling strategies over time in Path C, as outlined in Figure ..
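A stylized simulation shows how the first-sample manipulation, combined with a tendency to sample the ingroup first, inflates the variability of ingroup experiences even when the two groups’ true distributions are identical (the group sample sizes, outlier magnitude, and the 80 percent ingroup-first rate are illustrative assumptions rather than the empirical design):

```python
import numpy as np

rng = np.random.default_rng(4)

def run_participant(outlier, p_ingroup_first=0.8):
    """Both groups share the same N(0, 1) attribute distribution; the
    experimenter-controlled first sample (an outlier) usually lands in
    the ingroup."""
    ingroup = list(rng.normal(0, 1, 10))
    outgroup = list(rng.normal(0, 1, 10))
    if rng.random() < p_ingroup_first:
        ingroup.append(outlier)
    else:
        outgroup.append(outlier)
    return np.mean(ingroup), np.mean(outgroup)

# Half of the simulated participants get a +3 first sample, half get -3.
evals = np.array([run_participant(o) for o in (3.0, -3.0) for _ in range(2_500)])
print("variance of ingroup evaluations: ", round(evals[:, 0].var(), 3))
print("variance of outgroup evaluations:", round(evals[:, 1].var(), 3))
```

Because the outlier usually lands in the ingroup, participants’ mean ingroup experiences are spread more widely across the sample; interpretive biases in Path B can then act on this extra variability asymmetrically.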
Figure . A visual representation of how information was sampled over time. Below the arrow is a generic representation of the paradigm flow. Above the arrow is a representation of how information was sampled in Studies  and .

.. Path A: Group-Based Motivations Constrain Individual Samples

Examining sampling biases in the context of minimal groups allows us to compare and contrast whether and how sampling biases diverge in the context of real groups. Because people have no prior knowledge about ingroup and outgroup members in the minimal group context, any sampling biases that emerge should represent "pure" group-based motivations. These motivations need not be overtly aimed at discriminating against other groups. Instead, they could be as innocent as wanting to learn about one's own group first. That is, in the minimal group context, sampling biases may emerge simply because people are more curious about ingroup members and, as a result, sample first and most often from them. Motivations to sample first and most often from one's own group are also likely to bias sampling in the context of real groups, in line with work suggesting that people often prefer information from congenial sources (e.g., an ingroup member). However, the specific motivations underlying sampling behavior – and more importantly, how people interpret sampled information – may diverge in the presence of real groups. This is because people often maintain preexisting beliefs (e.g., about ingroups as superior and outgroups as inferior), which engenders intergroup biases (Brewer, ). For instance, in a real-group context, people's motivations may
Table .. Overview of sample information for all  empirical studies. Study

n





























Ingroup first sample % CI [,] % CI [,] % CI [,] % CI [,] % CI [,] % CI [,] % CI [,]

Online/ In-lab

Context

Group Type

Scandinavia

In-lab

Restaurant Ratings

Minimal

Scandinavia

In-lab

Restaurant Ratings

Minimal

United States

Online

Security Screening

Minimal

United States

Online

Cognitive Screening Minimal

United States

Online

Political Knowledge Political

United States

Online

Fact-Check Ratings

Political

United States

Online

Fact-Check Ratings

Political

Location

shift from focusing solely on positive ingroup information to also focusing on negative outgroup information. Throughout all seven studies, the majority of participants chose their first piece of information from the ingroup (see Table .), demonstrating that group-based motives influence information sampling from the very start. Because the first sample was either overly positive or negative, most participants also had more varied experiences about their ingroup relative to their outgroup. In the minimal group context, roughly  percent of participants started sampling from the ingroup, providing strong evidence for group-based motives even when people could not have any “rational” prior beliefs about ingroup and outgroup members, nor any knowledge about preexisting differences between groups (Bergh & Lindskog, ). In the political context, roughly  percent of participants started sampling from the ingroup, suggesting that group-based motives also guide sampling behavior in real groups (Derreumaux et al., ). In the minimal group context, sampling behavior likely reflects “pure” ingroup favoritism driven by a greater interest to learn about ingroup members by default, because minimal group contexts are not contaminated by preexisting motivations or expectations associated with either group. However, even in minimal groups where no preexisting beliefs or associations exist, sampled information can still vary in the extent to which it bears implications for the self,
and this can contribute additional motivations to biased information sampling beyond greater interest. For instance, in Study , participants gathered IQ scores from novel ingroups and outgroups. IQ scores represent meaningful pieces of information that could have a real impact on the success or failure of future coalitions, and therefore people may have good reason to gather more IQ scores from novel ingroups than from novel outgroups. Thus, sampling in this context may be considered "rational," yet it is still driven by motivations surrounding group membership. Likewise, in the political context, sampling behavior may be driven by motivations beyond greater ingroup interest, such as motivations to confirm prior beliefs about the superiority of one's own group. Indeed, we find evidence that the tendency to sample more from the ingroup was stronger for participants who identified more strongly with their political identity and for those with stronger political conviction. This finding provides an explicit connection between strength of identity and motivated sampling behavior, and suggests that biases in sampling behavior in part represent group-specific motivations. Absent any prior knowledge about ingroups and outgroups, one reason that participants may have predominantly begun sampling from their ingroup is that they are testing an internally generated hypothesis that their ingroup is better (i.e., a positive test strategy). This possibility aligns with work suggesting that people often maintain overly positive expectations about ingroup attributes relative to outgroup attributes (King & Bee, ). To more formally test the extent to which sampling from the ingroup first represents a positive test strategy to confirm that the ingroup is better, in addition to a general or "default" motivation to learn more about ingroup members, the last of our studies introduced a global pundit impression. This manipulation took place prior to any sampling, such that participants received either a positive or a negative pundit impression about the candidates from an ingroup or outgroup source. This context aligns or misaligns positive test strategies for the externally generated hypothesis (i.e., the pundit comment) with motivations to learn more about ingroup members. The positive pundit impression aligns a positive test strategy with the motivation to learn more about ingroup members, as both should lead to more ingroup first samples. Conversely, the negative pundit impression misaligns a positive test strategy with the motivation to learn more about ingroup members, as testing the hypothesis that the outgroup is better should lead to an increase in sampling from the outgroup first. Results suggest that when these two motivations align, participants were more likely ( percent) to select the first sample from the ingroup. In
contrast, following a negative pundit impression, participants were equally likely to sample first from the ingroup ( percent) or the outgroup ( percent). This suggests that positive test strategies exert some influence over initial sampling behavior, but they do not fully account for sampling biases. In addition to sampling more often from the ingroup first, participants also sampled more information on average from the ingroup, demonstrating that group-based motivations also influence sampling behavior in the aggregate. Sampling biases were reliable across contexts, with a few notable exceptions. Participants sampled more information overall when gathering information about novel ingroups as compared to real groups, and likewise sampled more information overall when sampling from groups as compared to group representatives. One parsimonious explanation for both of these effects pertains to perceived variability, which arises both for novel groups (about which participants have no priors) and when group characteristics must be inferred from interactions with individual group members. Specifically, participants may sample more information from novel groups simply because they have less a priori information about them and therefore require more experiences before feeling confident in evaluating them on a given attribute.

.. Path B: Biased Samples Interact with Motivated Interpretations

Although motivated sampling and motivated interpretations can be viewed as taking place at different stages during information processing, it is possible that they interact. Across all studies, the observed pattern of results suggests that evaluative biases arise from the interaction of motivated sampling and motivated interpretations: the influence of the sampled information (Path A) on evaluations depends not only on where the sample came from (i.e., ingroup or outgroup), but also on the congeniality of that sample. Across studies, we find evidence that observers asymmetrically integrate initial experiences into their evaluations based on congeniality (see Figure ., Panels a and b). Specifically, observers integrated positive initial experiences about the ingroup into their evaluations, but failed to integrate negative initial ingroup experiences into their evaluations. These results are inconsistent with the notion that biased evaluations depend only on motivated interpretations, as both congenial and uncongenial initial experiences should then lead to similarly biased evaluations. Thus, these findings explicitly connect group-motivated biases in sampling to biases in
evaluations, such that biased evaluations represent the interaction of motivated sampling (producing more variable ingroup experiences) and motivated interpretations (selectively relying on favorable information about the ingroup). In addition to manipulating the first piece of information to be overly positive or negative, we also manipulated the overall favorability of the environment. Consistent with work suggesting that people are capable of veridically representing data they encounter (Denrell, ; Denrell & Le Mens, ; Fiedler, ; Le Mens & Denrell, ; Le Mens et al., ; Lindskog & Winman, ; Lindskog et al., ), participants across studies were quite adept at inferring the direction and magnitude of the real-group differences, but only when their group was de facto better (see Figure ., Panels c and d). Once again, these results are consistent with the notion that biased evaluations depend on the interaction of

Figure . Evaluations of ingroups and outgroups as a function of valence of initial impressions (Panels a and b) or real-group differences (Panels c and d) in the political context (Study ; Derreumaux et al., ) and minimal group context (Study ; Bergh & Lindskog, ). Worse, Same and Better is stated in reference to the ingroup. Vertical bars denote standard error of the mean.
motivated sampling and motivated interpretations, as people were quite adept at accurately interpreting information when the ingroup was de facto better, but were less accurate at inferring real-group differences when the ingroup was de facto worse. Specifically, when sampled information about the ingroup was unfavorable, participants failed to pick up on any group differences and evaluated the ingroup and outgroup similarly. Overall, we observed very similar evaluative biases between minimal and political groups (see Figure .). However, results suggest that in the political context evaluative biases reflect participants underestimating outgroup averages, whereas in the minimal group context evaluative biases indicate that participants inflate ingroup averages. One interpretation of this finding is that evaluations of minimal groups may be more susceptible to anchoring effects on the response scale (e.g., answering closer to the midpoint of the scale), given that people have few prior experiences from which to draw. An alternative hypothesis is that political polarization in the USA and strong political affiliation (Iyengar & Krupenkin, ; Iyengar & Westwood, ; Iyengar et al., ) elicit additional motivations to derogate political outgroups, which may therefore manifest in harsher interpretations of unfavorable outgroup information.

.. Path C: Sampling Strategies and Dynamics

In order to uncover underlying attributes of individuals or groups, people must sample and interpret information iteratively. Therefore, examining sampling behavior in the aggregate may not fully capture the dynamic features of sampling behavior as it evolves over time. For instance, people may employ different sampling strategies based on the congeniality of sampled information. In this situation, aggregate trends would wash out any changes in sampling trajectories over time. Therefore, in order to elucidate whether particular trajectories of sampling behavior manifest over time as a function of the congeniality of sampled information, and whether these trajectories were associated with downstream evaluative biases, we modeled sampling using mixed logistic regression models in our largest samples that could most easily be analyzed jointly (i.e., the political samples). These models revealed that participants adopted different sampling strategies when the ingroup was de facto worse, such that a positive first sample led to a higher propensity of outgroup sampling, whereas a negative first sample led to a higher propensity of ingroup sampling (see Figure .). One explanation for this behavior is that when the ingroup
Figure . Probability of sampling from the ingroup when the ingroup was de facto worse (Meta-Analysis ; Derreumaux et al., ). Dashed line denotes sampling as a function of a negative first sample, whereas the solid line denotes sampling as a function of a positive first sample. Error bars denote standard error of the mean.

is de facto worse, participants are limited in their ability to selectively attend to congenial information and must therefore try to maintain plausible deniability about the superiority of the outgroup. One way this can be accomplished is by shifting attention away from the ingroup following a positive initial experience, thus preserving uncertainty about new information (which is most likely unfavorable to the ingroup), while shifting attention toward the ingroup following a negative first experience, as new experiences are likely to be better relative to the first experience. Critically, participants who employed this strategy were also more likely to end up with the most biased evaluations, suggesting that sampling strategies may be effective at obfuscating real-group differences.
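As a rough, hypothetical sketch of this analysis strategy (the variable names and data file are our illustrative assumptions, and for simplicity this version fits a plain logistic regression rather than the mixed models with participant-level random effects reported in the original studies), the trial-by-valence dynamics could be estimated as follows:

```python
import pandas as pd
import statsmodels.formula.api as smf

# One row per sampling trial, with hypothetical columns:
#   chose_ingroup - 1 if the participant sampled from the ingroup on this trial
#   trial         - trial number within the sampling phase
#   pos_first     - 1 if the manipulated first sample was positive, 0 if negative
df = pd.read_csv("sampling_trials.csv")

# A trial x valence interaction captures diverging sampling trajectories:
# after a positive first sample, the propensity to sample the ingroup should
# decline over trials; after a negative first sample, it should increase.
model = smf.logit("chose_ingroup ~ trial * pos_first", data=df).fit()
print(model.summary())
```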

. Conclusions and Future Directions

Human experiences are often guided by motivations engendered by the groups with which we affiliate, which can constrain the availability of information and pave the way for motivated interpretations. In the current chapter, we demonstrate that sampling models provide a useful method to examine and quantify how group-based motivations guide information processing in real-world settings. Replicating these effects across a variety of sampling categories (e.g., minimal and real groups; whole groups and individual group representatives) and kinds of sampled information (e.g., sampling ingroup and outgroup restaurant helpfulness ratings; sampling political candidate fact-check
scores), we find that biased sampling behaviors interact with motivated interpretations to produce biased evaluations. Specifically, we outline three stages at which motivations permeate information processing. First, motivations drive people to sample first and most often from their own groups (Path A in Figure .). This in turn generates samples that are skewed and unrepresentative, as people have larger and more variable ingroup experiences to draw from (Konovalova & Le Mens, ). Second, people selectively attend to and integrate information into their evaluations based on congeniality: favorable information about the ingroup is integrated, whereas unfavorable information about the ingroup is not (Path B in Figure .). Importantly, the ability to selectively attend to and integrate certain pieces of information, while ignoring others, was driven by the biases outlined in Path A, demonstrating that biased evaluations depend on the interaction of motivated sampling and motivated interpretations. Third, sampling behavior changed over time when people's ingroup was de facto worse, based on whether the first piece of information was positive or negative (Path C in Figure .). Thus, in addition to the biases in Paths A and B, when people are limited in their ability to selectively attend to information, they change their sampling trajectories over time in a way that allows for the most favorable interpretations. Next, we describe how social contexts often elicit additional moderators that may further influence information processing. A premise of the proposed framework is that population attributes are noisy, and that in order to accurately approximate them people need to gather a sufficient amount of information, account for biased samples, and then make accurate inferences. An implicit assumption is that participants may rationalize downweighting uncongenial information by attributing those samples to invalid or unreliable sources. This hypothesis is in line with prior work suggesting that people are often motivated skeptics of uncongenial information – requiring more and stronger evidence before it is adopted (Ditto & Lopez, ). Future research could explicitly manipulate the reliability and validity of the source from which information is sampled. Doing so may provide insights into the mechanisms underlying the interaction of biased sampling behavior and biased interpretations. For instance, how sensitive are people to the validity or reliability of information, and does this sensitivity change based on whether information in the sampling environment favors the ingroup or the outgroup? One benefit of examining information processing biases in both minimal and political groups is that it provides insights into whether and how different classes and strengths of motivations may lead to differences in
evaluative biases. Although our results suggest that biases are relatively consistent across our sampling contexts, evaluative biases in the minimal group context seem to indicate that participants overestimated ingroup averages (i.e., ingroup evaluations were greater than their true group means), and in the political context evaluative biases seem to reflect that participants underestimated outgroup averages (i.e., outgroup evaluations were lower than their true group means; see Figure .). This finding is consistent with prior work that shows that outgroup derogation is likely to emerge in intergroup contexts characterized by a perception of threat or zero-sum competition over resources (Brewer, ; Cikara et al., ). We speculate that the underestimation of outgroup averages in the political context may generalize to other real groups also faced with competition and intergroup conflict (e.g., race, religion, sports teams). However, unlike political contexts, overt outgroup derogation may not be deemed socially acceptable in other intergroup contexts (e.g., race), suggesting that social norms may also play a role in guiding information processing. For instance, in some contexts it may be socially unacceptable for a White individual to actively search for negative information about a Black individual based on their race. In this situation, social norms (e.g., social desirability; Paulhus & Reid, ) may “unbias” sampling behavior (Path A), as actively searching for negative outgroup information would be frowned upon. Conversely, sampling may be impervious to social norms, as people may believe that their sampling behavior is “hidden” from others (e.g., online). In both contexts, people may still arrive at biased evaluations, yet via different mechanisms. In real intergroup contexts, additional group dynamics, like social status, may also bias sampling behavior and further constrain the environment. For instance, prior work has shown that people attend to and admire high status individuals, in part because they tend to have power and dictate social norms (Dalmaso et al., ), and that ingroup biases are often strongest in high status groups (e.g., Mullen et al., ). These results suggest that the ingroup and outgroup dynamics we find across minimal and political groups may diverge when groups differ on social status. Thus, sampling models may help to explain why intergroup biases are strongest when they intersect with status, as ingroup and status motivations may align. Since most social categories intersect with social status on some level, future research on intergroup biases may prioritize extending sampling models to these contexts. If intergroup biases depend, in part, on biases in information sampling, then what strategies may be used to intervene and reduce intergroup bias?
Our results highlight the role of initial ingroup experiences in driving downstream information processing biases. As such, one route to reducing bias may be to encourage people to sample first and more often from outgroup members. That is, if biases are driven by people selectively attending to positive ingroup samples rather than negative outgroup samples, then sampling from the outgroup might mitigate biases, as participants would have less biased ingroup experiences to selectively attend to. Indeed, in the political context, we find preliminary evidence that under certain conditions, sampling first from the outgroup leads to less biased ingroup and outgroup evaluations (Derreumaux et al., ). This suggests that under certain conditions, having more variable outgroup samples may be one route to reducing partisan prejudice. Taken together, extending sampling models to social contexts offers a number of exciting avenues for future research to better understand how sampling and interpretations jointly produce biased evaluations. Most importantly, we believe that increasing crosstalk between researchers studying sampling and interpretive sources of bias will greatly improve research and theory in the domain of intergroup conflict.

REFERENCES

Ambady, N., & Rosenthal, R. (). Half a minute: Predicting teacher evaluations from thin slices of nonverbal behavior and physical attractiveness. Journal of Personality and Social Psychology, (), –. https://doi.org/.//-...
Anderson, C., & Putterman, L. (). Do non-strategic sanctions obey the law of demand? The demand for punishment in the voluntary contribution mechanism (p. ). Working paper.
Bergh, R., & Lindskog, M. (). The group-motivated sampler. Journal of Experimental Psychology: General, (), –. https://doi.org/./xge
Bolsen, T., Druckman, J. N., & Cook, F. L. (). The influence of partisan motivated reasoning on public opinion. Political Behavior, (), –. https://doi.org/./s---
Brewer, M. B. (). The psychology of prejudice: Ingroup love or outgroup hate? Journal of Social Issues, (), –. https://doi.org/./.
Brewer, M. B. (). The many faces of social identity: Implications for political psychology. Political Psychology, (), –. https://doi.org/./-x.
Brewer, M. B., & Pierce, K. P. (). Social identity complexity and outgroup tolerance. Personality and Social Psychology Bulletin, (), –. https://doi.org/./
Cikara, M., Van Bavel, J. J., Ingbretsen, Z. A., & Lau, T. (). Decoding "Us" and "Them": Neural representations of generalized group concepts. Journal of Experimental Psychology: General, (), –. https://doi.org/./xge
Dalmaso, M., Pavan, G., Castelli, L., & Galfano, G. (). Social status gates social attention in humans. Biology Letters, (), –. https://doi.org/./rsbl..
Denrell, J. (). Why most people disapprove of me: Experience sampling in impression formation. Psychological Review, (), –. https://doi.org/./-X...
Denrell, J., & Le Mens, G. (). Information sampling, belief synchronization, and collective illusions. Management Science, (), –. https://doi.org/./mnsc..
Derreumaux, Y., Bergh, R., & Hughes, B. (). Partisan-motivated sampling: Re-examining politically motivated reasoning across the information processing stream. Journal of Personality and Social Psychology, (), –. https://doi.org/./pspi
Ditto, P. H., & Lopez, D. F. (). Motivated skepticism: Use of differential decision criteria for preferred and nonpreferred conclusions. Journal of Personality and Social Psychology, (), –. https://doi.org/./-...
Everett, J. A. C., Faber, N. S., & Crockett, M. (). Preferences and beliefs in ingroup favoritism. Frontiers in Behavioral Neuroscience,  (Feb.), –. https://doi.org/./fnbeh..
Fiedler, K. (). Beware of samples! A cognitive-ecological sampling approach to judgment biases. Psychological Review, (), –. https://doi.org/./-X...
Flynn, D. J., Nyhan, B., & Reifler, J. (). The nature and origins of misperceptions: Understanding false and unsupported beliefs about politics. Political Psychology, , –. https://doi.org/./pops.
Gampa, A., Wojcik, S. P., Motyl, M., Nosek, B. A., & Ditto, P. H. (). (Ideo)logical reasoning: Ideology impairs sound reasoning. Social Psychological and Personality Science. https://doi.org/./
Gramzow, R. H., Gaertner, L., & Sedikides, C. (). Memory for in-group and out-group information in a minimal group context: The self as an informational base. Journal of Personality and Social Psychology, (), –. https://doi.org/./-...
Hart, W., Albarracín, D., Eagly, A. H., et al. (). Feeling validated versus being correct: A meta-analysis of selective exposure to information. Psychological Bulletin, (), –. https://doi.org/./a
Hewstone, M., Rubin, M., & Willis, H. (). Intergroup bias. Annual Review of Psychology, , –. https://doi.org/./annurev.psych...
Hogg, M. A., Terry, D. J., & White, K. M. (). A tale of two theories: A critical comparison of identity theory with social identity theory. Social Psychology, (), –. http://www.jstor.org/stabl.
Howard, J. W., & Rothbart, M. (). Social categorization and memory for in-group and out-group behavior. Journal of Personality and Social Psychology, , –.
Hughes, B. L., Zaki, J., & Ambady, N. (). Motivation alters impression formation and related neural systems. Social Cognitive and Affective Neuroscience, (), –. https://doi.org/./scan/nsw
Iyengar, S., & Krupenkin, M. (). The strengthening of partisan affect. Political Psychology, , –. https://doi.org/./pops.
Iyengar, S., Lelkes, Y., Levendusky, M., et al. (). The origins and consequences of affective polarization in the United States. Annual Review of Political Science, (), –. https://doi.org/./annurev-polisci-
Iyengar, S., & Westwood, S. J. (). Fear and loathing across party lines: New evidence on group polarization. American Journal of Political Science, (), –. https://doi.org/./ajps.
Kahan, D. M. (). Ideology, motivated reasoning, and cognitive reflection. Judgment and Decision Making, (), –.
Kahan, D. M., Braman, D., Gastil, J., Slovic, P., & Mertz, C. K. (). Culture and identity-protective cognition: Explaining the white-male effect in risk perception. The Feeling of Risk: New Perspectives on Risk Perception, (), –. https://doi.org/./
King, J. S., & Bee, C. C. (). Better in the (near) future: Group-based differences in forecasting biases. European Journal of Social Psychology, (), –. https://doi.org/./ejsp.
Konovalova, E., & Le Mens, G. (). An information sampling explanation for the in-group heterogeneity effect. Psychological Review. https://doi.org/./rev
Kraft, P. W., Lodge, M., & Taber, C. S. (). Why people "don't trust the evidence": Motivated reasoning and scientific beliefs. Annals of the American Academy of Political and Social Science, (), –. https://doi.org/./
Kunda, Z. (). The case for motivated reasoning. Psychological Bulletin, (), –. https://doi.org/./-...
Le Mens, G., & Denrell, J. (). Rational learning and information sampling: On the "naivety" assumption in sampling explanations of judgment biases. Psychological Review, (), –. https://doi.org/./a
Lindskog, M., & Winman, A. (). Are all data created equal? Exploring some boundary conditions for a lazy intuitive statistician. PLoS ONE, (). https://doi.org/./journal.pone.
Lindskog, M., Winman, A., & Juslin, P. (). Naïve point estimation. Journal of Experimental Psychology: Learning, Memory and Cognition, (), –. https://doi.org/./a
Miller, K. P., Brewer, M. B., & Arbuckle, N. L. (). Social identity complexity: Its correlates and antecedents. Group Processes and Intergroup Relations, (), –. https://doi.org/./
Mullen, B., Brown, R., & Smith, C. (). Ingroup bias as a function of salience, relevance, and status: An integration. European Journal of Social Psychology, (), –. https://doi.org/./ejsp.
Mummendey, A., Simon, B., Dietze, C., et al. (). Categorization is not enough: Intergroup discrimination in negative outcome allocation. Journal of Experimental Social Psychology, (), –. https://doi.org/./-()-I
Onorato, R., & Turner, J. (). Fluidity in the self concept: The shift from personal to social identity. European Journal of Social Psychology, (), –. https://doi.org/./ejsp.
Paulhus, D. L., & Reid, D. B. (). Enhancement and denial in socially desirable responding. Journal of Personality and Social Psychology, (), –. https://doi.org/./-...
Susskind, J., Maurer, K., Thakkar, V., Hamilton, D. L., & Sherman, J. W. (). Perceiving individuals and groups: Expectancies, dispositional inferences, and causal attributions. Journal of Personality and Social Psychology, (), –. https://doi.org/./-...
Taber, C. S., Cann, D. M., & Kucsova, S. (). The motivated processing of political arguments. SSRN Electronic Journal. https://doi.org/./ssrn.
Tajfel, H., & Billig, M. (). Social categorization and similarity in intergroup behaviour. European Journal of Social Psychology, (), –.
Tamir, D. I., & Hughes, B. L. (). Social rewards: From basic social building blocks to complex social behavior. Perspectives on Psychological Science, (), –. https://doi.org/./
US Bureau of Labor Statistics. ().
Willer, D., Turner, J. C., Hogg, M. A., et al. (). Rediscovering the social group: A self-categorization theory. Contemporary Sociology, (), . https://doi.org/./


Opinion Homogenization and Polarization
Three Sampling Models
Elizaveta Konovalova and Gaël Le Mens

G. Le Mens benefited from financial support from grants PID–GB-I/AEI/./ from the Spanish Ministerio de Ciencia, Innovación y Universidades (MCIU) and the Agencia Estatal de Investigación (AEI), ERC Consolidator Grant # from the European Commission, and BBVA Foundation Grant GQ.

. Introduction

The polarization of opinions has been described as a "challenge to democratic debate" (EU Commissioner Vera Jourova) and is frequently seen as an important problem that needs to be urgently addressed in order to preserve social harmony. A number of commentators and politicians have attributed opinion polarization to the abundance of "fake news" spread on social media and via more traditional channels such as cable television and the press. The design of social media platforms, and how these shape the information to which people have access, has also been pointed to as a culprit. For example, Sunstein () argued that because social media platforms more strongly connect like-minded people than people with different opinions, they facilitate the emergence of opinion clusters. This clustered structure would, in turn, affect the kind of information to which users are exposed, contributing to a self-reinforcing dynamic. Others have proposed that this self-reinforcing dynamic could be strengthened by the algorithms that control the content shown to users. These would create "filter bubbles" (Pariser, ) in which people fail to be exposed to contrarian ideas and opinions. This could, in turn, contribute to an increase in the popularity of extreme opinions, contributing to polarization on issues such as climate policy, environmental issues, social policy, colonial history, immigration, gender issues, or abortion rights.
These claims that social media contribute to opinion polarization fall within the scope of the sampling approach to human judgment because they focus on how social media shape the information to which their users have access. Yet, they do not clearly spell out the micro-mechanisms of social influence at play or how such micro-mechanisms could contribute to a macro phenomenon such as between-group opinion polarization. In this chapter, we aim to address this shortcoming by discussing three sampling models that focus on distinct aspects of how social interactions happen on social media platforms: how popularity creates additional exposure to contrarian arguments, how differences in popularity make an agent more likely to hear particularly persuasive arguments in support of popular options, and how opinions in favor of popular options are reinforced through social feedback. The first two models pertain to how newsfeeds – the personalized flow of information, containing posts from other users or companies, shown to users of social media as they log onto the platform – affect opinion dynamics. They assume that arguments for opinions popular among the network contacts of a user will be prevalent in that user's newsfeed, and they illustrate two mechanisms through which this can contribute to social influence. The third model pertains to the effect of feedback in the form of "likes," "favorites," and "retweets" – an integral part of social media that has been shown to provide a motivation for people to use such platforms (Chen et al., , ). This model posits that people are more likely to express opinions that received positive feedback and that feedback is more likely to be provided by network contacts than by other users. Each sampling model leads to two jointly occurring phenomena: opinion homogenization between densely connected agents and opinion polarization between agents who are not connected or only indirectly connected. This sampling perspective on homogenization and polarization builds on the idea of assimilative influence: a classical approach to polarization in which the focal agent converges in their beliefs with others through social influence. A common approach in discussions of models of assimilative influence assumes that people converge in their beliefs with others by simply adopting the average opinion of their peers (DeGroot, ; Friedkin, ; Latané et al., ). Even though this approach has been widely adopted in prior work on opinion dynamics in social networks, it is a kind of "black box" regarding the micro-process of social influence because it does not specify the mechanism by which the opinion of an agent becomes aligned with that of the agents to which they are connected. Moreover, the lack of specificity of this perspective renders it
similarly applicable to mechanisms that rely on motivated cognition, rational inferences, and sampling-based mechanisms. Here, we contribute to opening this "black box" by specifying sampling-based mechanisms that lead to local opinion homogenization and between-group polarization. The most common explanation for local opinion homogenization in models of assimilative influence relies on motivated information processing as the result of the desire to belong and to avoid punishment for deviating from the group norm: the social pressure applied by others motivates the person to interpret information such that the resulting opinion conforms to the expectations of others (McGarty et al., ; Turner et al., ). Another explanation presumes that agents infer the "quality" of a stance from its perceived popularity. Research in economics has shown that it is sometimes rational to do so, and that such inferences can lead to opinion homogenization through information cascades (Banerjee, ; Bikhchandani et al., ). In contrast, the sampling approach discussed in this chapter aims to specify conditions regarding how people sample information from their environments that are sufficient to produce homogenization or polarization even in the absence of motivated information processing or popularity-based inferences. The unifying theme of the three models is that they are particularly relevant to interactions on social media, but they are also applicable to other settings that satisfy their assumptions regarding how people sample information. Each model is simplistic, in the sense that it focuses on just one sampling mechanism while excluding other sampling mechanisms and "switches off" motivated cognition and popularity-based inferences. In presenting these models, we do not claim that motivated cognition and popularity-based inferences are unimportant or that just one sampling mechanism is applicable in a given setting. Rather, we see these models as proofs of concept that these sampling mechanisms are each sufficient to produce homogenization and polarization. This implies, in particular, that if an analyst observes that opinions have become more homogeneous or more polarized in an empirical setting, several sampling mechanisms could have produced this phenomenon. Thus, explaining the reasons for the observed homogenization requires uncovering the specifics of the sampling process or processes (and/or information processing and inference mechanisms) at play. Next, we first present an informal description of the three sampling models; then we discuss the relationship between social influence, local opinion homogenization, and between-group polarization. We then turn to a formal analysis of the three models.

.



Three Sampling Models for Social Influence

Florian just arrived in Barcelona for his Erasmus exchange stay at a local university. He quickly realizes that most Catalans tend to have strong opinions about a particular issue: Catalan independence. Florian wonders what he should think about this issue: would an independent Catalonia be a good thing or a bad thing? As he talks to his classmates and reads the news articles his new friends post on Facebook and Twitter, Florian samples information about the issue and begins to form his own opinion. In this introductory section, we explain how such information sampling can lead Florian to favor independence when most people in his social circle are pro-independence. (Symmetric predictions would hold if most of his classmates were against independence.) In what follows, we provide a verbal description of the three sampling-based social influence models and then move on to formal analyses of these models. The "Asymmetric Hot Stove" model analyzes the dynamics of Florian's opinion as he samples pro-independence and anti-independence arguments in the content shared by his Facebook friends and those he follows on Twitter, or in conversations with his classmates. When neither of the options is much more popular than the other, the Hot Stove Effect (see Chapter  in this volume) implies that Florian is likely to underestimate the value of one of the stances. This is because, if he samples unconvincing arguments in favor of this stance, he will avoid sampling further arguments about it and will fail to discover that there is merit to it. Now, consider the effect of the popularity of the pro-independence stance. Because the pro-independence stance is popular in Florian's social circle, the Hot Stove Effect will affect this stance less strongly than the anti-independence stance. Even if Florian starts to develop an anti-independence opinion and thus would rather avoid reading or hearing more pro-independence arguments, because most of his social circle is pro-independence, he keeps being exposed to the pro-independence perspective and thus obtains additional samples of information about it. By contrast, if Florian develops a pro-independence opinion, and comes to dislike the anti-independence stance, he will not be exposed to many additional arguments about this stance because it is unpopular in his social circle. This asymmetry in exposure to pro-independence and anti-independence arguments implies that Florian can be subject to a Hot

This model was initially introduced in Denrell and Le Mens () and its implications for collective opinions were analyzed in Denrell and Le Mens ().
Stove Effect about the anti-independence stance but is unlikely to be subject to a Hot Stove Effect about the pro-independence stance (see Chapter ). In other words, Florian is unlikely to underestimate his attraction to the pro-independence stance, but underestimation of the anti-independence stance is possible. Overall, this makes Florian more likely to adopt an opinion in line with that of his social circle: pro-independence. The "Maximal Argument Strength" model also analyzes the dynamics of Florian's opinion as he samples pro-independence and anti-independence arguments in the content shared by his social circle. In each period, Florian hears pro-independence and anti-independence arguments from his social circle, and he is most influenced by the strongest argument he hears. Because the strongest arguments are likely to come from the larger sample of arguments, the strongest argument he hears is likely to support the more popular stance in his social circle. Because most of Florian's social circle prefers the pro-independence stance, this implies that Florian is likely to shift his opinion in favor of independence. In contrast to the "Asymmetric Hot Stove" model, the "Maximal Argument Strength" model does not assume that Florian's current opinion affects the samples of information he will obtain (he does not engage in what is sometimes called "active" sampling but instead in "passive" sampling). Here, the samples are entirely driven by the composition of Florian's social circle. The "Feedback Sampling" model examines what happens as Florian starts to express his opinions about Catalan independence by sharing content supporting his opinion on social media or by explicitly stating his opinion in conversations with friends and classmates. When Florian expresses his opinion, he sometimes gets approval in the form of a "like" or "retweet" or a signal of approval in a conversation. At other times, he gets negative feedback, in the form of negative comments. Because most of Florian's classmates are pro-independence, he tends to get more positive feedback when he shares a pro-independence opinion than when he shares an anti-independence opinion. If Florian cares about such feedback, he will respond by shifting the position of his statements toward the pro-independence stance. This model differs from the other two models in terms of the unit of sampling. Whereas in the Asymmetric Hot Stove model and the Maximal Argument Strength model, Florian forms opinions about the two stances based on arguments expressed by members of

This model is original to this chapter. This model builds on an ongoing project by the authors of this chapter and Nikolas Schöll.
his social circle (he samples arguments), here, he voices his own opinions and arguments but samples feedback (that could be negative or positive) about the arguments he expresses. In what follows, we discuss how these basic sampling-based social influence mechanisms can contribute to explaining opinion homogenization and polarization. The resulting patterns of opinions and preferences depend on the structure of the social network in which agents are embedded.
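Although the Feedback Sampling model is analyzed formally later in the chapter, its core loop can be conveyed with a toy simulation. This is purely our illustrative sketch: the position scale, noise level, step size, and share of pro-independence contacts are arbitrary assumptions, not the model's specification.

```python
import random

random.seed(1)

def feedback_sampling(n_periods=200, share_pro=0.8, step=0.1):
    """An agent voices positions on a [-1, +1] scale (+1 = pro-independence),
    receives positive feedback from like-minded contacts and negative
    feedback otherwise, and shifts toward positions that earned approval."""
    position = 0.0  # initially neutral
    for _ in range(n_periods):
        voiced = max(-1.0, min(1.0, position + random.gauss(0.0, 0.3)))
        contact_is_pro = random.random() < share_pro  # feedback from one random contact
        approves = (voiced >= 0) == contact_is_pro
        # approval reinforces the voiced position; disapproval repels from it
        position += step * voiced if approves else -step * voiced
        position = max(-1.0, min(1.0, position))
    return position

print(feedback_sampling())  # typically drifts toward +1 when most contacts are pro
```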

.

From Social Influence to Opinion Polarization

In all three models, the samples obtained by Florian are subject to randomness. In the "Asymmetric Hot Stove" model, what is random is the strength of a sampled argument. In the "Maximal Argument Strength" model, what is random is the strength of the strongest argument among the set of sampled arguments. In the "Feedback Sampling" model, what is random is the valence and strength of the feedback provided by Florian's social circle. The randomness at the core of each model implies that the same set of initial conditions can lead to different resulting patterns. For example, even if Florian's social circle favors independence, Florian might become anti-independence. Yet, all three models predict that Florian is more likely to adopt an opinion aligned with that of his social circle than an opposed opinion. A more formal and general rendition of this claim is that the three sampling models imply that opinions become correlated. In other words, agents who are close to each other in the social network tend to come to prefer the same options even when initial evaluations are independent of each other and there is no clear initial majority. The mechanisms we discuss in this chapter thus explain the emergence of opinion homogeneity and preference convergence in densely connected networks. To unpack this process, after providing a formal description of the models, we first explain how they can lead to opinion homogenization and preference convergence in a tiny network of two agents (Figure .) who update their opinions in each period. We then turn to polarization. We call "polarization" the divergence of opinions and preferences between groups of people or distinct parts of a social network. We illustrate how the three social influence models can

We do not use the word “polarization” in the way it was used in the program of research on “attitude polarization” in social psychology (Burnstein & Vinokur, ; Moscovici & Zavalloni, ; Nowak et al., ; for reviews, see Isenberg,  and Myers and Lamm, ). This research program focused on what we call local opinion homogenization rather than polarization.
Figure . Social network structures analyzed in the simulations: a) network of two agents; b) network of two groups,  agents each, with  between-group links; c) network of two groups,  agents each, with  between-group link; d) network of two groups,  agents each, with  between-group links; e) network of two groups with distinct identities,  agents each, with  between-group links.

produce polarization in networks with “structural holes” – agents who are not connected to each other (Burt, ). We first consider a simple setting in which agents belong to two independent (i.e., disconnected) groups and then settings in which the groups are more or less densely connected to each other. We consider two cases: one case in which group identity is irrelevant to the nature of the interaction between agents (Figure .b, c, d), and one case in which it is relevant (Figure .e; see Iyengar et al. ()).

. The "Asymmetric Hot Stove" Model

In this section, we analyze what happens when agents form opinions about options that differ in popularity, and when popularity is not fixed but evolves as the agents update their opinions about the options. The central assumption of the model is that agents tend to sample options that are "locally popular" – popular among network contacts at the time of the sampling instance. In contrast to many social learning models (Banerjee, ; Bikhchandani et al., ), this model assumes that popularity does not affect inference about the quality of the option, but only impacts sampling. There are many situations where people might select popular options even if they do not like them. Studies of normative social influence and conformity have shown that people who deviate from the behaviors of others tend to be less liked and are sometimes ostracized. Moreover, people might select the popular alternative because "it is better for reputation to
fail conventionally than to succeed unconventionally” (Keynes , p. ). Finally, the ranking algorithms that control which options show up at the top of search results, which links are shown on our Facebook newsfeed or our Twitter feed, or which movies are shown on the top of the screen of our favorite streaming platform frequently rely on evaluations by our network contacts or people who are similar to us. Such ranking algorithms make options that are liked among our network contacts more available and thus more frequently selected (Germano et al., ). ..

Model

We analyze the sampling behavior of N agents in a connected network who update their evaluations of K options over a set of T periods. In each period, an agent i that is randomly drawn from the population samples one of the K options, observes its payoff, and updates their valuation of that option. The popularity of the options among’s network contacts at the beginning of period t affects i’s sampling behavior in that period. ... Payoff Distributions of the Options There are K options with stable payoff distributions. We denote by f k the density of the payoff distribution of option k and by uk its mean. ... Initial Valuations The valuation of option k at the beginning of period t, for agent i, is denoted by V ik,t . For all agents, the initial valuation of option k, V ik, . consists in one random draw from its payoff distribution. ... Sampling Rule In each period, an agent i is randomly selected in the population. Let P ik,t denote the likelihood that the agent samples option k in period t. We implement the assumption that agent i is more likely to sample a popular option by assuming that P ik,t depends on the mean evaluation of option k by i’s neighbors – the agents to which i is connected (Woiczyk & Le Mens, i ). We denote this quantity by V k,t . The agent does not fully rely on the opinions of their neighbors, however. Their own valuations also affect the sampling probability. Accordingly, the likelihood that the agent samples option k is a logistic function of their valuations and the mean valuations of their neighbors:

https://doi.org/10.1017/9781009002042.024 Published online by Cambridge University Press
$$P^{i}_{k,t} = \frac{e^{\,s_1 V^{i}_{k,t} + s_2 \bar{V}^{i}_{k,t}}}{\sum_{l=1}^{K} e^{\,s_1 V^{i}_{l,t} + s_2 \bar{V}^{i}_{l,t}}} \tag{19.1}$$

where s >  and s >  are parameters that characterize the sensitivity of the choice probability to the valuations of the options and its popularity respectively. ... Valuation Updating We assume that the agent updates their valuation of an option based on their own experience with the option, and that this updating process is independent of the relative popularities of the options. In other words, we “switch off” the possibility for motivated information processing that would make the agent interpret differently the same information about a popular or an unpopular alternative. More formally, we assume that the valuation of option k at the beginning of period t, V ik,t is equal to a weighted average of the prior valuation and the most recent payoff x ik,t if it was selected in period t : V ik,t ¼ ð  bÞV ik,t þ bx ik,t ,

(19.2)

where b 2 ½,  is the weight of the payoff and x ik,t is a random draw from a binomial distribution with probability pk . The valuations of options that were not selected do not change. Prior research has shown that the combination of a logistic choice rule and the delta rule for valuation updating provides a good fit to experimental data on sequential choice under uncertainty (Denrell, ). .. Illustration in a Setting with Two Agents and Two Options To illustrate how the model works, we consider the case of two agents (“A” and “B”) learning about two uncertain options (e.g., an “independent Catalonia” or a “Catalonia as a part of Spain”) with positive variance and unknown means u and u . See Figure .a. The probability agent A selects option  in period t is: A

P A,t ¼

e s V ,t e

s  V A,t þ V B,t

þ s  V B,t

þe

s  V A,t þ s  V B,t

¼

 þ

A B e s ΔV t s ΔV t

(19.3)

where $\Delta V^{A}_{t} = V^{A}_{1,t} - V^{A}_{2,t}$ is A's "opinion." We define B's opinion similarly. Agent A is more likely to sample option 1 when their opinion favors
option . A is also more likely to sample option  when B’s opinion favors that option. An important feature of this choice rule is that the opinions of the two agents are compensatory in the sense that even if A’s opinion is against option , A might nevertheless be likely to sample that option if B’s opinion strongly favors option . To see this, note that the probability that A samples option  in period t is higher than . whenever ΔV B,t >  ss ΔV A,t In the special case where B’s opinion does not affect A’s sampling probability (s ¼ ), this model becomes the basic “Hot Stove model” analyzed in Chapter  in this volume. In this case, the learning process of agent A is characterized by a systematic asymmetry in error correction: Errors of underestimation are less likely to be corrected than errors of overestimation. Morespecifically, suppose A underestimates the value of  option  V A,t < u . A becomes unlikely to sample option . By avoiding option , A is unlikely to obtain additional information that could help them correct this error of underestimation.Compare this to the case where A overestimates the value of option  V A,t > u . A is likely to sample option  again. By doing so, A obtains additional information about option  that can lead to a correction of this error of overestimation. This asymmetry in error correction implies that in every period after the first period, A has a probability lower than  percent of sampling option . Moreover, the expected valuation of option  A  is lower than the mean of option : E V ,t < u . See Chapter  in this volume for details. Now consider the case where A’s sampling probability depends not only on A’s opinion but also on B’s opinion (s > ). Denrell and Le Mens () have shown that, in this case, the valuations of option  by agents A and B become positively correlated. The same happens for option . The reason for this emergent correlation is that the compensatory nature of the choice rule influences the joint pattern of errors of underestimation and overestimation implied by the Hot Stove Effect: suppose agent A underestimates the value of option . A might correct this error only if A samples option  again. When does this happen? This happens when agent B values option  positively. Hence, upward corrections of underestimation errors by agent A tend to happen when agent B has a positive valuation of that option. Suppose that agent B also underestimates the value of option . In this case, both agents tend to sample option  and the joint underestimation of option  will persist. All-in-all, this dynamic implies that the valuations of the two options become correlated, as do the opinions (ΔV At and ΔV Bt ).


As an illustration, suppose that the payoff distributions of the two options are Normal with mean  and variance , that $s_1 = s_2$, and that $b = .$. Simulations show that the correlation between the valuations of option 1 by agents A and B is initially 0 and increases over time; after  periods it is close to .. When $s_1 = s_2 = s$, Proposition  in Denrell and Le Mens (, p. ) provides an explicit formula for the asymptotic correlation (the correlation after a very large number of periods):

$$\operatorname{corr}\left(V^A_k, V^B_k\right) = \frac{1}{1 + \frac{2(2-b)}{(sb\sigma)^2}} > 0 \quad (19.4)$$

This formula shows that the correlation is larger when the weight of the more recent observation ($b$) is larger and when option choice depends more strongly on the valuations ($s$ is larger). Moreover, the correlation is equal to 0 if the payoffs are certain ($\sigma = 0$) and larger if the payoffs are more variable ($\sigma$ is large). In that case, each observation is a noisy signal of the mean of the payoff distribution of the sampled option, and estimation errors can be large.

What does this imply for the opinions and the preferences of the two agents? We will say that when the opinion of agent A favors option 1 ($\Delta V^A_t > 0$), agent A "prefers" option 1. Simulations show that both opinions and preferences become more similar over time. Initially, the opinions of the two agents are uncorrelated and the probability of consensus (that they have the same preference) is 0.5. After  periods, the correlation between the opinions is . and the consensus probability is ..

It is important to note that this model implies an emergent homogeneity in opinions and a convergence in preferences only if the probability that an agent samples an option depends both on their own valuations and on the valuations of the other agent. If the choice probability depends only on the focal agent's valuation ($s_2 = 0$), the two agents are subject to the Hot Stove Effect, but there is no interaction between them and the opinions remain uncorrelated. If the choice probability depends only on the other agent's valuation ($s_1 = 0$), homogenization does not happen either, because the influence of the other agent on the focal agent's opportunities for error correction is then independent of the focal agent's valuations. In this case as well, the opinions remain uncorrelated.

Results are based on , simulations of the model.
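To make these dynamics concrete, here is a minimal Python sketch of the two-agent process. It is ours, not the authors': the Normal(0, 1) payoffs, the values b = 0.5 and s1 = s2 = 5, the run lengths, and the simplification that both agents choose and update in every period are all illustrative assumptions rather than the chapter's simulation settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_once(T=200, b=0.5, s1=5.0, s2=5.0, sigma=1.0):
    """Two agents learn two options (equal means) with the delta rule;
    choice follows the compensatory logistic rule of Eq. 19.3."""
    V = rng.uniform(-1.0, 1.0, size=(2, 2))   # V[agent, option]; assumed start
    for _ in range(T):
        for a in (0, 1):
            dv_self = V[a, 0] - V[a, 1]           # agent a's opinion
            dv_other = V[1 - a, 0] - V[1 - a, 1]  # the other agent's opinion
            p1 = 1.0 / (1.0 + np.exp(-(s1 * dv_self + s2 * dv_other)))
            k = 0 if rng.random() < p1 else 1     # sampled option
            payoff = rng.normal(0.0, sigma)       # noisy payoff draw
            V[a, k] = (1.0 - b) * V[a, k] + b * payoff  # unchosen option unchanged
    return V

runs = np.array([run_once().ravel() for _ in range(3000)])  # cols: A1, A2, B1, B2
print("corr(V_A1, V_B1) =", np.corrcoef(runs[:, 0], runs[:, 2])[0, 1])
consensus = np.mean((runs[:, 0] > runs[:, 1]) == (runs[:, 2] > runs[:, 3]))
print("consensus probability =", consensus)
```

Setting s2 = 0 in this sketch recovers the solitary Hot Stove learner, and the emergent correlation disappears.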


The "Maximal Argument Strength" Model

This model is inspired by early work on how argument exchanges can lead to opinion extremization and polarization. Persuasive Argument Theory (PAT; Burnstein & Vinokur, ) proposes that people are most convinced by arguments they find unique and novel, and that during deliberation more unique arguments are generated for the position that is more popular in the group. These two phenomena could explain why argument exchange leads the preferences of group members to converge toward one position. The model we describe in this section provides a formal rendition of this intuition, which we then use to examine how this mechanism could lead to polarization in a structured network. (See Mäs and Flache () for a similar model.)

Model

We analyze the dynamics of valuations in a population of N agents connected in a network. The agents update their evaluations of two options based on the arguments they receive from others. In each period, one agent i is randomly drawn. This agent samples arguments in support of the two options from their network neighbors. When prompted, a neighbor provides an argument in support of the option they prefer. After collecting the arguments, agent i updates their valuations of the two options based on the strongest argument they heard in support of each option.

Initial Valuations

For all agents, the initial valuation of option k consists in one random draw from a uniform distribution: $V^i_{k,0} \sim U(0,1)$.

Argument Generation

Suppose agent i samples arguments supporting each option from their network neighbors. We implement the assumption that agents try to persuade others of their positions by assuming they only generate arguments in support of their preferred option. Consider option k. The agents who generate arguments for option k are all of agent i's neighbors for whom option k is the preferred option. Therefore, if $\eta^i_{k,t}$ is the set of i's neighbors who prefer option k, agent i will sample $|\eta^i_{k,t}|$ arguments in favor of this option from their neighbors.


We also assume that i generates one argument of their own. Each argument is a random draw from the uniform $U(0,1)$ distribution. This captures the possibility that although the agent's valuation of an option is very high, the argument they provide can be perceived as weak by agent i. Consistent with Persuasive Argument Theory, we assume that i is most influenced by the most "persuasive" argument in support of an option, defined as the argument with the largest value on the $[0,1]$ interval. More formally, the strength of the persuasive argument for option k is:

$$A^i_{k,t} = \max_{j \in \eta^i_{k,t} \cup \{i\}} a^j_{k,t} \quad (19.5)$$

where $a^j_{k,t} \sim U(0,1)$.

Valuation Updating

The valuation updating rule differs from the one used in the Asymmetric Hot Stove model. Here, the agent does not make choices that affect the information they sample. Instead, the agent updates their valuations of all available options based on the strength of the most persuasive argument sampled for each option, using the delta rule:

$$V^i_{k,t} = (1-b)\,V^i_{k,t-1} + b\,A^i_{k,t} \quad (19.6)$$

where $b \in [0,1]$ is the weight of the new argument.

Illustration in a Setting with Two Agents and Two Options

Consider again a simple network with just two agents ("A" and "B"). We first consider the case where, initially, both agents prefer option 1, without loss of generality ($\Delta V^A_t > 0$ and $\Delta V^B_t > 0$). Suppose agent A is the focal agent in period 1. A samples two arguments about their preferred option (option 1): the argument they generate themselves and the argument B generates. By contrast, A samples just one argument about their least preferred option (option 2). Because the strength of the relevant argument depends on sample size (the maximum of two independent realizations of a random variable tends to be higher than one realization), the persuasive argument for option 1 is likely to be stronger than the persuasive argument for option 2: $A^A_{1,1} > A^A_{2,1}$. There is a 2/3 probability that this is the case.

This is mostly motivated by technical considerations regarding the simulation of what happens when alternative k is preferred by none of the agents.


Because the option valuation at the beginning of period 2 is the weighted average of the valuation at the beginning of period 1 and of the strength of the persuasive argument, there is at least a 2/3 probability that, at the beginning of period 2, agent A prefers option 1. Suppose that in period 2 the focal agent is B. Since A prefers option 1, agent B will sample two arguments in favor of option 1 and one argument in favor of option 2. By the same logic as that applied to A, B is likely to keep preferring option 1.

Now consider the case where the two agents have different initial preferences. Without loss of generality, we assume that A prefers option 1 and B prefers option 2. Suppose that A is the focal agent in period 1. B's preference implies that B will provide an argument for option 2, which will be considered together with the one argument for each option generated by A. By the logic outlined above, this will likely increase A's evaluation of option 2. There are two possible scenarios. In the first scenario, the increase leads to a change in A's preference in favor of option 2. This leaves us in the situation described in the previous paragraph. In the second scenario, A's preference does not change, yet with a 2/3 probability A's opinion becomes less unfavorable to option 2 than before. In the next period, B samples arguments, and B's opinion will likely shift (to some extent at least) toward option 1. Then the same question can be asked about B: did this update lead to a preference reversal in favor of option 1? Eventually, the preference of one of the agents will flip and both agents will prefer the same option.

The reinforcing dynamics described above lead to a stochastically stable agreement among the agents. Just like the Asymmetric Hot Stove model, this model does not lead to a deterministic "lock-in": in each period, the agents update their valuations of the options based on arguments that could lead to a preference reversal. When the weight of new arguments (b) is low, convergence becomes quite stable. Simulations of the model with b = . show that, after  periods, the consensus probability is . and the correlation between the agents' opinions is .. By contrast, if the weight of new arguments is high, opinion homogenization is milder and preference convergence less likely. With b = ., after  periods, the consensus probability is . and the correlation between the agents' opinions is .. It is noteworthy, however, that preference convergence tends to be much stronger than with the Asymmetric Hot Stove model.
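A compact sketch of this argument-exchange process, again with illustrative parameter values of our own choosing (b = 0.3, run lengths) rather than the chapter's:

```python
import numpy as np

rng = np.random.default_rng(1)

def run_once(T=200, b=0.3):
    """Two-agent Maximal Argument Strength dynamics (Eqs. 19.5-19.6):
    the focal agent pools the neighbor's argument (made only for the
    neighbor's preferred option) with one own argument per option."""
    V = rng.uniform(0.0, 1.0, size=(2, 2))  # V[agent, option]
    for _ in range(T):
        i = rng.integers(2)                 # focal agent this period
        j = 1 - i
        j_pref = int(V[j, 1] > V[j, 0])     # option the neighbor argues for
        for k in (0, 1):
            args = [rng.random()]           # focal agent's own argument
            if k == j_pref:
                args.append(rng.random())   # neighbor's argument for option k
            strongest = max(args)           # Eq. 19.5
            V[i, k] = (1 - b) * V[i, k] + b * strongest  # Eq. 19.6
    return V

runs = np.array([run_once().ravel() for _ in range(3000)])
consensus = np.mean((runs[:, 0] > runs[:, 1]) == (runs[:, 2] > runs[:, 3]))
print("consensus probability =", consensus)
opinion = np.column_stack([runs[:, 0] - runs[:, 1], runs[:, 2] - runs[:, 3]])
print("opinion correlation =", np.corrcoef(opinion.T)[0, 1])
```

Raising b in this sketch should weaken convergence, mirroring the pattern described above.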

The "Feedback Sampling" Model

This model relies on two central assumptions. First, when deciding among options, agents seek positive feedback (Thorndike, ). In the social


media setting, this assumption is consistent with recent evidence, collected by Facebook and LinkedIn, about the reasons users post content on social media (Chen et al., ; Eckles et al., ). For example, researchers from LinkedIn experimented with manipulating the newsfeed display order to increase the visibility of content shared by users who had not yet decided whether they wanted to continue posting on the platform (Chen et al., ). The users whose posts were promoted in their friends' newsfeeds obtained more feedback and remained more engaged on the platform. Second, agents give positive feedback to choices they themselves like: they give positive feedback to messages on issues they care about, or to opinion statements aligned with their own opinions. This assumption is consistent with existing evidence that users of social media give more "likes" to like-minded content (Garz et al., ).

Model

We analyze the dynamics of opinions in a population of N agents connected in a network who make a series of choices between a fixed set of K options and update their valuations of the options based on the feedback they receive from the agents to which they are connected (their network "neighbors"). In each period, one agent i is randomly drawn. This agent selects an option and receives feedback from one other agent j, randomly drawn among their network neighbors.

Initial Valuations

For all agents, the initial valuation of option k consists in one random draw from a uniform distribution: $V^i_{k,0} \sim U(0,1)$.

Valuation Updating

The valuation updating rule is the same as in the Asymmetric Hot Stove model (Equation 19.2), except that the update is based on the feedback $F^i_{k,t} \in \{0,1\}$. The valuations of options that were not selected do not change.

Choice Rule

At each period, an agent i is randomly drawn from the population. Let $P^i_{k,t}$ denote the probability that the agent selects option k in period t. We implement the assumption that agents seek positive feedback by assuming


that $P^i_{k,t}$ increases with $V^i_{k,t}$. More precisely, we assume a logistic choice rule:

$$P^i_{k,t} = \frac{e^{s V^i_{k,t}}}{\sum_{l=1}^{K} e^{s V^i_{l,t}}} \quad (19.7)$$

where $s > 0$ is a parameter that characterizes the sensitivity of the choice probability to the valuations of the options. When s is large, the agent almost always selects the option with the highest valuation. When s is close to 0, choice is almost random.

Feedback

The feedback giver, j, is a random network neighbor of agent i – an agent connected to i. It is important to note that agents who are not connected to i do not provide any feedback. The feedback giver is more likely to provide a "like" when they value the option chosen by agent i highly. We implement this assumption by assuming that the probability of positive feedback is a logistic function of the option valuations of the feedback giver, j:

$$\phi^j_{k,t} = \frac{e^{\lambda V^j_{k,t}}}{\sum_{l=1}^{K} e^{\lambda V^j_{l,t}}} \quad (19.8)$$

where k is the option chosen by agent i and $\lambda > 0$ is a parameter that characterizes the sensitivity of the feedback probability to the valuations of the alternatives. When λ is large, the feedback giver is very likely to give a "like" when they like the chosen option and very unlikely to give one when they do not. When λ is close to 0, feedback is almost random.

Illustration in a Setting with Two Agents and Two Options

To understand how the model can give rise to opinion homogeneity, we first consider the case of two agents ("A" and "B") learning about two uncertain options. For simplicity, we assume an extreme choice rule (the agent always selects their preferred option, $s = \infty$) and an extreme feedback rule (the feedback giver j gives a "like" to the focal agent i if i selected the option j values the most, and does not give a "like" otherwise, $\lambda = \infty$). Note that the valuations remain between 0 and 1. This is because the initial


valuation is assumed to be a random draw in this interval, and the valuation updating rule (Equation 19.2) implies that the valuation of an option gets closer to 1 or 0 as a function of the observed feedback (each feedback instance is equal to 1 or 0).

Suppose that, initially, there exists a consensus such that both agents prefer option 1. Suppose, moreover, that A is drawn to make a choice in period 1. The preferences of A and B imply that A selects option 1, and B gives A a "like." Agent A's valuation of option 1 thus increases and becomes closer to 1. Suppose, without loss of generality, that B makes the period 2 choice. B prefers option 1 and thus selects option 1. Because A prefers option 1, A gives B a "like." Hence, B's valuation of option 1 increases. After the first two periods, both agents still prefer option 1. A recursive argument implies that this pattern persists: both agents prefer option 1 in every period. A similar dynamic occurs if both agents initially prefer option 2; in that case, both agents prefer option 2 in every period. In summary, an initial consensus persists in every period.

What happens if A initially prefers option 1 whereas B initially prefers option 2? A selects option 1 and does not get a like (because B prefers option 2). Agent A's valuation of option 1 thus goes down. If A's valuation of option 1 becomes so low that it leads to a preference reversal in favor of option 2, both agents prefer option 2 at the beginning of period 2. There is consensus, and the reasoning of the previous paragraph implies that the consensus persists in all subsequent periods. Initially there was no consensus, but a consensus emerged and remains thereafter. Consider now the case where A's preference does not shift in period 1. Suppose, without loss of generality, that B makes the choice in period 2. B prefers option 2 and thus selects that option. Because A prefers option 1, A does not give a like to B. If B's valuation of option 2 becomes lower than their valuation of option 1, B's preference changes in favor of option 1. Both agents then prefer option 1 at the beginning of period 3. There is a consensus, and the above reasoning for the case with an initial consensus implies that the consensus is persistent. If B's preferences do not change, the situation at the beginning of period 3 is similar to what it was at the beginning of period 1. When the two agents start without consensus, the only uncertain aspect of the dynamics of valuations and preferences is the period in which a consensus first emerges. With probability 1, it will happen at some point, but it is not possible to predict ex ante on which option the agents will converge.


Consistent with the dynamics of preferences, the opinions of the two agents become correlated. They are initially independent, but after  periods the correlation between the opinions is close to ..

When the choice rule and the feedback rule are probabilistic (s and λ are finite), similar dynamics of opinion homogenization and preference convergence unfold. Even though the persistence of a consensus is no longer deterministic, simulations show that as soon as feedback givers are somewhat discriminating in the way they give feedback (λ is not close to 0), the consensus probability after  periods is high (e.g., with b = ., s = , it is equal to . with λ =  and to . with λ = ). When choice is random (s = 0), homogenization and convergence are somewhat weaker but remain very strong (e.g., with b = ., s = 0, the consensus probability is equal to . with λ =  and to . with λ = ). Finally, opinion homogenization and preference convergence remain high even if the belief-updating weight is low (e.g., with b = ., s = , the consensus probability is . with λ =  and . with λ = ).
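The following sketch implements the probabilistic version of the feedback loop (Eqs. 19.7-19.8). As before, the parameter values (b = 0.3, s = λ = 5, run lengths) are illustrative assumptions of ours:

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(v, temp):
    z = np.exp(temp * (v - v.max()))
    return z / z.sum()

def run_once(T=400, b=0.3, s=5.0, lam=5.0, K=2):
    """Two-agent Feedback Sampling dynamics: the chooser picks an option
    (Eq. 19.7), the neighbor 'likes' it with a probability that rises with
    the neighbor's own valuation of it (Eq. 19.8), and the chooser updates."""
    V = rng.uniform(0.0, 1.0, size=(2, K))
    for _ in range(T):
        i = rng.integers(2)                       # focal agent
        j = 1 - i                                 # feedback giver
        k = rng.choice(K, p=softmax(V[i], s))     # choice rule
        like = float(rng.random() < softmax(V[j], lam)[k])  # feedback
        V[i, k] = (1 - b) * V[i, k] + b * like    # delta rule on the 'like'
    return V

runs = np.array([run_once().ravel() for _ in range(3000)])
consensus = np.mean((runs[:, 0] > runs[:, 1]) == (runs[:, 2] > runs[:, 3]))
print("consensus probability =", consensus)
```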

Local Opinion Homogenization and Between-Group Polarization

To explore the possibility of homogenization of opinions in densely connected social networks and of between-group polarization, we analyze a two-group network of ten agents, with five agents in each group ({A,B,C,D,E} and {F,G,H,I,J}; see Figure ., b–d). We define a group as a set of agents that are densely connected to each other but weakly connected to the rest of the network. In our simulations, we assume that the five agents of each group are connected to all other members of their group. We vary the density of connections between the groups by considering the cases with 0 links, 1 link, and 5 links. We simulated the dynamics of valuations over , periods (about  choices by each agent). Numerical estimates are based on , simulations of the models with the same baseline parameters as in the previous section.

For all three models, the valuations become correlated and preferences converge within each group. When the two groups are disconnected, the within-group dynamics are independent of each other. This implies that whenever consensus emerges in both groups, it happens on the same option (global consensus) about 50 percent of the time and on different options (complete polarization) about 50 percent of the time. An important insight resulting from these simulations is that the process of within-group convergence is


sufficient to create an overall tendency for the groups to become more distinct in terms of preferences – preferences become more polarized. This is because we did not assume any "repulsive" forces between agents of the two groups (we do so in the next section).

To quantify polarization, we denote by $\pi_A$ ($\pi_B$) the proportion of agents who prefer option 1 in group A (group B). We are interested in the between-group gap in preferences, $\Delta\pi = |\pi_A - \pi_B|$. This gap is initially equal, on average, to .. It grows over time to become close to . with all three models (it is a bit lower, at ., with the Asymmetric Hot Stove model, which is not surprising given that within-group consensus is less likely to emerge with this model). Overall, the process of within-group homogenization thus leads to a general tendency for the groups to become more dissimilar.

When the preferences of the members of the two groups start to converge on different options, it is unlikely that global consensus will be achieved. This is because social influence operates via network contacts who belong to the same group – it is local. A useful metric is the "average local support" (the code sketch below makes this metric concrete). For each agent, we compute the proportion of their network contacts who have the same preference as them; we call this the "local support" of the preference of the focal agent. We then average the local support across all agents in the network to obtain the "average local support." Because initial valuations are random and independent for each agent, the average local support is initially equal to 0.5. Simulations reveal that it increases quickly. After , periods, the mean average local support is higher than . for all models and essentially 1 for the Feedback Sampling and Maximal Argument Strength models. This means that agents are surrounded by other agents with preferences similar to their own, even when there is no within-group consensus. The fact that local support is so high implies that if the two groups come to favor distinct options at some point, it is extremely unlikely that global consensus will ever happen.

With some between-group links, the connections make it more likely (than without any connection) that, early in the process, both groups happen to have a majority of agents who prefer the same option. When this happens, the reinforcing dynamics that mostly operate within each group imply that both groups will tend to converge toward the same option. In other words, the probability of global consensus increases, and the tendency toward polarization decreases, compared to the disconnected case.
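A minimal sketch of the "average local support" computation on a toy two-clique network (the adjacency structure and preferences below are made up for illustration):

```python
import numpy as np

def average_local_support(neighbors, preference):
    """Mean over agents of the share of network contacts who share the
    agent's preferred option. `neighbors` maps agent -> list of contacts."""
    shares = [
        np.mean([preference[j] == preference[i] for j in neighbors[i]])
        for i in neighbors
    ]
    return float(np.mean(shares))

# Two cliques of five agents (0-4 and 5-9) with a single between-group link
neighbors = {i: [j for j in range(5) if j != i] for i in range(5)}
neighbors.update({i: [j for j in range(5, 10) if j != i] for i in range(5, 10)})
neighbors[4].append(5)
neighbors[5].append(4)
preference = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]   # each group unanimous

print(average_local_support(neighbors, preference))  # 0.96: high local support
```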


The initial "coupling" of the within-group dynamics is stronger when there are more between-group connections. Accordingly, for all three models, the probability of global consensus (all agents preferring the same option) is higher with one between-group link than with no link, and higher still with five between-group links. The consensus probabilities for the three models and 0/1/5 links are ././. (Asymmetric Hot Stove), ././. (Maximal Argument Strength), and ././. (Feedback Sampling). Similarly, the between-group preference gap becomes lower with more links. The gaps for the three models and 0/1/5 links are ././. (Asymmetric Hot Stove), ././. (Maximal Argument Strength), and ././. (Feedback Sampling).

Finally, it is worth noting that even when within-group consensus is not achieved, some level of local convergence does occur. To characterize this, it is useful to compute the average local support over the simulation runs for which global consensus did not emerge. Even in these cases, the average local support is higher than . with all three sampling models and 0 to 5 between-group links. This means that agents are connected to twice as many agents who share their preference as agents who do not. In other words, agents become surrounded by "like-minded" others.

Local Opinion Homogenization and Polarization in a Network of Two Groups with Distinct Identities

The assumptions of our sampling models regarding the nature of social interactions between network neighbors lead to opinion homogenization among densely connected agents. Interactions do not have to contribute to homogenization, however. Conflicts are an important part of human social interaction and can result in aggression, especially when group identity is involved (Densley & Peterson, ). Models of "repulsive" social influence have recognized this possibility by allowing encounters with someone an agent disagrees with to result in even larger opinion differences (Baldassarri & Bearman, ; Flache & Macy, ). The main psychological process proposed as the source of repulsive influence is people's desire to differentiate themselves from dissimilar or disliked others (Brewer, ). Others can be disliked because they hold different opinions from the agent (Rosenbaum, ) or because they belong to a group with a different identity – an "out-group" (Tajfel et al., ). In this section, we incorporate these ideas into each of the three


sampling models. We assume there are two groups in the network (see Figure .e). We denote the group to which the focal agent i belongs as the "in-group" and the group i does not belong to as the "out-group." Agents interact differently with members of the in-group than with members of the out-group.

Asymmetric Hot Stove Model

The choice of option by the focal agent is influenced differently by members of the two groups. Agent i tends to select options that are popular in the in-group but unpopular in the out-group. In other words, the agent tries to avoid options that are popular in the out-group. This could happen when agents see their actions as identity signals and want to preserve a distinct identity (e.g., Brewer, ). We implement these assumptions by assuming that $P^i_{k,t}$ depends on the mean evaluation of option k by i's in-group neighbors, $\bar V^i_{k,t,in}$, and on the mean evaluation by i's out-group neighbors, $\bar V^i_{k,t,out}$, as follows:

$$P^i_{k,t} = \frac{e^{s V^i_{k,t} + s_{in} \bar V^i_{k,t,in} - s_{out} \bar V^i_{k,t,out}}}{\sum_{l=1}^{K} e^{s V^i_{l,t} + s_{in} \bar V^i_{l,t,in} - s_{out} \bar V^i_{l,t,out}}} \quad (19.9)$$

where s > , s in > , and sout >  are parameters that characterize the sensitivity of the choice probability to the valuations of the options and its popularity among in-group and out-group neighbors respectively. In the simulations reported below, we take s ¼ sin ¼ , s out ¼  and b ¼ .. ... Maximal Argument Strength Model Here we consider the cases where there is some “mistrust” between the members of the groups, maybe because they have a competitive relationship. Consider an agent i who evaluates an option k based on arguments produced by in-group and out-group members. Let ik,t,in denote the most persuasive argument from the in-group and ik,t,out be the most distinctive argument produced by out-group members about option k. We assume that arguments produced by the out-group influence the agent negatively because the focal agent suspects that the out-group member is trying to mislead them by engaging in strategic behavior. This assumption is also consistent with research that has documented a backfiring effect of


exposure to groups of distinct identities (Nyhan & Reifler, ; Taber & Lodge, ). The valuation of the available options is updated as follows:

$$V^i_{k,t} = (1 - b_{in} + b_{out})\,V^i_{k,t-1} + b_{in}\,A^i_{k,t,in} - b_{out}\,A^i_{k,t,out} \quad (19.10)$$

where $b_{in} \in [0,1]$ is the updating weight for arguments produced by in-group members and $b_{out} \in [0,1]$ is the updating weight for arguments produced by out-group members. In the simulations reported below, we take s = , $b_{in}$ = ., and $b_{out}$ = ..
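The two identity-based modifications can be written as small helper functions. This is a sketch under illustrative parameter defaults of our own, not the chapter's implementation:

```python
import numpy as np

def choice_probs(V_self, V_in_mean, V_out_mean, s=5.0, s_in=5.0, s_out=5.0):
    """Eq. 19.9: logistic choice boosted by in-group popularity and
    penalized by out-group popularity (each array holds one value per option)."""
    u = s * V_self + s_in * V_in_mean - s_out * V_out_mean
    z = np.exp(u - u.max())
    return z / z.sum()

def update_with_mistrust(V_self, A_in, A_out, b_in=0.3, b_out=0.1):
    """Eq. 19.10: in-group arguments pull valuations up while out-group
    arguments push them down."""
    return (1 - b_in + b_out) * V_self + b_in * A_in - b_out * A_out

V = np.array([0.6, 0.4])
print(choice_probs(V, np.array([0.8, 0.2]), np.array([0.1, 0.9])))
print(update_with_mistrust(V, np.array([0.9, 0.5]), np.array([0.2, 0.7])))
```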

Feedback Sampling Model

This model can also be adapted to reflect situations of strategic behavior. There are two possible approaches to capturing this kind of setting: focusing on how the feedback recipient interprets feedback, or focusing on how the feedback givers provide feedback. In the former case, we can adopt an approach similar to the one used to adapt the Maximal Argument Strength Model. Because the agent wants to be distinct from the out-group and interprets feedback as a truthful signal of appraisal, they decrease their evaluations of options that are endorsed by the out-group – they give negative weight to feedback from members of the out-group. In the latter case, we assume that feedback givers try to influence members of the other group to select options different from their own. If they believe the feedback recipients will interpret their feedback at "face value," they will simply give a "like" when the other agents select their least preferred alternative. To implement this in our model, we assume that the feedback rule is the same as before (Equation 19.8) when the feedback giver is from the same community as the focal agent and that feedback is "flipped" when they are from a different community: they give a "like" whenever they would not have given a "like" to an agent from their own community, and they withhold a "like" whenever they would have given one. In the simulations reported below, we implement the second approach and take b = ., s = , and λ = .

Results

We simulated the sampling models for , periods in a setting with five between-group links.



It would also be possible to consider the case where feedback recipients suspect the out-group members of giving feedback that does not reflect their true preferences, but this opens the door to complications associated with multilevel reasoning in games. The face-value interpretation is most realistic in settings where feedback recipients do not easily know the identity of the feedback givers, for example with "likes" on social media posts.


With all three models, the opinions of agents in the same group become positively correlated (similar to what was obtained without distinct identities), whereas the opinions of agents in different groups now become negatively correlated (they were uncorrelated or positively correlated in the previous set of simulations). What changes in comparison to the setting without identities is that the probability of a global consensus becomes much lower (Asymmetric Hot Stove: . → ; Maximal Argument Strength: . → .; Feedback Sampling: . → .). Relatedly, the gap in preferences between the two groups ($\Delta\pi$) is much higher than in the setting without identities and becomes high in all cases (Asymmetric Hot Stove: . → .; Maximal Argument Strength: . → .; Feedback Sampling: . → .).

The most interesting finding from these simulations is obtained with versions of the model in which only negative influence occurs (Asymmetric Hot Stove: $s_{in} = 0$; Maximal Argument Strength: $b_{in} = 0$; Feedback Sampling: no feedback to agents in the same group). In this case, the opinions of agents that belong to different groups become negatively correlated, but there is no clear tendency toward polarization (the probability that the majorities of the two groups prefer different options is lower than . for all three models). This demonstrates that within-group homogenization is a necessary component for polarization to emerge with these sampling models.

Implications

Consider again Florian, the German exchange student in Barcelona. As we have shown throughout the chapter, the fact that his classmates are pro-independence makes him less likely to sample anti-independence information, more likely to sample persuasive arguments for the pro-independence position, and more likely to be rewarded by his classmates when he expresses a pro-independence stance. All this will likely lead Florian to speak in favor of an independent Catalonia to his friends back in Germany, believing he is sharing the dominant view.

Our findings about the strength of "local support" produced by the three models show that Florian's opinion may not be as widely shared as he thinks. Florian's opinion could be largely unpopular in the general population even though it is popular among his direct contacts. This inconsistency between local and global support is more likely to emerge if the group to which Florian belongs is somewhat disconnected from the rest of the population (Galesic et al., ,


). When this happens, Florian's opinion might even become extreme. Our numerical illustrations relied on a simplistic model of opinion extremization in which there were just two choice alternatives. It is easy, however, to extend the models to settings with more alternatives. Such extensions would provide a sampling explanation for how an extreme opinion can emerge in an isolated group in a network and remain stable over time. Such groups could consist of online forums and private groups that, as Sunstein () argues, become sources of extreme views and conspiracy theories. A timely example is the anti-vaccine movement, which seems to have had negative effects on the Covid-19 vaccination campaigns in several countries.

In addition to suggesting that simple mechanisms of information sampling could contribute to the emergence of polarized and extreme opinions, the sampling approach discussed in this chapter offers a different perspective on how society can limit polarization. Existing theories of polarization are largely based on mechanisms that invoke motivated information processing. These imply that, to correct the tendency toward local opinion homogeneity, one needs to correct how information is processed by the individual. The sampling approach, by contrast, implies that the correction should focus on the nature of the information sampled by agents, before any processing by their minds happens. As our simulations show, the connectivity of the network has a crucial influence on the level of opinion polarization. The sampling approach thus suggests a seemingly simple solution: increase the diversity of information to which people are exposed.

The models discussed in this chapter offer specific suggestions regarding how sampling diversity can be increased. Because the Asymmetric Hot Stove and Maximal Argument Strength models are concerned with the sampling of information through newsfeeds, they suggest that opinion polarization can be reduced by simply making the user's newsfeed and set of connections (which influence the information sent to the newsfeed) more diverse. The Feedback Sampling model suggests that a possible way to combat polarization is to reduce the visibility of feedback and promote discussion in the comments, where users can engage in a more argument-based conversation.

Even though they are ostensibly simple, these potential solutions would be difficult to implement in practice. First, most social media platforms provide limited control to their users over the information that reaches them via their newsfeed and over the nature of the feedback provided to them. Moreover, newsfeed algorithms are often proprietary. Second,


more diverse information sampling might go against an individual's hedonic goals: the tendency to seek positive experiences. Being challenged about one's opinions or receiving negative feedback is rarely a positive experience. The threat of negative experiences could thus discourage individuals from seeking to broaden their information sampling, and they might stop using the social media platform altogether. This, in turn, encourages social media companies to design information environments that maximize the hedonic quality of user experiences. The result is curated newsfeeds that are skewed toward the arguments popular among network contacts and asymmetric feedback structures in which expressing support is much easier than expressing disagreement. Facebook has experimented with an approach that consists of making different perspectives on an issue easier to sample by creating "related articles." Yet making information easy to sample does not necessarily imply that people will sample it, especially if their identity affects their sampling behavior. As shown by our simulations of models with visible group identities, identity exacerbates the effects of information sampling and increases the probability of polarization. More generally, resolving the tension between the frequently hedonic goals of individuals and society's need for informed citizens remains a formidable challenge for public policy.

Even though we have mostly discussed the effects of information sampling on the public, the models in this chapter also have implications for understanding the behavior of politicians. Imagine a politician who is considering new legislation. The sampling models in this chapter propose that the composition of politicians' social circles affects the information that politicians sample to inform their opinion about the policy. Research shows that politicians are more likely to interact with fellow party members and supporters who are on the same side of the political spectrum (Barberá et al., ). Then, according to the Feedback Sampling model, a politician would express support for the policy when it is received well by their social contacts. This is exactly what Schöll et al. () found by analyzing the behavior of Spanish politicians on Twitter: they tended to post more about topics that had previously received a lot of likes and retweets. In addition to social media feedback, other mechanisms can affect politicians' opinions. A politician will be relatively more exposed to information popular among their social circle and will find it more persuasive. This skew toward information in support of a specific opinion can result in the

https://about.fb.com/de/news///update-zu-den-wahle


politician believing that the policy has stronger public support than it actually has. Therefore, these sampling models contribute to explaining why politicians misperceive public opinion (Broockman & Skovron, ), are more sensitive to the influence of the wealthy (Gilens & Page, ), and sometimes support policies that contradict public consensus.

REFERENCES

Baldassarri, D., & Bearman, P. (). Dynamics of political polarization. American Sociological Review, (), –.
Banerjee, A. V. (). A simple model of herd behavior. Quarterly Journal of Economics, (), –.
Barberá, P., Casas, A., Nagler, J., et al. (). Who leads? Who follows? Measuring issue attention and agenda setting by legislators and the mass public using social media data. American Political Science Review, (), –.
Bikhchandani, S., Hirshleifer, D., & Welch, I. (). A theory of fads, fashion, custom, and cultural change as informational cascades. Journal of Political Economy, (), –.
Brewer, M. B. (). The social self: On being the same and different at the same time. Personality and Social Psychology Bulletin, (), –.
Broockman, D. E., & Skovron, C. (). Bias in perceptions of public opinion among political elites. American Political Science Review, (), –.
Burnstein, E., & Vinokur, A. (). Persuasive argumentation and social comparison as determinants of attitude polarization. Journal of Experimental Social Psychology, (), –.
Burt, R. S. (). Structural holes: The social structure of competition. Cambridge, MA: Harvard University Press.
Chen, G., Chen, B.-C., & Agarwal, D. (). Social incentive optimization in online social networks. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (pp. –).
DeGroot, M. H. (). Reaching a consensus. Journal of the American Statistical Association, (), –.
Denrell, J. (). Why most people disapprove of me: Experience sampling in impression formation. Psychological Review, ().


Denrell, J., & Le Mens, G. (). Interdependent sampling and social influence. Psychological Review, (), –.
Denrell, J., & Le Mens, G. (). Information sampling, belief synchronization, and collective illusions. Management Science, (), –.
Densley, J., & Peterson, J. (). Group aggression. Current Opinion in Psychology.
Eckles, D., Kizilcec, R. F., & Bakshy, E. (). Estimating peer effects in networks with peer encouragement designs. Proceedings of the National Academy of Sciences, (), –.
Flache, A., & Macy, M. W. (). Small worlds and cultural polarization. Journal of Mathematical Sociology, (–), –.
Friedkin, N. E. (). Choice shift and group polarization. American Sociological Review, ().
Galesic, M., Olsson, H., & Rieskamp, J. (). Social sampling explains apparent biases in judgments of social environments. Psychological Science, (), –.
Galesic, M., Olsson, H., & Rieskamp, J. (). A sampling model of social judgment. Psychological Review, (), –.
Garz, M., Sood, G., Stone, D. F., & Wallace, J. (). Is there within-outlet demand for media slant? Evidence from US presidential campaign news. Available at SSRN.
Germano, F., Gómez, V., & Le Mens, G. (). The few-get-richer: A surprising consequence of popularity-based rankings. In Proceedings of the World Wide Web Conference (WWW).
Gilens, M., & Page, B. I. (). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics, (), –.
Isenberg, D. J. (). Group polarization: A critical review and meta-analysis. Journal of Personality and Social Psychology, ().
Iyengar, S., Sood, G., & Lelkes, Y. (). Affect, not ideology: A social identity perspective on polarization. Public Opinion Quarterly, ().
Latané, B., Nowak, A., & Liu, J. H. (). Measuring emergent social phenomena: Dynamism, polarization, and clustering as order parameters of social systems. Behavioral Science, (), –.
Mäs, M., & Flache, A. (). Differentiation without distancing: Explaining bipolarization of opinions without negative influence. PLoS ONE, (), e.


McGarty, C., Turner, J. C., Hogg, M. A., David, B., & Wetherell, M. S. (). Group polarization as conformity to the prototypical group member. British Journal of Social Psychology, (), –.
Moscovici, S., & Zavalloni, M. (). The group as a polarizer of attitudes. Journal of Personality and Social Psychology, ().
Myers, D. G., & Lamm, H. (). The group polarization phenomenon. Psychological Bulletin, ().
Nowak, A., Szamrej, J., & Latané, B. (). From private attitude to public opinion: A dynamic theory of social impact. Psychological Review, ().
Nyhan, B., & Reifler, J. (). Does correcting myths about the flu vaccine work? An experimental evaluation of the effects of corrective information. Vaccine, (), –.
Pariser, E. (). The filter bubble: What the Internet is hiding from you. London: Penguin.
Rosenbaum, M. E. (). The repulsion hypothesis: On the nondevelopment of relationships. Journal of Personality and Social Psychology, (), –.
Schöll, N., Gallego, A., & Le Mens, G. (). Politician–citizen interactions and dynamic representation: Evidence from Twitter. Barcelona: Barcelona School of Economics Working Paper.
Sunstein, C. R. (). #Republic: Divided democracy in the age of social media. Princeton: Princeton University Press.
Taber, C. S., & Lodge, M. (). Motivated skepticism in the evaluation of political beliefs. American Journal of Political Science, (), –.
Tajfel, H., Billig, M. G., Bundy, R. P., & Flament, C. (). Social categorization and intergroup behaviour. European Journal of Social Psychology, (), –.
Thorndike, E. L. (). The law of effect. American Journal of Psychology, (–), –.
Turner, J. C., Wetherell, M. S., & Hogg, M. A. (). Referent informational influence and group polarization. British Journal of Social Psychology, (), –.
Woiczyk, T. K. A., & Le Mens, G. (). Evaluating categories from experience: The simple averaging heuristic. Journal of Personality and Social Psychology, (), –.


Computational Approaches


20

An Introduction to Psychologically Plausible Sampling Schemes for Approximating Bayesian Inference

Jian-Qiao Zhu, Nick Chater, Pablo León-Villagrá, Jake Spicer, Joakim Sundh, and Adam Sanborn

Zhu, León-Villagrá, Spicer, Sundh, and Sanborn were supported by a European Research Council consolidator grant (-SAMPLING).

Both natural and built environments are complex, and people often have to operate under great uncertainty about the true state of the world. One major cause of this uncertainty, as discussed in several of the other chapters in this book, is that we often have access to only a small number of samples of information from the environment. Investigations of information sampling often study situations in which this information is either biased or unreliable. Yet even when it is both unbiased and reliable, the number of relevant experiences we have available in most tasks is far fewer than the , that Jacob Bernoulli estimated were required to have "moral certainty" about the probability of even a simple binary event (Stigler, ).

Fortunately, Bayesian models of cognition provide a principled way for the mind to deal with this uncertainty: stating all hypotheses $h$, defining a prior probability for these hypotheses, $p(h)$, and then updating these beliefs according to the rules of probability theory as information about the environment, $d$, becomes available. These updated beliefs are the posterior probabilities $p(h \mid d)$, and they are proportional to the prior multiplied by the likelihood that the data are produced by each hypothesis, $p(d \mid h)$. Interestingly, in many complex domains, from language production and commonsense reasoning to vision, motor control, and intuitive physics, human behavior corresponds well to this probability calculus (Battaglia et al., ; Chater & Manning, ; Sanborn & Chater, ; Sanborn et al., ; Wolpert, ). If the assumptions of the Bayesian model are correct, there are straightforward ways to incorporate the costs and benefits of each action to decide the best action to take under uncertainty.

To enjoy many of the benefits of the Bayesian approach, the brain, and, in fact, any biological and physical machinery, has to calculate the




posterior in realistic amounts of time. Unfortunately, computing exact posterior degrees of belief has been proven to be computationally difficult and is typically not possible in most real-life applications (Roth, ; van Rooij, ). To give an intuition as to why, consider the following running example: suppose that I plan to meet up with a friend in London for a bite to eat at a place of their choosing, though they have not yet told me where. I call my friend as I jump into the cab, but they do not answer. Where should I tell the driver to go? I don't have the time (or money) to ride around London visiting possible locations, so I must make a single decision based on my internal knowledge. Ideally, in order to minimize how far I need to walk once my friend finally answers their phone, I would like to be dropped off at the average location (as opposed to the most likely location) where I expect my friend to be. To calculate this average location exactly, I would have to assign a probability to each establishment that serves food in London, based on my knowledge of my friend. Then I would multiply the longitude of every establishment by the probability that my friend is at that establishment, sum up the results, and then repeat the calculation for latitude. However, because London has more than , establishments that serve food, calculating average locations in this way is implausibly difficult.

There is no doubt that a compromise must be made: for example, one might settle for an approximation of Bayesian inference rather than an exact calculation. One class of algorithms for approximating the posterior, developed in computer science and statistics, is based on Monte Carlo methods. The idea is simple: instead of using the entire posterior distribution, $p(h \mid d)$, generate a sequence of samples from the posterior distribution, $h_i \sim p(h \mid d)$, where $h_i$ is the $i$th sampled hypothesis, and then use these samples to guide future behavior. These samples are internal mental samples, rather than samples of information from the environment. While they provide a less accurate characterization of the posterior distribution than the probabilities themselves do, they reduce the complexity of calculating with that distribution. For example, to calculate the location where I should meet my friend, if I am given a set of sampled locations where my friend might be, I can simply take the average latitude and longitude of the samples. The accuracy of this estimate increases with the number of samples: while in the limit of an infinite number of samples my calculation would be perfect, an imperfect but still useful

For simplicity, we assume that I want to minimize only the squared walking distance.


answer can still be found using a psychologically realistic sample size, one that is far smaller than the total number of eating establishments in London (Vul et al., ). In this chapter, we review the psychological literature that has pursued this line of research and examine the potential link between sampling algorithms and cognitive psychology (summarized in Table . and Figure .), with a more in-depth discussion of how sampling algorithms can explain human behavior given in Chapter .

Sampling with Global Knowledge

Direct Sampling

The simplest idea in sampling is to draw a random sample directly from the probability distribution of interest, termed the target distribution – in our example, the distribution of where my friend might be: $h_i \sim p(h \mid d)$. Each of the samples is independent of the others and the sample average is unbiased. This requires global knowledge of the distribution: knowledge of all states and their probabilities. For the meeting-a-friend example, direct sampling simply means drawing samples of many possible locations and taking the empirical average of these samples to approximate my friend's true average location. We do not need to integrate over all possible places, but can instead rely on a few samples drawn from the distribution of possible meetup places to get a good estimate of where to travel.
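The meeting-a-friend estimate can be written out directly. In this sketch the "posterior" over establishments is a made-up toy distribution and the coordinate ranges are rough bounding-box values for London; only the direct-sampling logic itself is the point:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy posterior over 20,000 hypothetical establishments, each a (lat, lon) pair
n = 20_000
locations = np.column_stack([
    rng.uniform(51.28, 51.70, n),   # latitude
    rng.uniform(-0.51, 0.33, n),    # longitude
])
weights = rng.random(n)
posterior = weights / weights.sum()   # p(h | d), normalized

# Exact posterior mean: a 20,000-term weighted sum per coordinate
exact_mean = posterior @ locations

# Direct sampling: approximate the same mean with a handful of samples
for n_samples in (1, 10, 100):
    idx = rng.choice(n, size=n_samples, p=posterior)   # h_i ~ p(h | d)
    print(n_samples, "samples:", locations[idx].mean(axis=0), "exact:", exact_mean)
```

The estimate tightens as the number of samples grows, but it is already informative with small samples, which is the sense in which a few mental samples can be "good enough."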

Psychological Applications of Direct Sampling

Given a specific input, Bayesian models of cognition are deterministic: each input typically leads to a single response. In two-alternative forced choice where the reward history of both options has been observed, a reward-maximizing agent should always choose the option with the higher chance of reward. This contrasts with the observation that human behavior is almost inevitably noisy, even in tasks in which the stimuli are clear and there is thus unlikely to be any sensory noise (Mosteller & Nogee, ; Spicer et al., ). Instead, an extensive empirical literature shows that people "probability match," choosing an option with a frequency proportional to its probability of reward (Vulkan, ). Adding the assumption of direct sampling makes Bayesian models of cognition stochastic, and so offers an explanation for the noise in human behavior without losing the benefits of the normative framework of


Table .. Sampling algorithms and their statistical and psychological implications.

Global knowledge.
  Algorithms: Direct sampling.
  Deviations from ideal inference: Stochastic behavior.
  Example applications: Probability matching behaviors (Vul et al., ); exploration–exploitation tradeoff (Gershman, ; Speekenbrink & Konstantinidis, ).

Approximate global knowledge.
  Algorithms: Importance sampling; particle filters.
  Deviations from ideal inference: Stochastic behavior; overweighting extreme events; order effects in updating.
  Example applications: Overweighting of extreme events and the four-fold pattern of risky preference (Lieder et al., ; Nobandegani et al., ); reproductions from memory of perceptual stimuli, and predictions about the duration of real-life events (Shi et al., ); serial dependence and working memory capacity in category learning (Lloyd et al., ; Sanborn et al., ); order effects in human causal learning (Abbott & Griffiths, ); classical conditioning in animal behavior (Daw & Courville, ; Gershman et al., ); decision making in changing environments (Brown & Steyvers, ; Yi et al., ).

Local knowledge.
  Algorithms: MCMC algorithms: Random Walk Metropolis; Gibbs sampling; Metropolis-coupled Markov chain Monte Carlo.
  Deviations from ideal inference: Stochastic behavior; framing effects; autocorrelated behavior.
  Example applications: Bistable perception (Gershman et al., ); anchoring bias (Lieder et al., ); biases in probability judgments (Dasgupta et al., ; Sanborn & Chater, ); autocorrelation in human causal learning (Bramley et al., ); 1/f noise and Lévy flights in repeated estimation and memory retrieval (Zhu et al., ).

[Figure .: tree diagram of sampling algorithms; nodes include approximate Bayesian inference, direct sampling, sampling approximation, Markov chain Monte Carlo, importance sampling, particle filter, Gibbs sampling, Metropolis–Hastings, Random Walk Metropolis, and Metropolis-coupled MCMC.]

Figure . A "family tree" of sampling algorithms where parent nodes represent more generalized concepts for the specific algorithms in the leaf nodes. Circled algorithms require global or approximate global knowledge while squared ones require local knowledge.

When decisions are carried out based on one or a few samples, stochastic behavior is expected to occur because the samples are randomly generated. The matching law fits nicely with this sampling view of decision making, especially when decisions are based on a single sample (Vul et al., ). If we further assume that people draw one sample from past trials and act optimally toward that sample, the predicted choice probability of an option should match that option's probability of reward: direct sampling adds noise but does not add any bias (at least beyond any bias in the distribution it draws samples from). Indeed, direct sampling is also widely used in non-Bayesian psychological models; most stochastic models of human behavior use direct sampling by default. For example, the drift diffusion model of choice and


response time supposes that decision making is a process of evidence accumulation until a threshold is reached. Each piece of evidence is typically assumed to be independent of the last and is directly sampled from memory or the environment (Blurton et al., ; Nosofsky, ; Pleskac & Busemeyer, ; Ratcliff & Rouder, ; Shadlen & Shohamy, ; Usher & McClelland, ). Global memory matching models assume that recalled items are directly sampled from a distribution over items in memory (Brown et al., ; Raaijmakers & Shiffrin, ), while influential models of categorization and many other tasks assume that responses are directly sampled from a distribution of responses (Nosofsky, ).
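To make the probability-matching account above concrete, here is a small simulation of our own (the reward probability of 0.7 is invented for illustration): an agent that bases each choice on a single sample from its subjective distribution over which option is rewarded will match probabilities, whereas an agent that aggregates many samples approaches maximizing.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative subjective probability that option A, rather than B, pays off.
p_a = 0.7

def choose(n_samples: int) -> str:
    """Sample n hypotheses about the rewarded option; act on the majority."""
    votes_for_a = rng.random(n_samples) < p_a
    return "A" if votes_for_a.mean() > 0.5 else "B"

for k in (1, 100):
    choices = [choose(k) for _ in range(10_000)]
    print(f"{k} sample(s): P(choose A) = {choices.count('A') / len(choices):.2f}")
# With one sample the choice rate matches the reward probability (~0.70);
# with many samples it approaches reward maximization (~1.00).
```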

20.3 Sampling with Approximate Global Knowledge

There remain, however, some psychologically implausible aspects of direct sampling from Bayesian models. Aside from the issue of generating samples that are independent and unbiased draws from a distribution (which people are notoriously poor at anyway; Wagenaar, ), there is the issue of obtaining and representing global knowledge: knowledge of all of the states and their probabilities. Without these probabilities, direct sampling cannot be implemented.

Returning to our example, let's say that while trying to decide what destination to tell my cab driver in London, I receive a somewhat ambiguous text message from my friend saying that they are hungry for a curry. This datum (i.e., a new piece of information from my friend), $d$, allows me to update my probabilities as to where I should tell the cab driver to go. But there is a tractability issue that arises from Bayes' rule itself. When updating a probability distribution with new information, the posterior probability of each hypothesis is equal to its likelihood, $p(d \mid h)$, multiplied by its prior, $p(h)$, divided by a proportionality constant or partition function $Z$ that also needs to be calculated:

$$p(h \mid d) = \frac{p(d \mid h)\, p(h)}{Z} = \frac{p^{*}(h \mid d)}{Z} \quad (20.1)$$

This partition function $Z$ is the sum, over each and every hypothesis, of its prior multiplied by its likelihood:

$$Z = \sum_{h} p(d \mid h)\, p(h) \quad (20.2)$$
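To make the asymmetry concrete, the sketch below (our own; the prior and likelihood are invented) shows that scoring one hypothesis with $p^{*}$ is a single multiplication, whereas computing the partition function requires a pass over the entire hypothesis space.

```python
import numpy as np

# Invented example: n candidate locations, a flat prior, and an arbitrary
# likelihood p(d | h) for the observed text message d.
n = 1_000_000
prior = np.full(n, 1.0 / n)
likelihood = np.exp(-np.arange(n) / 1000.0)

def p_star(h: int) -> float:
    """Unnormalized posterior p*(h | d): one multiplication, no global pass."""
    return likelihood[h] * prior[h]

# Normalization, by contrast, touches every hypothesis:
Z = np.sum(likelihood * prior)   # a full O(n) pass over the hypothesis space
posterior_0 = p_star(0) / Z      # exact posterior of a single hypothesis
print(posterior_0)
```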

This means that while it is relatively easy to calculate a value $p^{*}(h \mid d) = p(d \mid h)\, p(h)$ that is proportional to the posterior probability of a single hypothesis, it is much more computationally demanding to determine the constant $Z$ needed to find the exact value of the posterior


probability. And, because the average longitude and average latitude require multiplying each longitude and latitude by its probability (whether or not sampling is used), knowing only $p^{*}(h \mid d)$ instead of $p(h \mid d)$ is not enough: my answer will be off by $Z$, an unknown constant. If, as before, I wish to avoid summing across the thousands of eating establishments in London, a new approximation approach is needed. One way forward is to utilize approximate global knowledge of the posterior probability distribution: starting with a rough idea of the posterior that is refined by calculating values $p^{*}(h \mid d)$ proportional to the posterior distribution for only a sample of the hypotheses, rather than for every hypothesis as calculating $Z$ requires. However, unlike direct sampling, for small sample sizes this method can introduce biases which depend on the particulars of the sampling algorithm. We next introduce a key example of a method which draws samples with such approximate global knowledge of the posterior: importance sampling.

20.3.1 Importance Sampling

If directly sampling from $p(h \mid d)$ is computationally daunting except for the simplest toy examples, one may consider an alternative sampling strategy that first draws samples from a conventional and simple distribution (e.g., a Gaussian distribution) and later adjusts these samples to align with the target distribution. Indeed, this is the key idea behind the importance sampling algorithm. The method draws samples from a simpler distribution $q(h)$ (also known as the proposal distribution), and then reweights them in reference to the target posterior distribution $p(h \mid d)$,



(Footnotes: For discrete hypotheses, $Z$ lies between zero and one, while for continuous hypotheses $Z$ can take any value greater than zero. Another intuition about the importance of the partition function is to think about calculating the chances of winning a neighborhood lottery in which the ticket sellers have gone door to door. You will know how many tickets you yourself have purchased, but your chances of winning will depend on an unknown constant: the total number of tickets sold. A further intuition for why direct sampling can be hopeless for complex problems can be drawn from the history and evolution of scientific theories. In science, deciding how probable a theory is when new data arrive requires integration over all possible theories. It is, however, an almost impossible job to imagine all possible theories. Furthermore, if we chose theories at random, the likelihood of a theory predicting the new data would almost always be zero. Instead, scientists make local adjustments to theories, with only occasional breakthroughs (what Kuhn [] most dramatically called paradigm shifts). To explain how light bends around heavy objects, for example, it is impossible to find a high-likelihood account before Einstein's theory of relativity, or by integrating over all possible Newtonian theories of physics. Normalization is hard because it requires a comprehensive survey of all possible theories.)


correcting for the difference between the two distributions. As we shall soon see, the reweighting scheme (i.e., the correction for the difference) can be computed with unnormalized probabilities. Therefore, the partition function, which requires summation over all possible hypotheses, is no longer needed for importance sampling. In general, to make samples from $q$ more representative of $p$, samples from states that are more probable in $p$ and less probable in $q$ receive more weight. Continuing the meeting-a-friend example, importance sampling does not require the global knowledge of all possible meeting places and their associated probabilities used by direct sampling. Instead, the extent of knowledge needed in importance sampling can be understood by analogy to rough knowledge of which areas in London have the highest chance of containing my friend (e.g., their home or workplace) and how quickly that chance drops off with distance. I could then sample establishments guided by this rough knowledge and correct the estimated probabilities of a meetup based on the probability of each place. Formally, the correction step that reweights the samples is defined as follows:

$$w_i = \frac{p(h_i \mid d)/q(h_i)}{\sum_{j=1}^{N} p(h_j \mid d)/q(h_j)}, \quad \text{with } \sum_{i=1}^{N} w_i = 1 \quad (20.3)$$

where $N$ is the total number of samples. The weights can be seen as a measure of how well the proposal distribution $q$ (i.e., the rough knowledge) fits the target distribution $p$. Consider the extreme scenario in which $q$ matches $p$ exactly (i.e., $q = p$): the samples will be equally weighted (i.e., $w_i = 1/N$ for $i = 1, 2, \ldots, N$), because no reweighting is needed to correct for differences between the two distributions. At this point the method may appear redundant: why bother drawing samples from another distribution if we can readily evaluate $p(h \mid d)$? The usefulness of the importance sampling algorithm comes from the fact that Equation 20.3 can be further simplified as follows:

$$w_i = \frac{p^{*}(h_i \mid d)\,/\,\big(Z\, q(h_i)\big)}{\sum_{j=1}^{N} p^{*}(h_j \mid d)\,/\,\big(Z\, q(h_j)\big)} = \frac{p^{*}(h_i \mid d)/q(h_i)}{\sum_{j=1}^{N} p^{*}(h_j \mid d)/q(h_j)}, \quad \text{where } h_i \sim q(h) \quad (20.4)$$

As Equation . suggests, the normalization constant of pðhjd Þ, Z , can be ignored in calculating weights, meaning that one can use the unnormalized values p∗ ðhjd Þ, significantly reducing the amount of computation required to run the algorithm. One common choice for the proposal


distribution is to use the prior distribution: $q(h) = p(h)$. That is, samples are initially drawn from the prior distribution, and then reweighted to act as samples from the posterior distribution. Note that the importance sampler needs to know the unnormalized posterior only at the sampled hypotheses, not the normalized posterior, which would require global knowledge of the entire hypothesis space. The weights for this scheme, called likelihood-weighted importance sampling, are particularly simple because the prior probabilities cancel:

$$w_i = \frac{p^{*}(h_i \mid d)/p(h_i)}{\sum_{j=1}^{N} p^{*}(h_j \mid d)/p(h_j)} = \frac{p(d \mid h_i)}{\sum_{j=1}^{N} p(d \mid h_j)}, \quad \text{where } h_i \sim p(h) \quad (20.5)$$

so samples are proposed from the prior and weighted by their likelihoods. As with any proposal distribution, this scheme works well when the proposal distribution (here, the prior) is similar to the target distribution (the posterior), but poorly when they are very different. While a speed-up in computation is achieved by using the unnormalized distribution $p^{*}(h \mid d)$, the proposal distribution still requires knowledge of at least where the states of non-zero probability in $p$ lie, and $q$ must cover that same region with non-zero probability as well. This is simply because the importance sampling algorithm will never propose states from regions where the proposal distribution $q$ has zero probability. In this sense, the importance sampling algorithm requires approximate global knowledge of the target distribution: a rough knowledge of where the probable states in the target distribution might be located. Without this knowledge the algorithm will be very inefficient. If $q$ is too broad compared to $p$ (e.g., $q$ is a uniform distribution across all possible meetup locations, meaning that I have no prior reason to believe one restaurant is a more probable meetup spot than another), many of the proposed states will be low-probability states in the target distribution, and a very large number of samples will be needed to produce a good approximation. Alternatively, when $q$ is too narrow compared to $p$, high-probability states in $p$ are very unlikely to be proposed from $q$, and so again a very large number of samples will be needed to produce a good approximation. In practice, the importance sampling algorithm is widely used to approximate averages of some function $f(h)$ with state probabilities distributed according to $p(h \mid d)$. For example, when the function $f(h)$ is a utility function, the algorithm can be used to approximate expected utility. In our example of meeting a friend for a bite to eat, $f(h)$ is either the


longitude or latitude of the eating establishment $h$ (i.e., a hypothesis of where my friend might be). The sample-based approximation to the function average is simply the weighted average of the function values at the sampled hypotheses:

$$\mathbb{E}_{p}[f] \approx \sum_{i=1}^{N} w_i\, f(h_i) \quad (20.6)$$
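In code, the self-normalized scheme of Equations 20.3 to 20.6 is only a few lines. The sketch below is our own illustration, not the authors': the unnormalized target is an invented Gaussian (standing in for likelihood times prior), the proposal is a deliberately broad Gaussian, and the true function average is known to be 3, so the estimate can be checked.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

def p_star(h):
    """Unnormalized target p*(h | d): a Gaussian with mean 3, constant dropped."""
    return np.exp(-0.5 * (h - 3.0) ** 2)

q = norm(loc=0.0, scale=5.0)              # broad proposal distribution q(h)

h = q.rvs(size=5_000, random_state=rng)   # h_i ~ q(h)
w = p_star(h) / q.pdf(h)                  # unnormalized weights (Eq. 20.4)
w /= w.sum()                              # self-normalize so sum_i w_i = 1

f = lambda x: x                           # function whose average we want
print(np.sum(w * f(h)))                   # Eq. 20.6: approximates E_p[f] = 3
```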

In fact, this functional form (Equation 20.6) provides a generalization of the well-known exemplar model of categorization (Nosofsky, ). In an exemplar model, exemplars of each category are remembered, and the category label of a new exemplar is inferred from the similarity of the new exemplar to each of the old exemplars (Nosofsky, ). If the remembered exemplars are considered to be samples from a prior distribution (where the sampling is done by the environment rather than internally), and the similarity of each new exemplar to the old exemplars is encoded by the likelihood function, then the exemplar model can be written exactly as a likelihood-weighted importance sampler, providing a mathematical link between a sampling scheme and an empirically supported cognitive model (Shi et al., ).

While the link of the importance sampler to the exemplar model involves an unbiased use of samples, in other situations unbiased samples are not the most useful samples. When function averages are approximated by importance sampling from a limited number of samples, certain states matter more for the average and thus should be prioritized. Intuitively, states that are highly probable in $p$ and have more extreme values of $f$ should influence the average more strongly than other states. In fact, the optimal proposal distribution for calculating a function average has been proved to capture these intuitions (Geweke, ; Murphy, ):

$$q_{\text{optimal}}(h) = \frac{|f(h)|\, p(h \mid d)}{\sum_{h'} |f(h')|\, p(h' \mid d)} \quad (20.7)$$

Paradoxically, in order to use importance sampling optimally to approximate a function average, the optimal proposal distribution requires global knowledge of both $f$ and $p$.

20.3.2 Psychological Applications of the Importance Sampler

The optimality of preferentially sampling more extreme states in importance sampling has inspired a rational reinterpretation of people’s


overestimation in judging the frequency of extreme events (Lieder et al., ; Nobandegani et al., ). The overrepresentation of extreme events is often measured by observing people's risky choices in experiments in which the extremities and probabilities of outcomes are systematically manipulated (Lieder et al., ; Tversky & Kahneman, ). While the rational model of risky decision making, expected utility theory (von Neumann & Morgenstern, ), prescribes that people choose the option with the highest expected utility (given some axiomatic assumptions about the utility function), the computation of expected utility as a weighted average of the utility function is generally intractable. It can, however, be approximated by importance sampling using a utility-weighted sampler, which prioritizes eventualities according to their extremity and frequency: $q(h) \propto |u(h)|\, p(h \mid d)$ (Lieder et al., ; Nobandegani et al., ). If people are assumed to calculate expected utility with importance sampling, the optimal proposal distribution for doing so is determined by the extremity of the utility, meaning that people should oversample (relative to their probabilities) hypotheses with large positive or negative utilities: in essence, it is important to consider rare events if their consequences could be life-changing (in either a positive or negative sense). Such overrepresentation can then persist in judgments even when samples are reweighted as described above. This model shows how a rational agent who is bounded by computational resources and time in calculating exact expected utility (i.e., who can afford only small sample sizes) should overestimate extreme eventualities, because they are drawn from an importance distribution designed to efficiently estimate expected utility.

20.3.3 Particle Filters

The importance sampling algorithm is designed for situations in which there are no new observations about the environment (i.e., sampling from a fixed target distribution), but in many realistic scenarios, especially those highlighted in other chapters of this book, it is important to deal with incoming information (i.e., a constantly evolving target distribution). To draw an analogy with the example of meeting my friend for a bite to eat, we can now imagine that my friend is sending me a series of text messages: first telling me that they want a curry, then telling me that they don't want to spend too much money, and then telling me that they don't want to travel too far. Each of these texts conveys additional information that I will want to use to update the best location to tell my cab driver, but the other algorithms that we review in this chapter are ill-suited to inference of this


type because they need to start sampling from scratch whenever the posterior distribution changes. Luckily, importance sampling can be generalized to sample from changing posterior distributions by performing importance sampling sequentially. At its simplest, this involves updating the weight of each sampled hypothesis by multiplying it by the likelihood of that hypothesis producing the new incoming data, iteratively applying likelihood-weighted importance sampling. Sequential importance sampling is part of a family of algorithms known as particle filters, which, often for efficiency reasons, involve resampling the set of hypotheses from the weight distribution so that each resampled hypothesis has equal weight (Arulampalam et al., ; Doucet et al., ; Speekenbrink, ). In cognitive terms, the particle filter acts like an evolving set of hypotheses in the mind: hypotheses that better explain the observed data have a higher chance of surviving and multiplying, while hypotheses that poorly explain the observed data are likely to go extinct.
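As a toy version of the meetup story, the bootstrap-style particle filter below is our own sketch; the candidate venues and the likelihood of each text message are entirely made up. Each text reweights the particles, and resampling lets well-supported hypotheses multiply while poorly supported ones die out.

```python
import numpy as np

rng = np.random.default_rng(3)

n_particles = 1000
restaurants = np.arange(50)                    # hypothetical candidate venues

# Start with particles drawn from a (uniform) prior over venues.
particles = rng.choice(restaurants, size=n_particles)

def update(particles, likelihood_fn):
    """One step of sequential importance sampling with resampling."""
    w = likelihood_fn(particles)               # reweight by p(new text | h)
    w = w / w.sum()
    # Resample so all particles carry equal weight again; hypotheses that
    # explain the data multiply, poor ones go extinct.
    return rng.choice(particles, size=len(particles), p=w)

# Made-up likelihoods for the three texts: "curry", "cheap", "nearby".
for like in (lambda h: (h % 5 == 0) + 0.01,    # curry houses
             lambda h: (h < 25) + 0.01,        # cheap venues
             lambda h: (h > 10) + 0.01):       # not too far away
    particles = update(particles, like)

values, counts = np.unique(particles, return_counts=True)
print(values[np.argmax(counts)])               # best-supported meetup venue
```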

20.3.4 Psychological Applications of Particle Filters

Particle filters have been used to explain primacy effects found in human cognition, in which early data have an outsized influence. These primacy effects are often difficult to explain with Bayesian models, because the temporal structure of the data typically should not matter for the tasks people are tested on, but particle filters can produce them: the evolving set of hypothesis samples concentrates on initially promising hypotheses, and so can fail to retain any samples of hypotheses that should later dominate. For example, particle filters have been used to explain garden-path sentences such as "The women brought the sandwiches from the kitchen tripped," which may first lead the listener into thinking that the sandwiches were brought by the women, until the word "tripped" makes it clear that the sandwiches were brought to the women (Levy et al., ). Particle filters have also been used to explain serial dependence and working memory capacity in category learning (Lloyd et al., ; Sanborn et al., ), order effects in human causal learning (Abbott & Griffiths, ), classical conditioning in animal behaviors (Daw & Courville, ; Gershman et al., ), and decision making in changing environments (Brown & Steyvers, ; Prat-Carrabin et al., ; Yi et al., ).

.

Sampling with Local Knowledge

While approximate global knowledge significantly reduces computational costs, is it possible to sample with a lesser form of knowledge? What if


I don't know where my friend is likely to be, even approximately? I might only be able to easily calculate the unnormalized probabilities $p^{*}(h \mid d)$, and must somehow sample from the posterior distribution without access to the partition function $Z$. Here we introduce a general family of sampling algorithms that operate on this principle: Markov chain Monte Carlo (MCMC). The core idea in MCMC is to simulate the transitions of a Markov chain whose stationary distribution is proportional to $p(h \mid d)$. That is, we can sample recursively from a conditional probability $q(h' \mid h)$ (i.e., the transition probability of a Markov chain) for long enough that the stationary distribution of the Markov chain is proportional to $p(h \mid d)$. In our meetup example, this is similar to comparing the likelihood of two possible restaurants: we do not know the exact probability that my friend selected either location, but we do know which has the better chance and, with enough pairwise comparisons, can build an approximation of the underlying distribution, given certain assumptions (Tierney, ).

While a Markov chain converges to a unique stationary distribution, there are many ways to design the Markov chain's transitions, $q(h' \mid h)$, to do so while using only local knowledge of the target distribution. We define local knowledge of the distribution as the ability to determine the unnormalized probability $p^{*}(h \mid d)$ of any state; there is no requirement to store all of the $p^{*}(h \mid d)$, or even to store a list of all possible states $h$. It is only necessary to consider a very small number of states and unnormalized probabilities at any one time, and these can be forgotten following evaluation, which greatly increases the psychological plausibility of these sampling algorithms. Different $q(h' \mid h)$ suit different target-distribution geometries; as a result, the sampler dynamics vary among algorithms, and so do their psychological implications. In this section, we focus on the Random Walk Metropolis (RWM) algorithm, which has achieved empirical success in explaining aspects of human data, while also briefly mentioning related algorithms, including Gibbs sampling and Metropolis-coupled MCMC (MC³).

20.4.1 Random Walk Metropolis

A commonly used proposal distribution for MCMC is a Gaussian distribution centered on the current state: $q(h' \mid h) = \mathcal{N}(h';\, h, \sigma^{2})$.

(Footnote: Following our example of meeting a friend, the transition probability of the Markov chain can be thought of as the probability of thinking about the next restaurant based on the current restaurant. It is also possible that the next restaurant turns out to be the same as the current one.)


That is, new potential states are likely to fall close to the previous state. However, not every proposed state $h'$ will be automatically accepted; if the proposal is rejected, the chain remains in state $h$. There exist numerous rules for accepting or rejecting proposed states, whose ultimate objective is to approximate the probability distribution $p(h \mid d)$ by having the chain occupy each state for a proportion of time that matches the state's probability (Dellaportas & Roberts, ). The combination of a Gaussian proposal distribution and the Metropolis–Hastings acceptance rule is known as the Random Walk Metropolis (RWM) algorithm (Metropolis et al., ).

To solve the meeting-a-friend example with an RWM sampler, I should plan my direction of travel by first mentally comparing restaurants in London in a sequential fashion. I start with one restaurant and think about the unnormalized probability that my friend picked it. Then I think of a nearby restaurant and the unnormalized probability that my friend picked it instead. If the new restaurant has a higher chance of being picked by my friend than the last, then I focus on the new restaurant. Conversely, if the new restaurant has a lower chance than the last, then I stochastically decide whether to focus on this restaurant or on the previous one. Whichever restaurant I focus on, I again randomly think of one of its neighbors next. Surprisingly, the proportion of time I spend focusing on a restaurant gives an estimate of that establishment's probability of being the meetup place.

How does the Metropolis–Hastings acceptance rule guarantee that the sampler spends time in each state in proportion to the state's probability in the target distribution $p(h \mid d)$? Because we want a proposal distribution that does not depend on the target distribution (i.e., does not appeal to global knowledge), the sampler cannot know a priori how probable the proposed state $h'$ is in the target distribution. The sampler does know, however, whether the proposed state $h'$ is more probable or less probable than the current state $h$; that is, whether my friend is more likely to be at the new establishment. If the sampler greedily moved only to more probable states, it would mimic optimization algorithms that search for local optima. Our goal is not to find the maximum-probability state, however, but to draw representative samples from the target distribution; hence, unlike most optimization algorithms, the sampler has to move to less probable states occasionally. The quantity that governs the stochastic movement between proposed and current states is the relative probability of the two states, $p(h' \mid d)/p(h \mid d)$. The formal condition underlying the convergence of RWM is known as detailed balance (O'Hagan & Forster, ). In practice, the probability of accepting the proposed state is:

$$p_{\text{accept}} = \min\left(1,\ \frac{p(h' \mid d)\, q(h \mid h')}{p(h \mid d)\, q(h' \mid h)}\right) \quad (20.8)$$

In RWM, a symmetric Gaussian distribution is used as the proposal distribution, meaning that $q(h \mid h') = q(h' \mid h)$. Furthermore, the calculation of the relative probability of $h$ and $h'$ can be simplified using the unnormalized distribution, because the normalization constant cancels out:

$$\frac{p(h' \mid d)}{p(h \mid d)} = \frac{p^{*}(h' \mid d)/Z}{p^{*}(h \mid d)/Z} = \frac{p^{*}(h' \mid d)}{p^{*}(h \mid d)} \quad (20.9)$$

Metropolis–Hastings algorithms are useful in practice because we can now sample from $p(h \mid d)$ even if the normalization constant $Z$ is unknown: we do not need to know the probability of every eatery in London, only the relative chance of one location over another. For algorithms with a symmetric proposal distribution, such as the Random Walk Metropolis, the acceptance probability simplifies further:

$$p_{\text{accept}} = \min\left(1,\ \frac{p^{*}(h' \mid d)}{p^{*}(h \mid d)}\right) \quad (20.10)$$
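A minimal Random Walk Metropolis loop (our own sketch, with an invented unnormalized Gaussian target) shows how little the sampler needs: only the ratio in Equation 20.10, never $Z$.

```python
import numpy as np

rng = np.random.default_rng(4)

def p_star(h):
    """Unnormalized target p*(h | d); the normalizing constant is never needed."""
    return np.exp(-0.5 * (h - 3.0) ** 2)

h = 0.0                                # arbitrary starting point
sigma = 1.0                            # width of the Gaussian proposal
chain = []
for _ in range(20_000):
    h_new = rng.normal(h, sigma)       # symmetric proposal q(h' | h)
    # Metropolis acceptance rule (Eq. 20.10): always move uphill,
    # move downhill with probability p*(h' | d) / p*(h | d).
    if rng.random() < min(1.0, p_star(h_new) / p_star(h)):
        h = h_new
    chain.append(h)

print(np.mean(chain))                  # approximately 3, the target's mean
```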

20.4.2 Psychological Applications of the Random Walk Metropolis

RWM typically generates autocorrelated samples even when the target distribution remains unchanged. This is because the proposal distribution suggests new states within a local neighborhood of the current state (i.e., local jumps). As a result, RWM is sensitive to its starting point, which allows it to explain framing effects. For example, RWM has been used to explain the anchoring bias, in which people's estimates are biased toward a previously presented irrelevant quantity: though irrelevant to the subsequent task, people may use that quantity as the starting point for RWM. If only a few samples are generated, there will be only a few local jumps from the starting point; hence estimates are biased toward the starting point (Lieder et al., ). The unpacking effect can also be explained by a biased starting point. Depending on the examples that are unpacked, people judge the probability of an unpacked event (e.g., "death from a heart attack, cancer, or other natural causes") differently from the probability of an equivalent packed one (e.g., "death from natural causes"), despite probability theory requiring the two to be the same (Dasgupta et al., ; Fox & Tversky, ; Sloman et al., ). Empirically, unpacking to typical examples increases probability judgments (i.e., subadditivity), while unpacking to


atypical examples decreases probability judgments (i.e., superadditivity). In RWM, due to local search and autocorrelation, a chain starting in a high-probability region will likely miss atypical examples, while one starting in a low-probability region will likely overestimate atypical examples, explaining these unpacking effects as a starting-point bias (Dasgupta et al., ). In addition to these two examples, the starting point has been used to explain a range of other biases, including the observed autocorrelation in human hypothesis generation (Bonawitz et al., ; Dasgupta et al., ; Gershman et al., ; Sanborn & Chater, ; Vul & Pashler, ), which we discuss further in Chapter .

20.4.3 Gibbs Sampling

When the probability distribution of one variable conditioned on the values of all the other variables is relatively simple to express, it can be more efficient to use that conditional probability as the proposal distribution. Gibbs sampling implements exactly this idea: the transition probability $q(h' \mid h)$ is defined as the probability of one variable given fixed values of all of the other variables (Geman & Geman, ). The conditional probability is often easy to calculate in probabilistic graphical models, where conditional dependencies are explicitly expressed in a graph (Koller & Friedman, ). If we consider the meeting-a-friend example again, instead of thinking of new eating establishments based on their straight-line distance from my current focus, I instead think of possible meetup places along the longitudes and latitudes of London iteratively. Making the unrealistic assumption that London is laid out on a grid, this would be like sampling among all the eating establishments on a single (either north–south or east–west) road. For instance, I fix a latitude first, and then randomly sample an eatery along it (i.e., a longitude). Then, fixing that sampled longitude, I randomly sample an eatery along it (i.e., a new latitude). By iteratively swapping back and forth, I can gradually build up my estimate of the probability of each establishment being the actual meetup place. In this example, one iteration of Gibbs sampling could be as follows:

$$h_x^{(i)} \sim p\left(h_x \mid h_y^{(i-1)}, d\right), \qquad h_y^{(i)} \sim p\left(h_y \mid h_x^{(i)}, d\right) \quad (20.11)$$

where $h_x$ and $h_y$ are the longitude and latitude of the restaurant $h$. It is straightforward to see that we are effectively sampling from the joint


distribution, $p(h \mid d) = p(h_x \mid h_y, d)\, p(h_y \mid d)$, once the conditioned-on latitudes (i.e., the values of $h_y$; here $h_x$ and $h_y$ are interchangeable) have been sufficiently explored.
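The sketch below is our own illustration of the Gibbs iteration in Equation 20.11. A correlated bivariate Gaussian stands in for the distribution over longitudes and latitudes, purely because its conditional distributions have a simple closed form.

```python
import numpy as np

rng = np.random.default_rng(5)

rho = 0.8                      # correlation between h_x and h_y in the target
hx, hy = 0.0, 0.0
samples = []
for _ in range(10_000):
    # Eq. 20.11: sample each coordinate conditional on the other.
    # For a standard bivariate Gaussian, h_x | h_y ~ N(rho * h_y, 1 - rho^2).
    hx = rng.normal(rho * hy, np.sqrt(1 - rho ** 2))
    hy = rng.normal(rho * hx, np.sqrt(1 - rho ** 2))
    samples.append((hx, hy))

samples = np.array(samples)
print(np.corrcoef(samples.T)[0, 1])   # approximately rho, as in the target
```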

20.4.4 Psychological Applications of Gibbs Sampling

Causal relationships are often represented as a set of boxes and arrows, with the boxes representing variables and the arrows indicating how variables causally influence one another. Gibbs sampling is often used for performing approximate inference in such models because it is relatively easy to sample the value of one variable given fixed values of all of the other variables. Interestingly, the dynamics of Gibbs sampling match the sequential belief changes of human participants in online causal learning experiments, in which participants are asked to report their causal model after each observation (Bramley et al., ). More specifically, when asked to identify the true causal model, people show a strong tendency to identify a model that differs in only a single aspect from the causal model they identified on the previous trial. That is, people appear to maintain a global hypothesis about the causal model that is updated by making local changes. This behavior matches Gibbs sampling, which focuses on one specific aspect of the causal model at a time and updates it conditional on the rest.

20.4.5 Metropolis-Coupled MCMC (MC³)

While the Random Walk Metropolis and Gibbs samplers can be successful in exploring simple distributions, these algorithms can have issues with more complex spaces. A key example is provided by multimodal distributions: new states are unlikely to fall far from the current state, meaning the RWM sampler will have difficulty finding distant modes, particularly if those modes are separated by low-probability regions. Returning to our example of meeting my friend, if I start out sampling a restaurant that is surrounded by only steak houses then, knowing that my friend wants a curry, I might never sample beyond that initial restaurant and incorrectly conclude that this is where my friend must end up. To allow a sampler to escape from a single mode, one strategy is to "melt" the target distribution to make the differences between probabilistic peaks and valleys less extreme. In doing so, the sampler can quickly traverse the valleys between modes (low-probability regions), making remote modes easier to sample from. In statistics, the process



of smoothing out a distribution in this way is accomplished by controlling the computational temperature of the distribution. Increasing the temperature of a distribution leads to a flatter probabilistic landscape; the distribution converges to a flat, uniform distribution as the temperature approaches infinity. The distorted distribution, though easier to explore, is no longer the original distribution, so the computational temperature, initially increased in order to smooth out the distribution, must be decreased back to 1 to ensure sampling from the target distribution. This process of heating and cooling has been implemented both serially and in parallel. The serial implementation is known in statistics as simulated annealing (Kirkpatrick et al., ); this algorithm intuitively resembles the way that metals are forged by repeated heating and cooling in order to align the molecules. The parallel implementation, which will be our focus, is known as Metropolis-coupled MCMC (MC³), or parallel tempering, in statistics (Earl & Deem, ; Geyer, ). This algorithm maintains multiple parallel Markov chains at different temperatures and allows communication among the chains. The higher-temperature chains, which readily make long-distance moves across the probabilistic landscape, assist the lower-temperature chains in making nonlocal jumps.
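Below is our own two-chain parallel-tempering sketch on an invented bimodal target. Each chain runs Random Walk Metropolis on the tempered density $p^{*}(h)^{1/T}$, and occasional state swaps are accepted with the standard Metropolis rule for exchanges, letting the cold chain inherit the hot chain's long-distance moves.

```python
import numpy as np

rng = np.random.default_rng(6)

def p_star(h):
    """Invented bimodal unnormalized target with modes at -4 and +4."""
    return np.exp(-0.5 * (h - 4) ** 2) + np.exp(-0.5 * (h + 4) ** 2)

temps = np.array([1.0, 8.0])        # cold chain and one hot chain
h = np.zeros(2)                     # current state of each chain
cold_chain = []
for step in range(50_000):
    # Within-chain Random Walk Metropolis on the tempered density.
    for i, T in enumerate(temps):
        prop = rng.normal(h[i], 1.0)
        if rng.random() < min(1.0, (p_star(prop) / p_star(h[i])) ** (1 / T)):
            h[i] = prop
    # Occasionally propose swapping the states of the two chains.
    if step % 10 == 0:
        ratio = ((p_star(h[1]) / p_star(h[0])) ** (1 / temps[0]) *
                 (p_star(h[0]) / p_star(h[1])) ** (1 / temps[1]))
        if rng.random() < min(1.0, ratio):
            h[0], h[1] = h[1], h[0]
    cold_chain.append(h[0])

# The cold chain visits both modes; a lone RWM chain can get stuck in one.
print(np.mean(np.array(cold_chain) > 0))   # roughly 0.5
```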

20.4.6 Psychological Applications of MC³

The sampler dynamics of MC³ differ from those of RWM because of the swapping between chains: a swap often induces a long-distance, nonlocal jump. As a result, the cold chain will perform local jumps on most occasions, but may intermittently swap to better locations via nonlocal jumps. The overall distribution of jump distances can thus resemble a power-law distribution, which is associated with Lévy flights in the animal and cognitive foraging literature, and which has also been observed in people "internally foraging" for the names of animals (Todd & Hills, ; Rhodes & Turvey, ). For the same reason, MC³ also exhibits long-range, non-Markovian dependencies within each individual chain. This serial dependence can be characterized as 1/f noise, a type of long-range dependence that is ubiquitous in cognitive time series (Gilden, ; Gilden et al., ; Zhu et al., ).

20.5 Discussion and Conclusions

Sampling algorithms are methods for generating examples to approximate a probability distribution about which little might be known in advance. We have reviewed a number of algorithms here and discussed their


psychological plausibility and how they have been deployed by researchers to explain how internal samples are generated. While many algorithms are effective only when sampling from an unchanging distribution, some, such as particle filters, are able to operate as information is obtained from the outside environment, as in information sampling.

The juxtaposition of information and internal sampling in this book suggests an interesting prospect for investigation. Perhaps the sampling algorithms that we have outlined here are not just used internally, as in sampling from memory, but are also used externally for information sampling. For instance, direct sampling, in combination with a greedy policy that selects the best action given the sample, has been widely applied to multi-armed bandit problems. The combined algorithm, famously known as Thompson sampling, balances exploitation of the current best alternative with exploration of potentially better alternatives in a way that is optimal for the simple bandit task and many other sequential decision-making problems (Russo et al., ; Thompson, ). Human behavior in these sequential tasks, to some extent, also corresponds to Thompson sampling (Gershman, ; Speekenbrink & Konstantinidis, ). Finally, moving beyond the behavior of individuals, there is a long tradition in economics of seeing trading in markets, and especially in betting and financial markets, as implementing a form of distributed computation aggregating the knowledge of distinct rational (and sometimes boundedly rational) agents (e.g., Bowles, Kirman, & Sethi, ). It is also an intriguing possibility that individual agents viewed as samplers may, when able to interact with each other (whether through copying, communication, or trade), in aggregate be viewed as carrying out distributed Bayesian computation by sampling (see, e.g., Krafft et al., ).

References

Abbott, J. T., & Griffiths, T. L. (). Exploring the influence of particle filter parameters on order effects in causal learning. Proceedings of the Annual Meeting of the Cognitive Science Society, , –.
Arulampalam, M. S., Maskell, S., Gordon, N., & Clapp, T. (). A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing, (), –.
Battaglia, P. W., Hamrick, J. B., & Tenenbaum, J. B. (). Simulation as an engine of physical scene understanding. Proceedings of the National Academy of Sciences, (), –.
Blurton, S. P., Kyllingsbæk, S., Nielsen, C. S., & Bundesen, C. (). A Poisson random walk model of response times. Psychological Review, (), .


Bonawitz, E., Denison, S., Griffiths, T. L., & Gopnik, A. (). Probabilistic models, learning algorithms, and response variability: Sampling in cognitive development. Trends in Cognitive Sciences, (), –.
Bowles, S., Kirman, A., & Sethi, R. (). Retrospectives: Friedrich Hayek and the market algorithm. Journal of Economic Perspectives, (), –.
Bramley, N. R., Dayan, P., Griffiths, T. L., & Lagnado, D. A. (). Formalizing Neurath's ship: Approximate algorithms for online causal learning. Psychological Review, (), .
Brown, G. D., Neath, I., & Chater, N. (). A temporal ratio model of memory. Psychological Review, (), –.
Brown, S. D., & Steyvers, M. (). Detecting and predicting changes. Cognitive Psychology, (), –.
Chater, N., & Manning, C. D. (). Probabilistic models of language processing and acquisition. Trends in Cognitive Sciences, (), –.
Dasgupta, I., Schulz, E., & Gershman, S. J. (). Where do hypotheses come from? Cognitive Psychology, , –.
Daw, N., & Courville, A. (). The pigeon as particle filter. Advances in Neural Information Processing Systems, , –.
Dellaportas, P., & Roberts, G. O. (). An introduction to MCMC. In Spatial statistics and computational methods (pp. –). New York: Springer.
Doucet, A., De Freitas, N., & Gordon, N. (). An introduction to sequential Monte Carlo methods. In Sequential Monte Carlo methods in practice (pp. –). New York: Springer.
Earl, D. J., & Deem, M. W. (). Parallel tempering: Theory, applications, and new perspectives. Physical Chemistry Chemical Physics, (), –.
Fox, C. R., & Tversky, A. (). A belief-based account of decision under uncertainty. Management Science, , –.
Geman, S., & Geman, D. (). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, (), –.
Gershman, S. J. (). Uncertainty and exploration. Decision, (), .
Gershman, S. J., Blei, D. M., & Niv, Y. (). Context, learning, and extinction. Psychological Review, (), .
Gershman, S. J., Vul, E., & Tenenbaum, J. B. (). Multistability and perceptual inference. Neural Computation, (), –.
Geweke, J. (). Bayesian inference in econometric models using Monte Carlo integration. Econometrica, (), –.
Geyer, C. J. (). Markov chain Monte Carlo maximum likelihood. In Keramidas (Ed.), Computing Science and Statistics: Proceedings of the rd Symposium on the Interface (pp. –). Fairfax Station: Interface Foundation.
Gilden, D. L. (). Cognitive emissions of 1/f noise. Psychological Review, (), .


Gilden, D. L., Thornton, T., & Mallon, M. W. (). 1/f noise in human cognition. Science, (), –.
Gittins, J., & Jones, D. (). A dynamic allocation index for the discounted multiarmed bandit problem. Biometrika, (), –.
Griffiths, T. L., & Tenenbaum, J. B. (). Optimal predictions in everyday cognition. Psychological Science, (), –.
Griffiths, T. L., Vul, E., & Sanborn, A. N. (). Bridging levels of analysis for probabilistic models of cognition. Current Directions in Psychological Science, (), –.
Kahneman, D., & Tversky, A. (). Subjective probability: A judgment of representativeness. Cognitive Psychology, (), –.
Koller, D., & Friedman, N. (). Probabilistic graphical models: Principles and techniques. Cambridge, MA: MIT Press.
Krafft, P. M., Shmueli, E., Griffiths, T. L., & Tenenbaum, J. B. (). Bayesian collective learning emerges from heuristic social learning. Cognition, , .
Kuhn, T. (). The structure of scientific revolutions. Chicago: University of Chicago Press.
Levy, R. P., Reali, F., & Griffiths, T. L. (). Modeling the effects of memory on human online sentence processing with particle filters. In D. Koller, D. Schuurmans, Y. Bengio, & L. Bottou (Eds.), Advances in neural information processing systems  (pp. –).
Lieder, F., Griffiths, T. L., & Hsu, M. (). Overrepresentation of extreme events in decision making reflects rational use of cognitive resources. Psychological Review, (), .
Lloyd, K., Sanborn, A., Leslie, D., & Lewandowsky, S. (). Why higher working memory capacity may help you learn: Sampling, search, and degrees of approximation. Cognitive Science, (), e.
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. (). Equation of state calculations by fast computing machines. Journal of Chemical Physics, (), –.
Mosteller, F., & Nogee, P. (). An experimental measurement of utility. Journal of Political Economy, (), –.
Murphy, K. P. (). Machine learning: A probabilistic perspective. Cambridge, MA: MIT Press.
Nobandegani, A. S., Castanheira, K. D. S., Otto, A. R., & Shultz, T. R. (). Over-representation of extreme events in decision-making: A rational metacognitive account. arXiv preprint arXiv:..
Nosofsky, R. M. (). Choice, similarity, and the context theory of classification. Journal of Experimental Psychology: Learning, Memory, and Cognition, (), –.
Nosofsky, R. M. (). The generalized context model: An exemplar model of classification. In E. M. Pothos & A. J. Wills (Eds.), Formal approaches in categorization (pp. –). Cambridge: Cambridge University Press.


O'Hagan, A., & Forster, J. J. (). Kendall's advanced theory of statistics: Bayesian inference (Vol. B). London: Arnold.
Pleskac, T. J., & Busemeyer, J. R. (). Two-stage dynamic signal detection: A theory of choice, decision time, and confidence. Psychological Review, (), .
Prat-Carrabin, A., Wilson, R. C., Cohen, J. D., & Azeredo da Silveira, R. (). Human inference in changing environments with temporal structure. Psychological Review, (), –.
Raaijmakers, J. G., & Shiffrin, R. M. (). Search of associative memory. Psychological Review, (), –.
Ratcliff, R., & Rouder, J. N. (). Modeling response times for two-choice decisions. Psychological Science, (), –.
Rhodes, T., & Turvey, M. T. (). Human memory retrieval as Lévy foraging. Physica A: Statistical Mechanics and its Applications, (), –.
Roth, D. (). On the hardness of approximate reasoning. Artificial Intelligence, (–), –.
Russo, D. J., Van Roy, B., Kazerouni, A., Osband, I., & Wen, Z. (). A tutorial on Thompson sampling. Foundations and Trends in Machine Learning, (), –.
Sanborn, A. N., & Chater, N. (). Bayesian brains without probabilities. Trends in Cognitive Sciences, (), –.
Sanborn, A. N., Griffiths, T. L., & Navarro, D. J. (). Rational approximations to rational models: Alternative algorithms for category learning. Psychological Review, (), .
Sanborn, A. N., Mansinghka, V. K., & Griffiths, T. L. (). Reconciling intuitive physics and Newtonian mechanics for colliding objects. Psychological Review, (), .
Shadlen, M. N., & Shohamy, D. (). Decision making and sequential sampling from memory. Neuron, (), –.
Shi, L., Griffiths, T. L., Feldman, N. H., & Sanborn, A. N. (). Exemplar models as a mechanism for performing Bayesian inference. Psychonomic Bulletin & Review, (), –.
Sloman, S., Rottenstreich, Y., Wisniewski, E., Hadjichristidis, C., & Fox, C. R. (). Typical versus atypical unpacking and superadditive probability judgment. Journal of Experimental Psychology: Learning, Memory, and Cognition, , –.
Speekenbrink, M. (). A tutorial on particle filters. Journal of Mathematical Psychology, , –.
Speekenbrink, M., & Konstantinidis, E. (). Uncertainty and exploration in a restless bandit problem. Topics in Cognitive Science, (), –.
Spicer, J., Mullett, T. L., & Sanborn, A. N. (, December ). Repeated risky choices become more consistent with themselves but not expected value, with no effect of trial order. https://doi.org/./osf.io/jgefr
Stigler, S. M. (). The history of statistics: The measurement of uncertainty before . Cambridge, MA: Harvard University Press.


Thompson, W. R. (). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, (–), –.
Tierney, L. (). Markov chains for exploring posterior distributions. Annals of Statistics, –.
Todd, P. M., & Hills, T. T. (). Foraging in mind. Current Directions in Psychological Science, (), –.
Tversky, A., & Kahneman, D. (). Availability: A heuristic for judging frequency and probability. Cognitive Psychology, (), –.
Usher, M., & McClelland, J. L. (). The time course of perceptual choice: The leaky, competing accumulator model. Psychological Review, (), .
Van Rooij, I. (). The tractable cognition thesis. Cognitive Science, (), –.
Von Neumann, J., & Morgenstern, O. (). Theory of games and economic behavior (commemorative edition). Princeton: Princeton University Press.
Vul, E., Goodman, N., Griffiths, T. L., & Tenenbaum, J. B. (). One and done? Optimal decisions from very few samples. Cognitive Science, (), –.
Vul, E., & Pashler, H. (). Measuring the crowd within: Probabilistic representations within individuals. Psychological Science, (), –.
Vulkan, N. (). An economist's perspective on probability matching. Journal of Economic Surveys, (), –.
Wagenaar, W. A. (). Generation of random sequences by human subjects: A critical survey of literature. Psychological Bulletin, (), –.
Wolpert, D. M. (). Probabilistic models in human sensorimotor control. Human Movement Science, (), –.
Yi, M. S., Steyvers, M., & Lee, M. (). Modeling human performance in restless bandits with particle filters. Journal of Problem Solving, (), .
Zhu, J.-Q., Sanborn, A., & Chater, N. (). Mental sampling in multimodal representations. Advances in Neural Information Processing Systems, , –.


21 Approximating Bayesian Inference through Internal Sampling

Joakim Sundh, Adam Sanborn, Jian-Qiao Zhu, Jake Spicer, Pablo León-Villagrá, and Nick Chater

(Sundh, Sanborn, Zhu, Spicer, and León-Villagrá were supported by a European Research Council consolidator grant (-SAMPLING).)

21.1 Introduction

Humans frequently make judgments and decisions regarding present or future states of a world of which they are at least partially ignorant; in psychology this is generally referred to as probability judgment and decision making under risk. Considering that there is little that is certain in life, we can reasonably assume that humanity would have to be equipped with some mechanism for making such inferences, ideally as quickly and accurately as possible, or we would have been hard-pressed to survive this long. Nevertheless, people regularly and systematically make simple mistakes (e.g., judgment biases) that, on closer inspection, seem rather obvious even without any formal schooling in either probability theory or psychology.

Many of these biases can be explained by the fact that the information we receive from the world is itself biased. The world we live in is large and complex, and we are often aware of only a limited part of it at a time, either because only a small portion of information is available to us or because we do not have the time or capacity to enumerate the totality of our surroundings. Instead, we typically experience only a few glimpses of the world around us, which we usually refer to as samples, and if those samples are biased in some way, then the judgments and decisions that we make based on that information are likely to be biased as well. It is important to note that this will be the case even if the observer has a complete model of how information is generated and processes it flawlessly (e.g., Konovalova & Le Mens, ; Le Mens & Denrell, ).

Yet, there remain certain biases that cannot be explained by the structure of the environment or by biases in the information-gathering process,






primarily those that demonstrate incoherence in human cognition. Coherence implies adherence to the principles of logic and probability theory (e.g., Kolmogorov’s axioms), which will necessarily be reflected in the information sampled from the world. For example, probability theory tells us that the probability of an event occurring plus the probability of an event not occurring will be equal to one (i.e., P(A) + P(not-A) ¼ ); conversely, if we count the number of rainy days during a week, then the number of rainy days plus the number of days without rain will obviously equal the total number of days of the week, regardless of whether our sample is representative of the average precipitation in our area or not. Therefore, since information sampled from the environment will necessarily be coherent according to the theorems of probability theory, the conclusions of a perfectly Bayesian mind should be coherent as well. Of course, perfect Bayesian accounts of actual cognitive processes are computationally intractable (e.g., van Rooij, ; van Rooij & Wareham, ), since the calculations involved in modeling every single possible state of any system relating to our physical environment would be overwhelmingly complex. Consequently, we must assume that the mind is limited to, at best, approximating the Bayesian solutions to inference problems. Fortunately, we can extend the information-sampling paradigm to apply to the internal workings of the mind as well. In this case, we assume that, rather than explicitly calculating optimal Bayesian solutions, the brain approximates these solutions by considering a small number of samples from relevant distributions. While Bayesian calculation is difficult, sampling is often surprisingly easy; even when it is not possible to represent a distribution analytically it might be possible to sample from it, which is why sampling is often used as a tool in computational statistics and machine learning. Additionally, these sampling processes can explain many apparent idiosyncrasies in human probabilistic inference, including the aforementioned incoherence biases, because of a combination of internal sampling with compensatory strategies for small samples and computationally rational sampling algorithms. (For an in-depth introduction to the algorithms by which these sampling processes are realized, see Chapter  in this volume.)

.

Approximating Bayesian Inference by Sampling

The internal sampling processes that we are discussing here differ from the process of sampling information from the environment (which, for clarity, we will refer to as “external sampling”) in that we are primarily concerned


[Figure 21.1 is a schematic diagram; its labeled elements are: Environment, Samples, Internal distribution, Internal samples, and Inference.]

Figure 21.1. Schematic illustration of the internal sampling process: Information is sampled from the environment, which shapes an internal distribution. Inferences are based on a small number of samples drawn from the internal distribution.

with sampling taking place within the mind itself. This is not to say that internal and external sampling processes do not each contribute to various quirks in human behavior but, as previously observed, the way internal sampling is realized can explain certain incoherence biases that external sampling cannot. It is clear that these two types of sampling interact, since much of the information one samples from the mind will have been influenced by what has been sampled from the environment (see the Conclusions section for a brief discussion of this point), but for simplicity we will limit ourselves to internal sampling in this chapter.

The basic principle of internal sampling is that for each inference a number of samples are drawn from an internal distribution, which is in turn based on information sampled from the environment (see Figure 21.1). As an illustration, let us consider the standard statistical example of estimating the proportion of blue and red balls in an urn. As usual, one can sample balls from the urn and make a frequency judgment based on the proportion of blue and red balls in the sample. But we need not assume that the information associated with the sampled balls is discarded after being sampled; rather, we can assume that it contributes to the basis of a second "internal" urn. The obvious advantage of the internal urn is that whenever one is subsequently obliged to make a judgment regarding the proportion of blue and red balls in the external urn, one can use the internal urn as a proxy. Sampling has the desirable quality that, in the limit (i.e., with an infinite number of samples), it will approach the "true" distribution. Of course, we




Of course, we cannot expect the brain to process an infinite number of samples, nor necessarily a large one. Human cognition is inevitably subject to computational constraints, implying an upper bound on human rational inference, usually referred to as bounded rationality (Gigerenzer & Selten, ; Lieder & Griffiths, ; Simon, ). It is, therefore, more reasonable to assume that people only use enough samples to make "good enough" inferences about the world. Evidence further suggests that rational inferences might indeed require only very few samples, even as few as a single one in some situations (Vul et al., ). The phenomenon of probability matching illustrates this process nicely. Research has shown that, when choosing among alternatives with stochastic payoffs, people tend to choose alternatives in proportion to the probability of their payoffs rather than sticking with the alternative that would maximize their expected outcome, even when they have reliable knowledge of the outcome distributions (e.g., Koehler & James, ). If we assume that, for each decision, one draws a limited number of samples from an internal distribution (i.e., the internal urn), then alternatives will be chosen according to their proportions in this distribution, which, assuming that the external sampling process is unbiased, will match the proportions of the environment (i.e., the external urn).
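To make this concrete, the following minimal simulation is our own illustration (the payoff probability and sample counts are arbitrary): an agent whose internal urn faithfully reflects an environment in which option A pays off on 70 percent of trials, and who chooses by drawing a handful of internal samples.

```python
import random

# Internal "urn": past outcomes stored without bias from an environment
# where option A pays off on 70% of trials (an illustrative assumption).
internal_urn = ["A" if random.random() < 0.7 else "B" for _ in range(1000)]

def choose(n_samples=1):
    """Draw a few samples from the internal urn and choose the option
    that occurs most often among them."""
    draws = [random.choice(internal_urn) for _ in range(n_samples)]
    return max(set(draws), key=draws.count)

for n in (1, 5, 25):
    choices = [choose(n) for _ in range(10_000)]
    print(n, choices.count("A") / len(choices))
# With n = 1 the agent matches probabilities (~0.70 choices of A);
# as n grows, choices approach maximizing (~1.0).
```

The only assumption doing real work here is that each choice is based on a small number of draws from an unbiased internal urn; probability matching then follows without any further machinery.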


There are empirical as well as methodological hurdles for the internal sampling account to overcome. From the empirical perspective, there are a number of systematic biases, primarily those that imply incoherence in human probabilistic inference, that cannot be explained by sampling alone. From the methodological perspective, a small number of samples might be enough to make rational decisions in many cases, but they often create obvious inaccuracies when making judgments. Additionally, although sampling is usually computationally easy, it might not be possible to represent the relevant distribution to sample from, meaning that independent and identically distributed sampling is not possible. Fortunately, as we will demonstrate in the following sections, we can find solutions to the empirical hurdles by solving the methodological ones. First, we will describe how Bayesian adjustment for a small number of samples, expressed by the Bayesian sampler model's prior on responses, can account for incoherence biases such as subadditivity and the conjunction fallacy, as well as repulsion and representativeness effects. Second, we will describe how sampling algorithms that do not require global knowledge of the sampling space can explain heavy-tailed changes and autocorrelations in human cognition, as well as supply an additional account of subadditivity, anchoring, and representativeness. We will see that both of these mechanisms generate departures from standard probability theory: the prior on responses through the small number of samples and how this is compensated for, and the alternative sampling methods through the introduction of correlations between samples and the consequent biases. These mechanisms can provide explanations of various classic heuristics and biases in the literature on judgment and decision making (see Table 21.1) and indeed, as we will see, the same phenomena can often be generated by both types of mechanisms.

21.2.1 A Prior on Responses: Adjustment for Limited Samples

As previously described, internal sampling can explain the stochasticity of individual human probabilistic inference but cannot in itself explain biases that imply that people's judgments and decisions are incoherent on average. First, people's probability judgments tend to be subadditive, meaning that when making separate probability judgments of the components of a category (e.g., "rain," "snow," or "any other type of precipitation"), the sum of these separate judgments is higher than the judgment of the category itself (e.g., "any type of precipitation"; e.g., Redelmeier et al., ). Second, people tend to commit conjunction fallacies, judging the probability of the conjunction of two (or more) events (e.g., rainy and windy weather) as higher than that of either of the individual events (e.g., rainy weather; e.g., Tentori et al., ; Tversky & Kahneman, ). Internal sampling can potentially result in subadditivity and conjunction fallacies for individual judgments solely on account of the variability inherent in a small number of samples, but if the samples are representative of the internal distribution, then the average of repeated sampling will approach the average of the distribution, meaning that people's judgments should be unbiased on average. If people compensate for limited samples in a Bayesian manner, however, then biases such as these are unavoidable, and some compensation is often necessary in order to avoid obviously inaccurate inferences. For example, if one drew one blue ball from an urn with an unknown proportion of blue and red balls, then one would most likely not conclude that the urn contained only blue balls based on that evidence alone. Similarly, if one flipped a coin once and it came up heads, then one would certainly not conclude that the coin will always come up heads. Rather, assuming that any combination of red or blue balls in the urn is equally likely (i.e., a uniform prior), the Bayesian estimate is a proportion of .67 blue balls. As for the coin, we usually have a strong prior that the probability of heads is .5, so we would reasonably require a rather long run of heads to change our opinion.




Table .. Examples of how various psychological phenomena are explained either by a prior on responses or alternative sampling algorithms. Phenomena

Mechanism

Subadditivity

Explicit subadditivity: Each judgment is adjusted, so the sum of a number of judgments of the components of a category will be adjusted more than a single judgment of the category (PR) Implicit subadditivity/superadditivity: Combined judgments of common components of a category are judged as more likely than the category, while combined judgments of uncommon components of a category are judged as less likely than the category (SA) Conjunctions between events are based on fewer samples than individual events, and therefore judgments of conjunctions are adjusted more (PR) If sampling starts in a rich region for the individual events, more occurrences of the individual events are likely to be found and therefore they are likely to be judged as more likely (SA) Gradually exploring the sample space means that responses will be biased by the starting point of the sampling processes (SA) Optimal stopping implies that sampling is more likely to stop when evidence favors a particular decision, shifting judgments away from the boundary (PR) Counting sufficiently similar “near misses” (i.e., approximate Bayesian computation) will bias judgments toward alternatives that are similar to available samples (PR) Sampling algorithms moving through the distribution with momentum will struggle with generating truly random sequences (SA) Heavy-tailed changes arise from the MC sampling algorithm (Metropolis-coupled Markov chain Monte Carlo), in which several autocorrelated samplers run in parallel, at different levels of noise, with the possibility of switching the lowest

Conjunction fallacy

Anchoring

Repulsion

Representativeness

Random number generation

Heavy-tailed changes in individual aggregate behavior (e.g., potentially in artificial and real financial markets)

https://doi.org/10.1017/9781009002042.027 Published online by Cambridge University Press



 ,    . Table .. (cont.)

Phenomena

Mechanism noise level (the lowest temperature) to the current best guess (SA; see Chapter  for further details) Changes in behavioral measures, from rhythmic tapping (Gilden, Thornton & Mallon, ), to reaction times in standard psychological experiments (van Orden et al., ) often show characteristic autocorrelations at all scales over time; MC sampling generates this pattern, but most sampling algorithms, and indeed other types of cognitive models, do not (SA)

/f noise

PR ¼ Prior on Responses; SA ¼ Sampling Algorithms

The Bayesian sampler model (Zhu et al., ) formalizes this intuition with respect to human probability judgments. This model predicts that, for each probability judgment, a small number of instances are sampled from an internal representation, for example by drawing them from long-term memory or performing mental simulation, and the judgment is based on the proportion of outcomes in the sample after being adjusted according to a prior on responses. Given a symmetric prior expressed by the beta distribution Beta(β, β), the average judgment of the Bayesian sampler is

$$E\big[\hat{P}(A)\big] = \frac{N}{N + 2\beta}\, P(A) + \frac{\beta}{N + 2\beta} \qquad (21.1)$$

We can see that when the number of samples N increases, the first term's coefficient approaches one and the second term approaches zero, while when the prior parameter β increases, the first term approaches zero and the second term approaches .5. Therefore, the Bayesian sampler predicts that, for a limited number of samples, human probability judgments will be adjusted toward the middle of the probability scale. This type of conservatism is indeed what we see in human behavior; as a rule, people's probability judgments tend to be less extreme than one would expect (Edwards, ; Erev et al., ; Hilbert, ; Peterson & Beach, ).
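Equation (21.1) is straightforward to compute directly. The sketch below is a transcription of it (the particular values of N and β are arbitrary illustrative choices) and shows the predicted conservatism: judgments near the endpoints of the scale are pulled toward .5 the most.

```python
def bayesian_sampler_mean(p, n, beta):
    """Expected probability judgment under Equation (21.1): the sample
    proportion p is shrunk toward 0.5 by the symmetric Beta(beta, beta)
    prior on responses."""
    return (n / (n + 2 * beta)) * p + beta / (n + 2 * beta)

for p in (0.05, 0.25, 0.50, 0.75, 0.95):
    print(p, round(bayesian_sampler_mean(p, n=5, beta=1), 3))
# Extreme probabilities are adjusted most: e.g., 0.05 -> 0.179,
# while 0.50 stays at 0.500.
```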




The Bayesian sampler can explain the subadditivity effect described above by predicting that samples of lower (or higher) probabilities are adjusted more than probabilities close to .5. Because the components of a category are necessarily less probable than the category itself, the probabilities of each of them are likely to be more heavily adjusted than the probability of the category as a whole. The sum of the individual adjustments of the components is larger than the single adjustment of the category, which will in turn result in subadditivity. The conjunction fallacy, on the other hand, can be explained if we assume that fewer samples are used when judging the probabilities of conjunctions than when judging the probabilities of single events (a reasonable assumption, since conjunctions are more complex than individual events and therefore presumably more difficult to sample). In this case, conjunctions will be adjusted more than individual events due to the smaller number of samples, and conjunction fallacies will result, given that the probabilities of the conjunctions and the individual events are close enough to each other. The Bayesian sampler thus differs from some other sampling-based models in that biases are not due to a "naive" or "myopic" interpretation of small samples (e.g., Juslin et al., ) but rather due to a well-founded correction process that will, in the long run, improve accuracy. This does not necessarily imply that we can expect equivalent correction processes in all sampling-based inferences; probability judgments in particular allow for a correction process that is both intuitively and mathematically accessible due to the application of a Beta prior, which might not always be the case. For example, naive use of small samples causes confidence intervals to be too narrow, but it is difficult to correct for this without knowing the form of the distribution. Nevertheless, the Bayesian sampler invites us to view traditional sampling models in a new light and ask ourselves what types of corrections (if any) people make in other situations. Ultimately, although beyond the scope of this chapter, applying a Bayesian perspective to extant models of internal as well as external sampling might allow for many new insights.
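Reusing Equation (21.1), a short numerical sketch (the probabilities and sample sizes are invented for illustration) shows both effects at once: the adjusted judgments of three components sum to more than the adjusted judgment of their category, and a conjunction judged from fewer samples can be rated above one of its conjuncts.

```python
def bs_mean(p, n, beta=1):
    # Equation (21.1) with a symmetric Beta(1, 1) prior on responses
    return (n / (n + 2 * beta)) * p + beta / (n + 2 * beta)

# Subadditivity: a category (P = .6) versus its three components (P = .2 each)
category = bs_mean(0.6, n=10)
components = sum(bs_mean(0.2, n=10) for _ in range(3))
print(category, components)   # ~0.583 vs ~0.75: the parts sum to more than the whole

# Conjunction fallacy: the conjunction (P = .25) is judged with fewer
# samples than the single event (P = .30), so it is adjusted more
single = bs_mean(0.30, n=12)
conjunction = bs_mean(0.25, n=3)
print(single, conjunction)    # ~0.329 vs 0.35: conjunction judged more likely
```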


The use of a Beta prior also allows for the definition of an optimal stopping rule for sampling, which in turn provides an explanation for the repulsion bias sometimes observed in decision making, whereby comparisons of perceptual features against predefined boundaries can lead to subsequent estimates being repulsed from that boundary. For example, judging whether the number of dots appearing in an array is above or below a given reference number causes an aversion to responses at that number in later direct estimates (Zhu et al., ). In such cases, a decision maker might take samples from a sensory representation to guide their answer, collecting evidence for either "higher" or "lower." In combination with the prior, this evidence can then be used to determine the value of continued sampling: with each sample, the decision maker can weigh the projected benefit of information from further samples against the time and effort required to retrieve those samples, stopping when the cost exceeds the benefit. Such a rule can then lead to biases in the aggregated samples, since sampling is more likely to stop where collected evidence favors a particular decision, even if those samples do not accurately reflect the target distribution. Any subsequent judgment made based on those samples will then reflect that bias; in the case of the dots, a series of samples suggesting the response "higher" could lead to early termination, but possible overestimation when attempting to determine the actual count, thus shifting responses away from the boundary (Zhu et al., ; see Chapter  for treatment of related phenomena). This viewpoint also provides a natural mechanism in terms of which we can understand the probabilistic biases that are often ascribed to the representativeness heuristic (Kahneman & Tversky, ). Suppose a person is asked to estimate the probability of, say, a fair coin falling heads five times in a row (HHHHH) versus a mix of heads and tails (e.g., HTTHH). To solve this directly by sampling sequences of five coin flips and comparing the relative frequencies of five heads versus the different mixes of heads and tails would require mentally simulating these coin flips many times; indeed, given that the true probability is just 1/32, a large sample of at least several hundred repetitions would be required to give a reasonably accurate estimate for either sequence. Clearly, for more complex events, the size of the required sample will grow very rapidly. But a sampling approach can still be applied, by drawing a small number of samples and seeing whether they are similar to the event of interest. For example, to work out the rough probability of crashing your car, it would be inefficient (and dangerous!) simply to wait for a sufficient number of crashes to obtain a reliable estimate. Observing a few near misses, however, can inform us that a crash is perhaps more probable than we had expected, and consequently that we should drive more cautiously. Indeed, in computational statistics, this intuition is embodied in the method of approximate Bayesian computation (Beaumont, ), where a probability is estimated from the number of samples that are sufficiently similar to the target event. But this approximation may be misleading in some circumstances, depending on the psychology of similarity judgments. For example, if we mentally simulate or recall typical short sequences of coin flips, most will be an unpredictable mix of heads and tails, and therefore judged as similar to the equally patternless HTTHH, but few if any of a small sample of sequences will be judged as similar to HHHHH. Thus, counting "near misses" will lead to the erroneous conclusion that the irregular sequence is in fact more probable.
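The following toy sketch of near-miss counting is our own illustration; in particular, the similarity rule (matching on number of heads and on run structure) is an invented stand-in for psychological similarity, not a measured one. It shows how counting near misses among a few simulated sequences favors the irregular sequence even though both targets have probability 1/32.

```python
import random

def runs(seq):
    """Number of maximal runs of identical outcomes (a crude stand-in
    for the judged 'patternness' of a sequence)."""
    return 1 + sum(a != b for a, b in zip(seq, seq[1:]))

def similar(seq, target):
    """Toy similarity judgment: a 'near miss' shares roughly the same
    number of heads and roughly the same run structure as the target."""
    return (abs(seq.count("H") - target.count("H")) <= 1
            and abs(runs(seq) - runs(target)) <= 1)

def near_miss_probability(target, n_samples=20):
    """ABC-style estimate: the fraction of mentally simulated sequences
    that count as near misses of the target."""
    sims = ["".join(random.choice("HT") for _ in range(5))
            for _ in range(n_samples)]
    return sum(similar(s, target) for s in sims) / n_samples

print(near_miss_probability("HHHHH"))  # often near 0.1: few near misses
print(near_miss_probability("HTTHH"))  # often near 0.7: many near misses
# Both sequences in fact have probability 1/32; counting near misses
# makes the irregular sequence look far more probable.
```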




21.2.2 "A Sense of Location": Sequential Dependencies in Internal Sampling

So far, we've assumed that people can draw independent samples from their internal model of a probability distribution, but in reality this is neither computationally feasible nor consistent with the operation of human memory. For example, if a person is asked to think of as many different animals as they can, starting with lion would probably prompt a sample of other animals native to Africa, such as zebra, antelope, hippopotamus, and so on, while starting with whale might lead to primarily sampling other animals that live in the ocean. The same principle can also be applied to problem solving. If given a scrambled word such as CIBRPAMOLET, most people will find it very difficult to discern what the unscrambled word is supposed to be, which is hardly surprising considering that there are 39,916,800 different possible orderings of its letters. From a sampling perspective, this is indeed a very large distribution to sample over, and finding the correct solution by direct sampling would require quite a lot of time. On the other hand, if we are given a starting point closer to our goal, such as PROBELMATIC, then it is much easier to reach a correct conclusion (Sanborn et al., ). Dependencies in internal sampling are perhaps most evident when people are asked to generate random sequences; human-generated sequences of random numbers show too many adjacent numbers, too few repetitions of digits, and runs that continue along the number line too often compared to truly random sequences (Wagenaar, ). These results have been taken as evidence that people are attempting to ensure that their sequences are locally representative of a random sequence (Kahneman & Tversky, ), or alternatively that they generate sequences using schemas (Baddeley, ). Although fully random sampling is generally the most efficient way of ensuring that samples are representative, it is clearly psychologically and computationally unrealistic since it requires global knowledge of the target distribution. Instead, there are more psychologically reasonable alternatives that retain the guarantee of convergence to the target distribution in the limit of a large number of samples.


In particular, Markov chain Monte Carlo (MCMC) algorithms work by gradually exploring the sampling space, meaning that each sample will be dependent on the sample that came before, which provides an alternative explanation for these deviations from truly random sequences. MCMC algorithms make local proposals, which account for the higher-than-expected proportion of adjacent numbers, and some of the most popular versions of this algorithm prefer to transition to new states when the probabilities of the states are equal, which can account for the fewer-than-expected repetitions. Continuing along the number line further than expected requires a more specialized mechanism, however, such as a sampling algorithm with some persistent momentum in how it moves, which can be introduced into popular sampling algorithms like Hamiltonian Monte Carlo (Castillo et al., ). Although algorithms such as MCMC require fewer cognitive resources than direct sampling, they create dependencies and autocorrelations, which, as noted, is something we also observe in human behavior (see Chapter  in this volume for an in-depth introduction to sampling algorithms and their implications). While these explanations are difficult to distinguish in random number generation, different predictions can be made for newly trained representations. Castillo et al. () performed such an experiment, in which participants learned a one-dimensional or two-dimensional grid of syllables and were then asked to generate random syllables. Participants' generated sequences were better matched by moving around the representation with momentum than by the transitions between syllables in either natural language or in the training they received on the representation.
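A minimal sketch of this local-proposal idea follows (the uniform target over the digits 0-9 and the one-step proposal are illustrative choices of ours, not a model from the literature). Because the target probabilities are all equal, every in-range proposal is accepted and the sampler almost always moves, producing far more adjacent successive digits than independent draws do.

```python
import random

def mcmc_digits(n, start=5):
    """Metropolis-style sampler over the digits 0-9 with a uniform
    target and local (+/-1) proposals; with equal target probabilities
    the sampler always accepts valid moves, matching the
    fewer-than-expected repetitions noted in the text."""
    x, out = start, []
    for _ in range(n):
        proposal = x + random.choice([-1, 1])
        if 0 <= proposal <= 9:   # out-of-range proposals are rejected
            x = proposal
        out.append(x)
    return out

def prop_adjacent(seq):
    """Proportion of successive pairs that differ by exactly one."""
    return sum(abs(a - b) == 1 for a, b in zip(seq, seq[1:])) / (len(seq) - 1)

local = mcmc_digits(10_000)
iid = [random.randrange(10) for _ in range(10_000)]
print(prop_adjacent(local), prop_adjacent(iid))
# Local sampling yields mostly adjacent successive digits (~0.9),
# whereas independent draws are adjacent only ~18% of the time.
```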




21.2.2.1 The Sampling Algorithm's Starting Point

Applying algorithms with a sense of location to internal sampling means that where you start sampling can have a large influence on what is sampled, as in the relative difficulties of unscrambling CIBRPAMOLET and PROBELMATIC. The substantial influence of the starting point has thus been used to explain a number of framing effects: effects of how a question is asked upon the answer that is produced. For example, while in explicit subadditivity the sum of separate judgments (e.g., of the probabilities of "rain," "snow," or "any other type of precipitation") is higher than the probability of the combined judgment (e.g., "any type of precipitation"), there are implicit versions of the task that show different results. If you were asked for a combined judgment of the probability of "rain, snow, or any other type of precipitation," then this question is composed of common components that might be judged as more likely than "any type of precipitation." However, if you were asked for a combined judgment of "diamond dust, virga, or any other type of precipitation," you may judge this to be less likely than "any type of precipitation." Implicit subadditivity effects such as these depend on the likelihood of the examples given, which can be explained by the starting point of internal sampling; starting with highly probable examples means that they are unmissable, while for the packed version of the question they could be missed. Conversely, starting with highly improbable examples can make it more difficult to bring the highly probable examples to mind (Dasgupta et al., ; Sanborn & Chater, ; Sloman et al., ). The starting point of internal sampling can also be used to explain the anchoring effect, in which estimates are seemingly pulled toward values presented in preceding comparisons. For example, participants estimating the proportion of African countries in the United Nations guess higher numbers if first comparing this figure to 65 percent than without such a comparison (Tversky & Kahneman, ). These "anchors" might act as starting points for MCMC chains that, with a limited number of iterations, the decision maker is unable to sufficiently escape (Lieder et al., ). Finally, the starting point of internal sampling can explain at least some forms of the conjunction fallacy. In Tversky and Kahneman's seminal 1983 paper, the first empirical evidence they provided resulted from asking participants to estimate, across four pages of a novel, either the number of words that have the form "_ _ _ _ i n g" or the number of words that have the form "_ _ _ _ _ n _." While the first question asks for a subset of the words asked for by the second, participants still estimated the number of words with the form "_ _ _ _ i n g" to be higher. As with the implicit subadditivity example, we can think of this as a result of internal sampling that starts in a richer region of the internal representation when given the form "_ _ _ _ i n g," and that has more trouble finding this richer region from the starting point "_ _ _ _ _ n _" (Sanborn & Chater, ).
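As a sketch of this anchoring account (the Gaussian belief distribution, proposal width, and chain length below are all arbitrary stand-ins of ours, not fitted values from Lieder et al.), a short Metropolis chain started at an anchor fails to escape it within a few iterations, leaving mean estimates pulled toward the starting value.

```python
import random, math

def belief(x):
    """Unnormalized internal belief over percentage estimates,
    here an arbitrary Gaussian centered on 30 percent."""
    return math.exp(-((x - 30) ** 2) / (2 * 15 ** 2))

def estimate(start, n_steps=8):
    """Short Metropolis chain started at the anchor; the final state
    serves as the reported estimate."""
    x = start
    for _ in range(n_steps):
        prop = x + random.gauss(0, 5)
        if random.random() < belief(prop) / belief(x):
            x = prop
    return x

for anchor in (10, 65):
    mean_est = sum(estimate(anchor) for _ in range(2000)) / 2000
    print(anchor, round(mean_est, 1))
# With only a few steps, mean estimates started at the low anchor stay
# well below those started at the high anchor; with many more steps,
# both chains converge near the belief's center at 30.
```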


21.2.2.2 The Sampling Algorithm's Movement

Aside from the starting point of internal sampling, the way in which sampling moves through a mental representation is also important for explaining aspects of human behavior. As noted in the animal-naming example, the contents of the mind that are relevant for answering a question might be divided into clusters (animals native to Africa, animals that live in the ocean, etc.), and so are internally sampled in a clustered fashion (Bousfield & Sedgewick, ; Hills et al., ).

This problem of retrieving clustered items has been extensively studied in the animal foraging literature, as animals face the parallel challenge of obtaining food that is distributed into patches (e.g., berry bushes that cluster together with substantial distances between the clusters). The distances that animals travel have been found to correspond to power-law distributions in these kinds of environments. This means that while there is a high probability of traveling short distances, there is also a substantial probability of traveling large distances; the probability of each distance is proportional to that distance raised to a (negative) power. This implies that we should expect any type of concept generation or memory sampling to demonstrate similar patterns where information is similarly clustered in the mind. These distributions of movements appear to be adaptive, as the most effective blind search moves according to a power-law distribution with an exponent of negative two (Viswanathan et al., ).

Interestingly, the dynamics of internal sampling show a surprising correspondence to animal foraging. When retrieving animal names, there are long delays between retrieving names that come from different clusters (Hills et al., ), and delays between retrievals are distributed according to a power-law distribution with an exponent that is close to negative two (Rhodes & Turvey, ). This suggests that internal sampling is able to move effectively in a clustered mental representation.
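The power-law claim is easy to state generatively: step lengths are drawn so that longer steps become rarer in proportion to a negative power of their length. A minimal sketch (the exponent and scale are chosen for illustration) samples such steps by inverting the cumulative distribution.

```python
import random

def power_law_step(alpha=2.0, x_min=1.0):
    """Draw a step length whose density is proportional to
    x ** (-alpha) for x >= x_min (inverse-CDF sampling)."""
    u = random.random()
    return x_min * (1 - u) ** (-1 / (alpha - 1))

steps = sorted(power_law_step() for _ in range(10_000))
print("median:", steps[len(steps) // 2], "largest:", steps[-1])
# Most steps are short (median ~2 * x_min), but the heavy tail
# produces occasional steps thousands of times longer, unlike the
# step lengths of a Gaussian random walk.
```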




The way in which sampling moves through a mental representation does not depend just on the previous state, as it might if location were all that mattered. Instead, internal sampling shows long-range autocorrelations, with the next state of internal sampling depending on long-ago states as well. This has been demonstrated in a wide range of dependent measures, including repeated estimates of time intervals (e.g., one second), estimates of spatial intervals (e.g., one inch), and the response times of repeated trials of tasks such as lexical decision, mental rotation, visual search, and implicit association (Correll, ; Gilden, ; Gilden et al., ). In particular, these dependencies have been characterized as 1/f noise, which is notable because such a pattern is not straightforward to produce (Gardner, ). This means, in practice, that the response you make does not depend just on your last response but also on responses you made many trials ago.

The combination of power-law distributions and 1/f noise in human cognition is interesting for two reasons, one theoretical and one practical. It is of interest to psychological theory because it is even more difficult to produce power-law distance distributions and 1/f noise in tandem; the most common models of each effect (i.e., Lévy flights for distance distributions and fractal Brownian motion for 1/f noise) do not produce the other effect. It is also very useful for distinguishing between sampling algorithms. Of all the algorithms reviewed in Chapter , only Metropolis-coupled MCMC seems capable of reliably producing both effects as people do, which is of particular interest as this is also the best algorithm in this set for sampling from a clustered representation (Zhu et al., ; Zhu et al., ). The combination of power-law distance distributions and 1/f noise is of practical interest because it is very similar to some of the stylized facts found in financial markets. Both individual asset prices and indices like the S&P 500 show heavy-tailed price increases and decreases: while there are many days on which prices change only a little, there are a few days on which they change dramatically. In addition, markets have long-range autocorrelations in their volatility: the magnitude of a price change (though importantly not the direction of a price change) depends on the magnitudes of previous price changes (Cont, ). The correspondence between these effects found in internal sampling and those found in financial markets allows for the tentative suggestion that there may be a relationship between these levels that past market-based explanations have overlooked (Zhu et al., preprint).
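Purely as an illustration of the Metropolis-coupled idea (the bimodal target, temperatures, and proposal width are invented for the example, and a real MC³ model would use more chains), the sketch below couples a cold and a hot chain: local moves produce many small changes, while occasional state swaps produce rare, large jumps between modes, yielding a heavy-tailed distribution of changes.

```python
import random, math

def log_target(x):
    """Arbitrary bimodal (unnormalized) log density with separated modes."""
    a, b = -(x + 4) ** 2, -(x - 4) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def mc3(n_steps, temps=(1.0, 8.0)):
    """Two coupled Metropolis chains at different 'temperatures'; the
    cold chain's state is taken as the overt response over time."""
    xs = [0.0, 0.0]
    trace = []
    for _ in range(n_steps):
        for i, t in enumerate(temps):   # local move within each chain
            prop = xs[i] + random.gauss(0, 1)
            if math.log(random.random()) < (log_target(prop) - log_target(xs[i])) / t:
                xs[i] = prop
        # standard MC3 swap rule between the two chains' states
        log_a = (log_target(xs[1]) - log_target(xs[0])) * (1 / temps[0] - 1 / temps[1])
        if math.log(random.random()) < log_a:
            xs[0], xs[1] = xs[1], xs[0]
        trace.append(xs[0])
    return trace

trace = mc3(5000)
jumps = sorted(abs(b - a) for a, b in zip(trace, trace[1:]))
print("median jump:", jumps[len(jumps) // 2], "largest jump:", jumps[-1])
# Most successive changes are small local moves, but swaps produce rare
# large jumps between modes: a heavy-tailed distribution of changes.
```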

21.3 Putting the Pieces Together: The Bayesian Sampler with a Sense of Location

In the above sections, we discussed two possible mechanisms for explaining various classic biases in the literature on judgments and decision making. These mechanisms do not necessarily have to act separately, but instead can work together within the same model. As an illustration, we return to the previously described numerosity task in which participants are asked to judge whether the number of dots presented briefly on a computer screen is greater than a given reference number. We can integrate the two mechanisms by assuming that a sampling algorithm with a sense of location, such as MCMC, generates a sequence of autocorrelated samples from a distribution of estimates of the number of dots observed. Next, samples are recoded according to the decision boundary (i.e., whether each estimate is greater than the reference number or not). The confidence in the hypothesis "greater than the reference" is then a result of combining these recoded samples with the Beta prior, and a (potentially varying) threshold in confidence is used to determine when to stop sampling and make a decision (Zhu et al., ).
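The sketch below is a loose, invented rendering of this combination rather than the model's actual implementation (in particular, the random-walk sampler here ignores the sensory likelihood, and the boundary, noise levels, and thresholds are arbitrary): an autocorrelated chain of estimates is recoded against the boundary, confidence is the Beta(1, 1) posterior mean over the recoded samples, and sampling stops once confidence is extreme enough.

```python
import random

def decide(true_count, boundary=50, max_samples=30, threshold=0.85):
    """Autocorrelated chain of noisy count estimates, recoded against
    the boundary; confidence is the Beta(1, 1) posterior mean."""
    x = true_count + random.gauss(0, 10)   # chain's starting estimate
    above = total = 0
    confidence = 0.5
    while total < max_samples:
        x += random.gauss(0, 4)            # local move: autocorrelated samples
        above += x > boundary              # recode the sample against the boundary
        total += 1
        confidence = (above + 1) / (total + 2)
        if confidence > threshold or confidence < 1 - threshold:
            break                          # stop on sufficiently extreme confidence
    return confidence > 0.5, total

results = [decide(true_count=60) for _ in range(2000)]
accuracy = sum(c for c, _ in results) / len(results)
mean_rt = sum(t for _, t in results) / len(results)
print(accuracy, mean_rt)
# Counts far from the boundary yield fast, mostly correct decisions;
# counts near the boundary yield slower and less accurate ones.
```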


This simple model reproduces an impressive number of stylized facts about the relationships between confidence, accuracy, and response time. It also gives a novel explanation of why erroneous response times are often slower than correct response times, especially when participants are instructed to focus on accuracy: this is a result of the autocorrelation of the samples. Finally, beginning to sample for the next trial where sampling for the previous trial left off allows the model to produce the long-range autocorrelations in response times that can explain the general variability in this measure (Gilden, ; Zhu et al., ).

21.4 Discussion

On the one hand, human judgment and decision making are subject to a number of systematic biases. On the other hand, human behavior in much more complex problems than the simplified tasks commonly used in judgment and decision making studies has been shown to be close to ideal/normative Bayesian inference in areas such as perception (Kersten et al., ), categorization (Anderson, ; Lake et al., ; Sanborn et al., ), reasoning and argumentation (Hahn & Oaksford, ; Oaksford & Chater, , ), and intuitive physics (Battaglia et al., ; Sanborn et al., ). The common factor of these latter approaches, sometimes collectively labeled Bayesian cognitive science, is that probabilistic reasoning, rather than heuristics, is considered the core component of human cognition, which is often viewed as Bayesian in the sense that it is adapted to make rational inferences based on subjective uncertainty (Oaksford & Chater, ). Yet the aforementioned biases seemingly contradict this perspective, implying an apparent paradox when different areas of cognitive science are compared. In this chapter, we have demonstrated how internal and external sampling relate to each other, using the metaphor of an external and an internal urn illustrated in Figure 21.1, and how the concept of internal sampling can be used to extend the information sampling paradigm in order to create a more complete account of human probabilistic inference and associated biases. External sampling in itself can explain biases in the correspondence of judgments and decisions to the external environment, because biased information will persist even if one were a perfect Bayesian inference machine, yet it cannot account for biases that imply incoherence. To understand this better, we can first make the simplifying assumption that the brain perfectly stores and retrieves unmodified and unbiased samples from the environment to make inferences (e.g., Sanborn et al., ).




). After storing a small number of external samples, draws from the internal urn will on average not be very close to the true environmental value: there will be a random deviation of the mean of the internal urn from the true environmental value. Only with a large number of external samples will draws from the internal urn be on average close to the true environmental value. This serves as a (highly simplified) model of expertise: more samples will mean that judgments will on average correspond more closely to the environment, given a suitable inference algorithm. But the number of draws from the internal urn plays a separate role in the coherence of judgments. If we assume that the information that has previously been sampled from the environment (i.e., from the external urn) is in turn sampled with replacement from the mind (i.e., from the internal urn) for each individual judgment or decision, then using small samples that are adjusted appropriately can be used to explain many of the most well known and persistent biases observed in human behavior. A person who draws a small number of samples will show greater inconsistency when asked the same question a second time, and assuming a prior on their responses, will show greater explicit subadditivity and conjunction fallacy effects. It is only with a very large number of samples that a person would be perfectly coherent and consistent, no matter the number of external samples in their urn. Thus, internal sampling processes, perhaps related to cognitive capacities such as working memory capacity (Lloyd et al., ), can unite seemingly contradictory results, by the assumption that people indeed make rational inferences, given computational restrictions and small samples. In conjunction, the number of internal and external samples can explain the surprising finding that experts often demonstrate the same biases as amateurs, despite more experience and expertise (e.g., Redelmeier, Koehler, Liverman et al., ; Reyna & Lloyd, ; Tversky & Kahneman, ). Going back to the central metaphor, although the experts by virtue of their expensive experiences and greater opportunity for sampling presumably have a larger number of balls in their internal urn, they still draw the same number of samples when making a decision. Therefore, although experts might have an internal urn that better represents the nature of the world, incoherence biases will nevertheless occur. Although most of the findings discussed in this chapter can also be explained by other (sometimes many other) theories, one of the main strengths of the internal sampling account is that it simultaneously explains so many of them. Additionally, because it will, in the limit, result in a perfectly calibrated inference, sampling is based on a rational


As such, internal sampling stands out as one of the most complete accounts of incoherence in human probabilistic inference. Furthermore, there are some cognitive phenomena that sampling is uniquely equipped to tackle, such as probability matching (e.g., Koehler & James, ), since it is very difficult to account for such stochasticity without an underlying stochastic process such as sampling. An important next step in further validating the internal sampling account is to determine whether human behavior exhibits more such characteristic patterns; for example, recent research has shown that the variance of human probability judgments is consistent with a binomial process, where the variance can be used to approximate the number of samples used (Howe & Costello, ; Sundh et al., ). Of course, there are many complications to this picture. Alongside the internal sampling mechanisms we discuss above, information samples from the environment are often biased, and participants at least partially correct for these biases as well as incorporate other information into their decisions (Hayes et al., ). Instead of assuming a frequentist interpretation of these samples (e.g., Costello & Watts, ), we take a Bayesian perspective in which samples are drawn according to subjective degrees of belief that have already incorporated prior beliefs and whatever correction for environmental biases people use. Furthermore, internal sampling differs from external sampling in the sense that we cannot observe the distributions or the samples directly. There are indications that sampling-based accounts of cognition are roughly compatible with those used in neural sampling models (Buesing et al., ; Fiser et al., ; Hoyer & Hyvärinen, ; Moreno-Bote et al., ), but until this connection can be modeled on an implementation level (see Marr, ), internal sampling methods remain as-if models. Nevertheless, a clear strength of the internal sampling account is that we know that it could indeed be implemented by the structure and machinery of the brain.

.

Conclusions

In this chapter, we have considered how the human mind is able to deal with a complex and uncertain world. The "ideal" approach to dealing with uncertainty is often thought to be Bayesian probabilistic reasoning, but the calculations involved are hopelessly intractable for dealing with the challenges of the real world. Instead, we suggest that the mind approximates Bayesian inference by sampling small numbers of items from the relevant probability distributions, where these samples are generated through mental simulation or by drawing on memory.




Indeed, this is a widely adopted strategy in computational statistics and artificial intelligence, where various tractable schemes for sampling from complex distributions have been developed. However, approximate Bayesian inference using sampling will inevitably lead to systematic departures from precise Bayesian probabilistic calculation. We have considered two systematic ways in which this is the case. First, if we draw samples, probability estimates from these samples have to be modified by prior knowledge (or many events will be assigned definitive probabilities of 0 or 1), but this adjustment will itself lead to systematic biases in probability judgments, as we saw arising from the Bayesian sampler model of probability judgment (inflating small probabilities will lead, for example, to subadditivity). Second, for problems of realistic complexity, samples cannot be drawn independently from the probability distribution of interest, which is, after all, complex and unknown. Instead, samples will be generated by local sampling methods such as Markov chain Monte Carlo methods, and thus introduce "autocorrelated" samples, where each sample tends to be similar to prior samples. Such methods will, in the limit, sample accurately from the underlying distribution; but, of course, the cognitive system must make do instead with a small number of samples, so that the starting point of the sampling process, in particular, will have strong impacts on the sample drawn, and hence on the resulting probability judgments. We have seen that both sources of systematic bias can help explain well-known phenomena traditionally captured in the heuristics and biases framework pioneered by Kahneman and Tversky; thus, the conjunction fallacy, and biases associated with anchoring, representativeness, and others, seem naturally to arise from a sampling framework. In contrast to many of the chapters of this book, we have focused primarily on internal sampling, through mental simulation or from memory, rather than on potentially biased sampling arising from our interactions with the external world. But there are potentially interesting connections between sampling from the mind and sampling from the environment, which are likely to be interesting to explore in future research. For example, it may be that mental samples simply mirror samples we experience from the environment (Anderson, ; Stewart et al., ), so that biased sampling of the environment may be reflected in mental sampling. Or, in many social contexts, one person's mental sampling feeds into another person's environment (e.g., if one person's judgments, arising from mental sampling, are then communicated to others); so, biases in mental sampling may shape biases in environmental sampling.


Other more complex interactions can, of course, also be imagined. We suggest that combining theories about mental sampling and sampling from the environment, together with the quirks and biases of both, is likely to be a useful direction for understanding both the remarkable human ability to cope with a complex and uncertain world, and our tendency to make systematic errors on even the most elementary reasoning problems.

References

Anderson, J. R. (). The adaptive character of thought. Hillsdale, NJ: Psychology Press.
(). The adaptive nature of human categorization. Psychological Review, (), .
Baddeley, A. (). Random generation and the executive control of working memory. Quarterly Journal of Experimental Psychology: Section A, (), –.
Battaglia, P. W., Hamrick, J. B., & Tenenbaum, J. B. (). Simulation as an engine of physical scene understanding. Proceedings of the National Academy of Sciences, (), –.
Beaumont, M. A. (). Approximate Bayesian computation. Annual Review of Statistics and its Application, , –.
Bousfield, W. A., & Sedgewick, C. H. W. (). An analysis of sequences of restricted associative responses. Journal of General Psychology, (), –.
Buesing, L., Bill, J., Nessler, B., & Maass, W. (). Neural dynamics as sampling: A model for stochastic computation in recurrent networks of spiking neurons. PLoS Computational Biology, (), e.
Castillo, L., León-Villagrá, P., Chater, N., & Sanborn, A. (). Local sampling with momentum accounts for human random sequence generation. Proceedings of the Annual Meeting of the Cognitive Science Society, . https://escholarship.org/uc/item/gzkg
Cont, R. (). Empirical properties of asset returns: Stylized facts and statistical issues. Quantitative Finance, (), –.
Correll, J. (). 1/f noise and effort on implicit measures of bias. Journal of Personality and Social Psychology, (), .
Costello, F., & Watts, P. (). Surprisingly rational: Probability theory plus noise explains biases in judgment. Psychological Review, (), .
Dasgupta, I., Schulz, E., & Gershman, S. J. (). Where do hypotheses come from? Cognitive Psychology, , –.
Edwards, W. (). Conservatism in human information processing: Formal representation of human judgment. Hoboken, NJ: John Wiley.




Erev, I., Wallsten, T. S., & Budescu, D. V. (). Simultaneous over- and underconfidence: The role of error in judgment processes. Psychological Review, (), .
Fiser, J., Berkes, P., Orbán, G., & Lengyel, M. (). Statistically optimal perception and learning: From behavior to neural representations. Trends in Cognitive Sciences, (), –.
Gardner, M. (). White and brown music, fractal curves and one-over-f fluctuations. Scientific American, (), –.
Gigerenzer, G., & Selten, R. (Eds.). (). Bounded rationality: The adaptive toolbox. Cambridge, MA: MIT.
Gilden, D. L. (). Fluctuations in the time required for elementary decisions. Psychological Science, (), –.
Gilden, D. L., Thornton, T., & Mallon, M. W. (). 1/f noise in human cognition. Science, (), –.
Hahn, U., & Oaksford, M. (). The rationality of informal argumentation: A Bayesian approach to reasoning fallacies. Psychological Review, (), .
Hayes, B. K., Banner, S., Forrester, S., & Navarro, D. J. (). Selective sampling and inductive inference: Drawing inferences based on observed and missing evidence. Cognitive Psychology, , .
Hilbert, M. (). Toward a synthesis of cognitive biases: How noisy information processing can bias human decision making. Psychological Bulletin, (), .
Hills, T. T., Jones, M. N., & Todd, P. M. (). Optimal foraging in semantic memory. Psychological Review, (), .
Howe, R., & Costello, F. (). Random variation and systematic biases in probability estimation. Cognitive Psychology, , .
Hoyer, P. O., & Hyvärinen, A. (). Interpreting neural response variability as Monte Carlo sampling of the posterior. In S. Becker, S. Thrun, & K. Obermayer (Eds.), Advances in neural information processing systems (pp. –). Cambridge, MA: MIT.
Juslin, P., Winman, A., & Hansson, P. (). The naïve intuitive statistician: A naïve sampling model of intuitive confidence intervals. Psychological Review, (), –.
Kahneman, D., & Tversky, A. (). Subjective probability: A judgment of representativeness. Cognitive Psychology, (), –.
Kersten, D., Mamassian, P., & Yuille, A. (). Object perception as Bayesian inference. Annual Review of Psychology, , –.
Koehler, D. J., & James, G. (). Probability matching in choice under uncertainty: Intuition versus deliberation. Cognition, (), –.
Konovalova, E., & Le Mens, G. (). An information sampling explanation for the in-group heterogeneity effect. Psychological Review, (), .
Lake, B. M., Salakhutdinov, R., & Tenenbaum, J. B. (). Human-level concept learning through probabilistic program induction. Science, (), –.
Le Mens, G., & Denrell, J. (). Rational learning and information sampling: On the "naivety" assumption in sampling explanations of judgment biases. Psychological Review, (), .


Lieder, F., & Griffiths, T. L. (). Resource-rational analysis: Understanding human cognition as the optimal use of limited computational resources. Behavioral and Brain Sciences, .
Lieder, F., Griffiths, T. L., Huys, Q. J., & Goodman, N. D. (). The anchoring bias reflects rational use of cognitive resources. Psychonomic Bulletin & Review, (), –.
Lloyd, K., Sanborn, A., Leslie, D., & Lewandowsky, S. (). Why higher working memory capacity may help you learn: Sampling, search, and degrees of approximation. Cognitive Science, (), e.
Marr, D. (). Vision: A computational investigation into the human representation and processing of visual information. Cambridge, MA: MIT.
Moreno-Bote, R., Knill, D. C., & Pouget, A. (). Bayesian sampling in visual perception. Proceedings of the National Academy of Sciences, (), –.
Oaksford, M., & Chater, N. (). A rational analysis of the selection task as optimal data selection. Psychological Review, (), .
Oaksford, M., & Chater, N. (). Bayesian rationality: The probabilistic approach to human reasoning. Oxford: Oxford University Press.
Oaksford, M., & Chater, N. (). New paradigms in the psychology of reasoning. Annual Review of Psychology, , –.
Peterson, C. R., & Beach, L. R. (). Man as an intuitive statistician. Psychological Bulletin, (), .
Redelmeier, D. A., Koehler, D. J., Liberman, V., & Tversky, A. (). Probability judgment in medicine: Discounting unspecified possibilities. Medical Decision Making, (), –.
Reyna, V. F., & Lloyd, F. J. (). Physician decision making and cardiac risk: Effects of knowledge, risk perception, risk tolerance, and fuzzy processing. Journal of Experimental Psychology: Applied, (), .
Rhodes, T., & Turvey, M. T. (). Human memory retrieval as Lévy foraging. Physica A: Statistical Mechanics and its Applications, (), –.
Sanborn, A. N., & Chater, N. (). Bayesian brains without probabilities. Trends in Cognitive Sciences, (), –.
Sanborn, A. N., Griffiths, T. L., & Navarro, D. J. (). Rational approximations to rational models: Alternative algorithms for category learning. Psychological Review, (), .
Sanborn, A. N., Mansinghka, V. K., & Griffiths, T. L. (). Reconciling intuitive physics and Newtonian mechanics for colliding objects. Psychological Review, (), .
Sanborn, A. N., Zhu, J.-Q., Spicer, J., et al. (). Sampling as the human approximation to probabilistic inference. In S. Muggleton & N. Chater (Eds.), Human-like machine intelligence. Oxford: Oxford University Press.
Shi, L., Griffiths, T. L., Feldman, N. H., & Sanborn, A. N. (). Exemplar models as a mechanism for performing Bayesian inference. Psychonomic Bulletin & Review, (), –.




Simon, H. A. (). A behavioral model of rational choice. Quarterly Journal of Economics, (), –.
Sloman, S., Rottenstreich, Y., Wisniewski, E., Hadjichristidis, C., & Fox, C. R. (). Typical versus atypical unpacking and superadditive probability judgment. Journal of Experimental Psychology: Learning, Memory, and Cognition, (), .
Stewart, N., Chater, N., & Brown, G. D. (). Decision by sampling. Cognitive Psychology, (), –.
Sundh, J., Zhu, J. Q., Chater, N., & Sanborn, A. (, July ). The mean-variance signature of Bayesian probability judgment. https://doi.org/./osf.io/yuhaz
Tentori, K., Crupi, V., & Russo, S. (). On the determinants of the conjunction fallacy: Probability versus inductive confirmation. Journal of Experimental Psychology: General, (), .
Tversky, A., & Kahneman, D. (). Judgment under uncertainty: Heuristics and biases. Science, (), –.
Tversky, A., & Kahneman, D. (). Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment. Psychological Review, (), .
Van Rooij, I. (). The tractable cognition thesis. Cognitive Science, (), –.
Van Rooij, I., & Wareham, T. (). Intractability and approximation of optimization theories of cognition. Journal of Mathematical Psychology, (), –.
Viswanathan, G. M., Buldyrev, S. V., Havlin, S., et al. (). Optimizing the success of random searches. Nature, (), –.
Vul, E., Goodman, N., Griffiths, T. L., & Tenenbaum, J. B. (). One and done? Optimal decisions from very few samples. Cognitive Science, (), –.
Wagenaar, W. A. (). Generation of random sequences by human subjects: A critical survey of literature. Psychological Bulletin, (), .
Zhu, J.-Q., León-Villagrá, P., Chater, N., & Sanborn, A. N. (). Understanding the structure of cognitive noise. PLoS Computational Biology, (). doi: ./journal.pcbi.
Zhu, J. Q., Sanborn, A. N., & Chater, N. (). Mental sampling in multimodal representations. In S. Bengio, H. Wallach, H. Larochelle, et al. (Eds.), Advances in neural information processing systems. Montreal: Curran Associates.
Zhu, J. Q., Sanborn, A., & Chater, N. (). Why decisions bias perception: An amortised sequential sampling account. Proceedings of the Annual Meeting of the Cognitive Science Society, , –.
Zhu, J. Q., Sanborn, A. N., & Chater, N. (). The Bayesian sampler: Generic Bayesian inference causes incoherence in human probability judgments. Psychological Review, (), –.


(Preprint). Cognitive variability matches speculative price dynamics. doi: ./osf.io/gfjvs
Zhu, J., Sundh, J., Spicer, J., Chater, N., & Sanborn, A. N. (, February ). The autocorrelated Bayesian sampler: A rational process for probability judgments, estimates, confidence intervals, choices, confidence judgments, and response times. https://doi.org/./osf.io/qxf


Chapter 22

Sampling Data, Beliefs, and Actions

Erik Brockbank, Cameron Holdaway, Daniel Acosta-Kane, and Edward Vul

.

Introduction

This volume tackles the ambitious challenge of reviewing the heterogeneous set of results, phenomena, accounts, and models that fall under the banner of sampling in judgment and decision making. These accounts address different domains of behavior, in different contexts, with different constraints, yet the reliance on "sampling" suggests that they share some central properties. In this chapter we aim to provide an organizing framework for a sample of this literature, to highlight the conceptual differences and the core similarities across domains. What do sampling accounts have in common? Indeed, what constitutes a "sampling" account of human behavior? In all cases, sampling accounts posit that people perform some calculation using a small subset of all the values that are relevant for the calculation. This subset is often generated through stochastic processes over which the person may not have full control. In recent years, the use of sampling as a mechanism to explain various forms of inference and decision making has spread across broad domains of psychology. The most familiar type of sampling account is concerned with observations of the world (Fiedler & Juslin, ). These accounts are motivated by the similarity between what a statistician must do to draw inferences from limited data, and what individual humans must do to act based on noisy, sparse observations of their environment. These accounts formalize the notion that the world is far larger, more complicated, and more dynamic than any one person can apprehend. Consequently, people must act based on a small subset of observations – a sample – rather than a complete snapshot of the world state. Though initially developed to explain inferences about perception and the physical state of the world (Braddick, ), these models have been applied to a range of more abstract inferences about social structure and the behavior of others (Fiedler, ). However, sampling has also been proposed as a modeling framework for the computations that people undertake to revise their beliefs or to generate predictions about the state of the world (Griffiths et al., ).


On this view, human beliefs are probability distributions over possible states of the world, and prior beliefs are updated in light of new data via Bayes' rule to form posterior beliefs. This belief updating is usually analytically intractable and in applied settings is often approximated via Monte Carlo methods that draw samples from the posterior (Doucet et al., ). Work in this space has proposed that similar strategies are adopted by the brain; beliefs about world states are represented as finite sets of samples. More recently, sampling has also been postulated as a strategy for choosing actions (Phillips et al., ). Under these accounts, the set of possible actions is impossibly large, so choosing the best one cannot be accomplished by evaluating the prospective quality of all actions. Instead, we sample a subset of actions to consider and evaluate, then pick the best action from among those sampled. Far from being a comprehensive overview, these examples illustrate the wide reach that sampling has attained in computational models of decision making; the additional role attributed to samples from memory, for instance (Nosofsky & Palmeri, ), broadens the purview of sampling accounts even further. Given this diversity of sampling models in the literature and the range of research goals they support, we might ask whether it's helpful to view them as members of a coherent class of "sampling accounts" at all. Would we be better off instead choosing distinct terms for each, to avoid confusion? The premise of this chapter is that a unified review of these accounts will not only help disentangle their differences, but can also provide structure for considering their commonalities and identifying opportunities for cross-pollinating research ideas across fields. We propose that these different sampling accounts are all approximating different components of expected utility maximization: data, beliefs, and actions. We review a broad swath of the sampling literature within this framework and show how describing sampling accounts as approximating specific components of expected utility clarifies how they are related and poses novel questions and comparison points for seemingly distinct areas of research. We close by highlighting opportunities for synthesis across sampling domains and outlining challenges that arise from the fact that in real-world decisions, all of these components must be sampled together.

22.1 Expected Utility Framework

We propose that most sampling accounts can be best understood in the context of choosing an action to maximize expected utility. The
computations necessary to pick an action that maximizes expected utility require combining data, marginalizing beliefs, and optimizing over actions. Within this framework, existing sampling accounts describe the consequences of using small subsets of alternatives to approximate each of these operations.

22.1.1 Sampling to Approximate Expected Utility

Many of our everyday decisions revolve around choosing the best action in a given situation. What will we wear to a friend’s party in the evening and what should we bring? What will be the best route to drive there? How long should we stay? All of these decisions involve selecting an ideal course of action from among many. How do people evaluate their choices and ultimately settle on one?

We start with the expected utility hypothesis, the assumption that for an intelligent agent, the best choice $A^*$ is the action that yields the largest expected utility $U(\cdot)$:

$$A^* = \arg\max_A \, U(A) \qquad (22.1)$$

This formulation, though intuitive, is ultimately a crude decision making policy because it disregards context-dependent variation in action outcomes. For intelligent behavior, we would expect the payoffs of actions to vary as a function of context, as indicated to the agent by observed data (x):

$$A^* = \arg\max_A \, U(A \mid x) \qquad (22.2)$$

In this richer formulation, utilities associated with actions vary based on observed data; we condition on the data x to yield different predicted action outcomes. Expected utility in Equation (22.2) is equivalent to the Q-function in reinforcement learning (Watkins & Dayan, ), associating payoffs with specific action–context combinations. While equating context with observed data is sufficient in settings with simple, unambiguous observations, in more realistic settings where observations are complex and noisy, it is more parsimonious to consider payoffs to be contingent on a latent state (s), rather than the data directly. Under this formulation, the utility function abstracts away from data, considering utility to be a property of an action in a state. Uncertainty arises from ambiguity about what state the agent is in:

$$A^* = \arg\max_A \sum_s U(A, s)\, P(s \mid x) \qquad (22.3)$$

Equation . emphasizes the different roles that data (x), beliefs (P(s)), and actions (A) play in decision making: We optimize over actions

https://doi.org/10.1017/9781009002042.028 Published online by Cambridge University Press

  ,    .   arg max , we take an expectation, or marginalize, over beliefs P A   s f ðs ÞP ðs Þ , and we condition on data ðP ðs j x ÞÞ. Choosing an action that maximizes expected utility requires that an agent perform all of these calculations in order to decide what to do. Given an infinite number of available actions and possible states of the world, these calculations are computationally intractable in their general formulation. Thus, human behavior is typically seen as approximating the expected utility maximization described in Equation . (Gershman et al., ). Broadly, sample-based accounts of behavior and decision making propose that when solving problems which are intractable via brute force calculation, people rely on a limited set of samples to approximate the underlying calculation. This can be done for each of the operations in Equation . described above. Instead of considering all the available information, people make decisions based on a sample of observations, acting as “intuitive statisticians” (Fiedler & Juslin, ). Likewise, instead of considering the full probability distribution over states of the world, Monte Carlo methods (Robert & Casella, ) demonstrate that one can consider a sample of possible states to achieve the same end. Finally, numerical optimization techniques show that a global optimum such as the best action in a given context may be found with high probability by considering a small subset of alternatives (Karnopp, ). In this way, sample-based accounts offer an answer to the question of how people might maximize expected utility in their decisions; an approximate solution to Equation . can be estimated using samples from the relevant distributions over data, beliefs, and actions without relying on the full underlying distributions. This approximation of classical expected utility (Von Neumann & Morgenstern, ) contrasts somewhat with forms of subjective expected utility (Savage, ) which argue that the perceived utility or probability of an outcome may be fundamentally distorted from the true underlying probability or value (a prominent example is the argument in prospect theory (Tversky & Kahneman, ) that the utility 

We rely on this simple formulation throughout the paper. However, a more thorough model-based view of utility would rewrite U ðA; sÞbased on state transitions arising from actions: P U ðA; s Þ ¼ s0 U ðs 0 ÞP ðs 0 j A, s Þ. This ascribes utilities to particular states, rather than state–action combinations, and has the advantage of clarifying the role of predictive beliefs – beliefs not just about the current state (s), but also about future states (s 0 ). This formulation may be further expanded to consider sequential decisions. The proposal in this chapter, that diverse sampling accounts support approximate expected utility maximization, applies equally to expanded versions of Equation ., but for our purposes the version in Equation . is sufficient.

https://doi.org/10.1017/9781009002042.028 Published online by Cambridge University Press

Sampling Data, Beliefs, and Actions



curve for losses and gains is asymmetric). As we describe below, various distortions to the perceived probability or utility of an outcome can arise from the process of sampling data, beliefs, or actions, and some of these may lead to behavior that is consistent with prominent accounts like prospect theory. However, an approximation that distorts the true probability can be considered somewhat distinct from maintaining a high fidelity but asymmetric utility curve.

To illustrate this sample-based account of approximate expected utility maximization, imagine the familiar and sometimes daunting task of deciding what to wear to a party. Here, selecting the action that maximizes expected utility amounts to choosing the best outfit. A brute force solution would require examining every item of clothing that may already be available in one’s closet, or which might be purchased in advance of the party. One must then calculate the expected utility of each combination of garments. However, we often opt for a simpler solution: sample a set of actions, i.e., candidate outfits from our wardrobe, and evaluate those. Formally, the arg max in Equation (22.3) is evaluated over a sample of the actions in A. But here too, we face a challenge in calculating the expected utility of a given outfit because it will be impacted by incidental information about the state of the world, like what sort of party we are going to. To calculate the expected success of a candidate outfit, we would need to consider how well that outfit would be received in every conceivable party: a pool party or a drunken wake, a reception by an ambassador or a birthday party for a three-year-old, and so on. To calculate the expected utility of a given outfit, a rational agent must marginalize over all of these possible states weighted by their probability in light of all the information they have about the party in advance.

Obviously, we do not do anything so thorough. In this case, we might begin by sampling data from the external world which supports our decision making: who else is going to this party? What does the invitation look like? Is it somebody’s birthday? If it’s outside, what will the weather be like? And how late will it go? These data samples are used to update beliefs about what kind of party we are attending, which impact the expected utility estimation. This means that the $P(s \mid x)$ in Equation (22.3) is evaluated based on only a sample of data points in x – we do not ask all of our friends whether they will be going, but just a few. Further, we do not then consider every possible belief we might have about the party. Instead, we consider just a few different prospective party environments, sampled with frequency proportional to their probability under $P(s \mid x)$. This set of samples is an approximate representation of $P(s \mid x)$, and allows us to estimate the expected value of
an outfit without considering infinitely many party possibilities. This makes our expected utility calculation in Equation (22.3) computationally much simpler. We choose the best of a subset of possible actions (outfits), based on its utility averaged over a subset of possible states (party environments), sampled according to their probability in light of a sample of relevant data (e.g., who is attending): $A^* = \arg\max_{a \in A} \sum_{i=1}^{n} U(a, s_i)/n$, where $s_i \sim P(s \mid x)$. This process is illustrated in Figure 22.1.

Figure 22.1 The expected utility framework for sample-based models in decision making, shown here for the decision about what to wear to a party. Left: when making decisions, we typically sample a subset of options from the infinite space of possible actions (A), e.g., a handful of possible outfits. Top right: actions are evaluated based on their utility averaged over possible world states, sampled according to the agent’s beliefs (probabilities of states s), such as what kind of party it will be. Bottom right: beliefs are informed by data (x) sampled from the external world, such as what the weather will be like.

[Footnote: This process may arise not just in one-off decisions but in repeated decisions or those made over a longer time frame. We thank a reviewer for pointing out that the iterative process of behavioral research itself relies on sampling data which approximates an underlying distribution, sampling beliefs about the data to formulate or refine hypotheses, and choosing from among sampled actions to pursue further inquiry.]
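To make the approximation concrete, the following minimal sketch implements the party example in Python. The outfits, party types, utilities, and posterior probabilities are entirely hypothetical values invented for illustration; the point is only that a handful of belief samples $s_i \sim P(s \mid x)$ suffices to rank a small consideration set of actions by approximate expected utility.

```python
# A minimal sketch of sample-based expected utility (Equation 22.3),
# with hypothetical utilities and a hypothetical posterior over states.
import numpy as np

rng = np.random.default_rng(0)

outfits = ["suit", "jeans", "swimwear"]        # sampled consideration set of actions a
states = ["formal", "casual", "pool"]          # possible party types s
p_s_given_x = np.array([0.6, 0.3, 0.1])        # assumed posterior P(s | x) after asking friends

# Assumed utilities U(a, s): rows index outfits, columns index states
U = np.array([[1.0, 0.2, 0.0],                 # suit
              [0.3, 1.0, 0.4],                 # jeans
              [0.0, 0.3, 1.0]])                # swimwear

n = 5                                          # only a few belief samples s_i ~ P(s | x)
s_idx = rng.choice(len(states), size=n, p=p_s_given_x)

eu_hat = U[:, s_idx].mean(axis=1)              # (1/n) * sum_i U(a, s_i) for each action a
best = outfits[int(np.argmax(eu_hat))]
print(dict(zip(outfits, eu_hat.round(2))), "-> choose:", best)
```

With n = 5 samples the estimates are noisy, but the ranking of actions is usually already stable, which is the essence of the sample-based shortcut.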

Critically, decisions such as this are sometimes difficult when we choose to evaluate our options carefully, yet they are fairly commonplace and people, for the most part, do them well. How do people regularly make such rich, sample-based inferences and what are the limitations in this ability? We propose that the many
sampling accounts in the psychology literature offer insights into these questions when viewed within the context of sampling data, beliefs, and actions to maximize expected utility. This framework may not cleanly encompass every sampling model or example of sample-based inference in decision making, and some important classes of sampling models, such as sampling from memory, may be cross-cutting categories that play a role in multiple terms of the expected utility calculation. Nonetheless, we argue that framing a large swath of existing models and their contributions as supporting expected utility approximation provides both a unified perspective on sampling research and, critically, a means of guiding future research in these areas: how can prior work on data sampling accounts inform belief sampling research? And what does the literature on belief sampling tell us about models of action sampling?

22.1.2 Comparing Sample-Based Accounts

Though a diverse set of sample-based accounts might in theory be integrated under expected utility calculation as described above, comparing them with this lens is only useful if it provides novel insights or future research directions. However, evaluating the shared constraints and challenges across different sampling domains is itself nontrivial. For one, because of the unique questions motivating each of these research traditions, these different types of sampling have largely been considered in isolation. Further, out of practical considerations, researchers studying data, belief, or action sampling mostly ignore the other variables to provide greater experimental control. For example, researchers interested in data sampling typically design tasks where state estimates are determined by the data (i.e., there is little role for priors or uncertainty arising from internal models), and where the set of available actions is small and explicitly provided. Consequently, all the behavioral variability may be attributed to the data sampling process and calculations about beliefs or actions may be safely ignored. Research on belief and action sampling places similar restrictions on the other facets of expected utility maximization so as to effectively isolate the component under study.

This raises the question of how to compare these distinct sampling paradigms with the goal of integrating them into a more unified account of sampling in decision making. Can we view data, belief, and action sampling as solving the same kinds of problems or operating within similar constraints? Or are they better understood as solving sufficiently different problems that their similarities stop at the use of samples? If the latter is the case, the mere
inclusion of these models in a broader expected utility paradigm offers little additional ability to compare them or make progress on one by appeal to the others. We believe that models of data, belief, and action sampling face a number of similar constraints which allow them to be usefully compared, and the successes of one potentially applied to the others. The constraints they share, and the point of departure for comparing them in this chapter, draw on what we consider to be essential aspects of any sampling model. Critically, a sampling-based account of behavior is only useful insofar as it makes divergent predictions from what would be expected if people were not sampling but instead considering the full set of possible alternatives. For instance, action sampling is only interesting insofar as it predicts something different from considering all possible actions. Where sampling accounts make identical predictions as inference based on complete information, we may have little reason to prefer the sampling models.

What then differentiates a sampling model from one based on complete analytical inference? We propose that nearly all sampling models can be compared along two key dimensions that distinguish their predictions from alternative accounts: the number of samples and the sampling distribution. These features are central to what it means to be a sampling account and, critically, form the basis of a sampling account’s unique predictions in decision making contexts.

Consider first the number of samples. Under all sampling accounts, if infinitely many samples are used, this is no different from relying on the full set of possibilities, so nothing is gained from postulating sampling. In contrast, when a decision maker relies on only a small number of samples, their inferences may be biased in various ways by the limits in their sampling. Therefore, the number of samples used, and the considerations that influence this, are central to the predictions that a sampling account offers. Here, we aim to show that how many samples are drawn and how this number is determined can be asked equally of data, belief, and action sampling models, and the answers provided by one can be useful for the others.

Second, the distribution generated by sampling is similarly critical for a sampling account to make concrete predictions. Under optimal conditions, independent and identically distributed (IID) samples might lead to decisions consistent with use of the full distribution, in which case sampling models may not be identifiable. However, in many cases it is simply impossible to generate idealized samples, and whatever algorithms are used to obtain the samples will necessarily create some systematic deviation in the set of samples which will affect downstream behavior
(Juni et al., ; Sanborn, ). Further, even samples obtained optimally for one purpose are likely to yield systematic biases with respect to another goal (Fiedler, ). For instance, when sampling in low-probability, high-stakes situations, one might either correctly estimate the probability of each outcome (by sampling according to probability), or correctly estimate the utility maximizing action payoffs (by sampling according to probability weighted by utility), but no adjustment strategy can allow both decisions to be unbiased using a small number of samples (Lieder et al., ; Vul et al., ). Thus, the sample distribution, like the number of samples taken, is a critical feature of sampling models that determines the particular ways in which sample-based decisions deviate from optimal behavior.

Given the fundamental role that the number of samples and the sample distribution play in sample-based models, we use these two characteristics as a basis for reviewing data, belief, and action sampling accounts of behavior under the umbrella of expected utility maximization. In what follows, we discuss how results in data, belief, and action sampling vary along these axes, and critically, how each one can inform future research in the others.

In restricting the scope of our review to consider only the number of samples and the sample distribution, we do not intend to overlook other meaningful aspects of sampling models, such as the costs and benefits of samples, the algorithms that yield them, or how explicit the sampling process is. Instead, we propose that the consequences of these and other noteworthy sources of variation in sampling procedures are largely captured by virtue of their role in the number, and the distribution, of samples. As we show, comparison of sampling accounts along these dimensions alone offers fertile ground for identifying the major contributions of existing sampling models, as well as opportunities for future work which might improve our understanding of how people use samples to maximize the expected utility of their decisions.
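As a toy illustration of why these two dimensions matter independently, the sketch below (with arbitrary distributional assumptions of our own, not taken from any cited study) contrasts their signatures when estimating a population mean: a small number of samples inflates the variance of the estimate, whereas a biased sampling distribution shifts it systematically, and that shift does not disappear as the number of samples grows.

```python
# Two distinct failure modes of sample-based estimation:
# (1) few samples -> high variance; (2) biased sampling -> systematic offset.
import numpy as np

rng = np.random.default_rng(2)

population = rng.normal(0.0, 1.0, size=100_000)   # "true" data distribution
biased_pool = population[population > -0.5]       # a sampling scheme that misses low values

def summarize(pool, n, reps=5_000):
    means = rng.choice(pool, size=(reps, n)).mean(axis=1)
    return means.mean(), means.std()

for n in (3, 30, 300):
    m_iid, sd_iid = summarize(population, n)
    m_bias, sd_bias = summarize(biased_pool, n)
    print(f"n={n:3d}  IID mean={m_iid:+.2f} (sd {sd_iid:.2f})   "
          f"biased mean={m_bias:+.2f} (sd {sd_bias:.2f})")
```

The variance term shrinks as n grows, but the offset induced by the biased pool stays put: more samples cannot repair a distorted sampling distribution.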

22.2 Overview of Sampling Models in the Literature

This chapter proposes that sampling accounts of behavior can be fruitfully examined within a unified framework of expected utility maximization. Here, we illustrate this process, reviewing a large range of sample-based models that are consistent with data, belief, and action sampling in sequence. For each of these sampling domains, we address considerations of the number, and the distribution, of samples, and how these results might inform or benefit from the other classes of sampling model.

22.2.1 Data Sampling

At their core, models of data sampling are about gathering information which will reduce uncertainty about the present environment to support better (i.e., more informed) downstream decisions. In the example we provided at the outset (Figure 22.1), in which a person seeks to maximize the expected utility of possible clothes to wear to a party, data sampling amounts to seeking information which will refine their belief about the sort of party they are attending, for example, what will the weather be like and who else will be there? From an expected utility standpoint, the role of each piece of data, x, is to improve posterior estimates over states. In most simple settings, each datum constrains the posterior distribution $P(s \mid x)$ multiplicatively: $P(s \mid x) \propto P(s) \prod_x P(x \mid s)$. Given this broad formulation,
we consider any process that obtains information from the outside world and supports subsequent decision making to be an instance of data sampling. Because acquiring information about the surrounding environment is a critical behavior for most if not all animals, some of the earliest sample-based models in psychology – and, as we’ll discuss, some of the most well formalized – have concerned data sampling.

22.2.1.1 The Number of Samples in Data Sampling

Data sampling models are rooted in Fechner’s () two-alternative forced-choice (2AFC) experimental paradigm, in which observers make repeated binary classifications of stimuli. In a canonical example, random dot kinematograms (Braddick, ) present participants with a field of dots each moving to the left or right. People are asked to judge the prevailing direction of the dots in each image as quickly and accurately as possible. Typical behavior in the task reflects a speed–accuracy tradeoff; intuitively, responding more quickly on any given trial decreases the probability of answering correctly, while increasing the number of subsequent trials that can be completed. The 2AFC paradigm provides a precise and highly controlled environment for examining how people sample data from the external world.

Computational accounts of behavior in the 2AFC task fall into the broad class of drift diffusion models (DDMs) (Ratcliff & McKoon, ; Ratcliff & Smith, ; see Voss et al., , for a practical guide), which owe their origins to the sequential probability ratio test (SPRT) (Wald & Wolfowitz, ). Using speed and accuracy data for each participant, DDMs fit a decision threshold α which represents the level of certainty
required to choose either option (essentially the desired level of accuracy), and a drift rate ν which corresponds to the rate at which people accumulate evidence in the task (related to their task speed). The decision threshold indicates how much evidence people accumulate before making a decision, which imposes a distribution on the number of stimulus samples they consider in their choice (Vul et al., ). In this way, DDMs provide a descriptive account of how different contextual or environmental features, for example, the proportion of left and right stimuli (Ratcliff & McKoon, ), impact the number of samples a subject uses to make a decision.

Critically, DDMs not only allow for precise characterization of the number of samples taken in data sampling settings, but also lend themselves to rational analysis: how many samples should one take in a given context? The decision policies adopted by the SPRT and DDM are optimal in the sense that they yield the fastest decision times for a given level of accuracy (Bogacz et al., ). This in turn reflects an optimality of data sampling; these models minimize the number of samples needed to achieve a desired level of accuracy. Critically, optimal thresholds in these models must reflect the objective or utility functions of the specific tasks: what are the relative benefits of speed and accuracy? Thus, decision thresholds connect an optimal agent’s objectives and the number of samples they take. For instance, one often-used objective function is maximizing reward rate, as determined by task-specific parameters. Given a particular time cost of samples, fixed nondecision processing time, and inter-trial delays associated with both correct and incorrect responses, one can find a threshold function that optimizes the rate of reward. This function will in turn dictate an optimal number of samples. Other objective functions like minimizing Bayes risk can similarly be mapped onto specific threshold functions (Bogacz et al., ). Broadly, the SPRT and DDM therefore provide a normative approach to how many data samples to draw in 2AFC tasks.

Given the success of DDMs in characterizing both empirical and optimal data sampling behavior, these models provide a template for studying finite sampling across domains, but also highlight important directions for future development within data sampling. DDMs have been best characterized in 2AFC tasks, but many important human behaviors deviate from this simple case, for instance by facing choices with more than two alternatives, or data samples that are actively selected rather than passively received. In these scenarios, we must turn to more elaborate sampling algorithms that can grapple with these complications. However, even in these richer settings, the same core considerations clearly identified by the DDM paradigm will apply: what are the costs and benefits of samples, and how can a stopping rule be chosen to optimize an objective function for a given problem? In short, the formalization of the number of samples offered by DDMs can guide efforts at similar precision in other related tasks.
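The core mechanics can be illustrated with a small simulation. The sketch below is a simplified discrete-time random-walk version of the DDM/SPRT idea, with illustrative parameter values rather than values fit to any dataset; raising the threshold α buys accuracy at the cost of more evidence samples per decision.

```python
# A minimal discrete-time sketch of drift-diffusion evidence accumulation:
# evidence is summed sample by sample until it crosses +alpha or -alpha.
import numpy as np

rng = np.random.default_rng(1)

def ddm_trial(drift=0.1, alpha=2.0, noise=1.0):
    """One simulated trial: accumulate noisy evidence until |evidence|
    crosses the threshold alpha; drift > 0 makes 'up' the correct answer."""
    evidence, n_samples = 0.0, 0
    while abs(evidence) < alpha:
        evidence += drift + noise * rng.standard_normal()
        n_samples += 1
    return evidence > 0, n_samples

for alpha in (0.5, 1.0, 2.0, 4.0):
    trials = [ddm_trial(alpha=alpha) for _ in range(2000)]
    accuracy = np.mean([correct for correct, _ in trials])
    mean_n = np.mean([n for _, n in trials])
    print(f"alpha={alpha:3.1f}  accuracy={accuracy:.2f}  mean samples={mean_n:6.1f}")
```

The printed table is the speed–accuracy tradeoff in miniature: an optimal threshold is whichever row best serves the task's objective function, e.g., reward per unit time.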

22.2.1.2 The Sample Distribution in Data Sampling

In data sampling models, samples are stochastic observations of the world which people use to reduce uncertainty and inform action choices according to Equation (22.3). Notably, the drift diffusion paradigm typically places people in the role of passive observer, presented with natural samples (Gigerenzer & Hoffrage, ) that correspond to the likelihood function observers use (Ratcliff & Smith, ). In this canonical formulation, departures from rational behavior cannot be explained by the sample distribution as it is reflected in the likelihood function $P(x \mid s)$, since the sample-generating process and the observer’s model thereof are presumed to match. However, a large body of research has examined what happens when samples from the decision maker’s environment do not represent IID samples from the data distribution presumed by observers.

Research examining decisions based on biased samples has mostly done so in the context of social inferences about the people around us. While this represents a far more abstract domain than, say, assessing the prevailing direction of dot kinematograms, the paradigm of sampled data from one’s environment supporting downstream approximation is largely the same. For example, when people were asked to provide estimates for a range of health and well-being measures in the general population (e.g., income and education levels, work stress, and health problems), their responses showed systematic biases relative to their own self-appraisal on these metrics; the biases are well predicted by a model in which people’s inferences were based on samples from their immediate environment, which may have differed from the population distribution substantially (Galesic et al., , ). Similar accounts based on biased samples from one’s surroundings have been proposed in other domains of social reasoning, such as the role of peers in determining social attitudes and the robust tendency to judge one’s in-group as more heterogeneous than one’s out-group (Konovalova & Le Mens, ). These findings fall under the broad umbrella of wicked learning environments (Hogarth et al., ), in which the sampled data from which people learn and generalize deviate systematically from the population or “test” data (this is contrasted with kind learning environments in which there is a closer relation and any divergence is primarily a result of noise). Hogarth et al. () show that a broad range of robust biases such as survivorship bias and the “Hot Stove Effect” can be accounted for by small samples drawn in various kinds of wicked environments.
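The hot stove dynamic illustrates how a wicked environment can turn unbiased outcomes into biased beliefs: a learner who stops sampling an option after a bad experience never collects the data that would correct the negative impression. The sketch below is a minimal simulation of this dynamic under assumed normally distributed payoffs (our own illustrative setup, not the cited analysis).

```python
# The "hot stove" dynamic: avoidance after bad draws freezes in negative
# errors, so final value estimates are biased low even for a fair option.
import numpy as np

rng = np.random.default_rng(3)

def final_estimate(true_mean=0.0, sd=1.0, horizon=50):
    """Track a running mean of payoffs, but stop sampling as soon as the
    current estimate turns negative (avoidance after a bad impression)."""
    est, n = 0.0, 0
    for _ in range(horizon):
        if n > 0 and est < 0:
            break                                  # no further correction
        n += 1
        est += (rng.normal(true_mean, sd) - est) / n   # running mean update
    return est

finals = np.array([final_estimate() for _ in range(10_000)])
print(f"true mean = 0.00, mean final estimate = {finals.mean():+.2f}")
```

Positive impressions keep getting tested and regress toward the truth; negative impressions end the sampling, so the average final estimate is systematically below the true mean.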

The previous examples suggest that biases in our social judgments and attitudes may be explained by external processes that systematically distort the distribution of data we sample in the course of everyday experience. However, there are a number of ways in which our own behavior can further bias the sample distribution of data on which we base decisions and actions. Learning and decision making scenarios in which people actively query the environment may require them to make critical decisions about what kind of data to sample and when to stop sampling. These active learning paradigms are often designed to reflect more naturalistic scenarios; in the example in Figure 22.1, deciding what to wear to a party has this character, since a person can exercise some control over how many friends to ask about party attendance, and how long to persist. Work on data sampling in more active settings has shown that decisions about when to stop sampling and what to sample can lead to biased sample distributions which account for further idiosyncrasies in decision making based on these samples.

[Footnote: In active learning settings where samples are generated through behavior, the line between data sampling and action sampling may seem somewhat blurred. For our purposes, we aim to distinguish between sampling actions for consideration, and trying actions to learn from the data they generate.]

First, consider the decision about when to stop sampling. In the multiarmed bandit paradigm (Daw et al., ; Gittins, ) participants sample outcomes from two “bandits” or levers, each of which has a unique underlying reward probability (or a reward distribution, e.g., some chance of one dollar and a smaller chance of five dollars). At the end of the trial period, participants make a (usually binary) decision about which bandit they will select for their final reward (Hertwig & Erev, ). A large body of work exploring binary choices in these settings finds that people tend to take very few samples, even when the cost of samples is negligible (Hau et al., ; Hertwig et al., ; Hertwig & Erev, ; Hertwig & Pleskac, ). Such sparse sampling tends to underrepresent low-probability outcomes, and this bias in sample distributions can impact downstream behavior. While the biased distribution may make certain actions more efficient (e.g., choosing from among two gambles), people are unlikely to correct for this bias when their goals change or they are confronted with a new task, for example, estimating the underlying distributional characteristics (Coenen & Gureckis, ; Jazayeri & Movshon, ).
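The consequences of sparse sampling for rare outcomes are easy to quantify. In the sketch below, with assumed values of n = 7 draws and a rare-outcome probability of p = 0.1 (chosen purely for illustration), nearly half of all simulated decision makers never observe the rare outcome at all, even though the frequency estimate is unbiased on average.

```python
# How often does a small sample miss a rare outcome entirely?
import numpy as np

rng = np.random.default_rng(4)

p_rare, n, reps = 0.1, 7, 100_000
hits = rng.random((reps, n)) < p_rare          # True = rare outcome observed
est = hits.mean(axis=1)                        # each agent's experienced frequency

print(f"P(rare outcome never seen in {n} draws) = {(est == 0).mean():.2f} "
      f"(theory: (1 - p)^n = {(1 - p_rare) ** n:.2f})")
print(f"mean experienced frequency = {est.mean():.3f} (true p = {p_rare})")
```

For the typical sparse sampler, the rare outcome simply does not exist, which is why decisions from experience tend to underweight rare events relative to decisions from description.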

In addition to decisions about when to stop sampling, decision makers in active learning contexts can make goal-directed decisions about what data to sample. For example, when deciding where to go for dinner, we might sample reviews for a particular cuisine and look for positive ones, or seek out positive samples and then choose a cuisine from among them. Such selective sampling can greatly increase the efficiency of decision making relative to natural sampling (Fiedler, ), but selective sampling of a particular variable will necessarily produce biased samples with respect to that variable’s base rate or to other correlated variables (Dawes, ). In problems of information search, people show little ability to correct for these biases in subsequent judgments; broadly, where data sampling reflects distortions from the underlying distribution brought about by people’s sampling choices, their downstream judgments will be similarly biased (Fiedler, , ). In some cases, decisions about what kind of data to sample can reflect broad biases in information search, as in the case of positive test strategies and confirmation bias (Klayman & Ha, ).

Recent work has attempted to characterize the rational principles and goals that might guide such biased sampling decisions, for example maximizing expected information gain (Rothe et al., ) or testing sparse hypotheses (Navarro & Perfors, ; Oaksford & Chater, ). However, identifying such guiding principles in data sampling can be challenging (Rothe et al., ) and, more importantly, it remains the case that when such principles lead to biased samples, people typically fail to correct for these distortions in the sample distribution, leading to a range of familiar behavioral biases (Coenen & Gureckis, ; Fiedler, ).

22.2.2 Belief Sampling

In the last few decades, prominent models of higher-level cognition, reasoning, and decision making have relied on probabilistic inference over rich internal knowledge structures (Chater et al., ; Knill & Richards, ; Oaksford et al., ; Tenenbaum et al., ). These probabilistic reasoning accounts postulate that human beliefs can be characterized as probability distributions like the ones supporting expected utility maximization in Equation (22.3): the posterior distribution over states conditioned on observations, $P(s \mid x)$. In the example discussed at the beginning, the sampled set of beliefs follows this pattern; what kind of party we are attending and whether there will be dancing allow us to
approximate a more complicated distribution over states given the available data (see Figure 22.1). This illustrates one of the central challenges of belief sampling models, namely that estimating the underlying probability distributions may be arbitrarily difficult depending on what constitutes a “state,” as well as the complexity of the models postulated for the prior (P(s)) and likelihood ($P(x \mid s)$) distributions. The sophistication of these models has raised concerns about the plausibility of the human brain carrying out such fundamentally intractable inference (Gigerenzer & Goldstein, ; Jones & Love, ; Kwisthout et al., ). This tension has created an active area of research on biologically and psychologically plausible inference algorithms that might approximate probabilistic inference over large knowledge structures (Tenenbaum et al., ), with the most attention paid to variations of sampling algorithms (Bonawitz et al., ; Sanborn et al., ; Shi et al., ).

These belief sampling accounts propose that the probability distributions over knowledge structures – that is, beliefs, under the probabilistic reasoning framework – are approximated by sets of samples (Lieder et al., ; Sanborn et al., ; Vul et al., ). Such belief sampling accounts arise in models of physical reasoning (Battaglia et al., ; Ullman et al., ), category learning (Bonawitz & Griffiths, ; Goodman et al., ; Shi et al., ), sentence parsing (Levy et al., ), theory of mind (Baker et al., ), creative thinking (Smith et al., ), multiple object tracking (Vul et al., ), and many more. In all these domains, inference is supported by a sampled set of beliefs about what sort of world we might be in given the available data: what rule governs category membership, what a particular sentence’s meaning evaluates to, or what sort of physical outcome is likely to occur from a particular starting state.

[Footnote: It is worth noting that samples from memory may also be considered along these lines; decision making can be aided by reaching back in memory for relevant experiences that will inform the current decision. However, due to the breadth and complexity of such models, we save this for the discussion.]

22.2.2.1 The Number of Samples in Belief Sampling

Belief sampling accounts start with the assumption that sampling is done by an algorithm that approximates the relevant probability distribution without bias, at least in the limit of infinitely many samples (Gershman et al., ). This assumption is important for addressing the motivating challenge of biological and psychological plausibility of probabilistic inference noted previously. However, while the sampling algorithms underlying belief sampling models may make certain guarantees in the limit, when these algorithms are used to model human behavior they often rely on only a few samples. After all, considering infinitely many samples seems just as implausible as working with the complex probability distribution directly.
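A small simulation helps show what "only a few samples" looks like behaviorally. In the hypothetical setup below, every simulated individual answers a binary question using k samples from the same posterior (the posterior value of 0.7 is assumed purely for illustration): with k = 1, individual answers are noisy and probability-match the posterior, while the population proportion still tracks the posterior closely; with large k, responses approach deterministic maximizing.

```python
# Responses based on k posterior samples: small k yields variable,
# probability-matching individuals whose aggregate mirrors the posterior.
import numpy as np

rng = np.random.default_rng(5)

posterior = 0.7          # assumed P(hypothesis A | data), shared by everyone
people = 10_000

for k in (1, 5, 101):    # odd k avoids ties in the majority response
    belief_samples = rng.random((people, k)) < posterior
    choose_A = belief_samples.mean(axis=1) > 0.5   # respond with majority sample
    print(f"k={k:3d}: proportion choosing A = {choose_A.mean():.2f}")
```

This is the pattern exploited by the work discussed next: the gap between aggregate regularity and individual variability is informative about how many samples individuals draw.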

Further, relying on few samples ensures that a sample-based approximation to a probabilistic model generates novel predictions (otherwise, it perfectly mimics the probabilistic model with no sampling). The net result is that belief sampling accounts start with the assumption that few samples are used and aim to characterize just how small that number is.

One approach to characterizing the number of samples relies on the role of sampling variability in explaining differences between individual and group-level behavior (Dasgupta et al., ; Vul et al., ). In a range of settings, decision making across a group of participants or aggregated over many trials closely resembles complete probabilistic reasoning (i.e., based on infinite samples), yet individual or trial-by-trial results are often idiosyncratically variable (Goodman et al., ; Griffiths & Tenenbaum, ; Lewandowsky et al., ; Mozer et al., ). The tension between aggregate population behavior consistent with probabilistic inference and seemingly irrational individual behavior can be resolved by positing that individuals use very few samples to guide decisions. This produces high-variance individual trial behavior that approximates full probabilistic inference over many trials. By modeling variation in individual behavior, researchers estimate how many samples people might be using and often find low numbers (Mozer et al., ). Although such reliance on few samples may seem surprising, in a broad set of decision tasks, under reasonable assumptions about the cost of samples, making quick decisions based on only one or a few samples is optimal (Vul et al., ).

As in data sampling models, considerations of sampling cost are critical for belief sampling. In some cases, generating belief samples can be highly burdensome, relying on the simulation of complex physical world models or generative processes. This, in combination with the effort to underscore the cognitive plausibility of belief sampling models, places the cost of samples front and center. The concept of sample re-use across inferences has therefore emerged as a relevant factor affecting the number of (novel) samples. When the cognitive costs of sampling are assumed to be large, the ability to remember previous samples offers a powerful opportunity to save time and computation across suitably similar decision contexts (Dasgupta & Gershman, ; Logan, a).

First, sample re-use is obviously helpful when we might need to answer a more or less identical question again. For example, when asked to give an estimate for simple questions like, “What percent of the world’s airports are in the United States?” people provided less correlated responses when they were asked the question again three weeks later than when they were asked immediately after the first response (Vul & Pashler, ), and the
correlation between immediate repeated guesses was lower for people with lower memory capacity (Hourihan & Benjamin, ). Here, memory for the previous answer is assumed to bias the second response when people are prompted at close intervals, suggesting that people will re-use their initial sample in subsequent judgments. The broader question of how samples from memory align with the expected utility framework is something we address at greater length in the discussion.

A related body of work addresses the conditions under which people resample or continue to rely on a previous sample even as they receive new data. For example, across a number of category learning tasks, individual behavior can be fit by a particle filter algorithm with a single particle, a finding consistent with continuing to use a sampled hypothesis as long as it continues to fit the data (Sanborn et al., ). Similar work showed that a “win-stay, lose-sample” algorithm which retains an existing hypothesis and only resamples when it fails to describe the data captures adult and toddler behavior in a causal learning task (Bonawitz et al., ). Even more dramatically, Goodman et al. () provide evidence that participants in sequential concept learning settings will continue to use a sampled rule even when other rules are more likely (or even fit the data perfectly). Together, these results suggest that people exhibit a strong tendency to reuse samples across repeated decisions, limiting the number of samples they need to draw when doing so is costly.

But how far does this tendency go? A growing body of work suggests that people will re-use costly samples to support multiple related decisions (Dasgupta & Gershman, ; Dasgupta et al., ; Dasgupta et al., ; Gershman & Goodman, ). For example, when people are asked to make judgments that can be supported by a previous inference due to conditional dependence, their responses can be strongly predicted by the previous response relative to people who did not make the previous inference first, and their response times are consistent with sample re-use (Gershman & Goodman, ). Though these results suggest that people make dynamic inferences about sample re-use in belief sampling, many questions remain about how people make such decisions and what kinds of limits or biases are introduced in the process.
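A minimal version of such a win-stay, lose-sample learner can be sketched in a toy rule-learning task. The threshold-rule setup and uniform resampling below are our own illustrative assumptions, not the designs used in the cited studies; the point is only the algorithm's shape: one hypothesis at a time, retained until it fails.

```python
# Win-stay, lose-sample: keep the current hypothesis while it explains
# the data; resample a new one (here, from a uniform prior) when it fails.
import numpy as np

rng = np.random.default_rng(6)

true_threshold = 6.0                 # stimulus belongs to category 1 iff x > 6
hypothesis = rng.uniform(0, 10)      # one sampled threshold hypothesis
resamples = 0

for _ in range(200):
    x = rng.uniform(0, 10)
    if (x > hypothesis) != (x > true_threshold):   # prediction failed
        hypothesis = rng.uniform(0, 10)            # lose-sample: draw anew
        resamples += 1                             # (win-stay otherwise)

print(f"final hypothesis = {hypothesis:.2f} (true 6.00) after {resamples} resamples")
```

Despite never holding more than one sample, the learner drifts toward the true rule, because hypotheses far from the truth fail often and get replaced, while near-correct ones survive.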

22.2.2.2 The Sample Distribution in Belief Sampling

When sampling beliefs, the sample distribution reflects the challenges of generating representative samples from $P(s \mid x)$ in Equation (22.3) through purely cognitive processes. While sampling data is often a matter of exploring or receiving information from the world, sampling beliefs requires deploying machinery capable of imagining different possible worlds. In some cases, this may be as simple as imagining what kind of outcome we might receive from a particular gamble (Lieder et al., ) or what values a simple random variable in the environment might take (Vul et al., ). However, in many cases, this requires more sophisticated mental models or generative processes to produce the samples. To illustrate, consider “noisy physics engine” models of intuitive physics, in which inferences about whether a tower of blocks will fall or a ball will hit a target are based on forward simulation of a dynamic physics engine to produce samples of how different configurations of blocks might behave (Battaglia et al., ). Or, in category learning paradigms, generating a hypothesis about the rule that determines category membership (i.e., the belief that best describes the available data) may involve a rich generative process to produce sample rules (Bonawitz & Griffiths, ; Goodman et al., ) or draw on prior knowledge (Williams & Lombrozo, ).

In light of these challenges, bias in the sample distribution arises first from the difficulty of obtaining a sample at all. In such settings, sampling requires creative algorithmic solutions which simplify the process of generating a sample. However, as a result of these simplifications, such processes may not faithfully represent the underlying distribution with only a limited number of samples. One common approach relies on Markov chain Monte Carlo (MCMC), in which each sample is generated through easily computable modifications to the previous sample. A sequence or “chain” of samples generated in this way has the property that, in the limit, it approximates the underlying distribution with high fidelity (Gilks et al., ). MCMC is commonly used in machine learning applications to approximate complex distributions that cannot be analytically specified. However, because of the iterative sampling algorithm, MCMC samples have an autocorrelation that is more pronounced in small-sample regimes. The sample distribution may in turn be more homogeneous than would be expected, since each sample is correlated with the one that preceded it. If people implement a form of MCMC sampling in belief sampling, we might expect the autocorrelation of samples to have behavioral consequences; MCMC-like processes have been proposed as an account of perceptual switching in binocular rivalry (Gershman et al., ), sequential dependence in semantic memory search (Bourgin et al., ), and anchoring biases (Lieder et al., ).
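The autocorrelation concern can be made concrete with a few lines of Metropolis–Hastings, a textbook MCMC algorithm. In the sketch below the target posterior is a standard normal, chosen purely for illustration; a ten-sample chain started away from the mode is visibly unrepresentative, while a long chain recovers the distribution despite high lag-1 autocorrelation.

```python
# Metropolis-Hastings on a standard normal target: short chains inherit
# the starting point (an anchoring-like bias) and consecutive samples
# are strongly autocorrelated.
import numpy as np

rng = np.random.default_rng(7)

def log_post(s):                     # illustrative target: standard normal
    return -0.5 * s * s

def mh_chain(n, step=0.5, s0=3.0):   # start away from the mode on purpose
    s, out = s0, []
    for _ in range(n):
        proposal = s + step * rng.standard_normal()
        if np.log(rng.random()) < log_post(proposal) - log_post(s):
            s = proposal             # accept; otherwise keep current sample
        out.append(s)
    return np.array(out)

short = mh_chain(10)
long_run = mh_chain(20_000)
lag1 = np.corrcoef(long_run[:-1], long_run[1:])[0, 1]
print("10-sample chain:", short.round(2))
print(f"long run: mean={long_run.mean():.2f}, sd={long_run.std():.2f}, "
      f"lag-1 autocorrelation={lag1:.2f}")
```

The ten early samples all hover near the arbitrary starting value of 3.0, which is exactly the kind of small-sample homogeneity that has been offered as an account of anchoring.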

et al., ) rather than static inferences. These models maintain a set of sampled hypotheses that can change over time and be reevaluated as new data is observed. For instance, the first few words of a sentence are consistent with many candidate parses, but subsequent words narrow the set of possibilities. A particle filter may sample such sentence parses and reweight and resample plausible parses as new words are read, allowing for efficient, psychologically plausible approximation of the posterior distribution over parses (Levy et al., ). Similar dynamics have motivated the use of particle filters in other sequential tasks in humans (Sanborn et al., ; Vul et al., ), and animals (Daw & Courville, ). Notably, psychologically plausible particle filters only entertain a limited number of samples at any one time; as time goes on, these samples are “pruned” when they are inconsistent with the observed data. This may lead to scenarios where new data is entirely inconsistent with existing samples. In sentence parsing, such particle filter “collapse” has been proposed as an account of the experience of being stymied when parsing a garden-path sentence such as, “the horse raced past the barn fell” (Levy et al., ). The previous examples illustrate the potential for biases in the sample distribution driven by various algorithmic attempts to make sampling tractable in the first place. However, a second source of bias in the distribution of belief samples arises from attempts to generate the right samples. Just as data sampling may underrepresent rare but high utility outcomes (Hertwig et al., ), belief sampling algorithms must account for states that have low probabilities but high-magnitude utilities, like winning the lottery or contracting a fatal illness. Such “black swan” states are unlikely to be sampled from the underlying distribution, but due to their high (positive or negative) utility, failure to consider them could lead to missed opportunities or highly undesirable outcomes. Empirically, these events are often given disproportionate attention and rated as more probable than they truly are (Kahneman & Tversky, ), suggesting that natural state sampling offers a poor psychological account of the ways in which people treat such events. How can a sample-based account of decision making – which would require taking potentially thousands of samples to assess particularly rare outcomes (Lieder et al., ) – address the readiness with which people consider possible black swan states when making decisions? As with the challenge of generating complex belief samples, answers to this question largely appeal to the sampling algorithm itself. Lieder et al. () propose that people use a form of importance sampling in which samples are drawn from a biased distribution that over-weights high-utility-variance states, and then corrects for this bias by weighting by the inverse of the utility variance.

Though research on the behavioral consequences of biased sample distributions has largely focused on adults, recent developmental work has highlighted the value of exploring similar questions in children. After all, young children, perhaps more than adults, are faced with a daunting task of assembling coherent representations of the state of the world given noisy and sparse data. A number of results suggest that children may indeed rely on sampling mechanisms similar to adults to update their beliefs about their environment (Bonawitz et al., ; Bonawitz et al., ; Denison et al., ). And just as biased sample distributions may lead to distinct biases in adult decision making, recent work has investigated whether patterns of sampling behavior might help explain key changes in cognitive development. For example, evidence that children are often more exploratory than adults and can sometimes learn novel relationships better than adults is consistent with children sampling a wider range of hypotheses and generalizing less (Gopnik et al., ; Gopnik et al., ; Lucas et al., ; Schulz et al., ). In this vein, developmental research offers a unique opportunity for further testing predictions of belief sampling models.

22.2.3 Action Sampling

The decision making framework presented here posits that given a choice of possible actions, people will choose the one that maximizes expected utility. In the expected utility calculation in Equation (22.3), the process of choosing the best action is glossed over in a simple arg max operator, but implementing it is a challenging optimization problem (Nocedal & Wright, ). In the controlled settings that dominate empirical work in psychology and decision making, people select among a predefined set of options (Kalis et al., ; Smaldino & Richerson, ), so the
optimization problem is well-constrained; we can evaluate each of the few explicitly stated alternatives to find the best one. However, real-world situations present us with innumerably many possible actions to choose from. Our options about what to do for dinner include not only obvious choices like raiding the fridge or ordering take-out, but also novelties such as snaring a neighborhood squirrel, and nonsequiturs such as opening a pedicure salon. Intuitively, our deliberation process cannot evaluate all possible options, so the choice of best action necessarily involves choosing from a limited sample of alternatives, often called the consideration set (Hauser & Wernerfelt, ). This is illustrated by the example in Figure 22.1, where deciding what to wear to a party will in practice only involve explicitly sampling a handful of possible items. However, even remaining open to everything in our closet already represents a narrowing of the consideration set if we do not also consider buying a whole new outfit on the way to the party or fashioning new coverings from the contents of our kitchen pantry. In general, since we cannot consider all available actions when making decisions, sampling offers a potential solution.

Because our aim in considering actions is finding the best one, there are substantial differences in constraints on, and properties of, action sampling compared to belief or data sampling. On the one hand, optimization simplifies what we do with a sampled action: simply evaluate how good it is, and ultimately choose the best one. There is no need to resample an action multiple times to estimate frequency, or to weight it in some manner to correct for a biased sampling distribution. On the other hand, optimization underscores the need to sample actions that are likely to be valuable. With a potentially infinite, discontinuous set of possible actions, the number of terrible actions is vast, and if we sample a subset of actions from an irrelevant distribution, the best action in our consideration set might be quite bad indeed. In light of this, investigations into the number of samples and the sampling distribution for actions focus on understanding how we manage to consider the right kinds of actions most of the time.

This description of decision making as choosing from among candidate (sampled) actions the one with the highest expected utility has been challenged by a diverse set of accounts. In particular, many alternatives emphasize the role of exogenous factors like potential regret (Loomes & Sugden, ) or disappointment (Loomes & Sugden, ), or affective responses more generally (Mellers et al., ) in decision making. Since these variables can be considered somewhat independent of an action’s
underlying utility, the extent to which they impact downstream decision making is important. Critically, the role that the number of action samples and the sample distribution play in decision making remains consistent with these theories. While our discussion focuses on choosing the action which maximizes approximate expected utility, we might just as easily replace this with the action that minimizes expected regret, or maximizes a more complex joint distribution of affect and utility. In either case, the motivating question remains how people sample the right actions from a practically infinite space of options. ... The Number of Samples in Action Sampling With belief and data sampling, the number of samples places an obvious constraint on our ability to make downstream inferences because the samples represent our estimated probability distributions P ðs j x Þ. However, as Equation . illustrates, sampled actions do not play the same role; we simply choose the best from among them. Therefore, it could be the case that despite limiting the number of actions in the consideration set, this choice has no noteworthy consequences for behavior. Since we just pick the best action from the consideration set, so long as the best action is contained within, it doesn’t matter how small that set is. In some cases, the first action that comes to mind is indeed likely to be the best. Johnson & Raab () found that the first action considered by handball players faced with a possible game scenario was ultimately chosen  percent of the time and that in general, subsequently generated alternatives decreased in quality. However, perfect calibration between the actions we consider and the utility function we aim to maximize not only seems implausible in theory, but is also not borne out in practice. First, ensuring that the best action is somewhere in the consideration set seems to require an omniscient procedure that already knows the outcome of the decision process; if we could choose a consideration set based on which actions are the best, we could just use that process to make our choice. Second, ensuring that the best answers are in the consideration set is not just a priori implausible. In many domains, people can identify the best action from among a set of alternatives, but fail to generate that action themselves (Adams et al., ). For instance, when proposing good questions in a modified battleship game, subjects could reliably identify the most informative questions when presented with a list, but rarely generated the optimal questions on their own (Rothe et al., ). This ability to recognize useful actions, despite failing to generate them, suggests a suboptimality that arises not
This ability to recognize useful actions, despite failing to generate them, suggests a suboptimality that arises not from action evaluation itself, but from relying on a subset of actions that may not always be calibrated to the problem at hand.

Given that the limited consideration set of actions may not always include the best one, the number of samples drawn plays a role in determining how likely the consideration set is to contain promising choices. As with data and belief sampling, additional samples incur time costs (and thus a reduction in the rate of decisions) and are likely to impose cognitive effort as well. In some cases, the effort required to evaluate sampled actions can be quite large. The familiar experience of choice overload, in which having more options can paradoxically lead to higher levels of regret, choice deferral, or dissatisfaction, makes these costs clear (Chernev et al., ). [Footnote: While these results might seem counter to the claim that decision makers simply choose the best action from the consideration set, they primarily suggest that evaluating actions is difficult and costly, so evaluating more choices can increase error or uncertainty in our choice. Factors that often lead to greater choice overload are exactly those we intuitively associate with making errors more likely, for example, choice set complexity and preference uncertainty (Chernev et al., ).] The benefits of sampling more actions depend in part on the variation in payoffs across sampled alternatives; if all actions are equally beneficial, nothing is lost from the poor optimization of considering too few samples. In addition, the value of additional sampled actions depends on the alignment between the sampling distribution and the true context-relevant payoffs. For sampled actions to be useful, they must be sampled according to a plausible approximation of their present expected utility, such as prior success in similar settings. However, the actual utility or reward obtained from a given action in the present context may not match the expectation. If the sampling distribution is unlikely to surface the best action for the given decision, then having more samples to choose from gives one better odds of choosing a high-value action. These considerations suggest that the expected utility calculation based on sampled actions will be sensitive to how many samples are taken and whether those samples are well calibrated to the environment. Indeed, in simulations testing these predictions, Morris et al. () show that the expected reward rate varies as a function of the number of actions considered, as well as the correlation between action utilities and the probability that actions are sampled into the consideration set. But how many action samples should we take? With a perfect correlation, the first action considered will most often be the optimal action, so one sample yields the highest possible reward. However, as this correlation decreases, the consideration set needs to be larger to maintain a high probability of containing a good action; consequently, the optimal number of samples is larger. Morris et al. () find that so long as the correlation between the distribution over actions and the true utilities is larger than ., the optimal number of actions considered out of a possible , is less than ten. These simulation results therefore suggest that, in scenarios where our expectations map reasonably well onto the present decision, as with data and belief sampling, there may be little value in taking more than a few samples (a code sketch of this simulation logic appears at the end of this section).

... The Sample Distribution in Action Sampling

The expected utility maximization goal given by Equation . is to select the best action from the set of possible actions, but given that the space of actions is immense, the arg max must be evaluated over a subset of actions. This subset represents the sample distribution over actions. In the previous sections on data and belief sampling, we discussed the ways in which biased sample distributions predict systematic deviations from optimal behavior. However, the potential for such deviations in action sampling is complicated by the fact that, given a consideration set of actions, the decision maker need only pick the best one. If the globally best action is in the consideration set, any other bias in the sample distribution is essentially irrelevant. Thus, just as the number of samples in the consideration set only affects behavior if the best action is not guaranteed to be in the sample, the sample distribution will only bias behavior if the optimal choice is not present. This poses a challenge for researchers testing the role of samples in action selection: how can one identify scenarios where people's consideration set will reliably fail to include the optimal choice? Intuitively, this depends on how the candidate actions are sampled. Morris et al. () propose that we consider actions based primarily on global average utilities that are not dependent on the local decision context. These actions are then sampled into the consideration set, and each one is evaluated more systematically given all that is known about the current context. On this account, a consideration set is constructed where the probability of including an option is proportional to its "cached" historical value, perhaps pulled from long-term memory (Kaiser et al., ), or obtained via model-free reinforcement learning (Pezzulo et al., ). Critically, such a process might introduce a bias toward actions that have previously been selected or rewarded but which may be incongruous in novel circumstances. In new situations, if no prior circumstances are particularly relevant, we are left to consider only actions that have been previously successful on average. This predicts that people tend to disproportionately favor globally useful actions in novel circumstances. In
support of this account, Morris et al. () found that when asked to list all meal ideas following dental surgery, subjects tended to generate options that were high in general value (i.e., their favorite dishes), but were less likely to generate options of high utility in the specific context (i.e., suitable given dental restrictions). This finding is attributed to a process which samples broadly "useful" actions but, faced with the challenge of evaluating them in a relatively unfamiliar context (after dental surgery), does little additional filtering or optimization.

In contrast, Phillips et al. () propose that we use the current context to determine which actions were frequently chosen, or had yielded high utility, in prior similar circumstances. These actions then form the basis of our sampled consideration set. On this account, the consideration set is more tailored to a given decision context, making it more likely that the optimal action is in the sample. Though intuitive, it can be difficult to extract behavioral predictions from this account. For one, the mechanism for identifying similar past situations in a given decision context is highly unconstrained; any past experience, behavioral tendency, observed contingency, or accrued reward could plausibly be used as input to the system that learns reward estimates, and any representation of the current context might be used to query this cache for good actions. Second, on this account it is unclear when biases in the consideration set will emerge, since the sample is predicted to be more context specific. Scenarios in which the context seems to invite a particular sample of actions as a useful consideration set, but which are in fact best approached with a different action, require somewhat unusual circumstances: ones where previously learned adaptations are no longer useful, or are incidentally of low utility despite the apparent similarity. If we walk out to somebody else's car with them and instinctively reach into our pocket for our own keys, this action has the character of a utility-maximizing behavior under similar past circumstances, but one which, evaluated in the current setting, is not nearly as useful.

Such paradigms are often used to study the tension between reflexive, model-free decisions and more deliberative, model-based action selection (Gläscher et al., ). On this view, model-free choices reflect the sort of behaviors that past experience alone predicts would be utility maximizing – actions selected from a potentially biased sample distribution without further consideration – while model-based action selection involves the more careful evaluation of all items in the consideration set and leads to different choices in the context at hand. In this way, action sampling might predict when the decision system as a whole will behave consistently with model-free habits or model-based plans, according to the
interplay of the cost of action sampling and the correlation between the action sampling distribution and the underlying utility function.
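To illustrate the logic of these simulations, the following sketch (our own construction, not Morris et al.'s code; the Gaussian correlation construction and all parameter values are assumptions chosen for illustration) estimates the average payoff of choosing the best action from a small consideration set whose sampling probabilities are only imperfectly correlated with the true utilities:

    import numpy as np

    rng = np.random.default_rng(0)

    def mean_reward(n_actions=1000, k_considered=5, rho=0.8, n_trials=2000):
        """Average utility of the best action in a sampled consideration set."""
        total = 0.0
        for _ in range(n_trials):
            u = rng.standard_normal(n_actions)  # context-specific utilities
            # Sampling weights correlate with the true utilities to degree rho.
            w = rho * u + np.sqrt(1 - rho**2) * rng.standard_normal(n_actions)
            p = np.exp(w) / np.exp(w).sum()     # softmax sampling distribution
            considered = rng.choice(n_actions, size=k_considered,
                                    replace=False, p=p)
            total += u[considered].max()        # pick the best sampled action
        return total / n_trials

    for k in (1, 2, 5, 10, 20):
        print(k, round(mean_reward(k_considered=k), 2))

When rho is near one, the payoff curve flattens after the first few samples; as rho falls, larger consideration sets are needed to maintain a good chance of containing a high-value action, mirroring the tradeoff described above.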

. Discussion

In this chapter, we have attempted to explicitly relate a broad array of sampling models in the literature as facets of an overarching computation: approximating expected utility maximization. We argued that existing sample-based accounts in a diversity of settings can be succinctly described as sampling the underlying data, belief, and action distributions that are central to calculating expected utility. In this way, these models come together to support a unified process of making good decisions. Previously, data, belief, and action sampling have largely been considered in isolation, in tasks designed to make all but one of these components trivially simple. We begin with the premise that in real-world settings all three aspects are nontrivial, and that an integrative account of decision making by sampling therefore ought to consider them all. By jointly evaluating how samples of each type are used together, we see that subsets of data, beliefs, and actions play fundamentally different roles in the expected utility calculation: conditioning for data, marginalization for beliefs, and optimization for actions. These differences highlight discrepancies in the solutions and broader research goals in each domain. However, the present work suggests that there is room for optimism about a more integrated view. In support of this, we discuss major trends in these distinct literatures along two key lines of analysis that are critical to any mature sample-based theory: a precise account of how decision making in each setting reflects the number of samples and the sample distribution.

We take as our starting point that in data, belief, and action sampling, decision making relies on only a few samples, and that in each domain the sample distribution that results from these limited samples must play a role in downstream behavior. This perspective provides a number of broad insights across these domains. Data sampling allows for a precise account of the number of samples; belief sampling models show a critical relationship between the sampling algorithm and behavior; and action sampling poses a challenge in specifying the relationship between the sample distribution, or consideration set, and decision making.

The differences between data, belief, and action sampling algorithms highlighted by our review offer a clear opportunity for synthesis. Because each field has emphasized different aspects of sampling, they have made progress on issues that may have been ignored by the other fields. In this
vein, considering the different classes of sampling together provides a basis for developments in one field to be translated to others. The formal tools for deriving optimal stopping rules in drift-diffusion models of data sampling may be fruitfully brought to bear on the metacognitive choices pertaining to belief sampling, and might even inform the tradeoff between implicit and deliberative action selection. Likewise, correlated sampling algorithms from belief sampling highlight methods for grappling with nonindependent data sampling environments. Progress in both of these domains may offer a number of future directions for relatively nascent models of action selection.

The synthesis offered in this chapter leaves open two large classes of questions, which we briefly consider. First, the expected utility maximization framework for sampling models should be taken to its logical conclusion: how do people integrate data, belief, and action sampling when making complex decisions? Second, while this chapter attempts to bring a breadth of sample-based models into the framework of expected utility maximization, this is far from a complete survey. How might we consider still other sampling accounts, such as sampling utilities or memories?

.. Unifying Sample-Based Expectation Maximization

The motivation for unifying all the facets of sampling under one account of approximating expected utility maximization is that real-world tasks entail ambiguity, uncertainty, and approximation in every aspect of the calculation. Throughout this chapter, we have alluded to a simplistic example that illustrates this: deciding what to wear to a party (Figure .). However, everyday life presents us with an abundance of similar scenarios. Critically, explicitly sampling all of these variables in combination creates at least two major complications: (a) determining which variables are sampled in what order, and (b) optimizing sample sizes given sample-based error along all variables.

First, the basic expression $\arg\max_a \sum_s U(a, s) P(s)$ entails evaluating the utility of every combination of sampled action and sampled state, meaning that the processing load grows as the product of the sample sizes. Alternatively, perhaps not all combinations of action and state need to be evaluated; but then how can the system determine which combinations to consider? Equivalently, if we consider jointly sampling (action, state) tuples, we face the same problem: what dependence structure ought to exist in the joint distribution? Another alternative is that we sample sequentially. Just as we condition on data to sample states, perhaps we condition on states to
sample actions. Indeed, this would be one method to allow better generalization from past experience when determining the action sampling distribution. However, this conditional action sampling procedure does not find the action with the highest expected utility, but instead the action with the highest optimistic utility: the action that has the highest utility in the state for which it is well suited (a toy contrast between the two procedures appears at the end of this subsection). In short, because action and belief sampling accounts typically work in isolation, little is known about how people ought to set the order and dependency structure when sampling both, and nothing is known about how they actually do this.

Second, the numbers of samples we ought to take for data, beliefs, and actions are inextricably linked. In the limit where we sample only one action, it is irrelevant how much data we have to inform our latent state estimates, or how precisely we approximate our beliefs about states, since the solitary sampled action determines our choice. Likewise, if we sample no data, nothing is to be gained from sampling more than one action: if we do not inform our estimates about the current state, then our state-contingent utility calculation will not differ from a global average utility from which we might sample actions. However, not all samples are complements; additional data samples decrease the entropy of our belief distribution, but it is advantageous to take more belief samples in medium-entropy scenarios (Vul et al., ). This means that the optimal number of belief samples does not change monotonically with the number of data samples; it initially rises when we obtain some data, then drops as data sufficiently constrain our beliefs. Given these interdependencies between sample sizes, the joint optimization over the numbers of data, belief, and action samples might yield several discrete classes of algorithms, depending on the balance of data, belief, and action sample costs.
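As promised above, here is a toy contrast between the two procedures (our construction; the utilities and state probabilities are invented):

    import numpy as np

    # Invented example: three actions, two equally likely states.
    U = np.array([[5.0, 5.0],    # action 0: decent in both states
                  [9.0, 0.0],    # action 1: excellent only in state 0
                  [0.0, 9.0]])   # action 2: excellent only in state 1
    P = np.array([0.5, 0.5])

    # Expected-utility choice: marginalize over states, then maximize.
    print(np.argmax(U @ P))      # action 0 (expected utilities 5.0, 4.5, 4.5)

    # Conditional sampling: draw one state, then pick the best action for it.
    rng = np.random.default_rng(1)
    s = rng.choice(2, p=P)
    print(np.argmax(U[:, s]))    # action 1 or 2, never the robust action 0

The conditional procedure systematically returns a state-specialized action even though the robust action has the higher expected utility; this is the sense in which it maximizes an optimistic rather than an expected value.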

.. Other Forms of Sampling

The framework proposed in this chapter has emphasized data, state, and action sampling. However, two other interrelated forms of sampling have not been considered: sampling utilities (Stewart et al., ) and memories (Nosofsky & Palmeri, ). First, consider utility sampling. In the formalism of Equation ., there is no uncertainty or error in utility evaluations: $U(a, s)$ is accessible directly and infallibly. However, some relaxations of this assumption can be accommodated in our formulation. An uncertain or stochastic utility function can be captured without explicitly sampling utilities by adding more state possibilities. For instance, instead of $U(a, s)$ yielding ten dollars 50 percent of the time and zero dollars the remainder of the time, we can posit a constant utility function defined over two new states: $U(a, u) = \$10$ and $U(a, v) = \$0$. We then split the probability associated with state s evenly across the two substates u and v. However, other challenges to the assumption of an available value function cannot be so addressed, and would instead require a new sampling term. For instance, if we do not have direct access to explicit value functions (Hayden & Niv, ), but must instead reconstruct them from past experience (Stewart et al., ), this reconstruction must have some sampling fidelity, and must be nested within state–action sampling. Such an approach is most relevant when we consider that value functions are likely to be learned over time, via reinforcement learning. Indeed, the Thompson sampling (Thompson, ) strategy for striking a balance between exploitation and exploration in multi-armed bandit problems is to choose the action with the highest sampled payoff, suggesting that there is an important algorithmic role for sampling utility functions themselves (a minimal sketch appears at the end of this subsection).

Sampling from memory is a more complicated case, because memory sampling takes on many different flavors. The most basic example of memory sampling is conjuring past observations from memory – this is consistent with data sampling, but where the information search process proceeds through one's memory stores (Hills et al., ; Ratcliff, ). This form of memory sampling, although conceptually quite different from sampling information from the external world, is still consistent with sampling data; the sampled memories are conditioned on, and yield noteworthy predictions insofar as they come from a biased distribution. Most memories, however, are not pure retrievals of facts but are instead reconstructions of the past (Bartlett, ). In that sense, a sampled memory lies somewhere between a sample of data and a sample of beliefs, that is, of the inferred world state. Finally, memory also serves to cache previous actions and calculations, as in the instance theory of automatization (Logan, b), which seems most consistent with sampling actions based on previously calculated optimal choices. These different modes of memory sampling are sometimes quite tangled. For instance, sampling exemplars from memory is consistent with sampling beliefs in a categorization problem (Nosofsky & Palmeri, ; Shi et al., ), and more generally, sampling previously considered beliefs or interim calculations from memory (Dasgupta & Gershman, ; Dasgupta et al., ) is consistent with reusing past samples of beliefs, actions, or data. Altogether, when considered in our framework, sampling memory is a broad collection of strategies for reusing prior samples of data, beliefs, or actions. In short, although utility and memory sampling may be accommodated
within our framework, a serious treatment of either domain requires considerable elaboration of the framework, and thus offers a promising avenue for future work.
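Finally, as flagged above, a minimal Thompson sampling sketch (ours; a Bernoulli bandit with invented payoff probabilities) shows what it means to choose by sampling utility estimates rather than averaging them:

    import numpy as np

    rng = np.random.default_rng(2)
    true_payoff = np.array([0.3, 0.5, 0.7])  # invented reward probabilities
    alpha = np.ones(3)                       # Beta posterior parameters per arm
    beta = np.ones(3)

    for _ in range(1000):
        sampled = rng.beta(alpha, beta)      # one sampled payoff per arm
        arm = int(np.argmax(sampled))        # act on the sampled, not mean, payoff
        reward = rng.random() < true_payoff[arm]
        alpha[arm] += reward
        beta[arm] += 1 - reward

    print(np.round(alpha / (alpha + beta), 2))  # posterior means per arm

Because uncertain arms occasionally produce high sampled payoffs, exploration happens automatically, while well-estimated good arms win most draws.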

.

Conclusions

In summary, this chapter addresses the challenge of integrating the many sample-based models across diverse areas of psychology. We begin with the observation that in decision-making contexts, a rational agent chooses the action that maximizes expected utility. Yet doing so poses major computational hurdles even for simple, everyday decisions. In light of this, we propose that several broad classes of sampling models in the literature can be viewed as approximating different components of the expected utility calculation: data, beliefs, and actions. Under this unifying framework, our review compares models of data, belief, and action sampling along two distinct dimensions that are central to any sampling account: the number of samples and the sample distribution. This comparison offers novel insights into the strengths of different sampling accounts, as well as opportunities for future work. Finally, we advocate for further inquiry aimed at understanding how data, belief, and action sampling processes come together to support rich, grounded decision making.

REFERENCES

Adams, G. S., Converse, B. A., Hales, A. H., & Klotz, L. E. (). People systematically overlook subtractive changes. Nature, (), –.
Baker, C. L., Saxe, R., & Tenenbaum, J. B. (). Action understanding as inverse planning. Cognition, (), –.
Bartlett, F. C. (). Remembering: A study in experimental and social psychology. Cambridge, UK: Cambridge University Press.
Battaglia, P. W., Hamrick, J. B., & Tenenbaum, J. B. (). Simulation as an engine of physical scene understanding. Proceedings of the National Academy of Sciences, (), –.
Bogacz, R., Brown, E., Moehlis, J., Holmes, P., & Cohen, J. D. (). The physics of optimal decision making: A formal analysis of models of performance in two-alternative forced-choice tasks. Psychological Review, (), .
Bonawitz, E., Denison, S., Gopnik, A., & Griffiths, T. L. (). Win-stay, lose-sample: A simple sequential algorithm for approximating Bayesian inference. Cognitive Psychology, , –.
Bonawitz, E., Denison, S., Griffiths, T. L., & Gopnik, A. (). Probabilistic models, learning algorithms, and response variability: Sampling in cognitive development. Trends in Cognitive Sciences, (), –.
Bonawitz, E. B., & Griffiths, T. L. (). Deconfounding hypothesis generation and evaluation in Bayesian models. Proceedings of the Annual Meeting of the Cognitive Science Society, ().
Bourgin, D., Abbott, J., Griffiths, T., Smith, K., & Vul, E. (). Empirical evidence for Markov chain Monte Carlo in memory search. Proceedings of the Annual Meeting of the Cognitive Science Society, ().
Braddick, O. (). A short-range process in apparent motion. Vision Research, (), –.
Chater, N., Tenenbaum, J. B., & Yuille, A. (). Probabilistic models of cognition: Where next? Trends in Cognitive Sciences, (), –.
Chernev, A., Böckenholt, U., & Goodman, J. (). Choice overload: A conceptual review and meta-analysis. Journal of Consumer Psychology, (), –.
Coenen, A., & Gureckis, T. (). The distorting effects of deciding to stop sampling information. PsyArXiv. doi:./osf.io/tbrea
Dasgupta, I., & Gershman, S. J. (). Memory as a computational resource. Trends in Cognitive Sciences, (), –.
Dasgupta, I., Schulz, E., & Gershman, S. J. (). Where do hypotheses come from? Cognitive Psychology, , –.
Dasgupta, I., Schulz, E., Goodman, N. D., & Gershman, S. J. (). Remembrance of inferences past: Amortization in human hypothesis generation. Cognition, , –.
Dasgupta, I., Schulz, E., Tenenbaum, J. B., & Gershman, S. J. (). A theory of learning to infer. Psychological Review, (), .
Daw, N., & Courville, A. (). The pigeon as particle filter. Advances in Neural Information Processing Systems, , –.
Daw, N. D., O'Doherty, J. P., Dayan, P., Seymour, B., & Dolan, R. J. (). Cortical substrates for exploratory decisions in humans. Nature, (), –.
Dawes, R. M. (). Prediction of the future versus an understanding of the past: A basic asymmetry. American Journal of Psychology, (), –.
Denison, S., Bonawitz, E., Gopnik, A., & Griffiths, T. L. (). Rational variability in children's causal inferences: The sampling hypothesis. Cognition, (), –.
Doucet, A., de Freitas, N., & Gordon, N. (Eds.) (). Sequential Monte Carlo methods in practice. New York: Springer.
Fechner, G. T. (). Elemente der Psychophysik (Vol. ). Wiesbaden: Breitkopf u. Härtel.
Fiedler, K. (). Beware of samples! A cognitive-ecological sampling approach to judgment biases. Psychological Review, (), .
(). The ultimate sampling dilemma in experience-based decision making. Journal of Experimental Psychology: Learning, Memory, and Cognition, (), .
Fiedler, K., & Juslin, P. (). Taking the interface between mind and environment seriously. In K. Fiedler & P. Juslin (Eds.), Information sampling and adaptive cognition (pp. –). New York: Cambridge University Press.
Galesic, M., Olsson, H., & Rieskamp, J. (). Social sampling explains apparent biases in judgments of social environments. Psychological Science, (), –.
(). A sampling model of social judgment. Psychological Review, (), .
Gershman, S., & Goodman, N. (). Amortized inference in probabilistic reasoning. Proceedings of the Annual Meeting of the Cognitive Science Society, ().
Gershman, S., Horvitz, E. J., & Tenenbaum, J. B. (). Computational rationality: A converging paradigm for intelligence in brains, minds, and machines. Science, (), –.
Gershman, S. J., Vul, E., & Tenenbaum, J. B. (). Multistability and perceptual inference. Neural Computation, (), –.
Gigerenzer, G., & Goldstein, D. G. (). Reasoning the fast and frugal way: Models of bounded rationality. Psychological Review, (), .
Gigerenzer, G., & Hoffrage, U. (). How to improve Bayesian reasoning without instruction: Frequency formats. Psychological Review, (), .
Gilks, W. R., Richardson, S., & Spiegelhalter, D. (). Markov chain Monte Carlo in practice. Boca Raton, FL: CRC Press.
Gittins, J. C. (). Bandit processes and dynamic allocation indices. Journal of the Royal Statistical Society: Series B (Methodological), (), –.
Gläscher, J., Daw, N., Dayan, P., & O'Doherty, J. P. (). States versus rewards: Dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron, (), –.
Goodman, N. D., Tenenbaum, J. B., Feldman, J., & Griffiths, T. L. (). A rational analysis of rule-based concept learning. Cognitive Science, (), –.
Gopnik, A., Griffiths, T. L., & Lucas, C. G. (). When younger learners can be better (or at least more open-minded) than older ones. Current Directions in Psychological Science, (), –.
Gopnik, A., O'Grady, S., Lucas, C. G., et al. (). Changes in cognitive flexibility and hypothesis search across human life history from childhood to adolescence to adulthood. Proceedings of the National Academy of Sciences, (), –.
Griffiths, T. L., & Tenenbaum, J. B. (). Optimal predictions in everyday cognition. Psychological Science, (), –.
Griffiths, T. L., Vul, E., & Sanborn, A. N. (). Bridging levels of analysis for probabilistic models of cognition. Current Directions in Psychological Science, (), –.
Hau, R., Pleskac, T. J., & Hertwig, R. (). Decisions from experience and statistical probabilities: Why they trigger different choices than a priori probabilities. Journal of Behavioral Decision Making, (), –.
Hauser, J. R., & Wernerfelt, B. (). An evaluation cost model of consideration sets. Journal of Consumer Research, (), –.
Hayden, B., & Niv, Y. (). The case against economic values in the brain. Behavioral Neuroscience, (), –.
Hertwig, R., Barron, G., Weber, E. U., & Erev, I. (). Decisions from experience and the effect of rare events in risky choice. Psychological Science, , –.
Hertwig, R., & Erev, I. (). The description–experience gap in risky choice. Trends in Cognitive Sciences, , –.
Hertwig, R., & Pleskac, T. J. (). Decisions from experience: Why small samples? Cognition, , –.
Hills, T. T., Jones, M. N., & Todd, P. M. (). Optimal foraging in semantic memory. Psychological Review, (), .
Hogarth, R. M., Lejarraga, T., & Soyer, E. (). The two settings of kind and wicked learning environments. Current Directions in Psychological Science, (), –.
Hourihan, K. L., & Benjamin, A. S. (). Smaller is better (when sampling from the crowd within): Low memory-span individuals benefit more from multiple opportunities for estimation. Journal of Experimental Psychology: Learning, Memory, and Cognition, (), .
Jazayeri, M., & Movshon, J. A. (). A new perceptual illusion reveals mechanisms of sensory decoding. Nature, (), –.
Johnson, J. G., & Raab, M. (). Take the first: Option-generation and resulting choices. Organizational Behavior and Human Decision Processes, (), –.
Jones, M., & Love, B. C. (). Bayesian fundamentalism or enlightenment? On the explanatory status and theoretical contributions of Bayesian models of cognition. Behavioral and Brain Sciences, (), .
Juni, M. Z., Gureckis, T. M., & Maloney, L. T. (). Information sampling behavior with explicit sampling costs. Decision, , .
Kahneman, D., & Tversky, A. (). Prospect theory: An analysis of decision under risk. Econometrica, (), –.
Kaiser, S., Simon, J. J., Kalis, A., et al. (). The cognitive and neural basis of option generation and subsequent choice. Cognitive, Affective, & Behavioral Neuroscience, (), –.
Kalis, A., Kaiser, S., & Mojzisch, A. (). Why we should talk about option generation in decision-making research. Frontiers in Psychology, , .
Karnopp, D. C. (). Random search techniques for optimization problems. Automatica, (–), –.
Klayman, J., & Ha, Y.-W. (). Confirmation, disconfirmation, and information in hypothesis testing. Psychological Review, (), .
Knill, D. C., & Richards, W. (). Perception as Bayesian inference. New York: Cambridge University Press.
Konovalova, E., & Le Mens, G. (). An information sampling explanation for the in-group heterogeneity effect. Psychological Review, (), .
Kwisthout, J., Wareham, T., & van Rooij, I. (). Bayesian intractability is not an ailment that approximation can cure. Cognitive Science, (), –.
Levy, R. P., Reali, F., & Griffiths, T. L. (). Modeling the effects of memory on human online sentence processing with particle filters. Advances in Neural Information Processing Systems, , –.
Lewandowsky, S., Griffiths, T. L., & Kalish, M. L. (). The wisdom of individuals: Exploring people's knowledge about everyday events using iterated learning. Cognitive Science, (), –.
Lieder, F., Griffiths, T. L., & Hsu, M. (). Overrepresentation of extreme events in decision making reflects rational use of cognitive resources. Psychological Review, , .
Lieder, F., Griffiths, T. L., Huys, Q. J., & Goodman, N. D. (). The anchoring bias reflects rational use of cognitive resources. Psychonomic Bulletin & Review, (), –.
Logan, G. D. (a). Toward an instance theory of automatization. Psychological Review, (), .
(b). Toward an instance theory of automatization. Psychological Review, (), .
Loomes, G., & Sugden, R. (). Regret theory: An alternative theory of rational choice under uncertainty. Economic Journal, (), –.
(). Disappointment and dynamic consistency in choice under uncertainty. Review of Economic Studies, (), –.
Lucas, C. G., Bridgers, S., Griffiths, T. L., & Gopnik, A. (). When children are better (or at least more open-minded) learners than adults: Developmental differences in learning the forms of causal relationships. Cognition, (), –.
Mellers, B. A., Schwartz, A., Ho, K., & Ritov, I. (). Decision affect theory: Emotional reactions to the outcomes of risky options. Psychological Science, (), –.
Morris, A., Phillips, J., Huang, K., & Cushman, F. (). Generating options and choosing between them depend on distinct forms of value representation. Psychological Science, (), –.
Mozer, M. C., Pashler, H., & Homaei, H. (). Optimal predictions in everyday cognition: The wisdom of individuals or crowds? Cognitive Science, (), –.
Navarro, D. J., & Perfors, A. F. (). Hypothesis generation, sparse categories, and the positive test strategy. Psychological Review, (), .
Nocedal, J., & Wright, S. (). Numerical optimization. New York: Springer Science & Business Media.
Nosofsky, R. M., & Palmeri, T. J. (). An exemplar-based random walk model of speeded classification. Psychological Review, (), .
Oaksford, M., & Chater, N. (). A rational analysis of the selection task as optimal data selection. Psychological Review, (), .
(). Bayesian rationality: The probabilistic approach to human reasoning. Oxford: Oxford University Press.
Pezzulo, G., Rigoli, F., & Chersi, F. (). The mixed instrumental controller: Using value of information to combine habitual choice and mental simulation. Frontiers in Psychology, , .
Phillips, J., Morris, A., & Cushman, F. (). How we know what not to think. Trends in Cognitive Sciences, , –.
Ratcliff, R. (). A theory of memory retrieval. Psychological Review, (), .
Ratcliff, R., & McKoon, G. (). The diffusion decision model: Theory and data for two-choice decision tasks. Neural Computation, (), –.
Ratcliff, R., & Smith, P. L. (). A comparison of sequential sampling models for two-choice reaction time. Psychological Review, (), .
Robert, C. P., & Casella, G. (). Monte Carlo statistical methods (Vol. ). New York: Springer.
Rothe, A., Lake, B. M., & Gureckis, T. M. (). Do people ask good questions? Computational Brain & Behavior, , –.
Sanborn, A. N. (). Types of approximation for probabilistic cognition: Sampling and variational. Brain and Cognition, , –.
Sanborn, A. N., Griffiths, T. L., & Navarro, D. J. (). Rational approximations to rational models: Alternative algorithms for category learning. Psychological Review, (), .
Savage, L. J. (). The foundations of statistics. New York: Wiley.
Schulz, E., Wu, C. M., Ruggeri, A., & Meder, B. (). Searching for rewards like a child means less generalization and more directed exploration. Psychological Science, (), –.
Shi, L., Griffiths, T. L., Feldman, N. H., & Sanborn, A. N. (). Exemplar models as a mechanism for performing Bayesian inference. Psychonomic Bulletin & Review, (), –.
Smaldino, P. E., & Richerson, P. J. (). The origins of options. Frontiers in Neuroscience, , .
Smith, K., Huber, D. E., & Vul, E. (). Multiply-constrained semantic search in the remote associates test. Cognition, (), –.
Stewart, N., Chater, N., & Brown, G. D. (). Decision by sampling. Cognitive Psychology, , –.
Tenenbaum, J. B., Kemp, C., Griffiths, T. L., & Goodman, N. D. (). How to grow a mind: Statistics, structure, and abstraction. Science, (), –.
Thompson, W. R. (). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, (/), –.
Tversky, A., & Kahneman, D. (). Prospect theory: An analysis of decision under risk. Econometrica, (), –.
Ullman, T. D., Spelke, E., Battaglia, P., & Tenenbaum, J. B. (). Mind games: Game engines as an architecture for intuitive physics. Trends in Cognitive Sciences, (), –.
Von Neumann, J., & Morgenstern, O. (). Theory of games and economic behavior. Princeton: Princeton University Press.
Voss, A., Nagler, M., & Lerche, V. (). Diffusion models in experimental psychology: A practical introduction. Experimental Psychology, (), .
Vul, E., Frank, M. C., Tenenbaum, J. B., & Alvarez, G. A. (). Explaining human multiple object tracking as resource-constrained approximate inference in a dynamic probabilistic model. Advances in Neural Information Processing Systems, , –.
Vul, E., Goodman, N., Griffiths, T. L., & Tenenbaum, J. B. (). One and done? Optimal decisions from very few samples. Cognitive Science, , –.
Vul, E., & Pashler, H. (). Measuring the crowd within: Probabilistic representations within individuals. Psychological Science, , –.
Wald, A., & Wolfowitz, J. (). Optimum character of the sequential probability ratio test. Annals of Mathematical Statistics, –.
Watkins, C. J., & Dayan, P. (). Q-learning. Machine Learning, (–), –.
Williams, J. J., & Lombrozo, T. (). Explanation and prior knowledge interact to guide learning. Cognitive Psychology, (), –.