Successful remembering and successful forgetting: a festschrift in honor of Robert A. Bjork 9781848728912, 8620111531, 0203842537, 1848728913


Language: English. Pages: xvii, 541 pages: illustrations, diagrams [560]. Year: 2016


Authors: Aaron S. Benjamin; Robert A. Bjork

Successful Remembering and Successful Forgetting


Successful Remembering and Successful Forgetting A Festschrift in Honor of Robert A. Bjork

Edited by

Aaron S. Benjamin

Psychology Press New York London


Psychology Press Taylor & Francis Group 270 Madison Avenue New York, NY 10016

Psychology Press Taylor & Francis Group 27 Church Road Hove, East Sussex BN3 2FA

© 2011 by Taylor and Francis Group, LLC

This edition published in the Taylor & Francis e-Library, 2011.

To purchase your own copy of this or any of Taylor & Francis or Routledge’s collection of thousands of eBooks please go to www.eBookstore.tandf.co.uk.

Psychology Press is an imprint of Taylor & Francis Group, an Informa business

International Standard Book Number: 978-1-84872-891-2 (Hardback)

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data

Successful remembering and successful forgetting : a festschrift in honor of Robert A. Bjork / edited by Aaron S. Benjamin. p. cm. Includes bibliographical references and index. ISBN 978-1-84872-891-2 (hardcover : alk. paper) 1. Memory. I. Bjork, Robert A. II. Benjamin, Aaron S. BF371.S86 2011 153.1’2--dc22 2010041648

ISBN 0-203-84253-7 Master e-book ISBN

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the Psychology Press Web site at http://www.psypress.com

To my mentors: Dad, Herb, Ken, Gus, and Bob, of course


Contents

Preface

Contributors

Chapter 1 On the Symbiosis of Remembering, Forgetting, and Learning
Robert A. Bjork

Chapter 2 Intricacies of Spaced Retrieval: A Resolution
Henry L. Roediger III and Jeffrey D. Karpicke

Chapter 3 Distributed Learning and the Size of Memory: A 50-Year Spacing Odyssey
Thomas K. Landauer

Chapter 4 The Causes and Consequences of Reminding
Aaron S. Benjamin and Brian H. Ross

Chapter 5 Retrieval-Induced Forgetting and the Resolution of Competition
Benjamin C. Storm

Chapter 6 On the Relationship Between Interference and Inhibition in Cognition
Michael C. Anderson and Benjamin J. Levy

Chapter 7 Sleep, Retrieval Inhibition, and the Resolving Power of Human Memory
Malcolm D. MacLeod and Justin C. Hulbert

Chapter 8 Blocking Out Blocks: Adaptive Forgetting of Fixation in Memory, Problem Solving, and Creative Ideation
Steven M. Smith

Chapter 9 A Contextual Framework for Understanding When Difficulties Are Desirable
Mark A. McDaniel and Andrew C. Butler

Chapter 10 Testing, Generation, and Spacing Applied to Education: Past, Present, and Future
Catherine O. Fritz

Chapter 11 Learning From and for Tests
William B. Whitten II

Chapter 12 Can Desirable Difficulties Overcome Deceptive Clarity in Scientific Visualizations?
Marcia C. Linn, Hsin-Yi Chang, Jennifer L. Chiu, Zhihui Helen Zhang, and Kevin McElhaney

Chapter 13 Desirable Difficulties and Studying in the Region of Proximal Learning
Janet Metcalfe

Chapter 14 Data Entry: A Window to Principles of Training
Alice F. Healy, James A. Kole, Erica L. Wohldmann, Carolyn J. Buck-Gengler, and Lyle E. Bourne Jr.

Chapter 15 An Output-Bound Perspective on False Memories: The Case of the Deese–Roediger–McDermott (DRM) Paradigm
Asher Koriat, Ainat Pansky, and Morris Goldsmith

Chapter 16 How Should We Define and Differentiate Metacognitions?
Harry P. Bahrick, Melinda K. Baker, Lynda K. Hall, and Lise Abrams

Chapter 17 Learning From the Consequences of Retrieval: Another Test Effect
Elizabeth Ligon Bjork, Benjamin C. Storm, and Patricia A. deWinstanley

Chapter 18 Failing to Predict Future Changes in Memory: A Stability Bias Yields Long-Term Overconfidence
Nate Kornell

Chapter 19 Relying on Other People’s Metamemory
Barbara A. Spellman, Elizabeth R. Tenney, and Margaret J. Scalia

Chapter 20 Multidimensional Models for Item Recognition and Source Identification
Thomas D. Wickens

Chapter 21 Pursuing a General Model of Recall and Recognition
Troy A. Smith and Daniel R. Kimball

Chapter 22 Memory for Pictures: Sometimes a Picture Is Not Worth a Single Word
Joyce M. Oates and Lynne M. Reder

Chapter 23 Administration of Dehydroepiandrosterone (DHEA) Increases Serum Levels of Androgens and Estrogens But Does Not Enhance Recognition Memory in Postmenopausal Women
Bethany Stangl, Elliot Hirshman, and Joseph Verbalis

Chapter 24 On the Fruitful Relationship Between Functional Neuroimaging and Cognitive Theories of Human Learning and Memory
Alan Richardson-Klavehn

Chapter 25 Age-Related Changes in the Episodic Simulation of Past and Future Events
Daniel L. Schacter, Brendan Gaesser, and Donna Rose Addis

Index

Preface

In January 2009, the University of California, Los Angeles hosted a conference in which only a small portion of the many friends, mentors, colleagues, and students of Robert Bjork came together to discuss current research on human memory and to celebrate together the many traditions and friendships that mark Bob’s career. There is an awkwardness to convening such a “legacy” event to a career that is still very much unfolding, but reviewing even the fraction of Bob’s career now in the rearview mirror revealed such a creative and diverse array of themes that we were thankful for the opportunity to catch our breath and start thinking about the sequel to this book for the future.

The conference and this volume would not have been possible were it not for a Festschrift grant generously provided by the American Psychological Association, supplementary funding provided by the UCLA College of Letters and Science and the UCLA Department of Psychology, and assistance with this book provided by the Association for Psychological Science. Special gratitude is owed to the foot soldiers at UCLA who handled the logistics of the event: Jami Jesek, Cat Zimmerman, and Deanna Evans.

The chapters in this volume are a testament to the many ways in which Bob’s ideas have shaped the course of research on human memory over the past four decades. The idea that forgetting is an adaptive response to the demands of a retrieval system fraught with competition finds considerable discussion here. This attention is not surprising; such ideas have helped recalibrate conceptualizations of memory away
from one in which the computer is the dominant metaphor. The first eight chapters of this volume pursue this theme in various forms. Bob’s career is also marked by a commitment to using educational and training settings in the real world as a proving ground for psychological findings and theories. This is reflected in aspects of his service to the field, including his co-founding of the influential journal Psychological Science in the Public Interest, his many awards for distinguished teaching and service, and his chairmanship of the National Research Council Committee on Techniques for the Enhancement of Human Performance. Chapters 9 through 14 of this volume review current issues and techniques in the application of research on learning and memory to enhancing human performance. The remaining chapters address topics that are relevant to the translation of cognitive psychology to human performance. Bob was the first to recognize the critical role of metacognition in such problems—if learners hold misconceptions of their own levels of knowledge and skills, then they are unlikely to be able to guide their own learning effectively. Current research in metacognition is covered in Chapters 15 through 19. The final chapters cover a variety of issues related to how remembering can be enhanced, how research on remembering can be profitably guided by the use of mathematical modeling, and how biological data can be brought to bear on theorizing about memory. It is no accident that the topics in this volume align rather poorly with the organization of a traditional textbook on human memory. Bob’s research has never been guided by the boutique theorizing that follows current trends, nor by cycles of popular interest or funding availability—except insofar as his research has driven those cycles. Bob’s research represents a somewhat renegade tradition that he has worked to pass on to the many students, postdoctoral fellows, and faculty with whom he has collaborated. 
I, along with all of the authors in this volume, hope that the chapters here represent this tradition well and that they serve as a reminder of the gratitude we all have to Bob for his leadership, guidance, and friendship. Aaron Benjamin

Contributors

Lise Abrams
Department of Psychology
University of Florida
Gainesville, Florida

Donna Rose Addis
Department of Psychology
The University of Auckland
Auckland, New Zealand

Michael C. Anderson
MRC Cognition and Brain Sciences Unit
Cambridge, England

Harry P. Bahrick
Department of Psychology
Ohio Wesleyan University
Delaware, Ohio

Melinda K. Baker
Department of Psychology
Ohio Wesleyan University
Delaware, Ohio

Aaron S. Benjamin
Department of Psychology
University of Illinois
Champaign, Illinois

Elizabeth Ligon Bjork
Department of Psychology
University of California, Los Angeles
Los Angeles, California

Robert A. Bjork
Department of Psychology
University of California, Los Angeles
Los Angeles, California

Lyle E. Bourne Jr.
Department of Psychology and Neuroscience
University of Colorado
Boulder, Colorado

Carolyn J. Buck-Gengler
Department of Psychology and Neuroscience
University of Colorado
Boulder, Colorado

Andrew C. Butler
Psychology and Neuroscience
Duke University
Durham, North Carolina

Hsin-Yi Chang
Graduate Institute of Science Education
National Kaohsiung Normal University (NKNU)
Taiwan, Republic of China

Jennifer L. Chiu
Education in Mathematics, Science, and Technology
University of California, Berkeley
Berkeley, California

Patricia A. deWinstanley
Department of Psychology
Oberlin College
Oberlin, Ohio

Catherine O. Fritz
Psychology in Education
County South, Lancaster University
Lancaster, United Kingdom

Brendan Gaesser
Department of Psychology
Harvard University
Cambridge, Massachusetts

Morris Goldsmith
Department of Psychology
University of Haifa
Haifa, Israel

Lynda K. Hall
Department of Psychology
Ohio Wesleyan University
Delaware, Ohio

Alice F. Healy
Department of Psychology and Neuroscience
University of Colorado
Boulder, Colorado

Elliot Hirshman
Department of Psychology
University of Maryland, Baltimore County
Baltimore, Maryland

Justin C. Hulbert
MRC Cognition & Brain Sciences Unit
Cambridge, England

Jeffrey D. Karpicke
Department of Psychological Sciences
Purdue University
West Lafayette, Indiana

Daniel R. Kimball
Department of Psychology
University of Oklahoma
Norman, Oklahoma

James A. Kole
Department of Psychology and Neuroscience
University of Colorado
Boulder, Colorado

Asher Koriat
Department of Psychology
University of Haifa
Haifa, Israel

Nate Kornell
Department of Psychology
Williams College
Williamstown, Massachusetts

Thomas K. Landauer
Pearson Knowledge Technologies and University of Colorado at Boulder
Boulder, Colorado

Benjamin J. Levy
Department of Psychology
Stanford University
Palo Alto, California

Marcia C. Linn
Education in Mathematics, Science, and Technology
University of California, Berkeley
Berkeley, California

Malcolm D. MacLeod
School of Psychology
University of St. Andrews
St. Andrews, Scotland

Mark A. McDaniel
Department of Psychology
Washington University
St. Louis, Missouri

Kevin McElhaney
Education in Mathematics, Science, and Technology
University of California, Berkeley
Berkeley, California

Janet Metcalfe
Department of Psychology
Columbia University
New York, New York

Joyce M. Oates
American University
Washington, DC

Ainat Pansky
Department of Psychology
University of Haifa
Haifa, Israel

Lynne M. Reder
Department of Psychology
Carnegie Mellon University
Pittsburgh, Pennsylvania

Alan Richardson-Klavehn
Department of Neurology and Center for Advanced Imaging
Otto von Guericke University of Magdeburg
Magdeburg, Germany

Henry L. Roediger III
Department of Psychology
Washington University
St. Louis, Missouri

Brian H. Ross
Department of Psychology
University of Illinois
Champaign, Illinois

Margaret J. Scalia
Department of Psychology
University of Virginia
Charlottesville, Virginia

Daniel L. Schacter
Department of Psychology
Harvard University
Cambridge, Massachusetts

Steven M. Smith
Department of Psychology
Texas A&M University
College Station, Texas

Troy A. Smith
Department of Psychology
University of Oklahoma
Norman, Oklahoma

Barbara A. Spellman
Department of Psychology and School of Law
University of Virginia
Charlottesville, Virginia

Bethany Stangl
Department of Psychology
George Washington University
Washington, DC

Benjamin C. Storm
Department of Psychology
University of Illinois at Chicago
Chicago, Illinois

Elizabeth R. Tenney
Department of Psychology
University of Virginia
Charlottesville, Virginia

Joseph Verbalis
Department of Medicine
Georgetown University Medical Center
Washington, DC

William B. Whitten II
Graduate School of Education
Fordham University
New York, New York

Thomas D. Wickens
Department of Psychology
University of California, Berkeley
Berkeley, California

Erica L. Wohldmann
Department of Psychology
California State University, Northridge
Northridge, California

Zhihui Helen Zhang
Education in Mathematics, Science, and Technology
University of California, Berkeley
Berkeley, California

Group Photograph

From left to right: Hal Pashler, Malcolm MacLeod, Alan Richardson-Klavehn, John Shaw III, Thomas Landauer, Arthur Glenberg, William Whitten II, Catherine Fritz, Aaron Benjamin, Thomas Wickens, Lynne Reder, Henry Roediger III, Elliot Hirshman, Elizabeth Bjork, Bruce Baker, Marcia Linn, Robert Bjork, Michael Anderson, Harry Bahrick, Daniel Schacter, Patricia deWinstanley, Alice Healy, Eric Eich, Steve Smith, Janet Metcalfe, Anthony Wagner, and Barbara Spellman.

1

On the Symbiosis of Remembering, Forgetting, and Learning

Robert A. Bjork

It is natural for people to think that learning is a matter of building up skills or knowledge in one’s memory, and that forgetting is a matter of losing some of what was built up. From that perspective, learning is a good thing and forgetting is a bad thing. The relationship between learning and forgetting is not, however, so simple, and in certain important respects is quite the opposite: Conditions that produce forgetting often enable additional learning, for example, and learning or recalling some things is a contributor to the forgetting of other things. My goal in this chapter is to characterize the interdependencies of remembering, forgetting, and learning—interdependencies that essentially define the unique functional architecture of how humans learn and remember, or fail to learn and remember. After some comments on the importance of forgetting, I discuss how learning and remembering contribute to forgetting; why forgetting enables, rather than undoes, learning; and why the interplay of forgetting, remembering, and learning is adaptive and yet poorly understood by the user.

Why Forgetting Is Important

One of the “important peculiarities of human memory” that motivated Elizabeth Bjork and me to propose our new theory of disuse framework (R. A. Bjork & Bjork, 1992), to which I refer intermittently in this
chapter, is that human memory is characterized by a storage capacity that is essentially unlimited, coupled with a retrieval capacity that is severely limited. At any one point in time, most of the vast amount of information that exists in our memories (names, facts, procedures, numbers, events, and so forth), all of which was recallable at earlier points in time, is not recallable. Even the most overlearned information, such as a combination lock number, or phone number, or street address, which may have been instantly and automatically recallable after a long period of use, becomes nonrecallable after a long enough period of disuse—but remains in memory. We labeled our framework a “new theory of disuse” to distinguish it from Thorndike’s (1914) original “law of disuse,” which asserted that memories, without continued use or access, decay from memory. Instead, we argue—as many others have—that although memories become inaccessible without continued use and access, they remain in memory. The theory distinguishes between the retrieval strength of a memory representation—that is, how activated or accessible it is at a given point in time, which is influenced by local conditions, such as recency and current cues—and the storage strength of that representation, which is an index of how entrenched or interassociated that representation is with related representations in memory. Recall is assumed to be entirely determined by current retrieval strength, whereas storage strength is a latent variable that acts to retard the loss (forgetting) and enhance the gain of retrieval strength. Other assumptions of the theory are mentioned below where they are relevant to particular interactions of remembering, forgetting, and learning. The failure to recall information we know exists in our memories is a major frustration, but were everything in our memories to be recallable, we would suffer greater frustrations. 
Even recalling one’s current phone number, for example, would become a slow and error-prone process if every number one has had across one’s lifetime were to come to mind, requiring some kind of decision process to select the current number. As William James (1890) was one of the first to emphasize, “If we remembered everything, we should on most occasions be as ill off as if we remembered nothing” (p. 680). In short, because we remember so much, we do not want everything in our memories to be accessible. We have a constant burden, for example, to keep our memories current. We need to remember our current phone number, not our prior phone number; we need to remember where we parked the car today, not yesterday or a week ago; we need to remember how some current software or hardware works, not how the prior versions work; and on and on. Such updating, as I and my
collaborators have argued in multiple papers over the years (e.g., R. A. Bjork, 1970, 1972, 1978, 1989; E. L. Bjork, Bjork, & Anderson, 1998), requires some mechanism to set aside, inhibit, or erase information that is now out of date and, hence, a source of errors and confusion. Without some such mechanism, I have argued, we would “degenerate to a proactive-interference-induced state of total confusion” (Bjork, 1972, p. 218). The mechanism in the case of human memory, in my view, is retrieval inhibition, which I have argued is a broadly adaptive mechanism in human memory (R. A. Bjork, 1989). Without continuing access and use, previously learned information and procedures are not lost from memory, but become inaccessible—except, possibly, when highly distinctive situational, interpersonal, or body state cues associated with a given memory are reinstated. That is, retrieval of the information or procedures in question becomes inhibited—and, as I sketch in the next section, learning and remembering other information and procedures contribute to such inhibition.
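The distinction between storage strength and retrieval strength can be made concrete with a small simulation. The sketch below is illustrative only: the theory's assumptions are stated qualitatively here, so the update rules, the constants, and the names `Trace`, `use`, and `disuse` are assumptions made for this example and are not equations from R. A. Bjork and Bjork (1992). It encodes three of the stated ideas: storage strength is never lost, higher storage strength slows the loss of retrieval strength, and higher storage strength increases the gain in retrieval strength when an item is used again.

```python
import math
from dataclasses import dataclass


@dataclass
class Trace:
    storage: float = 0.0    # how entrenched the memory is; never decreases
    retrieval: float = 0.0  # current accessibility, kept between 0 and 1

    def use(self) -> None:
        """Study or recall the item once (assumed update rule)."""
        # Assumption: the gain in retrieval strength grows with storage strength.
        gain = 0.5 + 0.5 * self.storage / (1.0 + self.storage)
        self.retrieval += gain * (1.0 - self.retrieval)
        self.storage += 1.0

    def disuse(self, time: float) -> None:
        """Let time pass without access (assumed exponential loss)."""
        # Assumption: storage strength slows the loss of retrieval strength.
        self.retrieval *= math.exp(-time / (1.0 + self.storage))


# An overlearned item, such as an old phone number, followed by long disuse.
old = Trace()
for _ in range(20):
    old.use()
old.disuse(200.0)
inaccessible = old.retrieval  # close to zero: the item cannot be recalled
still_stored = old.storage    # unchanged by disuse: the item remains in memory

# One relearning trial for the old item versus one first trial for a new item.
old.use()
fresh = Trace()
fresh.use()
print(f"after disuse: retrieval={inaccessible:.4f}, storage={still_stored:.0f}")
print(f"after one trial: old={old.retrieval:.3f}, new={fresh.retrieval:.3f}")
```

Under these assumed rules, the old item ends up with almost no retrieval strength but undiminished storage strength, and a single relearning trial restores more accessibility than a first trial gives a new item.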

How Learning and Remembering Contribute to Forgetting

Why do we forget information that was once recallable? The principal answer to that question, alluded to above, is not that the information—like footprints in the sand—fades away or decays in our memories over time, as was thought to be the case by researchers during the early decades of controlled research on memory (e.g., Thorndike, 1914). The decay idea, which remains compelling to most people based on their introspections, was instantiated in Thorndike’s law of disuse, as mentioned above. Thorndike’s law, though, came to be thoroughly discredited, starting with a devastating critique by McGeoch (1932). Instead, McGeoch argued, information that has been stored in our long-term memories tends to remain there, but it can become inaccessible (forgotten) owing to one or both of two factors: “reproductive inhibition,” which refers to losing access to information in memory by virtue of interference from competing information in memory, and “altered stimulating conditions,” which refers to the changing of the retrieval cues that are available to us as we move on with our lives. (For a brief history of research on interference and forgetting, see R. A. Bjork, 2003.) Learning, therefore, contributes to forgetting. As we learn new information, procedures, and skills, we create the potential for competition with related information, skills, and procedures that already exist in
memory. Access to that earlier learning can then be inhibited or blocked by related aspects of the newer, and perhaps more accessible, learning. (Whether the primary mechanism is inhibition or blocking remains a matter of current dispute; see the chapters by Anderson, Bjork, Buzsaki, Hasher, and MacLeod in Roediger, Dudai, & Fitzpatrick, 2007.) Such competition, however, goes both ways: Earlier learned information can also block or inhibit access to more recently learned information. That is, to use the jargon of research on interference and forgetting, we are subject to both retroactive interference and proactive interference.

Retrieval as a Memory Modifier

The results of more recent research add to the picture sketched above. The act of retrieving information from our memories does much more than simply reveal that the information in question exists in our memories. In fact, retrieving information modifies our memories: The retrieved information becomes more recallable than it would have been otherwise, and other information in competition with the retrieved information—that is, information associated to the same retrieval cue or set of cues—becomes less accessible. Using our memories, in effect, alters our memories; that is, retrieval is a “memory modifier” (R. A. Bjork, 1975). In our new theory of disuse, Elizabeth Bjork and I concur with Thorndike’s assertion that disuse is a key factor in forgetting, not because unused memories decay but because access to those memories becomes inhibited—owing, primarily, to retrieval of competing memories (R. A. Bjork & Bjork, 1992). Demonstrations of what might be considered the positive effects of retrieval as a memory modifier—that retrieving information from memory is a powerful learning event—trace back across almost 100 years of the research literature (e.g., R. A.
Bjork, 1975, 1988; Carrier & Pashler, 1992; Gates, 1917; Glover, 1989; Hogan & Kintsch, 1971; Izawa, 1970; Landauer & Bjork, 1978; Landauer & Eldridge, 1967; McDaniel & Masson, 1985; Spitzer, 1939; Tulving, 1967; Whitten & Bjork, 1977), and there has recently been renewed interest in such effects, given their pedagogical implications (e.g., Karpicke & Roediger, 2008; Morris & Fritz, 2000; Pashler, Zarow, & Triplett, 2003; Roediger & Karpicke, 2006a,b; Storm, Bjork, & Storm, in press). For present purposes, however, it is the negative effects of retrieval as a memory modifier—termed retrieval-induced forgetting by Anderson, Bjork, and Bjork (1994)—that are of interest: that is, the loss of access to information that is in competition with the retrieved information. It is only from one perspective, however, that retrieval-induced forgetting is a negative effect. From another perspective, retrieval-induced
forgetting modifies the accessibility of information in memory in adaptive ways. As we use our memories, we make more accessible the information, procedures, and skills we are using, and we make less accessible competing information, procedures, or skills. The interpretation my collaborators and I have advocated (e.g., Anderson, 2003; Anderson & Bjork, 1994; Anderson et al., 1994; Anderson, Bjork, & Bjork, 2000; Anderson & Spellman, 1995; E. L. Bjork, Bjork, & Anderson, 1998; Storm, Bjork, Bjork, & Nestojko, 2006) is that the act of recalling information from memory requires not only that the information be selected and produced, but also that other information associated to the same cues be selected against and not produced. The information selected against is inhibited, rendering it less accessible should it be the target of recall in the future (for arguments against inhibitory accounts, see MacLeod, Dodd, Sheard, Wilson, & Bibi, 2003; Perfect et al., 2004). How long such inhibitory effects might last is a matter of current research (e.g., MacLeod & Macrae, 2001) and debate, but Storm, Bjork, and Bjork (2008) have obtained evidence that the recall of items repeatedly selected against goes to essentially zero within a single experimental session. Whether repeatedly making items the target of retrieval-induced forgetting across experimental sessions would make those items permanently inaccessible—at least in the absence of very specialized and discriminating retrieval cues—remains to be seen.

Adaptive Aspects of the Interplay of Forgetting and Remembering

Beyond the general consideration that forgetting is important, given the storage and retrieval characteristics of human memory and the ongoing need we have to keep our memories current, there are other, more specific reasons why the interplay of forgetting and remembering is adaptive.
Compared to some kind of system in which out-of-date memories were to be overwritten or erased, for example, having such memories become inaccessible, but remain in storage, has important advantages. Because those memories are inaccessible, they do not interfere with the retrieval of current information and procedures, but because they remain in memory they can—at least under some circumstances—be recognized when presented and, more importantly, be relearned at an accelerated rate, should that be desirable. In fact, some of the findings discussed in the next section suggest that such inhibited memories are uniquely relearnable, especially if they were strongly encoded in memory at some earlier point. Phrased in terms of the assumptions of the new theory of disuse, the largest increments in both storage and retrieval strength occur when the to-be-learned (or relearned) information has low retrieval strength and high storage strength. Thus, some
name or number or procedure from one’s past, even one’s distant past, can be relearned with great efficiency, should it become relevant again. Another consideration has to do with the statistics of use. Information and procedures we will need in the near future tend to be from the recent past, which is one reason that computer programs and electronic gadgets, such as cell phones, make recently accessed documents, addresses, and numbers more readily accessible than other documents, addresses, and numbers. In the case of human memory, remembering information makes that information more accessible in the near future and any competing information and procedures less accessible. In that context, another “important peculiarity of human memory” (R. A. Bjork & Bjork, 1992) is relevant—namely, that with disuse of two competing memory representations, access shifts toward the earlier learned representation with time. Such a shift from recency to primacy across a period of disuse is a very general effect, one that occurs on multiple timescales and for many different types of memories. I have speculated elsewhere (Bjork, 2001) as to the mechanisms that might be responsible for such regression effects, but the important point for present purposes is that such regression effects, from the standpoint of the statistics of use, may also be adaptive. The reason has to do with why, in real-world contexts, people might stop using the most recent of competing representations. Often, those reasons will be accompanied by the earlier learned representation again becoming needed. One of many possible examples might be returning to the United States after a prolonged stay in Great Britain, during which driving a car and staying alive required acquiring a set of perceptual and procedural routines that differ, markedly, from the corresponding routines in the United States. 
Disuse of the newer, Great Britain-appropriate routines could mean a number of things, including having a fatal accident somewhere in Great Britain, but it is more likely to mean that one has returned to the United States, where it is now advantageous to have the earlier learned routines be most accessible. Finally, in principle, the contributions of remembering to forgetting can make unwanted memories less accessible. That is, to the extent that memories one wants to recall are associated with the same cues or contexts as memories one does not want to recall, repeatedly recalling the more positive memories can, via retrieval-induced forgetting, inhibit access to the unwanted memories (Anderson & Green, 2001; E. L. Bjork et al., 1998; Levy & Anderson, 2008). Whether retrieval-induced forgetting, or a more explicit and directed inhibitory mechanism (e.g., Anderson & Green, 2001; Levy & Anderson, 2008), actually
provides a mechanism for repression is a matter of current controversy (see Anderson & Levy, 2006).

Forgetting as a Facilitator of Learning

There are a variety of conditions of learning in which some manipulation known to produce forgetting, when introduced between learning opportunities, enhances learning. Ted Allen and I found, for example, that introducing a more difficult shadowing activity after a first opportunity to study a to-be-learned word triad impaired recall of that triad, relative to recall following an easier shadowing activity, but enhanced the learning of the triad (R. A. Bjork & Allen, 1970). That is, if—rather than be tested for recall of the triad—participants were provided a second study opportunity at the same point in time, later recall was enhanced by the more difficult intervening activity. Similarly, Steve Smith, Arthur Glenberg, and I found that introducing forgetting via a change in environmental context enhanced learning (Smith, Glenberg, & Bjork, 1978). When participants were given a list of items to study in a first environmental context and then came back after three hours and studied the list again, either in the same setting or a new setting, their subsequent recall in a neutral context was enhanced by having studied the list in two different contexts. Context change induces forgetting, so had the participants been tested at the time of the second study session their recall of the list would, presumably, have been impaired by the change in context, but their learning, as measured by their subsequent recall, was enhanced. Perhaps the prime example of forgetting enhancing learning is the spacing effect, one of the most robust and general effects from the entire history of experimental psychology (for reviews, see Cepeda, Pashler, Vul, Wixted, & Rohrer, 2006; Dempster, 1996). When a second study opportunity is provided after a delay following a first study opportunity, rather than being presented with little or no delay, long-term recall is enhanced, often very significantly.
Again, though, were the studied material to be tested following a short delay or a long delay, we would observe that the longer delay results in poorer recall of the studied material—that is, more forgetting. Similarly, interleaving, rather than blocking, the learning trials on separate to-be-learned tasks produces more forgetting between trials on a given task during the learning phase, but tends to enhance long-term retention and transfer (e.g., Kornell & Bjork, 2008; Shea & Morgan, 1978; Simon & Bjork, 2001). Finally, we recently tested, and obtained support for, an unintuitive prediction of the new theory of disuse—namely, that materials
subjected to retrieval-induced forgetting should be relearned more effectively than materials not subject to such forgetting (Storm et al., 2008). This prediction follows because retrieval-induced forgetting is assumed to lower retrieval strength, but not storage strength, and the theory assumes that increments in storage strength and retrieval strength resulting from relearning are greater the lower the current retrieval strength of the to-be-relearned material.

Forgetting and Desirable Difficulties

The foregoing examples illustrate a subset of the manipulations I have labeled “desirable difficulties” (R. A. Bjork, 1994a, 1994b). Manipulations such as variation, spacing, introducing contextual interference, and using tests, rather than presentations, as learning events, all share the property that they appear during the learning process to impede learning, but they then often enhance learning as measured by posttraining tests of retention and transfer. Conversely, manipulations such as keeping conditions constant and predictable and massing trials on a given task often appear to enhance the rate of learning during instruction or training, but then typically fail to support long-term retention and transfer (for a broader discussion of desirable difficulties, see E. L. Bjork & Bjork, 2011; R. A. Bjork, 1999). As I and my colleagues have argued (E. L. Bjork & Bjork, in press; R. A. Bjork & Linn, 2006; Christina & Bjork, 1991; Ghodsian, Bjork, & Benjamin, 1997; Richland, Linn, & Bjork, 2007; Schmidt & Bjork, 1992), this pattern of effects has the potential to mislead instructors and students alike. Instructors become susceptible to choosing poorer conditions of learning over better conditions of learning, and students become susceptible to preferring those poorer conditions over better conditions (for a discussion of when desirable difficulties are desirable and when they are not, see McDaniel & Butler, this volume).

Why Do Conditions That Induce Forgetting Often Enhance Learning?

Why is it that forgetting, rather than undoing learning, often creates the conditions for more effective learning? In the context of the new theory of disuse framework, the answer is straightforward: Gains in storage strength (learning) are a decreasing function of current retrieval strength. Any forgetting (i.e., decrease in retrieval strength) will, therefore, increase the acquisition of storage strength. The new theory of disuse is not, however, a process model, so that answer is unsatisfying in terms of clarifying the processes responsible for the fact that forgetting often enables learning. The following non-mutually-exclusive mechanisms may all play a role in why the conditions that create forgetting often also create opportunities for additional learning.
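
The strength-based answer given above can be made concrete with a small numerical sketch. The new theory of disuse, as summarized here, specifies only that storage gains decrease as current retrieval strength increases; the particular functional forms below (a linear gain rule, exponential decay of retrieval strength, and the `rate` and `decay` parameters) are illustrative assumptions and are not part of the theory itself.

```python
import math

def forget(retrieval, storage, delay, decay=0.5):
    """Retrieval strength fades with disuse; storage strength is left untouched.
    Assumed form: exponential decay, slowed by higher storage strength."""
    return retrieval * math.exp(-decay * delay / (1.0 + storage))

def study(storage, retrieval, rate=0.5):
    """One study or retrieval event.
    Assumed form: the storage gain shrinks linearly as retrieval strength rises."""
    gain = rate * (1.0 - retrieval)
    return storage + gain, 1.0  # assumption: the item is fully accessible right after study

def storage_after_two_trials(delay):
    """Storage strength after two study trials separated by `delay` time units."""
    storage, retrieval = study(0.0, 0.0)            # first study trial
    retrieval = forget(retrieval, storage, delay)   # forgetting between trials
    storage, retrieval = study(storage, retrieval)  # second study trial
    return storage

massed = storage_after_two_trials(delay=0)
spaced = storage_after_two_trials(delay=5)
print(f"massed: {massed:.3f}  spaced: {spaced:.3f}")
```

Under these assumptions, an immediate repetition adds nothing to storage strength, because retrieval strength is still at its maximum, whereas a repetition that follows some forgetting adds a further increment.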

Encoding Variability

One possibility is illustrated by Estes’s (1955) fluctuation model, which assumes that to-be-learned responses are associated with cues in the environment, only some of which are “sampled” by the learner at any one point in time. Which cues are sampled (“available” in Estes’s terminology) is assumed to fluctuate across time as a function of changes in the learner’s physical, emotional, and cognitive state and changes in the environment itself. Forgetting is then a consequence of more new (unassociated) cues and fewer old (associated) cues being sampled as a retention interval increases. What such forgetting also does, though, is provide additional cues that can be associated to the target response, which results in more total cues being sampled and associated to the target response—that is, more learning, which is assumed to be a function of the percentage of the total population of cues that are associated to the response in question. Spacing or context change between trials will enhance learning, whereas massed learning trials will result in very little forgetting between trials, but also very little learning. The fluctuation model, which was originally developed to account for animal learning phenomena, but was extended by Bower (1972) and others to human learning, including verbal/conceptual learning, is able to account nicely not only for spacing effects, but also for results such as the Bjork and Allen (1970) and Smith et al. (1978) findings summarized above. The more general idea the fluctuation model embodies is encoding variability. Contextual cues, including environmental, interpersonal, mood state, and body state cues, influence not only what is accessible from memory, but also how to-be-learned information is encoded or interpreted.
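
The fluctuation account described above can be sketched as a small simulation. This is a simplified illustration in the spirit of the verbal description, not Estes's formal equations: the cue population size, the availability probability, the `turnover` rate, and the function name `simulate` are all assumptions introduced for the example.

```python
import random

def simulate(delay, n_cues=1000, p_available=0.3, turnover=0.2, seed=0):
    """Two study trials separated by `delay` time steps.
    Returns (accessibility just before trial 2, proportion of all cues associated)."""
    rng = random.Random(seed)
    available = [rng.random() < p_available for _ in range(n_cues)]
    associated = [False] * n_cues

    def study():
        # Every currently available cue becomes associated with the response.
        for i in range(n_cues):
            if available[i]:
                associated[i] = True

    def fluctuate(steps):
        # On each step, some cues have their availability redrawn at random.
        for _ in range(steps):
            for i in range(n_cues):
                if rng.random() < turnover:
                    available[i] = rng.random() < p_available

    def accessibility():
        # Share of the currently available cues that are associated with the response.
        current = [i for i in range(n_cues) if available[i]]
        if not current:
            return 0.0
        return sum(associated[i] for i in current) / len(current)

    study()
    fluctuate(delay)
    before_second_trial = accessibility()
    study()
    learned = sum(associated) / n_cues
    return before_second_trial, learned

print("massed:", simulate(delay=0))
print("spaced:", simulate(delay=10))
```

With no delay, the same cues are sampled twice, so accessibility before the second trial is perfect but the second trial associates no new cues. With a delay, accessibility before the second trial is lower (forgetting), yet the proportion of the cue population associated with the response ends up higher (learning).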
Context change across repetitions of to-be-learned material induces forgetting, because current cues differ from prior cues, but also enhances learning because the material in question becomes associated with a greater range of contextual cues or encoded in more than one way. Such encoding variability will, in turn, facilitate access to learned materials, especially at a delay and across multiple contexts.

Retrieval Practice

Another possible explanation as to why forgetting can enable learning hinges on the fact that retrieving information from memory is a learning event, and the more involved or difficult the act of retrieval, provided it succeeds, the greater the learning benefit (see, e.g., Appleton-Knapp, Bjork, & Wickens, 2005; Benjamin & Ross, this volume; Bjork, 1988; Landauer & Bjork, 1978; Whitten & Bjork, 1977). The basic idea is that retrieving information from long-term memory is a fallible and probabilistic process—a kind of skill that, like other skills, profits from practice. The more difficult or involved the act of retrieving
to-be-learned materials during the learning phase, the more that act exercises the processes that will be needed later, after the learning phase (see Pyc & Rawson, 2009, for recent evidence consistent with the idea that the more difficult an act of retrieval, provided it succeeds, the more that retrieval enhances subsequent recall). Forgetting, therefore, by rendering access to to-be-learned information more difficult, can enhance learning. From a practical standpoint, it is important to realize not only that effortful retrieval can enhance learning, but also that the converse is true as well: Trivially easy retrievals appear to result in essentially no learning. Demonstrations of how ineffective easy/immediate retrievals of information can be for learning go back to the 1960s—and probably before. James Greeno demonstrated, for example, using anticipation trials in paired-associate learning, that providing some items with an extra and immediate second presentation during each cycle through a list did nothing for the long-term learning of those items (Greeno, 1964). The participants were essentially perfect in being able to anticipate (generate) the associated response after just having been presented an anticipation trial on a given pair, but that perfect responding resulted in no learning, as measured by the subsequent first trials on each such item on succeeding cycles through the list. In my own dissertation research, in which I used a Markov model to account for the combination of short-term memory and long-term learning effects in paired-associate learning, I found that the best-fitting estimate of the probability of an item going to the learned state after being studied while in the short-term-memory state was exactly zero (R. A. Bjork, 1966). That is, if a given to-be-learned pair had not already made a transition to the learned state, but was available in short-term memory (owing to recency), studying that item had no effect on its learning.
Basically, when something is maximally accessible from memory, little or no learning results from studying or retrieving that something. Stated in terms of the new theory of disuse, when information already is at maximum retrieval strength, owing to recency or other factors, retrieving (or studying) that information has essentially no effect on the information’s storage strength, and it is storage strength, not retrieval strength, that corresponds to the relatively permanent changes that define learning.

Solving a Problem Versus Remembering the Solution

Another conjecture as to why forgetting enables learning traces to research by Larry Jacoby (1978). The basic idea is that learning has a problem-solving aspect— learners must find encoding or retrieval activities that will make studied
materials accessible after a delay—and forgetting between learning trials is necessary for learners to carry out additional such activities. In Jacoby’s experiment, participants were asked to study pairs such as Foot: Shoe and were later tested via cued recall (Foot: ________?). A given study trial was either a read trial, in which a pair was shown intact, or a construct trial, in which the response had to be constructed from letter cues (S**e) and its semantic relationship to the stimulus term. Consistent with research on generation effects (e.g., deWinstanley, Bjork, & Bjork, 1996; Hirshman & Bjork, 1988; Slamecka & Graf, 1978), there was a very large advantage in later recall—more than two to one—when an item was constructed, rather than simply read. Of special interest, though, are the conditions when an item was studied intact and then, after either 0 or 20 intervening trials on other pairs, either studied again intact or constructed. A repetition after 20 intervening trials had the effects one would expect: a clear benefit of repetition on later cued recall and a substantially greater benefit when the response member had to be constructed rather than simply read. However, when the repetition was essentially immediate, final cued recall profited very little from either an additional study or construct trial. In particular, studying the item intact on one trial and then having to construct it on the next trial produced poorer final recall (42%) than did having only a single construct trial (57%). In Jacoby’s analysis, construct trials involve a kind of problem-solving activity—namely, deciding what semantic associate of the stimulus fits the letter cues provided—an activity that supports participants’ later ability to recall the response term when shown only the stimulus term. 
When, however, the answer to the problem has just been shown, those processes are nullified: The solution needs only to be remembered from the preceding trial, not constructed, and such immediate recall is not effective practice for the later cued-recall test. If, though, enough intervening trials are inserted to produce forgetting, it becomes necessary for the learner to solve the problem, not just remember the solution, which enhances later recall.

Reloading Procedures and Skills

A related idea from the motor skills literature is the reloading hypothesis, advocated by Lee and Magill (1983, 1985) to explain why randomly interleaving trials on separate to-be-learned movement patterns, versus blocking those trials by movement pattern, enhances long-term retention and transfer. Once again, the findings in the motor skills literature are that blocking appears optimal, based on the rate of improvement of performance during practice,
but random/interleaved practice proves superior as measured by posttraining tests of retention or transfer. The advantages of interleaved over blocked learning trials have been demonstrated in a wide variety of laboratory and real-world tasks. Examples range from learning simple movement patterns (e.g., Shea & Morgan, 1978; Simon & Bjork, 2001) to acquiring sports skills, such as learning the several kinds of serves in badminton (Goode & Magill, 1986); to learning procedures, such as carrying out transactions on automated teller machines (Jamieson & Rogers, 2000); to more purely cognitive tasks, such as learning Boolean logic operations (Carlson & Yaure, 1990) or the formulas for the volumes of solids (Rohrer & Taylor, 2007), inducing the styles of artists from examples of their paintings (Kornell & Bjork, 2008), and moving from calculating the answers to multiplication problems to achieving direct retrieval of those answers (Rickard, Lau, & Pashler, 2008). For possible limitations on the advantages of random/interleaved practice, see Wulf and Shea (2002). According to the reloading hypothesis, interleaving the learning of each of several motor movement tasks enhances learning because it produces, in effect, a desirable kind of forgetting. The trials on the other to-be-learned movement patterns that intervene between successive trials on a given pattern result in a loss of access to the motor program that corresponds to that pattern, which then requires that the program be reloaded or reconstructed. Such reloading is assumed, as in the case of the retrieval of verbal information, to support the learning of—and later retrieval of—that movement pattern. In the case of a blocked practice trial on a given pattern, no such reloading—or at least no such effortful and, hence, productive reloading—is required.

How We Learn Versus How We Think We Learn

In a number of prior papers, my colleagues and I have stressed that the accumulated body of research on how people learn, or fail to learn, has very important implications for optimizing instruction and training (e.g., E. L. Bjork & Bjork, 2011; Bjork, 1994a, 1994b, 1999, 2009; R. A. Bjork & Linn, 2006; Christina & Bjork, 1991; deWinstanley & R. A. Bjork, 2002; Ghodsian et al., 1997; Jacoby, Bjork, & Kelley, 1994; Kornell & Bjork, 2007; Schmidt & Bjork, 1992). Current customs and standard practices in instruction, training, and schooling do not, for the most part, seem to be informed by an understanding of the complex and unintuitive dynamics that characterize human learning and memory. Nor do we, as individuals, seem to understand how to engage fully our remarkable capacity to learn. Instead, we seem guided by a faulty
mental model of ourselves as learners that leads us to manage our own learning activities in far from optimal ways.

Why We Develop Faulty Mental Models of Ourselves as Learners

It is very puzzling, in fact, that as lifelong users of our memories and learning capabilities, we do not end up with a more accurate mental model of how we learn, or fail to learn. Why is it, in short, that we are not educated by the “trials and errors of everyday living and learning” (R. A. Bjork, 1999, p. 455)? One consideration is that the functional architecture of how humans forget, remember, and learn is unlike the corresponding processes in man-made devices. Most of us do not, of course, understand the engineering details of how information is stored, added, lost, or overwritten in man-made devices, such as a computer or video recorder, but the functional architecture of such systems is simpler and more understandable than is the complex architecture of human learning and memory. To the extent, for example, that we do think of ourselves as working like such devices, we become prone to assuming that exposing ourselves to information and procedures will lead to storage (i.e., recording) of such information or procedures in our memories—that the information will write itself in one’s memory, so to speak. Also, if we think of human memory as akin to the memory in a man-made device of some kind, we are unlikely to appreciate the extent to which retrieving information from our memory increases the subsequent accessibility of that information and reduces the accessibility of competing information. Retrieving information from a compact disc or computer memory leaves the status of that information and related information unperturbed. More globally, we may fail to appreciate the volatility that characterizes access to information from our memories as conditions change, events intervene, and new learning happens.
Recent findings (Koriat, Bjork, Sheffer, & Bar, 2004; Kornell & Bjork, 2009) suggest that learners are susceptible to what Kornell and Bjork have termed a stability bias—a tendency to think that access to information in memory will remain stable across a retention interval or additional study opportunities. Another consideration is that it may be natural for us to think of the memory traces that correspond to stored information or procedures as varying along some single dimension of strength. That is, the full multidimensionality of memory representations may be virtually impossible to appreciate based on intuition and experience alone. Without being privy to the amazing array of interactions of encoding conditions and test conditions that have been demonstrated in controlled experiments,
for example, how might someone come to appreciate those interactions? We may have a general idea that some learning activities produce better retention than others, but how, based on intuition and experience alone, would we ever come to appreciate fully that our performance on a test of a certain type—whether free or cued recall, or some type of recognition, priming, or savings test—will depend on a complex interaction of the nature of our encoding activities, how long it has been since those activities, and what cues will or will not be available at test? To the extent that we do not realize the multidimensional character of human memory, we also become prone to being misled as to the degree that learning has or has not happened and whether we will or will not be able to access needed information or procedures at a later time. Interpreting current performance (retrieval strength) as learning (storage strength) is perilous, for example, because current retrieval strength is heavily influenced not only by recency, but also by retrieval cues that are available now, but are likely to be unavailable later. Various subjective indices, too, can mislead us. The sense of familiarity or perceptual fluency, for example, can be taken as an index of understanding when it may reflect factors that are unrelated to understanding, such as perceptual priming (see, e.g., Reder & Ritter, 1992). Similarly, the sense of retrieval fluency—that is, how readily information “comes to mind”—can be misleading when it is the product of conditions that are available now, but will not be available later (Benjamin & Bjork, 1996; Benjamin, Bjork, & Schwartz, 1998). A major factor in our being overinfluenced by current objective or subjective indices of performance is that we fail to understand fully the degree to which the ability to retrieve information and procedures is cue dependent. 
We are apparently unable not only to look forward in time to a test situation and assess how the cues available then will differ from the current cues, but also to assess how that difference will affect our performance. Koriat and Bjork (2005, 2006) have demonstrated, for example, that participants are subject to what they term a foresight bias: When an answer is available at the time of study, people are prone to overestimate the likelihood that the answer will come to mind when it is required, but absent, on a later test. More specifically, as defined by Koriat and Bjork (2006), “Judgments of learning (JOLs) are inflated whenever information that is present at study and absent, but solicited, at test, such as the targets in cue–target paired associates, highlights aspects of cues that are less apparent when those cues are presented alone” (p. 959). Thus, for example, people overestimate the likelihood when presented a word pair such as light–lamp that they will later be able to recall “lamp” when cued with “light–____?” because they are
unable to envision that other associates of light, such as dark or heavy, will come to mind, and compete with the retrieval of lamp, when light is presented alone. Similarly, Benjamin et al. (1998) demonstrated that participants— in predicting the likelihood that they would later be able to free-recall the answers they gave to general-knowledge questions (without the questions again being presented)—were fooled by how quickly an answer to a given question came to mind. The participants assumed, apparently, that answers that come to mind quickly in the presence of the question would again come to mind quickly when the question was absent. To have predicted the actual relationship—namely, that the longer it took to answer a question, the more likely that answer would be recallable later—requires that someone understand two poorly understood characteristics of human memory: (1) that retrieval is a learning event and the more difficult a (successful) retrieval, the more potent the learning, and (2) that semantic memory, as tested by the original question, differs from episodic memory, as tested by the later free-recall test.

The Importance of Becoming Metacognitively Sophisticated as a Learner

Whatever the reasons for our not developing accurate mental models of ourselves as learners, the importance of becoming sophisticated as a learner cannot be overemphasized. Increasingly, coping with the changes that characterize today’s world—technological changes, job and career changes, and changes in how much of formal and informal education happens in the classroom versus at a computer terminal, coupled with the range of information and procedures that need to be acquired—requires that we learn how to learn. Also, because more and more of our learning will be what Whitten, Rabinowitz, and Whitten (2006) have labeled unsupervised learning, we need, in effect, to know how to manage our own learning activities.
To become effective in managing one’s own learning requires not only some understanding of the complex and unintuitive processes that underlie one’s encoding, retention, and retrieval of information and skills, but also, in my opinion, avoiding certain attribution errors. In social psychology, the fundamental attribution error (Ross, 1977) refers to the tendency, in explaining the behaviors of others, to overvalue the role of personality characteristics and undervalue the role of situational factors. That is, behaviors tend to be overattributed to a behaving individual’s or group’s characteristics and underattributed to situational constraints and influences. In the case of human metacognitive processes, there is both a parallel error and an error that I see as essentially the opposite.

The parallel error is to overattribute the degree to which students and others learn or remember to innate ability. Differences in ability between individuals are overappreciated, whereas differences in effort, encoding activities, and whether the prior learning that is a foundation for the new learning in question has been acquired are underappreciated. The second attribution error, very different in kind, has to do with one’s own learning and the tendency to attribute whether one learns efficiently or not to factors largely beyond one’s control, as opposed to being a consequence of one’s own efforts or activities. One manifestation of such attributions is the widespread appeal of the learning styles idea: If what is to be learned will only be presented in a way that meshes with one’s learning style, one’s learning will not only be greatly enhanced, but easy as well (for a review of the learning styles idea and evidence for and against the idea, see Pashler, McDaniel, Rohrer, & Bjork, 2009). Another manifestation is in the context of tasks requiring participants to predict their own future performance. In such tasks, people tend to be very sensitive to item difficulty and largely insensitive to factors such as how many times they will be allowed to study the to-be-learned items (Kornell & Bjork, 2009) or how long the retention interval will be before they are tested (Koriat et al., 2004). That is, people seem prone to assume that the properties of to-be-learned materials will largely determine their future ability to recall those materials, versus assuming that their later recall will depend on factors under their control, such as how effectively they encode those materials. Becoming maximally sophisticated as a learner is, in a sense, not enough. Becoming a truly effective learner also requires an appreciation of one’s capacity to learn and a commitment to the proposition that one’s learning is under one’s control.

Concluding Comment

Among the definitions of symbiosis is “a relationship of mutual benefit or dependence” (American Heritage Dictionary, 2000). In this chapter I have tried to emphasize that the relationships among remembering, forgetting, and learning are complex, unintuitive, and often beneficial. Forgetting focuses remembering and fosters learning; remembering generates learning and causes forgetting; learning causes forgetting, begets remembering, and supports new learning. In concert, those interdependencies act to modify in adaptive ways what is and is not accessible from our memories as we live and learn. It is a system that is remarkably interesting and effective, if fallible, and it is no less remarkable by virtue of being so frequently unappreciated and misunderstood by the user.

References

American heritage dictionary of the English language (4th ed.). (2000). New York, NY: Houghton Mifflin Company.
Anderson, M. C. (2003). Rethinking interference theory: Executive control and the mechanisms of forgetting. Journal of Memory and Language, 49, 415–445.
Anderson, M. C., Bjork, E. L., & Bjork, R. A. (2000). Retrieval-induced forgetting: Evidence for a recall-specific mechanism. Psychonomic Bulletin and Review, 7, 522–530.
Anderson, M. C., & Bjork, R. A. (1994). Mechanisms of inhibition in long-term memory: A new taxonomy. In D. Dagenbach & T. Carr (Eds.), Inhibitory processes in attention, memory and language (pp. 265–326). New York: Academic Press.
Anderson, M. C., Bjork, R. A., & Bjork, E. L. (1994). Remembering can cause forgetting: Retrieval dynamics in long-term memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 1063–1087.
Anderson, M. C., & Green, C. (2001). Suppressing unwanted memories by executive control. Nature, 410, 366–369.
Anderson, M. C., & Levy, B. J. (2006). Encouraging the nascent cognitive science of repression. Behavioral and Brain Sciences, 29, 511–513.
Anderson, M. C., & Spellman, B. A. (1995). On the status of inhibitory mechanisms in cognition: Memory retrieval as a model case. Psychological Review, 102, 68–100.
Appleton-Knapp, S., Bjork, R. A., & Wickens, T. D. (2005). Examining the spacing effect in advertising: Encoding variability, retrieval processes and their interaction. Journal of Consumer Research, 32, 266–276.
Benjamin, A. S., & Bjork, R. A. (1996). Retrieval fluency as a metacognitive index. In L. M. Reder (Ed.), Implicit memory and metacognition: The 27th Carnegie Symposium on Cognition (pp. 309–338). Hillsdale, NJ: Erlbaum.
Benjamin, A. S., Bjork, R. A., & Schwartz, B. L. (1998). The mismeasure of memory: When retrieval fluency is misleading as a metamnemonic index. Journal of Experimental Psychology: General, 127, 55–68.
Bjork, E. L., & Bjork, R. A. (2011). Making things hard on yourself, but in a good way: Creating desirable difficulties to enhance learning. In M. A. Gernsbacher, R. W. Pew, & J. R. Pomerantz (Eds.), Psychology and the real world: Essays illustrating fundamental contributions to society (pp. 56–64). New York: Worth Publishers.
Bjork, E. L., Bjork, R. A., & Anderson, M. C. (1998). Varieties of goal-directed forgetting. In J. M. Golding & C. MacLeod (Eds.), Intentional forgetting: Interdisciplinary approaches (pp. 103–137). Hillsdale, NJ: Erlbaum.
Bjork, R. A. (1966). Learning and short-term retention of paired associates in relation to specific sequences of interpresentation intervals (Doctoral Dissertation, Stanford University). Dissertation Abstracts, 27, 3684B. (University Microfilms No. 67-4316).

Bjork, R. A. (1972). Theoretical implications of directed forgetting. In A. W. Melton & E. Martin (Eds.), Coding processes in human memory (pp. 217– 235). Washington, DC: Winston. Bjork, R. A. (1975). Retrieval as a memory modifier. In R. Solso (Ed.), Information processing and cognition: The Loyola Symposium (pp. 123– 144). Hillsdale, NJ: Lawrence Erlbaum Associates. Bjork, R. A. (1978). The updating of human memory. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 12, pp. 235–259). New York: Academic Press. Bjork, R. A. (1988). Retrieval practice and the maintenance of knowledge. In M. M. Gruneberg, P. E. Morris, & R. N. Sykes (Eds.), Practical aspects of memory: Current research and issues: Memory in everyday life (Vol. 1, pp. 396–401). New York: John Wiley & Sons. Bjork, R. A. (1989). Retrieval inhibition as an adaptive mechanism in human memory. In H. L. Roediger & F. I. M. Craik (Eds.), Varieties of memory and consciousness: Essays in honor of Endel Tulving (pp. 309–330). Hillsdale, NJ: Erlbaum. Bjork, R. A. (1994a). Institutional impediments to effective training. In D. Druckman & R. A. Bjork (Eds.), Learning, remembering, believing: Enhancing human performance (pp. 295–306). Washington, DC: National Academy Press. Bjork, R. A. (1994b). Memory and metamemory considerations in the training of human beings. In J. Metcalfe & A. Shimamura (Eds.), Metacognition: Knowing about knowing (pp. 185–205). Cambridge, MA: MIT Press. Bjork, R. A. (1999). Assessing our own competence: Heuristics and illusions. In D. Gopher & A. Koriat (Eds.), Attention and performance XVII. Cognitive regulation of performance: Interaction of theory and application (pp. 435– 459). Cambridge, MA: MIT Press. Bjork, R. A. (2001). Recency and recovery in human memory. In H. L. Roediger, J. S. Nairne, I. Neath, & A. M. Surprenant (Eds.), The nature of remembering: Essays in honor of Robert G. Crowder (pp. 211–232). Washington, DC: American Psychological Association Press. 
Bjork, R. A. (2003). Interference and forgetting. In J. H. Byrne (Ed.), Encyclopedia of learning and memory (2nd ed., pp. 268–273). New York: Macmillan Reference USA. Bjork, R. A. (2009). Structuring the conditions of training to achieve elite performance: Reflections on elite training programs and related themes in Chapters 10–13. In K. A. Ericsson (Ed.), Development of professional expertise: Toward measurement of expert performance and design of optimal learning environments (pp. 312–329). Cambridge, UK: Cambridge University Press. Bjork, R. A., & Allen, T. W. (1970). The spacing effect: Consolidation or differential encoding? Journal of Verbal Learning and Verbal Behavior, 9, 567–572.


Bjork, R. A., & Bjork, E. L. (1992). A new theory of disuse and an old theory of stimulus fluctuation. In A. Healy, S. Kosslyn, & R. Shiffrin (Eds.), From learning processes to cognitive processes: Essays in honor of William K. Estes (Vol. 2, pp. 35–67). Hillsdale, NJ: Erlbaum. Bjork, R. A., & Linn, M. C. (2006, March). The science of learning and the learning of science: Introducing desirable difficulties. APS Observer, 19, 29, 39. Bower, G. H. (1972). Stimulus sampling theory of encoding variability. In A. W. Melton & E. Martin (Eds.), Coding processes in human memory (pp. 85–123). Washington, DC: Winston. Carlson, R. A., & Yaure, R. G. (1990). Practice schedules and the use of component skills in problem solving. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16(3), 484–496. Cepeda, N. J., Pashler, H., Vul, E., Wixted, J. T., & Rohrer, D. (2006). Distributed practice in verbal recall tasks: A review and quantitative synthesis. Psychological Bulletin, 132, 354–380. Christina, R. W., & Bjork, R. A. (1991). Optimizing long-term retention and transfer. In D. Druckman & R. A. Bjork (Eds.), In the mind’s eye: Enhancing human performance (pp. 23–56). Washington, DC: National Academy Press. Dempster, F. N. (1996). Distributing and managing the conditions of encoding and practice. In R. Bjork & E. Bjork (Eds.), Memory (pp. 317–344). San Diego, CA: Academic Press. deWinstanley, P. A., Bjork, E. L., & Bjork, R. A. (1996). Generation effects and the lack thereof: The role of transfer-appropriate processing. Memory, 4, 31–48. deWinstanley, P. A., & Bjork, R. A. (2002). Successful lecturing: Presenting information in ways that engage effective processing. In D. F. Halpern & M. D. Hakel (Eds.), Applying the science of learning to university teaching and beyond (pp. 19–31). San Francisco: Jossey-Bass. Estes, W. K. (1955). Statistical theory of spontaneous recovery and regression. Psychological Review, 62, 145–154. Gates, A. I. (1917). 
Recitation as a factor in memorizing. In R. S. Woodworth (Ed.), Archives of psychology (No. 40, pp. 1–104). New York: The Science Press. Ghodsian, D., Bjork, R. A., & Benjamin, A. S. (1997). Evaluating training during training: Obstacles and opportunities. In M. A. Quinones & A. Ehrenstein (Eds.), Training in a rapidly changing workplace: Applications of psychological research (pp. 63–88). Washington, DC: American Psychological Association. Glover, J. A. (1989). The “testing” phenomenon: Not gone but nearly forgotten. Journal of Educational Psychology, 81, 392–399. Goode, S. A., & Magill, R. A. (1986). The contextual interference effect in learning three badminton serves. Research Quarterly for Exercise and Sport, 57, 308–314. Hogan, R. M., & Kintsch, W. (1971). Differential effects of study and test trials on long-term recognition and recall. Journal of Verbal Learning and Verbal Behavior, 10, 562–567.


Izawa, C. (1970). Optimal potentiating effects and forgetting-prevention effects of tests in paired-associate learning. Journal of Experimental Psychology, 83, 340–344. Hirshman, E. L., & Bjork, R. A. (1988). The generation effect: Support for a two-factor theory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 484–494. Jacoby, L. L. (1978). On interpreting the effects of repetition: Solving a problem versus remembering a solution. Journal of Verbal Learning and Verbal Behavior, 17, 649–667. Jacoby, L. L., Bjork, R. A., & Kelley, C. M. (1994). Illusions of comprehension, competence, and remembering. In D. Druckman & R. A. Bjork (Eds.), Learning, remembering, believing: Enhancing human performance (pp. 57–80). Washington, DC: National Academy Press. James, W. (1890). The principles of psychology (Vol. 2). New York: Holt. Jamieson, B. A., & Rogers, W. A. (2000). Age-related effects of blocked and random practice schedules on learning a new technology. Journal of Gerontology, 55B, 343–353. Karpicke, J. D., & Roediger, H. L. (2008). The critical importance of retrieval for learning. Science, 319, 966–968. Koriat, A., & Bjork, R. A. (2005). Illusions of competence in monitoring one’s knowledge during study. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 187–194. Koriat, A., & Bjork, R. A. (2006). Illusions of competence during study can be remedied by manipulations that enhance learners’ sensitivity to retrieval conditions at test. Memory and Cognition, 34, 959–972. Koriat, A., Bjork, R. A., Sheffer, L., & Bar, S. K. (2004). Predicting one’s own forgetting: The role of experience-based and theory-based processes. Journal of Experimental Psychology: General, 133, 643–656. Kornell, N., & Bjork, R. A. (2007). The promise and perils of self-regulated study. Psychonomic Bulletin and Review, 6, 219–224. Kornell, N., & Bjork, R. A. (2008). Learning concepts and categories: Is spacing the “enemy of induction”? 
Psychological Science, 19, 585–592. Kornell, N., & Bjork, R. A. (2009). A stability bias in human memory: Overestimating remembering and underestimating learning. Journal of Experimental Psychology: General, 138, 449–468. Landauer, T. K., & Bjork, R. A. (1978). Optimum rehearsal patterns and name learning. In M. M. Gruneberg, P. E. Morris, & R. N. Sykes (Eds.), Practical aspects of memory (pp. 625–632). London: Academic Press. Landauer, T. K., & Eldridge, L. (1967). Effects of tests without feedback and presentation-test interval in paired-associate learning. Journal of Experimental Psychology, 75, 290–298. Lee, T. D., & Magill, R. A. (1983). The locus of contextual interference in motor-skill acquisition. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9, 730–746.


Lee, T. D., & Magill, R. A. (1985). Can forgetting facilitate skill acquisition? In D. Goodman, R. B. Wilberg, & I. M. Franks (Eds.), Differing perspectives in motor learning, memory, and control (pp. 3–22). Amsterdam: North Holland. Levy, B. J., & Anderson, M. C. (2008). Individual differences in suppressing unwanted memories: The executive deficit hypothesis. Acta Psychologica, 127, 623–635. MacLeod, C. M., Dodd, M. D., Sheard, E. D., Wilson, D. E., & Bibi, U. (2003). In opposition to inhibition. In B. H. Ross (Ed.), The psychology of learning and motivation (Vol. 43, pp. 163–214). San Diego, CA: Academic Press. MacLeod, M. D., & Macrae, C. N. (2001). Gone but not forgotten: The transient nature of retrieval-induced forgetting. Psychological Science, 12(2), 148–152. McDaniel, M. A., & Masson, M. E. J. (1985). Altering memory representations through retrieval. Journal of Experimental Psychology: Learning, Memory, and Cognition, 11, 371–385. McGeoch, J. A. (1932). Forgetting and the law of disuse. Psychological Review, 39, 352–370. Morris, P. E., & Fritz, C. O. (2000). The name game: Using retrieval practice to improve the learning of names. Journal of Experimental Psychology: Applied, 6, 124–129. Pashler, H., McDaniel, M., Rohrer, D., & Bjork, R. A. (2009). Learning styles: A review of concepts and evidence. Psychological Science in the Public Interest, 3, 105–119. Pashler, H., Zarow, G., & Triplett, B. (2003). Is temporal spacing of tests helpful even when it inflates error rates? Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 1051–1057. Perfect, T., Stark, L.-J., Tree, J. J., Moulin, C. J. A., Ahmed, L., & Hutter, R. (2004). Transfer appropriate forgetting: The cue-dependent nature of retrieval-induced forgetting. Journal of Memory and Language, 51, 399–417. Pyc, M. A., & Rawson, K. A. (2009). Testing the retrieval effort hypothesis: Does greater difficulty correctly recalling information lead to higher levels of memory? 
Journal of Memory and Language, 60, 437–447. Reder, L. M., & Ritter, F. E. (1992). What determines initial feeling of knowing? Familiarity with question terms, not with the answer. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13, 435–451. Richland, L. E., Linn, M. C., & Bjork, R. A. (2007). Cognition and instruction: Bridging laboratory and classroom settings. In F. Durso, R. Nickerson, S. Dumais, S. Lewandowsky, & T. Perfect (Eds.), Handbook of applied cognition (2nd ed., pp. 555–583). West Sussex, UK: John Wiley & Sons Ltd. Rickard, T., Lau, J., & Pashler, H. (2008). Spacing and the transition from calculation to retrieval. Psychonomic Bulletin and Review, 15, 656–661. Roediger, H. L., Dudai, Y., & Fitzpatrick, S. M. (Eds.). (2007). Science of memory: Concepts. Oxford, UK: Oxford University Press.


Roediger, H. L., & Karpicke, J. D. (2006a). The power of testing memory: Basic research and implications for educational practice. Perspectives on Psychological Science, 1, 181–210. Roediger, H. L., & Karpicke, J. D. (2006b). Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science, 17, 249–255. Rohrer, D., & Taylor, K. (2007). The shuffling of mathematics practice problems improves learning. Instructional Science, 35, 481–498. Ross, L. (1977). The intuitive psychologist and his shortcomings: Distortions in the attribution process. In L. Berkowitz (Ed.), Advances in experimental social psychology (Vol. 10, pp. 173–220). New York: Academic Press. Schmidt, R. A., & Bjork, R. A. (1992). New conceptualizations of practice: Common principles in three paradigms suggest new concepts for training. Psychological Science, 3, 207–217. Shea, J. B., & Morgan, R. I. (1979). Contextual interference effects on the acquisition, retention, and transfer of a motor skill. Journal of Experimental Psychology: Human Learning and Memory, 5, 179–187. Simon, D. A., & Bjork, R. A. (2001). Metacognition in motor learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 907–912. Slamecka, N. J., & Graf, P. (1978). The generation effect: Delineation of a phenomenon. Journal of Experimental Psychology: Human Learning and Memory, 4, 592–604. Smith, S. M., Glenberg, A. M., & Bjork, R. A. (1978). Environmental context and human memory. Memory and Cognition, 6, 342–353. Spitzer, H. F. (1939). Studies in retention. Journal of Educational Psychology, 30, 641–656. Storm, B. C., Bjork, E. L., & Bjork, R. A. (2008). Accelerated relearning after retrieval-induced forgetting: The benefit of being forgotten. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34, 230–236. Storm, B. C., Bjork, E. L., Bjork, R. A., & Nestojko, J. F. (2006). Is retrieval success a necessary condition for retrieval-induced forgetting? 
Psychonomic Bulletin and Review, 13, 1023–1027. Storm, B. C., Bjork, R. A., & Storm, J. C. (2010). Optimizing retrieval as a learning event: When and why expanding retrieval practice enhances long-term retention. Memory and Cognition, 38, 244–253. Thorndike, E. L. (1914). The psychology of learning. New York: Teachers College Press. Whitten, W. B., & Bjork, R. A. (1977). Learning from tests: The effects of spacing. Journal of Verbal Learning and Verbal Behavior, 16, 465–478. Whitten, W. B., Rabinowitz, M., & Whitten, S. E. (2006, May). Enhancing unsupervised learning through guided cognition. Paper presented at the meetings of the Association for Psychological Science, New York. Wulf, G., & Shea, C. H. (2002). Principles derived from the study of simple skills do not generalize to complex skill learning. Psychonomic Bulletin and Review, 9, 185–211.


2

Intricacies of Spaced Retrieval: A Resolution

Henry L. Roediger III and Jeffrey D. Karpicke

Robert Bjork has spent his entire professional life studying learning and memory, and many of us have spent our lives (in part) reading his pathbreaking research. One interesting characteristic of Bob’s work, much of it conducted in collaboration with Elizabeth Bjork, is the often counterintuitive nature of the findings emanating from their lab. At the risk of overstatement, one can view many of the important contributions that the Bjorks have made as creating a paradox and then mounting a satisfactory explanation for it. Our chapter will deal with several paradoxes raised by the Bjorks’ work. There are three interrelated puzzles. First, remembering an event that is repeated is greatly aided if the first presentation is forgotten to some extent before the repetition occurs. (Yes, you read it correctly: good remembering of an event can depend on its forgetting.) Second, retrieving an event can be a more potent learning opportunity than restudying it, which flies in the face of educational wisdom that studying creates learning and testing merely measures it. Third, putting these two paradoxes together, testing an event has a greater effect if one waits for some forgetting to make retrieval more effortful and difficult. This last claim seems especially puzzling, because if we want to test people, shouldn’t we want to do it under conditions in which they cannot make errors? After all, the idea of learning through “errorless retrieval” is a hallmark of certain approaches to memory remediation in brain-damaged individuals. As we shall show, these approaches advocating errorless retrieval rest on a mistaken assumption, at least in healthy people (the case may be different in older adults and brain-damaged individuals). We can thank Bob and Elizabeth Bjork for these insights. In this chapter we unpack them and show how and when they are true.

Forgetting an Event Can Enhance Its Relearning

The theme of this volume is how successful forgetting can sometimes enhance remembering. The case is most obvious in studies of directed (or intentional) forgetting. If people must remember two sets of information successively, they can learn and remember the second set better if they have been told just before learning it that they can forget the first set of material they recently learned. That is, if two lists are presented, getting a forget instruction for the first list improves retention of a second list relative to the case where subjects feel responsible for remembering the first list while learning the second list. Establishing this fact was one of Bob Bjork’s first major scientific contributions (e.g., Bjork, 1970; Bjork, LaBerge, & LeGrand, 1968). Intentional forgetting has been examined in many studies over the years, and whole volumes are devoted to it (Golding & MacLeod, 1998).

Forgetting of information can lead to successful remembering in another, more paradoxical way, too. Strangely, successful remembering of information can depend, in certain situations, on having successfully forgotten (to some degree) the same information earlier. The previous statement may seem weird or even patently absurd, but we review evidence here that it is true. Once again, Bob Bjork was responsible for this critical insight (Bjork & Allen, 1970). The condition in which the previous statement holds true occurs when an event to be remembered is repeated in some form, either restudied or tested. To the extent that a first presentation is forgotten, its repetition will be well remembered. Bjork gleaned this insight from research on the spacing effect and then extended it. The spacing effect (e.g., Glenberg, 1976; Madigan, 1969; Melton, 1970) refers to the situation when events are repeated and the spacing or lag between repetitions is varied. 
When an event occurs, its repetition has little effect on retention when the repetition occurs immediately (when the event is still fresh from its first presentation), but the impact grows as the repetition is delayed. Figure 2.1 shows a typical spacing effect from a careful study by Madigan (1969) in which words were presented once or twice in a long list and the spacing between repetitions was manipulated. Free recall of the list items was the dependent


[Figure 2.1. Graph: probability of recall (y-axis, 0.0 to 0.6) as a function of lag (x-axis: 1P, 0, 2, 4, 8, 20, 40).]

Figure 2.1 The basic spacing (or lag) effect in free recall. Words were either presented once (1P) or repeated. When repeated, items occurred at various spacings indicated on the abscissa. Later recall increased as a function of lag between presentations. (Data are adapted from Madigan, S. A., Journal of Verbal Learning and Verbal Behavior, 8, 828–835, 1969.)

measure. Of course, as the lag since the first presentation increases in these kinds of experiments, the first presentation is increasingly forgotten. So the second presentation creates more durable learning when it occurs with an increasing amount of forgetting of the first presentation (up to some limiting condition). Crowder (1976, Chapter 9) spells out the logic quite clearly.

Bjork and Allen (1970) took this observation from the spacing effect and created an experiment in which the time between presentations would be held constant, but forgetting could still be manipulated by varying the difficulty of the task given to the subject between presentations. Subjects either performed a difficult task (one causing more forgetting) after the first presentation or performed an easy task (with less forgetting) between presentations. Sure enough, the second presentation led to greater recall on a final criterial test when it occurred after the difficult rather than the easy interpolated task. Ergo, forgetting of the information causes its greater retention after a repetition. Others replicated this finding (Robbins & Wise, 1972; Tzeng, 1973), but it may not extend to all situations (Roediger & Crowder, 1975). However, Logan, Roediger, and McDermott (2010) have shown how this principle (greater forgetting prior to a re-presentation leading to greater recall) may benefit foreign language vocabulary learning.

More recently, Storm, Bjork, and Bjork (2008) examined recall of items after two presentations. After the first presentation, some items were subjected to retrieval-induced forgetting (using the Anderson, Bjork, & Bjork, 1994, technique) and others were not. All items were repeated and then recalled on a later test. Storm et al. found that “items that were relearned benefited more from that relearning if they had previously been forgotten” (2008, p. 230). They commented that this outcome “is very surprising from a common sense standpoint” (ibid.). Of course, so are all the findings reviewed here: How and why should greater forgetting of an event before it is presented again cause better later retention? That mystery runs throughout this chapter (see Crowder, 1976, pp. 273–314, for ideas in the context of spacing effect research).

Retrieval as a Memory Modifier

The remainder of this chapter is about the effects of testing one’s memory on later retention. This is not a new topic. In fact, it predates the festschrift for Bob Bjork by exactly 100 years, if we date it from the first empirical papers we can find on the topic (Abbott, 1909; see Roediger & Karpicke, 2006a, for a review). The discovery made by Abbott and replicated by countless others is that the effect of taking a test is not neutral but alters later retention. When information is correctly retrieved on a test, this act makes the probability of future retention on a delayed test greater than if no test had occurred or even if the person had restudied the material rather than being tested on it (see Roediger & Karpicke, 2006b; Whitten & Bjork, 1977, among many others).

In the cognitive psychology of memory, the 1970s were the heyday of studies of retrieval, with many important papers on topics such as the encoding specificity principle (Tulving & Thomson, 1973) and transfer-appropriate processing (Morris, Bransford, & Franks, 1977). Another milestone publication of that era was R. A. Bjork’s (1975) chapter that has the same title as this section heading. He argued that educators and psychologists both tended to ignore the importance of testing:

Retrieval from memory is often assumed, implicitly or explicitly, as a process analogous to the way in which the contents of a memory location in a computer are read out, that is, as a process that does not, by itself, modify the state of the retrieved item in memory. In my opinion, however, there is ample evidence for a kind of Heisenberg principle with respect to retrieval processes: an item can seldom, if ever, be retrieved from memory without modifying the representation of that item in memory in significant ways. (p. 123)


Bjork’s chapter went on to report research on retrieval as a memory modifier. He interpreted the phenomenon of the testing/retrieval effect through the lens of the then-new levels of processing ideas of Craik and Lockhart (1972), maintaining that there could be levels of processing during retrieval just like there were during encoding. Specifically, when retrieval occurred under easy, superficial conditions, it did not benefit later retention. However, when retrieval involved more difficult and complex processes, the effects on later recall were much greater. Thus, all acts of retrieval are not equal: Some confer great benefit and some provide little or no benefit. We return to this theme, too, later in the chapter. A couple of years later, Whitten and Bjork (1977) reported an elegant experiment that documented Bjork’s earlier points quite well. We report only a sketch of the logic here; the actual experiment was more complex. The authors presented subjects with two words to be remembered and then had them perform a distracter task for varying amounts of time afterwards: 4, 8, or 14 seconds. At this point, the items were either presented again or tested. Subjects had to recall the pair of words on test trials. When items were tested, recall declined from .72 to .61 to .54 across the three intervals. No feedback was given, so the level of retrieval success is critical in the case of tested events. Of course, when items were restudied, subjects were reexposed to 100% of the original items, so testing put items in that condition at a disadvantage relative to repeated study conditions, especially in the long-delayed conditions. A final test was given a bit later, after many items had been presented in these conditions. We consider here the final test results for items that were studied twice or studied once and then tested. 
For simplicity, we consider only the extreme lags in this figure, those items that had been tested or restudied after lags of 4 or 14 seconds during the initial learning phase. The results can be seen in Figure 2.2, with data points estimated from Whitten and Bjork’s (1977) Figure 1. Final recall showed a spacing effect in both cases: Performance was better when the second presentation or the test occurred after 14 seconds of distracter activity rather than only 4 seconds, which shows the usual lag effect (in both restudy and testing). In addition, final recall performance was better in the condition in which subjects had taken a test during learning than when they had restudied the item. Note that this testing effect occurred despite the fact that, as the delay increased, recall on that first test became increasingly poor, such that barely more than half the items (54%) were recalled on the initial test after 14 seconds of distracter activity. Thus when overt recall occurred early (after 4 seconds), it had less of a positive effect on


[Figure 2.2. Bar graph: proportion recalled (y-axis) at spacings of 4 and 14 seconds (x-axis) for the restudy and test conditions; plotted values are 0.28, 0.34, 0.35, and 0.41.]

Figure 2.2 Final recall results as a function of the lag between study of a word pair and its restudy (white bars) or test (gray bars). Both spacing (lag) and testing had positive effects. (Data adapted from Whitten, W. B., & Bjork, R. A., Journal of Verbal Learning and Verbal Behavior, 16, 465–478, 1977, Figure 1.)

a final test than when recall occurred after 14 seconds (despite the fact that recall on the initial test dropped during this period). Whitten and Bjork interpreted the results as indicating that retrieval difficulty was the critical component and cited related research by Jacoby and Bartz (1972) as reinforcing their point (see too Gardiner, Craik, & Bleasdale, 1973; Jacoby, 1978). Whitten and Bjork’s (1977) results may look rather slender. The advantage of testing to restudy was only 6% or so, although it was consistent. However, the problem of low performance on the initial test should be borne in mind. When Whitten and Bjork performed conditional analyses, examining final recall performance conditional on subjects successfully recalling items on the first test, the testing effect was much larger. Yet such conditional analyses raise the specter of item selection effects. The general logic is that “easier” items are, by definition, the ones recalled on the initial test. Therefore, any resulting advantage of recalling these items at a higher level on the delayed test may be due to selection of easy items in this condition rather than an effect of testing. That is true, but many analyses have shown convincingly that testing effects are not due to item selection effects and are not restricted only to easy items (Karpicke, 2009; Karpicke & Roediger, 2007a, 2008; Roediger & Karpicke, 2006b), and even in Whitten and Bjork’s study there was an absolute advantage in the testing conditions when all items were included. Testing effects are often quite large in other experiments (see Roediger & Karpicke, 2006a).


Expanding Retrieval Schedules

In 1978 Landauer and Bjork provided another important empirical contribution that has guided research and thinking in the intervening years. They asked: If testing aids retention (and it does), and if multiple tests provide greater benefits to retention than do single tests (also true), what schedule of testing provides the best performance? If we want to learn a person’s name, or foreign language vocabulary, or definitions of scientific concepts, what is the best way to schedule our self-testing? This question is of critical importance for students who must learn a large body of factual knowledge.

Two experiments by Landauer and Bjork (1978) sought an answer. The authors contrasted several possible schedules of testing, described below. The materials subjects learned were paired associates (either first names with last names in Experiment 1 or face–name pairs in Experiment 2). After studying a pair, students received various schedules of repeated tests. We will describe selected conditions of their Experiment 1 here. In one condition, items were presented only once. In four other conditions, items were tested repeatedly under different spacing schedules. Three tests were given in all four of these conditions, but the lags between tests varied with the schedule. The conditions of repeated testing were uniform-short spacing, uniform-moderate spacing, expanding spacing, and contracting spacing. (We provide an operational explanation of these labels shortly.) During tests, students were given the first name of the person and asked to produce the last name (in Experiment 1). In the two uniform conditions, the three tests were given with equal intervals between them. Thus, in the uniform-short condition, students were tested three times immediately after studying a pair. 
Following Landauer and Bjork (1978), we will refer to this condition as the 0-0-0 condition, because no intervening items occurred between tests. The uniform-moderate condition employed a 5-5-5 schedule of spacing, meaning that five intervening study or test events occurred between the tests of a particular pair in this condition. This condition is also called an equal interval condition, because the intervals between tests are equal. The expanding test condition used a 1-4-10 spacing, indicating that a pair was first tested after only 1 intervening item, then after 4 more, and finally after 10 intervening items. In the contracting condition, the spacing was reversed: 10, 4, and 1. Many items were tested in these various conditions. In addition, as a baseline control condition, some items were presented a single time and never tested, which permits an answer to the question of what benefits the various testing schedules have over and above a single presentation of a pair with no testing. A final point is that all tests in these experiments were given without feedback.

After the acquisition phase of the experiment just described, students were given a final test 30 minutes later (with a lecture occurring during the interval). They were again given the first name of the pair and asked to produce the last name. The results (estimated from Figure 2 of Landauer & Bjork, 1978) are presented in Figure 2.3. All the testing conditions produced better final recall than the single-presentation study condition, but performance differed widely among the testing conditions (despite the fact that the number of prior tests was held constant at 3). The uniform-short (0-0-0) condition was poorest, the uniform-moderate (5-5-5) and contracting (10-4-1) conditions were intermediate, and the expanding condition (1-4-10) was best. Note that the three latter conditions all have equivalent numbers of total events between tests (15); the critical point is how they were distributed. The expanding retrieval

[Figure 2.3 plot. Y-axis: Proportion Recalled. X-axis: Spacing Condition (Study Once, 0-0-0, 5-5-5, 1-4-10, 10-4-1). Bar labels: 0.25, 0.33, 0.38, 0.42, 0.47.]

Figure 2.3 Final recall after either a single presentation (study once) or a single presentation and three tests. Schedules of the three tests had a large effect on recall. All testing conditions aided recall relative to the single presentation condition, but the massed testing condition conferred the least benefit, and the expanding retrieval condition produced the most benefit. (Data adapted from Landauer, T. K., & Bjork, R. A., in M. Gruneberg, P. E. Morris, & R. N. Sykes (Eds.), Practical Aspects of Memory, Academic Press, London, England, 625–632, 1978, Figure 2.)


Intricacies of Spaced Retrieval • 31

schedule was best. Experiment 2, which used face-name pairs, reported the same finding. Landauer and Bjork (1978) extolled expanding retrieval as the best method to learn new information such as names and faces, and probably everyone reading their report quickly agreed. (The first author of this paper certainly did, when he read it; the second author here was not yet born.) The underlying rationale seems so straightforward and the benefit seems commonsensical (after the fact). The advice would be that when you meet a new person and hear her name, you should retrieve it rather quickly before the name is lost from immediate (working) memory. That initial retrieval ensures you encoded the item. After that initial retrieval, you should then wait a bit longer and retrieve it again, to practice retrieval at an intermediate time span. Then, finally, you should wait even longer for a further retrieval that would solidify or consolidate the memory more permanently. Landauer and Bjork wrote: “The expanding procedure may thus be seen as an effective shaping procedure for successively approximating the desired behavior of unaided recall at long delays” (p. 631). For many psychologists who had learned about shaping of behavior through reinforcement by successive approximations to the desired behavior (e.g., Skinner, 1953, Chapter VI), the principle seemed intuitive (at least with the 20/20 wisdom of hindsight). Many textbook writers and teachers (again, the first author included) began to preach that expanding retrieval was the best way to practice new information to retain it best. The main thrust in the remainder of this chapter is to claim that, despite the early rush to embrace expanding retrieval as a central technique in using retrieval-enhanced learning via testing, the idea is fundamentally flawed. 
As it has usually been operationalized in extant research, expanding retrieval has a fatal flaw: The first test given (often after lags of zero or one intervening item from initial presentation) makes retrieval “too easy,” and making retrieval easy undermines its positive effect. We provide evidence below to support this claim, but of course it took many years for researchers to understand this point. After all, the data in Landauer and Bjork’s (1978) paper showed that expanding retrieval was better than equal interval retrieval, so what is the problem? We describe it below. Although Landauer and Bjork’s (1978) claims now seem wrong to us, Bob Bjork actually anticipated the problem in his writings before that 1978 paper, ones we reviewed above. In his 1975 chapter, Bjork argued that retrieval difficulty is critical to the testing effect—the more difficult the retrieval on a first test, the better the later recall on a second


test. However, in the wisdom of hindsight, the expanded retrieval technique makes an initial retrieval very easy: In any schedule in which the first retrieval occurs after a lag of zero intervening items, it is essentially perfect, and with one intervening item performance does not drop much. These are the standard lags for initial tests in expanding retrieval conditions. As we shall see later in the chapter, the difficulty of the first retrieval in the typical expanding scheme is critical to later performance. But we are getting ahead of ourselves. Before we discuss this later part of the story, we will review (albeit briefly) the 30-year historical impact of the Landauer and Bjork paper by selectively reviewing research from 1978 to 2007.
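The lag notation used for these schedules can be made concrete with a short sketch. It assumes, following the description of the 1-4-10 condition above, that each number is the count of intervening events before the next test and that the first number is counted from the study trial. The function names are illustrative only and do not come from any of the cited studies.

```python
from typing import List, Tuple


def parse_schedule(label: str) -> Tuple[int, ...]:
    """Turn a label such as '1-4-10' into a tuple of lags (1, 4, 10)."""
    return tuple(int(part) for part in label.split("-"))


def positions_of_tests(lags: Tuple[int, ...]) -> List[int]:
    """Return the trial index of each test, counting the study trial as 0.

    Assumption: a lag of k means k other events intervene, so the next
    test for the pair falls k + 1 trials after the previous event.
    """
    positions = []
    current = 0  # the study trial
    for lag in lags:
        current += lag + 1
        positions.append(current)
    return positions


def total_spacing(lags: Tuple[int, ...]) -> int:
    """Total number of intervening events across the whole schedule."""
    return sum(lags)


if __name__ == "__main__":
    for label in ("0-0-0", "5-5-5", "1-4-10", "10-4-1"):
        lags = parse_schedule(label)
        print(label, positions_of_tests(lags), total_spacing(lags))
```

Under these assumptions the 5-5-5, 1-4-10, and 10-4-1 schedules all place the third test on the same trial and share a total spacing of 15 events. They differ only in how that spacing is distributed, and in particular in how soon the first test arrives.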

Expanding Retrieval: Research and Controversy

A strange thing happened to research in this area after publication of Landauer and Bjork’s (1978) landmark paper: nothing. For many years no one did research on the issue of expanding retrieval, at least not in comparison with equally spaced retrieval. The matter seemed to have been considered a closed case; no further research seemed needed. Why? Our guess is that the findings (although new) made so much sense that everyone nodded and said “of course.” The fact that the findings were compelling and intuitive seemed to choke off further inquiry into the matter for about a decade. On the positive side, many people talked about the findings and included them in lectures and books, which was hardly surprising because they were interesting and were directed at an important practical problem. In this section, we provide a selective overview of research directed at this issue after the 1978 paper until 2007, when a spate of new research was published. Balota, Duchek, and Logan (2007) have provided a much more thorough review of work during this period, which should be consulted for additional detail. A few researchers did examine expanding retrieval sequences as a mode of learning. Rea and Modigliani (1985) tested third-grade school children as they learned multiplication facts and spelling words. However, their control condition was massed testing: four tests with no other items between tests (0-0-0-0, using the notation above). Rea and Modigliani showed that an expanded retrieval sequence (0-1-2-4) was more effective than massed retrieval, but they did not have the critical equally spaced condition, and so total spacing was confounded with condition. Other researchers also compared expanding schedules of retrieval to various other conditions, again usually massed testing or sometimes expanding study (rather than testing) schedules. They


generally concluded that the expanding testing schedule was better either in neurologically impaired patients (e.g., Camp & McKitrick, 1992) or in healthy adults learning names (e.g., Morris & Fritz, 2002) than were massed schedules or multiple presentations without testing. However, the critical equal interval testing condition was not included. In the first study since Landauer and Bjork’s original one comparing expanding retrieval to equally spaced retrieval, Shaughnessy and Zechmeister (1992) were able to replicate Landauer and Bjork and showed a small positive effect of expanding retrieval over equally spaced retrieval on a test given soon after acquisition. However, a few years later Cull, Shaughnessy, and Zechmeister (1996) obtained quite mixed evidence across a series of five experiments. The results were puzzling, so Cull (2000) followed up this work with a more dedicated effort. Without going into the details of all the experiments (see Balota et al., 2007), suffice it to say that Cull found no evidence that expanding retrieval schedules provided any benefit to recall relative to equal interval schedules (although both led to better performance than did massed testing schedules). Carpenter and DeLosh (2005) also showed no superiority of expanding to equal interval training. In fact, the trend (during both acquisition and retention phases) was for the equal interval condition to be superior. Balota, Duchek, Sergent-Marshall, and Roediger (2006) mounted a study with large numbers of young adults, healthy older adults, and other older adults with early-stage Alzheimer’s disease. Because the subjects had widely different memory abilities, Balota et al. began all subjects with two massed tests of paired associates to ensure subjects had encoded the material well before implementing further massed, spaced, or equal interval schedules. 
Thus, all subjects received five tests after a single presentation with the following schedules: 0-0-0-0-0, 0-0-3-3-3, or 0-0-1-3-5. During acquisition, massed testing produced essentially perfect performance in all subject groups, whereas the expanding condition led to greater performance on the last test than did the equal interval condition. Because expanding retrieval led to better performance during learning, one might expect this benefit to carry forward to the final criterial test at the end of the session. However, this did not happen. Despite the fact that spaced retrieval produced much greater final recall than did massed retrieval for all three groups of subjects, expanding retrieval was not better than equal interval retrieval in any of the groups. Thus, once again, no evidence was found supporting Landauer and Bjork’s hypothesis that expanding retrieval could “shape” later recall.


Two other points are worth making about the Balota et al. (2006) results. First, in the massed condition, subjects were tested on and successfully recalled all items. Thus, there were five successful retrievals under conditions that fostered errorless retrieval, thought on some accounts to be optimal for later performance (because subjects never make an error or draw a blank). However, this massed condition produced the worst performance on the final test, probably because the retrievals were effortless and shallow (Bjork, 1975). The second point is more subtle: Recall that at the end of learning, the expanding retrieval condition produced higher performance than the equal interval condition, yet on the delayed test, the two conditions were equivalent. What this pattern must indicate is that forgetting occurs more rapidly after expanding retrieval than after equal interval retrieval. In fact, this same pattern occurred in Landauer and Bjork’s (1978) original study. In almost all the experiments discussed thus far, the final criterial test was at the end of one experimental session (but see Cull, 2000). The pattern of differential forgetting between conditions suggests that, with much longer retention intervals, there may be a reversal—retention may actually be better following equal interval retrieval practice relative to expanding retrieval practice. We consider designs with such delays in the next section.

The Mystery of Expanding Retrieval Practice and Its Vicissitudes: A Partial Solution

At this point in the chapter, the reader is rightfully confused. Landauer and Bjork (1978) found that expanding retrieval is superior to massed or equal interval retrieval, and their finding accords well with other ideas in the learning and memory literature, such as shaping and errorless retrieval. Although their conclusion about expanding retrieval was accepted for many years (and all studies show that it is superior to massed retrievals), evidence since the mid-1990s paints a mixed picture. Why? We attempt to answer that question in this section by relying on two related concepts championed by Bob Bjork. Recently Bjork (1999) has advocated an important and counterintuitive idea about the relation between initial learning performance and long-term retention. There are many instances where the rate and level of initial learning is very good relative to some other condition, yet these seemingly beneficial conditions ultimately produce


poor long-term retention as assessed on delayed tests (again, relative to a companion condition in which learning was slower). Stated another way, conditions that make initial learning slower and more difficult might produce worse initial learning performance but lead to gains in long-term retention. Bjork has called this the idea of creating “desirable difficulties” to promote learning, and he has gathered a variety of evidence supporting this concept (see Bjork, 1999; Schmidt & Bjork, 1992). Some difficulty that makes initial learning slower and more effortful can make long-term retention better. An example of desirable difficulties relevant to this chapter is the spacing effect: When repeated presentations are massed together, they often produce better performance on an immediate test (one soon after the second presentation) than does spacing the presentations (Peterson, Wampler, Kirkpatrick, & Saltzman, 1963). However, as is well known, spaced repetition produces better retention on delayed criterial tests than does massed practice (see Figure 2.1). This spacing × retention interval interaction for studied materials is both replicable and important (see Balota et al., 2007; Balota, Duchek, & Paullin, 1989). The same pattern occurs if we consider spaced retrieval practice. Performance is essentially perfect on massed repeated tests (e.g., with a 0-0-0 schedule) and will be better than performance on equally spaced tests because forgetting will have occurred before the first retrieval attempt (e.g., with a 5-5-5 schedule). Yet invariably the spaced retrieval conditions produce better performance on delayed retention tests than does massed retrieval. In short, Bjork’s key point from the concept of desirable difficulties is that performance during initial learning is not necessarily diagnostic of long-term retention. 
This fact has profound implications for education and other training scenarios, because instructors often use initial learning performance as the metric by which they evaluate the effectiveness of learning and training activities. They rarely test performance long after the learning episode to determine what is retained. Returning to the focus of this chapter—schedules of retrieval practice—an expanding retrieval condition is bound to perform better during the initial learning phase than an equally spaced condition. That is, subjects are likely to recall more items in an expanding condition than in an equally spaced condition because the first retrieval attempt occurs soon after study in the expanding condition. In most experiments on spaced retrieval, subjects are not given feedback after each test, but there is also very little (if any) forgetting across tests after the first one. Therefore, the position of the first test determines the level of performance on subsequent tests. If 80% of items are recalled on test 1, then approximately 80% will be recalled on repeated tests. If 60% are recalled


on test 1, then about 60% will be recalled on repeated tests. And so on. This fact is independent of the schedule of repeated tests and is apparent in Landauer and Bjork’s (1978) data (Figure 3 in their paper) and in other experiments, too. The difference in level of performance across conditions is entirely due to the position of the first test. Yet the surprising finding is that the forgetting rate seems faster in the expanding than in the equally spaced condition. This is indicated in studies where there are large advantages of expanding relative to equally spaced conditions during initial learning, but no differences between the conditions on retention tests given at the end of the experimental session (e.g., Balota et al., 2006). Again, the same pattern can also be seen in Landauer and Bjork’s data (their Figure 3). Does the concept of desirable difficulties help explain the puzzling effects of retrieval practice schedules? That is, does expanding retrieval promote good performance during initial learning (greater retrieval success than equally spaced schedules) but result in relatively poor long-term retention? A number of recent experiments have addressed this question and suggested that the answer is yes. We carried out a series of experiments in which subjects learned difficult vocabulary words under a variety of spaced retrieval conditions (Karpicke & Roediger, 2007b). We examined massed (0-0-0), expanding (1-5-9), and equally spaced (5-5-5) conditions, and we also included two conditions in which subjects took just a single test during initial learning: The single test occurred either after a lag of one trial or after a lag of five trials. The latter two conditions are conceptually similar to those used by Whitten and Bjork (1977) and others (e.g., Jacoby, 1978). 
The critical aspect of the experiment was that we manipulated the retention interval that occurred between the initial learning phase and the final criterial test: Half of the subjects took the final test at the end of the experimental session (about 10 minutes after the initial learning phase) and half took the final test 2 days later. Figure 2.4 shows the proportion of word pairs recalled on the final tests in each spacing condition at the two different retention intervals. First, it is worth pointing out that at both retention intervals the spaced retrieval conditions (expanding and equal interval) led to better recall than did massed retrieval. The left panel of Figure 2.4 provides recall on the final test that occurred shortly after learning, and the data show an advantage of expanding retrieval relative to equal spacing. This outcome replicates Landauer and Bjork’s (1978) original finding and is due to greater retrieval success during the learning phase, because the expanding condition recalled more items initially than the equally spaced condition. However, two days after learning the pattern had


[Figure 2.4 plot. Panels: Immediate, 2-Day Delay. Y-axis: Proportion Recalled. X-axis: Spacing Condition (1, 5, 0-0-0, 1-5-9, 5-5-5).]

Figure 2.4 Final recall as a function of various schedules of retrieval practice. The left panel shows final recall 10 minutes after the learning phase, and the right panel shows final recall 2 days after the learning phase. Expanding retrieval (1-5-9) produced a short-term benefit relative to equally spaced retrieval (5-5-5), but equally spaced retrieval produced better long-term retention than expanding retrieval. (Data adapted from Karpicke, J. D., & Roediger, H. L., Journal of Experimental Psychology: Learning, Memory, and Cognition, 33, 704–719, 2007b, Experiment 1.)

reversed: Now the equally spaced condition produced better long-term retention than expanding retrieval. Note that a similar interaction occurred when considering just the single-test conditions: A single test after a short delay during acquisition (one intervening item) produced better recall than a single test after a somewhat longer delay (five intervening items) both during acquisition and on the immediate test, but on the test given two days later, the single, more effortful initial test (the one after five intervening items) led to better retention than the easier initial test (the one given after one item). Another feature of the data in Figure 2.4 documents the fact that giving several tests under conditions that are too easy undermines the positive effects of testing. In the 0-0-0 condition subjects were required to recall items three times under conditions in which they were essentially always correct. However, these three (easy) retrievals led to later retention that was even worse than a single test given under more difficult conditions (the 5 conditions at both delays). Logan and Balota (2008) also recently conducted an experiment examining the effects of expanding and equally spaced retrieval schedules at short and long retention intervals. They tested both younger and older


adults and examined several different spacing schedules. The subjects in their experiments learned weakly associated word pairs under different schedules and took a final test either at the end of the experimental session (immediate) or one day later. The results are shown in Figure 2.5. Overall, Logan and Balota did not find a consistent advantage of expanding retrieval over equally spaced retrieval in either subject group at either retention interval. In fact, they found that equally spaced retrieval was often better than expanding retrieval on the delayed final test. The Karpicke and Roediger (2007b) and Logan and Balota (2008) results might seem strange given the belief that expanding retrieval is supposed to improve long-term retention. But the findings are consistent

[Figure 2.5 plot. Panels: 1-2-3 vs. 2-2-2, 1-3-5 vs. 3-3-3, 1-3-8 vs. 4-4-4. Each panel shows Expanding and Equal schedules for Younger and Older adults on Immediate and 1-Day Delay tests. Y-axis: Proportion Recalled.]

Figure 2.5 Final recall after expanding or equally spaced retrieval practice on immediate or one-day delayed tests. The figure shows results for both younger and older adults. The three panels show performance for different expanding and equally spaced schedules that are matched in total spacing. No advantage of expanding retrieval was evident, and equally spaced retrieval often produced better final recall than expanding retrieval on the one-day delayed test. (Data adapted from Logan, J. M., & Balota, D. A. Aging, Neuropsychology, and Cognition, 15, 257–280, 2008.)


with Bjork’s concept of learning tasks that produce desirable difficulties. The desirable condition, however, is the equally spaced retrieval schedule, not expanding retrieval. This pattern of results must force us to reconsider the theory about why expanding retrieval ought to work. The standard theory of expanding retrieval practice is that the schedule combines the positive features of retrieval success and retrieval difficulty. Of course, difficult retrieval is important, but unless subjects are given feedback (and they are not in most spaced retrieval studies), retrieval practice can only promote learning when a person is able to successfully recover the desired item. Therefore, expanding retrieval is thought to work in part because the early first retrieval promotes retrieval success and, as noted above, this determines the level of performance on repeated tests. Retrieval difficulty comes into play because it is assumed that gradually increasing the spacing of repeated tests should increase retrieval difficulty on the tests. However, Karpicke and Roediger (2007b) and Logan and Balota (2008) examined response times on tests during initial learning and showed that retrieval grew increasingly faster across repeated tests. This does not accord with the idea that retrieval grew increasingly difficult across tests regardless of the schedule of repeated tests. The alternative hypothesis we have proposed is that the position of the first test is the important difficulty for improving long-term retention, not the schedule of repeated tests (see Karpicke & Roediger, 2007b). In expanding retrieval conditions, the first retrieval attempt often occurs almost immediately after studying the item (lags of zero or one trial). This retrieval attempt might not be effective because retrieval occurs while items still reside in immediate memory. 
Therefore, equally spaced retrieval practice might enhance retention because that schedule involves a delayed first test (e.g., a lag of five trials between study and a first test). The crux of the problem in virtually all comparisons of expanding and equal interval retrieval is that the position of the first retrieval attempt is confounded with the schedule of repeated tests. Expanding retrieval conditions involve an immediate first test (e.g., 1-5-9), and equally spaced conditions involve a delayed first test (e.g., 5-5-5). We conducted an experiment that eliminated this confound (Karpicke & Roediger, 2007b, Experiment 3; see too Carpenter & DeLosh, 2005). Two conditions involved an immediate first test (after a lag of zero trials), and two involved a delayed first test (after a lag of five trials). Then the repeated tests were either expanding (1-5-9) or equal (5-5-5). The results are shown in Figure 2.6. When we controlled for the position of the first test, the advantage of expanding retrieval practice disappeared on an immediate final test (cf. Carpenter & DeLosh, 2005) and there was no difference as


a function of placement of the first test (0 or 5). However, on the test two days later, an overall advantage of the two conditions with a delayed first test (5-1-5-9 and 5-5-5-5) appeared (relative to the conditions in which the first test was immediate). Thus, in delayed recall, the effect of position of the first test mattered, but the schedule of repeated tests (expanding or equally spaced) did not have any effect. This result falls perfectly in line with the results of Whitten and Bjork (1977) and accords with Bjork’s (1975) notion that difficult retrieval is critical for promoting learning, but once again, it does not support the idea that expanding retrieval is the best schedule of retrieval practice for long-term retention. We end this section by describing an experiment that explored the effects of different schedules of retrieval on learning educational texts. Landauer and Bjork’s (1978) original study was focused on a rather specific applied scenario: learning faces and names when it is inappropriate or impossible to receive feedback after an initial presentation. The idea of expanding retrieval practice emerged from this study, and subsequently the argument was made that expanding retrieval was a general technique that could be applied broadly. The data reviewed here

[Figure 2.6 plot. Panels: Immediate, 2-Day Delay. Y-axis: Proportion Recalled. X-axis: Spacing Condition (0-1-5-9, 0-5-5-5, 5-1-5-9, 5-5-5-5).]

Figure 2.6 Final recall as a function of schedule of retrieval practice. The left panel shows final recall 10 minutes after the learning phase, and the right panel shows final recall 2 days after the learning phase. The four retrieval schedules factorially crossed the position of the first test (lags of 0 or 5) with the schedule of repeated tests (1-5-9 or 5-5-5). There was no effect of schedule on immediate final tests, but there was a main effect of delaying the first test on the delayed final tests. (Data adapted from Karpicke, J. D., & Roediger, H. L., Journal of Experimental Psychology: Learning, Memory, and Cognition, 33, 704–719, 2007b, Experiment 3.)


suggest that expanding retrieval might not represent the best retrieval schedule for promoting long-term retention, but as of yet there have been few tests of the idea that expanding retrieval might apply broadly to materials and contexts that are more educationally relevant than those used in paired-associate learning tasks. Perhaps when taken out of the context of paired-associate learning, an advantage of expanding retrieval would become apparent. To address this question, we examined free recall of brief expository texts (Karpicke & Roediger, 2010). Subjects read brief texts and recalled them on free recall tests spaced according to different schedules. In both experiments we factorially crossed the position of the first test (immediate or delayed) and the spacing of repeated tests (expanding or equal interval). We examined the effects of the different retrieval practice schedules on a final criterial test one week after learning. Figure 2.7 shows several important results. First, there is a testing effect: taking a single test after reading a text enhanced long-term retention more than reading the text and not testing. Second, repeated testing (in the spaced retrieval conditions) enhanced retention more than taking a single test. Third, testing with feedback (restudying the passages) produced better retention than testing without feedback. However, and most importantly for our purposes, there were no differences between expanding and equally spaced schedules of retrieval practice. In sum, the body of evidence indicating that expanding retrieval practice is not beneficial (relative to equal interval practice) is growing. If anything, equal interval schedules seem to produce better retention on delayed tests, probably because the initial test is rendered more difficult when it does not occur immediately after study, as is the case in expanding schedules of retrieval. 
The difficulty of the initial retrieval seems to hold the key to performance in experiments of this kind. The subsequent schedule of retrieval practice seems to have little effect under conditions examined thus far.
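The factorial designs described above, in which the position of the first test is crossed with the schedule of repeated tests, can be enumerated with a brief sketch. The helper name is hypothetical. The lag values are the ones given in the condition labels for the paired-associate experiment (0-1-5-9 through 5-5-5-5) and the text experiment (0-1-2-3 through 2-2-2-2).

```python
from itertools import product
from typing import Iterable, List, Sequence


def crossed_schedules(
    first_lags: Iterable[int],
    repeat_schedules: Iterable[Sequence[int]],
) -> List[str]:
    """Cross each first-test lag with each schedule of repeated tests.

    Returns condition labels such as '5-1-5-9', where the first number
    is the lag before the first test and the rest are the later lags.
    """
    return [
        "-".join(str(lag) for lag in (first, *rest))
        for first, rest in product(first_lags, repeat_schedules)
    ]


if __name__ == "__main__":
    # Paired-associate design: first test at lag 0 or 5.
    print(crossed_schedules([0, 5], [(1, 5, 9), (5, 5, 5)]))
    # Text-recall design: first test at lag 0 or 2.
    print(crossed_schedules([0, 2], [(1, 2, 3), (2, 2, 2)]))
```

Laying the conditions out this way makes the logic of the design visible: because every repeated-test schedule appears with both an immediate and a delayed first test, the two factors are no longer confounded.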

Practical Advice

What advice might we give students about how to apply the research on testing reviewed in this chapter? We think the answer is straightforward: Students should determine the knowledge they want to retain, create a testing mechanism with feedback, and test themselves until they can retrieve the information on a much-delayed test (say, two days since original study). The testing should not be done under massed or even closely spaced fashion; if the literature is clear on any point, it is that repeated testing under conditions in which retrieval is easy leads to


[Figure 2.7 plot. Panels: No Feedback, Feedback. Y-axis: Proportion Recalled. X-axis: Spacing Condition (Study, Single Test, 0-1-2-3, 0-2-2-2, 2-1-2-3, 2-2-2-2).]

Figure 2.7 Final recall of expository texts as a function of initial retrieval practice schedule. The top panel shows performance without initial feedback, and the bottom panel shows performance with feedback (students reread the texts after each recall test). Taking a single test enhanced retention relative to reading once, and repeated testing produced even greater effects on retention. Feedback also enhanced long-term retention. However, the schedule of retrieval did not matter. (Data adapted from Karpicke, J. D., & Roediger, H. L., Memory & Cognition, 38, 116–124, 2010, Experiment 2.)


poor long-term retention. (So much for the principle of errorless retrieval being a good way to study.) But what about the mechanism for spacing of retrieval? Our data reviewed above suggest that the critical ingredient is encouraging fairly difficult retrieval, especially on an initial test. Beyond that point, it probably does not matter whether students test themselves using expanding or equal interval conditions. What matters is repeated spaced retrieval (with feedback if an error is made). Let us consider a practical example. A fifth-grade student needs to learn the capitals of the 50 states. She creates flash cards for each state with, for example, Montana on one side and Helena on the other. The 50 flashcards would first be studied one at a time, perhaps employing some mnemonic (my aunt Helen was from Montana). After this initial study, the cards are shuffled and then ten minutes later the student gives herself a test, looking at the name of each state and trying to remember the capital. Whether or not she produces a name, she turns the card over to study the reverse side (see Butler, Karpicke, & Roediger, 2008). Any items missed are put at the end of the deck for further practice in the same session. She records the number correct on the first pass through and then returns to test herself again on the ones she missed, again with feedback. After this phase, the student puts the cards away and studies other material. Then, hours later, she returns to the cards and tests herself in the same way. This process would be repeated the next day and then sporadically thereafter, as needed. Each time the deck would be shuffled anew. With spacing between retrievals spread over days, the whole issue of schedule of individual state–capital pairs within a session would not need to be much considered. Of course, the spacing of entire testing/relearning sessions would then be of interest. 
One critical point about the foregoing advice: students should not trust their own intuitions about what they know and quit testing themselves too soon. Just because Helena can be retrieved a time or two does not mean that it is in a “learned” state. Students need to practice retrieval even of learned information (Karpicke & Roediger, 2008). The technique just described can be applied to nearly any sort of factual material—scientific concepts, the critical points of important journal articles, the presidents of the United States and their main accomplishments and events while they were in office, and so on. The title of one of our articles is “Repeated Retrieval During Learning Is the Key to Long-Term Retention” (Karpicke & Roediger, 2007a), and we believe more firmly than ever that this is the case.

Conclusion

We began the chapter by noting how Bob and Elizabeth Bjork’s work had, over the years, pointed to several apparent paradoxes (or at least nonintuitive findings). We explored several paradoxes and applied their (and our) analyses to the issue of the best way of practicing retrieval over relatively short intervals, such that testing can be used to best advantage. All studies show that repeated massed retrieval is poor, despite its errorless nature. Bjork (1975) argued this was true based on data then available. However, the mystery of whether expanding or equal interval retrieval leads to better long-term retention turns out to rest on a similar consideration. When retention is measured at a healthy delay (say two days or one week after learning), delayed recall is better following equal interval practice because (in the usual design) the first retrieval in the equal interval design occurred under more difficult retrieval conditions. Thus, equal interval retrieval turns out to exemplify the Bjorkian principle of a desirable difficulty: although initial recall is poorer with equal interval schedules relative to expanding schedules, long-term retention is better.

Our results provide a resolution of claims in the literature: Landauer and Bjork’s (1978) results can be replicated at short retention intervals (when testing occurs in the same experimental session as acquisition). However, after longer retention intervals (two days or a week in our experiments), the situation reverses: Equal interval schedules of retrieval practice in an initial learning session produce better retention than expanding schedules of retrieval practice. We suggested in the preceding section on practical applications that, so long as one uses sessions of spaced retrieval practice with feedback, the question of expanding or equal interval schedules within a session may well be moot. Spaced retrieval practice (with feedback) is the key to long-term retention.
Even though Landauer and Bjork’s (1978) important claim about expanding retrieval turns out 30 years later to be limited (or even wrong), the reasons for this state of affairs are accounted for by Bjork’s other research and theorizing (Bjork, 1975; 1994). Even when Bob Bjork seems to be wrong in one arena, he turns out to have been right all along.

References

Abbott, E. E. (1909). On the analysis of the factors of recall in the learning process. Psychological Monographs, 11, 159–177.

Anderson, M. C., Bjork, E. L., & Bjork, R. A. (1994). Remembering can cause forgetting: Retrieval dynamics in long-term memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 1063–1087.
Balota, D. A., Duchek, J. M., & Logan, J. M. (2007). Is expanded retrieval practice a superior form of spaced retrieval? A critical review of the extant literature. In J. S. Nairne (Ed.), The foundations of remembering: Essays in honor of Henry L. Roediger, III (pp. 83–105). New York: Psychology Press.
Balota, D. A., Duchek, J. M., & Paullin, R. (1989). Age-related differences in the impact of spacing, lag, and retention interval. Psychology and Aging, 4, 3–9.
Balota, D. A., Duchek, J. M., Sergent-Marshall, S. D., & Roediger, H. L. (2006). Does expanded retrieval produce benefits over equal interval spacing? Explorations in healthy aging and early stage Alzheimer’s disease. Psychology and Aging, 21, 19–31.
Bjork, R. A. (1970). Positive forgetting: The noninterference of items intentionally forgotten. Journal of Verbal Learning and Verbal Behavior, 9, 255–268.
Bjork, R. A. (1975). Retrieval as a memory modifier: An interpretation of negative recency and related phenomena. In R. L. Solso (Ed.), Information processing and cognition: The Loyola symposium (pp. 123–144). Hillsdale, NJ: Erlbaum.
Bjork, R. A. (1994). Memory and metamemory considerations in the training of human beings. In J. Metcalfe & A. Shimamura (Eds.), Metacognition: Knowing about knowing (pp. 185–205). Cambridge, MA: MIT Press.
Bjork, R. A. (1999). Assessing our own competence: Heuristics and illusions. In D. Gopher & A. Koriat (Eds.), Attention and performance XVII. Cognitive regulation of performance: Interaction of theory and application (pp. 435–459). Cambridge, MA: MIT Press.
Bjork, R. A., & Allen, T. W. (1970). The spacing effect: Consolidation or differential encoding? Journal of Verbal Learning and Verbal Behavior, 9, 567–572.
Bjork, R. A., LaBerge, D., & LeGrande, R. (1968). The modification of short-term memory through instructions to forget. Psychonomic Science, 10, 55–56.
Butler, A. C., Karpicke, J. D., & Roediger, H. L. (2008). Correcting a metacognitive error: Feedback increases retention of low confidence correct responses. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34, 918–928.
Camp, C. J., & McKitrick, L. A. (1992). Memory interventions in Alzheimer’s-type dementia populations: Methodological and theoretical issues. In R. L. West & J. D. Sinnott (Eds.), Everyday memory and aging: Current research and methodology (pp. 152–172). New York: Springer-Verlag.
Carpenter, S. K., & DeLosh, E. L. (2005). Application of the testing and spacing effects to name learning. Applied Cognitive Psychology, 19, 619–636.

Craik, F. I. M., & Lockhart, R. (1972). Levels of processing: A framework for memory research. Journal of Verbal Learning and Verbal Behavior, 11, 671–684.
Crowder, R. G. (1976). Principles of learning and memory. Hillsdale, NJ: Erlbaum.
Cull, W. L. (2000). Untangling the benefits of multiple study opportunities and repeated testing for cued recall. Applied Cognitive Psychology, 14, 215–235.
Cull, W. L., Shaughnessy, J. J., & Zechmeister, E. B. (1996). Expanding understanding of the expanding-pattern-of-retrieval mnemonic: Toward confidence in applicability. Journal of Experimental Psychology: Applied, 2, 365–378.
Gardiner, J. M., Craik, F. I. M., & Bleasdale, F. A. (1973). Retrieval difficulty and subsequent recall. Memory & Cognition, 1, 213–216.
Glenberg, A. M. (1976). Monotonic and nonmonotonic lag effects in paired-associate and recognition memory paradigms. Journal of Verbal Learning and Verbal Behavior, 15, 1–16.
Golding, J. M., & MacLeod, C. M. (Eds.). (1998). Intentional forgetting: Interdisciplinary approaches. Mahwah, NJ: Lawrence Erlbaum.
Jacoby, L. L. (1978). On interpreting the effects of repetition: Solving a problem versus remembering a solution. Journal of Verbal Learning and Verbal Behavior, 17, 649–667.
Jacoby, L. L., & Bartz, W. H. (1972). Rehearsal and transfer to LTM. Journal of Verbal Learning and Verbal Behavior, 11, 561–565.
Karpicke, J. D. (2009). Metacognitive control and strategy selection: Deciding to practice retrieval during learning. Journal of Experimental Psychology: General, 138, 469–486.
Karpicke, J. D., & Roediger, H. L. (2007a). Repeated retrieval during learning is the key to long-term retention. Journal of Memory and Language, 57, 151–162.
Karpicke, J. D., & Roediger, H. L. (2007b). Expanding retrieval practice promotes short-term retention, but equally spaced retrieval enhances long-term retention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33, 704–719.
Karpicke, J. D., & Roediger, H. L. (2008). The critical importance of retrieval for learning. Science, 319, 966–968.
Karpicke, J. D., & Roediger, H. L. (2010). Is expanding retrieval a superior method for learning text materials? Memory & Cognition, 38, 116–124.
Landauer, T. K., & Bjork, R. A. (1978). Optimum rehearsal patterns and name learning. In M. Gruneberg, P. E. Morris, & R. N. Sykes (Eds.), Practical aspects of memory (pp. 625–632). London: Academic Press.
Logan, J. M., & Balota, D. A. (2008). Expanded vs. equal interval spaced retrieval practice: Exploring different schedules of spacing and retention interval in younger and older adults. Aging, Neuropsychology, and Cognition, 15, 257–280.
Logan, J. M., Roediger, H. L., & McDermott, K. B. (2009). Using spaced retrieval practice to learn foreign language vocabulary: How does activity during the interval affect learning? Manuscript in preparation.

Madigan, S. A. (1969). Intraserial repetition and coding processes in free recall. Journal of Verbal Learning and Verbal Behavior, 8, 828–835.
Melton, A. W. (1970). The situation with respect to the spacing of repetitions and memory. Journal of Verbal Learning and Verbal Behavior, 9, 596–606.
Morris, C. D., Bransford, J. D., & Franks, J. J. (1977). Levels of processing versus transfer appropriate processing. Journal of Verbal Learning and Verbal Behavior, 16, 519–533.
Morris, P. E., & Fritz, C. O. (2002). The improved name game: Better use of expanding retrieval practice. Memory, 10, 259–266.
Peterson, L. R., Wampler, R., Kirkpatrick, M., & Saltzman, D. (1963). Effect of spacing presentations on retention of a paired associate over short intervals. Journal of Experimental Psychology, 66, 206–209.
Rea, C. P., & Modigliani, V. (1985). The effect of expanded versus massed practice on the retention of multiplication facts and spelling lists. Human Learning, 4, 11–18.
Robbins, D., & Wise, P. S. (1972). Encoding variability and imagery: Evidence for a spacing-type effect without spacing. Journal of Experimental Psychology, 95, 229–230.
Roediger, H. L., & Crowder, R. G. (1975). Spacing of lists in free recall. Journal of Verbal Learning and Verbal Behavior, 14, 590–602.
Roediger, H. L., & Karpicke, J. D. (2006a). The power of testing memory: Basic research and implications for educational practice. Perspectives on Psychological Science, 1, 181–210.
Roediger, H. L., & Karpicke, J. D. (2006b). Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science, 17, 249–255.
Schmidt, R. A., & Bjork, R. A. (1992). New conceptualizations of practice: Common principles in three paradigms suggest new concepts for training. Psychological Science, 3, 207–217.
Shaughnessy, J. J., & Zechmeister, E. B. (1992). Memory-monitoring accuracy as influenced by the distribution of retrieval practice. Bulletin of the Psychonomic Society, 30, 125–128.
Skinner, B. F. (1953). Science and human behavior. Oxford, England: Macmillan.
Storm, B. C., Bjork, E. L., & Bjork, R. A. (2008). Accelerated relearning after retrieval-induced forgetting: The benefit of being forgotten. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34, 230–236.
Tulving, E., & Thomson, D. M. (1973). Encoding specificity and retrieval processes in episodic memory. Psychological Review, 80, 352–373.
Tzeng, O. J. (1973). Stimulus meaningfulness, encoding variability, and the spacing effect. Journal of Experimental Psychology, 99, 162–166.
Whitten, W. B., & Bjork, R. A. (1977). Learning from tests: Effects of spacing. Journal of Verbal Learning and Verbal Behavior, 16, 465–478.

3

Distributed Learning and the Size of Memory: A 50-Year Spacing Odyssey

Thomas K. Landauer

About Spaced Practice

For about a hundred years after Ebbinghaus discovered that he could remember a list of nonsense syllables better if he took a break once in a while, the kinds of things learned and the temporal and sequential units whose spacing was explored were quite limited. The items were primarily words or nonsense syllables, and the units that were spaced were whole groups of items. So, for example, a given nonsense syllable never followed itself very soon, at least in a way that was planned or analyzed so as to reveal the almost null effects of immediate repetition. Then, about 50 years ago, an important discovery took place, apparently independently in several labs. I first stumbled on the phenomenon in 1959 when I was analyzing data from a paired-associate experiment that had two presentations of the same items randomly placed in 50-item lists. The analysis found unusual variability between subjects on the same items. A closer look revealed that widely separated repetitions had twice as many correct responses. After replicating the result, I began teaching the trick to Harvard graduate students in education. I didn’t publish it, a lifelong failing you will see embarrassingly often in this somewhat autobiographical chapter. However, a few years later,

then at Dartmouth, I undertook a more systematic exploration of the phenomenon. Laurie Snell, a world-famous number theorist, volunteered to produce lists in which spacing was carefully controlled, reducing some of the confounds later worried about by, for example, Kahana and Howard (2005). But doing this myself was a nightmare beyond my patience and computational means. So I tabled the effort in favor of simpler models like rats and hamsters. Then, in 1967 at Stanford, where I had psychology grad students and big computers to do the work and 400 intro students to try things out, I returned to the problem. I was pleased to find that Bob Bjork, Jim Greeno, Doug Hintzman, and Chizuko Izawa, all at Stanford at more or less the same time, were also studying repetition of items within rather than between lists. In a nutshell, we all found the same thing. Immediate repetition (the darling of commonsense memorization, nagging parents, and annoying television ads) was usually not significantly better for long-term retention than a single presentation, and much longer intervals could produce twice as much learning.
As an example of the flurry and range of studies that followed, here is a sample of papers that just Bob and I published between 1966 and 1977:

Bob (Bjork, 1966): “Learning and Short-Term Retention of Paired Associates in Relation to Specific Sequences of Interpresentation Intervals”
Tom (Landauer & Eldridge, 1967): “Effect of Tests Without Feedback and Presentation-Test Interval in Paired-Associate Learning”
Tom (Landauer, 1967): “Interval Between Item Repetitions and Free Recall Memory”
Tom (Landauer, 1969): “Reinforcement as Consolidation”
Bob (Bjork & Allen, 1970): “The Spacing Effect: Consolidation or Differential Encoding?”
Bob (Bjork, 1970a): “Repetition and Rehearsal Mechanisms in Models of Short-Term Memory”
Tom (Landauer, 1975): “Memory Without Organization: Properties of a Model With Random Storage and Undirected Retrieval”
Tom (Landauer & Ainslie, 1975): “Exams and Use as Preservatives of Course-Acquired Knowledge”
Bob (Whitten & Bjork, 1977): “Learning From Tests: Effects of Spacing”
Tom (Landauer & Ross, 1977): “Can Simple Instructions to Use Spaced Practice Improve Ability to Remember a Fact? An Experimental Test Using Telephone Numbers”¹

Then in 1977–1978 at Bell Labs, Bob and I collaborated on studies of spacing of test items without feedback versus with correctness feedback (Landauer & Bjork, 1978). Materials were names to go with faces, and first names to go with last names. The experiments were done as large class demonstrations involving hundreds of students flipping, studying, and marking punch cards at six-second intervals. The students were tested again at hour’s end by raised hands to determine whether spacing helped, which, of course, it did dramatically. We also analyzed the data more accurately with a card sorter and found that repetition spacing had worked as effectively as usual. The data further showed that tests without feedback benefited learning, and that the spacing of the tests mattered too.

Bob’s approach and mine have tended to be philosophically somewhat different. I have always hoped to find an underlying mechanism, either at the neural level or in an abstract mathematical form, that yields quantitative predictions. Bob’s work, although rigorously done, has more often been at the phenomenological level, describing what learners and contexts do to produce spacing effects. The variables in his theories have been such constructs as rehearsal, depth of processing, or explicit or implicit associations during learning activities. However, we shared then, and share now, a desire to put the spacing effect to work for people. My take on this follows Donald Stokes’s in Pasteur’s Quadrant (1997). Stokes maintains that useful knowledge rarely springs from pure research. Rather, it starts with a practical problem, turns to research to solve it, and then to applications to test its validity. The way to be sure that you haven’t omitted or committed something important is to make it do important things in its natural environments.
I believe this is especially important in psychology because humans can too easily be asked to do things in labs that they never do elsewhere, raising doubt about the legitimacy of generalization.

The remainder of this chapter has nine interlocking but largely independent parts. It primarily reviews the winding trails of my research and theorizing, leading to a new hypothesis described at the end. It contains

1. A reprise of an old mathematical model of memory called memory without organization, a random storage model that accounted surprisingly well for spacing of practice effects and a number of other learning phenomena
2. A consistent hypothesis and model of the neural mechanism involved
3. A conundrum, noticed later, that this model and most other theories of spacing incorrectly predicted that two unrelated items should show a spacing effect if measured by the probability that at least one is remembered
4. A reprise of an old mathematical analysis that estimated that the lifetime accumulation of learned information in human memory is about 10⁹ bits
5. A reprise of a similar theory by Robert Hooke in 1681 that implies the same conundrum and the same 10⁹ bits of accumulated memory
6. A new mathematical model that estimates the total amount of memory storage space to be perhaps as much as 10¹³ bits, some 10,000 times the amount used in a normal lifetime
7. An implication that in such a sparsely occupied memory, two different items could easily fail to show a spacing effect
8. Some conjectures on other memorial phenomena that such a huge and sparse space might imply
9. A summary and coda including a speculative hypothesis about how evolution of the human brain arose from spacing and random storage

I have tried to make it possible to read these sections by themselves. While the whole story involves multiple dependencies, most of its sections can largely stand alone, and none of the conclusions depends on all the others. The various parts have different degrees of empirical support, ranging from quite firm to quite conjectural. Thus, the final result could turn out to be a house of cards. However, my hope is that readers who persist will find at least a thought-provoking whole. In some places, space limitations urge consultation with cited sources for fuller understanding.

One more caveat: The literature relevant to this essay is immense, and your time and my pages are limited. I am acutely aware that work on practice spacing has produced a very rich and broad set of phenomena, facts, models, attractive hypotheses, and outstanding puzzles. Indeed, it seems more than likely that spacing effects occur differently for different reasons in different situations and with different consequences. What is proffered here is not a survey but a recounting of one interrelated set of the phenomenon’s manifestations and a particular related set of hypotheses about its mechanism. Newest is a radical hypothesis that the unique power of human memory arises from vastly greater unfilled storage space than previously imagined.

Much of the chapter involves matters that Bob Bjork has been involved in, including some that we worked on jointly and some about which we traded disagreements. The joint work was about the effect

of spacing of repetitions with and without correctness feedback. The argument was about how to account for both the two greater than one conundrum and the standard spacing effect itself. In my interpretation, Bob has explained spacing phenomena as results of perception, thought, and action at their phenomenological levels, such as mental and physical effort, encoding variability or context specificity, whereas I have tried to explain them with neural and mathematical models of possible mechanisms. There is little disagreement on practical consequences. And I believe that resolution of the main outstanding difference, how to explain the two–one conundrum, is among the consequences of the new hypothesis with which the chapter ends.

Memory Without Organization

In a paper titled “Memory Without Organization: Properties of a Model With Random Storage and Undirected Retrieval” (Landauer, 1975), I proposed that memory traces are stored in an n-dimensional closed space at loci that describe a random walk from point to point in “acquired memory space.” Memories are retrieved by a uniformly spreading signal sent out from the current locus, activating previously stored traces that are both sufficiently similar in content and sufficiently close physically. If strong enough, the activation also lays down a trace of its contents at the current point in the random walk. This action either overwrites an existing trace or creates a new one.² The course of the random walk is not altered by these events; it just keeps moving (see Figures 3.1 and 3.2). (In the final sections of this chapter, a simple parameter change, namely much greater potential storage space, allows these to be independent random paths for independent inputs.)

At the time when it was first proposed, the random walk model was realized in two ways: by mathematical analysis and by a crude simulation as a three-dimensional multicell cubic space with its active point moving one cube at a time to a neighboring cube. They both gave qualitatively realistic functions for errors and reaction times in initial learning, forgetting rates after different numbers of prior repetitions, and spacing of repetitions. The continuous n-dimensional mathematical simulations also agreed quantitatively with reaction times for (1) naming line drawings (Oldfield & Wingfield, 1965), (2) word vs. nonword and category vs. noncategory judgments (Landauer & Meyer, 1972), and (3) an unpublished large and carefully controlled laboratory experiment with artificial stimuli that will be described later.
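To make the storage and retrieval scheme concrete, here is a minimal toy sketch in Python of a cubic-lattice version of the idea. It is not the 1975 simulation: the class name `RandomWalkMemory`, the lattice size, the wraparound used to make the space closed, exact-match content comparison, and city-block distance as a proxy for the reach of the spreading signal are all assumptions made for illustration.

```python
import random

class RandomWalkMemory:
    """Toy cubic-lattice sketch of random storage (illustrative only)."""

    def __init__(self, side=20, seed=0):
        self.side = side                  # cells per edge of the closed cube
        self.rng = random.Random(seed)
        self.pos = (0, 0, 0)              # current storage locus
        self.traces = {}                  # cell -> stored content

    def step(self):
        """Move the active point one cell along a randomly chosen axis."""
        axis = self.rng.randrange(3)
        delta = self.rng.choice((-1, 1))
        p = list(self.pos)
        p[axis] = (p[axis] + delta) % self.side   # wrap around: closed space
        self.pos = tuple(p)

    def store(self, content):
        """Lay down a trace at the current locus, overwriting any old one."""
        self.traces[self.pos] = content

    def _dist(self, a, b):
        """City-block distance on the lattice, allowing for wraparound."""
        return sum(min(abs(x - y), self.side - abs(x - y))
                   for x, y in zip(a, b))

    def retrieve(self, content, radius):
        """Loci of matching traces within `radius` cells of the current locus."""
        return [cell for cell, stored in self.traces.items()
                if stored == content and self._dist(cell, self.pos) <= radius]


mem = RandomWalkMemory()
mem.store("Montana-Helena")
mem.step()                       # the walk moves on regardless of storage
mem.store("Montana-Helena")      # immediate repetition: trace lands next door
print(len(mem.traces), len(mem.retrieve("Montana-Helena", radius=1)))
```

In this sketch an immediate repetition places its second trace in a neighboring cell, so the two traces are found or missed together; calling `step()` many times between the two `store()` calls tends to put them far apart, which is the sense in which spacing yields storage in two different places.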

Figure 3.1 A conceptual depiction of the random walk model of information storage. Information is deposited along a successive series of loci, each at an independent single step in an n-dimensional space. The path is not altered by the effects of storage.

Figure 3.2 A conceptual depiction of the random walk model of information retrieval. Stimulation of a locus initiates a uniformly spreading activation that in turn stimulates loci that it encounters.

Reinforcement as Consolidation

In my PhD dissertation (Landauer, 1960) I had measured learning by secondary reinforcement in rats as a function of the time between a bar press and a brief light that they had previously been trained to associate with water availability. Maximum learning occurred when the light (no water) followed 0.5 second after the response, and thereafter declined as a sigmoid function of the delay of presentation of the light. Later I used this finding in a neural model of the process (Landauer, 1969). The hypothesis was that after a stimulus causes the firing of a set of

neurons (e.g., a Hebbian cell assembly), a strength-limited activity continues at a decreasing rate over the following few seconds. The changes in synaptic conductivity of which learning consists are presumed to be related to the area under the diminishing activity curve. The learning added by a repetition thus ranges from none, if the stimuli are too close in time, to twice as much or more, depending on the time between repetitions. This neural mechanism hypothesis was much later qualitatively mirrored for long-term potentiation (LTP; a persistent increase in synaptic strength following high-frequency stimulation of a chemical synapse) in the sea slug Aplysia. This process is probably not identical to vertebrate learning, but is analogous. The Aplysia research, by Eric Kandel and associates (e.g., Hawkins, Carew, & Kandel, 1983), led to a Nobel Prize. Its relevance here is that the time-courses found in within-list spacing experiments, rat learning, and physiological data are reasonably consistent with each other and, in turn, consistent with the 1969 mathematical model. What is in common is that they all involve learning consequences for spacing.

To recap, as most readers will know, the most dramatic manifestation of the spacing effect is that immediate repetition usually has almost no benefit, and that repeating the same information after relatively long intervals can greatly increase learning and retention. This summary holds most strongly for the very simple cases just described. However, as many later and recent studies have shown, the matter becomes more complex when the unit being learned is larger and less uniform (say, a phone number, a lecture, a golf club swing, or a science course), or if the learning is under different time constraints or instructions. It is not surprising, then, that no tidy way of capturing the effects of every occasion in which repetitions and tests occur at different spacings and schedules is yet available. To some extent, the effort has become engineering as much as science, finding causes, contexts, and parameters that govern different situations in which different experiences interact with each other as a function of time. Nonetheless, the core phenomenon has become increasingly useful. Perhaps my favorite example is the use of simple expanding spacing to teach hospitalized dementia patients room numbers, nurses’ names, and the whereabouts of toilet facilities (American Psychological Association, 2006; Wilson, 1995; see Balota, Duchek, & Logan, 2007, for many more). The present story, however, is focused on resolution of a mystery about the effects modeled by the random storage hypothesis and most strongly and consistently found in temporally constrained list learning and retrieval.
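The area-under-the-curve idea in this section can be illustrated with a small calculation. The sketch below is not the 1969 model; it assumes, purely for illustration, that activity decays exponentially with a time constant `tau`, that a repetition resets activity to its ceiling rather than adding to it (one reading of "strength-limited"), and that learning equals the area under the activity curve. The function name `relative_learning` and the value of `tau` are hypothetical.

```python
import math

def relative_learning(lag, tau=2.0):
    """Learning from two presentations `lag` seconds apart, relative to one.

    Assumptions (not from the chapter): exponential decay with time constant
    `tau`, a repetition resets activity to its ceiling, and learning is the
    area under the activity curve, scaled so one presentation gives 1.0.
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    first = 1.0 - math.exp(-lag / tau)   # area accumulated before the repetition
    second = 1.0                         # full area after the repetition
    return first + second


for lag in (0.0, 1.0, 5.0, 30.0):
    print(f"lag={lag:5.1f} s  relative learning={relative_learning(lag):.3f}")
```

Under these assumptions an immediate repetition adds nothing beyond a single presentation, and the benefit rises toward twice the single-presentation amount as the lag grows well beyond `tau`. This simple form saturates at twice, so it does not capture gains larger than that.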

The Two Is Not Better Than One Conundrum

In 1978, Brian Ross and I realized that if the spacing advantage is determined by storage in two different places, the same advantage should accrue to the probability of finding at least one of two different items. This is predicted by both the random storage model and some, but not all, versions of context-specific and differential encoding hypotheses (e.g., Bjork & Allen, 1970). But in the studies reported by Ross and Landauer (1978), no significant effects of the kind occurred. Several researchers have found ways to explain the apparent mystery in cases close to the original where it has been replicated (e.g., Glenberg, 1976). In all of those that I know of, however, additional verbal learning activities might have carried the additional learning rather than the spacing effect itself. Similarly, some mathematical models have explained the phenomenon by introducing additional parameters of unknown natural origin (e.g., Raaijmakers, 2003). Others have tried and failed to resolve the mystery by removing factors that may have caused the intralist spacing effect (e.g., Kahana & Howard, 2005). I have stayed out of the fray until now, busying myself with other matters, which now turn out to offer yet another escape or resolution. Next, the other matters.
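The logic of the conundrum can be written out as a short probability calculation. The sketch below is an assumption-laden illustration, not a model from the cited papers: it supposes each of two traces is retrievable with the same probability `p`, and it uses a hypothetical correlation `rho` between the two retrieval outcomes as a stand-in for spacing (1 when the traces are stored together, 0 when they are stored independently).

```python
def p_at_least_one(p, rho):
    """Probability that at least one of two traces is retrieved.

    p   : probability of retrieving each trace on its own (assumed equal)
    rho : correlation between the two retrieval outcomes; a hypothetical
          stand-in for spacing (1 = stored together, 0 = independent)
    """
    if not (0.0 <= p <= 1.0 and 0.0 <= rho <= 1.0):
        raise ValueError("p and rho must lie in [0, 1]")
    # Joint probability of two equally likely, correlated yes/no events.
    p_both = p * p + rho * p * (1.0 - p)
    return 2.0 * p - p_both


massed = p_at_least_one(0.4, rho=1.0)   # traces side by side
spaced = p_at_least_one(0.4, rho=0.0)   # traces far apart
print(round(massed, 2), round(spaced, 2))
```

Nothing in this calculation depends on whether the two traces hold the same item or two different items, which is why a storage-in-two-places account predicts an "at least one" advantage for spaced unrelated items as well.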

How Much People Learn and Remember in a Lifetime: My Estimate

An important motive for understanding how and why schedules of practice matter lies in the cumulative demand for acquisition and maintenance of knowledge. As Bjork has discovered and richly explored, forgetting is often as useful as remembering because remembering can be both confusing and counterproductive (Bjork, 1970a). But evolution has presumably steered the brain toward accumulating the most useful information and keeping it functional for as long as it remains useful. To make an estimate of how much accumulates in a human lifetime, I found or created data of several kinds and sources from which the rate of information input to long-term memory in bits per second could be estimated, as well as the rate of loss from long-term memory. Some were based on information gained from passages of reading, and others from recognition of previously presented single words, phrases, pictures, or tunes, and still others from models of word meaning acquisition. The estimates were remarkably close, ranging from 6 to 11 bits

per second (Landauer, 2002; Nickerson, 1968; Shepard, 1967; Standing, 1973). Assuming that people are constantly adding and losing new information at the average of these rates for sixteen hours a day yields a total of approximately 10⁹ bits of newly stored information in a lifetime (Landauer, 1986). A more recent (unpublished) estimate was based on the amount of information needed by latent semantic analysis to mimic human knowledge of words. Corrected for lack of syntactic and grammatical knowledge (Landauer, 2002), the estimated information in a vocabulary of 100,000 words was about 10⁸ bits. A total learned memory of about 10 times word knowledge seems acceptable.

An immediate bearing of this research and its conclusions on spacing issues lies largely in its success in using information theory to model memory functions, a powerful methodological tool I believe to have been woefully underused in cognitive psychology (despite the loosely labeled “information processing” approach). For spacing effects in particular, an example of what might be done is to study the cumulative storage efficiency of differing repetition regimes and scheduling distributions. This might yield more accurate, quantitative, and continuous valued treatments of the effects of variables and their interactions than measures of raw frequencies of right and wrong answers or raw accuracy of performance. However, the 1986 paper concerned only the amount of information normally accumulated. Total storage capacities as large as the number of cortical synapses, about 10¹³, had previously been loosely conjectured, but no serious modeling effort had ever been undertaken. Clearly, questions about whether, how much, and in what way total capacity might surpass accumulation could lead to a better understanding not only of spacing phenomena, but also of brain and nervous system function in general.
Pursuit of an estimate of total capacity drove the most recent step of the odyssey, with information theory again in a central role.

A Fascinating Historical Footnote: Hooke’s 1681 Model of How Memory Is Stored and How Much People Learn and Remember in a Lifetime

Robert Hooke (1635–1703), experiment curator of the Royal Society, is known as the Leonardo of England for his many prescient ideas about science and technology. Hooke made important contributions to architecture, astronomy, paleontology, biology, physics, chronometry, microscopy, mechanics, cryptography, magnetism, and more. Among the most well known are the law of elasticity, invention of the compound microscope leading to the discovery of biological cells, postulation of elliptical orbits for the earth and moon, proposal of the inverse square law of gravitational attraction, spring escapement that made wristwatches possible, universal joint that lets cars turn smoothly, and theory of the catenary curve, the load-bearing principle obeyed in every suspension bridge and power line. He also collaborated with Wren in the reconstruction of London after the Great Fire of 1666. And he had a bitter controversy with Newton over the nature of light that caused Newton to refuse the presidency of the Royal Society until Hooke died. This was largely on the grounds that Newton thought science should be purely empirical generalization and detested Hooke for his many speculations,³ but also for his claims to priority in theories of light, color, and gravitation. Newton said he would head no organization that regularly listened to Hooke.

Two of the most remarkable and least noticed of Hooke’s ideas were contained in an eight-page paper on memory that he read to the Royal Society in 1681 (Hooke, 1681/1956), An Hypothetical Explication of Memory: How the Organs Made Use of by the Mind in its Operation May Be Mechanically Understood.⁴

The Hooke Memory Model

Hooke proposed that ideas were formed by the soul, which was located at the anatomically unspecified center of a memory repository. To form and store an idea, the soul processed and combined sensations with existing memories, then pushed the new idea onto the near end of an expanding coil. His account of retrieval from the repository coil was somewhat ambiguous, or perhaps dual. He first explained the sense of time by the distance along the coil between two ideas formed at different times, commenting on how this explained the experienced lack of time passing during sleep, when, he thought, few or no new ideas are formed. Later in the same paper, he postulated that retrieval was accomplished by a radiation from the soul encountering a resonating idea, and the excited idea returning a reverse radiation that transmitted its contents to the soul. He noted that the total process required a “double function” of distance from the soul to the stored idea. It is not entirely clear whether by this he meant twice the distance along the coil or a reference to the inverse square law. However, his eventual use of the sun’s radiation to the earth as an explanatory metaphor appears to suggest the latter, as does his explanation of temporary forgetting, in which a newer idea could temporarily interpose itself between the soul and an older idea until the coil expanded further. He added confusion and degradation as other sources of forgetting, and interactions between neighboring ideas as further sources of confabulation and creativity. Figures 3.3 and 3.4 show one interpretation of Hooke’s memory theory in diagrammatic form.

Figure 3.3 A diagrammatic interpretation of Robert Hooke’s 1681 theory of memory storage. According to Hooke, ideas were stored in an expanding coil around the soul.

Figure 3.4 A diagrammatic interpretation of Robert Hooke’s 1681 theory of memory retrieval. On receiving an idea, the soul initiates a spreading excitation. If the excitation resonates with a previously deposited idea, the contents of that idea are returned to the soul. Later ideas could sometimes hide earlier ones.

Hooke’s Estimate of Lifetime Storage of Ideas

Hooke estimated that he could store about four ideas per second. He neglected to explain how he reached that conclusion, but here are some possibilities. Both explicit and implicit speech have normal rates of about two per second, but both can get up to around eight with practice and effort (Landauer, 1962). Eye fixations that result in recognition of objects have a similar normal rate. Perhaps Hooke timed himself in such activities. Hooke believed that attention was necessary for storing an idea. Sperling and Reeves (1980) estimated that attention shifting takes about half a second. Hooke lacked methods available to Sperling and Reeves, but, as an astronomer and builder of telescopes, undoubtedly knew that he could not time two azimuth crossings at once, and had a good idea of how much longer two took than one, a value not far off from half a second. Hooke thought he was faster than the average person, so he settled on a one per second rate as normal, and used it to calculate a lifetime’s worth of stored ideas. Multiplying a life of 100 years by 365 days by 16 waking hours by 60 minutes by 60 seconds, he arrived at a total of 2,000 million ideas (2 × 10^9).⁵ He intuitively judged this to be much too high, and revised his estimate down 100-fold by postulating large periods of illness and inattention as decrements, which, as we have seen, make his estimate more consistent with mine in information theoretic bits. An “idea” to Hooke must have been a considerably larger unit than a single bit.
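Hooke’s multiplication is easy to check. The following is a minimal sketch in Python; the function and parameter names are illustrative assumptions and appear nowhere in Hooke’s paper or in Landauer (1986).

```python
SECONDS_PER_HOUR = 60 * 60


def lifetime_ideas(years=100, days_per_year=365, waking_hours=16,
                   ideas_per_second=1):
    """Ideas stored in a lifetime at a constant rate during waking hours.

    Defaults follow the figures quoted in the text for Hooke's calculation;
    the names themselves are hypothetical.
    """
    waking_seconds = years * days_per_year * waking_hours * SECONDS_PER_HOUR
    return waking_seconds * ideas_per_second


raw_total = lifetime_ideas()       # Hooke's first figure, about 2 x 10^9
revised_total = raw_total // 100   # after his 100-fold downward revision

if __name__ == "__main__":
    print(f"raw estimate:     {raw_total:,} ideas")
    print(f"revised estimate: {revised_total:,} ideas")
```

The raw product is 2,102,400,000, which rounds to the 2,000 million ideas quoted above; the revised figure is about 2 × 10^7 ideas.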

The Total Amount of Memory Storage Space Available

As mentioned earlier, the 1986 model did not provide an estimate of total learned memory capacity of the mind/brain, only the amount of information stored in a lifetime. It was not until 2003 that I hit upon a way to do this, and it is only here that the idea is being first published, and that with much trepidation. The approach involves a rather obvious extension of the random storage model to include an effect of new information overwriting old. The model is quite simple. It still assumes that information is entered at a locus on a random path where it either lays down a new trace or replaces an old one. The trace is again assumed to be a discrete unit that holds a certain amount of information; for convenience we call this a “fact.” When the store is full, all new facts must overwrite old ones. The number and average bit size of facts needed to reach that point gives the total size of the random memory store. Recall that according to the 1975 model, each experience of a given fact adds another copy, and the average retrieval time decreases because the average distance to the nearest of the multiple copies decreases exponentially. This is how the model has achieved its good fits to spacing phenomena and a wide variety of other learning and forgetting data. Now suppose that every experience a person had was exactly the same. If there is a finite number of loci to be occupied, they would all eventually be filled by copies of that same one fact. The time needed to retrieve this fact would thereafter remain constant at some minimum that includes all processes common to every retrieval, such as perceptual and motor times and the processing of (but not search for) a single record. The same thing happens if there are different facts being laid down, but simulating the process and measuring its information content is more difficult and noisy.⁶

The estimation proceeds as follows. First, find a task that yields accurately measurable and highly consistent reaction times. Next, separate reaction times into two components: a constant representing average input and output times for a standard fact, and an exponential component. In other words, fit a model with a constant plus a logarithmic component to the data. If the resulting fit is nearly perfect over a range of at least four points,⁷ extrapolate it to where the exponential factor meets the constant factor. The number of facts required to reach this point is the number that fills memory. This is the point for which additional input does not decrease reaction time (RT), implying that new information can only overwrite old. The number of facts times their average size in bits is then the total capacity.
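The thought experiment in which every experience is the same fact can be illustrated with a toy simulation. This is only a sketch under loudly simplifying assumptions that are not part of the 1975 or 1986 models: loci are arranged on a ring, each new copy lands on a uniformly random locus (rather than along a random walk), and retrieval searches forward from a random starting locus. All function names are hypothetical.

```python
import random


def forward_distance_mean(occupied):
    """Mean steps from each locus, moving forward around the ring,
    to the next locus that holds a copy of the fact."""
    n = len(occupied)
    if not any(occupied):
        raise ValueError("store is empty")
    total = 0
    steps = None
    # Walk backwards around the ring twice so every locus has seen
    # an occupied locus ahead of it by the time it is counted.
    for i in range(2 * n - 1, -1, -1):
        if occupied[i % n]:
            steps = 0
        elif steps is not None:
            steps += 1
        if i < n:
            total += steps
    return total / n


def simulate_single_fact(n_loci, n_experiences, checkpoints, seed=0):
    """Store the same fact repeatedly; report (fill fraction, mean search
    distance) at the requested experience counts."""
    rng = random.Random(seed)
    occupied = [False] * n_loci
    results = {}
    for t in range(1, n_experiences + 1):
        # A new copy either fills an empty locus or overwrites an old copy.
        occupied[rng.randrange(n_loci)] = True
        if t in checkpoints:
            fill = sum(occupied) / n_loci
            results[t] = (fill, forward_distance_mean(occupied))
    return results


if __name__ == "__main__":
    out = simulate_single_fact(200, 4000, {10, 100, 1000, 4000})
    for t in sorted(out):
        fill, dist = out[t]
        print(f"{t:5d} experiences: fill={fill:.3f}, mean distance={dist:.2f}")
```

In this toy version the mean search distance shrinks as copies accumulate and stops shrinking once the store is full, which is the saturation point the estimation procedure tries to locate by extrapolation.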
This requirement is met by having a task in which the amount learned in bits on each repetition can be calculated and is reasonably stable and of low variance.⁸ Unfortunately, after considerable searching of published data, the two sets with the best precision for the purpose were ones of my own.⁹ In both cases, the data consisted of means over multiple subjects and multiple blocks of repetition trials and times for the same response. To make the response the same on every occasion, only data for “no” responses were fitted to the model.

Case I: Data From Landauer and Meyer (1972)

Subjects were shown words and nonwords in random order and pressed one of two buttons accordingly. Although (for other purposes) the words were sampled from the Kucera-Francis norms and spanned a large range of frequencies, for greater uniformity only reaction times to press the no (nonword) key were used. The information contained in a no response was set at the mean value in Landauer (1986) for words and nonsense syllables, a range of 6.2 to 11.5 bits, and could probably make a difference of no more than a factor of two in the final result. Data were averaged across subjects. An r > .9999 was obtained between data and model for a total space of ~10^12.5 bits.

Case II

In an unpublished experiment originally designed to study the effects of category size and frequency of category membership on retrieval time, six adult subjects learned to which of 6 artificial categories (conveniently named 1, 2, …, 6) each of 63 different nonsense syllables belonged. The various nonsense syllables of the 6 categories appeared for each subject with frequencies in the ratio 1, 2, 4, 8, 16, and 32, with 10 replications of each set over the course of 10 sessions on 10 different days. Presentations were counterbalanced across subjects in such a way that every nonsense syllable was used equally often in each of the various conditions, and word-by-category assignments were fully counterbalanced over subjects and within subject by order of occurrence. Thus, in total, a given subject had “stored” one nonsense syllable 10 times, another 20 times, and so on, up to 320 times. Data for 10 subjects were averaged. To store a decision (fill or overwrite a memory location) was assumed to require an average of 10 bits, so we again multiply the number of “facts” stored by an insignificant amount to get the total in bits. An r > .9999 was obtained between data and model for a total space of 10^13.5 bits.
Thus, for two sets of data obtained in two quite different ways with different stored facts, one taken from data close to natural verbal judgments, the other from a carefully controlled laboratory experiment with artificial stimuli, estimates of total memory capacity were ~10^13 bits, some 10,000 times the amount previously estimated for the amount stored in an average lifetime. Figure 3.5 shows the results for the artificial category data.
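The extrapolation step can be sketched in code. The version below follows one reading of the Figure 3.5 caption: subtract a constant from each mean reaction time, fit a straight line in log-log coordinates, and find where the line reaches zero. The reaction times are synthetic values generated from a made-up power function, not the data from either case, and the function names, the 150 ms constant used here, and the choice of milliseconds as the unit are assumptions for illustration only.

```python
import math


def fit_loglog(counts, rts_ms, constant_ms):
    """Least-squares line through log10(RT - constant) vs. log10(count)."""
    xs = [math.log10(n) for n in counts]
    ys = [math.log10(rt - constant_ms) for rt in rts_ms]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def log10_capacity_bits(counts, rts_ms, constant_ms, bits_per_fact):
    """log10 of (facts needed to reach the zero crossing) x (bits per fact)."""
    intercept, slope = fit_loglog(counts, rts_ms, constant_ms)
    log10_facts_to_fill = -intercept / slope  # where log10(RT - constant) = 0
    return log10_facts_to_fill + math.log10(bits_per_fact)


# Synthetic illustration only: RT = constant + 400 * n^(-0.2), in ms.
CONSTANT_MS = 150.0
counts = [10, 20, 40, 80, 160, 320]
rts_ms = [CONSTANT_MS + 400.0 * n ** -0.2 for n in counts]
estimate = log10_capacity_bits(counts, rts_ms, CONSTANT_MS, bits_per_fact=10)

if __name__ == "__main__":
    print(f"estimated capacity is about 10^{estimate:.1f} bits")
```

Because the zero crossing lies many orders of magnitude beyond the observed range, small changes in the assumed constant or in the fitted slope move the estimate substantially, which is why the sketch takes the constant as an explicit parameter.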

[Figure 3.5 plot: y-axis, log (RT − min); x-axis, 10^n bits.]

Figure 3.5 Estimating total information capacity of human memory. Data are reaction times as a function of number of negative category membership responses to the same nonsense syllable. The Y-axis is mean reaction times minus estimated perceptual and motor times; the X-axis is the number of repetitions of the identical stimulus–response events. The zero intercept of the log-log function estimates the number of bits that would be required to fill the total storage space. r > .9999.

Toward Resolving the Two Is Not Better Than One Conundrum

This last conclusion and inferences from it are, of course, dependent on how well the model captures or agrees with the natural mechanism. The two independent sets of data provide at least a minimal insurance of generality. And the fact that the size estimate agrees with prior guesses, for example, ones based on the number of modifiable synapses in the brain and Turing’s conjecture, gives some comfort. But it would be better to have more corroborating evidence. Meanwhile, however, supposing that this estimate is roughly correct or even just on the right track, how it would affect our theories of spacing (and many other psychological phenomena) seems worth exploring. The possible consequence that motivated this part of the story is that hugely greater capacity (an extreme sparseness of occupation) might allow much more independence of one memory or string of memories from another under certain circumstances. For example, if only data from highly similar sources or occurring closely in time are stored near each other, then it would seem that all of the ways in which spacing has been shown to affect learning (as consolidation, random storage, implicit rehearsal, context specificity, or differential encoding) could coexist. The brain could sometimes do unrelated, inconsistent, or contradictory things with little or no interference. But it could also evince many known, and perhaps novel, interference effects. We might be able, for example, to find situations in which massed practice is better than spaced (oh heresy), and our job would change from trying to determine which is the one right mechanism to which mechanism will and won’t do what.

Here is an example of one way this might happen. Suppose recall of an exception to a highly overlearned fact is needed, for example, right after changing a password. Overlearned competing facts would be widely spread in the sparse space. To overcome competition from their traces, effective spacing of practice on the new password would depend on the distribution of the old traces in a complex manner. For example, the best scheduling of additions could depend on whether the previous loci were tightly or evenly distributed. Perhaps such complexities could lie behind recent spacing results such as those reported by Karpicke and Roediger (2007), in which even spacing was better than expanding, a finding that seems at odds with Landauer and Bjork (1978). For example, one might hypothesize that expanding spacing would be optimum for information that is needed quickly but ephemerally because its records are too close together, whereas long-term availability profits from the greatest dispersion. An optimum combined learning plan might then start with expanding spacing and move to even. The detailed effects of unlimited varieties of schedules could conceivably be predicted from this sort of extension of the model.

Coda: A Quick Summary and an Audacious Hypothesis

The train of research and theory reported here contains a consistent underlying theme. Efficient human learning and memory depends strongly and possibly universally on spreading the input of information over time. That this is not an empty or practically unimportant observation is shown by the fact that common intuition has it the other way around. Students, writers, educators, and advertisers persist in practicing massed practice despite our community of researchers haranguing them endlessly to mend their ways. So why does nature disagree?

Here is a highly conjectural hypothesis. In animals lower than humans on the evolutionary tree almost all changes in information-based behavior happen over multiple generations. Correspondingly, mental functions can be highly localized for efficiency, and changing behavior patterns would almost always require gradual changes in old neural structures. Stability is achieved, but the ability to do species-novel things during a lifetime is severely limited. And repeating things before they are forgotten is the only or best way not to forget or to hold a series together. What did it take to break this mold? The solution seems almost obvious. Instead of a place for everything and everything in its place, put many copies of things in many places. Because for much of human information what will be needed next may be almost anything at all, the brain can’t know where to put it. So in addition to the temporarily useful repetition trick, it also spreads the information more widely over time.

Sparse distributed coding has a long and prominent history in the mathematical psychology of memory (see especially Hinton, McClelland, & Rumelhart, 1986). However, what has been greatly undervalued is the importance to human-specific intelligence of the combination of randomness, vast uncommitted storage space, and the central role of spacing. I propose that human evolution’s great discovery was the power of this combination. The right kind of randomness was a random walk, which makes recent news much easier to find (and lose), and thereafter continually approaches perfect randomness. Spacing and randomness keep things from overwriting each other but require a lot of space to do so. The time-course of brain size expansion and human memory-based intelligence ran in parallel over the evolutionarily brief last million years. More free storage space required bigger braincases, whose size and resulting competence increased by Baldwinian evolution (smart people had more mates and children) until the skull couldn’t exit the birth canal safely. Given this new kind of brain, spacing of repetitions fortuitously (a kind of optional spandrel?) provides an algorithm that, if followed, ensures a well-controlled random walk. Of course, spacing works for every animal. But this story also makes it clear that a variety of natural and voluntary spacing regimens are only made possible by the wonders of human brains. And it also implies that differences in the effects of spacing will depend on many complex interactions.
The best recent confirmation of this hypothesis was the discovery that if some fact has just been encountered for the first time, repetitions without feedback should profit from an expanding schedule that keeps them close enough to rekindle each other. But after many repetitions, even spacing should be more effective in spreading copies uniformly so as to make them close to any starting place (see Balota et al., 2007). For another example, contradictory facts—rife in political and domestic life—should be more easily tolerated if laid down at different times, while complementary ones such as knowledge of the multiplication table might be better learned by being near each other or, better still, in multiple local clusters. I leave it to readers to think of other examples, and especially of problems with the story. The odyssey is certainly not over.

Endnotes

1. For an excellent and recent review, see Balota, Duchek, and Logan (2007).

2. In the 1975 mathematical model, it was assumed that the storage space was so large that overwriting could be ignored. We will see later that this assumption sits well with the new model for estimating total information capacity.

3. One might wonder what Newton would think of this essay.

4. Sources: Draaisma (2000), ’Espinasse (1956).

5. Interestingly, Alan Turing, in his famous 1950 paper proposing his test of whether a computer could think like a human, supposed that the main barrier for the computer was storage capacity, and conjectured that 10^9 bits would be needed!

6. Possibly important complications involving nonuniformity of filling due to intrinsic and extrinsic sources of path initiation are ignored here, as they were in the original random walk model. We are essentially assuming that the modeled part of the knowledge store is effectively random.

7. Three points are needed to define the exponential function; an additional point was required to increase confidence in the linearity of fit.

8. Estimation of the constant term is an important consideration. The higher it is, the lower the estimate of the total. Ideally, this factor should include only the time for the relevant brain mechanisms to process and store the information, and not the time required to receive, deliver, and otherwise process but not store it. Event Related Potential (ERP) measurements have onset times measured at the skull of around 100 ms, of which some must be caused by activity after memory retrieval has started, and motor transmission plus effector times are fairly comparable. So constant times in the 100 to 200 ms range seem most plausible. The value for the graph was obtained by working backwards from the observed data to find what sizes of constants in that range produced plausible results. The ~10^13 estimate is based on a constant of 150 ms. In the vicinity of that value each .05 ms adds or reduces the total capacity by a factor of around 100. Low values seem appropriate, as these are highly overlearned responses. On the other hand, the estimate falls off steeply as the input–output time grows. Thus, it would seem wise at this point in the odyssey to be content with hypothesizing the total capacity at somewhere between 10^11 and 10^15, or one hundred to one million times the average lifetime storage. Anywhere in this enormous range would probably support the major consequences conjectured below.

9. Additional data sets and better constant estimates would be extremely welcome.

References

American Psychological Association. (2006). Explorations of spacing effects in healthy aging and early stage Alzheimer’s disease. Psychology and Aging, 21(11), 19–31.
Balota, D. A., Duchek, J. M., & Logan, J. M. (2007). Is expanding retrieval a superior form of spaced retrieval? A critical review of the extant literature. In J. S. Nairne (Ed.), The foundations of remembering: Essays in honor of Henry L. Roediger III (pp. 83–106). New York: Psychology Press.
Bjork, R. A. (1966). Learning and short-term retention of paired associates in relation to specific sequences of interpresentation intervals (Doctoral dissertation, Stanford University). Dissertation Abstracts, 27, 3684B. University Microfilms 67-4316.
Bjork, R. A. (1970a). Positive forgetting: The noninterference of items intentionally forgotten. Journal of Verbal Learning and Verbal Behavior, 9, 255–268.
Bjork, R. A. (1970b). Repetition and rehearsal mechanisms in models of short-term memory. In D. A. Norman (Ed.), Models of memory (pp. 307–330). New York: Academic Press.
Bjork, R. A., & Allen, T. W. (1970). The spacing effect: Consolidation or differential encoding? Journal of Verbal Learning and Verbal Behavior, 9, 567–572.
Draaisma, D. (2000). Metaphors of memory: A history of ideas about the mind (pp. 56–59). Cambridge, UK: Cambridge University Press.
’Espinasse, M. (1956). Robert Hooke. Berkeley: University of California Press.
Glenberg, A. M. (1976). Monotonic and nonmonotonic lag effects in paired-associate and recognition memory paradigms. Journal of Verbal Learning and Verbal Behavior, 15, 1–16.
Hawkins, R. D., Carew, T. J., & Kandel, E. R. (1983). Effects of interstimulus interval and contingency on classical conditioning in Aplysia. Society of Neuroscience Abstracts.
Hinton, G. E., McClelland, J. L., & Rumelhart, D. E. (1986). Sparse distributed memory. Cambridge, MA: MIT Press.
Kahana, M. J., & Howard, M. W. (2005). Spacing and lag effects in pure and mixed lists. Psychonomic Bulletin and Review, 12, 159–164.
Karpicke, J. D., & Roediger, H. L. (2007). Expanding retrieval practice promotes short-term retention, but equally spaced retrieval enhances long-term retention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33, 704–719.
Landauer, T. K. (1960). Studies of acquired reinforcers. Cambridge, MA: Harvard University.
Landauer, T. K. (1962). Rate of implicit speech. Perceptual and Motor Skills, 15, 646.
Landauer, T. K. (1967). Interval between item repetitions and free recall memory. Psychonomic Science, 8, 439–440.
Landauer, T. K. (1969). Reinforcement as consolidation. Psychological Review, 76, 82–96.
Landauer, T. K. (1975). Memory without organization: Properties of a model with random storage and undirected retrieval. Cognitive Psychology, 7, 495–531.
Landauer, T. K. (1986). How much do people remember? Some estimates of the quantity of learned information in long-term memory. Cognitive Science, 10, 477–493.
Landauer, T. K. (2002). On the computational basis of learning and memory: Arguments from LSA. In N. Ross (Ed.), The psychology of learning and motivation, 41, 43–84.
Landauer, T. K., & Ainslie, K. I. (1975). Exams and use as preservatives of course-acquired knowledge. Journal of Educational Research, 69, 99–104.
Landauer, T. K., & Bjork, R. A. (1978). Optimum rehearsal patterns and name learning. In M. M. Gruneberg, P. E. Morris, & R. N. Sykes (Eds.), Practical aspects of memory (pp. 625–632). London: Academic Press.
Landauer, T. K., & Eldridge, L. (1967). Effect of tests without feedback and presentation-test interval in paired-associate learning. Journal of Experimental Psychology, 75, 290–298.
Landauer, T. K., & Meyer, D. E. (1972). Category size and semantic-memory retrieval. Journal of Verbal Learning and Verbal Behavior, 11, 539–549.
Landauer, T. K., & Ross, B. H. (1977). Can simple instructions to use spaced practice improve ability to remember a fact? An experimental test using telephone numbers. Bulletin of the Psychonomic Society, 10, 215–218.
Nickerson, R. (1968). A note on long-term recognition memory for pictorial material. Psychonomic Science, 11, 58.
Oldfield, R. C., & Wingfield, A. (1965). Response latencies in naming objects. Quarterly Journal of Experimental Psychology, 18, 273–281.
Raaijmakers, J. G. W. (2003). Spacing and repetition effects in human memory: Application of the SAM model. Cognitive Science: A Multidisciplinary Journal, 27(3), 431–452.
Ross, B. H., & Landauer, T. K. (1978). Memory for at least one of two items: Test and failure of several theories of spacing effects. Journal of Verbal Learning and Verbal Behavior, 17, 669–680.
Shepard, R. N. (1967). Recognition memory for words, sentences and pictures. Journal of Verbal Learning and Verbal Behavior, 6, 156–163.
Sperling, G., & Reeves, A. (1980). Measuring the reaction time of a shift of visual attention. In R. Nickerson (Ed.), Attention and performance VIII (pp. 347–360). Hillsdale, NJ: Erlbaum.
Standing, L. (1973). Learning 10,000 pictures. Quarterly Journal of Experimental Psychology, 25, 207–222.
Stokes, D. E. (1997). Pasteur’s quadrant: Basic science and technological innovation. Washington, DC: Brookings Institution Press.
Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59, 433–460.
Whitten, W. B., & Bjork, R. A. (1977). Learning from tests: Effects of spacing. Journal of Verbal Learning and Verbal Behavior, 16, 465–478.
Wilson, B. A. (1995). Management and remediation of memory problems in brain-injured adults. In A. D. Baddeley, B. A. Wilson, & F. N. Watts (Eds.), Handbook of memory disorders (pp. 441–449). Chichester, UK: John Wiley & Sons.

4

The Causes and Consequences of Reminding

Aaron S. Benjamin and Brian H. Ross

We survive and thrive by making efficient use of our knowledge. When specific prior events are relevant to current situations, aspects of those past events are retrieved and guide us through the present. Sometimes this remembering is deliberate, but often we are reminded less by force of our own will than by the stimulus itself. Such instances of reminding reflect fundamental principles of association by similarity, as noted by early theorists in psychology (James, 1890; Woodworth, 1921). Reminding is ubiquitous in higher cognition and daily life, and has found a home in theoretical developments ranging from psychology to artificial intelligence. It is thus surprising to find little mention of the concept of reminding in the literature on human memory. This absence likely reflects a historical shorthand in memory research, whereby study phases of experiments are considered to be encoding events and tests to be retrieval events. Current research has dispensed with part of this false dichotomy: Tests are now known to be potent learning events (Bjork, 1975; Karpicke & Roediger, 2008). The goal of this chapter is to consider the place of retrieval in learning events, and to demonstrate that the concept of reminding provides a useful unifying theme for memory phenomena that otherwise lack theoretical coherence. As we will demonstrate, the consequences of reminding can only be understood by jointly considering the inherent trade-off between conditions that promote likely reminding and conditions that promote potent reminding. This theoretical position follows naturally from the seminal work by the honoree of this festschrift, which both demonstrated the powerful mnemonic consequences of retrieval (Bjork, 1975; Whitten & Bjork, 1977) and considered the trade-off between likely and powerful retrieval opportunities (Finley, Benjamin, Hays, Bjork, & Kornell, 2010; Landauer & Bjork, 1978).

Reminding in Higher Cognition In many areas of higher-level cognition, research has emphasized the abstract knowledge people might use in accomplishing the task—rules or prototypes in categorization, domain-general problem-solving strategies, and generalized knowledge schemas. However, in each area, researchers have found it important to eschew an exclusive focus on abstract knowledge and consider how people might rely on the memory for specific earlier cases to accomplish the tasks. Reminding is thought to be the mechanism by which these relevant specific instances are brought to mind. In this section, we lay out three domains in which reminding has played a role in understanding phenomena in higherlevel cognition. Concept Learning Concepts are the basic building blocks of cognition. They are critical in all cognitive areas, so how they are acquired has long been of interest. Bruner, Goodnow, and Austin (1956) proposed an active hypothesistesting role for the learner: the generation and testing of rules about what features or combinations of features allow accurate determination of whether an item is or is not a member of a concept. Rosch (1975; Rosch & Mervis, 1975) later argued that concept representations should not be thought of as rules, but as prototypes—a set of features that vary in how typical they are of category members. Rather than either belonging or not belonging to the category, members could differ in how good a member they were as a function of how many prototype features they contained. Thus, a robin is a typical bird because it has the most common bird features (feathered, winged, small; flies, sings, eats worms, nests in trees), whereas a penguin is a less typical bird because it is large, does not fly, and eats fish. New items are classified by the prototype they are most similar to, and this same representation is used to classify all the instances in the category. 
This reliance on building up and using abstract knowledge was challenged by both Medin (Medin & Schaffer, 1978) and Brooks (1978),

The Causes and Consequences of Reminding • 73

who argued that prototype views are inadequate as conceptual representations. Rather, people often use much more specific knowledge in making classification judgments. The thrust of this exemplar view is that our memory is constantly encoding specific knowledge, with items consisting of related features and connections to context, and that the presentation of a new item probes memory for similar knowledge. The retrieval of similar items provides a useful means of determining category membership; items that are very similar are likely to be in the same category. The importance of this proposal is twofold. First, because it avoids the need for a generalization mechanism for prototypes, the assumption is simply that people store information about specific items from their interactions with these items. Second, it allows for addressing the various complexities one finds in categories, such as correlations among features. For example, typical birds are small and sing, but it would be unusual for a large bird to sing. The set of exemplars encodes these correlations. The exemplar view is powerful, and has generally proven superior to prototype views in direct comparisons (e.g., Murphy, 2002; Ross & Makin, 1999). In the case of a new item that is very typical, it will be similar to many earlier items and thus easy to quickly classify. An atypical item will be similar to far fewer items, and will be more difficult to classify. The exemplar view has been extended in a number of ways, including providing an independent means of assessing similarity (Nosofsky, 1986) and a learning model that tunes the weight of various features (Kruschke, 1992). More recently, hybrid models have been introduced that include an exemplar component along with a rule learning module (Erickson & Kruschke, 1998). 
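The exemplar account described above can be made concrete with a small sketch. The code below is only an illustration, loosely in the spirit of exemplar models, and is not an implementation of any of the published models cited here: the binary feature coding, the exponential similarity function, the `sensitivity` parameter, and the stored items are all assumptions chosen for the example.

```python
import math

def similarity(item, exemplar, sensitivity=1.0):
    """Similarity falls off exponentially with the number of mismatching features (assumed form)."""
    distance = sum(abs(x - y) for x, y in zip(item, exemplar))
    return math.exp(-sensitivity * distance)

def classify(item, exemplars, sensitivity=1.0):
    """Return the relative evidence for each category, based on summed similarity to stored exemplars."""
    totals = {}
    for features, label in exemplars:
        totals[label] = totals.get(label, 0.0) + similarity(item, features, sensitivity)
    grand_total = sum(totals.values())
    return {label: total / grand_total for label, total in totals.items()}

# Hypothetical stored exemplars; features are (small, flies, sings), coded 1 or 0.
exemplars = [
    ((1, 1, 1), "bird"),
    ((1, 1, 1), "bird"),
    ((1, 1, 0), "bird"),
    ((0, 0, 0), "bird"),    # a large, flightless, non-singing bird
    ((0, 0, 0), "mammal"),
    ((0, 0, 0), "mammal"),
    ((1, 0, 0), "mammal"),
]

typical = classify((1, 1, 1), exemplars)   # small, flying, singing item
atypical = classify((0, 0, 0), exemplars)  # large, flightless, silent item

print("typical item: ", typical)
print("atypical item:", atypical)
```

In this toy memory, the typical item is similar to many stored birds, so the evidence for "bird" is strong; the atypical item is similar to only one, so the evidence is much weaker. No prototype is ever computed: the feature correlations are carried by the stored items themselves.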
Some recent models challenge some aspects of how the specific knowledge of exemplars may influence classification (e.g., Love, Medin, & Gureckis, 2004), but no one questions the idea that specific knowledge influences classification. Even real-world tasks have shown large influences of this sort—for example, diagnoses by physicians with much experience in diagnosing dermatology cases are influenced by similarities to recently seen specific cases (Brooks, Norman, & Allen, 1991). For unusual items, or items early in learning, remindings are particularly common. They can influence classification and affect learners’ understanding of the category (e.g., Ross, Perkins, & Tenpenny, 1991). Even if a simple rule is given to learners and applied to new items, those new items that are very similar to earlier items may be classified differently than the rule predicts (Allen & Brooks, 1991). This research shows the importance of specific remindings in understanding classification behavior, and reveals that such behavior relies on more than abstract knowledge.

74 • Aaron S. Benjamin and Brian H. Ross

Problem Solving by Analogy

Problem solving is ubiquitous and important in everyday cognitive activities. Early developments in the field emphasized a characterization based on a representation of the problem as a problem space (the knowledge one has of the relevant world states and operators for transforming states) and the search of this space to find the goal state (Newell & Simon, 1972). Problem solving by analogy occurs when we try to solve a new problem by reference to an earlier problem and its solution. For example, students asked to solve a homework problem in a math course often look back through the book for similar examples. Problem solving often involves being reminded of an earlier experience and adapting it to the current situation.

Gick and Holyoak (1980, 1983) presented a series of studies that examined how people solve problems by analogy. The target problem was Duncker’s tumor problem: A patient has a tumor that cannot be operated on, and the x-ray strength that can kill the tumor would also kill healthy tissue, resulting in death. The convergence solution is to position a set of lower-intensity machines around the body and have them converge at the tumor. Before attempting a solution to this problem, some subjects read a story about a general attacking a fortress whose roads were mined so that only small work details could pass. He succeeded by splitting up his forces and having them converge on the fortress. Having this earlier experience increased the use of the convergence strategy for the tumor problem from 10% to 30%, but even this success rate is unimpressive. Perhaps they did not understand the story? No, when asked to summarize the story, performance was no better. Perhaps they did not understand the principle?
No, when the principle was made explicit (“The general attributed his success to an important principle: If you need a large force to accomplish some purpose, but are prevented from applying such a force, many small forces applied simultaneously from different directions may work just as well”), performance was no better. Perhaps they did not think back to the earlier problem (i.e., get reminded)? Yes, when told they might want to use the earlier general story, performance increased to an impressive 80%. Remindings are critical in learning through analogical problem solving. Without a teacher to tell the learner what earlier example to use, remindings determine whether the learner tries to apply an appropriate earlier solution that might help solve the current problem and perhaps help to learn to solve problems of this type, or tries to apply an inappropriate earlier solution unsuccessfully. What affects which earlier experiences people get reminded of? Remindings are memory retrievals—the solver probes memory with

an initial analysis of the problem and retrieves experiences that match this in some respects. Experts may be able to make a deep analysis of a problem to be reminded of earlier problems with the same deep structure. However, novices often have more superficial analyses and characterize the problem in terms of the particular objects mentioned in the problem, such as the particular story content of a word problem (Chi, Feltovich, & Glaser, 1981; Ross, 1984). Although theories differ on what influences reminding, all propose that the remembering of an earlier example critically influences current problem solving (see Reeves & Weisberg, 1994, for a critical review). In addition, to the extent that remindings allow comparisons that might promote generalization, they may play an important role in learning within a domain (e.g., Gick & Holyoak, 1983; Ross & Kennedy, 1990). Remindings are an important aspect of problem solving by analogy, a prevalent means of problem solving. In many cases, people do not rely on abstract knowledge, such as problem schemas or general rules, but think back to a specific earlier example and use that to help solve a current problem.

Understanding

One can think of the goal of understanding as the construction of an integrated representation that combines an input with prior knowledge. Early theories that attempted to bring prior knowledge to bear ran into a major obstacle: so much knowledge and so many inferences were needed that processing would not be possible in any reasonable time, for either people or machines (Rumelhart & Ortony, 1977; Schank & Abelson, 1977). The schema was an attempt to overcome this obstacle (e.g., Rumelhart & Ortony, 1977). A schema is a generalized knowledge structure that is used for understanding.
Rather than combining all the relevant prior knowledge each time it is needed, the schema comes prepackaged: Those situations one has faced a number of times before access a schema that already contains the relevant prior knowledge, inferences, and slots for understanding that particular situation. For example, one has general knowledge of a buying/selling situation, with an understanding that the buyer gives some money (a function of what is being bought), the seller surrenders ownership of the item, and so on. For routine events that include knowledge very similar from time to time (e.g., going to a restaurant), even greater amounts of prior knowledge can be specified (Schank & Abelson, 1977). Schemas have become an important idea in a wide variety of psychological domains, as well as in artificial intelligence. Psychological research has found evidence of the existence and use of such abstract

knowledge structures (e.g., Bower, Black, & Turner, 1979). However, there have also been indications in both psychological domains and artificial intelligence research that these large knowledge structures may sometimes be too large or inflexible, and that sometimes specific experiences may play a role. For example, Bower et al. (1979) found confusions in recalling information from schemata that shared similar parts, such as waiting rooms for doctors and dentists, suggesting that the overall “script” might be composed of smaller parts that are used across similar situations (such as waiting rooms and payments for health professionals). Schank (1982) argued that computer programs intended to acquire structures must build them up from specific earlier experiences. In processing a new experience, one might be guided by the general knowledge, but might also activate the specific earlier experiences if the situation is new or unusual. For example, if a restaurant experience involved someone choking on a chicken bone, a similar earlier experience might well be thought of and the relevant information utilized. A similar use of remindings can underlie problem solving by novices, as noted earlier (e.g., Ross, 1984). The idea of using specific episodes for understanding has been very influential in machine learning, leading to a new area of research called case-based reasoning. It allows the analysis of current situations to be processed in terms of earlier cases rather than by abstract knowledge alone. Earlier cases might help in a variety of ways, such as focusing on the important aspects, helping devise a plan, or providing information about what led to failure in a previous similar situation. Even in relatively simple reading comprehension situations, people may think back to earlier texts to help interpret the current one (Gentner, Rattermann, & Forbus, 1993; Ross & Bradshaw, 1994).

Reminding in Simple Memory Tasks

The preceding examples reveal how memories for individual events and stimuli subserve a variety of intellectual skills, and suggest that reminding is the mechanism by which those memories are efficiently utilized. If reminding underlies those many cognitive capacities, then we should be able to detect footprints of it in basic memory tasks. The type of reminding we all experience occurs not only when being at a wedding reminds us of funny things that happened at a previous wedding, but also when digging an old toy out of the attic reminds us of a childhood experience with that toy. Reminding takes place not only for complex situations and mappings, but also (and maybe even more frequently) for simple individual elements in our lives. Basic memory

paradigms, in which people are asked to remember stimuli like words and pictures, seem a fertile ground for seeking evidence for reminding for simple materials. We start that search in tasks in which related materials are presented at different times, and consider four common experimental paradigms. In each case we ask: If reminding were taking place, what would the consequences be? Two fundamental considerations from well-established memory research guide how we answer that question (Benjamin & Tullis, 2010). First, because we forget things, reminding is less likely at longer intervals. Second, although longer intervals decrease the probability of reminding, they increase the mnemonic potency of reminding. That is, because more laborious retrieval enhances memory more than easy retrieval (Gardiner, Craik, & Bleasdale, 1973; Karpicke & Roediger, 2008; Slamecka & Graf, 1978), difficult reminding enhances memory more for the reminded (i.e., retrieved) event than does easy reminding. Thus, the product of reminding can reveal a trade-off between likely and potent retrieval: if the reminding cue is too late, little reminding occurs and consequently little benefit accrues; if the reminding cue is too soon, reminding occurs but the benefits are minimal.

Memory for Repeated Materials

The most straightforward case of relationship is identity, and so the best reminding cue for a previous event is likely to be a repetition of that event. The most commonly employed repetition paradigm is one in which the lag between the repetitions is varied; such experiments reveal the ubiquitous spacing and lag effects, in which greater distance between the repeated events leads to superior memory for the event (Ebbinghaus, 1885/1962; Melton, 1970).
Predominant explanations for this effect include a role for the greater encoding variability afforded by more temporally variable study contexts (Bower, 1972; Estes, 1955) as well as the attenuated processing induced by close repetitions (Hintzman, 1974), but the effects may be more parsimoniously understood by considering the consequences of reminding (Benjamin & Tullis, 2010; Hintzman, 2004). Let us consider what the benefits of reminding would look like in such a paradigm. As outlined earlier, the probability of reminding decreases with increasing intervals and the potency of the reminding increases with increasing interval (difficulty). Therefore, there should be a sweet spot at which the probability and potency of reminding combine to produce maximal benefits—that is, spacing functions should be nonmonotonic with lag. In addition, because performance reflects not the increasing independence of events (as in encoding variability

theories), but rather reveals a joint function of their dependence (which promotes probable reminding) and independence (which promotes difficult retrieval and thus powerful reminding), such theories can accommodate superadditive levels of performance. This is true because, although events can be no more independent than perfectly uncorrelated (an assumption that leads encoding variability to be inconsistent with superadditivity), theories that postulate interaction between the two events, like reminding, are not limited in that way. These two phenomena are shown in Figure 4.1, which plots data simulated from a reminding model of repetition proposed by Benjamin and Tullis (2010). Finally, because association is at the heart of reminding, no benefits should be apparent for unrelated materials, that is, materials that are unlikely to remind the learner of each other.

Each of these predictions is borne out in data. Spacing functions can be and often are nonmonotonic (Benjamin & Tullis, 2010; Peterson, Wampler, Kirkpatrick, & Saltzman, 1963), revealing that, once the probability of reminding is sufficiently low, the net benefits start to decrease with additional spacing. In addition, superadditivity is ubiquitous (Begg & Green, 1988; Benjamin & Tullis, 2010), revealing itself at lags as short as five intervening items and evident in more than 60% of experimental conditions in the literature. Finally, no spacing benefit is evident for unrelated words (Ross & Landauer, 1978); that is, the probability of

Figure 4.1 Predicted performance for the reminding model proposed by Benjamin and Tullis (2010), plotted as performance against the lag between P1 and P2. Solid line represents memory for the item presented at P1 as a function of the interval between P1 and P2. Dashed line represents the superadditivity baseline (i.e., the probability summation of remembering one of two things). Baseline performance is set to 0.2, the forgetting rate is 0.7, and the probability of reminding is 0.9.
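As a worked check on the superadditivity baseline named in the caption, suppose that "probability summation" means the chance of remembering at least one of two independent events, each remembered with the baseline probability p = 0.2. This reading is an assumption made for illustration; the caption does not state the formula.

```latex
\[
P(\text{at least one of two}) \;=\; 1 - (1 - p)^2 \;=\; 2p - p^2 \;=\; 2(0.2) - (0.2)^2 \;=\; 0.36
\]
```

Under that assumption, performance above 0.36 at a given lag would count as superadditive, because two fully independent events could not produce it.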

remembering one of two different words does not vary with the interval between the words. According to reminding theory, such a result is the obvious consequence of the fact that the second word does not remind the reader of the first, and thus does not initiate retrieval of it. Evidence for the theoretical value of reminding is also apparent in tasks in which items are repeated and memory is tested by evaluating judgments of the recency of an item’s occurrence (Hintzman, 2010) and memory for the frequency of an item’s occurrence (Hintzman, 2004). In each of these tasks, as in recognition more generally, other perspectives have failed to provide an adequate account of extant phenomena.

Memory for Semantically Related Materials

One benefit of thinking about repetition as similarity-induced reminding is that the principles generalize easily to near-repetition paradigms, in which synonyms or related words are presented. The same principles applied to repetition paradigms can be used to derive simple predictions about the effects of lag on memory for related materials, and about the effects of pure versus near repetition on memory. Because associates are less similar to one another than are repetitions, both the probability of reminding and the potency of reminding are changed. For example, a second presentation of king is likely to remind one of an earlier presentation of king, but it could also remind one of queen. If one had seen queen earlier in the list, a later presentation of king would be expected to have different consequences for reminding than a presentation of hamburger. The probability of a word reminding the learner of a related word from earlier in the list would drop off more quickly than would the probability of a repetition doing so, but the potency of the reminding would be commensurately greater. The net effect is that the “sweet spot” in the trade-off is earlier in time than for repetitions.
This effect can be seen in the top panel of Figure 4.2, in which two lag functions are simulated from the Benjamin and Tullis (2010) model, one of which has a very high probability of reminding (as one would expect for repetitions), and one of which has a slightly lower probability of reminding (as one would expect for related associates). Note that, as expected, the performance maximum is earlier in time (in fact, at the shortest interval) for associates than for repetitions. These predictions can be compared to data from an experiment reported by Hintzman et al. (1975; depicted in the bottom panel of Figure 4.2). One noteworthy feature of these results is that, if the presentation of the two events is close together in time, memory for the first word is superior if the second word is an associate than if it is a repetition. This reflects the fact that event dissimilarity fosters more

Figure 4.2 Top panel: Predicted performance of the reminding model for memory for an event at P1 following a repetition (darker line) or following a presentation of a related associate (lighter line), plotted as performance against the lag between P1 and P2. Bottom panel: Data from an experiment by Hintzman et al. (1975) showing a similar pattern for analogous conditions, plotted as hit rate against the lag between P1 and P2.

potent retrieval—thus, being reminded of an event by an associate involves more laborious and thus more mnemonically enhancing retrieval. However, because reminding is increasingly unlikely at long lags, the earlier advantage of dissimilarity becomes a disadvantage as the interval is increased. Picking a cue for maximal mnemonic enhancement of the reminded event must thus be tied to the required interval between learning opportunities.
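The trade-off between likely and potent reminding can be illustrated with a toy calculation. The sketch below is not the Benjamin and Tullis (2010) model, whose equations are not given in this chapter. It simply assumes that the probability of reminding falls exponentially with lag, that the potency of a successful reminding rises with lag, and that the expected benefit is their product. All function names and parameter values are hypothetical.

```python
import math

def reminding_probability(lag, initial, decay):
    """Chance that P2 reminds the learner of P1; assumed to fall exponentially with lag."""
    return initial * math.exp(-decay * lag)

def reminding_potency(lag, floor, growth):
    """Benefit of a successful reminding; assumed to rise with lag from `floor` toward 1."""
    return 1.0 - (1.0 - floor) * math.exp(-growth * lag)

def expected_benefit(lag, initial, decay, floor, growth):
    """Expected benefit to memory for P1: probability of reminding times its potency."""
    return reminding_probability(lag, initial, decay) * reminding_potency(lag, floor, growth)

# Hypothetical parameter sets. A repetition is a very likely but initially effortless cue;
# an associate is a less durable cue whose retrieval is more effortful even at short lags.
REPETITION = dict(initial=0.95, decay=0.05, floor=0.0, growth=0.3)
ASSOCIATE = dict(initial=0.80, decay=0.30, floor=0.5, growth=0.3)

LAGS = range(0, 13)

def best_lag(params):
    """Lag (in intervening items) at which the expected benefit is largest."""
    return max(LAGS, key=lambda lag: expected_benefit(lag, **params))

for name, params in (("repetition", REPETITION), ("associate", ASSOCIATE)):
    curve = [round(expected_benefit(lag, **params), 3) for lag in LAGS]
    print(name, "best lag:", best_lag(params), curve)
```

With these assumed values, the repetition curve is nonmonotonic with an interior peak, whereas the associate curve peaks at the shortest lag. The associate is the better cue at short lags and the repetition the better cue at long lags, which is the qualitative pattern described above.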

Learning in A-B A-C Paired-Associate Tasks

The effects of cue repetition have been studied extensively in the literature on proactive and retroactive interference in paired-associate learning (Keppel, 1968). The interference perspective is quite clear: Because increasing the number of associates to a common retrieval cue has negative effects on the retrieval of any given target item (see also Anderson, 1974), conditions under which cues are repeated should lead to poorer memory for the associated targets. For example, studying apple-mail followed by apple-house should make it more difficult to remember mail when presented with apple on a later test. In more abstract terms, memory for B given A following study of A-B A-C lists should be inferior to memory following study of A-B D-C lists. Because the C target in a D-C list does not share a common cue with B, learning the D-C list should be less interfering than learning the A-C list.

In contrast to this prediction, experiments with this protocol often reveal a retroactive facilitation effect, in which A-B A-C lists lead to superior memory for B (Bruce & Weaver, 1973; Robbins & Bray, 1974). This phenomenon even appears in the learning of complex, real-world materials: Learners showed better memory for a text about Zen Buddhism when learning of a conflicting text on Buddhism was interpolated between study and test than when it was not (Ausubel, Stager, & Gaite, 1968). Educational programs that employ this concept have been similarly successful: Students showed better retention of materials from a seventh-grade science curriculum when they had a second year of science (in eighth grade), even when the specific content was not repeated (Arzi, Ben-Zvi, & Ganiel, 1985). These phenomena follow naturally from the idea of reminding. Because new information invites reminding of previously studied and related material, that reminding can indeed foster superior memory for the original information.
Such a claim does not conflict with the idea that additional retroactive interference is also promoted by such encounters; whether the beneficial effects of reminding are sufficient to overcome the deleterious effects of interference is likely a complex function of the materials and scheduling of learning and testing events, as well as the goals of the learner. It is interesting to note that the very condition that enhances interference—namely, similarity—is also the factor that enhances the likelihood of reminding. This combination of effects makes good cognitive sense: When materials are similar, understanding them often requires generalization or contrast among the competing pieces of information, and reminding allows us to engage in such processes even when the components are temporally disparate.

Memory for Lists of Semantic Associates

A currently popular word list learning paradigm involves the study of long lists of associates to a common unstudied word. Following such study, incorrect recall and recognition of the common unstudied word are often apparent (Gallo, 2006; Roediger & McDermott, 1995). One proposed basis for this effect is that each associate in the list “activates” the common word somewhat, and that the net effect of these relations is to boost activation to a point where the rememberer is likely to attribute it to having been studied (Gallo & Roediger, 2002). This theory can be reframed in terms of reminding and, in doing so, can avoid the theoretical murkiness of what activation is or how it is monitored. Each item in the list has some probability of reminding the learner of the common associate, and the large number of items makes reminding highly likely. Monitoring is then an issue of source monitoring: If the learner can discriminate between retrieved (i.e., reminded) and seen words, then incorrect recall and recognition can be avoided. Consistent with this interpretation, manipulations that enhance memory for actually studied list members decrease false memory (Benjamin, 2001), as do manipulations that enhance the distinctiveness of such information (Dodson & Schacter, 2001).

The only difference between reminding, as we have laid it out in previous sections, and reminding in such tasks concerns what is being reminded. In repetition, near-repetition, and paired-associate paradigms, that reminding is episodic: Single events that occurred previously in the experiment are retrieved. In semantic associate paradigms, the reminding is semantic: Individual concepts or words are retrieved from our knowledge, rather than from our memory for the experiment.
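The claim that a long list makes reminding of the common associate highly likely is easy to check with a short calculation. The sketch below assumes, purely for illustration, that each studied item independently reminds the learner of the unstudied word with the same small probability; the function name and the values 0.10 and 15 are hypothetical.

```python
def probability_of_any_reminding(per_item_probability, list_length):
    """P(at least one studied item reminds the learner of the common unstudied word),
    assuming each item does so independently with the same probability."""
    return 1.0 - (1.0 - per_item_probability) ** list_length

single = probability_of_any_reminding(0.10, 1)      # one associate
long_list = probability_of_any_reminding(0.10, 15)  # a list of fifteen associates

print(f"one item: {single:.2f}, fifteen items: {long_list:.2f}")
```

Even when any single associate is an unreliable cue, the chance that at least one item in the list brings the common word to mind is high, which is why source monitoring matters in this paradigm.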
A similar type of semantic reminding may be at work in the classic paradigm of Bower, Clark, Lesgold, and Winzenz (1969), in which the provision of organizational information at encoding led to superior memory for the material than when that information was absent or unorganized. In a second, less well-known experiment, Bower et al. provided organizational information in a later, separate list. That case also led to superior recall, and Bower et al. primarily attributed that enhancement to the benefits provided by an effective retrieval plan (see Benjamin, 2008). However, they also noted the possibility of a “mediational” interpretation, in which the presentation of the associated terms during the later list inspired covert rehearsal of members from the first list. This alternative interpretation is rooted in the concept of reminding, and illustrates the wide breadth of paradigms for which the idea might prove useful. In this case, a direct comparison of a retrieval

organization explanation and a reminding explanation would require a condition in which associated, but not hierarchical, cues were presented during the second list.

The Purpose of Reminding

We have outlined here principles that guide reminding, rules that determine the mnemonic consequences of reminding, and the higher-level cognitive capacities that reminding appears to support. If reminding is a basic building block of cognition, rather than just another boutique theoretical concept, we should be able to make sense out of its role in terms of general evolutionary principles of the mind. This short concluding section outlines our ideas about what that role is.

We start from the perspective that the mind is constantly engaged in two parallel but somewhat conflicting goals. First, pattern matching mechanisms take the bewildering amount of input to our senses and reduce it to manageable proportions by relating that input to our prior knowledge. Our assessment (understanding) of a situation and the actions we take depend upon access to appropriate and goal-relevant information. Second, information is constantly scrutinized in order to tune those pattern matching mechanisms. Processing is biased in favor of situations and objects that are commonly encountered, and dimensions that identify critical differences are enhanced relative to ones that do not contribute to important distinctions. In order to extract such dimensions, common characteristics of related stimuli must be identified and distinctive characteristics of similar but critically different stimuli must be identified. Often, this occurs when the world copresents such stimuli. An art gallery that features modern art allows us to generalize across the dimensions that uniquely identify such art. Similarly, watching a baseball game attunes us to subtle but important differences between pitchers, such as their arm angles, windups, and pick-off motions, that we would not otherwise be likely to attend to. Reminding is what affords these opportunities at a distance.
Even if we cannot visit a museum, occasional exposure to modern art—and the attendant reminding that ensues—allows us to compare art works directly (if imperfectly) even when our experiences occur at different times. Similarly, we can learn about baseball without watching a dozen pitchers at one time. Reminding allows the past to be part of the present, and so affords us the chance to generalize and contrast episodes that occur at different points in our day, or in our life.

Remindings also afford opportunities to learn in unexpected ways. If we were to purge from our memory all characteristics of stimuli that we deemed irrelevant upon our encounter with those stimuli, we would be unable to test new hypotheses about the composition of categories or inferential rules for a given problem without revisiting those stimuli. Reminding is thus also a hedge against changing ideas and changing goals: Characteristics of potential romantic partners that are important to a college student might, for example, differ from ones important to a person seeking to have children in the near future. We cannot anticipate all our future goals, but retaining individual prior experiences provides the potential for our learning objectives to change in arbitrary and unanticipated ways. Remindings provide one means by which this potential can be realized.

References

Allen, S. W., & Brooks, L. R. (1991). Specializing the operation of an explicit rule. Journal of Experimental Psychology: General, 120, 3–19.
Anderson, J. R. (1974). Retrieval of propositional information from long-term memory. Cognitive Psychology, 5, 451–474.
Arzi, H. J., Ben-Zvi, R., & Ganiel, U. (1985). Proactive and retroactive facilitation of long-term retention by curriculum continuity. American Educational Research Journal, 22, 369–388.
Ausubel, D. P., Stager, M., & Gaite, A. J. H. (1968). Retroactive facilitation in meaningful verbal learning. Journal of Educational Psychology, 59, 250–255.
Begg, I., & Green, C. (1988). Repetition and trace interaction: Superadditivity. Memory and Cognition, 16, 232–242.
Benjamin, A. S. (2001). On the dual effects of repetition on false recognition. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 941–947.
Benjamin, A. S., & Tullis, J. (2010). What makes distributed practice effective? Cognitive Psychology.
Bjork, R. A. (1975). Retrieval as a memory modifier. In R. Solso (Ed.), Information processing and cognition: The Loyola Symposium (pp. 123–144). Hillsdale, NJ: Lawrence Erlbaum Associates.
Bower, G. H. (1972). Stimulus sampling theory of encoding variability. In A. W. Melton & E. Martin (Eds.), Coding processes in human memory. Washington, DC: V. H. Winston and Sons.
Bower, G. H., Black, J. B., & Turner, T. F. (1979). Scripts in memory for text. Cognitive Psychology, 11, 177–220.
Bower, G. H., Clark, M. C., Lesgold, A. M., & Winzenz, D. (1969). Hierarchical retrieval schemes in recall of categorized word lists. Journal of Verbal Learning and Verbal Behavior, 8, 323–343.

Brooks, L. (1978). Nonanalytic concept formation and memory for instances. In E. Rosch & B. B. Lloyd (Eds.), Cognition and categorization (pp. 169–211). Hillsdale, NJ: Erlbaum.
Brooks, L. R., Norman, G. R., & Allen, S. W. (1991). Role of specific similarity in a medical diagnosis task. Journal of Experimental Psychology: General, 120, 278–287.
Bruce, D., & Weaver, G. E. (1973). Retroactive facilitation in short-term retention of minimally learned paired associates. Journal of Experimental Psychology, 100, 9–17.
Bruner, J. S., Goodnow, J. J., & Austin, G. A. (1956). A study of thinking. New York: John Wiley & Sons.
Chi, M. T. H., Feltovich, P., & Glaser, R. (1981). Categorization and representation of physics problems by experts and novices. Cognitive Science, 5, 121–152.
Dodson, C. S., & Schacter, D. L. (2001). “If I’d said it I would’ve remembered it”: Reducing false memories with a distinctiveness heuristic. Psychonomic Bulletin and Review, 8, 155–161.
Ebbinghaus, H. (1962). Memory: A contribution to experimental psychology. New York: Dover. (Original work published 1885)
Erickson, M. A., & Kruschke, J. K. (1998). Rules and exemplars in category learning. Journal of Experimental Psychology: General, 127, 107–140.
Estes, W. K. (1955). Statistical theory of distributional phenomena in learning. Psychological Review, 62, 369–377.
Finley, J. R., Benjamin, A. S., Hays, M. J., Bjork, R. A., & Kornell, N. (2010). The benefits of accumulating versus diminishing cues in recall. Manuscript under review.
Gallo, D. A. (2006). Associative illusions of memory: False memory research in DRM and related tasks. New York: Psychology Press.
Gallo, D. A., & Roediger, H. L., III. (2002). Variability among word lists in evoking memory illusions: Evidence for associative activation and monitoring. Journal of Memory and Language, 47, 469–497.
Gardiner, J., Craik, F. I. M., & Bleasdale, F. (1973). Retrieval difficulty and subsequent recall. Memory and Cognition, 1, 213–216.
Gentner, D., Rattermann, M. J., & Forbus, K. D. (1993). The roles of similarity in transfer: Separating retrievability from inferential soundness. Cognitive Psychology, 25, 524–575. Gick, M. L., & Holyoak, K. J. (1980). Analogical problem solving. Cognitive Psychology, 12, 306–355. Gick, M. L., & Holyoak, K. J. (1983). Schema induction and analogical transfer. Cognitive Psychology, 15, 1–38. Hintzman, D. L. (1974). Theoretical implications of the spacing effect. In R. L. Solso (Ed.), Theories in cognitive psychology: The Loyola Symposium (pp. 77–99). Potomac, MD: Erlbaum. Hintzman, D. L. (2004). Judgment of frequency vs. recognition confidence: Repetition and recursive reminding. Memory and Cognition, 32, 336–350.


86 • Aaron S. Benjamin and Brian H. Ross

Hintzman, D. L. (2010). How does repetition affect memory? Evidence from judgments of recency. Memory and Cognition, 38, 102–115.
Hintzman, D. L., Summers, J. J., & Block, R. A. (1975). Spacing judgments as an index of study-phase retrieval. Journal of Experimental Psychology: Human Learning and Memory, 1, 31–40.
James, W. (1890). Principles of psychology. New York: Holt and Company.
Karpicke, J. D., & Roediger, H. L. (2008). The critical importance of retrieval for learning. Science, 319, 966–968.
Keppel, G. (1968). Retroactive and proactive inhibition. In T. R. Dixon & D. L. Horton (Eds.), Verbal behavior and general behavior theory (pp. 172–213). Englewood Cliffs, NJ: Prentice Hall.
Kruschke, J. K. (1992). ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99, 22–44.
Landauer, T. K., & Bjork, R. A. (1978). Optimum rehearsal patterns and name learning. In M. M. Gruneberg, P. E. Morris, & R. N. Sykes (Eds.), Practical aspects of memory (pp. 625–632). London: Academic Press.
Love, B. C., Medin, D. L., & Gureckis, T. M. (2004). SUSTAIN: A network model of category learning. Psychological Review, 111, 309–332.
Medin, D. L., & Schaffer, M. M. (1978). A context theory of classification learning. Psychological Review, 85, 207–238.
Melton, A. W. (1970). The situation with respect to the spacing of repetitions and memory. Journal of Verbal Learning and Verbal Behavior, 9, 596–606.
Murphy, G. L. (2002). The big book of concepts. Cambridge, MA: MIT Press.
Newell, A., & Simon, H. A. (1972). Human problem solving. Englewood Cliffs, NJ: Prentice-Hall.
Nosofsky, R. M. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General, 115, 39–57.
Peterson, L. R., Wampler, R., Kirkpatrick, M., & Saltzman, D. (1963). Effect of spacing presentations on retention of a paired associate over short intervals. Journal of Experimental Psychology, 66, 206–209.
Reeves, L. M., & Weisberg, R. W. (1994). The role of content and abstract information in analogical transfer. Psychological Bulletin, 115, 381–400.
Robbins, D., & Bray, J. F. (1974). Repetition effects and retroactive facilitation: Immediate and delayed test performance. Bulletin of the Psychonomic Society, 3, 347–349.
Roediger, H. L., & McDermott, K. B. (1995). Creating false memories: Remembering words not presented in lists. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 803–814.
Rosch, E. (1975). Cognitive representations of semantic categories. Journal of Experimental Psychology: General, 104, 192–233.
Rosch, E., & Mervis, C. B. (1975). Family resemblances: Studies in the internal structure of categories. Cognitive Psychology, 7, 573–605.
Ross, B. H. (1984). Remindings and their effects in learning a cognitive skill. Cognitive Psychology, 16, 371–416.

Ross, B. H., & Bradshaw, G. L. (1994). Encoding effects of remindings. Memory and Cognition, 22, 591–605.
Ross, B. H., & Kennedy, P. T. (1990). Generalizing from the use of earlier examples in problem solving. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 42–55.
Ross, B. H., & Landauer, T. K. (1978). Memory for at least one of two items: Test and failure of several theories of spacing effects. Journal of Verbal Learning and Verbal Behavior, 17, 669–680.
Ross, B. H., & Makin, V. S. (1999). Prototype versus exemplar models. In R. J. Sternberg (Ed.), The nature of cognition (pp. 205–241). Cambridge, MA: MIT.
Ross, B. H., Perkins, S. J., & Tenpenny, P. L. (1990). Reminding-based category learning. Cognitive Psychology, 22, 460–492.
Rumelhart, D. E., & Ortony, A. (1977). The representation of knowledge in memory. In R. C. Anderson, R. J. Spiro, & W. E. Montague (Eds.), Schooling and the acquisition of knowledge (pp. 99–136). Hillsdale, NJ: Erlbaum.
Schank, R. C. (1982). Dynamic memory. Cambridge, UK: Cambridge University Press.
Schank, R. C., & Abelson, R. (1977). Scripts, plans, goals and understanding. Hillsdale, NJ: Erlbaum.
Slamecka, N. J., & Graf, P. (1978). The generation effect: Delineation of a phenomenon. Journal of Experimental Psychology: Human Learning and Memory, 4, 592–604.
Whitten, W. B., & Bjork, R. A. (1977). Learning from tests: The effects of spacing. Journal of Verbal Learning and Verbal Behavior, 16, 465–478.
Woodworth, R. S. (1921). Psychology: A study of mental life. New York: H. Holt.


5

Retrieval-Induced Forgetting and the Resolution of Competition

Benjamin C. Storm

It may seem that remembering and forgetting reflect two ends of a single continuum—that to remember is to avoid forgetting. Yet, in many instances, forgetting plays an essential and adaptive role in our ability to remember (e.g., E. L. Bjork & Bjork, 1988; R. A. Bjork & Bjork, 1992). Without some means of forgetting information that has become outdated or irrelevant, it would be incredibly difficult to remember information that is relevant. One mechanism that appears to afford this adaptive form of forgetting is inhibition (see, e.g., Anderson, 2003; E. L. Bjork, Bjork, & Anderson, 1998; R. A. Bjork, 1989). A given retrieval cue may activate many items in memory, and inhibition acts to decrease the accessibility of nontarget items in order to facilitate access to target items. This inhibition may explain why retrieving some items from memory causes the forgetting of other items in memory, a phenomenon known as retrieval-induced forgetting (Anderson, R. A. Bjork, & Bjork, 1994). It is argued herein that competition is a critical factor in the inhibitory account of retrieval-induced forgetting and, furthermore, that retrieval-induced forgetting is not as much a consequence of retrieval as it is a consequence of the inhibitory processes that resolve competition during retrieval.

The Phenomenon of Retrieval-Induced Forgetting

Retrieval-induced forgetting is often studied using the retrieval practice paradigm (Anderson et al., 1994). In this paradigm, participants first study a list of category-exemplar pairs (e.g., fruit: lemon, profession: accountant, fruit: orange, profession: dentist). Then, during retrieval practice, they retrieve half of the exemplars from half of the categories via category-plus-two-letter-stem retrieval cues (e.g., fruit: le_____). Practiced exemplars are referred to as Rp+ items (i.e., lemon), nonpracticed exemplars from practiced categories are referred to as Rp– items (i.e., orange), and exemplars from nonpracticed categories are referred to as Nrp items (i.e., accountant, dentist). After a brief delay, participants are given a final test for all of the items from the original study phase. As expected, Rp+ items are better recalled than are Nrp items. What is more surprising is that Rp– items are recalled less well than are Nrp items. It is this forgetting of Rp– items relative to Nrp items that is referred to as retrieval-induced forgetting, a finding that has proven to be highly robust and general, emerging in many contexts and with a wide variety of materials (see, e.g., Anderson & Bell, 2001; Bajo, Gómez-Ariza, Fernandez, & Marful, 2006; Carroll, Campbell-Ratcliffe, Murnane, & Perfect, 2007; Ciranni & Shimamura, 1999; Levy, McVeigh, Marful, & Anderson, 2007; Macrae & MacLeod, 1999; Migueles & Garcia-Bajos, 2007; Phenix & Campbell, 2004; Saunders & MacLeod, 2002; Shaw, Bjork, & Handal, 1995; Shivde & Anderson, 2001; Storm, Bjork, & Bjork, 2005).

Theoretical Accounts of Retrieval-Induced Forgetting

The inhibitory account of retrieval-induced forgetting contends that Rp– items are actively inhibited during the retrieval practice of Rp+ items (for reviews, see Anderson, 2003; Levy & Anderson, 2002).
According to this account, forgetting is the consequence of inhibitory processes that act to resolve competition during retrieval. Because both Rp+ items and Rp– items are associated to the same retrieval cue, Rp– items may become activated during retrieval practice and compete with the retrieval of Rp+ items. Inhibition is recruited to suppress the Rp– items, which reduces competition, but also makes those items less recallable on the final test.

The term inhibition has often been used to refer to any empirical demonstration of performance below baseline. In that sense inhibition is nothing more than a description of the data. It is important to emphasize that the term is used here in a much stronger sense. Specifically, we use the definition provided by R. A. Bjork (1989, p. 324), who referred to inhibition as a “suppression-type process directed at the to-be-inhibited information for some adaptive purpose.” In this sense, inhibition is an active and adaptive mechanism that functions with the specific and direct purpose of impairing the accessibility of an item or items in memory. Literally defined, the term retrieval-induced forgetting refers to the weaker meaning of inhibition: that retrieving an item or set of items from memory causes the forgetting of other items in memory, regardless of how or why that forgetting occurs. Unfortunately, the terms inhibition and retrieval-induced forgetting are often used interchangeably. Although suppression-type inhibition may underlie many effects of retrieval-induced forgetting, it may not underlie all effects of retrieval-induced forgetting. In fact, there are many mechanisms by which retrieval can cause forgetting (e.g., retroactive interference, cue overload, part-set cuing, response competition, output interference, and strategy disruption; for a review of how these and other mechanisms might account for retrieval-induced forgetting, see Anderson & Bjork, 1994). A common theme among most noninhibitory accounts is that retrieval practice strengthens the association between Rp+ items and the associated retrieval cues, which has the side effect of occluding, interfering with, or stealing activation away from other items that are also associated with those cues (see, e.g., Anderson, 1983; McGeoch, 1942; Raaijmakers & Shiffrin, 1981; Rundus, 1973).

Evidence for Inhibition

There has been, and still is, a general reluctance to postulate a role for inhibition in memory (R. A. Bjork, 2007). Many researchers argue that retrieval-induced forgetting can be best explained by mechanisms other than inhibition (e.g., Dodd, Castel, & Roberts, 2006; MacLeod, Dodd, Sheard, Wilson, & Bibi, 2003; Williams & Zacks, 2001).
A common argument against the inhibitory account is that forgetting may result from output interference (e.g., Roediger, 1973; A. D. Smith, 1971). That is, Rp– items may be impaired because Rp+ items are output first on the final test, which has the consequence of impairing the recall of Rp– items that have not yet been output. However, by employing category-plus-one-letter-stem retrieval cues (e.g., fruit: l_____), experimenters have been able to control the order in which items are retrieved on the final test, and retrieval-induced forgetting is observed even when output interference is controlled (e.g., Anderson & Bell, 2001; Anderson et al., 1994; Anderson, E. L. Bjork, & Bjork, 2000; Anderson & McCulloch, 1999; Bäuml & Hartinger, 2002; Storm, Bjork, & Bjork, 2007, 2008; Storm & Nestojko, 2010).
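The scoring that underlies these comparisons can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure of any cited study: the function names (`classify`, `recall_by_type`, `rif_effect`) are hypothetical, the recall outcome is invented toy data built from the chapter's example items, and the code writes the Rp– label with an ASCII hyphen as `Rp-`. The effect itself is taken as final-test recall of Nrp items minus final-test recall of Rp– items, following the definition given in the description of the retrieval practice paradigm.

```python
from collections import defaultdict


def classify(category, exemplar, practiced):
    """Label a studied item as "Rp+", "Rp-", or "Nrp".

    `practiced` maps each practiced category to the set of exemplars
    that were retrieved from it during retrieval practice.
    """
    if category not in practiced:
        return "Nrp"  # item from a nonpracticed category
    return "Rp+" if exemplar in practiced[category] else "Rp-"


def recall_by_type(studied, practiced, recalled):
    """Proportion of studied items recalled on the final test, per item type."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for category, exemplar in studied:
        item_type = classify(category, exemplar, practiced)
        totals[item_type] += 1
        if (category, exemplar) in recalled:
            hits[item_type] += 1
    return {item_type: hits[item_type] / totals[item_type] for item_type in totals}


def rif_effect(proportions):
    """Retrieval-induced forgetting: Nrp recall minus Rp- recall."""
    return proportions["Nrp"] - proportions["Rp-"]


# Hypothetical toy data (not results from any study).
studied = [
    ("fruit", "lemon"),
    ("fruit", "orange"),
    ("profession", "accountant"),
    ("profession", "dentist"),
]
practiced = {"fruit": {"lemon"}}
recalled = {
    ("fruit", "lemon"),
    ("profession", "accountant"),
    ("profession", "dentist"),
}

proportions = recall_by_type(studied, practiced, recalled)
print(proportions)
print("RIF effect:", rif_effect(proportions))
```

A positive value indicates that Rp– items were recalled less well than the Nrp baseline. When the final test controls output order with category-plus-one-letter-stem cues, the same subtraction applies; only the way recall is collected changes.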

Another possibility is that retrieval-induced forgetting is caused by strength-based associative interference (Anderson, 1983; McGeoch, 1942; Mensink & Raaijmakers, 1988). Retrieval practice strengthens a subset of items associated to a cue, which may have the side effect of making nonpracticed items less accessible given that cue. This blocking or interference-based account is not supported by evidence that retrieval-induced forgetting is cue independent (Anderson & Spellman, 1995). If retrieval-induced forgetting occurs because retrieval practice strengthens the association between the practiced item and its associated retrieval cue, then testing nonpracticed items using a novel retrieval cue (one that is independent from the cue used during retrieval practice) should prevent practiced items from interfering with the recall of nonpracticed items on the final test. Yet, retrieval-induced forgetting is observed even when independent cues are employed (e.g., Anderson & Bell, 2001; Anderson, Green, & McCulloch, 2000; Anderson & Spellman, 1995; Aslan, Bäuml, & Pastötter, 2007; Johnson & Anderson, 2004; Levy et al., 2007; MacLeod & Saunders, 2005; Radvansky, 1999; but see Camp, Pecher, & Schmidt, 2007; Perfect et al., 2004).

Evidence against the interference account has also come from work showing that strengthening practiced items is neither sufficient nor necessary to cause retrieval-induced forgetting (e.g., Anderson et al., 2000; Bäuml, 2002; Ciranni & Shimamura, 1999; Román, Soriano, Gómez-Ariza, & Bajo, 2009; Saunders, Fernandes, & Kosnes, 2009; Storm, Bjork, Bjork, & Nestojko, 2006; Storm & Nestojko, 2010; but see Verde, 2009). If retrieval-induced forgetting is the consequence of strength-based associative interference, then Rp– items should suffer from interference even if Rp+ items are strengthened by means other than selective retrieval.
Yet, when participants are re-presented Rp+ items for additional study during what would normally be retrieval practice, the strengthening caused by that re-presentation fails to cause the forgetting of Rp– items on the final test. In general, retrieval-induced forgetting is strength independent: the extent to which Rp+ items are strengthened does not predict the magnitude of retrieval-induced forgetting that is observed.

If inhibition does underlie retrieval-induced forgetting, then individuals who have an inhibitory deficit should demonstrate significantly less retrieval-induced forgetting than individuals who do not have an inhibitory deficit. However, such a correlation should only be observed for inhibitory-based effects of retrieval-induced forgetting. Participants with an inhibitory deficit may show normal or even exaggerated levels of retrieval-induced forgetting when forgetting is caused by noninhibitory mechanisms, such as output interference (cf. Anderson & Levy, 2007). For example, Storm and White (2010) administered the retrieval practice paradigm to college students diagnosed with attention-deficit hyperactivity disorder (ADHD), a disorder characterized by a deficit in inhibitory control. Individuals with ADHD failed to demonstrate any retrieval-induced forgetting when output interference was controlled, but demonstrated normal levels of retrieval-induced forgetting when output interference was not controlled (for similar results in schizophrenic patients, see Soriano, Jiménez, Román, & Bajo, 2009). The failure to control for output interference may explain why so many studies have observed normal levels of retrieval-induced forgetting in populations with established inhibitory deficits (e.g., Conway & Fthenaki, 2003; Ford, Keating, & Patel, 2004; Moulin et al., 2002; Nestor et al., 2005; Zellner & Bäuml, 2005).

Evidence that individuals with inhibitory deficits demonstrate significantly less inhibitory-based retrieval-induced forgetting is consistent with the idea that executive control mechanisms underlie retrieval-induced forgetting (e.g., Anderson, 2003, 2005). Román et al. (2009) tested this idea further by having participants engage in one of two concurrent tasks during retrieval practice, either a trial-by-trial updating task or a continuous updating task. Both of these tasks were predicted to overload attentional resources and, therefore, impede the ability to inhibit nontarget competitors. Consistent with the executive control account, participants who engaged in a concurrent task demonstrated significantly less retrieval-induced forgetting than participants who did not engage in a concurrent task.

Finally, neuroimaging work has also supported the inhibitory account (e.g., Johansson, Aslan, Bäuml, Gabel, & Mecklinger, 2007; Kuhl, Dudukovic, Kahn, & Wagner, 2007; Kuhl, Kahn, Dudukovic, & Wagner, 2008; Wimber et al., 2008; for a review, see Kuhl & Wagner, 2009). For example, Kuhl et al.
(2007) measured neural activation in the prefrontal cortex across several blocks of retrieval practice. They reasoned that if inhibition functions to resolve competition, then retrieval on the final block of retrieval practice should be less competitive than retrieval on the first block of retrieval practice; and, owing to this reduction in competition, the neural regions responsible for detecting and resolving competition should be less engaged during the final block of retrieval practice than during the first block of retrieval practice. Kuhl et al. (2007) observed precisely this pattern of results. Prefrontal activity was reduced and, moreover, the extent to which activity was reduced correlated significantly with the amount of retrieval-induced forgetting that was observed on the final test. These and other results fit well with the inhibitory account and suggest that the prefrontal cortex plays a critical role in the inhibitory mechanisms that underlie retrieval-induced forgetting.
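The across-participant relationship just described can be made concrete with a small sketch. It is an illustration under stated assumptions only: the function names are hypothetical, the per-participant numbers are invented placeholders rather than values from Kuhl et al. (2007), and a plain Pearson correlation stands in for whatever analysis the authors actually ran.

```python
import math


def activation_decrease(first_block, final_block):
    """Per-participant drop in activation from the first to the final block."""
    return [first - final for first, final in zip(first_block, final_block)]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Invented placeholder values, one entry per hypothetical participant.
first_block = [1.00, 1.20, 0.90, 1.10]   # activation, first practice block
final_block = [0.80, 0.70, 0.85, 0.90]   # activation, final practice block
rif = [0.10, 0.20, 0.02, 0.08]           # Nrp recall minus Rp- recall

decrease = activation_decrease(first_block, final_block)
print("r =", pearson_r(decrease, rif))
```

On the inhibitory account, a positive correlation is the expected direction: participants whose competition-related activity falls the most across blocks are the ones whose competitors have been suppressed the most.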

Inhibitory Control and the Resolution of Competition

Competition is a critical and defining feature of the inhibitory account of retrieval-induced forgetting. In many ways, evidence that retrieval-induced forgetting is competition dependent represents the most compelling line of evidence supporting the inhibitory account. As with cue independence and strength independence, competition dependence cannot be easily explained by noninhibitory interference-based accounts. However, competition dependence also provides evidence supporting the adaptive function that inhibition is presumed to afford, namely, the resolution of competition. The fundamental premise of the inhibitory account is that there exists competition during retrieval, and inhibition is necessary to resolve that competition. It is competition that sets the stage for inhibition to occur. Nondesired items that are associated with the available set of retrieval cues must be suppressed, set aside, or inhibited, in order to facilitate the retrieval of the particular item or items that are desired. In what follows I review evidence that competition is a necessary condition for retrieval-induced forgetting and that inhibition functions to resolve that competition.

Competition as a Necessary Condition for Retrieval-Induced Forgetting

According to the inhibitory account of retrieval-induced forgetting, inhibition is elicited to suppress nontarget items that compete with the retrieval of target items. This account predicts that if an item does not compete with retrieval practice, then that item should not be inhibited and, therefore, should not suffer from retrieval-induced forgetting. The few studies that have manipulated competition during retrieval practice have generally supported this prediction, as it is the items that compete most that are the most susceptible to forgetting (e.g., Anderson et al., 1994; Shivde & Anderson, 2001; Storm et al., 2007; however, see also Williams & Zacks, 2001).
Anderson et al. (1994) provided the first and most widely cited example of competition dependence. In their study, the taxonomic strength of both practiced and nonpracticed items was manipulated. Whereas nonpracticed items of high taxonomic strength (e.g., orange, lemon) suffered from substantial effects of retrieval-induced forgetting, nonpracticed items of low taxonomic strength (e.g., fig, guava) suffered from significantly less retrieval-induced forgetting. Similar results were observed by Shivde and Anderson (2001), who found that retrieval practice of the subordinate meaning of a homograph caused the forgetting of the dominant meaning, whereas retrieval practice of the dominant meaning did not cause the forgetting of the subordinate meaning. In both experiments, it was the items that were most likely to intrude or compete with retrieval practice that were the most susceptible to retrieval-induced forgetting. Because weak exemplars and subordinate meanings were unlikely to become activated during retrieval practice, there was no need for them to be inhibited and, therefore, no reason for them to suffer retrieval-induced forgetting.

Evidence of competition dependence is problematic for interference-based accounts. If retrieval-induced forgetting were the consequence of interference or blocking at the time of test, then the items strengthened by retrieval practice should have interfered with the recall of weaker items to at least the same extent as they interfered with the recall of stronger items (e.g., Raaijmakers & Shiffrin, 1981).

Storm et al. (2007) reasoned that if retrieval-induced forgetting is competition dependent, then leading participants to believe that they would not be tested on certain information would ironically spare that information from forgetting. Research on directed forgetting has shown that cuing participants to forget an initially presented list of items reduces the extent to which those items proactively interfere with the learning and recall of a new list of items (Bjork, 1970; MacLeod, 1998). If proactive interference is reduced by instructions to forget, then items from to-be-forgotten lists should be less likely to interfere with subsequent retrieval practice and, therefore, less likely to be targeted by inhibition. Consistent with this prediction, Storm et al. (2007) found that items from to-be-remembered lists suffered from a substantial effect of retrieval-induced forgetting, whereas items from to-be-forgotten lists did not suffer from any retrieval-induced forgetting.
An important aspect of Storm et al.’s (2007) study is that the degree of competition that occurred during retrieval practice was manipulated without using different sets of items. The same items were shown to suffer or not suffer from retrieval-induced forgetting, depending on whether an instruction to remember or forget was given prior to retrieval practice. This distinction is important. It suggests that items that are not susceptible to retrieval-induced forgetting under some conditions may become susceptible under other conditions. For example, although weak items may not compete during retrieval practice under normal conditions, providing sufficient exposure to such items may very well cause them to compete during retrieval practice and therefore be targeted by inhibitory control.

Indirect evidence of competition dependence can be seen throughout research on retrieval-induced forgetting. Whether forgetting is or is not observed in a given study can often be attributed to whether there was competition during retrieval practice. In fact, many of the boundary conditions that constrain retrieval-induced forgetting are directly related to competition. Take, for example, the demonstration that the retrieval of one item fails to cause the forgetting of another item if those two items are well integrated, either due to encoding instructions or to the nature of the materials (e.g., Anderson et al., 2000; Anderson & McCulloch, 1999; Bäuml & Hartinger, 2002; Chan, McDermott, & Roediger, 2006; Migueles & Garcia-Bajos, 2007). Integration has been shown to reduce competition between items associated to the same retrieval cue (e.g., Postman, 1971; Radvansky & Zacks, 1991), and in doing so, integration may effectively reduce or eliminate the need for inhibition. Another factor that has been argued to allay competition is item-specific or distinctive processing (R. E. Smith & Hunt, 2000), which may explain why retrieval-induced forgetting fails to emerge under conditions that promote such processing, such as when one is under stress (Koessler, Engler, Riether, & Kissler, 2009) or in a negative mood (Bäuml & Kuhbandner, 2007).

Finally, it is important to emphasize that only inhibitory-based effects of retrieval-induced forgetting should be competition dependent. Take, for example, the failure of Williams and Zacks (2001) to replicate Anderson et al.’s (1994) finding that exemplars of strong taxonomic strength are more susceptible to retrieval-induced forgetting than exemplars of weak taxonomic strength. As with many studies of retrieval-induced forgetting, Williams and Zacks employed a category-cued final test that failed to control for output interference.
Whereas the final recall for items of strong taxonomic strength may have been impaired as a consequence of inhibition during retrieval practice, the final recall for items of weak taxonomic strength may have been impaired as a consequence of output interference on the final test.

Unsuccessful Retrieval, Successful Forgetting

If retrieval-induced forgetting is caused by inhibitory processes that act to resolve competition, then whether retrieval eventually succeeds or fails should not determine whether retrieval-induced forgetting occurs. And, consistent with this assertion, retrieval-induced forgetting is observed even when participants fail to retrieve anything during retrieval practice (Storm, Bjork, Bjork, & Nestojko, 2006; Storm & Nestojko, 2010). Storm and colleagues had participants study a list of category-exemplar pairs and then engage in retrieval practice that consisted of category-plus-stem cues that either did or did not represent the initial letters of any associated exemplar. This manipulation effectively dictated whether retrieval practice could or could not be successful. Retrieval-induced forgetting was observed in both cases, and importantly, the size of the effect did not differ for exemplars associated with categories that had received possible retrieval practice versus exemplars associated with categories that had received impossible retrieval practice.

The above pattern of results is very difficult for noninhibitory accounts to explain. If nothing is retrieved during retrieval practice, then nothing is strengthened and nothing should interfere with the retrieval of nonpracticed items on the final test. According to the inhibitory account, however, inhibition acts to resolve competition, and the consequences of that inhibition should be observed even if retrieval is ultimately unsuccessful.

Some might argue that something is still being strengthened by impossible retrieval practice. For example, participants may fail to come up with a viable response, but they may still be coming up with a response, and, even if not viable, that response may interfere on the final test. Two considerations make this possibility unlikely. First, in none of the five experiments that employed impossible retrieval practice did participants who made more responses (regardless of the appropriateness of their responses) demonstrate more retrieval-induced forgetting than participants who made fewer responses. Second, if any items were activated or covertly retrieved during impossible retrieval practice, it would likely have been the items that participants had just previously studied. Thus, nonpracticed items would seem more likely to benefit from impossible retrieval practice than be impaired by it.

Researchers have generally assumed that retrieval-induced forgetting is retrieval specific. That unsuccessful retrieval also causes forgetting supports and refines this assumption. A more accurate characterization of inhibition-based retrieval-induced forgetting is that it is competition specific.
It is the competition that arises during retrieval that sets the stage for inhibition to occur, not the retrieval per se. It is ironic that researchers have often been so careful to design studies in such a way that fosters high levels of retrieval practice success. The irony is that by making retrieval practice easier, researchers may have unwittingly made retrieval practice less competitive and, therefore, less likely to result in inhibitory-based retrieval-induced forgetting.

Overcoming Competition During Semantic Generation and Mental Imagery

In most studies of retrieval-induced forgetting, retrieval practice is episodic in nature; that is, participants must retrieve specific items from an earlier phase of the experiment (i.e., the study phase). However, inhibition may be recruited to resolve competition in many situations where one must bypass inappropriate responses in order to select, retrieve, or generate a weaker, yet desirable, response. For example, if participants are guided to selectively generate items from semantic memory during retrieval practice, that semantic generation leads to just as much forgetting as does the typical episodic-based retrieval practice (e.g., Bäuml, 2002; Johnson & Anderson, 2004; Storm et al., 2006, 2007, 2008; Storm & Nestojko, 2010).

A recent example of a nonretrieval task leading to an effect similar to that of retrieval-induced forgetting is reported by Saunders et al. (2009). During what would normally be retrieval practice, Saunders et al. presented intact category-exemplar pairs from a subset of categories and asked participants to generate mental images of those items. Across several trials, participants generated images focusing on different aspects of each item (e.g., shape, color, size, sound, and texture). This imagery task led to an exceptionally large effect of imagery-induced forgetting. Nonvisualized items from visualized categories were recalled significantly less than were nonvisualized items from nonvisualized categories. Perhaps most impressive was the magnitude of the effect. Normally, when category-plus-stem cued-recall tests are employed, retrieval-induced forgetting effects do not surpass 15%, yet in their Experiment 2, which also employed a category-plus-stem cued-recall test, a 33% effect was observed.

Saunders et al. (2009) argued that the imagery-induced forgetting effect was a consequence of inhibition. Generating mental imagery requires access to semantic knowledge, and as reviewed above, retrieval from semantic memory can cause retrieval-induced forgetting. One possible explanation for the impressive size of their effect may lie in the nature of the imagery task.
Increasing the number of retrieval practice trials has been shown to increase the magnitude of retrieval-induced forgetting (e.g., Johnson & Anderson, 2004; Levy et al., 2007; Storm et al., 2008), and this increase may be amplified when the nature or target of retrieval/generation varies across each trial. Normally, when retrieval practice is repeated, the task becomes progressively easier (noncompetitive) as Rp+ items become more accessible owing to the benefits of retrieval practice. Forcing participants to generate imagery related to different aspects of each item may have increased competition during later practice trials, thereby increasing the need for inhibition. Overcoming Fixation in Creative Problem Solving Inhibition is generally assumed to stifle creativity, an assumption stemming from observations that individuals who are the least capable of inhibiting their thoughts and actions are often the most creative (e.g.,

Y109937.indb 98

10/15/10 11:03:37 AM

Retrieval-Induced Forgetting and the Resolution of Competition • 99

Carson, Peterson, & Higgins, 2003; Eysenck, 1995; Martindale, 1999). However, there are conditions under which inhibition may have the power to enhance creative cognition. The difficulty in many creative tasks lies in the constraining influences of old and inappropriate ideas. Such ideas can cause mental fixation, thus impeding the generation or retrieval of new and creative ideas (see S. M. Smith, 2003). Inhibition may facilitate creative problem solving by decreasing the accessibility of strong, yet inappropriate solutions, thereby facilitating access to novel and creative solutions. In other words, inhibition may provide problem solvers a means by which to overcome fixation and achieve a creative solution.

Storm and Angello (in press) tested this idea by measuring retrieval-induced forgetting and correlating that measure with performance on a task commonly used to study creative problem solving, the Remote Associates Test (RAT; Mednick, 1962). To solve a given RAT problem, participants must generate a common associate to each of three cue words (e.g., manners, tennis, and round: solution is table), which can be difficult because the strongest associates to each cue word (e.g., polite, ball, and square, respectively) often bear little or no relationship to the other cue words. Once activated, however, these inappropriate associates can cause mental fixation, thereby interfering with the generation of novel and appropriate associates (S. M. Smith & Blankenship, 1991; Wiley, 1998). Storm and Angello manipulated the extent to which each participant experienced mental fixation by exposing half of the participants to misleading associates prior to solving a series of RAT problems. They reasoned that if the mechanism underlying retrieval-induced forgetting functions to resolve competition, then individuals who demonstrate more retrieval-induced forgetting should also demonstrate a superior ability to overcome competition created by exposure to the misleading associates.
Overall, participants who were exposed to the misleading associates performed worse than participants who were not exposed to the misleading associates. However, the degree to which participants suffered from this exposure was moderated by individual differences in retrieval-induced forgetting. Participants who demonstrated the most retrieval-induced forgetting were less affected by exposure than were participants who demonstrated the least retrieval-induced forgetting. Said differently, the more a participant demonstrated retrieval-induced forgetting, the less that participant suffered from fixation during RAT problem solving. This effect became more pronounced as participants continued to try to solve the problems. After 18 minutes of attempted problem solving, participants who demonstrated the least retrieval-induced forgetting suffered from a 21% fixation effect, whereas participants who demonstrated the most retrieval-induced forgetting suffered from only a 2% fixation effect.

These results provide a stunning demonstration of how the mechanisms underlying retrieval-induced forgetting function to resolve competition—not only during retrieval, but in the context of creative problem solving as well. They also provide a new type of evidence for the inhibitory account. If retrieval-induced forgetting was caused by interference, then individuals who demonstrated more retrieval-induced forgetting should have suffered from more interference, not less interference, while solving the RAT problems. Only the inhibitory account predicts that individuals who suffer from more retrieval-induced forgetting should be better able to overcome fixation.
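The moderation pattern described above can be made concrete with a small arithmetic sketch. It assumes that a fixation effect is the difference, in percentage points, between the problems solved without and with exposure to misleading associates; the text does not spell out this definition. The solve rates below are hypothetical placeholders chosen only so that the two reported effects (21% and 2%) come out; they are not data from Storm and Angello, and the function name is likewise illustrative.

```python
# Hypothetical solve rates (percent of RAT problems solved). Only the two
# resulting fixation effects (21 and 2 points) are taken from the text.
solve_rates = {
    "least RIF": {"not_exposed": 50, "exposed": 29},
    "most RIF": {"not_exposed": 50, "exposed": 48},
}


def fixation_effect(rates):
    """Assumed definition: percentage-point cost of exposure to misleading associates."""
    return rates["not_exposed"] - rates["exposed"]


effects = {group: fixation_effect(rates) for group, rates in solve_rates.items()}
for group, effect in effects.items():
    print(f"{group}: {effect}-point fixation effect")

# Moderation: how much smaller the fixation effect is in the high-RIF group.
moderation = effects["least RIF"] - effects["most RIF"]
print(f"difference between groups: {moderation} points")
```

Under an interference account the sign of this difference would be expected to reverse, which is why the direction of the comparison matters more than the placeholder rates.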

Conclusion

Inhibitory-based effects of retrieval-induced forgetting are competition dependent, meaning that nonpracticed items only suffer from retrieval-induced forgetting to the extent that they compete with retrieval practice. In this sense, retrieval-induced forgetting is not a by-product of retrieval; it is the consequence of adaptive inhibitory processes that act to resolve competition during retrieval. This inhibition is believed to reflect executive control mechanisms that provide flexible control over memory by resolving competition in whatever form it is encountered, whether it is during episodic retrieval, semantic generation, or creative problem solving. Even competition that arises during unsuccessful retrieval attempts is sufficient to elicit the inhibition of competing items. These findings provide compelling evidence for the inhibitory account of retrieval-induced forgetting and, more generally, demonstrate the important role that inhibition plays in resolving competition in memory.

References

Anderson, J. R. (1983). The architecture of cognition. Cambridge, MA: Harvard University Press.
Anderson, M. C. (2003). Rethinking interference theory: Executive control and the mechanisms of forgetting. Journal of Memory and Language, 49, 415–445.
Anderson, M. C. (2005). The role of inhibitory control in forgetting unwanted memories: A consideration of three methods. In C. MacLeod & B. Uttl (Eds.), Dynamic cognitive processes (pp. 159–190). Tokyo: Springer-Verlag.
Anderson, M. C., & Bell, T. (2001). Forgetting our facts: The role of inhibitory processes in the loss of propositional knowledge. Journal of Experimental Psychology: General, 130, 544–570.
Anderson, M. C., Bjork, E. L., & Bjork, R. A. (2000). Retrieval-induced forgetting: Evidence for a recall-specific mechanism. Psychonomic Bulletin and Review, 7, 522–530.
Anderson, M. C., & Bjork, R. A. (1994). Mechanisms of inhibition in long-term memory: A new taxonomy. In D. Dagenbach & T. Carr (Eds.), Inhibitory processes in attention, memory and language (pp. 265–326). New York: Academic Press.
Anderson, M. C., Bjork, R. A., & Bjork, E. L. (1994). Remembering can cause forgetting: Retrieval dynamics in long-term memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 1063–1087.
Anderson, M. C., Green, C., & McCulloch, K. C. (2000). Similarity and inhibition in long-term memory: Evidence for a two-factor model. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 1141–1159.
Anderson, M. C., & Levy, B. J. (2007). Theoretical issues in inhibition: Insights from research on human memory. In D. Gorfein & C. MacLeod (Eds.), Inhibition in cognition (pp. 81–102). Washington, DC: American Psychological Association.
Anderson, M. C., & McCulloch, K. C. (1999). Integration as a general boundary condition on retrieval-induced forgetting. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 608–629.
Anderson, M. C., & Spellman, B. A. (1995). On the status of inhibitory mechanisms in cognition: Memory retrieval as a model case. Psychological Review, 102, 68–100.
Aslan, A., Bäuml, K.-H., & Pastötter, B. (2007). No inhibitory deficit in older adults’ episodic memory. Psychological Science, 18, 72–78.
Bajo, M. T., Gómez-Ariza, C. J., Fernandez, A., & Marful, A. (2006). Retrieval-induced forgetting in perceptually driven memory tests. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 1185–1194.
Bäuml, K.-H. (2002). Semantic generation can cause episodic forgetting. Psychological Science, 13, 356–360.
Bäuml, K.-H., & Hartinger, A. (2002). On the role of item similarity in retrieval-induced forgetting. Memory, 10, 215–224.
Bäuml, K.-H., & Kuhbandner, C. (2007). Remembering can cause forgetting—but not in negative moods. Psychological Science, 18, 111–115.
Bjork, E. L., & Bjork, R. A. (1988). On the adaptive aspects of retrieval failure in autobiographical memory. In M. M. Gruneberg, P. E. Morris, & R. N. Sykes (Eds.), Practical aspects of memory: Current research and issues: Memories in everyday life (Vol. 1, pp. 283–288). London: Wiley.
Bjork, E. L., Bjork, R. A., & Anderson, M. C. (1998). Varieties of goal directed forgetting. In J. M. Golding & C. M. MacLeod (Eds.), Intentional forgetting: Interdisciplinary approaches (pp. 103–137). Hillsdale, NJ: Erlbaum.
Bjork, R. A. (1970). Positive forgetting: The noninterference of items intentionally forgotten. Journal of Verbal Learning and Verbal Behavior, 9, 255–268.
Bjork, R. A. (1989). Retrieval inhibition as an adaptive mechanism in human memory. In H. L. Roediger & F. I. M. Craik (Eds.), Varieties of memory and consciousness: Essays in honour of Endel Tulving (pp. 309–330). Hillsdale, NJ: Erlbaum.
Bjork, R. A. (2007). Inhibition: An essential and contentious concept. In H. L. Roediger, Y. Dudai, & S. M. Fitzpatrick (Eds.), Science of memory: Concepts (pp. 307–313). Oxford, UK: Oxford University Press.
Bjork, R. A., & Bjork, E. L. (1992). A new theory of disuse and an old theory of stimulus fluctuation. In A. Healy, S. Kosslyn, & R. Shiffrin (Eds.), From learning processes to cognitive processes: Essays in honor of William K. Estes (Vol. 2, pp. 35–67). Hillsdale, NJ: Lawrence Erlbaum Associates.
Camp, G., Pecher, D., & Schmidt, H. G. (2007). No retrieval-induced forgetting using item-specific independent cues: Evidence against a general inhibitory account. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33, 950–958.
Carroll, M., Campbell-Ratcliffe, J., Murnane, H., & Perfect, T. (2007). Retrieval-induced forgetting in educational contexts: Monitoring, expertise, text integration, and test format. European Journal of Cognitive Psychology, 19, 580–606.
Carson, S. H., Peterson, J. B., & Higgins, D. M. (2003). Decreased latent inhibition is associated with increased creative achievement in high-functioning individuals. Journal of Personality and Social Psychology, 85, 499–506.
Chan, J. C. K., McDermott, K. B., & Roediger, H. L. (2006). Retrieval-induced facilitation: Initially nontested materials can benefit from prior testing of related material. Journal of Experimental Psychology: General, 135, 553–571.
Ciranni, M., & Shimamura, A. P. (1999). Retrieval-induced forgetting in episodic memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 1403–1414.
Conway, M. A., & Fthenaki, A. (2003). Disruption of inhibitory control of memory following lesions to the frontal and temporal lobes. Cortex, 39, 667–686.
Dodd, M. D., Castel, A. D., & Roberts, K. E. (2006). A strategy disruption component to retrieval-induced forgetting. Memory and Cognition, 34, 102–111.
Eysenck, H. (1995). Genius: The natural history of creativity. Cambridge, UK: Cambridge University Press.
Ford, R. M., Keating, S., & Patel, R. (2004). Retrieval-induced forgetting: A developmental study. British Journal of Developmental Psychology, 22, 585–603.
Johansson, M., Aslan, A., Bäuml, K.-H., Gäbel, A., & Mecklinger, A. (2007). When remembering causes forgetting: Electrophysiological correlates of retrieval-induced forgetting. Cerebral Cortex, 17, 548–560.
Johnson, S. K., & Anderson, M. C. (2004). The role of inhibitory control in forgetting semantic knowledge. Psychological Science, 15, 448–453.
Koessler, S., Engler, H., Riether, C., & Kissler, J. (2009). No retrieval-induced forgetting under stress. Psychological Science, 20, 1356–1363.
Kuhl, B. A., Dudukovic, N. M., Kahn, I., & Wagner, A. D. (2007). Decreased demands on cognitive control reveal the neural processing benefits of forgetting. Nature Neuroscience, 10, 908–914.
Kuhl, B. A., Kahn, I., Dudukovic, N. M., & Wagner, A. D. (2008). Overcoming suppression in order to remember: Contributions from anterior cingulate and ventrolateral prefrontal cortex. Cognitive, Affective, and Behavioral Neuroscience, 8, 211–221.
Kuhl, B. A., & Wagner, A. D. (2009). Forgetting and retrieval. In G. G. Berntson & J. T. Cacioppo (Eds.), Handbook of neurosciences for the behavioral sciences. Hoboken, NJ: John Wiley & Sons.
Levy, B. J., & Anderson, M. C. (2002). Inhibitory processes and the control of memory retrieval. Trends in Cognitive Sciences, 6, 299–305.
Levy, B. J., McVeigh, N. D., Marful, A., & Anderson, M. C. (2007). Inhibiting your native language: The role of retrieval-induced forgetting during second language acquisition. Psychological Science, 18, 29–34.
MacLeod, C. M. (1998). Directed forgetting. In J. M. Golding & C. M. MacLeod (Eds.), Intentional forgetting: Interdisciplinary approaches (pp. 1–57). Mahwah, NJ: Lawrence Erlbaum Associates.
MacLeod, C. M., Dodd, M. D., Sheard, E. D., Wilson, D. E., & Bibi, U. (2003). In opposition to inhibition. In B. H. Ross (Ed.), The psychology of learning and motivation (Vol. 43, pp. 163–214). San Diego, CA: Academic Press.
MacLeod, M. D., & Saunders, J. (2005). The role of inhibitory control in the production of misinformation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 964–979.
Macrae, C. N., & MacLeod, M. D. (1999). On recollections lost: When practice makes imperfect. Journal of Personality and Social Psychology, 77, 463–473.
Martindale, C. (1999). Biological bases of creativity. In R. J. Sternberg (Ed.), Handbook of creativity (pp. 137–152). New York: Cambridge University Press.
McGeoch, J. A. (1942). The psychology of human learning. New York: Longmans & Green.
Mednick, S. A. (1962). The associative basis of the creative process. Psychological Review, 69, 220–232.
Mensink, G. J. M., & Raaijmakers, J. G. W. (1988). A model of interference and forgetting. Psychological Review, 95, 434–455.
Migueles, E., & Garcia-Bajos, E. (2007). Selective retrieval and induced forgetting in eyewitness memory. Applied Cognitive Psychology, 21, 1157–1172.
Moulin, C. J. A., Perfect, T. J., Conway, M. A., North, A. S., Jones, R. W., & James, N. (2002). Retrieval-induced forgetting in Alzheimer’s disease. Neuropsychologia, 40, 862–867.
Nestor, P. G., Piech, R., Allen, C., Niznikiewicz, M., Shenton, M., & McCarley, R. W. (2005). Retrieval-induced forgetting in schizophrenia. Schizophrenia Research, 75, 199–209.
Perfect, T. J., Stark, L.-J., Tree, J. J., Moulin, C. J. A., Ahmed, L., & Hutter, R. (2004). Transfer appropriate forgetting: The cue-dependent nature of retrieval-induced forgetting. Journal of Memory and Language, 51, 399–417.
Phenix, T. L., & Campbell, J. I. D. (2004). Effects of multiplication practice on product verification: Integrated structures model or retrieval-induced forgetting? Memory and Cognition, 32, 324–335.
Postman, L. (1971). Transfer, interference and forgetting. In J. W. Kling & L. A. Riggs (Eds.), Woodworth and Schlosberg’s experimental psychology (3rd ed., pp. 1019–1132). New York: Holt, Rinehart & Winston.
Raaijmakers, J. G. W., & Shiffrin, R. M. (1981). Search of associative memory. Psychological Review, 88, 93–134.
Radvansky, G. A. (1999). Memory retrieval and suppression: The inhibition of situation models. Journal of Experimental Psychology: General, 128, 563–579.
Radvansky, G. A., & Zacks, R. T. (1991). Mental models and the fan effect. Journal of Experimental Psychology: Learning, Memory, and Cognition, 17, 940–953.
Roediger, H. L. (1973). Inhibition in recall from cueing with recall targets. Journal of Verbal Learning and Verbal Behavior, 12, 644–657.
Román, P., Soriano, M. F., Gómez-Ariza, C. J., & Bajo, M. T. (2009). Retrieval-induced forgetting and executive control. Psychological Science, 20, 1053–1058.
Rundus, D. (1973). Negative effects of using list items as recall cues. Journal of Verbal Learning and Verbal Behavior, 12, 43–53.
Saunders, J., Fernandes, M., & Kosnes, L. (2009). Retrieval-induced forgetting and mental imagery. Memory and Cognition, 37, 819–828.
Saunders, J., & MacLeod, M. D. (2002). New evidence on the suggestibility of memory: The role of retrieval-induced forgetting in misinformation effects. Journal of Experimental Psychology: Applied, 8, 127–142.
Shaw, J. S., Bjork, R. A., & Handal, A. (1995). Retrieval-induced forgetting in an eyewitness-memory paradigm. Psychonomic Bulletin and Review, 2, 249–253.
Shivde, G., & Anderson, M. C. (2001). The role of inhibition in meaning selection: Insights from retrieval-induced forgetting. In D. S. Gorfein (Ed.), On the consequences of meaning selection: Perspectives on resolving lexical ambiguity (pp. 175–190). Washington, DC: American Psychological Association.
Smith, A. D. (1971). Output interference and organized recall from long-term memory. Journal of Verbal Learning and Verbal Behavior, 10, 400–408.
Smith, R. E., & Hunt, R. R. (2000). The influence of distinctive processing on retrieval-induced forgetting. Memory and Cognition, 28, 503–508.
Smith, S. M. (2003). The constraining effects of initial ideas. In P. Paulus & B. Nijstad (Eds.), Group creativity: Innovation through collaboration (pp. 15–31). Oxford, UK: Oxford University Press.
Smith, S. M., & Blankenship, S. E. (1991). Incubation and the persistence of fixation in problem solving. American Journal of Psychology, 104, 61–87.
Soriano, M. F., Jiménez, J. F., Román, P., & Bajo, M. T. (2009). Inhibitory processes in memory are impaired in schizophrenia: Evidence from retrieval-induced forgetting. British Journal of Psychology, 100, 661–673.
Storm, B. C., & Angello, G. (In press). Overcoming fixation: Creative problem solving and retrieval-induced forgetting. Psychological Science.
Storm, B. C., Bjork, E. L., & Bjork, R. A. (2005). Social metacognitive judgments: The role of retrieval-induced forgetting in person memory and impressions. Journal of Memory and Language, 52, 535–550.
Storm, B. C., Bjork, E. L., & Bjork, R. A. (2007). When intended remembering leads to unintended forgetting. Quarterly Journal of Experimental Psychology, 60, 909–915.
Storm, B. C., Bjork, E. L., & Bjork, R. A. (2008). Accelerated relearning after retrieval-induced forgetting: The benefit of being forgotten. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34, 230–236.
Storm, B. C., Bjork, E. L., Bjork, R. A., & Nestojko, J. F. (2006). Is retrieval success a necessary condition for retrieval-induced forgetting? Psychonomic Bulletin and Review, 13, 1023–1027.
Storm, B. C., & Nestojko, J. F. (2010). Successful inhibition, unsuccessful retrieval: Manipulating time and success during retrieval practice. Memory, 18, 99–114.
Storm, B. C., & White, H. A. (2010). ADHD and retrieval-induced forgetting: Evidence for a deficit in the inhibitory control of memory. Memory, 18, 265–271.
Verde, M. F. (2009). The list-strength effect in recall: Relative-strength competition and retrieval inhibition may both contribute to forgetting. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35, 205–220.
Wiley, J. (1998). Expertise and mental set: The effects of domain knowledge in creative problem solving. Memory and Cognition, 26, 716–730.
Williams, C. C., & Zacks, R. T. (2001). Is retrieval-induced forgetting an inhibitory process? American Journal of Psychology, 114, 329–354.
Wimber, M., Bäuml, K.-H., Bergström, Z., Markopoulos, G., Heinze, H.-J., & Richardson-Klavehn, A. (2008). Neural markers of inhibition in human memory retrieval. Journal of Neuroscience, 28, 13419–13427.
Zellner, M., & Bäuml, K.-H. (2005). Intact retrieval inhibition in children’s episodic recall. Memory and Cognition, 33, 396–404.

6

On the Relationship Between Interference and Inhibition in Cognition
Michael C. Anderson and Benjamin J. Levy

There is a moment that each of us knows all too well. This moment arises when, by chance, we encounter something that reminds us of an experience we would rather not think about. For some, this might occur when opening a drawer to see an object given to us by a loved one that has been lost due to death or a broken relationship. For others—such as for veterans returning from combat—existence may be an ongoing battle against their own memories, and against intrusive recollections of wartime experiences. Whatever the trigger, this moment elicits certain key experiences: an abrupt feeling of alarm and arousal, followed quickly by an effort to shut down retrieval, regain footing in our mental landscape, and redirect attention to more profitable goals. Even neutral memories can be too accessible; anyone who has ever gone to yesterday’s parking spot, used an outdated bank code, or called out to one’s previous spouse instead of one’s current one knows well the trouble such interloping knowledge can induce. Thus, one of the most innocent assumptions one might make about memory—that remembering is good and forgetting is bad—is, in fact, often untrue. More often than we realize, forgetting is our goal, and remembering is the human frailty.

Given that forgetting can at times be a positive goal, it becomes important to address how we accomplish it, when successful. Although
there are likely to be a variety of ways people limit the accessibility of unwanted memories, there has been considerable emphasis on the role of inhibition in accomplishing this function. Inhibition refers to a hypothesized control mechanism that reduces the level of activation associated with a trace, rendering it less accessible to ongoing cognition. The existence of inhibitory processes has been hypothesized in many different domains of human cognition, including attention, memory, language, and motor action (see Dagenbach & Carr, 1994; Dempster & Brainerd, 1995; Gorfein & MacLeod, 2007). In memory research, the inhibition view hypothesizes that when the activation associated with a trace is disruptive, inhibition may be engaged to reduce this unwanted accessibility. The persistence of this inhibition is thought to induce forgetting. Thus, inhibition may be one instrumental process in achieving adaptive or functional forgetting. In this chapter, we consider inhibition as a response to unwanted accessibility. First, we review the evidence for a relationship between the degree of interference caused by a trace and inhibitory aftereffects observed on later retention tests for that trace. This evidence is highly consistent with the idea that inhibitory mechanisms are triggered in response to intrusive memories that might otherwise disrupt our current goals. As such, inhibition appears to be a key mechanism for achieving functional forgetting. We then consider the theoretical relationship between interference and inhibition, and what this relationship predicts about the expected aftereffects of inhibition. We introduce the concept of a demand-success trade-off, a crucial factor complicating the measurement of inhibition. Demand-success trade-offs have important consequences for testing theoretical models of inhibition as well as theories that posit inhibitory deficits in different populations. 
These trade-offs are not unique to memory research, but affect any domain concerned with inhibition. By drawing attention to this issue, we hope to prevent confusion and theoretical controversy in the literature on inhibition, and improve the assessment of inhibitory deficits in special populations.

Evidence for a Role of Inhibition in Countering Unwanted Accessibility

Though we may often complain about our memories, they can be surprisingly efficient. A stimulus can remind us of a related experience with little effort or intention. As useful as this ability is, it is less helpful when our goal is to think of something other than the first memory that leaps to mind. Sometimes, unwanted retrievals arise while we are retrieving something else, impeding the recall of the desired memory. Other times, we don’t mean to recall anything at all; automatic retrieval in response to an unexpected reminder may elicit unpleasant or otherwise distracting memories. When the most accessible trace is unwelcome, we must engage a control process to overcome interference posed by these overly zealous memories simply to reorient to the desired task (see Figure 6.1). A core claim of the adaptive forgetting view is that these situations—when accessibility is disruptive—trigger inhibitory processes that suppress the intruding representation, diminishing its accessibility, helping us to regain control over our thoughts. In this section, we describe studies examining this key assumption of the adaptive forgetting hypothesis in two situations: selective retrieval and retrieval stopping.

[Figure 6.1 diagram: Typical Response Override Situation (e.g., Stroop or Go/No-Go tasks). A stimulus linked to a prepotent response and to a weaker, contextually appropriate response.]

Figure 6.1 A typical response override situation. In this figure, a stimulus is associated with two responses, one of which is stronger (prepotent), and the other of which is weaker (dotted line). Response override occurs whenever one needs either to select the weaker, but more contextually appropriate, response, or simply to stop the prepotent response from occurring. Inhibitory control is thought to achieve response override by suppressing activation of the prepotent response. This basic situation describes many paradigms in research on executive control, including the Stroop and go/no-go tasks.

Interference-Dependence During Selective Retrieval

In selective retrieval situations, the goal is to recall a particular memory given a cue that activates other competing traces. To the extent that nontarget memories interfere with retrieval of the target, inhibitory processes ought to be engaged. The role of inhibitory control during
selective retrieval is often studied using the retrieval practice procedure (Anderson, Bjork, & Bjork, 1994). In a typical version of this paradigm participants study multiple exemplars of several categories (e.g., fruits-banana, fruits-lemon, tools-screwdriver, tools-hammer). A retrieval practice phase follows in which participants are cued to recall some of the exemplars from some of the categories (e.g., fruits-ba______). Afterwards, they receive a final test in which they are cued to recall all studied items. Importantly, some of the originally studied categories do not appear during the retrieval practice phase, and thus serve as a baseline for how memorable items should be on this final test, given that they have received no intervening practice. As one might expect, the practiced items are recalled more often than baseline items, confirming that retrieval practice is effective at boosting memory (Bjork, 1975).

But what happens to unpracticed exemplars in practiced categories (e.g., fruits-lemon)? Our earlier analysis suggests that the category cues during retrieval practice may remind subjects not only of the desired item, but also of the other exemplars. Thus, these unpracticed items may interfere during retrieval practice, triggering inhibition as a means of resolving competition. If so, these items may be harder to recall on the final test than the baseline items. In fact, this pattern is typically observed in the retrieval practice paradigm (see Figure 6.2). This memory deficit, known as retrieval-induced forgetting (RIF), is consistent with an inhibitory aftereffect induced by selective retrieval that undermines retention of competing items (for a more extensive review of RIF, see Chapter 5, this volume).

[Figure 6.2 diagram. Practiced category (Fruits): Banana 73%, Lemon 38%. Unpracticed category (Tools): Screwdriver 50%, Hammer 50%.]

Figure 6.2 A standard categorical RIF study. Illustrated here are two items from each of two categories that subjects have studied (typically six items are studied from eight categories). In this example, subjects perform retrieval practice on fruits-banana, but not on fruits-lemon (unpracticed competitor) or on any members from the “tools” category (an unpracticed baseline category). The numbers show the percentage of items correctly recalled on the final cued-recall test. As shown here, practice facilitates recall of the practiced items relative to performance in baseline categories. RIF is reflected in the reduced recall of unpracticed members of the practiced category (lemon), relative to performance in baseline categories (screwdriver and hammer).
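The comparisons illustrated in Figure 6.2 reduce to two subtractions against the baseline. The following is a minimal Python sketch using only the percentages shown in the figure. The labels Rp+, Rp-, and Nrp are the shorthand commonly used for practiced items, unpracticed competitors, and baseline items; the helper name item_type and the data layout are hypothetical, chosen for illustration.

```python
# Recall percentages and study structure as shown in Figure 6.2.
recall = {"banana": 73, "lemon": 38, "screwdriver": 50, "hammer": 50}
studied = {"fruits": ["banana", "lemon"], "tools": ["screwdriver", "hammer"]}
practiced = {"banana"}  # items that received retrieval practice


def item_type(category, item):
    """Classify an item as practiced (Rp+), unpracticed competitor (Rp-), or baseline (Nrp)."""
    if item in practiced:
        return "Rp+"
    if any(other in practiced for other in studied[category]):
        return "Rp-"
    return "Nrp"


# Group recall scores by item type and average them.
by_type = {}
for category, items in studied.items():
    for item in items:
        by_type.setdefault(item_type(category, item), []).append(recall[item])
means = {kind: sum(scores) / len(scores) for kind, scores in by_type.items()}

facilitation = means["Rp+"] - means["Nrp"]  # benefit of practice over baseline
rif = means["Nrp"] - means["Rp-"]           # retrieval-induced forgetting

print(f"facilitation: {facilitation:.0f} points, RIF: {rif:.0f} points")
```

Both effects are measured against the same unpracticed baseline category, which is why the design needs categories that never appear during retrieval practice.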

If RIF truly reflects an inhibitory aftereffect that is triggered by competition, we should find that the magnitude of this effect is larger when interference needs to be resolved than when related items are not interfering. The dependency of RIF on the need to resolve interference, known as interference-dependence, has been examined in several ways, using manipulations of stimulus materials and retrieval task.

Manipulations of Competitor Interference

One approach to examining interference-dependence involves manipulating the prepotency of competing items by selecting materials that vary in the strength of the cue-to-target association. For example, banana and orange are higher-frequency examples of the category fruits than are kiwi or papaya. Thus, high-frequency items should be easier to recall given fruits as a cue and, for the same reasons, should interfere more when retrieving other fruits. If so, high-frequency exemplars should be more likely to trigger inhibition, and produce larger RIF effects. In contrast, low-frequency exemplars should be harder to retrieve given their category. If so, they should cause less interference and suffer little RIF. As expected, high-frequency items are recalled better overall than low-frequency items, and critically, RIF is reliably greater for those high-frequency items (Anderson et al., 1994), consistent with interference-dependence.

Manipulations of the prepotency of a competitor are not limited to taxonomic categories. A similar manipulation can be achieved using homographs in which one meaning of the word is far more dominant than the other. With materials such as these, one’s strong tendency, upon reading the word, is to retrieve the dominant meaning, even if one knows that it is inappropriate. For example, when asked to retrieve an associate of the verb meaning of the following words, it is difficult to avoid thinking of their noun meanings first: root, prune, fence, lobby, and stump. Because the accessibility of the noun meaning interferes with the intended task, one might expect inhibitory processes to be engaged to inhibit the dominant sense and facilitate retrieval of the subordinate meaning. This is indeed the pattern that is observed (Johnson & Anderson, 2004; Shivde & Anderson, 2001). In contrast, retrieving the dominant meaning is effortless because the subordinate meaning poses no interference, and as expected, this produces no inhibition of the weaker meaning (Shivde & Anderson, 2001).

Another common instance of unwanted accessibility occurs during foreign language learning. Learning a new language requires that we attach entirely new phonological labels to objects and ideas that already have strong associations to native language words. As such, when we try to produce the foreign name for an object, we must overcome unwanted
activation from the native language name. Thus, we might expect that retrieving words in a foreign language should trigger inhibition, especially during the early stages of learning. Indeed, for beginning Spanish speakers, when new words are not yet well learned, this is exactly the pattern observed: Subjects have difficulty accessing specific verbal labels from their native language after repeatedly retrieving the corresponding label in the second language (Levy, McVeigh, Marful, & Anderson, 2007). This pattern depends on participants’ fluency in their second language: Participants with strongly asymmetric skills (e.g., those who know English much better than Spanish) show sizable inhibition effects, whereas those with more symmetric skills (i.e., show evidence of comparable fluency) show little inhibition. Thus, when retrieval of foreign language words was difficult, inhibition suppressed native language words, consistent with an adaptive forgetting process. This dynamic may form the basis of the puzzling phenomenon of first language attrition, or the forgetting of one’s native language when immersed in a foreign language (e.g., Linck, Kroll, & Sunderman, 2009). Most manipulations of competitor interference have varied the strength of competition by manipulating whether competitors have a low or high a priori association to the retrieval practice cue. One notable exception to this trend is a study conducted by Storm, Bjork, and Bjork (2007), who manipulated the interference properties of competitors by a procedural manipulation. On each trial, before retrieval practice occurred, subjects were presented with short lists of category-exemplar pairs to study (e.g., country-Russia, flower-lily, country-Sweden, flowertulip). A directed forgetting manipulation then instructed subjects to either forget or remember these items. 
Prior work suggests that this instructional manipulation is effective in varying how accessible the prior items are (e.g., Bjork, 1970), and this assumption was independently verified. This directed forgetting cue was followed by semantic generation (i.e., retrieval practice) of other items from one of those categories (e.g., flower-pa____ for pansy). Thus, the items from the list studied immediately prior to retrieval practice acted as the competitors during semantic generation, and the instructional cue (insofar as it was effective) manipulated the degree to which these items would interfere. Strikingly, cuing subjects to forget eliminated RIF for those competing items, whereas RIF was observed for items they were instructed to remember. Storm et al. argued that directed forgetting instructions made these exemplars less accessible, providing less interference during retrieval practice, and therefore they did not trigger inhibitory processes. Thus, even when participants studied the same item and performed the same retrieval practice task, RIF could be made to appear or

disappear, depending on how accessible those items were entering the retrieval practice phase.

Manipulations of Practice Task Interference Demands

Another approach to assessing interference-dependence is to manipulate the demands of the retrieval practice task, instead of the strength of the competitor. If a cue activates a nontarget memory, but that activation is unlikely to undermine success on the retrieval practice task, then inhibitory processes should be unnecessary. The most straightforward version of this approach comes from studies that contrast the effects of retrieval practice with those of repeated reexposure to the same stimuli (often referred to as extra presentations). Here all aspects of the retrieval practice paradigm are matched across two groups of subjects, except for the events during the practice phase. One group performs retrieval practice trials, as usual (e.g., fruit-or_____ for orange), while the reexposure group is given the entire category-exemplar pair for additional study (e.g., fruit-orange). In this design, both groups practice the same exemplars for an equal number of repetitions and for the same amount of time, but the reexposure group has the target item fully specified. Importantly, the inhibitory account predicts that, to the extent that reexposure poses very little demand on interference resolution, it should not produce forgetting on the final test. In contrast, noninhibitory explanations such as associative blocking predict that forgetting should occur regardless of how the practiced items were strengthened (see Anderson & Bjork, 1994, for further discussion of noninhibitory accounts of RIF). Many studies using this approach have found RIF after retrieval practice, but not after reexposure, provided that output interference is controlled at test (Bäuml, 1996, 1997, 2002; Johansson, Aslan, Bäuml, Gabel, & Mecklinger, 2007; Saunders, Fernandes, & Kosnes, 2009; Wimber, Rutschmann, Greenlee, & Bäuml, 2009).
The dependency of RIF on active retrieval generalizes to retrieval of visuospatial information (Ciranni & Shimamura, 1999), homograph meanings (Shivde & Anderson, 2001), propositions (Anderson & Bell, 2001), and arithmetic facts (Campbell & Phenix, 2009), suggesting it is a general attribute of this phenomenon. Even when two groups receive retrieval practice, however, the difficulty of the interference resolution task can be manipulated by changing the retrieval task itself. For instance, Anderson, Bjork, and Bjork (2000) required all subjects to engage in retrieval practice, but simply manipulated whether the practice required the resolution of interference. The competitive group was provided with the category name and a letter stem for the exemplar (e.g., fruit- or____) as is customarily

done; the noncompetitive group was instead asked to recall the category name, given the exemplar (e.g., fr___-orange). Although both tasks engage retrieval, the latter specifies the category as the target, so other exemplars should pose no threat to retrieval success and no RIF should be observed. Consistent with the adaptive inhibition view, competitive, but not noncompetitive, retrieval practice induced RIF, despite comparable strengthening of practiced items between the two groups. Another clever approach for manipulating interference demands was introduced by Bajo, Gomez-Ariza, Fernandez, and Marful (2006). During retrieval practice, they presented the shared category cue either two seconds before giving the subjects the individuating letter stem for the exemplar or presented them simultaneously, as is conventionally done. By presenting the category two seconds before the individuating stem, Bajo et al. hoped to increase the likelihood that competitors would be activated enough to pose interference when the distinctive stem was provided. If so, greater RIF should be found than in a case where the distinctive stem is provided immediately with the category. Consistent with this, they observed significantly more RIF when the category preceded the distinctive stem, despite comparable retrieval practice success and comparable strengthening of practiced items in both cases. Interestingly, these findings were observed with a paradigm in which categories were defined purely on the basis of lexical, rather than semantic, attributes (each category consisted of words that shared the same initial two-letter syllable and differed in their third letter). Thus, they were able to manipulate the amount of interference provided by these lexical competitors through the use of two cuing methods that differed only in their timing and not in objective information provided. 
If inhibition is recruited to facilitate retrieval, what happens if the retrieval task itself is impossible to accomplish? Is RIF still found for competing items, even if nothing is retrieved during retrieval practice? To examine this, Storm and colleagues (e.g., Storm, Bjork, Bjork, & Nestojko, 2006; Storm & Nestojko, 2010) designed their stimuli so that retrieval practice success was impossible for some trials by providing letter stems that did not correspond to any appropriate exemplar. Even in these situations where retrieval failure is ensured for every retrieval practice trial from a specific category, subjects show robust RIF. These findings suggest that successful retrieval itself is not a prerequisite for observing RIF and that inhibition functions to assist difficult retrieval (see also Dagenbach, Carr, & Barnhardt, 1990). Intriguingly, on the retrieval practice trials that were possible to complete, Storm et al. (2006) noted that the subjects with lower practice success rates were the

ones who suffered the most RIF, indicating that when retrieval practice was difficult, inhibition was more pronounced. Functional neuroimaging studies provide converging evidence that inhibition reflects a cognitive control response to unwanted accessibility (for a review, see Levy, Kuhl, & Wagner, in press). It is known that the prefrontal cortex is involved in cognitive control generally (e.g., Miller & Cohen, 2001), and specifically in response to interference in memory (Dolan & Fletcher, 1997; Fletcher, Shallice, & Dolan, 2000; Henson, Shallice, Josephs, & Dolan, 2002; Shimamura, Jurica, Mangels, Gershberg, & Knight, 1995). With this in mind, Wimber et al. (2009) used functional magnetic resonance imaging (fMRI) to contrast activity during blocks of competitive retrieval practice with activity during extra presentations of the same items, as had been done in the numerous behavioral studies described above. Strikingly, competitive retrieval was associated with increased activity within the prefrontal cortex, and the extent of this increased activity predicted later RIF for competing items. Kuhl, Dudukovic, Kahn, and Wagner (2007) also found that retrieval practice engaged the prefrontal cortex (PFC) and, interestingly, prefrontal regions showed a pattern of decreasing activation across repeated retrieval practice trials. Thus, control-related regions are strongly engaged initially when unwanted accessibility is greatest, and become less involved as control demands diminish with practice. Indeed, subjects who showed the largest decrease in prefrontal activity across retrieval practice attempts were the ones who showed the most RIF. These correlations between PFC and RIF provide converging evidence for the view that interference experienced during selective retrieval triggers inhibition as an adaptive forgetting process. 
Interference-Dependence During Retrieval Stopping

Although we have focused thus far on selective retrieval, people encounter intrusive memories in many situations. Sometimes, for example, cues bring to mind memories even when we are not seeking to retrieve something, and these memories may be disruptive. When uninvited memories intrude, people often attempt to suppress the retrieval process and exclude the intrusive memory from awareness. The effort to regulate unwanted accessibility in these situations ought to recruit inhibitory mechanisms that facilitate adaptive forgetting of the disruptive event. Intentional retrieval suppression has been studied with the think/no-think paradigm (Anderson & Green, 2001). In these studies subjects learn pairs of items in which one item acts as a cue to remind them of a particular response (e.g., lawn-beef, journey-pants). Later

they are shown these cues again in isolation (e.g., lawn) and are asked to engage control over retrieval. On think trials subjects are asked to remember the response (beef) and keep it in mind during the trial. On no-think trials, they are asked to attend to the cue (journey), but to prevent the associated word (pants) from entering awareness. After performing these tasks on different sets of cues, subjects are tested on all of the previously studied responses. Unsurprisingly, response words that were practiced during the think trials are recalled more often than baseline items that were initially learned, but that did not occur during the intervening think/no-think phase (see Figure 6.3). The no-think response words, however, are recalled less often than the baseline items. The impaired recall of no-think items suggests that they have been disrupted by control mechanisms (Anderson & Green, 2001; Anderson et al., 2004; Depue, Banich, & Curran, 2006; Depue, Curran, & Banich, 2007). Importantly, no-think responses (e.g., pants) are harder to recall even when subjects are tested with novel extralist cues, referred to as independent probes (Anderson & Green, 2001; Anderson et al., 2004), suggesting that the memory deficit arises from reductions in the accessibility of the response (Anderson & Spellman, 1995).

[Figure 6.3: bar chart. y-axis: Percent Recalled (70% to 100%); x-axis: Type of Final Test (Same Probe, e.g., Ordeal; Independent Probe, e.g., Insect R–); bars: Respond, Baseline, Suppress; example target: Roach; n = 687.]

Figure 6.3 Final recall performance in the think/no-think procedure. The graph shows the percentage of items that subjects correctly recalled on the final test as a function of whether they tried to recall the item (respond), suppressed the item (suppress), or had no reminders to the item during the think/no-think phase (baseline). The left side shows recall when tested with the originally trained retrieval cue (i.e., the same probe), whereas the right side shows recall when tested with a novel, extralist category cue (i.e., the independent probe). The numbers shown here were taken from a meta-analysis of 687 subjects run in the think/no-think paradigm in our lab.

In recent work using this paradigm, we sought to study the relationship between the amount of memory inhibition people show on a final retention test and how effectively they reduced the intrusiveness of the unwanted memory during retrieval suppression itself. According to the adaptive forgetting hypothesis, evidence for inhibitory forgetting should be most evident for people who are effective at limiting the intrusiveness of an unwanted memory during suppression. To study this relationship, we developed an online, trial-by-trial measure of how intrusive an item is during no-think trials (Levy & Anderson, in preparation). After each no-think trial subjects reported whether or not the response word came to mind, allowing us to identify when conscious intrusions happened and to track their progression with practice at the suppression task. Thus, in contrast to studies of RIF in which intrusiveness is presumed on the basis of a competitor’s a priori associative strength to a cue, this study directly assessed people’s subjective experience of whether the unwanted memory intruded. Using this metric of intrusiveness, we were able to document several important findings about intrusions and memory inhibition. First, when people attempt to suppress retrieval of an unwanted memory, they often fail. Indeed, on the initial suppression attempt for a given item, people usually fail to prevent the item from coming to mind. With repeated practice, however, intrusions decrease in frequency, showing that people can, with repeated effort, limit accessibility of unwanted memories (see Figure 6.4). Intriguingly, this steady decline in intrusions resembles

[Figure 6.4: line plot. y-axis: Frequency of Intrusions (10% to 70%); x-axis: Repetitions During the TNT Phase (1 to 12).]

Figure 6.4 Frequency of intrusions during a think/no-think study. Plotted is the percentage of no-think trials on which the subject reported that the response word came to mind, for each of the 12 repetitions of the no-think items. Subjects initially report intrusions on more than half of the trials, but the frequency of these intrusions rapidly declines with practice. (From Levy & Anderson, in preparation.)

the memory impairment observed for no-think items, which gradually increases with repeated practice at the task (Anderson & Green, 2001). This similarity suggests that subjective intrusion ratings and final test performance are measuring the same underlying construct: the state of activation of the to-be-avoided memory. If so, subjects who show the most forgetting of suppressed items on the final test (i.e., the good inhibitors) should also show the steepest decline in intrusion frequency during the think/no-think task itself. Consistent with this, good inhibitors showed a much more robust decline in intrusions over repetitions than poor inhibitors, as predicted by adaptive forgetting. The measure of memory intrusiveness described above provides a valuable tool in identifying the neural mechanisms of adaptive forgetting, and the conditions under which they operate. For instance, incorporating the intrusion measure into a functional magnetic resonance imaging design has allowed us to examine how the brain responds to an intrusion. Prior functional neuroimaging data have suggested that mnemonic control is achieved in the think/no-think task by downregulating activity within the hippocampus (Anderson & Levy, 2009; Anderson et al., 2004; Depue et al., 2007), a region known to be active during memory retrieval. Using the intrusion paradigm described above, Levy and Anderson (in preparation) found that the downregulation of hippocampal activity is significantly greater during suppression trials on which subjects briefly experience an intrusion than when they do not. Indeed, the magnitude of hippocampal downregulation in reaction to intrusions predicted subsequent forgetting for suppression items on the final test. This finding provides intriguing support for the idea that controlled modulation of memory structures in the brain underlies the ability to suppress retrieval, and that this response reacts directly to the experience of unwanted accessibility.
Thus, in retrieval stopping, as in retrieval selection, unwanted accessibility triggers a modulatory response that reduces accessibility, consistent with the adaptive forgetting hypothesis.
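The comparison of good and poor inhibitors described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis used by Levy and Anderson: the median split, the least-squares slope as the measure of "decline in intrusions," the function names, and all data values are hypothetical.

```python
from statistics import mean, median

def intrusion_slope(proportions):
    """Least-squares slope of intrusion frequency across repetitions 1..n.
    A more negative slope means a steeper decline in intrusions."""
    reps = list(range(1, len(proportions) + 1))
    x_bar, y_bar = mean(reps), mean(proportions)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(reps, proportions))
    den = sum((x - x_bar) ** 2 for x in reps)
    return num / den

def slopes_by_inhibitor_group(subjects):
    """Median-split subjects on final-test forgetting (baseline minus no-think
    recall) and return the mean intrusion slope of (good, poor) inhibitors.
    Assumes each half of the split is non-empty."""
    scores = [s["baseline"] - s["no_think"] for s in subjects]
    cutoff = median(scores)
    good = [intrusion_slope(s["intrusions"])
            for s, score in zip(subjects, scores) if score > cutoff]
    poor = [intrusion_slope(s["intrusions"])
            for s, score in zip(subjects, scores) if score <= cutoff]
    return mean(good), mean(poor)

# Hypothetical subjects: final-test recall proportions and the proportion of
# no-think trials with a reported intrusion at each of four repetitions.
subjects = [
    {"baseline": 0.90, "no_think": 0.70, "intrusions": [0.6, 0.4, 0.2, 0.0]},
    {"baseline": 0.90, "no_think": 0.75, "intrusions": [0.6, 0.5, 0.3, 0.2]},
    {"baseline": 0.90, "no_think": 0.88, "intrusions": [0.6, 0.6, 0.5, 0.5]},
    {"baseline": 0.90, "no_think": 0.90, "intrusions": [0.6, 0.6, 0.6, 0.6]},
]

good_slope, poor_slope = slopes_by_inhibitor_group(subjects)
```

With these made-up values, the good inhibitors have the more negative mean slope, which is the direction of the relationship reported in the text.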

The Demand-Success Trade-Off Problem: A Central Theoretical Issue for Research on Inhibitory Control

As the foregoing findings illustrate, excess activation of an interfering trace triggers processes that impair retention of that trace. Findings consistent with this conclusion arise regardless of whether one manipulates the characteristics of the interfering trace or the retrieval task itself.

Collectively, these findings support the view that inhibitory processes are engaged to suppress excess activation, consistent with an adaptive forgetting mechanism. Given the evidence for interference-dependence, one might surmise that behavioral indices of inhibition should grow as a function of how intrusive a memory is. After all, inhibition functions to overcome interference, so this prediction might seem obvious, especially given the prior findings. Indeed, inhibition theories in every domain (language, attention, memory, or motor action) hypothesize that inhibition emerges as a response to excess activation, and so behavioral measures of the aftereffects of inhibition should increase with interference. Nevertheless, the predicted relationship between interference and measured inhibitory aftereffects is not straightforward. Depending on how interference is manipulated, behavioral indices of inhibition may increase or decrease with increasing interference. This complexity stems from a theoretically predicted relationship we refer to as a demand-success trade-off, which arises whenever an inhibitory theory assumes that inhibition is anything less than perfectly effective. Here, we describe the essence of the demand-success trade-off problem, providing several illustrations. We argue that this problem poses a fundamental challenge for using behavioral aftereffects of inhibition to infer properties of the mechanism or to measure variations in that mechanism across populations. Importantly, this problem is not limited to memory, but affects all domains concerned with inhibition.

The Essence of the Demand-Success Trade-Off Problem

Since the putative function of inhibition is to reduce the influence of an unwanted representation, the need for inhibition should be related to how intrusive an unwanted trace is.
Indeed, nearly every inhibition theorist would agree that as competitors become more interfering, the demand for inhibition should rise, increasing the likelihood that inhibition is triggered. Thus, inhibition theories might seem to predict a positive relationship between interference and behavioral aftereffects of inhibition. A problem arises, though, because inhibition may be imperfect. Indeed, much of the interest in inhibitory control emanates from the potential of this construct to explain individual differences in handling interference. Prominent theories of cognitive aging, cognitive development, and clinical disorders, including depression, schizophrenia, obsessive compulsive disorder, and attention-deficit disorder, hypothesize deficits in inhibitory control functions, subserved by the prefrontal cortex, that underlie a broad spectrum of cognitive deficits that influence

attention, language, working memory, and long-term memory. Indeed, inhibitory control has even been suggested as an important process in general intelligence. This emphasis on variable inhibitory abilities is reasonable, given increased sensitivities to interference exhibited by these populations. Thus, few theorists would wish to attribute perfect efficacy to inhibition, lest it should fail to explain what it was enlisted to explain. So, the probability that inhibition will successfully deactivate the offending representation is less than 1.0 as a rule, and increasingly more so for populations thought to have deficits in inhibition. If inhibition is imperfect, we must consider how inhibition failures should influence behavioral markers of inhibition. An inhibition failure occurs when inhibition does not return the intrusive item’s activation to a point either at or below its baseline level by the time the attempt at inhibition ends. Although many factors may in principle contribute to inhibition failure, the degree of interference caused by a competitor surely contributes. Competition is a graded function that is related to the activation that a competitor possesses. Given imperfect inhibition, there will reach a point at which a competitor’s activation level exceeds what inhibition can counter within the time interval of a trial, and this activation point should vary depending on the inhibition rate possessed by an individual. Regardless of this variation, however, inhibition theories predict that the inhibition failure rate should be an increasing function of the degree of interference. What happens when inhibition fails? The most straightforward possibility is that the inhibition target may retain the activation level it possessed at the end of an inhibition attempt, which will be above baseline. If so, the item may persist in its activated state, enjoying facilitation it would not otherwise have accrued had the competitive incident not taken place. 
We shall refer to this as the carryover assumption. Inhibition theories thus predict that as interference increases, the proportion of trials on which there will be no inhibition, or even facilitation, ought to increase. These failures ought to be reflected in the behavioral index of inhibition. Thus, theories with imperfect inhibition imply a trade-off between the increasing demand for inhibition and the increasing likelihood of failure. Inhibition theories—regardless of whether they concern inhibition of memories, task sets, motor responses, or perceptual representations—predict the occurrence of a demand-success trade-off that should influence behavioral indices of inhibitory aftereffects. The demand-success trade-off refers to the predicted tendency for behavioral indices sensitive to inhibition to follow a nonmonotonic function relating interference to impairment (increasing, and then decreasing), reflecting the joint influence of inhibition demand and failure rate.

To see the impact of a demand-success trade-off, consider an example, using RIF as a model case. Suppose we had a way of knowing the amount of interference that each competitor caused during the retrieval practice phase. Suppose further that we knew the quantitative relationship between the level of interference and (a) inhibition demand (e.g., probability that inhibition would be triggered) and (b) inhibition success. What would the relationship between competition and these factors look like? How would they combine to produce the aggregate RIF effect on the final test? Inhibition demand should increase monotonically with the level of interference, while inhibition success should decrease monotonically (see Figure 6.5a and b). The carryover assumption described earlier about the consequences of inhibition failure yields key predictions about observed inhibitory aftereffects. In general, observed inhibition will increase as interference increases, owing to the increased probability of triggering inhibition; however, as interference grows further, the failure rate may increase enough to counterbalance the increased likelihood of triggering inhibition. Indeed, as increasing interference causes the proportion of failed inhibition trials to grow, observed inhibition should decline and, ultimately, with high enough levels of failure, should turn to facilitation. Thus, inhibition models with imperfect inhibition and a carryover assumption predict a nonmonotonic change in inhibitory aftereffects as levels of interference are parametrically increased, increasingly so the more imperfect inhibition is (see Figure 6.5c). Note that, depending on the degree of facilitation enjoyed by the competing item, this behavioral facilitation effect may arise even when some fraction of interfering items are truly inhibited. This inhibition is masked by the abundance of facilitated competitors. 
As this example illustrates, the moment that an inhibition model introduces imperfect inhibition, demand-success trade-offs will complicate the relationship between the amount of inhibition demanded and the amount that is ultimately measured. These dynamics can lead to counterintuitive findings that might, on their face, seem to contradict inhibition theory. For instance, if competitors are especially interfering, retrieval practice may cause no RIF or even facilitation of items that one might expect to be inhibited. Similarly, if a study manipulates the degree of interference (e.g., low vs. high) to see whether inhibition is greater in the high-interference condition, a variety of results might occur, depending on where the low- and high-interference points are on the interference continuum. If they are at low and middle points, greater inhibitory aftereffects will be observed for higher interference; if they are at middle and high points, the reverse will be observed. Similarly, the function relating interference to

[Figure 6.5: three panels. (a) The Relation Between Inhibition Demand and Interference; y-axis: Probability That Inhibition Is Triggered (0% to 100%). (b) The Relation Between Inhibition Success and Interference; y-axis: Probability That Inhibition Is Successful (0% to 100%). (c) Predicted Inhibition with the Carryover Assumption; y-axis: Observed Memory Inhibition (–15% to 15%). x-axis in all panels: Level of Interference Caused by a Competitor (None, Very low, Low, Moderate, High, Very high, Extreme).]

Figure 6.5 The relationship between interference and inhibition. (a) As the interference caused by competitors increases, the demand for inhibition rises. (b) As the interference caused by competitors increases, so too does the likelihood that inhibition will fail to overcome the unwanted activation of the competitor. (c) The integration of the two functions predicted in a and b, under the carryover assumption. Here inhibition failure results in facilitation of highly interfering competitors, reversing the inhibition effect.

inhibition aftereffects will vary with population differences in inhibitory control. For low inhibitory control populations, the relationship between inhibition and interference ought to be more nonmonotonic than it is for high-functioning individuals, for whom inhibition failures may make up a tiny fraction of inhibition attempts. Thus, in one group (e.g., young subjects), more inhibition may be found for high- than for low-interference competitors, but for another group (e.g., older subjects), lower in inhibitory function, the opposite may be true, as the high-strength competitor will, for this group, instigate more failures. Thus, demand-success trade-offs must be considered carefully in theoretical tests of inhibition, and also in testing inhibition deficit hypotheses.
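The trade-off described in this section can be made concrete with a small numerical sketch. Everything quantitative here is an assumption introduced for illustration: the logistic form of the demand and success functions, the parameter names (`midpoint`, `steepness`, `capacity`), and the 15% sizes of the inhibition and carryover effects are hypothetical and are not taken from the chapter. The sketch treats expected observed inhibition as the probability that inhibition is triggered, multiplied by the success-weighted impairment minus the failure-weighted carryover facilitation.

```python
import math

LEVELS = ["none", "very low", "low", "moderate", "high", "very high", "extreme"]

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def p_triggered(x, midpoint=0.3, steepness=10.0):
    """Inhibition demand: rises monotonically with interference x in [0, 1]."""
    return logistic(steepness * (x - midpoint))

def p_success(x, capacity, steepness=10.0):
    """Inhibition success: falls monotonically with x. `capacity` is the
    (hypothetical) interference level at which success drops to 50%."""
    return logistic(steepness * (capacity - x))

def expected_inhibition(x, capacity, inhibition=0.15, carryover=0.15):
    """Expected below-baseline impairment; negative values mean facilitation.
    Failed inhibition leaves the item facilitated (carryover assumption)."""
    triggered = p_triggered(x)
    success = p_success(x, capacity)
    return triggered * (success * inhibition - (1.0 - success) * carryover)

def curve(capacity):
    """Expected inhibition at each of the seven interference levels."""
    last = len(LEVELS) - 1
    return [expected_inhibition(i / last, capacity) for i in range(len(LEVELS))]

if __name__ == "__main__":
    for label, value in zip(LEVELS, curve(capacity=0.7)):
        print(f"{label:>10}: {value:+.3f}")
```

Under these assumed parameters, the curve for `capacity=0.7` rises to a peak at moderate interference and then falls into facilitation at the extreme, the nonmonotonic shape the text predicts. Comparing `curve(0.9)` with `curve(0.45)` reproduces the group reversal: the high-control group shows more inhibition at high than at low interference, whereas the low-control group shows the opposite.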

Examples of Demand-Success Trade-Offs in Memory Inhibition

The importance of demand-success trade-offs is best illustrated by considering the plight of a hypothetical investigator who decides to conduct research on inhibition. Suppose that our protagonist knows that interference triggers inhibition, and so ensures that the to-be-inhibited item is highly interfering. To begin, the investigator decides to simply look at the effects of a single inhibition attempt on later memory, compared to a baseline condition in which no inhibition is engaged. Being thorough, the investigator uses several paradigms, including semantic, episodic, and phonological RIF, as well as the think/no-think procedure. Our investigator would doubtless be disappointed if he or she obtained the results depicted in Figure 6.6a. In every paradigm, performing a single suppression attempt on the highly competitive item facilitated its recall on a later test. Such a consistent pattern would lead our investigator to have grave doubts about whether inhibition combats interference. Suppose, however, that he or she persists, believing that a single inhibition attempt might not have been enough to inhibit competitors. So he or she includes 16 inhibition attempts, ensuring that people have ample opportunity to inhibit the unwanted memory. Surprisingly, he or she observes that the to-be-inhibited items show little sign of being inhibited below baseline, with the exception of a single study (see Figure 6.6a). Although the facilitation effects observed in the first series of studies have disappeared, and one study shows evidence of inhibition, this collection of eight experiments would justifiably be viewed as discouraging, and lead one to suspect that inhibition may not be involved in overcoming interference after all.
The data represented in Figure 6.6a are not fictitious, but were collected in our laboratory (Johnson & Anderson, 2004; Levy & Anderson, 2001; Levy et al., 2007; Shivde & Anderson, 2001). The main difference from our hypothetical case, however, was that the differing numbers of suppression attempts were not manipulated across experiments. Rather, they represented two of four points in within-subjects parametric manipulations of the number of suppression attempts. The full range of these parametric manipulations is depicted in Figure 6.6b. Several unifying features may be observed in these otherwise diverse studies. First, in each study, competition was potent, so inhibition tended to initially fail and produce facilitation rather than below-baseline performance. These facilitation effects are most readily observed on a single suppression attempt, on which the participant is least prepared to combat the challenging interference posed by the newly encountered item. So, for example, after attempting to generate an associate to the

[Figure 6.6: (a) Two rows of two-point comparisons of Percent Correct for Semantic RIF, Episodic RIF, Bilingual RIF, and Think/No-Think; top row: 0 vs. 1 attempt; bottom row: 0 vs. Max attempts. (b) Four parametric panels: Semantic RIF of Word Meaning, Phonological RIF of Native Language Words, Episodic RIF with Homographs, and Intentional Retrieval Suppression; y-axes: Percent Correct Semantic Generation, Percent Correct Rhyme Cued Recall, Percent Correct Recall; x-axes: Number of Semantic Generation Trials on Target, Number of Times a Picture Named in Foreign Language, Number of Retrieval Practice Trials, Number of Suppression Attempts.]

Figure 6.6 Illustration of the two-point problem. (a) Illustrations of typical two-point comparisons across four paradigms designed to test for inhibition. The top row contrasts a baseline condition with a single suppression attempt in four studies; none show inhibition, and all show reliable facilitation. The second row illustrates a comparison between the baseline condition and a higher number of suppression attempts. Overall, the pattern does not provide support for inhibitory aftereffects (except in one study). (b) Parametric manipulations in these four paradigms reveal that all four involve an initial inhibition failure that boosts target recall, but which is followed by successful engagement of inhibition. Two-point assessments hide these nonmonotonic functions, which clearly indicates the involvement of inhibition in all cases.

Y109937.indb 124

10/15/10 11:03:45 AM

On the Relationship Between Interference and Inhibition in Cognition • 125

verb meaning of prune in response to prune t _ _ m, participants often reported involuntarily thinking, at first, of the inappropriate noun meaning of prune; as a result, they were later more likely to successfully generate fruit on a semantic generation test than they would have been had they not attempted to practice the trim meaning of prune —in apparent contradiction to the idea that inhibition was recruited to suppress the irrelevant noun meaning. This initial rise in recall indicates that people failed to inhibit the competitor, and that they had difficulty with the retrieval practice task itself, consistent with the very low retrieval practice success often observed on the first trial in this design. The second common feature of these data is that after the initial failure to inhibit, further inhibition attempts reduce performance monotonically. In some cases, repeated suppression impairs the to-be-inhibited item below baseline, whereas in other cases it does not. In every case, however, there is a significant decline in performance across repetition levels. Because one inhibition attempt yields significant enhancement of the to-be-inhibited item, and we know that this facilitation persists through to the final test, the decline in performance with increasing repetitions implies an active process that counters that elevated accessibility, even if performance at the highest level fails to go below baseline. The failure to go below baseline may, in some cases, reflect the aggregate influence of items that persistently intrude and remain facilitated, and others that are progressively inhibited below baseline; alternatively, all items may be initially facilitated, and the decline from peak recall may reflect increasing success at countering their intrusion. In either case, the rise in recall after a single suppression argues in favor of the carryover assumption. 
Interestingly, despite the carryover, which ought to make the competitor even more intrusive, people appear able to dynamically adjust cognitive control to better handle high levels of competition on subsequent trials, a finding reminiscent of evidence for conflict adaptation in studies of executive control (see Botvinick, Cohen, & Carter, 2004). These findings are representative of what happens in a variety of inhibition paradigms when interference is particularly high. Instead of finding greater below-baseline inhibition, one gets null effects or even facilitation, consistent with a demand-success trade-off. The findings provide an important cautionary tale for investigators interested in studying inhibition. Surveying the literature on inhibitory processes, whether in memory or other domains, reveals that the overwhelming majority of experiments employ only two measurement points: a baseline condition, where inhibition should be absent, and an experimental condition, where it is believed to be present. The problem with using only two measurement points, which we refer to as the two-point problem, is vividly illustrated by the plight of our hypothetical investigator. This investigator might seem to have done everything right, yet ended up with no evidence for inhibition, even after multiple suppression attempts. The investigator’s problem arises because he or she chose competitors that generated maximal interference, without appreciating the significance of demand-success trade-offs, and because he or she measured inhibition at only two points. Had a parametric manipulation of the number of inhibition attempts been included, the investigator would not have been misled about the role of inhibition in these tasks. These nonmonotonic functions also suggest that demand-success trade-offs have strong potential to mislead investigators interested in theoretical models of inhibition and inhibitory deficits. Consider again a hypothetical investigator who manipulates the level of interference to evaluate whether inhibition is involved in a task. The two-point data (Figure 6.7a) might indicate less impairment in the high-interference condition, appearing to challenge an inhibition account. This conclusion does not necessarily follow, however, because there may be an initial failure that is followed by gradual inhibition (Figure 6.7b). Thus, a parametric manipulation could provide a crucial window into the dynamics of inhibition. Demand-success trade-offs are also relevant to research on inhibitory deficits. Consider the results of a hypothetical experiment looking at the effects of aging on memory inhibition (Figure 6.7d). These data seem to indicate less memory inhibition in older adults, when comparing a single experimental condition to a baseline. Here again, this conclusion may not be warranted. We cannot tell from two measurement points whether older adults are truly deficient at inhibition (Figure 6.7f), or whether they have difficulty engaging control on the first trial, followed by highly effective suppression thereafter (Figure 6.7e).
Indeed, in the latter case, even if recall never goes below baseline, there is actually more inhibition, when compared to the initial rise. Because the initial failure elevates the accessibility of the to-be-inhibited item and that persists, one must account for this rise in interpreting the data in all other repetition conditions. One credible account, for example, is that older adults are somewhat slower in recruiting inhibition (or any other process, see, e.g., Salthouse, 1996) at first (leading to an initial failure, and the rise in performance), but once inhibition has been engaged, it is as effective as it is in younger populations. The foregoing examples illustrate the power of demand-success trade-offs to mislead investigators interested in inhibition. Despite substantial interest in inhibitory deficits, many investigators have not considered these trade-offs. To the extent that inhibitory control has been assessed with experimental designs that neglect these issues, as we have argued, the literature should be plagued by inconsistencies in

[Figure 6.7: plots not reproduced. Panel titles: (a) Apparent Failure of Interference Dependence; (b) Non-Monotonic Function; (c) True Null Effect for High Interference; (d) Apparent Aging Deficit in Inhibition; (e) Masked Inhibitory Function; (f) Truly Deficient Inhibition. Vertical axes: percent correctly recalled. Horizontal axes: baseline versus suppression in (a) and (d), number of suppression attempts (0, 1, 8, 16) in the other panels. Plotted conditions: moderate versus high interference in (a) to (c), younger versus older adults in (d) to (f).]

Figure 6.7 Examples of how the demand-success trade-off influences assessment of inhibition theories. (a) Hypothetical results from an investigator who manipulates interference to evaluate whether inhibition is involved in a task. The data indicate less impairment (e.g., RIF) in the high-interference condition, appearing not to support an inhibition account. However, we cannot tell, from two measurement points, whether (b) the underlying function is an initial failure for highly interfering items followed by successful engagement of inhibitory control, or (c) there really is no inhibition in the high-interference condition. (d) A similar situation may arise in testing for variation in inhibitory control abilities across different populations. Here hypothetical data suggest that the older group has an inhibitory deficit, compared to the younger group. From this two-point assessment, one cannot be sure whether older adults (e) experience an initial inhibition failure followed by successful engagement of inhibition or (f) are truly deficient at inhibitory control.

the support of inhibitory deficits, generating reasonable doubt about their utility. We argue that such variability is not necessarily a sign for or against these theories, but rather reflects the insensitivity of these measures for assessing inhibition. As a solution, we argue for the use of parametric manipulations. By tracking the development of inhibition with repeated application, one can distinguish between truly null inhibitory effects and trade-offs arising from initial failures to engage inhibition. We believe that the logic underlying this method can be adapted to any domain concerned with inhibition.
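
The contrast between two-point and parametric designs can be made concrete with a minimal Python sketch. The recall percentages below are invented purely for illustration (they are not the values from the studies discussed in this chapter), and the function names are hypothetical. The sketch only shows how a rise-then-fall pattern can look like facilitation or a null effect when just two points are compared.

```python
# Hypothetical recall percentages keyed by number of suppression attempts.
# These values are invented for illustration; they are NOT data from the chapter.
HYPOTHETICAL_RECALL = {0: 50.0, 1: 58.0, 8: 53.0, 16: 49.0}


def two_point_effect(recall, attempts):
    """Baseline minus experimental recall; a positive value looks like inhibition."""
    return recall[0] - recall[attempts]


def decline_from_peak(recall):
    """Drop from the highest post-baseline recall level to the last level tested."""
    levels = sorted(level for level in recall if level > 0)
    peak = max(recall[level] for level in levels)
    return peak - recall[levels[-1]]


def is_nonmonotonic(recall):
    """True if recall rises at some point and falls at a later point."""
    levels = sorted(recall)
    has_risen = False
    for previous, current in zip(levels, levels[1:]):
        change = recall[current] - recall[previous]
        if change > 0:
            has_risen = True
        elif change < 0 and has_risen:
            return True
    return False


if __name__ == "__main__":
    for attempts in (1, 16):
        print(attempts, "attempt(s):", two_point_effect(HYPOTHETICAL_RECALL, attempts))
    print("decline from peak:", decline_from_peak(HYPOTHETICAL_RECALL))
    print("nonmonotonic:", is_nonmonotonic(HYPOTHETICAL_RECALL))
```

With these made-up numbers, the baseline versus one-attempt comparison yields -8 points (apparent facilitation) and the baseline versus 16-attempt comparison yields 1 point (an apparently negligible effect), whereas the decline from the peak is 9 points. Only the full set of repetition levels exposes the decline that the chapter attributes to inhibition.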

Concluding Remarks

With rare exception, research on memory has historically adopted the point of view that forgetting is a problem to be overcome. This simple and pervasive assumption underlies nearly a century of research on forgetting that has emphasized the contributions of passive mechanisms, such as decay, interference, and contextual fluctuation. One of the seminal contributions of Robert Bjork has been to question this foundational premise, both empirically and theoretically. Empirically, in a career spanning four decades, Bjork introduced experimental methods such as the item-method and list-method directed forgetting paradigms to establish that people can, in fact, forget things on purpose, and that it is useful to do so. Theoretically, Bob and Elizabeth Bjork have been articulate champions of the idea that forgetting, far from being a problem, can be an adaptive feature of the cognitive system and that this function may be accomplished by inhibition. These ideas, fittingly developed in a festschrift volume in honor of Endel Tulving (Bjork, 1989), inspired the first author to develop work on RIF. Two decades later, the fruits of this collaboration with Bob and Elizabeth Bjork can be seen in the sizable volume of work on memory inhibition and in the acceptance of the view that forgetting can indeed be functional. In this chapter, we have examined evidence for a basic assumption of this adaptive forgetting view: that inhibition mechanisms are engaged specifically to counter unwanted accessibility. According to this view, when persisting accessibility of a trace hinders some cognitive process (whether it is retrieval or attention to other representations), inhibition is triggered to overcome this interference. Over the last two decades, considerable evidence has amassed for this assumption, and we reviewed this evidence, focusing, in particular, on work with the RIF and think/no-think paradigms.
Evidence for interference-dependence has been obtained by manipulating the competitive dynamics during retrieval practice, either by varying the a priori associative strengths
of competitors or by manipulating the demands imposed on cognitive control. Collectively, the recurring pattern indicates that when the task goal can be disrupted by heightened accessibility of a nontarget memory, people are more likely to show forgetting of the interfering trace on later tests, riffing (pun intended) nicely on Bjorkian themes of adaptive forgetting. Although the evidence clearly favors the interference-dependence assumption, increasing the accessibility of an unwanted memory will not always lead to increases in memory inhibition. In the last section, we argued that unless one assumes a perfectly effective inhibition process, behavioral markers of the aftereffects of inhibition should not uniformly increase as the interference a competitor causes increases. Rather, when inhibition is imperfect, the size of the inhibitory aftereffect should follow a nonmonotonic function, whereby it initially increases with interference, and then decreases as interference exceeds the person’s capabilities. We illustrated these trade-offs, and how they can be problematic. These trade-offs, largely neglected, have the potential to undermine theoretical inferences in any domain concerned with inhibition. Indeed, failure to consider demand-success trade-offs contributes considerable noise to the empirical case for inhibition. We argued for the utility of parametric designs to detect demand-success trade-offs, and better quantify the involvement of inhibition. Adaptive forgetting plays an important role in the regulation of memory that may be important to our broader functioning. Indeed, some have argued for its role in maintaining a healthy sense of perspective (Pronk, Karremans, Overbeek, Vermulst, & Wigboldus, 2010). 
For example, we are confident that Bob Bjork can recount, in glorious detail, the events of his triumphant season as coach of the UCLA Psychology Department’s flag football team during the first author’s time as his graduate student; this is true, even while we very much doubt Bob would recall very much about the many times during that same period that the first author pestered Bob relentlessly for comments on their first manuscript on RIF (success at which was negatively correlated with flag football events). Even after two decades of research on RIF and four on directed forgetting, we can think of no better proof for adaptive forgetting than this. The very best research ideas often come to us from our personal experience, and in this regard, it should perhaps come as no surprise to those who know Bob that he championed the virtues of forgetting so admirably. And the field, and his friends, are all the better for it.

References

Anderson, M. C., & Bell, T. A. (2001). Forgetting our facts: The role of inhibitory processes in the loss of propositional knowledge. Journal of Experimental Psychology: General, 130, 544–570.
Anderson, M. C., Bjork, E. L., & Bjork, R. A. (2000). Retrieval-induced forgetting: Evidence for a recall-specific mechanism. Psychonomic Bulletin and Review, 7, 522–530.
Anderson, M. C., & Bjork, R. A. (1994). Mechanisms of inhibition in long-term memory: A new taxonomy. In D. Dagenbach & T. H. Carr (Eds.), Inhibitory processes in attention, memory, and language (pp. 265–326). San Diego, CA: Academic Press.
Anderson, M. C., Bjork, R. A., & Bjork, E. L. (1994). Remembering can cause forgetting: Retrieval dynamics in long-term memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 1063–1087.
Anderson, M. C., & Green, C. (2001). Suppressing unwanted memories by executive control. Nature, 410, 366–369.
Anderson, M. C., & Levy, B. J. (2009). Suppressing unwanted memories. Current Directions in Psychological Science, 18, 184–194.
Anderson, M. C., Ochsner, K., Kuhl, B., Cooper, J., Robertson, E., Gabrieli, S. W., et al. (2004). Neural systems underlying the suppression of unwanted memories. Science, 303, 232–235.
Anderson, M. C., & Spellman, B. A. (1995). On the status of inhibitory mechanisms in cognition: Memory retrieval as a model case. Psychological Review, 102, 68–100.
Bajo, M. T., Gomez-Ariza, C. J., Fernandez, A., & Marful, A. (2006). Retrieval-induced forgetting in perceptually driven memory tests. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 1185–1194.
Bäuml, K. (1996). Revisiting an old issue: Retroactive interference as a function of the degree of original and interpolated learning. Psychonomic Bulletin and Review, 3, 380–384.
Bäuml, K. (1997). The list-strength effect: Strength-dependent competition or suppression. Psychonomic Bulletin and Review, 4, 260–264.
Bäuml, K. (2002). Semantic recall can cause episodic forgetting. Psychological Science, 13, 356–360.
Bjork, R. A. (1970). Positive forgetting: The noninterference of items intentionally forgotten. Journal of Verbal Learning and Verbal Behavior, 9, 255–268.
Bjork, R. A. (1975). Retrieval as a memory modifier: An interpretation of negative recency and related phenomena. In R. L. Solso (Ed.), Information processing and cognition: The Loyola Symposium (pp. 123–144). Hillsdale, NJ: Erlbaum.
Bjork, R. A. (1989). Retrieval inhibition as an adaptive mechanism in human memory. In H. L. Roediger, III & F. I. M. Craik (Eds.), Varieties of memory and consciousness: Essays in honour of Endel Tulving (pp. 309–330). Hillsdale, NJ: Erlbaum.
Botvinick, M. M., Cohen, J. D., & Carter, C. S. (2004). Conflict monitoring and anterior cingulate cortex: An update. Trends in Cognitive Sciences, 8, 539–546.
Campbell, J. I. D., & Phenix, T. L. (2009). Target strength and retrieval-induced forgetting in semantic recall. Memory and Cognition, 37, 65–72.
Ciranni, M. A., & Shimamura, A. P. (1999). Retrieval-induced forgetting in episodic memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 1403–1414.
Dagenbach, D., & Carr, T. H. (Eds.). (1994). Inhibitory processes in attention, memory, and language. San Diego, CA: Academic Press.
Dagenbach, D., Carr, T. H., & Barnhardt, T. M. (1990). Inhibitory semantic priming of lexical decisions due to failure to retrieve weakly activated codes. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 328–340.
Dempster, F. N., & Brainerd, C. J. (Eds.). (1995). Interference and inhibition in cognition. San Diego, CA: Academic Press.
Depue, B. E., Banich, M. T., & Curran, T. (2006). Suppression of emotional and non-emotional content in memory: Effects of repetition on cognitive control. Psychological Science, 17, 441–447.
Depue, B. E., Curran, T., & Banich, M. T. (2007). Prefrontal regions orchestrate suppression of emotional memories via a two-phase process. Science, 317, 215–219.
Dolan, R. J., & Fletcher, P. C. (1997). Dissociating prefrontal and hippocampal function in episodic memory encoding. Nature, 388, 582–585.
Fletcher, P. C., Shallice, T., & Dolan, R. J. (2000). “Sculpting the response space”: An account of left prefrontal activation at encoding. Neuroimage, 12, 404–417.
Gorfein, D. S., & MacLeod, C. M. (Eds.). (2007). Inhibition in cognition. Washington, DC: American Psychological Association.
Henson, R. N., Shallice, T., Josephs, O., & Dolan, R. J. (2002). Functional magnetic resonance imaging of proactive interference during spoken cued recall. Neuroimage, 17, 543–558.
Johansson, M., Aslan, A., Bäuml, K. H., Gabel, A., & Mecklinger, A. (2007). When remembering causes forgetting: Electrophysiological correlates of retrieval-induced forgetting. Cerebral Cortex, 17, 1335–1341.
Johnson, S. K., & Anderson, M. C. (2004). The role of inhibitory control in forgetting semantic knowledge. Psychological Science, 15, 448–453.
Kuhl, B. A., Dudukovic, N. M., Kahn, I., & Wagner, A. D. (2007). Decreased demands on cognitive control following memory suppression reveal benefits of forgetting. Nature Neuroscience, 10, 908–914.
Levy, B. J., & Anderson, M. C. (2001, June). Regulation of conscious awareness: Further evidence for a direct suppression mechanism. Poster presented at the Thirteenth Annual Meeting of the American Psychological Society, Toronto, ON, Canada.
Levy, B. J., Kuhl, B. A., & Wagner, A. D. (2010). The functional neuroimaging of forgetting. In S. Della Sala (Ed.), Forgetting. Hove & New York: Psychology Press.
Levy, B. J., McVeigh, N. D., Marful, A., & Anderson, M. C. (2007). Inhibiting your native language: The role of retrieval-induced forgetting during second-language acquisition. Psychological Science, 18, 29–34.
Linck, J. A., Kroll, J. F., & Sunderman, G. (2009). Losing access to the native language while immersed in a second language: Evidence for the role of inhibition in second-language learning. Psychological Science, 20, 1507–1515.
Miller, E. K., & Cohen, J. D. (2001). An integrative theory of prefrontal cortex function. Annual Review of Neuroscience, 24, 167–202.
Pronk, T. M., Karremans, J. C., Overbeek, G., Vermulst, A. A., & Wigboldus, D. H. J. (2010). What it takes to forgive: When and why executive function facilitates forgiveness. Journal of Personality and Social Psychology, 98, 119–131.
Salthouse, T. A. (1996). The processing-speed theory of adult age differences in cognition. Psychological Review, 103, 403–428.
Saunders, J., Fernandes, M., & Kosnes, L. (2009). Retrieval-induced forgetting and mental imagery. Memory and Cognition, 37, 819–828.
Shimamura, A. P., Jurica, P. J., Mangels, J. A., Gershberg, F. B., & Knight, R. T. (1995). Susceptibility to memory interference effects following frontal lobe damage: Findings from tests of paired-associate learning. Journal of Cognitive Neuroscience, 7, 144–152.
Shivde, G., & Anderson, M. C. (2001). The role of inhibition in meaning selection: Insights from retrieval-induced forgetting. In D. Gorfein (Ed.), On the consequences of meaning selection: Perspectives on resolving lexical ambiguity (pp. 175–190). Washington, DC: American Psychological Association.
Storm, B. C., Bjork, E. L., & Bjork, R. A. (2007). When intended remembering leads to unintended forgetting. Quarterly Journal of Experimental Psychology, 60, 909–915.
Storm, B. C., Bjork, E. L., Bjork, R. A., & Nestojko, J. F. (2006). Is retrieval success a necessary condition for retrieval-induced forgetting? Psychonomic Bulletin and Review, 13, 1023–1027.
Storm, B. C., & Nestojko, J. F. (2009). Successful inhibition, unsuccessful retrieval: Manipulating time and success during retrieval practice. Memory, 18, 99–114.
Wimber, M., Rutschmann, R. M., Greenlee, M. W., & Bäuml, K. (2009). Retrieval from episodic memory: Neural mechanisms of interference resolution. Journal of Cognitive Neuroscience, 21, 538–549.

7

Sleep, Retrieval Inhibition, and the Resolving Power of Human Memory

Malcolm D. MacLeod and Justin C. Hulbert

Few of us, at least in the UK, have much difficulty in remembering Steve Redgrave’s Herculean feat at the 2000 Olympics in Sydney when he won his fifth Olympic gold medal. Battling against diabetes and a training regime that few could even contemplate, Redgrave and his teammates crossed the line in the coxless fours just ahead of a very talented crew from Italy. It was an emotional event, not only for those directly involved but also for the hundreds of thousands who had risen at a ridiculously early time in the morning to watch the race live. Everywhere one looked, lights were on. It seemed as if the entire UK population had risen early to see if the impossible could be done. In retrieving a memory such as this, it is unlikely that we simply dip into the equivalent of some “mental filing cabinet” to emerge victorious with a pristine memorial record of Redgrave’s historic race. Rather, what we experience as memory is actually the product of the emergent properties of a mental representation for some previously experienced event combined with our attempts to recover it. As Lashley (1950) concluded, there is no single representation, or engram, for an event tucked away in memory waiting to be accessed. Memory retrieval is, in fact, a staggeringly complex phenomenon, one that involves tapping into dynamic representations jointly shaped by the original experience, the
history of encountering and retrieving that information during the interim, and the relationship of these representations to other memories (Moscovitch, 2007; Nadel & Moscovitch, 1997; Nadel, Samsonovich, Ryan, & Moscovitch, 2000; Tulving, 1983). Accordingly, memory retrieval is more akin to the assembly of an edited volume (much like this one) in which individual chapters and sections are sourced, fused together, and edited in line with a set of working goals. One of the principal difficulties inherent in such a dynamic process is how to distinguish between what is relevant and what is not. Without such a capacity, our ability to remember in any adaptive sense would be severely compromised; we would soon become swamped by irrelevant ephemera. For instance, we might remember other instances of Redgrave competing, other competitions at the Olympics, or even competitions we had engaged in, instead of Redgrave’s historic victory. So how does human memory achieve this kind of resolution? How can we retrieve such a target memory when faced with seemingly limitless numbers of related memories created and collated during a lifetime’s worth of experiences? On first inspection, one might reasonably entertain the notion that each memory must be organized or tagged in such a manner as to promote subsequent identification and precise access. This would be a reasonable working assumption if it were not for the fact that the cues typically employed to retrieve particular memories seldom contain information uniquely associated with a particular target event; that is, retrieval cues tend to access not only the material of interest, but also lots of related but irrelevant material. Thus, the resolving power of memory (i.e., the ability to retrieve a particular memory of interest) cannot simply be a function of the way in which memories are labeled or organized in memory. 
Given the tendency to access more than the target memory of interest, the critical question arises as to how such unwanted material is dealt with at retrieval. Without some means of resolving unwanted retrieval competition, we would soon find ourselves in pursuit of countless red herrings before finally stumbling upon the target memory. To put it more simply, retrieving material about a specific event from memory is probably similar to the task involved in being asked to meet a stranger at an airport. If the only information we have available to us is that the individual we are seeking is “tall,” we would need to approach all the tall people in the airport in order to locate our target person. In the same way, when retrieval cues are underspecified, lots of related but irrelevant material will tend to be accessed in addition to the target material. The question then is, how do we bring to mind the target memory when faced with this level of competition?

From this preamble, it would seem that successful memory retrieval faces three major hurdles: (1) It has to deal with the issue of how to successfully retrieve an item against a background of a memory bank that constantly increases in size and, along with it, the number of potentially competing representations within it; (2) it has to deal with the fluctuating accessibility of particular memories, depending upon their retrieval history; and (3) it has to somehow preserve our imperfect perception of memory as a permanent entity. In order to accomplish all this, it would seem reasonable to suppose that we must have evolved some mechanism that distinguishes contextually appropriate information from contextually inappropriate information. In doing so, such a retrieval system would greatly enhance our ability to find adaptive responses to the various challenges posed by our social and physical worlds (J. R. Anderson, 1990; J. R. Anderson & Milson, 1989). Indeed, considering the multitude of memory-related tasks we perform daily, experience tells us that serious difficulties in accessing memories of interest are relatively rare—we seem to clear these various hurdles with relative ease. But what kind of mechanism might promote this kind of selective memory resolution without compromising the subsequent availability of temporarily inappropriate memories? In addressing this question, we review how an active forgetting mechanism—retrieval inhibition—offers a time-limited solution to these problems. In support of this claim, we draw upon key evidence from experiments concerned with retrieval-induced forgetting, spontaneous recovery, and memory regression. Finally, we conclude our discussion by speculating on the role of sleep in regulating the time-course of retrieval inhibition and consider how sleep may contribute to the promotion of adaptive and stable memory representations.

Retrieval-Induced Forgetting (RIF)

To date, the most commonly used method to investigate the role of inhibition in retrieval selection has been the retrieval practice paradigm (originally devised by M. C. Anderson, Bjork, & Bjork, 1994). This paradigm consists of three distinct phases: a study phase, followed by the selective retrieval of some of the studied items, plus a final test for all the items presented at study. In addition to producing a facilitation effect for practiced items, the selective retrieval practice procedure typically yields poorer recall of nonpracticed items from practiced sets than of nonpracticed items from nonpracticed sets. This latter effect is referred to as retrieval-induced forgetting (RIF) and has proved significant for a number of reasons. First, although early reports of retrieval-based
impairments (e.g., Blaxton & Neely, 1983; Roediger, 1974; A. D. Smith, 1971) were based on data from a single output session, the retrieval practice paradigm reveals a lasting impairment over an interpolated delay period (usually ~20 minutes) between retrieval practice and final test. Second, rather than retrieval practice resulting in the spread of activation to related items in memory (which no doubt happens under certain circumstances; e.g., E. F. Loftus, 1973; G. Loftus & Loftus, 1974; Neely, 1976), selective retrieval practice of some members of a category actually results in a decrease in the subsequent accessibility of nonpracticed members from that same category. The primary mechanism underlying this phenomenon is generally considered to be retrieval inhibition, although a number of alternative explanations also exist (see, e.g., Dodd, Castel, & Roberts, 2006; C. M. MacLeod, Dodd, Sheard, Wilson, & Bibi, 2003). Despite the notion of inhibition being by no means new (see R. Smith, 1992, for a review), few memory researchers before Robert Bjork truly appreciated how, via a dampening of the activation of related but irrelevant memories, attention could be successfully channelled toward target material (see also Hasher & Zacks, 1988; Saunders & MacLeod, 2006, for slightly different accounts of how inhibitory control is considered to operate).1 Part of the reason for this apparent neglect is that memory theorists have traditionally isolated the processes involved in learning and retrieval from those involved in forgetting. Despite this firmly entrenched perspective, R. A. Bjork (1978, 2001) continues to provide convincing arguments and experimental evidence that demonstrate that forgetting is, in fact, an integral part of the learning process.
In particular, Bjork has championed the notion that there exist active forms of forgetting that play critical roles in the maintenance of adaptive, goal-directed behavior and that coexist alongside noninhibitory mechanisms, such as time-based decay. The critical issue in the production of RIF concerns the need to resolve competition that stems from accessing related items. Consistent with this view, nonrelevant retrieval attempts (e.g., retrieving the names of capital cities after having initially studied sets of personality traits) regularly fail to produce RIF (M. D. MacLeod, 2002; Macrae & MacLeod, 1999). The same is true when items are simply re-presented rather than cued for retrieval (Anderson, Bjork, & Bjork, 2000). Yet, by providing the mere illusion of relevant competition during memory search in an “impossible retrieval task,” Storm, Bjork, Bjork, and Nestojko (2006) demonstrated that such circumstances are sufficient to invoke inhibitory control and, in turn, produce RIF.
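As an illustrative aside, the two difference scores that define the retrieval practice paradigm can be written out explicitly. The following is a minimal sketch only: the function name `rif_scores` and the recall proportions are hypothetical and are not taken from any study cited here. It assumes that facilitation is scored as recall of practiced items (Rp+) minus the baseline of unpracticed items from unpracticed sets (Nrp), and that RIF is scored as recall of unpracticed items from practiced sets (Rp-) minus the same baseline.

```python
def rif_scores(recall):
    """Return (facilitation, inhibition) as differences from the Nrp baseline.

    `recall` maps the item types "Rp+", "Rp-" and "Nrp" to recall proportions.
    A negative inhibition score indicates retrieval-induced forgetting.
    """
    baseline = recall["Nrp"]
    facilitation = recall["Rp+"] - baseline
    inhibition = recall["Rp-"] - baseline
    return facilitation, inhibition


# Hypothetical recall proportions, for illustration only.
example = {"Rp+": 0.75, "Rp-": 0.40, "Nrp": 0.50}
facilitation, inhibition = rif_scores(example)
print(f"facilitation = {facilitation:+.2f}, inhibition = {inhibition:+.2f}")
```

Scoring both effects against the same Nrp baseline is what allows facilitation and inhibition to be compared on a common scale.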


Sleep, Retrieval Inhibition, and the Resolving Power of Human Memory • 137

Since the first report of RIF, considerable evidence has accumulated in favor of an inhibitory account of this phenomenon (see M. C. Anderson, 2003, for a review). Perhaps the strongest evidence supporting this perspective comes from those studies that have employed novel, independent cues during the final test phase of the retrieval practice paradigm (e.g., Anderson & Spellman, 1995; MacLeod & Saunders, 2005; Saunders & MacLeod, 2006; Veling & van Knippenberg, 2004). The fact that unpracticed items from practiced sets remain inaccessible despite the use of novel retrieval cues suggests that the memorial representation itself is in some way unavailable to conscious inspection; that is, it has been inhibited (see Anderson & Levy, this volume, for further discussion). The logic of inhibiting a dominant, yet contextually inappropriate memory trace is difficult to deny, especially when our needs change as a result of new circumstances. Indeed, one of the primary benefits of reducing retrieval competition via retrieval inhibition is that out-of-date information can be set aside so that current information can be better remembered (R. A. Bjork, 1989; MacLeod & Saunders, 2008). There is usually little need, for instance, to recover the phone number of a former residence once one has moved on. In other cases, however, inhibiting a dominant memory may be only provisionally apposite; that is, material that is currently inhibited may actually be necessary at some future point in time to achieve an unforeseen task. If memories were rendered permanently inaccessible by the actions of retrieval inhibition, our ability to deal with challenges in our environment would soon become compromised. Indeed, such an inhibitory mechanism would only prove adaptive if one’s goals never changed or were constantly novel, neither of which reflects what happens in real life. 
In short, there is a compelling argument that the effects of inhibitory control have to be transitory if they are to prove adaptive (see also MacLeod & Macrae, 2001; MacLeod, Saunders, & Chalmers, 2010, for further discussion). Fortunately, inhibitory effects do not appear to banish affected memories into the ether. In fact, it has been shown that previously inhibited memories actually enjoy facilitated rates of relearning, compared to baseline traces (Storm, Bjork, & Bjork, 2008). These findings are consistent with Bjork and Bjork’s theorizing on the dissociation between retrieval strength and storage strength. In their new theory of disuse, Bjork and Bjork (1992) proposed that the retrieval strength of a particular memory equates to the current ease with which that item can be retrieved (e.g., retrieval strength is diminished when competitors, rather than a target memory, are retrieved). Storage strength, in contrast, reflects how well an item was originally learned and, as such, is considered to be a relatively permanent feature (i.e., it is impervious to


the fluctuations of delay and retrieval inhibition). As we begin to consider how these two aspects of memory strength might interact to allow material that was once forgotten to reemerge, we shall take a closer look at two examples from the memory literature: spontaneous recovery and regression effects.

Spontaneous Recovery and Regression Effects

From the earliest reports of RIF, parallels have been drawn to the forgetting typically observed in retroactive interference paradigms. As M. C. Anderson and colleagues (1994) note, many classical studies of retroactive interference employed a learning paradigm that confounded retrieval practice and strength of interpolated interference learning. Although the contribution of RIF in retroactive interference remains an active area of research (e.g., Bäuml, 1996; Delprato, 2005), the recovery of retrieval inhibition over time offers a tantalizing bridge across the literatures. To illustrate this point, consider the fact that in the standard A-B/A-C interference paradigm (Briggs, 1954), participants first learn a list of cue-associate pairings to criterion prior to being trained on a second set of pairings that share cues with items from the first list. Typically, subsequent cued-recall performance for the first-list associates diminishes as participants practice the retrieval of second-list associates, compared to a control group without interpolated learning. In line with results from the animal learning/extinction literature (see, e.g., Hull, 1943; Pavlov, 1927), Briggs (1954) discovered that human participants begin to recall items from the first list with a greater probability than their recall of second-list items, provided the delay between learning and testing was sufficiently long (i.e., between 12 and 24 hours). In the years that followed this discovery, memory researchers expended considerable effort in trying to unravel the factors that give rise to such spontaneous recovery effects (e.g., Postman, 1971; Postman, Stark, & Fraser, 1968; Postman, Stark, & Henschel, 1969). 
Although the line of research became a less popular pursuit when questions were raised about the original empirical evidence (Tulving & Madigan, 1970), this phenomenon has experienced something of a renaissance in recent years, specifically in the domains of directed forgetting (Wheeler, 1995), automatic memory processes (Lustig, Konkel, & Jacoby, 2004), and verbal overshadowing (Finger & Pezdek, 1999). Basing their perspective on classic, yet often overlooked, inhibitory explanations of retroactive interference along the lines of Osgood’s (1946, 1948) reciprocal inhibition hypothesis and Postman’s (1961)


response set suppression hypothesis of retroactive interference, R. A. Bjork and Bjork (1992) suggested that first-list intrusions—or rehearsal attempts—during interpolated learning actually set the stage for both the inhibition of the first list’s retrieval strength and the facilitation of its storage strength. Since Bjork and Bjork maintain that retrieval strength decays according to a negatively accelerated function over time, the spontaneous recovery of first-list items seen at long delays may be due to the relatively faster rate of decline in the retrieval strength of the interpolated list.2 Related to the finding of spontaneous recovery are so-called memory regression effects (Estes, 1955). To illustrate, if we were to plot recall success as a function of when in time the various items were learned, we would typically find that, in the immediate aftermath of new learning, the earliest memories are disadvantaged compared to the relatively accessible recent items. After longer delays that do not involve any further retrieval attempts, however, the earliest learned memories often return to prominence (see R. A. Bjork, 1978). The ecological advantage of this regression is easy to grasp. Behaviors learned most recently are, statistically speaking, those we are most likely to use in the near future—so long as the circumstances under which the learning occurred persist. If, however, the environmental demands shift, such that we experience prolonged periods during which there is no call for these new behaviors, then the better learned, earlier behaviors should reassume dominance. Regression effects are widespread throughout the human and animal learning literature and encompass a range of timescales (see R. A. Bjork, 2001, for a review). In the treatment of phobias, for instance, it is relatively common for fear responses to return once treatment has ended (see Lang, Craske, & Bjork, 1999, for a discussion). 
It would seem that, without continued practice, any treatment gains are likely to dissipate over time. Sports coaches have long been aware of this phenomenon. Long breaks from competition or training can result in regression to earlier bad habits, or can prove advantageous in some instances, especially where high-performing athletes have recently acquired poor or inappropriate actions.
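The account of spontaneous recovery given above can be made concrete with a toy calculation. The sketch below is not Bjork and Bjork's (1992) formal model. It assumes, purely for illustration, that retrieval strength decays exponentially (one example of a negatively accelerated function) and that higher storage strength slows that decay. The function name, the time scale, and all numerical values are hypothetical.

```python
import math


def retrieval_strength(initial, storage, hours, time_scale=12.0):
    """Toy negatively accelerated decay of retrieval strength.

    Assumption for illustration: the decay time constant grows with
    storage strength, so well-stored items lose accessibility more slowly.
    """
    return initial * math.exp(-hours / (time_scale * storage))


# Hypothetical values: list 1 is currently inhibited but well stored;
# list 2 is highly accessible but less well stored.
LIST_1 = {"initial": 0.5, "storage": 4.0}
LIST_2 = {"initial": 0.9, "storage": 1.0}


def compare(hours):
    """Return the retrieval strengths of (list 1, list 2) after a delay."""
    r1 = retrieval_strength(hours=hours, **LIST_1)
    r2 = retrieval_strength(hours=hours, **LIST_2)
    return r1, r2


for hours in (0, 12, 24):
    r1, r2 = compare(hours)
    print(f"{hours:>2} h: list 1 = {r1:.2f}, list 2 = {r2:.2f}")
```

Under these assumed parameters the second list is more accessible immediately after learning, whereas the first list is more accessible after a delay, which is the qualitative pattern described for spontaneous recovery and regression. The particular crossover point depends entirely on the invented values.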

Inhibition and Delay

At this point we are compelled to ask what is so special about the typical 12- to 24-hour delay that brings about a mnemonic recovery. McGeoch (1932) reminded us that forgetting is not simply due to the passage of time, per se, but also to the type of activity occurring during that delay.


Similarly, Wixted (2004) drew our attention to the earlier and much neglected work of Keppel (1968), who concluded that nonspecific retroactive interference from intervening cognitive activities is the primary cause of forgetting in everyday life. Indeed, when intervening activities are reduced or eliminated (for instance, by sleep, Jenkins & Dallenbach, 1924), the level of forgetting is significantly reduced. The extent of retroactive interference may not, as had been classically assumed in the interference literature, be due simply to the degree of relatedness between learned materials and those materials that had been acquired during intervening periods of wakefulness. Instead, forgetting may be more directly associated with the degree to which mental exertion is required to perform a cognitive task. If neural memory resources become depleted because of some cognitive task (e.g., reading normally increases activity in the hippocampus), then there may be insufficient resources available to permit memories to be consolidated. Sleep, however, may serve to eliminate mental exertion and thereby ensure that there are sufficient hippocampal resources to permit memory consolidation to take place (Wixted, 2004). We would argue, however, that this model represents only part of the picture. In particular, we posit that sleep performs, more generally, the role of regulating access to memories—not simply in terms of supporting memory consolidation and therefore storage strength, but also in terms of modulating retrieval strength. In essence, we are arguing that sleep may have a pivotal role in the resetting of retrieval inhibition so that memories that had been unavailable to conscious inspection once more become available for retrieval. In doing so, sleep may perform a critical role in determining which memories are ultimately available for retrieval during our waking lives. 
We have argued elsewhere (MacLeod & Macrae, 2001; MacLeod et al., 2010) that forgetting effects need to possess some transitory property if they are to meet the challenges posed by changes in our social and physical environment. If we had the ability to look into the future and see what kinds of information might be needed in order to function adaptively, the task for memory would be considerably easier—we could simply dispose of those memories that were considered redundant or out of date and keep the useful ones. In the absence of a crystal ball, however, there needs to be some way of promoting the information we currently need, but not at the expense of rendering related but currently redundant material permanently unavailable. Without knowing what challenges we will face tomorrow, or next week, or next year, the form of forgetting that arises from retrieval inhibition has to be transitory; otherwise, inhibitory forgetting would soon become maladaptive (E. L.


Bjork, Bjork, & MacLeod, 2006; MacLeod, Bjork, & Bjork, 2003). In the remainder of this chapter, we briefly outline the relationship between retrieval inhibition and the passage of time, and the role of sleep in the resolving power of memory. In one of our earlier studies on retrieval-induced forgetting, we illustrated the transient nature of RIF (see MacLeod & Macrae, 2001). The critical factor in determining the production of RIF is the timing of the selective retrieval practice procedure in relation to the final test phase. Using sets of personality traits describing two fictitious individuals, MacLeod and Macrae showed that RIF failed to emerge when a 24-hour delay was inserted between retrieval practice and final test. If, on the other hand, a 24-hour delay was inserted between the initial presentation phase and retrieval practice, RIF still emerged in the final test, just as it had in the standard paradigm. Using novel probes at final test to rule out associative blocking as a viable explanation for these findings, we subsequently showed that these inhibitory effects dissipated over a 24-hour delay following selective retrieval practice (Figure 7.1; see Saunders & MacLeod, 2002). Likewise, a 12-hour delay filled with sleep has been observed to reduce RIF (Baran, Wilson, & Spencer, 2010). Despite the demonstrable transience of RIF in a number of paradigms, we readily acknowledge that there may be conditions under which this kind of forgetting may be more permanent (see MacLeod & Saunders, 2008; Storm, Bjork, & Bjork, 2007, for discussions). Currently, the literature on the temporal parameters of RIF remains variegated. 
Chan (2009) keenly noted that evidence for RIF lasting more than 24 hours is largely restricted to experiments involving retrieval practice sessions spread across multiple days (Conroy & Salmon, 2005, 2006; Ford, Keating, & Patel, 2004; García-Bajos, Migueles, & Anderson, 2009; but see Racsmány, Conway, & Demeter, 2010). This most likely reflects the fact that the vitality of inhibition is linked to the extent of retrieval practice and its spacing (see M. C. Anderson, 2003; MacLeod et al., 2010).3 Interestingly, Chan (2009) also reported a reversed RIF effect (i.e., retrieval-induced facilitation) for information that had been integrated into a coherent whole following a delay of 24 hours. He suggests that mnemonic features shared between practiced and unpracticed items are facilitated during retrieval practice, but that these only become unmasked in the latter case once any distinguishing features are no longer subject to inhibitory control. Here again, we encounter the seemingly magical delay period of 24 hours. Although a great variety of events occur within any 24-hour window, the one constant is sleep. Our 24-hour circadian clock not only


[Figure 7.1 (image not reproduced). Two panels: MacLeod & Macrae (2001) and Saunders & MacLeod (2002). Vertical axis: Difference in Recall (%), from –20% to 50%. Horizontal axis: Facilitation (Rp+ vs. Nrp) and Inhibition (Rp– vs. Nrp). Graph key (experimental design): Immediate practice & test; Delayed practice & test; Delayed test.]

Figure 7.1 Measures of above-baseline facilitation and below-baseline inhibition in MacLeod and Macrae (2001) and Saunders and MacLeod (2002), as a function of interpolated delay between study and retrieval practice or between retrieval practice and test. For simplicity, the immediate practice and test condition from MacLeod and Macrae’s (2001) Experiment 2 is plotted here as the comparison group for both experimental conditions in the upper-left panel. Rp+ = practiced items from practiced sets, Rp– = unpracticed items from practiced sets, and Nrp = unpracticed items from unpracticed sets.

dictates when to head to bed and wake up, but it establishes a natural separation between the context of one day and the next, when previous learning (or forgetting) may no longer be appropriate. As we will discuss, the neurobiological states occurring during sleep have proved to be uniquely supportive of certain learning and memory processes. By relating these processes to the distinctions introduced by Bjork and Bjork (1992), we propose a mechanism by which the effects of inhibitory control can remain flexible. In doing so, we also explore how sleep might contribute to the restoration of the balance between the storage strength of a particular memory and its corresponding retrieval strength.

Sleep and Synaptic Homeostasis

Perhaps because sleep was considered for so long to be devoid of any mental activity, it has taken cognitive theorists considerable time to catch up with the rapid developments that have been made in the


neuroscience of sleep and memory. The current view is that sleep is a behavior that should more properly be defined as an actively regulated process that plays a pivotal role in the reorganization of neural activity (Hobson, 2005). More specifically, two major functions have been identified concerning the relationship between sleep and long-term memory. Perhaps the most readily recognized function of sleep in this regard involves the consolidation of recently encoded memories. Consolidation is, itself, a fairly nebulous term that has variously been used to refer to processes on timescales ranging from hours to decades, and involving everything from the stabilization of recently encoded memories in order to shield them from the hazards of traumatic brain injuries, to memory enhancement and integration within a more global knowledge base (see Stickgold & Walker, 2007, for a review). For present purposes, we focus on the putative connection between consolidation and the recovery of information forgotten due to retroactive interference. Beginning with Ekstrand’s (1967) description, research has largely borne out the restorative nature of sleep in classic retroactive interference paradigms, as well as in more naturalistic tasks. Tellingly, Drosopoulos, Schulze, Fischer, and Born (2007) reported that restorative benefits of sleep are most prominent when the affected associations are weak as a consequence of poor encoding or retroactive interference. While not directly addressing the role of sleep at the time, R. A. Bjork and Bjork’s (1992) new theory of disuse would seem to predict exactly this outcome for memories that were once well learned (i.e., enjoyed a high storage strength) but currently register a low retrieval strength due to temporary inhibition. In fact, a growing body of evidence suggests that sleep actually mimics competitive retrieval. The hippocampo-neocortical dialogue theory describes the process underlying this benefit. 
Put simply, the theory posits that during sleep, memory traces are “replayed” and integrated via reciprocal information transfer between the hippocampus and the neocortex. The hippocampus represents a temporary store for newly learned information, while the neocortex represents a more permanent memory store (Buzsáki, 1996; Sutherland & McNaughton, 2000; Walker, 2005; Wilson, 2002). Though some have questioned the elevated status afforded to sleep in consolidating memories (for more on this perspective, see Vertes, 2004), the hippocampo-neocortical dialogue theory continues to garner support from both animal (Nadasdy, Hirase, Czurko, Csicsvari, & Buzsáki, 1999; Pavlides & Winson, 1989; Qin, McNaughton, Skaggs, & Barnes, 1997; Sirota, Csicsvari, Buhl, & Buzsáki, 2003; Wilson & McNaughton, 1994) and human studies (Peigneux, Laureys, Delbeuck, & Maquet, 2001; Peigneux et al., 2004).


The other documented function of sleep in relation to long-term memory concerns the synaptic homeostasis hypothesis, which focuses on the ability of sleep to lessen the synaptic footprint (i.e., in terms of energy and space in the brain) that has gradually built up during the day (Miller, 2009; Tononi & Cirelli, 2003, 2006). In essence, new memories are written during the waking period by altering the connection strengths between different neurons. This process is called long-term potentiation (LTP) and represents our best model of long-term memory on the neuronal level to date. Proteins synthesized during LTP gradually accumulate over time, raising the overall level of synaptic strength in the brain. If this buildup of proteins were to continue unabated, the overall concentration would eventually reach a saturation point that would preclude any further learning. In order to prevent this from occurring, rising levels of the neurotransmitter adenosine eventually trigger the release of the inhibitory processes that had hitherto been staving off sleep. Hence, as waking life becomes more stimulating and the synaptic balance is thrown off-kilter, the duration of sleep increases. This, in turn, allows sufficient time for a return to a stable set point (Donlea, Ramanan, & Shaw, 2009). During sleep, the overall synaptic strength in the brain is gradually downscaled until it reaches a satisfactory baseline level while simultaneously preserving the relative connection weights between synapses (Vyazovskiy, Cirelli, Pfister-Genskow, Faraguna, & Tononi, 2008). This process also serves to tune memories by selectively curbing extraneous activations that would otherwise cloud the efficiency of neural responses (Graves, Pack, & Abel, 2001; Hasselmo, 1999; Pace-Schott & Hobson, 2002). 
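One simple way to see how overall synaptic strength could be reduced while relative connection weights are preserved is to scale every weight by the same factor. The sketch below assumes such multiplicative scaling purely for illustration; the hypothesis as described above does not commit to this particular rule, and the function name `downscale` and the weight values are hypothetical.

```python
def downscale(weights, target_total):
    """Scale all weights by one common factor so that they sum to target_total.

    Because every weight is multiplied by the same factor, the ratio
    between any two weights is unchanged.
    """
    factor = target_total / sum(weights)
    return [w * factor for w in weights]


# Hypothetical synaptic strengths at the end of a waking period.
weights_after_waking = [0.8, 0.4, 0.2]
weights_after_sleep = downscale(weights_after_waking, target_total=0.7)
print(weights_after_sleep)
```

In this toy case the strongest connection remains twice as strong as the second and four times as strong as the third, even though the total has been halved.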
Although the synaptic homeostasis hypothesis delineated above emphasizes synaptic downscaling, we would argue that another important function of sleep is to restore synaptic balance through the release of inhibition that had been applied to memories during wakefulness. In doing so, memories that are released from inhibition may once again become available for retrieval and prove advantageous in our attempts to achieve our goals the following day. A plausible account of just such a mechanism comes from the realm of computational neuroscience. Norman, Newman, and Detre (2007) put forward a model of RIF that accounts for various properties of inhibitory forgetting, including the diminution of forgetting over long delays. To accommodate this finding, Norman et al. (2007) suggested two possible explanations. The first simply assumes that RIF is context dependent and, as a result, should diminish as the testing context drifts away from the retrieval context with the passage of time. Indeed, similar explanations have also been proposed for spontaneous recovery phenomena (Rescorla, 2004).


The second account, which Norman et al. (2007) favor, is based on an offline restorative mechanism which Norman, Newman, and Perotte (2005) had previously suggested occurs during rapid eye movement (REM) sleep. During this time, feedback inhibition in the model is periodically lowered in order to identify mnemonic competitors that are permitted to then rise above threshold. Having been singled out, these competitors can be further dampened, limiting future intrusions in the waking state. The level of inhibition in the model is then raised, such that target memories at risk of falling below the new, more stringent threshold, can be selectively facilitated to support future retrieval attempts. In this manner, the proposed learning algorithm oscillates between periods of competitor punishment and target strengthening (see also Norman, Newman, Detre, & Polyn, 2006).4 From this vantage point, understanding the REM sleep model in terms of Bjork and Bjork’s new theory of disuse is relatively straightforward. The storage strength of a particular memory should help to determine how easily an inhibited memory can be repaired, in much the same way that facilitation is directly related to storage strength. Conversely, if an item is deemed to be overly competitive in the oscillatory model, it should be weakened in direct proportion to its current retrieval strength until weaker memories can be retrieved given the proper context. The preexisting storage strength of dominant items, however, should shield those items from being unnecessarily overwritten. By directly manipulating activity (sleep versus wake) over a 12-hour period following retrieval practice, Baran, Wilson, and Spencer (2010) went on to demonstrate an inverse relationship between time spent in REM sleep and RIF, consistent with the notion that REM sleep may play a role in dampening prior inhibition. 
If, however, retrieval practice were to be spaced out over days, it remains possible that the storage strength would become so unstable that substantial RIF might persist for weeks without showing significant signs of recovery. Sleep offers far more than a quiet respite from the fast-paced world that surrounds us. More than just freeing up the cognitive resources necessary to cement the storage strength of newly acquired memories, sleep may achieve a homeostatic balance between the need to further inhibit recurrently disruptive memories and the need to facilitate weakened memories that might once again become relevant. These latter adjustments are best understood as modulations in retrieval strength and, as such, provide an explanatory mechanism for memory phenomena such as spontaneous recovery, regression effects, and release from RIF.
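The alternation between competitor punishment and target strengthening described earlier in this section can be caricatured in a few lines. The sketch below is emphatically not the neural network model of Norman and colleagues. It is a toy illustration in which each memory has a single strength value, an item counts as active when its strength exceeds the current level of inhibition, and all names and parameter values (`oscillation_step`, `inhibition`, `swing`, `rate`) are hypothetical.

```python
def oscillation_step(strengths, targets, inhibition=0.5, swing=0.2, rate=0.1):
    """Apply one low-inhibition and one high-inhibition phase.

    Low phase: non-target items that become active once inhibition is
    lowered are treated as competitors and weakened.
    High phase: target items that fail to stay active once inhibition is
    raised are treated as at risk and strengthened.
    """
    low, high = inhibition - swing, inhibition + swing
    updated = dict(strengths)
    for item, strength in strengths.items():
        if item not in targets and strength > low:
            updated[item] = strength - rate  # punish the competitor
        elif item in targets and strength < high:
            updated[item] = strength + rate  # shore up the weak target
    return updated


# Hypothetical memory strengths, for illustration only.
memories = {"target": 0.6, "competitor": 0.45, "unrelated": 0.1}
print(oscillation_step(memories, targets={"target"}))
```

In this toy run the target is strengthened, the competitor is weakened, and the unrelated item, which never becomes active even under lowered inhibition, is left untouched.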


Concluding Remarks: Consolidating the Past and Looking Forward

Plainly, the ever-changing environment in which we live often presents stark challenges for effective memory encoding and retrieval. In the face of these hurdles, we have developed a memory system that is not only flexible enough to inhibit prepotent responses so that more contextually appropriate, weaker responses can be retrieved and novel experiences can be incorporated into memory, but one that enjoys a regular means by which the system’s relative levels of excitation and inhibition can be stabilized during sleep. The putative ability of sleep to reset inhibition broadly outlined in these pages is by no means complete, but we hope, nevertheless, that it serves to demonstrate how the truly germinative insights provided by Robert Bjork’s work can influence new generations of memory researchers. It is in this spirit that we must pay close heed to Bjork’s call to move toward more neurologically plausible models of memory. Only then shall we begin to appreciate more fully the essence of his work and, in turn, advance the scientific exploration of human memory into exciting new realms.

Endnotes

1. For some researchers, the notion of inhibition extends to include the phenomenon of “memory stopping,” or the idea that inhibitory control can prevent unwanted or threatening memories from coming into conscious awareness (M. C. Anderson, 2003; M. C. Anderson & Green, 2001; see also Anderson & Levy, in this volume). For others, inhibition represents the mechanism by which the spread of activation is controlled (Saunders & MacLeod, 2006). In this chapter, however, we are concerned solely with inhibition as a means of downregulating the activation of related but unwanted memorial representations.
2. Melton and Irwin’s (1940) two-factor theory of interference, while not inhibitory in nature, also attributes forgetting to the inappropriate elicitation of first-list responses during second-list retrieval. This response competition is met with unlearning that is consistent with extinction in operant conditioning studies. Importantly, extinction predicts the spontaneous recovery of original learning.
3. See also relevant work on the effect of spaced learning (e.g., Crowder, 1976; Whitten & Bjork, 1977).
4. The nature of the strengthening and weakening processes is strikingly consistent with the theta oscillations that have been tied to various aspects of learning and memory and map on to long-term potentiation and long-term depotentiation, respectively.


References

Anderson, J. R. (1990). The adaptive character of thought. Hillsdale, NJ: Erlbaum.
Anderson, J. R., & Milson, R. (1989). Human memory: An adaptive perspective. Psychological Review, 96, 703–719.
Anderson, M. C. (2003). Rethinking interference theory: Executive control and the mechanisms of forgetting. Journal of Memory and Language, 49, 415–445.
Anderson, M. C., Bjork, E. L., & Bjork, R. A. (2000). Retrieval-induced forgetting: Evidence for a recall-specific mechanism. Psychonomic Bulletin and Review, 7, 522–530.
Anderson, M. C., Bjork, R. A., & Bjork, E. L. (1994). Remembering can cause forgetting: Retrieval dynamics in long-term memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 1063–1087.
Anderson, M. C., & Green, C. (2001). Suppressing unwanted memories by executive control. Nature, 410, 131–134.
Anderson, M. C., & Spellman, B. A. (1995). On the status of inhibitory mechanisms in cognition: Memory retrieval as a model case. Psychological Review, 102, 68–100.
Baran, B., Wilson, J., & Spencer, R. (2010). REM-dependent repair of competitive memory suppression. Experimental Brain Research, 203(2), 471–477.
Bäuml, K.-H. (1996). Revisiting an old issue: Retroactive interference as a function of the degree of original and interpolated learning. Psychonomic Bulletin and Review, 3, 380–384.
Bjork, E. L., Bjork, R. A., & MacLeod, M. D. (2006). Types and consequences of forgetting: Intended and unintended. In L.-G. Nilsson & N. Ohta (Eds.), Memory and society: Psychological perspectives (pp. 141–165). New York: Psychology Press.
Bjork, R. A. (1978). The updating of human memory. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 12, pp. 235–259). New York: Academic Press.
Bjork, R. A. (1989). Retrieval inhibition as an adaptive mechanism in human memory. In H. L. Roediger & F. I. M. Craik (Eds.), Varieties of memory and consciousness: Essays in honour of Endel Tulving (pp. 309–330). Hillsdale, NJ: Lawrence Erlbaum Associates.
Bjork, R. A. (2001). Recency and recovery in human memory. In H. L. Roediger, III, J. S. Nairne, I. Neath, & A. M. Surprenant (Eds.), The nature of remembering: Essays in honor of Robert G. Crowder (pp. 211–232). Washington, DC: American Psychological Association Press.
Bjork, R. A., & Bjork, E. L. (1992). A new theory of disuse and an old theory of stimulus fluctuation. In A. F. Healy, S. Kosslyn, & R. M. Shiffrin (Eds.), From learning processes to cognitive processes: Essays in honor of William K. Estes (Vol. 2, pp. 35–67). Hillsdale, NJ: Erlbaum.


Blaxton, T. A., & Neely, J. H. (1983). Inhibition from semantically related primes: Evidence of a category-specific inhibition. Memory and Cognition, 11, 500–510.
Briggs, G. E. (1954). Acquisition, extinction, and recovery functions in retroactive inhibition. Journal of Experimental Psychology, 47, 285–293.
Buzsáki, G. (1996). The hippocampo-neocortical dialogue. Cerebral Cortex, 6, 81–92.
Chan, J. C. K. (2009). When does retrieval induce forgetting and when does it induce facilitation? Implications for retrieval inhibition, testing effect, and text processing. Journal of Memory and Language, 61, 153–170.
Conroy, R., & Salmon, K. (2005). Selective postevent review and children’s memory for nonreviewed materials. Journal of Experimental Child Psychology, 90, 185–207.
Conroy, R., & Salmon, K. (2006). Talking about parts of a past experience: The impact of discussion style and event structure on memory for discussed and nondiscussed information. Journal of Experimental Child Psychology, 95, 278–297.
Crowder, R. G. (1976). Principles of learning and memory. Oxford, UK: Lawrence Erlbaum.
Delprato, D. J. (2005). Retroactive interference as a function of degree of interpolated study without overt retrieval practice. Psychonomic Bulletin and Review, 12, 345–349.
Dodd, M. D., Castel, A. D., & Roberts, K. E. (2006). A strategy disruption component to retrieval-induced forgetting. Memory and Cognition, 34, 102–111.
Donlea, J. M., Ramanan, N., & Shaw, P. J. (2009). Use-dependent plasticity in clock neurons regulates sleep need in Drosophila. Science, 324, 105–108.
Drosopoulos, S., Schulze, C., Fischer, S., & Born, J. (2007). Sleep’s function in the spontaneous recovery and consolidation of memories. Journal of Experimental Psychology: General, 136, 169–183.
Ekstrand, B. R. (1967). Effect of sleep on memory: 1. Journal of Experimental Psychology, 75, 64–72.
Estes, W. K. (1955). Statistical theory of spontaneous recovery and regression. Psychological Review, 62, 145–154.
Finger, K., & Pezdek, K. (1999). The effect of the cognitive interview on face identification accuracy: Release from verbal overshadowing. Journal of Applied Psychology, 84, 340–348.
Ford, R. M., Keating, S., & Patel, R. (2004). Retrieval-induced forgetting: A developmental study. British Journal of Developmental Psychology, 22, 585–603.
García-Bajos, E., Migueles, M., & Anderson, M. C. (2009). Script knowledge modulates retrieval-induced forgetting for eyewitness events. Memory, 17, 92–103.
Graves, L., Pack, A., & Abel, T. (2001). Sleep and memory: A molecular perspective. Trends in Neurosciences, 24, 237–243.

Sleep, Retrieval Inhibition, and the Resolving Power of Human Memory • 149

Hasher, L., & Zacks, R. T. (1988). Working memory, comprehension, and aging: A review and a new view. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 22, pp. 193–225). New York: Academic Press.
Hasselmo, M. E. (1999). Neuromodulation: Acetylcholine and memory consolidation. Trends in Cognitive Sciences, 3, 351–359.
Hobson, J. A. (2005). Sleep is of the brain, by the brain and for the brain. Nature, 437, 1254.
Hull, C. L. (1943). Principles of behaviour. New York: Appleton-Century-Crofts.
Jenkins, J. B., & Dallenbach, K. M. (1924). Oblivescence during sleep and waking. American Journal of Psychology, 35, 605–612.
Keppel, G. (1968). Consolidation and forgetting theory. In H. Weingartner & E. S. Parker (Eds.), Memory consolidation: Psychology of cognition (pp. 149–161). Hillsdale, NJ: Erlbaum.
Lang, A., Craske, M., & Bjork, R. (1999). Implications of a new theory of disuse for the treatment of emotional disorders. Clinical Psychology: Science and Practice, 6, 80–94.
Lashley, K. (1950). In search of the engram. Symposia of the Society for Experimental Biology, 4, 454–482.
Loftus, E. F. (1973). Activation of semantic memory. American Journal of Psychology, 86, 331–337.
Loftus, G., & Loftus, E. (1974). The influence of one memory retrieval on a subsequent memory retrieval. Memory and Cognition, 2, 467–471.
Lustig, C., Konkel, A., & Jacoby, L. L. (2004). Which route to recovery? Controlled retrieval and accessibility bias in retroactive interference. Psychological Science, 15, 729–735.
MacLeod, C. M., Dodd, M. D., Sheard, E. D., Wilson, D. E., & Bibi, U. (2003). In opposition to inhibition. In B. H. Ross (Ed.), Psychology of learning and motivation: Advances in research and theory (Vol. 43, pp. 163–214). San Diego, CA: Academic Press.
MacLeod, M. D. (2002). Retrieval-induced forgetting in eyewitness memory: Forgetting as a consequence of remembering. Applied Cognitive Psychology, 16, 135–149.
MacLeod, M. D., Bjork, E. L., & Bjork, R. A. (2003). The role of retrieval-induced forgetting in the construction and distortion of memories. In B. Kokinov & W. Hirst (Eds.), Constructive memory (pp. 55–68). NBU Series in Cognitive Science. Sofia, Bulgaria: New Bulgarian University Press.
MacLeod, M. D., & Macrae, C. N. (2001). Gone but not forgotten: The transient nature of retrieval-induced forgetting. Psychological Science, 12, 148–152.
MacLeod, M. D., & Saunders, J. (2005). REM-dependent repair of competitive memory suppression. Experimental Brain Research, 203(2), 471–477.
MacLeod, M. D., & Saunders, J. (2008). Retrieval inhibition and memory distortion: Negative consequences of an adaptive process. Current Directions in Psychological Science, 17, 26–30.

MacLeod, M. D., Saunders, J., & Chalmers, L. (2010). Retrieval-induced forgetting: The unintended consequences of unintended forgetting. In G. M. Davies & D. Wright (Eds.), Current issues in applied memory research. Hove: Psychology Press.
Macrae, C. N., & MacLeod, M. D. (1999). On recollections lost: When practice makes imperfect. Journal of Personality and Social Psychology, 77, 463–473.
McGeoch, J. (1932). Forgetting and the law of disuse. Psychological Review, 39, 352–370.
Melton, A., & Irwin, J. (1940). The influence of degree of interpolated learning on retroactive inhibition and the overt transfer of specific responses. American Journal of Psychology, 53, 173–203.
Miller, G. (2009). Sleeping to reset overstimulated synapses. Science, 324, 22.
Moscovitch, M. (2007). Memory: Why the engram is elusive. In H. L. Roediger, Y. Dudai, & S. M. Fitzpatrick (Eds.), Science of memory: Concepts (pp. 17–22). New York: Oxford University Press.
Nadasdy, Z., Hirase, H., Czurko, A., Csicsvari, J., & Buzsáki, G. (1999). Replay and time compression of recurring spike sequences in the hippocampus. Journal of Neuroscience, 19, 9497–9507.
Nadel, L., & Moscovitch, M. (1997). Memory consolidation, retrograde amnesia and the hippocampal complex. Current Opinion in Neurobiology, 7, 217–227.
Nadel, L., Samsonovich, A., Ryan, L., & Moscovitch, M. (2000). Multiple trace theory of human memory: Computational, neuroimaging, and neuropsychological results. Hippocampus, 10, 352–368.
Neely, J. (1976). Semantic priming and retrieval from lexical memory: Evidence for facilitatory and inhibitory processes. Memory and Cognition, 4, 648–654.
Norman, K. A., Newman, E. L., & Detre, G. (2007). A neural network model of retrieval-induced forgetting. Psychological Review, 114, 887–953.
Norman, K. A., Newman, E., Detre, G., & Polyn, S. (2006). How inhibitory oscillations can train neural networks and punish competitors. Neural Computation, 18, 1577–1610.
Norman, K. A., Newman, E. L., & Perotte, A. J. (2005). Methods for reducing interference in the complementary learning systems model: Oscillating inhibition and autonomous memory rehearsal. Neural Networks, 18, 1212–1228.
Osgood, C. E. (1946). Meaningful similarity and interference in learning. Journal of Experimental Psychology, 36, 277–301.
Osgood, C. E. (1948). An investigation into the causes of retroactive interference. Journal of Experimental Psychology, 38, 132–154.
Pace-Schott, E. F., & Hobson, A. J. (2002). The neurobiology of sleep: Genetics, cellular physiology and subcortical networks. Nature Reviews Neuroscience, 3, 591–605.
Pavlides, C., & Winson, J. (1989). Influences of hippocampal place cell firing in the awake state on the activity of these cells during subsequent sleep episodes. Journal of Neuroscience, 9, 2907–2918.

Pavlov, I. P. (1927). Conditioned reflexes; an investigation of the physiological activity of the cerebral cortex (G. V. Anrep, Trans.). London: Oxford University Press.
Peigneux, P., Laureys, S., Delbeuck, X., & Maquet, P. (2001). Sleeping brain, learning brain. The role of sleep for memory systems. Neuroreport, 12, A111–A124.
Peigneux, P., Laureys, S., Fuchs, S., Collette, F., Perrin, F., Reggers, J., et al. (2004). Are spatial memories strengthened in the human hippocampus during slow wave sleep? Neuron, 44, 535–545.
Postman, L. (1961). The present status of interference theory. In C. N. Cofer (Ed.), Verbal learning and verbal behavior (pp. 152–179). New York: McGraw-Hill.
Postman, L. (1971). Transfer, interference and forgetting. In J. W. Kling & L. A. Riggs (Eds.), Woodworth and Schlosberg’s experimental psychology (3rd ed., pp. 1019–1132). New York: Holt, Rinehart and Winston.
Postman, L., Stark, K., & Fraser, J. (1968). Temporal changes in interference. Journal of Verbal Learning and Verbal Behavior, 7, 672–694.
Postman, L., Stark, K., & Henschel, D. M. (1969). Conditions of recovery after unlearning. Journal of Experimental Psychology, 82, 1–24.
Qin, Y. L., McNaughton, B. L., Skaggs, W. E., & Barnes, C. A. (1997). Memory reprocessing in corticocortical and hippocampocortical neuronal ensemble. Philosophical Transactions of the Royal Society B: Biological Sciences, 352, 1525–1533.
Racsmány, M., Conway, M. A., & Demeter, G. (2010). Consolidation of episodic memories during sleep: Long-term effects of retrieval practice. Psychological Science, 21, 80–85.
Rescorla, R. A. (2004). Spontaneous recovery. Learning and Memory, 11, 501–509.
Roediger, H. L. (1974). Inhibiting effects of recall. Memory and Cognition, 2, 261–269.
Saunders, J., & MacLeod, M. D. (2002). New evidence on the suggestibility of memory: The role of retrieval-induced forgetting in misinformation effects. Journal of Experimental Psychology: Applied, 8, 127–142.
Saunders, J., & MacLeod, M. D. (2006). Can inhibition resolve retrieval competition through the control of spreading activation? Memory and Cognition, 34, 307–322.
Sirota, A., Csicsvari, J., Buhl, D., & Buzsáki, G. (2003). Communication between neocortex and hippocampus during sleep in rodents. Proceedings of the National Academy of Sciences of the United States of America, 100, 2065–2069.
Smith, A. D. (1971). Output interference and organized recall from long-term memory. Journal of Verbal Learning and Verbal Behavior, 10, 400–408.
Smith, R. (1992). Inhibition: History and meaning in the sciences of mind and brain. Berkeley: University of California Press.
Stickgold, R., & Walker, M. P. (2007). Sleep-dependent memory consolidation and reconsolidation. Sleep Medicine, 8, 331–343.

Storm, B. C., Bjork, E. L., & Bjork, R. A. (2007). When intended remembering leads to unintended forgetting. Quarterly Journal of Experimental Psychology, 60, 909–915.
Storm, B. C., Bjork, E. L., & Bjork, R. A. (2008). Accelerated relearning after retrieval-induced forgetting: The benefit of being forgotten. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34, 230–236.
Storm, B. C., Bjork, E. L., Bjork, R. A., & Nestojko, J. F. (2006). Is retrieval success a necessary condition for retrieval-induced forgetting? Psychonomic Bulletin and Review, 13, 1023–1027.
Sutherland, G. R., & McNaughton, B. (2000). Memory trace reactivation in hippocampal and neocortical neuronal ensembles. Current Opinion in Neurobiology, 10, 180–186.
Tononi, G., & Cirelli, C. (2003). Sleep and synaptic homeostasis: A hypothesis. Brain Research Bulletin, 62, 143–150.
Tononi, G., & Cirelli, C. (2006). Sleep function and synaptic homeostasis. Sleep Medicine Reviews, 10, 49–62.
Tulving, E. (1983). Elements of episodic memory. Oxford, UK: Oxford University Press.
Tulving, E., & Madigan, S. A. (1970). Memory and verbal learning. Annual Review of Psychology, 21, 437–484.
Veling, H., & van Knippenberg, A. (2004). Remembering can cause inhibition: Retrieval-induced inhibition as cue-independent process. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 315–318.
Vertes, R. P. (2004). Memory consolidation in sleep: Dream or reality. Neuron, 44, 135–148.
Vyazovskiy, V. V., Cirelli, C., Pfister-Genskow, M., Faraguna, U., & Tononi, G. (2008). Molecular and electrophysiological evidence for net synaptic potentiation in wake and depression in sleep. Nature Neuroscience, 11, 200–208.
Walker, M. P. (2005). A refined model of sleep and the time course of memory formation. Behavioral and Brain Sciences, 28, 51–104.
Wheeler, M. A. (1995). Improvement in recall over time without repeated testing: Spontaneous recovery revisited. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 173–184.
Whitten, W. B., & Bjork, R. A. (1977). Learning from tests: Effects of spacing. Journal of Verbal Learning and Verbal Behavior, 16, 465–478.
Wilson, M. A. (2002). Hippocampal memory formation, plasticity, and the role of sleep. Neurobiology of Learning and Memory, 78, 565–569.
Wilson, M. A., & McNaughton, B. L. (1994). Reactivation of hippocampal ensemble memories during sleep. Science, 265, 676–679.
Wixted, J. T. (2004). The psychology and neuroscience of forgetting. Annual Review of Psychology, 55, 235–269.

8

Blocking Out Blocks: Adaptive Forgetting of Fixation in Memory, Problem Solving, and Creative Ideation

Steven M. Smith

A Creative Insight: Forgetting Can Be Adaptive

To most people, forgetting is a terrible thing. To students taking exams, forgetting can mean poor grades. To business people trying to remember the names of customers or supervisors, forgetting can mean career failure. To the elderly, forgetting can mean imminent cognitive decline. The downside of forgetting is quite clear to everyone.

Why do we forget? Does forgetting simply indicate a type of cognitive failure? If so, then why would the human species, supposedly in an advantageous evolutionary position in terms of our adaptive mental abilities, be so susceptible to forgetting? Perhaps, in spite of our susceptibility to forgetting, we humans nonetheless remember better than do other species. Perhaps this is so, yet faced with the amazing feats of memory shown by salmon remembering their spawning waters over a lifetime, bees remembering locations of food sources in relation to the hive and the times of day when those sources are optimal, and squirrels remembering locations of thousands of hidden food caches, the superiority of human memory seems less obvious.

A creative perspective on our human tendency to forget is that forgetting at times can be adaptive. This is not to suggest that getting answers wrong on exams, forgetting where important documents are filed, and calling co-workers by the wrong names are adaptive events. Rather, the idea is that certain recurring situations are consistently and optimally managed by cognitive operations that necessitate at least temporary forgetting.

Bjork’s studies of adaptive forgetting have explored phenomena such as updating, reducing interference, and improving the retrieval strength of material being learned. Updating is the process of replacing outdated information with current relevant information, such as where one’s vehicle is currently parked rather than where it was previously parked, what a patient’s current medical status is rather than what it was previously, or what the situation is currently in an athletic competition rather than what it was previously (e.g., E. L. Bjork & Bjork, 1988; R. A. Bjork, 1972, 1978; R. A. Bjork & Bjork, 1992). Forgetting a system’s previous status, or putting it out of mind, can facilitate a clearer awareness of a current state. Proactive interference, a memory problem caused by confusion arising from prior learning, can be mitigated by directed forgetting of the prior knowledge (e.g., R. A. Bjork, 1970). A learning trial can benefit one’s ability to retrieve encoded material more if forgetting has occurred prior to that study trial (R. A. Bjork & Bjork, 1992).

In addition to these uses of forgetting, and others (e.g., regulation of emotion via forgetting of unpleasant memories and ideas), Bjork’s insight that forgetting can have adaptive consequences has proven useful in understanding certain puzzling cognitive phenomena, specifically hypermnesia and incubation. Here, I will describe my work in which I have applied Bjork’s adaptive forgetting ideas in the form of the forgetting fixation hypothesis.

Two Mysteries: Hypermnesia and Incubation

Two mysterious phenomena in cognitive psychology are hypermnesia and incubation. Hypermnesia refers to the finding that with repeated testing, memory can show improvement over time, even without reexposure to the to-be-remembered material. It seems that we have always known that more is forgotten as time passes, and psychologists have quantified this forgetting at length since Ebbinghaus’s original experimental findings. The ubiquity of decay functions has been apparent whether measured in milliseconds, minutes, or years. How is it possible, then, that people might remember more with the passage of time, rather than less? Furthermore, hypermnesia effects can be decomposed into forgetting (loss of information over time) and reminiscence (remembering previously forgotten material) components; hypermnesia is the net increase in recall seen when the gain from reminiscence outweighs the loss from forgetting. Thus, the real mystery is reminiscence, the seemingly spontaneous recovery of forgotten information.

The second of these mysteries is incubation, a phenomenon that historically has been celebrated as a mystery of creative geniuses, but that experimentally has been difficult to observe. Incubation refers to the creative insight that is sometimes achieved after difficult and seemingly intractable problems have been put aside. During the time away from the problem, or immediately upon one’s return to the problem, one might experience a sudden and unpredicted insight into the ultimate solution to the problem. Clearly, most problems are solved when we work on them. The mystery of incubation is that solutions are achieved not by continuing to work on problems, but rather by turning away from work on them.

My research on these two mysterious phenomena has been inspired by Bjork’s creative insight that forgetting sometimes can be adaptive. In both cases, and in related phenomena, my approach has been that recovery can occur only when there is an impasse from which to recover, and that approaches or strategies that lead to impasses must be avoided or escaped if successful strategies are going to lead to recovery.
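The decomposition of hypermnesia into reminiscence and forgetting described in this section amounts to simple bookkeeping over two successive recall tests. The following is a minimal sketch of that arithmetic, not a procedure taken from any of the studies cited; the function name recall_changes and the two recall protocols are hypothetical and were invented purely for illustration.

```python
def recall_changes(test1, test2):
    """Compare two successive recall tests, each given as a collection of recalled items."""
    first, second = set(test1), set(test2)
    reminiscence = second - first   # items recalled on the later test but not the earlier one
    forgetting = first - second     # items recalled on the earlier test but not the later one
    net_change = len(reminiscence) - len(forgetting)
    return reminiscence, forgetting, net_change


# Hypothetical recall protocols (assumed data, not results from any experiment).
test1 = {"robin", "sparrow", "hammer", "violin"}
test2 = {"robin", "sparrow", "violin", "cardinal", "chisel"}

gained, lost, net = recall_changes(test1, test2)
print("reminiscence:", sorted(gained))
print("forgetting:", sorted(lost))
print("net change in recall:", net)
```

A positive net change corresponds to hypermnesia. Because the net change can be positive only when some items are newly recovered, the sketch also shows why reminiscence is the component that needs explaining.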

The Forgetting Fixation Hypothesis

Fixation is a very general term related to interference, response competition, and blocking. In simple terms, fixation refers to getting stuck on an incorrect answer, or an inappropriate strategy for some task. One classic example of fixation in problem solving is functional fixity (e.g., Maier, 1931), in which people find it difficult to comprehend the usefulness of an object for solving a problem because they persist in thinking only of the object’s typical or intended function. In Maier’s two-string problem, people have difficulty in seeing pliers or scissors as a pendulum weight because they tend to be stuck in thinking of the more typical functions of those tools. Another classic example of fixation was demonstrated by Luchins (e.g., Luchins & Luchins, 1959), using the water jar problem. In this case, people who discover a mathematical formula for solving one problem find that they can repeatedly reapply the solution for a series of similar problems, mechanically adopting a mental set for reproducing the same solution again and again. Upon encountering a simple problem for which the mental set is inappropriate, people are frequently stymied because they are fixated, or stuck using the inappropriate solution. Whether fixation is self-induced, as when experimental participants draw inappropriate conceptual information from memory, or experimenter-induced, as when inappropriate information is brought to mind by experimental stimuli, it can block or impede success in creative problem solving.

Can blocks of this nature be avoided or escaped? The forgetting fixation hypothesis, which derives directly from Bjork’s insight that forgetting can be adaptive, states that incubation and reminiscence (and hypermnesia) occur when initially incurred blocks or impasses are relieved via forgetting. In my studies of the forgetting fixation hypothesis, I have initially induced fixation with experimental manipulations, and then observed the effects of time away from the fixated task in terms of recovery. Experimentally, there has first been the onus of showing that certain manipulations produce fixation effects, and of analyzing the nature of those effects. For incubation effects, the burden of proof also extends to the need to show that such experimental manipulations cause fixation not only in simple laboratory paradigms, but also in more ecologically valid tasks that better resemble creative thinking. The second step, experimentally, has involved time away from initially fixated activities. The relevant comparison has been time working continuously on fixated tasks compared with segments of work that total the same amount of time, but that are separated by a time period (an incubation interval) involving an unrelated task.

I first will review my research that has examined ways of experimentally inducing fixation in memory, problem solving, and creativity tasks. Next, I will review research that examines relief from initial fixation as a function of incubation intervals, looking at memory tasks, problem solving, and brainstorming.

Experimentally Induced Fixation (Blocking) in Memory Tasks

Research on interference effects, or forgetting caused by response competition, has a long history (e.g., Melton & von Lackum, 1941; Postman & Underwood, 1973). There is clear evidence for other mechanisms of forgetting, most notably retrieval inhibition, proposed by Bjork (e.g., Anderson, Bjork, & Bjork, 1994; R. A. Bjork, 1972, 1989), but the work I will discuss here focuses on forgetting via memory blocking. The two issues concerning memory blocking I have examined are implicit memory blocking and reversible memory blocks. Implicit memory blocking is at the heart of fixation in creativity and problem solving because such blocks have been thought to be unconscious in nature and, therefore, beyond one’s voluntary control. The reversibility of memory blocks is the key to the forgetting fixation explanation of reminiscence and incubation effects, because in both cases the explanation concerns recovering from initial blocks. For reminiscence, the forgetting fixation hypothesis states that later retrieval attempts can benefit from recovery from initial output interference, and that a longer delay between retrieval attempts allows initially blocking responses time to be forgotten, thereby weakening the blocking effect. For incubation, the forgetting fixation hypothesis offers the same explanation, that initial memory blocking that causes fixation is weakened over time, and that reversing these initial blocking effects is what can cause incubation effects.

Can fixation and memory blocks be caused implicitly, involuntarily, and without conscious awareness? Decades of research have shown clearly that implicit memory occurs involuntarily, without conscious awareness (e.g., Schacter, 1987). Is the same true for memory blocks? One of the most popularly used implicit memory tests is the word fragment completion test (e.g., Graf & Schacter, 1985; Schacter, 1987). In this procedure, participants typically read a list of words, such as analogy, density, and fixture. Later instructions, without referring directly to the previous list of words, ask participants to fill in the missing letters from many word fragments, including fragments of words from the list that was previously read, such as a _ _ l _ g y, d _ n _ i t y, and f i _ _ u r e. Successful completion of previously read word fragments is taken as a sign of implicit memory, that is, memory of events that is not accompanied by an awareness of remembering those events. Smith and Tindell (1997) used this word fragment completion paradigm to induce and examine implicit memory blocks.
This implicit blocking paradigm uses the same principle as the implicit memory test, but rather than fragments that correspond exactly to previously read words, the fragments are orthographically similar to read words, but cannot correctly be completed with the read words. The stimuli we used in the Smith and Tindell study are shown in Table 8.1. In that study, implicit blocking effects were robust, uninfluenced by level of processing at encoding or by the proportion of test fragments that could be completed by read words, and were not eliminated when participants were explicitly warned not to think about read words as they were completing the test fragments. When participants were (wrongly) advised to try to use the previously read words on the fragment completion test, the blocking effect increased, indicating that although explicit retrieval exacerbates blocking, it is not necessary for blocking, which occurs even in the absence of such explicit instructions. The implicit blocking effect has been explored by others, including Lustig and Hasher (2001), Logan and Balota (2003), Leynes, Rass, and Landau (2008), and Kinoshita and Towgood (2001). One interesting twist was implemented by Kozak, Sternglanz, Viswanathan, and Wegner (2008), who found that blocking effects were exacerbated after participants first tried to suppress thoughts of the blocker words. The implicit memory blocking effect could serve as the basis of fixation in creative ideation and problem solving.

Table 8.1 Stimuli From Smith and Tindell’s (1997) Implicit Memory Blocking Paradigm

Blocker    Fragment    Target
ANALOGY    A_L__GY     ALLERGY
BRIGADE    B_G_A_E     BAGGAGE
COTTAGE    C_TA__G     CATALOG
CHARTER    CHAR_T_     CHARITY
CLUSTER    C_U_TR_     COUNTRY
CRUMPET    CU_P__T     CULPRIT
DENSITY    D__NITY     DIGNITY
FIXTURE    F_I_URE     FAILURE
HOLSTER    H_ST_R_     HISTORY

Can strong, reversible forgetting effects be experimentally induced? We created such a paradigm to examine memory blocking and recovery in accurate and false memories (Smith et al., 2003; Smith & Moynan, 2008). The method used to create blocked and recovered memories in a laboratory paradigm involves multiple reexposures to the majority of initially studied items (see Figure 8.1). Two different groups study many categorized word lists (three are critical lists, many others are filler lists). Each list includes the name of the category (e.g., birds) plus 10 members of that category (e.g., robin, cardinal, sparrow). Following study, those in the control group have three nonverbal tasks (e.g., mental rotation, math problems). Those in the forget group are given three tasks involving reexposure to the many filler lists (during this time, the critical lists are not seen again). Participants in both the control and forget groups see the critical lists only once, during initial study. Multiple reexposures to filler lists in the forget condition block retrieval of the critical lists. On an initial free-recall test, participants recall the category names shown during initial study (e.g., birds). On a subsequent cued-recall test, the three critical category names are provided and participants recall the members of the categorized lists. This category cued-recall test is used to gauge memory recovery for members of critical lists.

[Figure 8.1: flowchart. Initial encoding of a few critical lists and many filler lists; then, in the control condition, nonverbal tasks (e.g., mental rotation, mazes) or, in the forget condition, interference tasks (e.g., ratings of filler list items); then free recall of categorized list names (forgetting effect assessed); then cued recall of categorized list members (recovery effect assessed).]

Figure 8.1 Memory blocking and recovery paradigm used by Smith et al. (2002) and Smith and Moynan (2008).

The result of this procedure, on the initial free-recall test, is a powerful forgetting effect for the minority of nonrepeated items. For example, Smith and Moynan (2008) reported that death-related words were recalled 63% more often in the control condition than in the forget condition. Large forgetting effects were found even for distinctive materials with violent and sexual content (expletives). Importantly, this forgetting effect was completely reversible; participants provided with appropriate retrieval cues (i.e., the critical category names) were able to recall as much whether they had been assigned to the control or forget conditions. Given that such powerful forgetting effects are reversible, it is reasonable that recovery from memory blocks can serve as the basis for forgetting fixation explanations of reminiscence and incubation effects.
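The defining property of the stimuli in Table 8.1 is that each fragment can be completed by its target word but not by the orthographically similar blocker. That property can be checked mechanically. The sketch below is only an illustration: the helper name fits and the letter-by-letter matching rule are assumptions introduced here, not the procedure used to construct the original materials.

```python
def fits(fragment, word):
    """Return True if `word` completes `fragment`; "_" marks a missing letter."""
    fragment = fragment.replace(" ", "").upper()
    word = word.upper()
    return len(fragment) == len(word) and all(
        f == "_" or f == w for f, w in zip(fragment, word)
    )


# (blocker, fragment, target) triples transcribed from Table 8.1.
STIMULI = [
    ("ANALOGY", "A_L__GY", "ALLERGY"),
    ("BRIGADE", "B_G_A_E", "BAGGAGE"),
    ("COTTAGE", "C_TA__G", "CATALOG"),
    ("CHARTER", "CHAR_T_", "CHARITY"),
    ("CLUSTER", "C_U_TR_", "COUNTRY"),
    ("CRUMPET", "CU_P__T", "CULPRIT"),
    ("DENSITY", "D__NITY", "DIGNITY"),
    ("FIXTURE", "F_I_URE", "FAILURE"),
    ("HOLSTER", "H_ST_R_", "HISTORY"),
]

for blocker, fragment, target in STIMULI:
    # Each fragment admits its target but rejects the similar-looking blocker.
    print(fragment, "target fits:", fits(fragment, target),
          "| blocker fits:", fits(fragment, blocker))
```

By contrast, a standard implicit memory test uses fragments that the studied word itself completes, so that fits("a _ _ l _ g y", "analogy") holds under the same rule.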

Experimentally Induced Fixation (Blocking) in Creative Problem Solving

Can memory blocking stimuli cause fixation in creative problem solving? Smith and Blankenship (1989) first demonstrated such effects with rebus problems, puzzles that are solved by matching representations of words in the form of pictures or symbols with common English phrases. Examples of rebus problems used by Smith and Blankenship (1989) are shown in Figure 8.2. In that study, useful hints were provided for many noncritical rebus problems, such as Problems 1 and 2 in Figure 8.2. Misleading clues were provided for critical problems, as in Problems 3 and 4. These inappropriate clues yielded a fixation effect; participants were less able to solve critical rebus problems when misleading clues were provided. Adaptive forgetting of these fixating clues should produce incubation effects, according to the forgetting fixation hypothesis.

[Figure 8.2: rebus problems with their clues.
Example. Clue: between. Rebus: you just me. Solution: “just between you & me”
1. Clues: between, lines. Rebus: r|e|a|d|i|n|g
2. Clues: below, degrees. Rebus: 0 B.A. Ph.D. M.D.
3. Clues: paper, over. Rebus: fly night
4. Clues: under, not. Rebus: wheather]

Figure 8.2 Experimental stimuli used by Smith and Blankenship (1989). Helpful clues on Problems 1 and 2 lead to the solutions “reading between the lines” and “three degrees below zero.” Misleading clues on Problems 3 and 4 block the solutions “fly by night” and “an ill spell of weather.”

The same type of fixation effect is found with Remote Associates Test (RAT) problems (Kohn & Smith, 2009; Smith & Blankenship, 1991; Vul & Pashler, 2005; Wiley, 1998), which have been used as a measure of creative thinking (e.g., Mednick, 1962). When associates of test words that are not solutions are primed, participants are less able to solve RAT problems, another demonstration of fixation in creative problem solving caused by memory of inappropriate information.

Experimentally Induced Fixation in Creativity Tasks

Do primed stimuli fixate creative thinking, as they do in memory and problem-solving tasks? Smith, Ward, and Schumacher (1993) gave participants creative idea generation tasks, such as imagining creatures that might evolve on a planet similar to Earth or creating new imaginative toys, and showed examples to half of the treatment groups (the fixation treatment groups) before they began the task. Although briefly viewing examples did not affect the number of ideas generated by participants, it clearly and consistently increased the inclusion of exemplified features in participants’ creative sketches, termed a conformity effect.

Figure 8.3 Top: The three examples given to participants who were imagining life forms that might evolve on a planet like Earth (Smith et al., 1993). The creature on the bottom left conforms to the examples because, like them, it has four legs, antennae, and a tail. The creature on the bottom right, sketched by a participant who saw no examples, has none of the three critical features.

As in Smith and Tindell’s (1997) implicit memory blocking experiments, participants in Smith et al.’s (1993) conformity experiments increased their conformity effects when they were advised to create ideas similar to the examples, indicating that explicit retrieval of examples is sufficient but not necessary to produce creative conformity. Furthermore, giving explicit instructions to create ideas that were different from the examples did nothing to mitigate the conformity effects, similar to the failure of explicit warnings to mitigate implicit blocking effects (Smith & Tindell, 1997). Conformity effects, even with instructions to avoid them, now have been demonstrated and explored by many others (e.g., Landau & Lehr, 2004; Landau & Leynes, 2004; Landau, Thomas, Thelen, & Chang, 2002; Marsh, Landau, & Hicks, 1997; Marsh, Ward, & Landau, 1999). Is fixation caused by primed examples in more realistic tasks involving creative thinking? Jansson and Smith (1991) studied design fixation in


162 • Steven M. Smith

engineering design students and in professional engineers. Participants were given realistic design problems, such as creating a measuring device for visually impaired people, designing a new type of bicycle rack, inventing an inexpensive spill-proof coffee cup, and designing a biomechanical device for monitoring the intestine. Half of the participants were briefly shown a single example (the fixation condition), and half saw no examples. The examples were either limited or flawed in various respects, like the one shown in Figure 8.4. The coffee cup design in that figure, for instance, would leak and would scald the user’s mouth because the straw and mouthpiece would not allow air to be mixed with the hot liquid during drinking. Even when participants were explicitly forbidden to use negative features of the example design (Figure 8.4), they nonetheless incorporated those features in their sketches, a phenomenon now referred to as design fixation. Here, fixation caused by primed examples can be observed in a realistic task involving creative design. Design fixation has become a subject of study in numerous investigations of creative conceptual design (e.g., Chrysikou & Weisberg, 2005; Dahl & Moreau, 2002; Purcell & Gero, 1991; Purcell, Gero, Edwards, & Matka, 1994; Purcell, Williams, Gero, & Colbron, 1993; Shah, Smith, Vargas-Hernandez, Gerkens, & Wulan, 2003).

Figure 8.4 Participants in the fixation conditions of Jansson and Smith’s (1991) study were shown examples such as the one in this figure. In this experiment participants were instructed to “Create, sketch, and label the parts of a new inexpensive spill-proof coffee cup. Do not use drinking straws or mouthpieces.” [Labeled parts of the example design: plastic top, mouthpiece, tube, coffee, Styrofoam cup.]


Another activity in which creative ideas are sought is brainstorming, where a group of participants tries to generate creative solutions to problems (e.g., Osborn, 1957; Paulus, 2000). Although it is commonly believed that group brainstorming is better than individual work, it has often been shown that individuals brainstorming alone produce more than they would produce collaboratively, a phenomenon referred to as a brainstorming productivity deficit (e.g., Diehl & Stroebe, 1987, 1991; Mullen, Johnson, & Salas, 1991; Nijstad & Stroebe, 2006; Nijstad, Stroebe, & Lodewijkx, 2003; Paulus & Dzindolet, 1993; Stroebe & Diehl, 1991). This productivity deficit in brainstorming resembles the collaborative inhibition effect, in which individuals recalling alone remember more than the same individuals recalling in a group setting (e.g., Basden, Basden, Bryner, & Thomas, 1997; Weldon & Bellinger, 1997).

Do brainstorming participants suffer a type of fixation effect when they hear the ideas of others in their group? Kohn and Smith (2010) used an electronic brainstorming medium to experimentally control the ideas offered by brainstorming group members. Even without the experimental manipulation of ideas, nominal groups (individuals brainstorming alone) generated more nonredundant ideas than did real groups, which is the typical brainstorming productivity deficit. When individuals saw ideas generated by a confederate, their ideas conformed to those of the confederate, increasingly so as more confederate ideas were seen, and participants explored fewer relevant domains.

Conclusions From Fixation and Blocking Experiments

Memory blocking can operate both implicitly and explicitly, as shown by experimental manipulations. This blocking effect manifests itself as fixation in problem solving, creative idea generation, brainstorming, and creative conceptual design. When people are initially fixated, does time away from a memory task or from creative problem solving result in incubation effects? I will now review research on relief from initial fixation caused by incubation intervals in memory tasks, problem solving, and brainstorming.

Incubation Effects in Memory Tasks

As previously noted, reminiscence is something of a conundrum because the phenomenon refers to increasing memory over time, contrary to typical forgetting effects. What causes reminiscence? One of


the most interesting studies of reminiscence and hypermnesia was reported by Roediger and Thorpe (1978), who replicated Erdelyi and Becker’s (1974) finding that recall increased over three successive recall tests, but who also showed that the amount recalled after three 7-minute tests was the same as that for a single 21-minute test. The simplest explanation of this result might appear to be that with longer time allowed for recall, participants make more attempts to retrieve material. Roediger and Thorpe speculated, however, that reminiscence “may be due to the gradual release of the inhibition produced by the act of recall” (p. 304). Memories initially retrieved, according to this explanation, cause output interference, so a time-dependent release from output interference would be an example of adaptive forgetting of items initially recalled. Using just such an explanation, Smith and Vela (1991) reasoned that time to recall and time to forget output interference were two different factors that could be experimentally unconfounded. That is, with equal durations of recall time, we manipulated time to forget output interference; any concomitant increases in recall can be attributed to forgetting the initially blocking output interference, not to increased attempts to recall items. Holding time to recall constant, we gave participants a list of 50 pictures, followed by two recall tests. The second recall test was given either immediately after the first test or after a 1-, 5-, or 10-minute delay. Longer delays allowed more forgetting of initial fixation. Three experiments in this study produced incubated reminiscence effects, that is, greater reminiscence (and hypermnesia) when the retest followed a delay (Figure 8.5). Memory recovery was greater when incubation intervals were longer. These findings are consistent with the idea that output interference accumulates during an initial recall test, and interference or blocking diminishes during an incubation interval. 
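The two measures discussed above can be made concrete with a small sketch. It assumes the usual operational definitions, which are not spelled out in this passage: reminiscence counts items recalled on the retest that were missing from the first test, and hypermnesia is the net gain in recall from the first test to the retest. The function names and the example items are hypothetical and are not taken from Smith and Vela (1991).

```python
def reminiscence(first_test, second_test):
    """Count items recalled on the retest that were not recalled on the first test."""
    return len(set(second_test) - set(first_test))


def hypermnesia(first_test, second_test):
    """Net change in the number of distinct items recalled from first test to retest."""
    return len(set(second_test)) - len(set(first_test))


# Hypothetical recall protocols for one participant (picture labels).
first = ["dog", "cup", "tree", "shoe"]
second = ["dog", "cup", "tree", "boat", "lamp"]

# "boat" and "lamp" are newly recovered; "shoe" was forgotten between tests.
print(reminiscence(first, second))  # 2
print(hypermnesia(first, second))   # 1
```

Because items can also be lost between tests, reminiscence can be positive even when hypermnesia is zero or negative, which is why the two measures are reported separately.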
Incubated reminiscence effects were most pronounced in the first minute of the retest (Figure 8.6). Because incubation intervals allow changes in mental set, recall after a delay was restarted with a different set of subjective retrieval cues, making previously unrecalled items more accessible immediately upon returning to the recall task. Without an incubation interval, the already recalled items remained highly accessible at the retest because fixation was not forgotten. Participants in Smith and Vela’s experiments were not forewarned about the second recall test. Nonetheless, they might have continued their recall attempts during the unfilled incubation interval between the two recall tests. To test whether incubation time aids reminiscence by allowing more surreptitious retrieval attempts, or by allowing output interference to be forgotten, we replicated the incubated reminiscence


Figure 8.5 Reminiscence and hypermnesia in Smith and Vela’s (1991) study as a function of incubation intervals between recall tests. [Plot: total reminiscence and total hypermnesia in the no-incubation, 1-minute, 5-minute, and 10-minute incubation conditions.]

Figure 8.6 Reminiscence minute by minute during retest in Smith and Vela’s (1991) study as a function of incubation condition. [Plot: reminiscence in Minutes 1 through 5 of the retest in the no-incubation, 1-minute, 5-minute, and 10-minute incubation conditions.]


experiment, comparing the effects of an unfilled interval (which might have allowed retrieval attempts) with those of an interval filled with a demanding nonverbal task (maze puzzles); the demanding task should prevent additional recall attempts, but should not interfere with verbal memories. This experiment replicated the incubated reminiscence effect (Figure 8.7); greater reminiscence was seen when the retest was delayed than when it immediately followed the first recall test. Again, the gains in reminiscence caused by incubation intervals can be seen in the first minute of the retest. The incubated reminiscence effect was not diminished when the incubation interval was filled with maze puzzles; in fact, the effect was slightly, but not significantly, larger when incubation was filled with a demanding nonverbal task. This result indicates that forgetting initially output items improved recovery of initially blocked items, consistent with the forgetting fixation hypothesis.

Figure 8.7 Reminiscence minute by minute during retest in Smith and Vela’s (1991) study as a function of filled versus unfilled incubation intervals. [Plot: reminiscence in Minutes 1 through 5 of the retest in the no-incubation, unfilled-incubation, and filled-incubation conditions.]

As reviewed above, the implicit memory blocking paradigm (Smith & Tindell, 1997) shows that exposure to blocker words (e.g., analogy) that are orthographically similar to word fragment solution words (e.g., allergy for a _ l _ _ gy) decreases the ability to complete the similar word fragments. This effect, termed the memory blocking effect (MBE), has been explored in several studies (e.g., Landau & Leynes, 2006; Logan & Balota, 2003; Lustig & Hasher, 2001). Leynes, Rass, and Landau (2008) replicated the memory blocking effect with a fragment completion test given only


a few minutes after exposure to blockers, but showed that the blocking effect disappeared after a delay of 72 hours. This result provides support for the forgetting fixation hypothesis, in this case, with an implicit memory effect.

Incubation Effects in Creative Problem Solving

Before Smith and Blankenship’s (1989) experimental study of incubation, there had been many unsuccessful attempts to produce incubation effects (e.g., Dominowski & Jenrick, 1972; Gall & Mendelsohn, 1967; Olton, 1979; Olton & Johnson, 1976), and no successful experiment had ever been replicated (some had failed to replicate). With an experimental paradigm based on the forgetting fixation hypothesis, Smith and Blankenship demonstrated and replicated incubation effects in creative problem solving. Having shown a blocking effect by providing bad clues initially with critical rebus problems, Smith and Blankenship (1989) retested initially unsolved problems either immediately, after an incubation interval of 5 minutes, or after an incubation interval of 15 minutes. Resolution, calculated as the proportion of initially unsolved problems that were solved at retest, was greater following an incubation interval, and was greater the longer the interval (Figure 8.8). After the retest of the critical problems, participants were asked to recall the misleading clues that went with the problems. Recall of the bad clues was worse the longer the incubation interval (Figure 8.8). Consistent with the forgetting fixation hypothesis, greater incubation effects were accompanied by poorer recall of misleading clues.

Smith and Blankenship (1991) found that fixation was induced on initial problem-solving attempts by presenting associates of RAT problem words either alongside the test words when RAT problems were given, or in advance of the RAT problems as paired associates (see Table 8.2). Initially unsolved RAT problems were retested either immediately or after incubation intervals of several minutes, and resolution scores were calculated as the proportion of initially unsolved problems that were solved at retest.
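The resolution score defined above is simple arithmetic, and a short sketch may make the denominator explicit: only problems left unsolved on the first attempt are counted. The function name, the data layout, and the example values below are hypothetical illustrations, not data from Smith and Blankenship (1989, 1991).

```python
def resolution(attempts):
    """Proportion of initially unsolved problems that were solved at retest.

    `attempts` is a list of (solved_initially, solved_at_retest) pairs,
    one pair per problem. Returns None if every problem was solved initially,
    because the score is then undefined.
    """
    retest_outcomes = [retest for initial, retest in attempts if not initial]
    if not retest_outcomes:
        return None
    return sum(retest_outcomes) / len(retest_outcomes)


# Hypothetical participant: 6 problems, 2 solved initially, 4 left unsolved.
immediate = [(True, True), (True, True), (False, False),
             (False, False), (False, True), (False, False)]
# Hypothetical participant retested after an incubation interval.
delayed = [(True, True), (False, True), (False, True),
           (False, False), (False, False), (True, True)]

print(resolution(immediate))  # 0.25
print(resolution(delayed))    # 0.5
# An incubation effect corresponds to higher resolution after a delay.
print(resolution(delayed) - resolution(immediate))  # 0.25
```

Conditioning on initially unsolved problems keeps the score from being inflated by problems that were easy enough to solve on the first attempt.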
Three experiments showed significant incubation effects, defined as greater resolution after an incubation interval than on an immediate, nondelayed retest. Significant incubation effects were found, however, only in treatment conditions in which RAT problems were initially fixated (Figure 8.9). These results again demonstrate replicable incubation effects, and are consistent with the forgetting fixation hypothesis. Using RAT problems, Vul and Pashler (2007) also found that incubation effects were


Figure 8.8 Mean proportions of problems resolved and recall of misleading clues as a function of incubation condition in Experiment 1 of Smith and Blankenship’s (1989) study. [Plot: proportion (recall of bad clues; resolution at retest) by incubation time: immediate retest, 5-minute delayed retest, 15-minute delayed retest.]

Figure 8.9 Resolution (proportion of initially unsolved problems solved at retest) of RAT problems as a function of initial fixation and incubation interval (from Experiment 1 of Smith & Blankenship, 1991). An incubation effect is seen only for initially fixated items. [Plot: resolution for initially fixated and not fixated problems at immediate retest and after incubation.]

observed only when participants were experimentally fixated. Kohn and Smith (2009) replicated and extended these findings. In that study, only RAT problems that were initially fixated were susceptible to incubation effects. Furthermore, only demanding incubation activities that removed RAT problems from sight caused incubation effects; keeping initially unsolved RAT problems in sight during the incubation interval precluded incubation effects. All of these results show that incubation


Table 8.2 Remote Associates Test (RAT) Problems, Blocker Words, and Solutions

Remote Associates Test Problems    Blockers    Solutions
SALAD HEAD GOOSE                   lettuce     egg
BED DUSTER WEIGHT                  room        feather
APPLE HOUSE FAMILY                 green       tree
CAT SLEEP BOARD                    black       walk
WATER SKATE CUBE                   sugar       ice
ARM COAL STOP                      rest        pit

effects are strongly linked to initial fixation and are consistent with the forgetting fixation hypothesis. Kohn and Smith (2010), using an electronic brainstorming medium, and a confederate to experimentally provide varying numbers of ideas to participants, found that participants showed fixation effects when they viewed others’ responses. In Experiment 3 of their study, they gave participants 20 minutes for their brainstorming session. During the first 10 minutes, participants were either given fixating ideas or saw no fixating ideas. In the no-incubation condition, the second 10 minutes of brainstorming was continuous with the first 10 minutes, whereas in the incubation condition, the second 10 minutes of brainstorming followed a 5-minute incubation interval during which participants solved maze puzzles. More ideas were generated and more categories were explored in the last 10 minutes of brainstorming when an incubation interval was provided, relative to the no-incubation condition. This incubation effect was limited, however, to the conditions in which participants were initially fixated; incubation effects did not occur for nonfixated participants. As in the creative problem-solving and memory tasks discussed above, these results, linking incubation effects to initial blocking effects, are consistent with the forgetting fixation hypothesis.

Summary and Conclusions

Bjork’s insight that forgetting can be adaptive inspired my forgetting fixation hypothesis, the notion that weakened memory blocking effects cause reminiscence and recovery in memory, and incubation effects in creative problem solving. Bjork’s idea of adaptive forgetting has led me through a long line of research projects, looking at ways of experimentally inducing memory blocks and reducing blocks via adaptive forgetting. Forgetting initially retrieved material (blockers) or weakened


output interference can enhance reminiscence, a measure of memory recovery. Forgetting inappropriate or misleading clues can enhance incubation effects, or recovery from initially blocked or fixated problem solving. My own work, and that of others researching the same problems of recovery from initial memory blocks, represents only one fragment of the research inspired by Bjork’s insight about the adaptive uses of forgetting. Other areas of research on adaptive uses of forgetting range from emotion regulation to updating to educational applications. Potentially adaptive cognitive mechanisms of forgetting other than blocking include inhibition, suppression, overwriting, and implicit memory mechanisms. Although the idea is unintuitive, Bjork’s insight about adaptive uses of forgetting is broadly supported by experimental research, and it continues to inspire new ways of understanding human cognition.

References

Anderson, M. C., Bjork, R. A., & Bjork, E. L. (1994). Remembering can cause forgetting: Retrieval dynamics in long-term memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20(5), 1063–1087.
Basden, B. H., Basden, D. R., Bryner, S., & Thomas, R. L., III. (1997). A comparison of group and individual remembering: Does group participation disrupt retrieval? Journal of Experimental Psychology: Learning, Memory, and Cognition, 23, 1656–1669.
Bjork, E. L., & Bjork, R. A. (1988). On the adaptive aspects of retrieval failure in autobiographical memory. In M. M. Gruneberg, P. E. Morris, & R. N. Sykes (Eds.), Practical aspects of memory II (pp. 283–288). London: Wiley.
Bjork, R. A. (1970). Positive forgetting: The noninterference of items intentionally forgotten. Journal of Verbal Learning and Verbal Behavior, 9, 255–268.
Bjork, R. A. (1972). Theoretical implications of directed forgetting. In A. W. Melton & E. Martin (Eds.), Coding processes in human memory (pp. 217–235). Washington, DC: Winston.
Bjork, R. A. (1978). The updating of human memory. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 12, pp. 235–259). New York: Academic Press.
Bjork, R. A. (1989). Retrieval inhibition as an adaptive mechanism in human memory. In H. L. Roediger, III & F. I. M. Craik (Eds.), Varieties of memory and consciousness: Essays in honour of Endel Tulving (pp. 309–330). Hillsdale, NJ: Lawrence Erlbaum Associates.


Bjork, R. A., & Bjork, E. L. (1992). A new theory of disuse and an old theory of stimulus fluctuation. In A. Healy, S. Kosslyn, & R. Shiffrin (Eds.), From learning processes to cognitive processes: Essays in honor of William K. Estes (Vol. 2, pp. 35–67). Hillsdale, NJ: Erlbaum.
Chrysikou, E. G., & Weisberg, R. W. (2005). Following the wrong footsteps: Fixation effects of pictorial examples in a design problem-solving task. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31(5), 1134–1148.
Dahl, D. W., & Moreau, P. (2002). The influence and value of analogical thinking during new product ideation. Journal of Marketing Research, 39, 47–60.
Diehl, M., & Stroebe, W. (1987). Productivity loss in brainstorming groups: Toward the solution of a riddle. Journal of Personality and Social Psychology, 53(3), 497–509.
Diehl, M., & Stroebe, W. (1991). Productivity loss in idea-generating groups: Tracking down the blocking effect. Journal of Personality and Social Psychology, 61(3), 392–403.
Dominowski, R. L., & Jenrick, R. (1972). Effects of hints and interpolated activity on solution of an insight problem. Psychonomic Science, 26(6), 335–338.
Erdelyi, M. H., & Becker, J. (1974). Hypermnesia for pictures: Incremental memory for pictures but not words in multiple recall trials. Cognitive Psychology, 6(1), 159–171.
Gall, M., & Mendelsohn, G. A. (1967). Effects of facilitating techniques and subject-experimenter interaction on creative problem solving. Journal of Personality and Social Psychology, 5(2), 211–216.
Graf, P., & Schacter, D. L. (1985). Implicit and explicit memory for new associations in normal and amnesic subjects. Journal of Experimental Psychology: Learning, Memory, and Cognition, 11, 501–518.
Jansson, D. G., & Smith, S. M. (1991). Design fixation. Design Studies, 12(1), 3–11.
Kinoshita, S., & Towgood, K. (2001). Effects of dividing attention on the memory-block effect. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 889–895.
Kohn, N. W., & Smith, S. M. (2009). Partly versus completely out of your mind: Effects of incubation and distraction on resolving fixation. Journal of Creative Behavior, 43(2), 102–118.
Kohn, N. W., & Smith, S. M. (2010). Collaborative fixation: Effects of others’ ideas on brainstorming. Applied Cognitive Psychology (24), 1–22. Published online in Wiley Interscience (www.interscience.wiley.com) DOI: 10.1002/acp.1699.
Kozak, M., Sternglanz, R. W., Viswanathan, U., & Wegner, D. M. (2008). The role of thought suppression in building mental blocks. Consciousness and Cognition: An International Journal, 17(4), 1123–1130.
Landau, J. D., & Lehr, D. P. (2004). Conformity to experimenter-provided examples: Will people use an unusual feature? Journal of Creative Behavior, 38(3), 180–191.


Landau, J. D., & Leynes, P. A. (2004). Manipulations that disrupt generative processes decrease conformity to examples: Evidence from two paradigms. Memory, 12, 90–103.
Landau, J. D., & Leynes, P. A. (2006). Do explicit memory manipulations affect the memory blocking effect? American Journal of Psychology, 119, 463–479.
Landau, J. D., Thomas, D. M., Thelen, S. E., & Chang, P. (2002). Source monitoring in a generative task. Memory, 10(3), 187–197.
Leynes, P. A., Rass, O., & Landau, J. D. (2008). Eliminating the memory blocking effect. Memory, 16(8), 852–872.
Logan, J. M., & Balota, D. A. (2003). Conscious and unconscious lexical retrieval blocking in younger and older adults. Psychology and Aging, 18, 537–550.
Luchins, A., & Luchins, E. (1959). Rigidity of behavior: A variational approach to the effect of einstellung. Eugene: University of Oregon Books.
Lustig, C., & Hasher, L. (2001). Implicit memory is vulnerable to proactive interference. Psychological Science, 12, 408–412.
Maier, N. R. F. (1931). Reasoning in humans. II. The solution of a problem and its appearance in consciousness. Journal of Comparative Psychology, 12, 181–194.
Marsh, R. L., Landau, J. D., & Hicks, J. L. (1997). Contributions of inadequate source monitoring to unconscious plagiarism during idea generation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 23(4), 886–897.
Marsh, R. L., Ward, T. B., & Landau, J. D. (1999). The inadvertent use of prior knowledge in a generative cognitive task. Memory and Cognition, 27(1), 94–105.
Mednick, S. (1962). The associative basis of the creative process. Psychological Review, 69(3), 220–232.
Melton, A. W., & von Lackum, W. J. (1941). Retroactive and proactive inhibition in retention: Evidence for a two-factor theory of retroactive inhibition. American Journal of Psychology, 54, 157–173.
Mullen, B., Johnson, C., & Salas, E. (1991). Productivity loss in brainstorming groups: A meta-analytic integration. Basic and Applied Social Psychology, 12(1), 3–23.
Nijstad, B. A., & Stroebe, W. (2006). How the group affects the mind: A cognitive model of idea generation in groups. Personality and Social Psychology Review, 10, 186–213.
Nijstad, B. A., Stroebe, W., & Lodewijkx, H. F. M. (2003). Production blocking and idea generation: Does blocking interfere with cognitive processes? Journal of Experimental Social Psychology, 39(6), 531–548.
Olton, R. M. (1979). Experimental studies of incubation: Searching for the elusive. Journal of Creative Behavior, 13(1), 9–22.
Olton, R. M., & Johnson, D. M. (1976). Mechanisms of incubation in creative problem solving. American Journal of Psychology, 89(4), 617–630.
Osborn, A. (1957). Applied imagination. New York: Scribner.


Paulus, P. B. (2000). Groups, teams, and creativity: The creative potential of idea-generating groups. Applied Psychology: An International Review, 49(2), 237–262.
Paulus, P. B., & Dzindolet, M. T. (1993). Social influence processes in group brainstorming. Journal of Personality and Social Psychology, 64(4), 575–586.
Postman, L., & Underwood, B. J. (1973). Critical issues in interference theory. Memory and Cognition, 1(1), 19–40.
Purcell, A. T., & Gero, J. S. (1991). The effects of examples on the results of a design activity. In J. S. Gero (Ed.), Artificial Intelligence in Design ’91 (pp. 525–542). Oxford: Butterworth-Heinemann.
Purcell, A. T., Gero, J. S., Edwards, H., & Matka, E. (1994). Design fixation and intelligent design assistants. In J. S. Gero & F. Sudweeks (Eds.), Artificial Intelligence in Design ’94 (pp. 483–496). Dordrecht, The Netherlands: Kluwer.
Purcell, A. T., Williams, P., Gero, J. S., & Colbron, B. (1993). Fixation effects: Do they exist in design problem solving? Environment and Planning B: Planning and Design, 20(3), 333–345.
Roediger, H. L., III, & Thorpe, L. A. (1978). The role of recall time in producing hypermnesia. Memory and Cognition, 6(3), 296–305.
Schacter, D. L. (1987). Implicit memory: History and current status. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13(3), 501–518.
Shah, J. J., Smith, S. M., Vargas-Hernandez, N., Gerkens, R., & Wulan, M. (2003). Empirical studies of design ideation: Alignment of design experiments with lab experiments. In Proceedings of the American Society for Mechanical Engineering (ASME) DTM Conference, Chicago, IL.
Smith, S. M., & Blankenship, S. E. (1989). Incubation effects. Bulletin of the Psychonomic Society, 27, 311–314.
Smith, S. M., & Blankenship, S. E. (1991). Incubation and the persistence of fixation in problem solving. American Journal of Psychology, 104, 61–87.
Smith, S. M., Gleaves, D. H., Pierce, B. H., Williams, T., Gilliland, T. R., & Gerkens, D. R. (2003). Comparing recovered memories with created ones: An experimental approach. Applied Cognitive Psychology, 16, 1–29.
Smith, S. M., & Moynan, S. C. (2008). Forgetting and recovering the unforgettable. Psychological Science, 19(5), 462–468.
Smith, S. M., & Tindell, D. R. (1997). Memory blocks in word fragment completion caused by involuntary retrieval of orthographically similar primes. Journal of Experimental Psychology: Learning, Memory, and Cognition, 23(2), 355–370.
Smith, S. M., & Vela, E. (1991). Incubated reminiscence effects. Memory and Cognition, 19(2), 168–176.
Smith, S. M., Ward, T. B., & Schumacher, J. S. (1993). Constraining effects of examples in a creative generation task. Memory and Cognition, 21, 837–845.


Stroebe, W., & Diehl, M. (1991). You can’t beat good experiments with correlational evidence: Mullen, Johnson, and Salas’s meta-analytic misinterpretations. Basic and Applied Social Psychology, 12(1), 25–32.
Vul, E., & Pashler, H. (2007). Incubation benefits only after people have been misdirected. Memory and Cognition, 35(4), 701–710.
Weldon, M. S., & Bellinger, K. D. (1997). Collective memory: Collaborative and individual processes in remembering. Journal of Experimental Psychology: Learning, Memory, and Cognition, 23, 1160–1175.
Wiley, J. (1998). Expertise as mental set: The effects of domain knowledge in creative problem solving. Memory and Cognition, 26, 716–730.


9

A Contextual Framework for Understanding When Difficulties Are Desirable

Mark A. McDaniel and Andrew C. Butler

Of the many contributions that Bob Bjork has made to the field of memory research, the concept of desirable difficulties in learning is arguably one of the most provocative and far reaching in terms of its potential impact on training and educational practices. In a series of papers during the early 1990s (Bjork, 1994a, 1994b; Christina & Bjork, 1991; Schmidt & Bjork, 1992), Bjork challenged the commonly held belief that factors that enhance performance or speed improvement during learning also produce superior long-term retention and transfer. Reviving the neglected distinction between the momentary strength of a response and the enduring habit strength of that response (e.g., Estes, 1955; Guthrie, 1952; Hull, 1943; Skinner, 1938; Tolman, 1932), he argued that performance during learning reflects momentary accessibility of knowledge or skill rather than its underlying storage strength. Thus, performance during learning can be a poor indicator of whether that knowledge or skill will be accessible (or available) in the future. Based on this line of reasoning and a review of the relevant literature, Bjork reached a seemingly contradictory conclusion: Introducing difficulties during learning can actually increase long-term retention and transfer. The concept of desirable difficulties has been influential, and has spread to a diversity of subareas within the memory field, including connectionist models of memory (McClelland, McNaughton, & O’Reilly,


2002), animal learning (Bouton, 1994), and memory development in infants (Rovee-Collier, 1997). More important for present purposes, the construct of desirable difficulties has become a galvanizing example of how basic memory research can potentially contribute to improving educational practice. The implication is that instead of constructing practice and learning materials to ease the processing demands on the learner (e.g., see Sweller, 1999), educators should introduce difficulties into the learning environment to promote retention and transfer. However, as Bjork has emphasized, the catch here is that the difficulties must be desirable. Therefore, for effective translation of this principle to educational contexts, it becomes paramount to distinguish difficulties that are desirable from those that are not desirable. Initial efforts to identify candidate desirable difficulties have relied primarily on an examination of the empirical literature (Bjork, 1994b). Though promising, this approach has been limited. It has not provided a principled set of guidelines for determining and identifying a priori what kinds of difficulty will be desirable and what kinds will not be desirable (Zacks, Hasher, Sanft, & Rose, 1983). Accordingly, much trial and error is required to implement desirable difficulties in educational settings (e.g., see Richland, Linn, & Bjork, 2007). Even more problematic, there is now an abundance of evidence that the desirability of any particular difficulty depends on a host of contextual factors, factors that are operative in classroom settings (McDaniel & Einstein, 2005). That is, some techniques that have been identified as a desirable difficulty based on empirical findings, such as requiring the learner to generate target material (deWinstanley & Bjork, 2004), will provide mnemonic benefits in some contexts but fail to provide benefits in other contexts (for a summary, see McDaniel & Einstein, 2005). 
Thus, we suggest that a fruitful approach to understanding and prescribing desirable difficulties will require a contextualistic framework. In this chapter, we adopt a general contextualistic model of memory as a foundation within which to consider desirable difficulties. In considering desirable difficulties from this perspective, we generate predictions concerning what kinds of difficulty will be desirable in what contexts and what difficulty will not be desirable, and we present the experimental work that has tested and informed these predictions. We begin by providing a summary of Jenkins’s (1979) tetrahedral model of memory. We then use that model as a scaffolding to develop a framework for understanding desirable difficulties and to organize our review of the research that has investigated desirable difficulties from certain aspects of this perspective. Finally, we focus on a contextual factor identified in the model but that has been underexamined with

regard to desirable difficulties: the relationship between the difficulty manipulation and the criterial task. We briefly review some initial studies bearing on this issue, and also report some new work. In particular, we will focus on how the relation between the desirable difficulty and the criterial task significantly impacts two different types of outcomes: memory and metacomprehension.

Theoretical Framing

Jenkins (1979) introduced his tetrahedral model of memory to draw attention to a general view emerging among researchers that memory is a contextual phenomenon (e.g., McKeachie, 1974). In the tetrahedral model, each corner of the tetrahedron represents a cluster of variables of a given type, namely, encoding (orienting) operations or tasks, the to-be-learned materials or events, the characteristics of the subjects, and the criterial tasks that assess retrieval (see Figure 9.1; see Roediger, 2008, for a possible fifth vertex).

Figure 9.1 The tetrahedral model of memory experiments from Jenkins (1979). Each vertex represents a cluster of variables, and each edge represents a two-way interaction between two clusters. Each plane represents a three-way interaction, and the entire figure represents a four-way interaction. Vertex labels: Subjects (abilities, interests, knowledge, purposes); Orienting Tasks (instructions, directions, activities, apparatus); Criterial Tasks (recall, recognition, problem solving, performance); Materials (sensory mode, physical structure, psychological organization, psychological sequence).

With regard to desirable difficulties, many of the manipulations in the literature correspond to the encoding
(orienting) tasks vertex. Common examples include the benefits of generating target material relative to reading the target materials (deWinstanley & Bjork, 2004; McDaniel, Einstein, Dunay, & Cobb, 1986), the benefits to retention from spaced relative to massed practice (for review see Cepeda, Pashler, Vul, Wixted, & Rohrer, 2006), variable practice relative to repeated practice (e.g., Goode, Geraci, & Roediger, 2008), and testing on material relative to restudying (for review see Roediger & Karpicke, 2006). A critical feature of the model, however, is that interactions abound between the clusters of variables. The edges that link corners represent two-way interactions between the variables, and faces of the tetrahedron suggest higher-order interactions. That is, one face of the tetrahedron reflects the possibility that the encoding task(s) given during initial learning, the subjects in the study, and the to-be-learned materials will yield a three-way interaction in memory performance. Jenkins (1979) argued that memory researchers must consider the entire tetrahedron because “interaction, contextual sensitivity with respect to all our variables, is the pattern—not the exception” (p. 444). In a similar vein, we suggest that the desirability of difficulty will depend on sensitivity to the contextual variations captured in the tetrahedral model. To be able to appeal to the literature for supportive evidence for our framework, we focus on describing how the edges (to use the tetrahedral model term)—that is, the relations between the encoding task variable and the other variables (subject characteristics, materials, and the demands of the criterial task)—are critical for determining the extent to which difficulty is desirable. In doing so, we do not intend to imply that the faces of the tetrahedron (higher-order interactions) are unimportant; our expectation is that these higher-order interactions will emerge. 
However, except for one instance in the literature (that we discuss below), we are unaware of studies that have examined the joint influences of several contextual variables (e.g., learner characteristics and materials) on the desirability of difficulty.

The Core Assumptions

To anticipate and explain the desirability of any particular difficulty, we suggest that four key components are required:

1. Specification of the processing stimulated by the encoding task that creates difficulty
2. Specification of the processing required by the criterial task
3. Sensitivity to the processing dynamics associated with the learner’s characteristics and the affordances of the materials to be learned
4. The extent to which the processing stimulated by the difficulty is complementary or redundant with that spontaneously engaged by the learner or with that afforded by the material

Our central claim is that desirable difficulties are those that stimulate processing that is not redundant with the processing spontaneously engaged by the learner (which, as noted above, will depend on learner characteristics, materials, or both) and that matches the demands of the criterial task. Regarding the criterial task, as we will show later in the chapter, difficulty is desirable only to the extent that the processing stimulated by the difficulty overlaps with that required by the criterial task. When the processing does not overlap, negative cascading effects of difficulty can result.

To make these ideas more concrete, consider a task that involves generation, which has been identified as a desirable difficulty. For educationally relevant materials such as text, there are various ways to engineer the materials so that generation is required. Following basic laboratory work with word lists, the words in the text can be presented in fragmented form with letters missing (e.g., Hirshman & Bjork, 1988; McDaniel, Waddill, & Einstein, 1988), and the learner must generate each word in the text; or the sentences of the text can be presented in randomized fashion, and the learner must generate a sentence order that reflects a coherent text (e.g., McDaniel et al., 1986). In either case, the generation task represents a difficulty for comprehension that is not present when the learner simply reads an intact text. Accordingly, based on the general idea that generation is a desirable difficulty, both of these generation tasks should foster better retention.
By contrast, from the perspective of our framework, the desirability of the difficulty will depend on the components outlined above. First, note that the word generation task likely stimulates more extensive processing of the words and the local context that helps to identify the words, whereas the sentence reordering task stimulates more extensive processing of the relationship among propositions in the text (for ease of exposition, we will label these two processing types proposition-specific processing and relational processing, respectively; McDaniel, Hines, Waddill, & Einstein, 1994). The desirability of the word generation and sentence order generation will thus depend on the extent to which the criterial tasks require proposition-specific and relational information, respectively. Later in the chapter we describe work that tests these predictions.

Components 3 and 4 above further suggest that the desirability of the generation tasks will also depend on the target materials and the learners’ backgrounds/abilities. The word generation task will be a desirable difficulty only to the extent that learners are not already engaging in proposition-specific processing (because of the affordances of the materials), whereas the sentence order generation task will be a desirable difficulty only to the extent that learners are not already engaging in relational processing (because of the materials). Regarding learners’ backgrounds/abilities, a difficulty will not be desirable if the learner does not have the background/ability to accommodate the difficulty; on the other hand, the difficulty could be desirable if it forces the learner to engage in processing that he or she ordinarily would not. In the next sections, we describe the evidence bearing on these anticipated patterns.
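The four components above can be read as a simple decision rule. The following is a minimal sketch of one such reading, not the authors' formal model: the class name, field names, and the all-or-none treatment of redundancy and match are illustrative assumptions, as is the assumption that free recall draws on both proposition-specific and relational information.

```python
from dataclasses import dataclass
from typing import FrozenSet

PROPOSITION_SPECIFIC = "proposition-specific"
RELATIONAL = "relational"


@dataclass(frozen=True)
class LearningSituation:
    """Hypothetical container for the four components of the framework."""
    stimulated: str                    # component 1: processing the difficulty stimulates
    required_by_test: FrozenSet[str]   # component 2: processing the criterial task requires
    already_engaged: FrozenSet[str]    # component 3: processing the learner or materials supply anyway
    learner_can_accommodate: bool = True  # learner has the background/ability for the difficulty


def is_desirable(s: LearningSituation) -> bool:
    """Toy rule: non-redundant (component 4), test-appropriate, and manageable."""
    non_redundant = s.stimulated not in s.already_engaged
    matches_test = s.stimulated in s.required_by_test
    return s.learner_can_accommodate and non_redundant and matches_test


# Assumption for illustration: free recall benefits from both processing types.
free_recall = frozenset({PROPOSITION_SPECIFIC, RELATIONAL})

# Word generation applied to material that already invites relational processing.
complementary = LearningSituation(PROPOSITION_SPECIFIC, free_recall, frozenset({RELATIONAL}))
# Word generation applied to material that already invites proposition-specific processing.
redundant = LearningSituation(PROPOSITION_SPECIFIC, free_recall, frozenset({PROPOSITION_SPECIFIC}))
# Same complementary case, but the learner cannot accommodate the difficulty.
overloaded = LearningSituation(PROPOSITION_SPECIFIC, free_recall, frozenset({RELATIONAL}),
                               learner_can_accommodate=False)

for label, situation in [("complementary", complementary),
                         ("redundant", redundant),
                         ("overloaded", overloaded)]:
    print(label, is_desirable(situation))
```

A graded version (degree of overlap rather than a yes/no match) would be closer to the phrase "to the extent that" used in the text; the Boolean form is chosen here only to keep the logic visible.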

Characteristics of the Learner and Desirable Difficulty

One characteristic of the learner that will have a significant impact on whether a difficulty will be desirable concerns the learner’s ability to perform the processing required by the difficulty manipulation. As one example, requiring readers to generate the words in a text increases the demands on lexical decoding, as illustrated by the following example:

The ki _ g of a ce _ tai _ cou _ try l _ st hi _ rin _ whi _ e on a d _ ive th _ ough his ca _ ital. He at o _ ce pl _ ced a noti _ e in the n _ wsp _ pers, pr _ misin _ that whom _ ver mi _ ht fin _ and r _ turn the ri _ g woul _ re _ eive a l _ rge re _ ard in m _ ney. A si _ ple priv _ te was luc _ y enough to fi _ d it.

If the process of lexical decoding is already challenging for a reader, then placing additional demands on this process by requiring generation may be ineffective. To test this expectation, McDaniel, Hines, and Guynn (2002) required less and more skilled readers, assessed as such on the Nelson–Denny Reading Test (Brown, Nelson, & Denny, 1973), to either read an intact version of a fairy tale or generate the tale from fragmented word presentations. Following study, the participants were required to free recall the passage. Critically, Nelson–Denny reading scores are thought to reflect, at least in part, word decoding skills (Mason, 1978; Petros, Bentz, Hammes, & Zehr, 1990; Stanovich, 1980). Consequently, the additional demands placed on word decoding by the fragmented word texts might not be desirable for the less able Nelson–Denny readers, even though generating fragmented word texts would
presumably be a desirable difficulty for the more able Nelson–Denny readers (because it would stimulate proposition-specific processing, processing useful for free recall of narratives; McDaniel & Kerwin, 1987). As can be seen from the top panel of Table 9.1, the free-recall patterns (for the passages) were in line with these predictions. Requiring generation from fragmented words improved recall of the narrative only for more able (Nelson–Denny) readers. Generation from fragmented words was not a desirable difficulty for the less able (Nelson–Denny) readers because these readers’ recall significantly suffered when generation was required. The less able readers successfully generated 98% of the fragmented words in the narrative (as did more able readers), so their decline in recall (relative to the intact narrative) was not a failure to complete the generation task. Instead, the processing required by generation presumably challenged their deficient word decoding skills to the point that other comprehension processes were compromised, thereby hindering learning.

Table 9.1 Proportion Correct on the Final Free-Recall Test as a Function of Learning Activity and Reader Ability (More Able and Less Able Readers(a)) for Experiment 1a and Experiment 1b

Experiment 1a
Learning Activity        Less Able   More Able
Read only                   .41         .44
Word generation             .33         .51

Experiment 1b
Learning Activity        Less Able   More Able
Read only                   .15         .14
Sentence reordering         .21         .20

Source: Data from McDaniel, M. A., Hines, R. J., & Guynn, M. J., Journal of Memory and Language, 46, 544–561, 2002.
(a) As assessed by the Nelson–Denny Reading Test.

It is important to note that our contextualistic theory of desirable difficulty does not suggest that difficulty per se is a disadvantage to poor readers. For instance, a generation task that forces relational processing, such as the sentence reordering task mentioned earlier, would be expected to be desirable for less able Nelson–Denny readers because these readers are presumably not deficient in organizational processing
(indeed, they may rely on it to overcome decoding deficiencies; Petros et al., 1990). The results shown in the bottom panel of Table 9.1 also confirm this prediction. For both less and more able Nelson–Denny readers, the sentence reordering difficulty (generation) improved recall for the target passages (which in this case were expository passages, in line with the next section on affordances of the target material). These patterns clearly demonstrate that the desirability of any particular difficulty is contextually sensitive, and further that such patterns can be understood and anticipated by the type of theoretical analysis that we have presented. In addition to reading ability, the background knowledge of the reader will be a potentially important learner characteristic for determining the desirability of any particular difficulty. As one example, McNamara, Kintsch, Songer, and Kintsch (1996) increased text difficulty by reducing the referential coherence of a passage about heart disease. They then compared learning (as indexed by inference questions, problem-solving questions, and sorting keywords) from the difficult version of the passage relative to the referentially coherent passage for readers with either high or low knowledge in biology. Because readers with low background knowledge tend to be impaired in their ability to draw necessary inferences while reading (McNamara & McDaniel, 2004), these readers would presumably be disadvantaged when presented with the difficult (low-coherence) version. Consistent with this expectation, reduced referential coherence increased performance on the three measures of learning (inference questions, problem solving, and sorting keywords) for high-knowledge readers but impaired learning for the low-knowledge readers. Again, the desirability of the difficulty hinged in principled ways on the characteristics of the learner.
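The learner-by-task pattern in Table 9.1 can be made explicit with simple difference scores. The sketch below computes a "generation effect" (generation condition minus read-only) for each reader group, using only the proportions reported in the table. The variable and function names are illustrative, and the differences are descriptive arithmetic, not the inferential tests reported by McDaniel et al. (2002).

```python
# Proportions correct from Table 9.1: (read only, generation condition).
table_9_1 = {
    "Exp. 1a, word generation": {
        "less able": (0.41, 0.33),
        "more able": (0.44, 0.51),
    },
    "Exp. 1b, sentence reordering": {
        "less able": (0.15, 0.21),
        "more able": (0.14, 0.20),
    },
}


def generation_effect(read_only: float, generation: float) -> float:
    """Difference in proportion recalled; positive means generation helped."""
    return round(generation - read_only, 2)


effects = {
    experiment: {group: generation_effect(*scores) for group, scores in groups.items()}
    for experiment, groups in table_9_1.items()
}

for experiment, groups in effects.items():
    for group, effect in groups.items():
        print(f"{experiment}, {group} readers: {effect:+.2f}")
```

The sign of the effect reverses across reader groups only for word generation (Experiment 1a); for sentence reordering (Experiment 1b) both groups show a positive difference of the same size.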

Nature of Materials and Desirable Difficulty

Prompted by the contextualistic approach presented in this chapter, a number of studies have also examined how the nature of the materials influences the desirability of a particular difficulty. As mentioned above, one core component of our framework is the assumption that the to-be-learned materials afford certain kinds of processing. Accordingly, for a difficulty to be desirable, we assume that it must stimulate processing that is not redundant with that ordinarily stimulated by the material (and that is appropriate for the criterial task, as detailed in the next section). For instance, a taxonomically related word list affords organizational (relational) processing (as evidenced by category clustering in recall of
such word lists; Einstein & Hunt, 1980). Thus, difficulty that stimulates more extensive elaboration of each word in the list should be desirable because it would force processing of information that was not invited by the material and that was useful for the free-recall task (Einstein & Hunt, 1980). By contrast, difficulty that stimulates organizational processing (in terms of taxonomic categories) should not be desirable for a taxonomically related word list because, even though organizational processing is useful for free recall, such processing would be redundant with the organizational processing that the list itself affords. These predictions were borne out in McDaniel, Einstein, and Lollis (1988). When subjects were required to perform a difficult pleasantness rating task, which presumably encouraged more extensive descriptions of the individual words, free recall (M = .71) improved relative to more straightforward processing conditions that required less time and effort (simple pleasantness rating or sorting by category names; combined mean recall = .59). The difficulty induced by the pleasantness rating task was desirable for processing a taxonomically related word list; however, when subjects were required to perform a difficult sorting task (sort without the benefit of the category names), no improvement in recall relative to the simpler conditions was observed (mean recall for difficult sorting = .58). It is noteworthy that category clustering in the difficult sorting condition was as high (indeed higher) as that observed in the other study conditions, suggesting that subjects were able to effectively figure out a sorting scheme in the difficult sorting condition. Nevertheless, this type of difficulty was not desirable when applied to a word list that already afforded organizational processing. 
Our theory also anticipates that the difficult sorting task should be desirable when it is applied to a word list that has no obvious organizational structure because such a word list would not afford organizational processing. This prediction was also confirmed by McDaniel, Einstein, et al. (1988a). The difficult sorting task substantially improved free recall for an unstructured list (M = .79) relative to that obtained for the less difficult processing conditions (combined mean recall = .46). Moreover, for an unstructured list, we assume that in the absence of any obvious organizational structure, learners will focus their processing on the individual words. Accordingly, the difficult pleasantness rating condition should produce processing that is relatively redundant (focus on individual words) with that invited by the material itself when applied to the unstructured list. As predicted, recall was not increased by difficult pleasantness rating (M = .47) relative to the less difficult processing conditions. Thus, the desirability of a particular difficulty task cannot be determined by the nature of the difficulty per se, but

will depend on a consideration of the processing afforded by the target material itself. This material-appropriate analysis of desirable difficulty (McDaniel et al., 1986; McDaniel & Einstein, 1989) applies to generation, a task commonly identified as a “desirable difficulty” (Richland et al., 2007), and to text recall. As noted in the previous sections, for texts, two kinds of generation tasks have been implemented: word generation, which stimulates proposition-specific processing, and sentence reordering, which stimulates relational processing of the text. Word generation should be desirable for improving recall of texts that ordinarily invite relational processing, such as fairy tales (because it stimulates processing complementary to that invited by these texts), but word generation should not be a desirable difficulty for texts that ordinarily invite proposition-specific processing, such as expository texts (see McDaniel & Einstein, 1989, and McDaniel et al., 1994, for support regarding the kinds of processing invited by these different text types). By the same token, sentence reordering should improve recall of expository texts, whereas sentence reordering should not be a desirable difficulty for fairy tales. As shown in the bottom panel of Table 9.2, this predicted crossover interaction has emerged in a number of experiments such that generation significantly improved text recall relative to a read-only control, but only when the generation task was appropriate for the particular text (McDaniel et al., 1986; Einstein, McDaniel, Owen, & Cote, 1990). It is noteworthy that in these experiments, the redundant generation– text type combinations substantially increased processing time relative to reading an intact text (see top panel of Table 9.2), yet recall still did not significantly improve. Thus, generation is not invariably desirable; without appreciation of the material appropriateness of a particular generation task, generation can produce labor in vain. 
By extension, we suggest that difficulty in general will be desirable to the extent that it is material appropriate (i.e., the difficulty stimulates processing that is not redundant with that invited by the text and is appropriate for the criterial task). Several experiments with educationally relevant study activities (outlining and answering embedded questions) have supported this hypothesis (Einstein et al., 1990), but more research that examines a range of difficulty manipulations would be useful to evaluate the generality of our material-appropriate approach to desirable difficulties.

Table 9.2 A Comparison of Word Generation and Sentence Reordering Tasks With Folktales and Expository Passages

                                                                 Learning Activity
Dependent Measure / Experiment       Passage Type    Read Only   Sentence Reordering   Word Generation
Time (minutes)
  McDaniel et al. (1986)(a) Exp. 1   Folk tale          1.2             5.9                12.1
                                     Expository         2.1            16.7                17.5
  McDaniel et al. Exp. 2             Folk tale          1.9             8.6                 7.8
                                     Expository         1.8            12.3                 8.9
  Einstein et al. (1990)(b) Exp. 1   Folk tale          1.5             7.8                 8.2
                                     Expository         1.9            13.4                12.6
  Einstein et al. Exp. 3             Folk tale          1.7                                 7.3
                                     Expository         2.0            13.4
Recall (accuracy)
  McDaniel et al. Exp. 1             Folk tale          .43             .43                 .49
                                     Expository         .15             .28                 .14
  McDaniel et al. Exp. 2             Folk tale          .36             .47                 .62
                                     Expository         .22             .40                 .19
  Einstein et al. Exp. 1             Folk tale          .45             .52                 .56
                                     Expository         .28             .43                 .19
  Einstein et al. Exp. 3             Folk tale          .41                                 .56
                                     Expository         .22             .32

Source: Adapted from McDaniel, M. A., & Einstein, G. O., in A. F. Healy (Ed.), Experimental Cognitive Psychology and Its Applications, American Psychological Association, Washington, DC, 73–85, Table 6.1, 2005.
(a) McDaniel, Einstein, Dunay, and Cobb (1986).
(b) Einstein, McDaniel, Owen, and Cote (1990).

It is worth noting that in line with the tetrahedral model, the situation is even more complex because the material appropriateness of generation (and likely other difficulty) is further qualified by learner characteristics. For learners that are already challenged by word decoding, word generation (from fragmented words) is not desirable even
for fairy tale recall (as described in the preceding section and shown in Table 9.1; McDaniel et al., 2002, Experiment 1A). By contrast, for learners that do not ordinarily construct organized text representations (low-structure builders), sentence reordering generation is desirable for improving recall of fairy tales (McDaniel et al., Experiment 2A). Thus, complete understanding of the desirability of difficulty requires joint consideration of processing mediated by at least three core factors: the particular difficulty, the materials to which the difficulty is applied, and the learners’ characteristics. In the next section, we examine a final core factor, the nature of the criterial task.

Transfer-Appropriate Processing and Desirable Difficulty

One edge of the tetrahedral model that has not been as thoroughly investigated with respect to the concept of desirable difficulties is the interaction between encoding tasks and criterial tasks. In the broader memory literature, the relationship between encoding and retrieval has been defined by two different, but related ideas: the encoding specificity principle and transfer-appropriate processing. The encoding specificity principle states that successful retrieval depends upon the degree to which the features or elements in the retrieval cue overlap with those in the memory trace (Tulving, 1983). Likewise, transfer-appropriate processing states that memory performance will be enhanced to the extent that the processes engaged during initial learning match the processes required for the criterial task (Morris, Bransford, & Franks, 1977; for review see Roediger, Weldon, & Challis, 1989; for similar ideas produced around the same time see Fisher & Craik, 1977; Jacoby, 1975; McDaniel, Friedman, & Bourne, 1978). The paucity of desirable difficulties research on this topic is unfortunate because transfer-appropriate processing indicates that the efficacy of any difficulty introduced during learning should depend on a processing match with the criterial test. For example, if an encoding task induces proposition-specific processing of a to-be-learned material (e.g., word generation), then performance would be enhanced if the criterial task also required proposition-specific processing or tested proposition-specific information; however, if the criterial task instead required relational processing or tested relational information, then the encoding and retrieval processes would be mismatched, resulting in negative cascading effects of difficulty. When a difficulty is inappropriate to the criterial task, the difficulty may actually impair performance.
We now turn to reviewing some of the studies that have investigated the consequences of a match or mismatch in processing between the encoding and criterial tasks on both memory and metacomprehension.

Memory

One major goal of education is to facilitate the acquisition and retention of knowledge. At every level, students must master some core set of information before they can begin to engage in more complex forms of learning. Bloom’s (1956) taxonomy of educational objectives identified knowledge (i.e., memory of previously learned materials) as a prerequisite for higher-order learning goals, such as understanding, application, analysis, synthesis, and evaluation. For example, students learning a foreign language must first master some vocabulary before they can begin to construct sentences or understand grammar. Although knowledge can be obtained from a variety of sources, we will concentrate on the learning of information from texts because of the ubiquity of such materials in the classroom and other educational contexts.

Historically, researchers have considered deeper, more elaborative encoding of the to-be-remembered material to be a means to improve memory performance. This general approach was based on the long-held assumption that successful retention is solely determined by the “goodness” of initial learning (e.g., Craik & Lockhart, 1972; but for exceptions see Kohler, 1947; Melton, 1963). However, this assumption was challenged by studies in the late 1960s that demonstrated that memory performance is the product of both the memory trace and the cue(s) provided at retrieval (e.g., Tulving & Pearlstone, 1966; Tulving & Osler, 1968). As a result, retrieval processes have become a critical component of many subsequent memory theories (Tulving, 1983; Morris et al., 1977). Our theoretical framework incorporates these ideas by proposing that some encoding tasks do produce better retention than others, but that the best memory performance occurs when the processing induced by these encoding tasks matches the processing engaged on the criterial task.

In a series of experiments, McDaniel et al. (1994) explored the idea that superior memory performance depends on a processing match between a task designed to introduce difficulties during learning and the criterial test. More specifically, they investigated how a generation task that induced either proposition-specific or relational processing of a passage affected performance on a final cued-recall test that featured questions about both proposition-specific and relational information from the passage.
In Experiment 1, subjects read a passage that described half an inning of a baseball game in one of three incidental learning conditions: read only, word generation, or sentence reordering. After engaging in one of the three incidental learning tasks, subjects took a final cued-recall test that featured questions about both proposition-specific (e.g., the name of the baseball stadium) and relational (e.g., a major play in the game) information from the passage. The pattern of results on the final cued-recall test supported the initial predictions of McDaniel et al. (1994, Experiment 1). As shown in Table 9.3, both generation tasks produced superior final test performance relative to reading the intact passage; however, this main effect was qualified by an interaction between initial learning task and question type.

Table 9.3 Proportion Correct on the Final Cued-Recall Test as a Function of Learning Activity and Type of Final Test Question

                         Type of Final Test Question
Learning Activity        Proposition Specific   Relational
Read only                        .35               .23
Word generation                  .57               .29
Sentence reordering              .50               .39

Source: Data from McDaniel, M. A., Hines, R. J., Waddill, P. J., & Einstein, G. O., Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 169–184, 1994, Experiment 1.

The word generation task produced the best performance on the proposition-specific questions, whereas the sentence reordering task
led to the best performance on the relational test questions. The differential effects of the two generation tasks on the two types of cued-recall questions can be best explained by the match (and mismatch) in processing between encoding and retrieval. The word generation task promoted proposition-specific processing, which matched the type of information tested by the proposition-specific questions. Likewise, the sentence reordering task promoted relational processing, which provided a good match for the information tapped by the relational questions. When there was a mismatch in processing between the initial generation task and the final test question, performance was significantly lower but still better than the read-only control condition. In an unpublished study that builds on the findings of McDaniel and colleagues (1994), we investigated how the nature of the criterial test influences the benefits of performing two different types of generation task on the same passage relative to the same generation task twice or simply reading the passage twice (Butler, Flanagan, Roediger, & McDaniel, 2007). During an initial learning phase, subjects studied expository texts that were approximately 600 words in length and covered a variety of topics (e.g., Salvador Dalí, the Crusades, the Taj Mahal). They studied each text twice and performed one of three learning activities during each study phase: a word generation task that required the generation of deleted letters, a sentence reordering task that involved sorting groups of sentences into the correct order, or simply reading the text. There were five initial learning conditions: subjects either read the text twice, performed the word generation task twice, performed the sentence reordering task twice, or performed a combination of the two generation tasks with the word generation task first or the sentence reordering task first.

Two days later, subjects were tested with questions about both proposition-specific and relational information from the passages, and these questions were presented in either multiple-choice or cued-recall test format. Proposition-specific questions were about information that was contained within a single sentence of a given passage. For example, a proposition-specific question about the Salvador Dalí passage was “What was the name of Salvador Dalí’s wife?” (Answer: Gala). In contrast, the relational questions required the learner to synthesize information across multiple sentences. An example of a relational question was “Why might Salvador Dalí have attempted to commit suicide?” (Answer: He was devastated by his wife’s death). For each subject, half of the questions were presented in multiple-choice format (with five alternatives) and the other half were presented in cued-recall format. The results of the final test showed that the benefit of the generative activities depends critically upon whether the type of processing produced by the activity matched the type of processing required during the criterial test (see Table 9.4). When subjects engaged in the same generative study task twice, they performed better on the type of question that matched the processing promoted by that task relative to the type of question for which there was a mismatch in processing. When compared to the read twice control group, performing the word generation task twice enhanced performance on the proposition-specific questions with no cost to performance on the relational questions. In contrast, performing the sentence reordering task twice led to equivalent performance on the relational questions but actually impaired performance on the proposition-specific questions relative to the read twice control. Interestingly, when subjects engaged in two different types of generative study tasks, they generally performed equally well on both types of final test questions. 
This result indicates that the two types of processing promoted by these different tasks were complementary in that each one helped subjects to learn and retain different information from the passage. In addition, it is important to note that the general pattern of results held across both the multiple-choice and cued-recall formats. Although certain test formats are often considered to tap a single type of information (e.g., multiple-choice questions test proposition-specific information), the test format can be manipulated independently of the type of information or processing tested.

Table 9.4 Proportion Correct on the Final Cued-Recall Test as a Function of Learning Activity and Type of Final Test Question

Multiple Choice
                     Type of Final Test Question
Learning Activity    Proposition Specific    Relational
R-R                  .69                     .77
WG-WG                .87                     .78
SR-SR                .60                     .73
WG-SR                .73                     .77
SR-WG                .82                     .80

Cued Recall
                     Type of Final Test Question
Learning Activity    Proposition Specific    Relational
R-R                  .48                     .49
WG-WG                .60                     .44
SR-SR                .35                     .45
WG-SR                .45                     .41
SR-WG                .46                     .54

Source: Data from Butler, A. C., Flanagan, P., Roediger, H. L., III, & McDaniel, M. A., The Benefit of Generative Study Activities Depends on the Nature of the Criterial Test, 2007, manuscript in preparation.
Note: R = read, WG = word generation, SR = sentence reordering.

Metacomprehension
In educational situations, one fundamental set of processes contributing to learning and retention is that involved in metacomprehension. Of interest for present purposes are the metacomprehension processes of monitoring online learning of text material (Maki & Berry, 1984) and the student’s regulation of subsequent study activity based on the information gleaned from such monitoring (Nelson & Dunlosky, 1991). These two processes work in concert to determine the effectiveness of student-initiated study activities. Research on the accuracy of monitoring suggests that learners generally cannot judge how well they have learned information from read texts (Glenberg & Epstein, 1985; Glenberg, Sanocki, Epstein, & Morris, 1987). These poor metacognitive judgments in turn negatively impact the efficacy of student-directed study activities. Specifically, learners tend to allocate study time to items they think they have not learned well (Thiede & Dunlosky, 1999), and thus when learners’ metacognitive judgments are inaccurate, their
allocation of study time will not be effectively directed. Theoretically, then, interventions that improve metacognition should result in more effective student-directed studying. Recent work has suggested that a general approach to improving metacognition is to stimulate richer encoding of the target material. For instance, requiring learners to summarize text content or requiring learners to generate keywords for the text content improved metacomprehension of that content (Thiede & Anderson, 2003; Thiede, Anderson, & Therriault, 2003; Thiede, Dunlosky, Griffin, & Wiley, 2005). Theoretically then, to the extent that desirable difficulties promote richer encoding of text content, metacomprehension should be improved, which in turn should enhance the effectiveness of subsequent student-directed studying of that target content. According to our theoretical framework, however, the desirability of a difficulty will not hinge simply on stimulating richer encoding of the text content. Instead, our approach anticipates that a critical determinant of whether difficulty will enhance the accuracy of metacognitive judgments is the extent to which the processing stimulated by the difficulty treatment is congruent with that required by the criterial test. Specifically, we suggest that the difficulty that is desirable for improving metacomprehension is that which stimulates more extensive processing of the type of information that is required for the criterial test. By contrast, we anticipate that difficulty will not be desirable if it stimulates extensive processing of information that mismatches that required for the criterial test. We evaluated these novel predictions in a recently published study (Thomas & McDaniel, 2007). Subjects studied short texts (about 326 words) while engaging in one of three types of activities: read only, word generation, or sentence reordering. 
After processing the texts, subjects provided metacomprehension ratings in which they judged how well they would be able to remember the information conveyed in each paragraph. In making these judgments, subjects were accurately informed as to the kind of test that would follow (proposition specific or relational), and they had been given examples of these test types for practice texts at the outset of the experiment. Then, they were given a short-answer test with questions focusing on proposition-specific information (i.e., details) or relational information (i.e., thematic aspects). Of central interest was the accuracy of subjects’ metacomprehension judgments. Reasoning from our framework, if the word generation task stimulates extensive proposition-specific processing, then it should enhance metacomprehension for the proposition-specific short-answer test but not for the relational short-answer test (relative to the read-only control). In contrast, if sentence reordering stimulates extensive relational processing, it should enhance metacomprehension when the relational short-answer test is administered but not when the proposition-specific short-answer test is administered. As can be seen in Figure 9.2, the results were perfectly in line with these expectations (the values represent the mean of each subject’s correlations between judged learning and actual test performance based on six judgments and six test items). Metacomprehension was significantly improved relative to the read-only control when the processing stimulated by the difficulty was congruent with the test. Although not the focus here, it is worth mentioning that memory performance paralleled this pattern as well. Returning to the metacomprehension findings, it is also noteworthy that when the difficulty was incongruent with the particular criterial test, metacomprehension accuracy was impaired relative to that when subjects simply read the passages. Indeed, subjects were at chance in judging what they knew and what they did not know when the difficulty and criterial test were incongruent. This striking pattern demonstrates the dire consequences of failing to appreciate that the desirability of a particular difficulty is a joint function of the processing stimulated by the difficulty and the requirements of the criterial test. And importantly, these consequences should cascade into subsequent negative effects for learners’ ability to self-direct restudy of the material.

[Figure 9.2: bar chart. Vertical axis: mean gamma, from –0.80 to 0.80. Horizontal axis: learning activity (read-only, word generation, sentence reordering). Bars: proposition-specific and relational test questions. Bar labels as extracted, without their conditions: 0.59, 0.41, 0.27, 0.11, –0.06, –0.33.]

Figure 9.2 Mean gamma correlations between metacomprehension predictions and performance on the final test as a function of learning activity and type of final test question. (Data from Thomas, A. K., & McDaniel, M. A., Memory and Cognition, 35, 668–678, 2007.)
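
The gamma values plotted in Figure 9.2 are within-subject correlations between judgments and test outcomes. As a minimal sketch, assuming the statistic is the Goodman-Kruskal gamma that is conventional in metacomprehension research (the text says only "gamma"), the following Python computes it for one hypothetical subject with six judgments and six test items. The function name, ratings, and scores are invented for illustration and are not data from Thomas and McDaniel (2007).

```python
from itertools import combinations


def goodman_kruskal_gamma(judgments, scores):
    """Return (C - D) / (C + D) over all item pairs, or None if undefined.

    C counts concordant pairs (the item judged higher also scored higher);
    D counts discordant pairs. Pairs tied on either variable are ignored.
    """
    if len(judgments) != len(scores):
        raise ValueError("judgments and scores must have the same length")
    concordant = 0
    discordant = 0
    for (j1, s1), (j2, s2) in combinations(zip(judgments, scores), 2):
        product = (j1 - j2) * (s1 - s2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    if concordant + discordant == 0:
        return None  # every pair is tied, so gamma is undefined
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical subject: six confidence ratings and six test outcomes (1 = correct).
ratings = [6, 2, 5, 3, 1, 4]
correct = [1, 0, 1, 1, 0, 0]
gamma = goodman_kruskal_gamma(ratings, correct)
print(f"gamma = {gamma:.2f}")
```

On this reading, a gamma near +1 means the subject gave higher ratings to the material that was later answered correctly, and a gamma near 0 corresponds to the chance-level monitoring described for the incongruent conditions. Averaging one such value per subject would give the kind of mean that the figure reports.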

[Figure 9.3: bar chart. Vertical axis: mean gamma, from –0.60 to 0.20. Horizontal axis: learning activity (word generation, sentence reordering). Bars: proposition-specific and relational test questions. Bar labels as extracted, without their conditions: –0.20, –0.21, –0.25, –0.31.]

Figure 9.3 Mean gamma correlations between metacomprehension predictions and study time allocation as a function of learning activity and type of final test question. (Data from Thomas, A. K., & McDaniel, M. A., Memory and Cognition, 35, 668–678, 2007.)

Thomas and McDaniel (2007) tested the just-mentioned implication in a second experiment. Once subjects had completed their metacomprehension judgments (but before taking the test), they were allowed to restudy the texts. Texts were presented one paragraph at a time in normal format (and normal order), and the amount of time that subjects chose to study each paragraph was recorded. Figure 9.3 shows that, as expected, subjects in all conditions devoted more study time to material that they judged they had not learned well. Of course, the catch is that for the inappropriate difficulty conditions, subjects have no idea what they have learned and what they have not learned, and accordingly, their study allocation becomes functionally random. That is, encountering inappropriate difficulty when first presented with material renders learners’ rational self-directed study ineffective. Supporting this assertion, restudy activity in the inappropriate difficulty conditions (word generation paired with relational test questions; sentence reordering paired with proposition-specific test questions) did not improve test performance relative to when restudy was not allowed (see Table 9.5). By contrast, when difficulty was congruent with the criterial task (word generation with proposition-specific test; sentence reordering with relational test), self-directed study (based on enhanced metacomprehension) improved test performance.

Table 9.5 Proportion Correct on the Final Test as a Function of Learning Activity and Type of Final Test Question With Restudy and Without Restudy

Without Restudy
                       Type of Final Test Question
Learning Activity      Proposition Specific    Relational
Word generation        .60                     .30
Sentence reordering    .43                     .63

With Restudy
                       Type of Final Test Question
Learning Activity      Proposition Specific    Relational
Word generation        .81                     .25
Sentence reordering    .40                     .84

Source: Data from Thomas, A. K., & McDaniel, M. A., Memory and Cognition, 35, 668–678, 2007.
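
The restudy effect in Table 9.5 can be read off as a simple difference between the two panels. The sketch below, offered only as an illustration, copies the eight proportions from the table and subtracts them; the dictionary layout, the `CONGRUENT` mapping, and the function name are assumptions introduced here, and the differences are descriptive rather than the statistical tests reported by Thomas and McDaniel (2007).

```python
# Proportions correct copied from Table 9.5.
# Keys are (learning activity, type of final test question).
WITHOUT_RESTUDY = {
    ("word generation", "proposition specific"): 0.60,
    ("word generation", "relational"): 0.30,
    ("sentence reordering", "proposition specific"): 0.43,
    ("sentence reordering", "relational"): 0.63,
}
WITH_RESTUDY = {
    ("word generation", "proposition specific"): 0.81,
    ("word generation", "relational"): 0.25,
    ("sentence reordering", "proposition specific"): 0.40,
    ("sentence reordering", "relational"): 0.84,
}

# Question type that each activity is taken to support, following the
# chapter's framework (word generation -> details, reordering -> relations).
CONGRUENT = {
    "word generation": "proposition specific",
    "sentence reordering": "relational",
}


def restudy_gains():
    """Return with-restudy minus without-restudy, rounded to two decimals."""
    return {
        key: round(WITH_RESTUDY[key] - baseline, 2)
        for key, baseline in WITHOUT_RESTUDY.items()
    }


gains = restudy_gains()
for (activity, question), gain in gains.items():
    label = "congruent" if CONGRUENT[activity] == question else "incongruent"
    print(f"{activity:20s} {question:21s} {label:12s} {gain:+.2f}")
```

Under these assumptions, both congruent cells gain .21 with restudy, whereas the two incongruent cells change by –.05 and –.03.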

Conclusion
The research outlined in the previous two sections supports three important conclusions. First, the desirability of particular difficulty manipulations depends on the match between the processing stimulated by the difficulty and the type of information targeted in the criterial test. Second, desirable difficulties (defined in this manner) improve both memory performance and accurate monitoring of learning, which in turn enhances the effectiveness of learner self-directed study activities. Third, inappropriate difficulties (those not congruent with the criterial task) have dramatic negative impacts on memory and monitoring accuracy and, by extension, the subsequent effectiveness of self-directed study activities. In summary, our work reinforces Bjork’s (1994b) suggestion that educators could enhance learning, metacomprehension, and student self-study activities by incorporating desirable difficulties into educational applications. Our contribution here is that we have presented a contextualistic framework to help guide educators and theorists toward prescribing and anticipating the kinds of difficulty that are in fact desirable.

References
Bjork, R. A. (1994a). Institutional impediments to effective training. In D. Druckman & R. A. Bjork (Eds.), Learning, remembering, believing: Enhancing human performance (pp. 295–306). Washington, DC: National Academy Press.
Bjork, R. A. (1994b). Memory and metamemory considerations in the training of human beings. In J. Metcalfe & A. Shimamura (Eds.), Metacognition: Knowing about knowing (pp. 185–205). Cambridge, MA: MIT Press.
Bloom, B. S. (1956). Taxonomy of educational objectives: The classification of educational goals. Essex, UK: Harlow.
Bouton, M. E. (1994). Conditioning, remembering, and forgetting. Journal of Experimental Psychology: Animal Behavior Processes, 20, 219–231.
Brown, J. I., Nelson, M. J. B., & Denny, E. C. (1973). The Nelson–Denny reading test. Boston: Houghton Mifflin.
Butler, A. C., Flanagan, P., Roediger, H. L., III, & McDaniel, M. A. (2007). The benefit of generative study activities depends on the nature of the criterial test. Poster presented at the annual meeting of the Psychonomic Society, Long Beach, CA.
Cepeda, N. J., Pashler, H., Vul, E., Wixted, J. T., & Rohrer, D. (2006). Distributed practice in verbal recall tasks: A review and quantitative synthesis. Psychological Bulletin, 132, 354–380.
Christina, R. W., & Bjork, R. A. (1991). Optimizing long-term retention and transfer. In D. Druckman & R. A. Bjork (Eds.), In the mind’s eye: Enhancing human performance (pp. 23–56). Washington, DC: National Academy Press.
Craik, F. I., & Lockhart, R. S. (1972). Levels of processing: A framework for memory research. Journal of Verbal Learning and Verbal Behavior, 11, 671–684.
deWinstanley, P. A., & Bjork, E. L. (2004). Processing strategies and the generation effect: Implication for making a better reader. Memory & Cognition, 32, 945–955.
Einstein, G. O., & Hunt, R. R. (1980). Levels of processing and organization: Additive effects of individual-item and relational processing. Journal of Experimental Psychology: Human Learning and Memory, 6, 588–598.
Einstein, G. O., McDaniel, M. A., Owen, P. D., & Cote, N. C. (1990). Encoding and recall of texts: The importance of material appropriate processing. Journal of Memory and Language, 29, 566–581.
Estes, W. K. (1955). Statistical theory of distributional phenomena in learning. Psychological Review, 62, 369–377.
Fisher, R. P., & Craik, F. I. M. (1977). Interaction between encoding and retrieval operations in cued recall. Journal of Experimental Psychology: Human Learning and Memory, 3, 701–711.
Glenberg, A. M., & Epstein, W. (1985). Calibration of comprehension. Journal of Experimental Psychology: Learning, Memory, and Cognition, 11, 702–718.
Glenberg, A. M., Sanocki, T., Epstein, W., & Morris, C. (1987). Enhancing calibration of comprehension. Journal of Experimental Psychology: General, 116, 119–136.
Goode, M. K., Geraci, L., & Roediger, H. L., III. (2008). Superiority of variable to repeated practice in transfer on anagram solution. Psychonomic Bulletin & Review, 15, 662–666.
Guthrie, E. R. (1952). The psychology of learning. New York: Harper and Row.
Hirshman, E. L., & Bjork, R. A. (1988). The generation effect: Support for a two-factor theory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 484–494.
Hull, C. L. (1943). The principles of behavior. New York: Appleton-Century-Crofts.
Jacoby, L. L. (1975). Physical features vs. meaning: A difference in decay. Memory & Cognition, 3, 247–251.
Jenkins, J. J. (1979). Four points to remember: A tetrahedral model of memory experiments. In L. S. Cermak & F. I. M. Craik (Eds.), Levels of processing in human memory (pp. 429–446). Hillsdale, NJ: Erlbaum.
Kohler, W. (1947). Gestalt psychology. New York: Liveright.
Maki, R. H., & Berry, S. L. (1984). Metacomprehension of text material. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 663–679.
Mason, M. (1978). From print to sound in mature readers as a function of reader ability and two forms of orthographic regularity. Memory & Cognition, 6, 568–581.
McClelland, J. L., McNaughton, B. L., & O’Reilly, R. C. (2002). Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. In T. A. Polk & C. M. Seifert (Eds.), Cognitive modeling (pp. 499–534). Cambridge, MA: MIT Press.
McDaniel, M. A., & Einstein, G. O. (1989). Material appropriate processing: A contextualist approach to reading and studying strategies. Educational Psychology Review, 1, 113–145.
McDaniel, M. A., & Einstein, G. O. (2005). Material appropriate difficulty: A framework for determining when difficulty is desirable for improving learning. In A. F. Healy (Ed.), Experimental cognitive psychology and its applications (pp. 73–85). Washington, DC: American Psychological Association.
McDaniel, M. A., Einstein, G. O., Dunay, P. K., & Cobb, R. (1986). Encoding difficulty and memory: Toward a unifying theory. Journal of Memory and Language, 25, 645–656.
McDaniel, M. A., Einstein, G. O., & Lollis, T. (1988). Qualitative and quantitative considerations in encoding difficulty effects. Memory & Cognition, 16, 8–14.
McDaniel, M. A., Friedman, A., & Bourne, L. E. (1978). Remembering the levels of information in words. Memory & Cognition, 6, 156–164.
McDaniel, M. A., Hines, R. J., & Guynn, M. J. (2002). When text difficulty benefits less-skilled readers. Journal of Memory and Language, 46, 544–561.
McDaniel, M. A., Hines, R. J., Waddill, P. J., & Einstein, G. O. (1994). What makes folk tales unique: Content familiarity, causal structure, scripts, or superstructures? Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 169–184.
McDaniel, M. A., & Kerwin, M. L. (1987). Long-term prose retention: Is an organizational schema sufficient? Discourse Processes, 10, 237–252.
McDaniel, M. A., Waddill, P. J., & Einstein, G. O. (1988b). A contextual account of the generation effect: A three-factor theory. Journal of Memory and Language, 27, 521–536.
McKeachie, W. J. (1974). Instructional psychology. Annual Review of Psychology, 25, 161–193.
McNamara, D. S., Kintsch, E., Songer, N. B., & Kintsch, W. (1996). Are good texts always better? Interactions of text coherence, background knowledge, and levels of understanding in learning from text. Cognition and Instruction, 14, 1–43.
McNamara, D. S., & McDaniel, M. A. (2004). Suppressing irrelevant information: Knowledge activation or inhibition? Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 465–482.
Melton, A. W. (1963). Implications of short-term memory for a general theory of memory. Journal of Verbal Learning and Verbal Behavior, 2, 1–21.
Morris, C. D., Bransford, J. D., & Franks, J. J. (1977). Levels of processing versus transfer-appropriate processing. Journal of Verbal Learning and Verbal Behavior, 16, 519–533.
Nelson, T. O., & Dunlosky, J. (1991). When people’s judgments of learning (JOLs) are extremely accurate at predicting subsequent recall: The “delayed-JOL effect.” Psychological Science, 2, 267–270.
Petros, T. V., Bentz, B., Hammes, K., & Zehr, H. D. (1990). The components of text that influence reading times and recall in skilled and less skilled college readers. Discourse Processes, 13, 387–400.
Richland, L. E., Linn, M. C., & Bjork, R. A. (2007). Cognition and instruction: Bridging laboratory and classroom settings. In F. Durso, R. Nickerson, S. Dumais, S. Lewandowsky, & T. Perfect (Eds.), Handbook of applied cognition (2nd ed., pp. 555–583). West Sussex, UK: John Wiley & Sons Ltd.
Roediger, H. L., III. (2008). Relativity of remembering: Why the laws of memory vanished. Annual Review of Psychology, 59, 225–254.
Roediger, H. L., III, & Karpicke, J. D. (2006). The power of testing memory: Basic research and implications for educational practice. Perspectives on Psychological Science, 1, 181–210.
Roediger, H. L., III, Weldon, M. S., & Challis, B. H. (1989). Explaining dissociations between implicit and explicit measures of retention: A processing account. In H. L. Roediger, III & F. I. M. Craik (Eds.), Varieties of memory and consciousness: Essays in honor of Endel Tulving (pp. 3–41). Hillsdale, NJ: Erlbaum.
Rovee-Collier, C. (1997). Dissociations in infant memory: Rethinking the development of implicit and explicit memory. Psychological Review, 104, 467–498.
Schmidt, R. A., & Bjork, R. A. (1992). New conceptualizations of practice: Common principles in three paradigms suggest new concepts for training. Psychological Science, 3, 207–217.
Skinner, B. F. (1938). The behavior of organisms. New York: Appleton-Century-Crofts.
Stanovich, K. (1980). Toward an interactive compensatory model of individual differences in the development of reading fluency. Reading Research Quarterly, 16, 32–71.
Sweller, J. (1999). Instructional design in technical areas. Victoria, Australia: Australian Council for Education Press.
Thiede, K. W., & Anderson, M. C. M. (2003). Summarizing can improve metacomprehension accuracy. Contemporary Educational Psychology, 28, 129–160.
Thiede, K. W., Anderson, M. C. M., & Therriault, D. (2003). Accuracy of metacognitive monitoring affects learning of texts. Journal of Educational Psychology, 95, 66–73.
Thiede, K. W., & Dunlosky, J. (1999). Toward a general model of self-regulated study: An analysis of selection of items for study and self-paced study time. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 1024–1037.
Thiede, K. W., Dunlosky, J., Griffin, T. D., & Wiley, J. (2005). Understanding the delayed-keyword effect on metacomprehension accuracy. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 1267–1280.
Thomas, A. K., & McDaniel, M. A. (2007). The negative cascade of incongruent generative study-test processing in memory and metacomprehension. Memory & Cognition, 35, 668–678.
Tolman, E. C. (1932). Purposive behavior of animals and men. New York: Century.
Tulving, E. (1983). Elements of episodic memory. New York: Oxford University Press.
Tulving, E., & Osler, S. (1968). Effectiveness of retrieval cues in memory for words. Journal of Experimental Psychology, 77, 593–601.
Tulving, E., & Pearlstone, Z. (1968). Availability versus accessibility of information in memory for words. Journal of Verbal Learning and Verbal Behavior, 5, 381–391.
Zacks, R. T., Hasher, L., Sanft, H., & Rose, K. C. (1983). Encoding effort and recall: A cautionary note. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9, 747–756.

10

Testing, Generation, and Spacing Applied to Education
Past, Present, and Future

Catherine O. Fritz

Research about remembering and learning should make sense in the laboratory and in the world; when it fails to do so, there is something wrong or something incompletely understood and more work is needed. Bob Bjork’s approach to his own research and his consideration of others’ work seem to embody this principle, and his influence in this respect was fundamental to my development as a researcher and an educator. The current emphasis on giving psychology away and making it applicable to the world is due in great part to his efforts. This chapter focuses on the application of psychology to education, primarily in terms of spacing, testing, and generation effects (areas that are strongly associated with Bjork’s research), and specifically in terms of evidence and arguments related to their relevance to education. Bjork has argued that research on memory can and should be applied to teaching and learning (e.g., Bjork, 1979; deWinstanley & Bjork, 2002) and has suggested specific practical applications such as spacing of material when teaching and when studying, varying the context for new concepts, embedding retrieval practice in course design and study time, paraphrasing as an aid to study, and drawing students’ attention to the high-level organization of the material. Others have joined in his calls to apply these principles to education (e.g., Dempster, 1989, 1992, 1997; Larsen, Butler, & Roediger, 2008).

The application of psychology to education has been a recent focus for cognitive psychologists (e.g., McDaniel, 2007), but it also has a long history. James (1890/1981) referred to practice testing as active repetition: “In learning by heart (for example), when we almost know the piece, it pays better to wait and recollect by an effort from within, than to look at the book again. If we recover the words in the former way, we shall probably know them the next time; if in the latter way, we shall very likely need the book once more” (p. 646).

Testing and generation effects (Jacoby, 1978; Slamecka & Graf, 1978; Wittrock & Carter, 1975) are treated here as two terms for the same fundamental effect: a benefit in later remembering as a result of producing an answer to a demand at an earlier time. Both effects are typically concerned with later memory for the material that was produced as an answer to a demand at study time. Tests require generation: Practice tests can be seen as a demand for generation from a constructivist perspective, as memory for the information is constructed or generated. Generation often involves a test of memory: The paradigms associated with generation effects typically require the participant to produce information that is somehow related to some given information; because the generated material is constructed from knowledge associated with the given information, some degree of remembering is involved. When the generated material is not constructed from memory, for example, for unfamiliar material such as nonwords, then the generation effect vanishes (e.g., Bertsch, Pesta, Wiscott, & McDaniel, 2007; Lutz, Briggs, & Cain, 2003).

Early Approaches to Learning and Teaching
Practice testing, generation, and spaced repetition often have key roles to play in approaches to teaching and learning. Herbart’s (1898) analysis of teaching and learning provides one example; he strongly emphasizes the role of curiosity in learning, arguing that curiosity leads the learner to generate ideas, instances, and features. His model of an effective lesson contains five steps:

1. Preparation: bringing to mind related knowledge
2. Presentation: presenting new information
3. Comparison: with other exemplars or contexts
4. Generalization: to instances that are provided
5. Application: to new instances

When this process is followed, the new information is revisited multiple times, in multiple contexts, thereby potentially providing benefits of active, spaced repetition and variable encoding (e.g., Gartman &
Johnson, 1972). The preparation step, the comparison step, and the application step will almost certainly involve generation of instances and features, providing benefits of more elaborate and deeper semantic processing (Craik & Lockhart, 1972) and possibly generation effects. Dewey (1938/1997a) argued that education was about developing the ability to think, and that the acquisition of information was almost an incidental outcome of activities that stimulated the ability to think. He mapped Herbart’s (1898) model onto his own analysis of thinking (Dewey, 1910/1997b). Students’ interest and learning were to be developed through problem-solving activities that involved a sequence of experience, idea formation, and application of these ideas. Although Dewey’s approach emphasized problem solving, experience, and understanding rather than explicit mnemonic strategies for remembering information, the mnemonic elements are nevertheless incorporated. Learning through an active process of problem solving clearly provides opportunities to bring relevant information to mind (retrieval practice), to think about new information in various ways (variable encoding, deeper semantic processing), and to generate the ideas that are the focus of the lesson (generation). These early approaches were concerned primarily with understanding (i.e., developing meaning) as a key to learning, and the meaningfulness of the material is well established as an important factor in remembering (Bartlett, 1932; Ebbinghaus, 1885/1964). Specific study strategies for learning from lectures and books and for preparing for examinations are also useful; these were the focus of Mace’s (1932) application of psychology in a how-to book for students. 
With respect to learning from books and lectures, he advised a distributed study pattern with gradually increasing intervals: “Acts of revision should be spaced in gradually increasing intervals, roughly intervals of one day, two days, four days, eight days and so on” (pp. 37–38). This advice provides an early precursor to Landauer and Bjork’s (1978) expanding practice schedules. Like James (1890/1981), Mace noted that rereading material is passive repetition, which is far less effective than active repetitions, such as recalling it or explaining it. Mace also observed that assigned essays and examinations are not merely diagnostic tools, but are also important aids to learning; better memory results from generating explanations of what is being learned.
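
Mace’s rule of thumb amounts to doubling the gap between successive revisions. A minimal sketch of the resulting calendar follows, assuming that day 0 is the initial study session and that each interval is counted from the previous revision; neither detail is specified in the quotation, and the function names are hypothetical.

```python
from itertools import accumulate


def doubling_intervals(n_reviews, first_interval_days=1):
    """Return gaps in days between revisions: 1, 2, 4, 8, ... by default."""
    return [first_interval_days * 2 ** k for k in range(n_reviews)]


def review_days(intervals):
    """Return the day of each revision, counting the first study as day 0."""
    return list(accumulate(intervals))


intervals = doubling_intervals(4)
days = review_days(intervals)
print(intervals)
print(days)
```

With four revisions, the gaps of one, two, four, and eight days place the revisions on days 1, 3, 7, and 15 after the initial study session.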

Making Practice More Effective
The expanding intervals suggested by Mace and demonstrated by Landauer and Bjork (1978) are a useful heuristic when scheduling
practice activities, but under some conditions uniform intervals are more effective than their corresponding expanding counterparts (Karpicke & Roediger, 2007). The effectiveness of different types of schedules may be based on how effortful the practice tests are. Robert Bjork has often argued that effortful retrievals do more to improve memory than do easy ones. Thus, material that is relatively easy to retain over a short period would not benefit from the initial test that occurs after a very short time, as usually occurs in expanding schedules, because it is a trivial practice event. On the other hand, material that is very difficult to retain over a short period would probably benefit little from practice with uniform intervals, because there would rarely be a successful practice event. This line of reasoning suggests that it is not the spacing that is key, but the effort involved in the practice event. To further test that idea, we manipulated both the nature of the intervals and the nature of the cues in a laboratory experiment on learning foreign language vocabulary (Morris & Fritz, 2006). Participants studied and were tested on German–English translations with either expanding or uniform intervals and either constant level cues or diminishing cues. Each practice test provided the German word and letters from the English word. Constant level cues always provided two letters: the first letter and one other; the second letter was different for each of the three practice tests. For diminishing cues, the first practice test provided three letters (initial and two others), the second provided two letters (initial and one other), and the third provided only the initial letter. Refer to Figure 10.1 for a summary of our results. 
Expanding intervals (one, four, and seven intervening items) were significantly more effective than uniform ones (four intervening items), and decreasing the information provided in the cues each time led to significantly better learning than holding the cue level constant. Learning was improved by making each practice test somewhat more difficult than the previous one, whether the increased difficulty was due to a larger interval or a reduced cue. Although each practice test should require effort, the tests should probably not be so difficult that they produce excessive failure, which might demoralize and demotivate the learner.
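
The two manipulations in this experiment can be sketched in a few lines of Python. The interval values and the number of letters per cue follow the description above; the example word, the helper names, and the choice of which non-initial letters are revealed are assumptions made purely for illustration, because the text does not say how those letters were selected.

```python
EXPANDING = (1, 4, 7)  # intervening items before practice tests 1, 2, and 3
UNIFORM = (4, 4, 4)

# Letter positions revealed on each of the three practice tests (0 = initial).
# Which non-initial letters were shown is an assumption, not from the study.
CONSTANT_CUES = ({0, 1}, {0, 2}, {0, 3})   # always the initial plus one other
DIMINISHING_CUES = ({0, 1, 2}, {0, 2}, {0})  # three, then two, then one letter


def practice_positions(study_position, gaps):
    """Return list positions of the practice tests for one word."""
    positions = []
    current = study_position
    for gap in gaps:
        current += gap + 1  # skip the intervening items, then test
        positions.append(current)
    return positions


def cue(word, reveal_positions):
    """Show the letters at reveal_positions and mask the rest."""
    return "".join(
        ch if i in reveal_positions else "_" for i, ch in enumerate(word)
    )


english = "garden"  # hypothetical target for the German word "Garten"
print(practice_positions(0, EXPANDING), practice_positions(0, UNIFORM))
print([cue(english, p) for p in DIMINISHING_CUES])
print([cue(english, p) for p in CONSTANT_CUES])
```

Under these assumptions, both schedules place the third practice test at the same list position, because the intervals sum to 12 intervening items in each case, so the schedules differ only in how that total is distributed across the three tests.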

Retrieval Practice in the Classroom One of the memory techniques that Bjork uses in his own classes is the name game: a retrieval practice-based technique to help students in seminar or work groups to learn one another’s names, and so encourage

Y109937.indb 202

10/15/10 11:04:04 AM

Testing, Generation, and Spacing Applied to Education • 203

[Figure 10.1: bar chart. Y-axis: Mean Recall (max = 4). X-axis groups: Expanding, Uniform, P only. Bars: Decreasing cues, Similar cues.]

Figure 10.1 Translation performance on a criterion test following a 10-minute filled interval from Morris and Fritz (2006). Words and their translations were presented once and, for practiced words, practiced three times. Expanding intervals in practice were significantly more effective than uniform intervals (d = .2); decreasing cues were significantly more effective than similar cues (d = .2). Error bars represent ±1 SEM.

a better working relationship. Like many of his students, I adopted the name game in my own seminar teaching and discussed the merits of variants of the game with colleagues. Because names are notoriously difficult to remember, and because there was no published evidence of the name game’s effects, we experimentally demonstrated its impressive effectiveness with actual seminar groups ranging in size from 6 to over 20 and investigated the effects of variations on how the game is played (Morris & Fritz, 2000, 2002; Morris, Fritz, & Buck, 2004). Refresher sessions a week or two after the initial game made the name game even more effective, and benefits of the game were observed even after 11 months. Our research also showed that the benefits of practice testing for name learning could be considerably boosted by combining the practice with attention to the meaning in names (Morris, Fritz, Jackson, Nichol, & Roberts, 2005). In addition to advice based on the application of psychology to education, empirical work has tested the application of practice testing and spacing in the classroom and with education-like materials. Early experimental work by Gates (1917) with primary school children investigated the optimum ratio of reading to recitation from memory for children in several grades, ranging from grade 1 to grade 8. His materials and procedures were adapted somewhat to be appropriate for the age of each group. When students studied education-like materials (biographies), there was always a benefit to spending some time reciting; on average,


204 • Catherine O. Fritz

the best results were obtained when about 40% of the time was spent in recitation. The benefit of optimal recitation over reading alone was about 30%, although this figure was higher for younger children (36%) and lower for older ones (18%). For a more detailed summary of Gates’s data without the need to access the full dissertation, see Roediger and Karpicke (2006). Although Gates’s (1917) work showed that a mixture of reading and recitation (i.e., testing) led to better memory for the details of the studied biographies, both immediately and after a short delay, it did not assess the longer-term effects of testing, nor did it address the question of when the tests should ideally occur. These questions were the subject of Spitzer’s (1939) work, conducted in schools across the state of Iowa. In their regular classes, under the direction of their regular teachers, students read short expository texts and were tested two or more times after delays varying from immediate tests to 63 days. By varying the timing of the initial test, he showed a customary forgetting curve with rapid loss that appeared to asymptote after about two weeks. It is noteworthy that the pattern of forgetting was very similar to that obtained from adults under laboratory conditions, often studying materials quite unlike expository texts. Spitzer designed his research so that the second and third tests were matched in terms of delay from the study session to the first or second tests for other groups of students. In this way, he was able to demonstrate the powerful effect of an early test in retarding further forgetting. Jones (1923) also demonstrated the powerful effect of tests at the end of lectures. When a practice test immediately followed a lecture, criterion test scores were on average more than 50% higher than when no end-of-lecture test occurred; criterion tests occurred from three days to eight weeks later (p. 51). 
He argued strongly that it was essential that the test should occur immediately following the lecture to arrest forgetting; as little as two days later, much of the forgetting had already occurred. We obtained results similar to Jones’s in a small quasi-experiment embedded in a second-year statistics course that was taught in two sections (Fritz & Morris, 2003). At the end of a lecture on multiple regression, students either took a short-answer test on nine key points or were given a brief review of those nine points using the same statements that formed the basis of the test. Students who were tested were not given feedback. The nine points were randomly selected from a pool of 18 points addressed in the lecture. The following week a surprise short-answer test was administered covering all 18 points. Figure 10.2 illustrates the powerful effect of the test at the end of the lecture on memory a week later.


[Figure 10.2: bar chart. Y-axis: Correct Answers (max = 9). X-axis (Practice Schedule): Test at Lect+1hr+2hr; Test at Lect+1hr; Test at Lect only; Review + 1hr test; Review only. Bars: Tested or reviewed, Not revisited.]

Figure 10.2 Performance on a short-answer test one week after the lecture from Fritz and Morris (2003). Students in the left three groups attended the lecture at the same time and were tested on nine key points at the end of the lecture. One subgroup was also tested after one and two hours; one was also tested after one hour, and one did not have an additional test on the day of the lecture. Students in the right two groups attended the same lecture on a different day; for these students, the lecture ended with a review of the same nine key points. Half of these students were tested one hour after the lecture. A week later all students took a surprise test on all 18 key points from the lecture.

Learning to spell can also benefit from spaced practice. In an early experiment by Fishman, Keller, and Atkinson (1968) involving computer-assisted learning, fifth-grade students practiced spelling words in each of six sessions. Words that were massed (practiced three times within one session) were less well learned than those that were spaced (practiced once in each of three sessions) when tested 10 and 20 days later. In another experimental study (Rieth et al., 1974), poor spellers from a fifth-grade class spent five weeks being taught spelling in a traditional way—getting all 20 words on Monday, doing workbook exercises during the week, and being tested on Friday. For the next five weeks they were given five words each day, Monday through Thursday, with workbook exercises for those words and a test on the following day, followed by a test on all 20 words on Friday. The next three weeks reinstated the traditional method, and the final two weeks reinstated the spaced practice. The benefits observed for the words that were practice tested were substantial; every child earned higher grades with practice tests. Learning disabled children can also benefit from reducing the number of words introduced each day and distributing practice (Gettinger, Bryant, & Fayne, 1982). Early reading lessons were the context for research by Seabrook, Brown, and Solity (2005). Phonics-based reading lessons for year 1 students were held either in three 2-minute sessions each day or in a single


6-minute session for two weeks; the spaced sessions led to substantially greater improvements over that time, providing clear evidence for spacing-related benefits. Both testing and spacing effects were observed by Carpenter, Pashler, and Cepeda (2009) in eighth-grade history classes with a nine-month delay to the criterion test. Facts were reviewed after 1 week, 16 weeks, or not at all for three different groups of children. The review involved two steps: In the first part, students worked through 15 questions to be answered and 15 questions with answers provided that were to be read; after completing that activity, they were asked to read through a reordered set of all 30 questions and answers. Fifteen items were not reviewed. Facts reviewed by testing were recalled significantly better on the final test than those reviewed by reading or not reviewed at all. The longer delay before the review led to descriptively better final test performance with a medium effect size, but the difference fell just short of statistical significance. Rea and Modigliani (1985) demonstrated benefits from spaced practice in both spelling and multiplication facts in third-grade students. They also observed an additional advantage to expanding intervals in that children were able to feel successful while learning.

Laboratory Work With Realistic Materials

Although laboratory research is invaluable in identifying the characteristics and some limits of an effect, responsible implementation in educational settings requires research that bridges the gap by using more educationally realistic materials or retention intervals and sometimes, as above, by moving into the classroom. Naveh-Benjamin (1990) provides an excellent discussion about the benefits of establishing clear links between the laboratory work and the educational work. Reviews of research to inform education, such as that recently conducted by the U.S. Department of Education’s National Mathematics Advisory Panel, establish criteria that demand both internal and external validity for the most useful research (Reyna, Benbow, Boykin, Whitehurst, & Flawn, 2008). Laboratory-based research on generation, testing, and spacing effects has often involved learning word pairs with immediate tests or retention intervals of less than an hour. This work transfers quite easily to an educational activity when learning foreign language vocabulary, which appears similar to paired associate learning. For example, Bloom and Schuell (1981) researched high school students learning French. As part of their usual class activities, students studied 20 words using three 10-minute worksheets; half of the students were randomly assigned to


space those worksheets across three days, and half worked all three on the same day. Immediate tests showed negligible difference between the groups, but after four days a significant advantage for distributed practice was observed. Our research (Fritz, Morris, Acton, Voelkel, & Etkind, 2007) has demonstrated that expanding retrieval practice can be roughly equivalent to the keyword method in boosting receptive learning of foreign language vocabulary and superior to the keyword method for productive learning. Education often involves learning from texts; some laboratory studies such as those described below asked students to study the texts and then provided one or more practice tests prior to a criterion test. When participants studied short journal papers and were tested immediately afterwards—the practice test—and then tested three days later, the format of the practice test and the presence or absence of feedback on the practice test were important (Kang, McDermott, & Roediger, 2007). Without feedback, multiple choice practice tests led to better learning, perhaps because people were more often correct, whereas short-answer practice tests were more effective when feedback was provided, regardless of the format of the criterion test. The results suggest that frequent quizzes are a good way to assist learning; if feedback is provided and used by the students, short-answer quizzes are most effective, but if feedback is either not provided or not used, multiple choice quizzes are more effective. If feedback is beneficial, then what would be the effect of having open-book practice tests? Participants in another experiment studied short texts and either took an immediate practice test or not (Agarwal, Karpicke, Kang, Roediger, & McDermott, 2008). 
Open-book tests will obviously lead to better performance on the practice test, but more importantly, they also led to performance on the criterion test a week later that was equivalent to the closed-book tests and superior to the group without practice tests. The notion that open-book practice tests, which produce almost perfect performance, can improve learning as effectively as closed-book tests has important educational implications at all levels. Generating explanations and summaries of studied material is arguably another form of testing and has long been argued as an essential part of comprehension and learning (e.g., Wittrock, 1989). Eighth-grade students who were prompted to explain what they had just read, as they read through a short biology text, learned more from the text than those who simply read through the text twice (Chi, de Leeuw, Chiu, & LaVancher, 1994). We observed the same pattern when comparing the memory of preschool children for the contents of a short video clip of


a children’s program (Fritz & Morris, 2006). Children watched a short video of an unfamiliar children’s program; half were immediately asked to explain what they just saw, and the other half watched the video clip a second time. The children who explained remembered significantly more than children who viewed the clip twice. In other research (Hooper, Sales, & Rysavy, 1994), university students who read a 6,200-word chapter and individually summarized each paragraph scored better on a test of literal comprehension than did students who generated analogies or simply reread each paragraph; however, the same pattern was not observed for students who worked in pairs. It appears that generating a summary of the content being studied is an important aid to memory and learning. Learning mathematics is a challenge for many students, and distributed practice tests appear to improve learning here as well. Rohrer and Taylor (2006, 2007) conducted experiments with university students learning to solve mathematics problems. These studies document substantial and significant benefits of distributed practice in mathematics learning and provide the basis for Rohrer and Taylor’s argument that the organization of mathematics textbooks should change from massing practice problems in the section where the concepts are explained to distributing them across sections and chapters through the book. Realistic time frames for educational materials are obviously important as well. In a large, Internet-based study (Cepeda, Vul, Rohrer, Wixted, & Pashler, 2008), participants were taught a set of 32 obscure facts through practice tests. In the initial session and the review session, the entire set of questions was presented, one at a time, as many times as necessary to answer every item correctly. Once a question was answered correctly, it was dropped from that session. 
Review sessions occurred immediately or up to 105 days later; the criterion test occurred 7, 35, 70, or 350 days after the review. The optimal interval between initial study and review was neither a constant amount nor a constant proportion of the interval between review and criterion test. To maximize recall after 7 days, it is best to review immediately, but to maximize recall a year later, a review after 21 to 35 days was best. Recognition tests produced higher scores, but a similar pattern was observed. In related research with university students participating in person, rather than on the Internet, similar patterns were observed for learning foreign vocabulary, facts, and names of objects (Cepeda et al., 2009). This pattern of performance suggests that, in line with Spitzer’s (1939) results and Mace’s (1932) advice, truly optimal learning over the long term would follow from review or practice after a short while,


leading to better performance on practice at an intermediate interval, in turn leading to better performance following a longer interval.
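The practice procedure in the Internet-based study described above, in which each question is presented repeatedly until it has been answered correctly once and is then dropped from the session, can be sketched as a simple loop. This is a minimal illustration under stated assumptions: the `answer_fn` callback, the data format, and the exact-match scoring rule are hypothetical stand-ins, not the materials or software used by Cepeda and colleagues.

```python
from collections import deque


def dropout_session(items, answer_fn):
    """Cycle through questions until each has been answered correctly once.

    items: list of (question, correct_answer) pairs.
    answer_fn: callable(question, attempt_number) -> learner's response
               (hypothetical interface for this sketch).
    Returns a dict mapping each question to the number of attempts needed.
    """
    queue = deque(items)
    attempts = {question: 0 for question, _ in items}
    while queue:
        question, correct = queue.popleft()
        attempts[question] += 1
        response = answer_fn(question, attempts[question])
        if response.strip().lower() != correct.strip().lower():
            # Not yet correct: send the item to the back of the queue so it
            # is presented again later in the same session.
            queue.append((question, correct))
        # A correct answer drops the item from the session.
    return attempts


# Toy data and a simulated learner, both invented for illustration.
facts = [("Q1", "a1"), ("Q2", "a2")]


def simulated_learner(question, attempt):
    # Knows Q1 at once; needs two attempts for Q2.
    if question == "Q1" or attempt >= 2:
        return dict(facts)[question]
    return "don't know"


result = dropout_session(facts, simulated_learner)
```

Because items leave the queue only after a correct answer, every item ends the session having been retrieved successfully once, while harder items receive more presentations.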

Programmed Instruction and Device-Directed Practice

From the current, cognitive vantage point, Skinner’s (1958) programmed instruction and teaching machines are also testing-based approaches to learning. Holland and Skinner’s (1961) The Analysis of Behavior is a prime example: a textbook composed entirely of test questions and answers. Students work through each set of roughly 30 to 60 questions, writing answers and checking them immediately, without skipping items. Some key points are questioned more than once, in slightly different ways, at intervals through the set. Students are advised to repeat each set until they make few or no errors, when they can proceed to the next set. Repeating the complete set of questions provides spacing between an item and its repetition. Periodic review sets are also included; they provide tests following a more extended interval and direct students to repeat earlier sets that have not withstood the test of time. A programmed instruction approach can be very effectively implemented on computers and can significantly improve students’ learning (e.g., Davis, Bostow, & Heimisson, 2007; Miller & Malott, 2006; Uhumuavbi & Mamudu, 2009). A five-year experiment in primary schools investigated the effectiveness of computer-based drill and practice sessions in mathematics, reading, and language arts (Ragosta, Holland, & Jamison, 1982). Just 10 minutes per day of practice substantially improved reading and language arts performance on standardized tests as well as on tests associated with the program in the first year and the two subsequent years. For mathematics, 20 minutes per day led to even more substantial benefits, which continued to increase across the three years. A present-and-test approach to teaching often underlies computer-aided instruction, sometimes in combination with a multistep approach similar to Herbart’s (1898). 
For example, the I Can Learn© (Interactive Computer Aided Natural Learning) program provides five-stage mathematics lessons: pretest, review of prerequisites needed, lesson, cumulative review, and comprehensive test. Students in classes randomly assigned to this computer- and practice-based approach performed significantly better on prealgebra and algebra tests than those in classes with traditional approaches (Barrow, Markman, & Rouse, 2009). One might argue that the benefits of this type of approach are not due merely


to the testing and practice aspects, but that these students benefit from the more individualized instruction provided by an intelligent practice-based approach. However, it is clear that all individualized instruction relies on formal or informal diagnostic testing to determine what the student needs; one beneficial effect of testing is that it identifies areas of weakness and strength, enabling individualized instruction. The benefits of testing are multifaceted: improved memory, improved confidence when information is successfully recalled, and identification of information that has not yet been mastered. Leitner boxes are another present-and-test or flashcard approach that is widely used in German schools; they are explicitly based on Ebbinghaus’s (1885/1964) work and make learning more efficient by increasing the intervals between successive correct answers. Flashcards and a series of boxes or compartments are required. Flashcards are practiced until learned, and then moved to the next box to be practiced again after a longer delay. When answered correctly from the second box, they are moved to the third box for practice after an even longer delay. The number of boxes can vary, typically from about four to about eight, but the principle is one of strengthening memory by stretching it, akin to the Bjorks’ new theory of disuse (Bjork & Bjork, 1992). The Leitner box approach lends itself to computer-based study and has been implemented in the phase-6 software® (phase-6®, 2007), which schedules the reintroduction of questions based on past performance. Information to be learned is loaded into the software in a question-and-answer format; the learner is tested and given the correct answers. Correctly answered questions move to the next phase, to be tested again after one day for phase 1, after three days for phase 2, and after increasing intervals for each higher phase. 
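A minimal sketch of this kind of interval-stretching scheduler follows. The one-day and three-day delays for phases 1 and 2 come from the description above; the longer delays, the number of phases, the starting phase of 0, and all names in the code are assumptions made for illustration and are not taken from the phase-6 software. An error moves a card back one phase, the rule described for phase-6 in this section; some flashcard variants instead return a missed card to the first box.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Days to wait before a card in each phase is tested again.
# Phase 0 (assumed) holds new or relapsed cards that are due at once.
# Phases 1 and 2 follow the text; the remaining values are assumed.
PHASE_DELAYS = {0: 0, 1: 1, 2: 3, 3: 9, 4: 29, 5: 90}
MAX_PHASE = max(PHASE_DELAYS)


@dataclass
class Card:
    question: str
    answer: str
    phase: int = 0
    due: date = date.min  # due immediately until first scheduled


def review(card, correct, today):
    """Update a card's phase and next due date after one test."""
    if correct:
        card.phase = min(card.phase + 1, MAX_PHASE)
    else:
        card.phase = max(card.phase - 1, 0)
    card.due = today + timedelta(days=PHASE_DELAYS[card.phase])
    return card


def due_cards(cards, today):
    """Return the cards whose scheduled test date has arrived."""
    return [card for card in cards if card.due <= today]
```

In use, a study session would call `due_cards` for the current date, test each returned card, and pass the outcome to `review`; cards answered correctly drift toward longer delays, and missed cards return to shorter ones.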
When a student makes an error on a question, that question is moved back one phase, shifting to a shorter interval, to enable the student to recover his or her mastery of the material. Preliminary research in foreign language classes in U.S. and German schools suggests that phase-6® is enjoyed by students and may help their learning (Fritz, Passey, & Morris, 2009). As a pilot project funded by the UK Economic and Social Research Council, we recently developed SIMPLE: computer-based instruction for a few topics in introductory statistics (Fritz, Peelo, Folkard, & Ramirez-Martinell, 2008). The SIMPLE tutorials provide lessons with embedded questions to both monitor students’ understanding and consolidate their learning; Figure 10.3 provides an example of typical flow between screens. If a student answers incorrectly, he or she works through additional tutorial screens, which


[Figure 10.3: flowchart. Box labels: Introduce module and list topics to be addressed; Begin tutorial on 1st topic; Question on understanding of one aspect; Positive feedback on correct response; Brief review of selected aspect; More detailed/comprehensive review; Scaffolded questions; Warning or exit; Question on understanding of another aspect (multiple choice); Error feedback + specific tutorial (could embed further questions), shown three times; Specific review; Further review or warning or exit; Further tutorial on 1st topic or begin 2nd topic . . . and so forth. Arrow labels: Correct; 1st error; 2nd error; 3rd error; A: Error; B: Error; C: Correct; D: Error.]

Figure 10.3 Example flow in a SIMPLE tutorial. Boxes represent one or more slides; block arrows represent main-line flow through the module; line arrows represent optional flow, based on students’ responses. This pilot project was funded by UK ESRC grant RES-043-25-0007.

are sometimes designed to address the specific error, before being asked the question again. If errors persist, more basic tutelage is provided, often with embedded questions to identify where the student is lost. Questions appear initially where a topic is explained and, embedding distributed retrieval practice, again at various points later in the tutorial.


Every student works at his or her own pace, and students who quickly grasp a topic are able to move on to the next one, whereas students who struggle to grasp a topic encounter more explanations and examples. The structure of the tutorials is based on principles of spacing, practice testing, and individualized instruction. Students are generally enthusiastic about the tutorials, overwhelmingly preferring them to textbook material. The tutorials are useful for initial teaching and for review.
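The branching logic of such a tutorial can be sketched as a small loop that asks a question again after each remedial step and escalates the remediation when errors persist. This is a hypothetical illustration, not the SIMPLE implementation: the screen labels are borrowed from Figure 10.3, but their order, the limit of three remedial steps, and the `ask` callback are assumptions.

```python
# Remediation shown after the 1st, 2nd, and 3rd consecutive error.
# The ordering is assumed for illustration.
REMEDIATION_STEPS = (
    "brief review of selected aspect",
    "more detailed review",
    "scaffolded questions",
)


def run_question(ask):
    """Ask one embedded question until it is answered correctly.

    ask: callable() -> bool, True when the student answers correctly
         (hypothetical interface for this sketch).
    Returns the list of screens the student passed through.
    """
    path = []
    errors = 0
    while True:
        path.append("question")
        if ask():
            path.append("positive feedback")
            return path
        errors += 1
        if errors > len(REMEDIATION_STEPS):
            # Remediation exhausted: stop rather than loop indefinitely.
            path.append("warning or exit")
            return path
        path.append(REMEDIATION_STEPS[errors - 1])
```

A student who answers correctly at once passes through two screens, whereas each error adds a remedial screen and a further attempt, which is how slower learners come to see more explanation while faster learners move on.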

Conclusion

Clearly there is a long and successful history of efforts to apply testing, generation, and spacing effects to improving educational practices, but there is still much progress to be made. It may be premature to advocate or adopt practices in classrooms that have only been demonstrated in the laboratory, and it is not reasonable to promote educational changes without solid empirical support derived from rigorously designed and conducted research. The costs of ill-advised changes are far too great, in terms of both potential damage to students’ learning and the misallocation of limited resources, including costs and teachers’ investments. Changes in classrooms and textbooks should be based on research that clearly has both internal and external validity (e.g., Reyna et al., 2008). In early days, a few psychologists investigated memory and learning through controlled experiments in classrooms involving educationally relevant materials and time frames presented in an educational context (e.g., Gates, 1917; Jones, 1923; Spitzer, 1939), but since that time, much of the study of memory and learning has followed a course relatively independent of education. The recent renewal of interest in both applying laboratory results to education and learning about human memory and learning in realistic contexts is to be welcomed. There remains a need for more well-designed research that is both rigorous (high in internal validity) and generalizable (high in external validity) to better describe and understand how memory and learning work in educational contexts. Recent and future advances in this respect are, at least in part, a result of the important contributions that Robert Bjork has made through his own research, his interactions with students and colleagues, and his influence as a leader in the field of cognitive psychology.

References

Agarwal, P. K., Karpicke, J. D., Kang, S. H. K., Roediger, H. L., III, & McDermott, K. B. (2008). Examining the testing effect with open- and closed-book tests. Applied Cognitive Psychology, 22, 861–876.


Barrow, L., Markman, L., & Rouse, C. E. (2009). Technology’s edge: The educational benefits of computer-aided instruction. American Economic Journal: Economic Policy, 1, 52–74.
Bartlett, F. C. (1932). Remembering: A study in experimental and social psychology. London, UK: Cambridge University Press.
Bertsch, S., Pesta, B. J., Wiscott, R., & McDaniel, M. A. (2007). The generation effect: A meta-analytic review. Memory and Cognition, 35, 201–210.
Bjork, R. A. (1979). Information-processing analysis of college teaching. Educational Psychologist, 14, 15–23.
Bjork, R. A., & Bjork, E. L. (1992). A new theory of disuse and an old theory of stimulus fluctuation. In A. F. Healy, S. M. Kosslyn, & R. M. Shiffrin (Eds.), Essays in honor of William K. Estes: From learning processes to cognitive processes (Vol. 2, pp. 35–67). Hillsdale, NJ: Erlbaum.
Bloom, K. C., & Schuell, T. J. (1981). Effects of massed and distributed practice on the learning and retention of second language vocabulary. Journal of Educational Research, 74, 245–248.
Carpenter, S. K., Pashler, H., & Cepeda, N. J. (2009). Using tests to enhance 8th grade students’ retention of U.S. history facts. Applied Cognitive Psychology, 23, 760–771.
Cepeda, N. J., Coburn, N., Rohrer, D., Wixted, J. T., Mozer, M. C., & Pashler, H. (2009). Optimizing distributed practice: Theoretical analysis and practical applications. Experimental Psychology, 56, 236–246.
Cepeda, N. J., Vul, E., Rohrer, D., Wixted, J. T., & Pashler, H. (2008). Spacing effects in learning: A temporal ridgeline of optimal retention. Psychological Science, 19, 1095–1102.
Chi, M. T. H., de Leeuw, N., Chiu, M.-H., & LaVancher, C. (1994). Eliciting self-explanations improves understanding. Cognitive Science, 18, 439–477.
Craik, F. I. M., & Lockhart, R. S. (1972). Levels of processing: A framework for memory research. Journal of Verbal Learning and Verbal Behavior, 11, 671–684.
Davis, D. R., Bostow, D. E., & Heimisson, G. T. (2007). Strengthening scientific verbal behavior: An experimental comparison of progressively prompted and unprompted programmed instruction and prose tutorials. Journal of Applied Behavior Analysis, 40, 179–184.
Dempster, F. N. (1989). Spacing effects and their implications for theory and practice. Educational Psychology Review, 1, 309–330.
Dempster, F. N. (1992). Using tests to promote learning: A neglected classroom resource. Journal of Research and Development in Education, 25, 213–217.
Dempster, F. N. (1997). Using tests to promote classroom learning. In R. F. Dillon (Ed.), Handbook on testing (pp. 332–346). Westport, CT: Greenwood.
Dewey, J. (1997a). Experience and education. New York: Touchstone. (Original work published 1938)
Dewey, J. (1997b). How we think. Mineola, NY: Dover. (Original work published 1910)


deWinstanley, P. A., & Bjork, R. A. (2002). Successful lecturing: Presenting information in ways that engage effective processing. In M. D. Svinicki (Ed.), New directions for teaching and learning: Applying the science of learning to university teaching and beyond (No. 89, pp. 19–31). Hoboken, NJ: Jossey-Bass, Wiley.
Ebbinghaus, H. (1964). Memory. Oxford, UK: Dover. (Original work published 1885)
Fishman, E. J., Keller, L., & Atkinson, R. C. (1968). Massed versus distributed practice in computerized spelling drills. Journal of Educational Psychology, 59, 290–296.
Fritz, C. O., & Morris, P. E. (2003, July). Expanding retrieval practice: Investigating the parameters. Paper presented at the 5th Biennial Conference of the Society for Applied Research in Memory and Cognition, Aberdeen. Lancaster, UK.
Fritz, C. O., & Morris, P. E. (2006, September). Explaining improves learning in preschool children. Paper presented at the Annual Conference of the Cognitive Section of the British Psychological Society, Lancaster University.
Fritz, C. O., Morris, P. E., Acton, M., Voelkel, A. R., & Etkind, R. (2007). Comparing and combining expanding retrieval practice and the keyword mnemonic for foreign vocabulary learning. Applied Cognitive Psychology, 21, 499–526.
Fritz, C. O., Passey, D., & Morris, P. E. (2009). Phase-6® AG, analyses and results of initial research studies: An independent report. Retrieved from http://www.phase-6.com/opencms/system/galleries/download/lernsoftware/phase-6-Research-Final-Report.pdf
Fritz, C. O., Peelo, M., Folkard, A. M., & Ramirez-Martinell, A. (2008). Quantitative skills in the social sciences: Identifying and addressing the challenges. In D. Green (Ed.), CETL-MSOR Conference Proceedings (pp. 34–40). Birmingham, UK: Maths, Stats and OR Network.
Gartman, L. M., & Johnson, N. F. (1972). Massed versus distributed repetition of homographs: A test of the differential-encoding hypothesis. Journal of Verbal Learning and Verbal Behavior, 11, 801–808.
Gates, A. I. (1917). Recitation as a factor in memorizing. Archives of Psychology, 6(40).
Gettinger, M., Bryant, N. D., & Fayne, H. R. (1982). Designing spelling instruction for learning-disabled children: An emphasis on unit size, distributed practice, and training for transfer. Journal of Special Education, 16, 439–448.
Herbart, J. F. (1898). The application of psychology to education (B. C. Mulliner, Trans.). London, UK: Swan Sonnenschein.
Holland, J. G., & Skinner, B. F. (1961). The analysis of behavior. New York: McGraw-Hill.
Hooper, S., Sales, G., & Rysavy, S. D. M. (1994). Generating summaries and analogies alone and in pairs. Contemporary Educational Psychology, 19, 53–62.


Jacoby, L. L. (1978). On interpreting the effects of repetition: Solving a problem versus remembering a solution. Journal of Verbal Learning and Verbal Behavior, 17, 649–667.
James, W. (1981). The principles of psychology (Vol. 1). Cambridge, MA: Harvard University Press. (Original work published 1890)
Jones, H. E. (1923). Experimental studies of college teaching. Archives of Psychology, 10(68).
Kang, S. H. K., McDermott, K. B., & Roediger, H. L., III. (2007). Test format and corrective feedback modify the effect of testing on long-term retention. European Journal of Cognitive Psychology, 19, 528–558.
Karpicke, J. D., & Roediger, H. L., III. (2007). Expanding retrieval practice promotes short-term retention, but equally spaced retrieval enhances long-term retention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33, 704–719.
Landauer, T. K., & Bjork, R. A. (1978). Optimal rehearsal patterns and name learning. In M. M. Gruneberg, P. E. Morris, & R. N. Sykes (Eds.), Practical aspects of memory (pp. 625–632). London, UK: Academic Press.
Larsen, D. P., Butler, A. C., & Roediger, H. L., III. (2008). Test enhanced learning in medical education. Medical Education, 42, 959–966.
Lutz, J., Briggs, A., & Cain, K. (2003). An examination of the value of the generation effect for learning new material. Journal of General Psychology, 130, 171–188.
Mace, C. A. (1932). The psychology of study. London, UK: Methuen.
McDaniel, M. A. (Ed.). (2007). Applying cognitive psychology to education [Special section]. Psychonomic Bulletin and Review, 14, 185–255.
Miller, M. L., & Malott, R. W. (2006). Programmed instruction: Construction responding, discrimination responding, and highlighted keywords. Journal of Behavioral Education, 15, 111–119.
Morris, P. E., & Fritz, C. O. (2000). The name game: Using retrieval practice to improve the learning of names. Journal of Experimental Psychology: Applied, 6, 124–129.
Morris, P. E., & Fritz, C. O. (2002). The improved name game: Better use of expanding retrieval practice. Memory, 10, 259–266.
Morris, P. E., & Fritz, C. O. (2006, November). Reduced cuing as an alternative to improving memory using expanding retrieval practice. Poster presented at the 47th Annual Meeting of the Psychonomic Society, Houston, TX.
Morris, P. E., Fritz, C. O., & Buck, S. (2004). The name game: Acceptability, bonus information and group size. Applied Cognitive Psychology, 18, 89–104.
Morris, P. E., Fritz, C. O., Jackson, L., Nichol, E., & Roberts, E. (2005). Strategies for learning proper names: Expanding retrieval practice, meaning and imagery. Applied Cognitive Psychology, 19, 779–798.
Naveh-Benjamin, M. (1990). The acquisition and retention of knowledge: Exploring mutual benefits to research and the educational setting. Applied Cognitive Psychology, 4, 295–320.

Y109937.indb 215

10/15/10 11:04:08 AM

216 • Catherine O. Fritz

Phase-6®. (2007). Locking vocabulary into long term memory. Retrieved from http://www.phase-6.com/opencms/Homepage/ Ragosta, M., Holland, P. W., & Jamison, D. T. (1982). Computer assisted instruction and compensatory education: The ETS/LAUSD study final report. Educational Testing Service Project Report 19. Princeton, NJ: Educational Testing Service. Rea, C. P., & Modigliani, V. (1985). The effect of expanded versus massed practice on the retention of multiplication facts and spelling lists. Human Learning, 4, 11–18. Reyna, V. F., Benbow, C. P., Boykin, A. W., Whitehurst, G. J., & Flawn, T. (2008). Report of the subcommittee on standards of evidence. In U.S. Department of Education, Foundations for success: The final report of the National Mathematics Advisory Panel. Retrieved from http://www.ed.gov/about/ bdscomm/list/mathpanel/report/standards-of-evidence.pdf Rieth, H., Axelrod, S., Anderson, R., Hathaway, F., Wood, K., & Fitzgerald, C. (1974). Influence of distributed practice and daily testing on weekly spelling tests. Journal of Educational Research, 68, 73–77. Roediger, H. L., III, & Karpicke, J. D. (2006). The power of testing memory: Basic research and implications for education. Perspectives on Psychological Science, 1, 181–210. Rohrer, D., & Taylor, K. (2006). The effects of overlearning and distributed practise [sic] on the retention of mathematics knowledge. Applied Cognitive Psychology, 20, 1209–1224. Rohrer, D., & Taylor, K. (2007). The shuffling of mathematics problems improves learning. Instructional Science, 35, 481–498. Seabrook, R., Brown, G. D. A., & Solity, J. E. (2005). Distributed and massed practice: From laboratory to classroom. Applied Cognitive Psychology, 19, 107–122. Skinner, B. F. (1958). Teaching machines. Science, 128, 969–977. Slamecka, N. J., & Graf, P. (1978). The generation effect: Delineation of a phenomenon. Journal of Experimental Psychology: Human Learning and Memory, 4, 592–604. Spitzer, H. F. (1939). Studies in retention. 
Journal of Educational Pscyhology, 30, 641–656. Uhumuavbi, P. O., & Mamudu, J. A., (2009). Relative effects of programmed instruction and demonstration methods on students’ academic performance in science. College Student Journal, 43, 658–668. Wittrock, M. C. (1989). Generative processes of comprehension. Educational Psychologist, 24, 345–376. Wittrock, M. C., & Carter, J. F. (1975). Generative processing of hierarchically organized words. American Journal of Psychology, 88, 489–501.


11

Learning From and for Tests

William B. Whitten II

The Place of Testing in Education

Tests are ubiquitous and important in formal education. Teachers and students universally view tests as assessment opportunities. In contrast, it is rare for educators or students to view tests as opportunities for learning. There are probably three reasons for this. The first is that educators are largely unaware of test-effect results, which are published primarily in professional research journals, not in practitioner literature. The second is that an ever-expanding curriculum pressures teachers to move on to new topics following a test, so that students and teachers alike have little firsthand experience with repeated testing and its coincident effects. The third is that tests are perceived to absorb valuable class time and to generate work for teachers, so the use of tests is often minimized. For students to benefit from what we now know about test effects, it is necessary to address all three of these reasons. First, educators need to know about test effects. Second, educational strategy needs to be updated to take advantage of what is known about test effects. And third, an assessment mindset needs to move to an assessment-plus-learning mindset so that tests are perceived as a valuable use of class time and teacher effort.


Two Categories of Test Effects

Substantial evidence of positive and negative test effects has accumulated over the past 40 years. These effects can be placed in two categories that I will label test effects type 1 (TE-1) and test effects type 2 (TE-2). TE-1 experiments address this question: How do initial tests modify memory, either positively or negatively, as measured by subsequent test performance? TE-2 experiments ask: How do expected test characteristics affect student learning? In the next sections I will characterize each category of test effects by summarizing a few of my experiments that are representative of the larger literature, and will discuss the implications of these effects for education. Then, I will describe recent work on Guided Cognition, a strategy for designing unsupervised learning tasks (typical of homework and online learning), as an example of educational activities that include embedded implicit tests. Finally, I will suggest an instructional design approach that emphasizes testing and that takes advantage of the principles of Guided Cognition to accelerate learning.

Learning from Tests (TE-1)

Several experimental manipulations demonstrate that, up to a point, more is learned from recall and recognition processes when conditions make these processes somewhat difficult. These experiments are instances of the idea that “desirable difficulties” in learning situations (Bjork, 1994) can, somewhat counterintuitively, facilitate learning.

Effects of Initial-Retrieval Spacing on Long-Term Memory

One way to make an initial retrieval more difficult is to insert an interval filled with a rehearsal-preventing task between an item’s presentation and its attempted recall. Whitten and Bjork (1972, 1977) reported an experiment that varied this filled interval to evaluate the effects of initial test spacing on later recall. The hypothesis was that initial retrieval would become more difficult with longer delays, but the benefit for long-term recall would be greater after longer initial retrieval delays. The prediction was that initial recall would be less likely, but later free recall would be more likely, as the initial-recall rehearsal-preventing spacing interval increased. The experiment used a new design that, at the time, was thought of as a cross between the Brown-Peterson paradigm and the free-recall paradigm. Subsequently, however, this design became known as the “long-term recency paradigm” because of the discovery of long-term recency effects when the final recall data were analyzed by serial position (Bjork & Whitten, 1974; Whitten & Bjork, 1972, 1977). The paradigm is also known as the (less complimentary sounding) “continuous distractor paradigm” because each to-be-remembered item is preceded and followed by a rehearsal-preventing distractor task.

Students in the experiment were presented three blocks of 12 continuous trials. Trials were designed to provide either two study presentations (P1-P2) or one study presentation and one recall test (P1-T). Each trial consisted of a 2-second presentation of a word pair, followed by 4, 8, or 14 seconds of digit shadowing, a 3-second presentation of the word pair or a 3-second interval to recall the word pair, and then 13, 9, or 3 seconds of digit shadowing, respectively. Thus, each trial within the continuous 12-trial block lasted 22 seconds. Each trial block started with 16 seconds of digit shadowing before the first trial and ended with 16 seconds of digit shadowing after the twelfth trial. Following each trial block, students were given a 2.2-minute free-recall test. The results, shown in Figure 11.1, support the prediction: Initial recall decreased, and final (end-of-block) free recall increased, as the P1-T interval increased from 4 to 14 seconds. The final recall of items presented and then tested showed a remarkable parallel to the final recall for the items in the traditional P1-P2 spacing condition.

[Figure 11.1: line plot. Y-axis: Recall Probability (0.2 to 0.8). X-axis: P1-P2 or P1-T Interval (sec), 4 to 14. Series: P1-T Initial Test, P1-T Final Test, P1-P2 Final Test.]

Figure 11.1 Initial-test and final-test recall as a function of presentation to initial-test spacing. Final-test recall as a function of presentation to presentation spacing. (After Whitten, W. B., II, & Bjork, R. A., Journal of Verbal Learning and Verbal Behavior, 16, 465–478, 1977.)
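The trial timing described above can be checked with a few lines of arithmetic. The following Python sketch is only an illustration and is not part of the original experiment; the constant and function names are hypothetical, and the block total it computes is derived from the stated durations rather than reported in the source.

```python
# Durations in seconds, taken from the procedure described in the text.
FIRST_PRESENTATION = 2   # P1: word pair shown
SECOND_EVENT = 3         # P2 (re-presentation) or T (recall interval)
# Digit shadowing before the second event -> digit shadowing after it.
SHADOWING_AFTER = {4: 13, 8: 9, 14: 3}
BLOCK_LEAD_IN = 16       # digit shadowing before the first trial
BLOCK_LEAD_OUT = 16      # digit shadowing after the twelfth trial
TRIALS_PER_BLOCK = 12


def trial_duration(spacing):
    """Total length of one trial for a given P1-P2 or P1-T spacing."""
    return FIRST_PRESENTATION + spacing + SECOND_EVENT + SHADOWING_AFTER[spacing]


def block_duration():
    """Length of one 12-trial block, excluding the free-recall test."""
    durations = {trial_duration(s) for s in SHADOWING_AFTER}
    if len(durations) != 1:
        raise ValueError("trial lengths differ across spacing conditions")
    return BLOCK_LEAD_IN + TRIALS_PER_BLOCK * durations.pop() + BLOCK_LEAD_OUT


if __name__ == "__main__":
    for spacing in SHADOWING_AFTER:
        print(spacing, trial_duration(spacing))
    print(block_duration())
```

Under these assumptions, every spacing condition yields a 22-second trial, so the shorter initial spacing is exactly offset by longer shadowing afterward, and a block runs 16 + 12 × 22 + 16 = 296 seconds before the free-recall test begins.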


Implications for Education

These results, and many subsequent experiments, inform us that delaying an initial retrieval can make the act of retrieval more effective for learning. Applying this robust laboratory finding to education is, of course, a bit tricky because logically there must be a point at which initial retrieval is so unlikely, due to a long, distractor-filled P1-T interval, that little can be gained from a retrieval attempt. Also, it is likely that the optimally effective P1-T interval will vary by subject matter and complexity. For example, the optimal interval for learning single words of foreign vocabulary may be quite different from the optimal interval for learning complex details from a novel or play. Using laboratory-based experimental findings to guide instructional design is analogous to moving from materials science to civil engineering. A bridge can be built based on what is known, but there will be occasional failures due to unknown variables. Likewise, we should use the idea of delaying an initial test to make it more effective for long-term retention, but we need to watch for the inflection point where initial recall is too low, endangering the bridge to long-term memory.

Effects of Initial-Test Level of Processing on Long-Term Memory

Probed recall has been shown to take longer if the probe requires a semantic match than if it requires an acoustic match (e.g., Whitten, 1974, Experiment III). If response latency reflects processing difficulty, semantic processing is, in some sense, more difficult than acoustic processing. It follows that initial probed recall requiring a semantic match should result in better long-term recall than initial probed recall requiring an acoustic match. Such results have been obtained (e.g., Whitten, 1974) and have been used to explain the negative recency effect (Whitten, 1974, 1978; see also Bjork, 1975).

Implications for Education

Applying initial-test level-of-processing results to education is straightforward, if imprecise. It is clear that we want to avoid simple rote rehearsal situations that allow students to parrot back just-heard, to-be-learned content, since such behavior may be based largely on acoustic-level information. And it is clear that we want students to retrieve content based on its semantic attributes. Beyond that, thinking about how to apply levels of initial retrieval to education becomes rather vague. The general principle, however, is easy for practitioners to understand, and it can be used to discourage the use of repetitive drill homework that may encourage surface-level thinking and produce poor long-term retention.


Effects of Initial Recognition Processes on Long-Term Memory

There are several ways to make recognition tests more difficult, thus requiring more effort to discriminate between the correct and incorrect answers. In line with the general rule that more effort on an initial test produces better long-term memory, we should be able to find that more difficult initial recognition tests produce better subsequent memory for the recognized items. Whitten and Leonard (1980) manipulated recognition difficulty in two ways.

In their first experiment, recognition was made more difficult by increasing the number of alternatives from which the correct answer was to be selected. Students were presented three 12-word lists of nouns at a rate of four seconds per word. Following the presentation of each list, students were given a recognition test consisting of 12 multiple-choice test items. Each test item consisted of one target (old) item and one, three, or seven distractor (new) items to form two-, four-, or eight-choice recognition test conditions, respectively. Assuming that the words in a test item were read from left to right, students would have to read an average of 1.5, 2.5, and 4.5 words to find a target in the two-, four-, and eight-choice test items, respectively. To encourage students to read all the choices in the recognition test, students were instructed to cross out all distractor words in each line of the test, leaving only the target word unmarked. To create a delay before the recall test, the third list’s recognition test was followed by an eight-minute filler task (to judge pictures for “memorability”). Then students were given an unexpected three-minute free-recall test to recall the target words from all three lists. The results are shown in Table 11.1.
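The average reading counts of 1.5, 2.5, and 4.5 words follow if the target is assumed to be equally likely to occupy any position in a test item, an assumption that the text implies but does not state. A minimal Python sketch of that calculation, with a hypothetical function name:

```python
from fractions import Fraction


def expected_words_read(n_choices):
    """Mean number of words read, left to right, until the target is found.

    Assumes the target is equally likely to be in each of the n positions,
    so the mean is (1 + 2 + ... + n) / n = (n + 1) / 2.
    """
    positions = range(1, n_choices + 1)
    return Fraction(sum(positions), n_choices)


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, float(expected_words_read(n)))
```

Because students were told to cross out every distractor, the number of words actually read per item was presumably closer to the full two, four, or eight; the figures above describe only how far a reader would have to go, on average, to reach the target.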
Table 11.1 Performance on Initial Multiple-Choice Recognition and Subsequent Free-Recall Tests as a Function of the Number of Initial Multiple-Choice Alternatives

                                       Number of Multiple-Choice Alternatives
                                       in Initial Recognition Tests
                                       2        4        8
Initial Multiple-Choice Recognition
  M proportion correct (hits)          .967     .960     .919
  M discrimination (Ag)                .964     .974     .975
Delayed (Final) Free Recall
  M proportion recalled targets        .385     .399     .348
  M proportion recalled distractors    .002     .003     .006

Source: After Whitten, W. B., II, & Leonard, J. M., Journal of Experimental Psychology: Human Learning and Memory, 6, 127–134, 1980, Experiment 1.

Even though the greater number of choices required greater effort to select the correct answer, it was hoped that initial recognition accuracy would be nearly perfect and nearly equal in all three initial recognition-test conditions. This outcome was desired so that conclusions about the effects on delayed recall of the number of distractors in the initial recognition test could be made without resorting to conditional analyses. Initial recognition performance was quite high, as measured by the mean proportion of correct recognitions (hits), but by this measure, performance decreased as recognition alternatives increased. This result is consistent with the hypothesis that increasing the number of distractors in multiple-choice items increases the chance that one of those distractors will have a stronger familiarity value than the target. On the other hand, the decreasing initial recognition accuracy shown in Table 11.1 may reflect different false-alarm rates induced by varying numbers of distractors (due, for example, to guessing) rather than real differences in discriminability of target items from distractor items. To measure discrimination ability, a signal detection analysis using Ag, the area under the receiver operating characteristic (ROC) curve, was performed, and it produced no significant differences among the three initial recognition conditions.

In contrast to initial recognition, delayed free-recall performance increased significantly with increasing initial recognition alternatives. So whether we conclude that initial recognition decreased, as measured by proportion correct (hits), or that initial recognition stayed constant, as measured by discrimination (Ag), it is clear that requiring students to discriminate among more alternatives on an initial recognition test produced better performance on a subsequent free-recall test. It is worth noting that, even though students had to read more distractor words as the number of initial recognition alternatives increased, there were no differences across conditions in the number of distractors recalled because almost none were recalled. In this experiment, having more multiple-choice alternatives did not lead to confusions, in later free recall, about which words were from the lists and which were not.

In their second experiment, Whitten and Leonard (1980) manipulated initial recognition difficulty by varying the extent of semantic relations between target words and distractor words. Three semantically related and three semantically unrelated distractors were selected to combine with each of the 48 target (list) words. The semantic relations were based on conceptual categories such as shapes, weapons, instruments, and so on. The unrelated distractors varied in length and number of syllables, in parallel with the related distractors. Each student’s recognition test consisted of a deck of 48 cards, each with a four-alternative test item, half containing target items with related distractors and half containing target items with unrelated distractors. In this experiment we wanted to lower initial recognition performance to demonstrate that the effects of recognition alternatives are not contingent on an easy recognition test. For that reason, the words were presented in one long 48-word list at a rate of four seconds per word. Immediately following the list, each student was given a randomly ordered 48-card deck for a self-paced recognition test. The recognition test was followed by an eight-minute filler task (to judge pictures for “memorability”). Then students were given an unexpected four-minute free-recall test to recall the target words. The results are shown in Table 11.2.

Table 11.2 Performance on Initial Multiple-Choice Recognition and Subsequent Free-Recall Tests as a Function of Semantic Relationships Between Initial Multiple-Choice Alternatives and Targets

                                       Type of Multiple-Choice Alternatives
                                       in Initial Recognition Tests
                                       Related      Unrelated
Initial Multiple-Choice Recognition
  M proportion correct (hits)          .794         .805
Delayed (Final) Free Recall
  M proportion recalled targets        .356         .313
  M proportion recalled distractors    .010         .004

Source: After Whitten, W. B., II, & Leonard, J. M., Journal of Experimental Psychology: Human Learning and Memory, 6, 127–134, 1980, Experiment 2.

Initial recognition performance was lower than in the previous experiment, as was desired. The level of recognition was the same in the two conditions. This may seem surprising until it is realized that the students were not informed about the type of initial test they would receive, and it has been reported that studying for recall can eliminate the recognition decrement due to semantically related distractors (e.g., Leonard & Whitten, 1983). In contrast, delayed free recall was better for words that had semantically related distractors in the initial recognition test. While it is possible to obtain this effect by mediation of recalled distractors, that explanation is unlikely here because very few distractors were recalled.

Implications for Education

The results of the two Whitten and Leonard (1980) experiments show that increments in memory, as measured by delayed free recall, are positively related to the extent of memory evaluation occurring during an initial recognition test. In other words, requiring more discrimination during initial recognition produces better subsequent recall. These results were found when the amount of required discrimination was manipulated either by increasing the number of distractors for targets or by increasing the semantic relatedness of distractors to targets. It would seem, then, that we can recommend these conditions of learning for education. However, the situation is not that simple. In more recent experiments, Roediger and Marsh (2005) have found that increasing the number of distractors on multiple-choice tests can decrease later recall, most likely because students recall previously misrecognized wrong answers in place of the unrecognized correct answers. Butler, Marsh, Goode, and Roediger (2006) reconciled these apparently conflicting outcomes by replicating the Whitten and Leonard (1980, Experiment 1) results with single words and replicating the Roediger and Marsh (2005) results with prose material, demonstrating that the type of to-be-learned content can determine the extent of false-answer intrusions on a delayed-recall test. Requiring more discrimination, as is essential to become expert in any skill or topic, should produce positive educational results, but only if incorrect selections, ideas, movements, and so on, are made known during the learning process, so they can be tagged as incorrect and suppressed.
Importantly, Butler and Roediger (2008) have found that the negative effects of additional distractors can be eliminated by inserting feedback, in the form of reviewing the correct answers, between a multiple-choice recognition test and a subsequent cued-recall test. Tentatively, it can be suggested that requiring greater discrimination on an initial multiple-choice test can boost long-term recall, but that the correct answers to the recognition test should be provided as feedback following the multiple-choice test and before the long-term recall test. An obvious lesson from these two studies is that when generalizing a laboratory finding (such as learning from tests) to education, we need to consider the whole educational context, the complexity of the to-be-learned materials, and other laboratory findings (e.g., effects of feedback) that may apply. This caution notwithstanding, from the whole set of TE-1 experiments reviewed above, we can be confident that, within reasonable limits, making initial tests more challenging can pay dividends in better learning, as measured on subsequent tests.

Learning for Tests (TE-2)

Students want to know what will be tested and how. By middle school, students are familiar with a multitude of test types, including multiple-choice, true/false, short-answer, identification, matching, fill-in-the-blank, sentence completion, and essay. The fact that students want to know what sort of test they will have indicates that students think that knowing this will somehow help them prepare for the test. In other words, students must think that they know how to study for each type of test and that they can adjust their study strategies accordingly. There is a fairly extensive experimental psychology literature comparing studying for recall and recognition, often motivated more by theoretical issues than by applied issues, but sometimes with an eye toward applied issues. Experiments reported by Leonard and Whitten (1983) are representative of these.

Effects of Test Expectations on Content Organization Knowledge

In their first experiment, Leonard and Whitten (1983) manipulated students’ test expectations to determine how expecting recall or recognition affects knowledge of item-order information. The motivation for this experiment was to uncover differences in the sorts of associations students may make in the two expectation conditions, theorizing that item-to-item associations would be more likely if studying for recall and item-to-context associations would be more likely if studying for recognition. The hypothesis was that expecting a recall test would cause the students to emphasize learning item-to-item associations to a greater extent than would expecting a recognition test, and if so, students who expect a recall test should know more about the list’s structure (order information) than students who expect a recognition test.

To evaluate students’ serial position knowledge, and to determine the acquisition conditions of such knowledge, Leonard and Whitten (1983) used a variation of a simple recognition test that can be called the list reconstruction task. In this task students study a list as if to prepare for either a recall test or a recognition test. They are then given all the words in a random order and are required to assign each word from the list to the box on a response sheet that corresponds to the word’s serial input position during study. The boxes on the response sheet can be viewed as a multiple-choice test for each word’s serial position.


Students were told that they would be shown several lists of words, and that each list would be followed by a memory test of that list. The nature of the memory test was left unknown. The words in each list were presented one at a time, for four seconds each. After each of two 20-word lists, half of the students were given a sheet of paper and free-recall instructions. The other half of the students were given a 60-card deck with one word printed on each card, 20 list words and 40 distractor words, in random order, and were told to mark each card with an O for “old” or an N for “new.” In this way, half the students were led to expect a recall test, and half were led to expect a new-old recognition test, for the third 20-word list. Instead, after the third list was presented, all students were given the list reconstruction task, in which they were given 20 randomly ordered strips of paper, each with a word from the list, and were asked to arrange them in the order in which they had been presented by placing them in boxes arranged in a column on a response sheet.

The results are shown in Table 11.3. Two measures of order information are reported. The first is the mean percentage of words assigned to their exact serial positions. Students who were led to expect recall assigned significantly more words to their correct serial positions than did students who were led to expect recognition. To evaluate knowledge of order information in a less stringent manner, Spearman rank-order correlations between each student’s ordering of the list items and the presentation order of the items were computed. These correlations were significant for both groups, indicating that both groups had retained information about list structure.
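The two order measures just described can be illustrated with a short sketch. The Python code below is a hypothetical example, not the authors' analysis: the function names and the five-word list are invented for illustration, and the formula used is the standard Spearman coefficient for rankings without ties.

```python
def percent_exact(presented, reconstructed):
    """Percentage of words placed in their exact serial input position."""
    hits = sum(p == r for p, r in zip(presented, reconstructed))
    return 100 * hits / len(presented)


def spearman_rho(presented, reconstructed):
    """Spearman rank-order correlation between two orderings of the same words.

    With no tied ranks, rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)),
    where d is the difference between a word's two serial positions.
    """
    n = len(presented)
    studied_pos = {word: i for i, word in enumerate(presented)}
    placed_pos = {word: i for i, word in enumerate(reconstructed)}
    d_squared = sum((studied_pos[w] - placed_pos[w]) ** 2 for w in presented)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


if __name__ == "__main__":
    # Hypothetical five-word list; the experiment used 20-word lists.
    presented = ["w1", "w2", "w3", "w4", "w5"]
    reconstructed = ["w2", "w1", "w3", "w5", "w4"]
    print(percent_exact(presented, reconstructed))
    print(spearman_rho(presented, reconstructed))
```

In this invented example only one word of five sits in its exact position, yet the rank-order correlation remains high because every misplaced word is off by a single position. This is the sense in which the correlation is the less stringent of the two measures.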
However, the correlations were significantly higher for the group that expected free recall than for the group that expected recognition, indicating that students who expected a recall test retained more order information. Clearly, test expectations affected what the students chose to focus on during study, and expecting a recall test resulted in greater knowledge of the content’s organization.

Table 11.3 Performance on List Reconstruction Task

                        Performance Measure
Practice Condition      M%        M ρ
Free Recall             20.15     .60
New-Old Recognition     13.55     .45

Note: The mean Spearman rank-order correlations (M ρ) are between presented order and reconstructed order. M% = mean percentages of words assigned to correct serial input positions.
Source: After Leonard, J. M., & Whitten, W. B., II, Journal of Experimental Psychology: Learning, Memory, and Cognition, 9, 440–455, 1983, Experiment 1.

Effects of Test Expectations on Recognition Discrimination

In another experiment, Leonard and Whitten (1983, Experiment 5) explored the effects of related recognition-test distractors on recognition performance for students who had expected and studied for free recall and for students who had expected and studied for multiple-choice recognition. As for their Experiment 1, Leonard and Whitten theorized that studying for recall should bias students toward item-to-item associations, whereas studying for recognition should bias them toward item-to-context associations. It was further theorized that item-to-item associations would create more specific or distinctive instances of the encoded items, whereas the item-to-context associations would provide a more general encoding related to the item’s core meaning. It was known that semantically related distractors lowered recognition performance after studying for recognition (e.g., Anisfeld & Knapp, 1968; Underwood & Freund, 1968). Leonard and Whitten predicted that this recognition performance decrement would be less if students expected and studied for recall, but then were tested for recognition: The distinctive encodings were predicted to be less susceptible to confusions with related distractors. To test this prediction, an experiment was designed that led half the students to expect recall tests and the other half to expect recognition tests. Test expectations were set by presenting two 24-word lists to all students at four seconds per word. After each of these lists, half of the students were given free-recall tests, and the other students were given four-alternative multiple-choice recognition tests.
Following these test-practice lists, all students were presented a 48-word list that was followed by a four-alternative multiple-choice recognition test. On this test, each of 24 list words appeared with three semantically related distractors, and each of 24 list words appeared with three semantically unrelated distractors. The results are shown in Figure 11.2. Overall, recognition performance for words from the third list was better for students who expected a recall test. The typical decrement caused by related distractors was replicated for students who expected a recognition test. As predicted, this decrement was less, and in fact was totally eliminated, for students who expected and studied for a recall test, but then were given a recognition test.


[Figure 11.2: plot. Y-axis: Recognition Probability (0.65 to 0.90). X-axis: Type of Practice (Recognition, Recall). Series: Unrelated Distractors, Related Distractors.]

Figure 11.2 Mean multiple-choice recognition probability as a function of type of practice and type of distractor sets. (After Leonard, J. M., & Whitten, W. B., II, Journal of Experimental Psychology: Learning, Memory, and Cognition, 9, 440–455, 1983, Experiment 5.)

Implications for Education

In these two experiments, students who expected a recall test, when compared to students who expected a recognition test, had better knowledge of item order, had better recognition accuracy, and showed none of the confusion typically caused by semantically related distractors. It would seem, then, that good educational advice would be to always study as if for a recall test. Applying these findings directly to education, however, is somewhat risky. Lundeberg and Fox (1991) concluded from a meta-analysis of 107 classroom and laboratory studies from 31 publications (including Leonard & Whitten, 1983) that “expecting tests of recall and recognition [may not] provide a useful analog to test expectancy effects involving essay and multiple-choice tests in the classroom” (p. 94). The difficulty, it seems, is that our abstractions tested in carefully controlled laboratory experiments may not generalize to the tests in education, even though they are related in principle. For example, an essay test requires students to recall content, but the results of the essay have much to do with how the content is expressed and related, and such constructions may not be predictable from the ability to recall a set of facts. As another example, the ability to recognize the correct answer on a multiple-choice test may be based on much more than familiarity judgments, and may, in fact, require the student to work a problem or to make an inference.

Basic research experiments such as those by Leonard and Whitten (1983), often motivated by theory as much as by potential application, can discover basic processes and can inspire research on analogous applied issues. To have confidence in recommending educational practice, we need to perform confirming experiments using typical educational materials in authentic educational environments. In any case, however, the TE-2 experiments clearly demonstrate that what students learn is affected by the type of test students expect, and this should be taken into account when constructing an educational strategy.

Embedded Implicit Tests

The test effects reviewed so far derive from situations where testing is explicit. Many educational activities that are not called tests have embedded in them requirements to recall information or to compare and evaluate information. These requirements are, in a sense, implicit tests that may share many characteristic recall and recognition processes with explicit tests. It follows, then, that implicit tests should have effects on learning that parallel effects of explicit tests. If so, the benefits of tests and test expectations can be magnified by designing educational activities to include a greater number of implicit tests.

Guided Cognition of Unsupervised Learning

Guided Cognition refers to the use of study tasks that have been designed to engage students in specific, observable, cognitive events that elicit underlying, learning-effective cognitive processes (Whitten, Whitten, & Rabinowitz, 2006). Guided Cognition homework has been shown to result in superior long-term retention of course content over a wide range of ages, abilities, and subject matter (Whitten, Whitten, & Rabinowitz, 2009). There are several possible reasons for these results, but one reason is that the cognitive events designed into Guided Cognition homework tasks probably include a greater number of implicit tests than do typical homework tasks. In their initial work on Guided Cognition, Whitten, Whitten, and Rabinowitz (2006) noted that the learning environment for unsupervised individual learning, typical of performing most homework, is impoverished compared to the learning environment for supervised group learning, typical of a well-run classroom. In the classroom situation, a student’s cognitive processes can be influenced by interactions with other students and with the teacher, but for most homework the student’s cognitive processes are primarily directed by the requirements of the homework tasks.
Could learning from homework be made more effective by including in the homework some characteristics of the classroom environment? To explore this question, a form of “reverse engineering” was used to create a new style of homework that would “guide cognition” in ways analogous to the guidance provided by various influences of a classroom
environment. We began by identifying specific learning activities, or cognitive events, that commonly occur in a typical classroom, and then designed homework tasks to elicit or emulate them. The content-oriented cognitive events included in the experiments were visualize and illustrate, relate to prior experience, brainstorm and evaluate, consider divergent answers, and role play. Whitten, Whitten, and Rabinowitz (2006) reported two experiments that were performed with high school English classes. Experiment 1 included three classes of low- to average-ability seniors (labeled Average Placement). Experiment 2 included two classes of high-ability seniors (labeled Advanced Placement). The experiments were exact replications across ability levels. The subject matter under study was Shakespeare’s Macbeth, Act III and Act IV. Five Traditional homework questions that follow simple question syntax were prepared for Act III. These were straightforward verbal questions about events and characters. Five pairs of Traditional and Guided Cognition homework questions were prepared for five content areas of Act IV. Each Guided Cognition question was constructed to elicit one of the five content-oriented cognitive events and addressed exactly the same content as its corresponding Traditional question. Act III was taught on Monday and Tuesday and was studied with Traditional homework by all classes on Tuesday evening. Act IV was taught on Wednesday and Thursday and was studied with Traditional homework by some classes and with Guided Cognition homework by other classes on Thursday evening. No new material was taught on Friday, and there was no homework assignment over the weekend. On Monday, all classes received a surprise quiz to test content knowledge from both Acts III and IV. The Experiment 1 (Average Placement students) quiz performance is shown in Figure 11.3. All three classes performed nearly identically on the Act III content studied Tuesday evening. 
In sharp contrast, the experimental groups (Traditional-Guided Cognition classes) performed much better than the control group (Traditional-Traditional class) on the Act IV content studied Thursday evening. Experiment 2 (Advanced Placement students) produced similar results, demonstrating that students from a wide range of abilities benefit from Guided Cognition homework. In these experiments, Traditional and Guided Cognition homework tasks required students to retrieve previously read or studied information (an implicit TE-1), and to process (re-store) the information in various ways. Compared to doing no homework at all, we should expect both forms of homework to increase content knowledge. But what accounts for the greatly enhanced learning from the Guided Cognition

Figure 11.3 Quiz performance as a function of homework type. The plot shows Percent Correct (y-axis, 30 to 70) by Homework Days (x-axis: Tuesday, Thursday) for one Traditional–Traditional class and two Traditional–Guided Cognition classes. (After Whitten, W. B., II, Whitten, S. E., and Rabinowitz, M., Guided Cognition of unsupervised learning, presented at the annual meeting of the American Educational Research Association, San Francisco, CA, April 2006, Experiment 1.)

homework? One source of the Guided Cognition effect is almost certainly from cognitive processing that results in highly organized and uniquely encoded information. Another likely cause is the greater extent to which implicit tests are required in the Guided Cognition homework than in the Traditional homework. It is easy to relate these implicit tests to the various “desirable difficulties” of the learning-from-tests effects, reviewed above. (1) Implicit tests embedded in the Guided Cognition cognitive events tend to be delayed in the sense that they are typically performed after delays of several minutes, or even hours, after the original reading or study of to-be-learned content. (2) These implicit tests are at deep levels of processing in that they always require attending to the semantic level and, in addition, often require higher-order thinking. (3) They frequently require recognizing many related ideas or facts, and then discriminating among them. In general then, these implicit tests are not easy and should be quite effective in promoting learning.

Taking Advantage of Increased Emphasis on Testing

Local, state, and federal testing requirements have increased in attempts to measure student performance across widely differing schools; to bring student, teacher, and school performance to higher and more consistent levels; and to improve student, educator, and school system accountability. This increased emphasis on testing has created some backlash. Parents are concerned that testing interferes with learning by taking too much time from the school day, and teachers are concerned that they have to follow a narrow curriculum and “teach to the test.” But might there be an opportunity hidden in the increased emphasis on testing? In the next section I propose a test-centric approach to education that acknowledges the benefits of recall and recognition practice, that designs homework to include a greater number of implicit tests, and that motivates students to learn by giving them more information about assessment tests.

A Guided Cognition Approach to Test-Driven Instructional Design

This chapter has provided a brief review of test-taking effects (TE-1) and test-expectancy effects (TE-2), and has discussed an approach to homework design that emphasizes engagement in learning-effective cognitive events that are rich in implicit tests. In this section I propose an educational approach that unifies these ideas to take advantage of both classes of test effects and that extends Guided Cognition from homework design to test design, effectively adding another dimension to learning from and for tests. The proposed features of the approach are as follows:

1. As for other instructional design strategies, begin by defining the educational goals: what the student is to know and what the student is to be able to do.
2. Next, design a test that measures accomplishment of those goals and that requires the student to engage in cognitive events that effectively promote learning (e.g., as described for homework design by Whitten, Whitten, & Rabinowitz, 2006, 2009; Whitten, Rabinowitz, Whitten, & Portnoy, 2008).
3. Whenever possible, build into the test “desirable difficulties” such as those reviewed in the section on learning from tests.
4. Design homework that includes learning-effective cognitive events. Such homework would include the sorts of cognitive events that will be on the eventual assessment test, and whenever possible should include elements of recall practice or recognition practice. For example, the cognitive event “relate to prior experience” can be designed to require the student to recall an earlier part of the course content in order to relate it to the current course content. Recalling the earlier content is a subtask within this cognitive event, which also may include additional effective elements such as combining related information in new and memorable ways. As another example, the cognitive event “consider divergent answers” can be designed to encourage the student to recall specifics of the course content, then to recognize similarities and differences, and in so doing can provide elements of recall practice and recognition practice.
5. Give information to the student to establish test expectations. An effective way to do this is to provide alternate versions of the test as models.
6. Assess educational goal attainment with a version of the test that requires engagement in additional instances of the effective cognitive events, thereby providing additional opportunities to learn from the test.

This instructional design approach provides several opportunities for beneficial test-taking effects (TE-1). Explicit recall practice and explicit recognition practice can occur when students use practice tests to prepare for an assessment test, and can occur again during the assessment test. Implicit recall practice and implicit recognition practice can occur when study tasks, such as homework, include Guided Cognition cognitive events that require elements of recall and recognition. Beneficial test-expectancy effects (TE-2) can be obtained by clearly communicating test characteristics so that students are encouraged to engage in effective cognitive events and to practice recall and recognition, as appropriate, as they prepare for the assessment test.

Acknowledgments Preparation of this chapter was supported by the Institute of Education Sciences, United States Department of Education, Grant R305A080134.

References

Anisfeld, M., & Knapp, M. (1968). Association, synonymity, and directionality in false recognition. Journal of Experimental Psychology, 77, 171–179.
Bjork, R. A. (1975). Retrieval as a memory modifier. In R. Solso (Ed.), Information processing and cognition: The Loyola Symposium (pp. 123–144). Hillsdale, NJ: Lawrence Erlbaum Associates.
Bjork, R. A. (1994). Memory and metamemory considerations in the training of human beings. In J. Metcalfe & A. Shimamura (Eds.), Metacognition: Knowing about knowing (pp. 185–205). Cambridge, MA: MIT Press.
Bjork, R. A., & Whitten, W. B. (1974). Recency-sensitive retrieval processes in long-term free recall. Cognitive Psychology, 6, 173–189.
Butler, A. C., Marsh, E. J., Goode, M. K., & Roediger, H. L., III. (2006). When additional multiple-choice lures aid versus hinder later memory. Applied Cognitive Psychology, 20, 941–956.
Butler, A. C., & Roediger, H. L., III. (2008). Feedback enhances the positive effects and reduces the negative effects of multiple-choice testing. Memory and Cognition, 36, 604–616.
Leonard, J. M., & Whitten, W. B., II. (1983). Information stored when expecting recall or recognition. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9, 440–455.
Lundeberg, M. A., & Fox, P. W. (1991). Do laboratory findings on test expectancy generalize to classroom outcomes? Review of Educational Research, 61, 94–106.
Roediger, H. L., III, & Marsh, E. J. (2005). The positive and negative consequences of multiple-choice testing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 1155–1159.
Underwood, B. J., & Freund, J. S. (1968). Errors in recognition learning and retention. Journal of Experimental Psychology, 78, 55–63.
Whitten, W. B., II. (1974). Retrieval “depth” and retrieval component processes: A levels-of-processing interpretation of learning during retrieval. Human Performance Center Technical Report 53. University of Michigan, Ann Arbor, MI.
Whitten, W. B., II. (1978). Initial-retrieval “depth” and the negative recency effect. Memory and Cognition, 6, 590–598.
Whitten, W. B., II, & Bjork, R. A. (1972, April). Test events as learning trials: The importance of being imperfect. Paper presented at the meeting of the Midwestern Mathematical Psychology Association, Bloomington, IN.
Whitten, W. B., II, & Bjork, R. A. (1977). Learning from tests: Effects of spacing. Journal of Verbal Learning and Verbal Behavior, 16, 465–478.
Whitten, W. B., II, & Leonard, J. M. (1980). Learning from tests: Facilitation of delayed recall by initial recognition alternatives. Journal of Experimental Psychology: Human Learning and Memory, 6, 127–134.
Whitten, W. B., II, Rabinowitz, M., Whitten, S. E., & Portnoy, L. B. (2008, November). Guided Cognition of unsupervised learning: Designing effective homework. Part 3. Presented at the annual meeting of the Psychonomic Society, Chicago, IL.
Whitten, W. B., II, Whitten, S. E., & Rabinowitz, M. (2006, April). Guided Cognition of unsupervised learning. Presented at the annual meeting of the American Educational Research Association, San Francisco, CA.
Whitten, W. B., II, Whitten, S. E., & Rabinowitz, M. (2009, August). Guided Cognition of unsupervised study increases learning for students from ages 12 to 18. Presented at the 13th Biennial Conference of the European Association for Research on Learning and Instruction, Amsterdam, The Netherlands.

12

Can Desirable Difficulties Overcome Deceptive Clarity in Scientific Visualizations?¹

Marcia C. Linn, Hsin-Yi Chang, Jennifer L. Chiu, Zhihui Helen Zhang, and Kevin McElhaney

Introduction

Dynamic visualizations provide a pathway for students to understand science concepts. This pathway offers promise for increasing the accessibility of a range of important science concepts, particularly those that involve cause and effect and emergent phenomena. Computer technologies offer unprecedented opportunities to explore visual technologies for research, assessment, teaching, and learning. To realize the potential of dynamic visualizations, instruction needs to overcome learners’ tendency to overestimate their understanding due to what we call deceptive clarity. In this chapter we explore ways that desirable difficulties can help learners overcome pitfalls associated with the deceptive clarity of visualizations. We define visualizations as interactive, computer-based animations (such as models, simulations, and virtual experiments) of scientific phenomena. The key concept in this definition is interactivity, which permits student-initiated exploration with visualizations. Visualizations make unseen processes visible, such as chemical reactions or planetary motion. They allow students to conduct virtual experiments about
complex situations, such as global climate change, airbag safety, or home insulation. Visualizations have the potential to illustrate phenomena that are too small, complex, long term, or large scale to be investigated in hands-on laboratories. They can help students link multiple representations, such as symbolic equations, unseen interactions (e.g., atomic, gravitational), and observable phenomena. Embedding visualizations in supportive curriculum materials, including materials that feature desirable difficulties, can improve the impact of the visualization (R. A. Bjork & Linn, 2006; Linn & Hsi, 2000). Supportive materials can make visualizations accessible to learners and broaden participation in science (Boo & Watson, 2001; Burbules & Linn, 1991). They can offer fresh views of complex concepts and stimulate new ways to connect ideas. Accessible visualizations can motivate students to participate in science. For example, when asked what helps them learn science, two thirds of sixth graders chose visualizations over explanations, reading, partners, and teachers (Corliss & Spitulnik, 2008; Figure 12.1).

Figure 12.1 Student preferences for learning activities. Most students prefer visualizations (N = 145 sixth graders). Bar chart of the percentage of students (y-axis, 0% to 80%) choosing each activity: Visualizations, Explanation steps, Reading/studying, Partner, All of it, None of it, Teacher.

Research concerning the educational value of dynamic visualizations is contradictory and inconclusive, leaving developers and practitioners in disagreement about whether to use visualizations and how best to exploit their apparent value. In a widely publicized series of articles, Tversky (see, e.g., Tversky, Morrison, & Betrancourt, 2002) concludes, “The research on the efficacy of animated over static graphics is not encouraging” (Tversky et al., 2002, p. 247). In Education Week, Viadero (2007) presented these and contrasting opinions and described them as an unresolved paradox. Kozma (2003) reports limited success using visualizations for chemistry instruction. Recent reviews of the literature report effect sizes for visualizations ranging from –1.5 to +2.3 (Chang, Chiu, McElhaney, & Linn, 2010; Liu et al., 2008). Yet science educators and their students remain convinced of the value of visualizations and take advantage of the increasing power of computers to use them widely. To clarify some of these contradictory findings, we turn to desirable difficulties. We report on how desirable difficulties can improve use of two distinct kinds of visualizations:

1. Visualizations that make unseen processes visible. Visualizations can make atomic-scale phenomena, such as chemical reactions, electricity, and heat transfer, visible. These visualizations illustrate unseen processes by promoting connections to visible phenomena (such as explosions, sparks, or melting) and incorporating graphical or symbolic representations.
2. Visualizations that support inquiry around complex systems. Visualizations can support virtual experimentation and address the goal of inquiry learning. Consistent with the National Science Education Standards (National Research Council, 1996), inquiry-based learning involves systematic experimentation. Visualizations can support inquiry by allowing learners to alter the system and observe the results. Experimenting with visualizations can extend opportunities found in hands-on experimentation to more complex, large-scale, or dangerous situations. For example, in our pilot work we have studied inquiries about airbag safety.
Deceptive Clarity

Exploring these visualizations often results in deceptive clarity: a visualization can be so memorable that students become convinced they understand complex processes when they can recall only superficial features of what they have seen. For instance, students often see only bouncing balls when viewing atomic models that designers believe vividly capture processes such as melting or osmosis. Deceptive clarity is especially common when users passively observe visualizations depicting unseen processes because learners lack methods for testing their conjectures. Dick Zare refers to chemistry as “faith-based science” because students are asked to take the scientific assertions about unseen processes on faith. Haphazard or unplanned experimentation with virtual systems may also convince students to simplify the situation. Desirable difficulties can help students recognize when their understanding of a visualization is superficial. For example, we have studied a visualization of an explosion that at first glance depicts slow molecules that bounce around and suddenly speed up. More careful analysis reveals that the reaction starts when one of the reactants spontaneously dissociates. The resultant free radicals attack the other reactant, releasing energy that causes additional dissociations and reactions. By experimenting with different dissociation and activation energies students can gain a deep understanding of chemical reactions (see Figure 12.2). To gain a nuanced understanding of this visualization, students must conduct informative experiments, carefully observe and analyze what they see, then connect this analysis to previous ideas about molecular interactions. Desirable difficulties can help students distinguish productive and unproductive ideas by engaging students thoughtfully in experimentation, focusing learners on testing meaningful investigation questions, and guiding students to uncover the mechanisms that govern the phenomena in question.

Figure 12.2 Screenshot of a dynamic visualization about chemical reactions.

Chiu used judgments of learning during instruction on chemical reactions to investigate deceptive clarity (Chiu & Linn, in press). Chiu asked students to rate their own understanding of chemical reactions after viewing a visualization and after viewing the visualization and generating an explanation of the chemical reaction. Chiu found that students judged themselves to have better understanding of chemical reactions after viewing the visualization than they did after writing an explanation about the visualization. Detailed analysis of students’ responses to prompts revealed that students overestimated their understanding after viewing the visualization; simply watching the visualization was sufficient to convince students of their own understanding. This complacency might have deterred them from further analysis of the visualization. Generating an explanation led students toward a more accurate assessment of their own understanding. When students were prompted to explain their observations, they identified gaps in their knowledge, leading to a decline in their estimate of their understanding. Overconfidence about one’s own understanding after interacting with a visualization is consistent with the belief of most learners that they are active, visual learners. Chiu’s findings are also congruent with research showing that learners are prone to overestimate their ability to remember even relatively straightforward information (such as names or telephone numbers) unless prompted to test their memories (E. L. Bjork & Bjork, in press). Similarly, learners suffer from an “illusion of explanatory depth” for complex processes, especially for phenomena that can be mentally visualized or animated (Rozenblit & Keil, 2002). Granted access to these traditionally unseen levels through visualizations, learners may believe their superficial perception of balls bouncing around can explain the complex process of chemical reactions, and be lulled into a false sense of understanding.

The Knowledge Integration Instructional Pattern

Chiu’s findings point to the need for supportive curriculum materials to address the deceptive clarity of visualizations. Students’ comprehension of visualizations is strongly influenced by what they know (Cook, 2006; Kozma, 2003).
Students may ignore important information and make unintended interpretations of peripheral details (Ainsworth, 2006; Gilbert, 2007). Accurate interpretations of visualizations depend on connections to relevant disciplinary knowledge, appreciation of the nature of scientific models (Grosslight, Unger, Jay, & Smith, 1991), and fluency with spatial information (Baenninger & Newcombe, 1995). The explanations in the chemical reactions module were successful at helping students assess their own knowledge because they prompted students to articulate connections between the visualization and their prior understanding. When students realized these connections were inadequate to provide a coherent explanation, they were motivated to refine their understanding of the visualization.

We draw from the knowledge integration framework to design curricula that help students use visualizations to achieve coherent scientific understanding. The knowledge integration perspective has been refined in empirical studies for more than 20 years (Davis, 2003; Davis & Krajcik, 2005; Linn, 1995; Linn, Davis, & Bell, 2004; Linn & Eylon, 2006; Quintana et al., 2004) and holds that learners often have incoherent and fragmented ideas. Deep science learning requires students to integrate ideas from multiple perspectives. The knowledge integration framework draws on studies showing that even the rather primitive visualizations possible in the 1980s, when embedded in carefully designed inquiry investigations, had powerful, cumulative effects on integrated understanding (Linn & Hsi, 2000). Research has identified four knowledge integration processes that promote coherent science understanding: eliciting ideas, adding ideas, distinguishing among ideas, and reflecting on ideas (Linn & Eylon, 2006). Many studies show the benefit of eliciting ideas or making predictions (e.g., Gunstone & Champagne, 1990). When students articulate their ideas they can test them against new ideas. Adding new ideas is the goal of nearly every science activity, but visualizations offer a unique source of ideas. Students require carefully designed supporting instruction to help them consider scientific evidence to distinguish productive ideas from unproductive ones among those that they articulate. They also need opportunities to reflect on the alternatives and reorganize the connections among their ideas. Ultimately, students need to link productive ideas with their prior knowledge and experience to achieve complex and durable scientific understanding. The third process in the instructional pattern, distinguishing among ideas, has promise for clarifying the role of desirable difficulties in learning from visualizations.
Together these processes make up the knowledge integration instructional pattern.

Scientific Visualizations and Desirable Difficulties

Despite recent efforts, evidence for the impact of visualizations as a component of computer-based environments remains contradictory and sparse. Several recent meta-analyses have synthesized studies that have empirical data, use visualizations to improve student learning in science classrooms, and test visualizations with learners (Chang et al., 2010). Though most studies show some impact for visualizations, the results do not offer a coherent answer to the question about when visualizations work or why they are effective. Some of these studies, however, suggest advantages for desirable difficulties. Researchers have employed diverse methods to examine the benefits of visualizations for science learning. Researchers in psychology and cognition often conduct laboratory studies to compare the impacts of static and dynamic presentations on recall of information. Researchers in science education and educational psychology often conduct classroom studies where they compare the impacts of instruction with or without visualizations. Curriculum designers and cognitive scientists often conduct iterative refinement studies to compare activity designs that use visualizations. We summarize the findings that highlight the ways desirable difficulties can help learners make sense of visualizations. These findings occur mainly in three types of studies:

• Laboratory investigations that compare static diagrams to animations
• Classroom comparison studies that compare typical to visualization-enhanced instruction
• Classroom studies that use a pretest–posttest design

Laboratory Investigations

Laboratory studies are common in the field of psychology. These studies are often conducted to study learners’ abilities to remember simple ideas rather than understand complex concepts. Many studies focus directly on memory and forgetting (Mayer & Sims, 1994), but some are designed to explore more complex questions that have relevance to science instruction, such as desirable difficulties (R. A. Bjork, 1994). The methods used for these studies limit their applicability to classroom research (Richland, Linn, & Bjork, 2007). Duration of instruction is often measured in minutes. Study designs often include control groups even when they are not expected to benefit learners. At the same time, these studies are valuable because they isolate promising elements of visualizations that can be investigated in classroom comparison studies.
Some studies explore aspects of desirable difficulties that have been found to contribute to memory or forgetting in studies of verbal learning, such as massed or distributed practice, generation versus selection, and interleaving of content. Overall, the Chang et al. (2010) meta-analysis supports the assertion that well-designed animations are superior to static diagrams (average effect size, g = 0.19, a positive, moderate effect). However, these studies have uneven results with effect sizes ranging from –0.7 to 1.47 in magnitude. Laboratory studies reveal two characteristics of
effective visualizations that suggest how desirable difficulties could improve learning outcomes: (1) the minimization of irrelevant cognitive demand, and (2) a focus on a personally meaningful scientific context. First, animations need to be carefully constructed so that they do not confuse students. For instance, Mayer, Hegarty, Mayer, and Campbell (2005) had learners study animated sequences of static pictures of various phenomena (lightning, waves, car brakes) for a short time (140 seconds). When compared to static visuals, these computer-controlled dynamic visualizations may increase the student’s need to hold previously presented frames in working memory. The irrelevance of such processing to the learning goals might explain why Mayer et al. found no benefit for dynamic visualizations over static pictures. Giving learners control of the animation could test this conjecture, as learners in the study were unable to review the sequence. Other examples of extraneous complexity include inconsistent color choices (Mayer, Heiser, & Lonn, 2001) or unnecessary details, such as kinetic energy and potential energy when just kinetic energy would suffice (Pallant & Tinker, 2004). Though these findings do not provide direct evidence for the benefits of desirable difficulties, they do point to the negative effects of undesirable difficulties. Extraneous complexity does indeed slow down learners, but it also results in a focus on concepts that are tangential to the learning goal, distracting students from the learning goals. In contrast, desirable difficulties aim to help students recognize, revisit, and repair gaps in their knowledge. These studies suggest that desirable difficulties could reduce the complexity of visualizations by focusing attention on relevant information. Second, visualizations can have broader impacts when presented in familiar, meaningful contexts. 
Visualizations that draw on what students know may enable them to ask more effective questions, connect new information to existing knowledge, and recognize unlikely findings. Instruction is more successful when students know the nature of the elements of the visualization, the rules for the interactions of the elements, and the goals of the instruction (diSessa, 2004; Moreno & Mayer, 2007; Piburn et al., 2005; Stratford, Krajcik, & Soloway, 1998). These findings suggest that desirable difficulties could encourage learners to revisit their existing knowledge and develop a sophisticated understanding of visualizations based on prior ideas. For instance, visualizations could be designed to prompt learners to articulate how aspects of visualizations represent everyday observations. Such prompts would slow down students’ processing of the visualization while strengthening connections between the visualizations and students’

Can Desirable Difficulties Overcome Deceptive Clarity? • 243

everyday understanding of science. Similarly, a virtual experimentation environment that prompts students to articulate a goal for each experimental trial would slow down the experimentation process but help students conduct each trial with a specific purpose, rather than haphazardly exploring the variables.

In summary, laboratory studies support the general idea that visualizations can enhance science understanding. They illustrate the importance of careful design of visualizations. They underscore the need to conduct trial and refinement studies to improve the visualization and ensure that learners have the necessary prior knowledge to benefit from it. Several of the laboratory studies suggest the value of the desirable difficulties framework in guiding design of research on scientific visualizations. Specifically, generation of responses and spaced practice with visualizations offer promise.

Classroom Studies Comparing Visualizations to Typical Instruction

Chang et al. (2010) synthesized a second group of studies that compared typical instruction to instruction featuring visualizations. Classroom comparison studies are designed to showcase the advantages of visualizations by comparing instructional approaches. Studies investigate the relative impact of curricula with visualizations compared to traditional text-based curricula. Across these studies Chang et al. (2010) found a moderate positive effect of dynamic visualization on learning (average g = 0.5) and consistently positive outcomes. Effects and instructional methods vary widely in these studies, making it difficult to clarify design guidelines or to recommend promising instructional practices. Several of the studies underscore the value of using visualizations to make unseen processes visible and to support virtual experimentation. For example, studies show the benefit of making chemical (Russell & Kozma, 2005; Stieff, 2006) and electrical (Finkelstein et al., 2005) interactions visible.
The sparseness of these findings underscores the need for systematic, careful study of the most promising visualizations and suggests the importance of combining results from laboratory and classroom research. Desirable difficulties could further enhance the design of these approaches, as discussed in this chapter.

Visualization Studies Using Pretests and Posttests

Chang et al. (2010) analyzed a third group of studies that implement visualizations in science classrooms to teach important science concepts and use a pretest–posttest design to assess instructional impact. Overall, a large average effect (g = 1.32) was found for these studies.

Such studies confound the instructional design with the visualization. They highlight the need for research that clarifies the instructional contexts that take advantage of visualizations. These studies show that visualizations improve understanding when embedded in supportive instruction, such as guidance on how to conduct experiments using the visualizations. For example, guiding students to critique experiments of peers or experiments of fictitious individuals can be more successful than providing direct instruction on how to conduct virtual experiments (Chang, 2009). We revisit critique as a desirable difficulty at the end of this chapter. Other studies reveal challenges of designing instructional supports for activities such as planning of experiments. McElhaney (McElhaney & Linn, 2008) found that groups without guidance, conducting very few or very many experiments, were less successful than those performing an intermediate number of experiments. However, constraining students to perform a set number of experiments appears to act as an undesirable difficulty by restricting students’ focus to simple tasks and not improving planning. General guidance about the nature of experimentation has mixed effects (Corliss & Spitulnik, 2008; Fretz et al., 2002). Consistent with findings for drawing, desirable difficulties for planning should slow learners and require them to distinguish among alternatives. To support learning from visualizations of unseen processes, classroom studies show that students need guidance to explore the visualization and help link the visualization to related representations such as chemical equations, everyday situations, or narrative descriptions (Linn, Lee, Tinker, Husic, & Chiu, 2006; Russell & Kozma, 2005; Wu, Krajcik, & Soloway, 2001; Zhang, 2007). 
For example, linking visualizations to personally relevant contexts such as global climate change (Chiu & Linn, 2008), electrical fires (Shen, 2008), or asthma (Tate, 2009) has been used in activities that promote complex scientific understanding. These studies slow down learning by eliciting students’ everyday ideas and prompting students to connect these ideas to their understanding of visualizations. Studies show that visualizations introduced as pivotal cases (Linn et al., 2004) are more successful than other examples (Chiu, 2007; Tate, 2009). Pivotal cases contrast two familiar conditions that constitute a controlled experiment, and the choice of context ensures that learners can convert their experience into a compelling narrative. Pivotal cases also encourage students to make important distinctions between the contrasting conditions, highlighting key features. As discussed in detail below, several studies suggest that one of the principles of desirable

difficulties—encourage students to generate responses—has value for ensuring that students articulate these distinctions and develop sophisticated understanding of unseen processes from visualizations (Chiu, 2007, 2009; Chiu & Linn, 2008; Zhang & Linn, 2008). Overall, the existing classroom studies illustrate the promise of visualization in science instruction. They focus attention on desirable difficulties as a promising way to enhance the impact of visualizations.

Examples from TELS: Generation and Critique

The Technology-Enhanced Learning in Science (TELS) center took advantage of cyberlearning (Borgman et al., 2008) to investigate learning from visualizations. Using the Web-based Inquiry Science Environment (WISE), researchers studied the potential value of visualizations, identified promising patterns, and refined visualization-based curriculum materials. The TELS technologies facilitate design of visualizations, curriculum materials, and assessments. They implement supports for aligning instruction, professional development, and assessment with the knowledge integration framework (Kali, Linn, & Roseman, 2008). They help researchers study how students learn from visualizations by logging student interactions and varying feedback to students or teachers. For example, combining online assessments with logging technology, context-sensitive feedback, and supports for teachers can provide unique insights into what students know and at the same time serve as opportunities to learn for both students and teachers. Cyberlearning environments like WISE can support communities of designers in creating effective materials and teachers in customizing materials for their students (Gerard, Spitulnik, & Linn, in press). TELS examined how two types of desirable difficulties, generation and critique, can improve learning in classroom contexts when used as part of the knowledge integration instructional pattern. Here we summarize these TELS studies.

Generation as a Desirable Difficulty

Research on desirable difficulties (R. A. Bjork, 1994, 1999) suggests generation as a way to improve learning with visualizations. As noted earlier, when working with visualizations, learners often neglect nuanced information and form superficial interpretations. Generation activities such as writing explanations or drawing pictures offer students a chance to contrast their ideas represented in the drawings with those in the visualization.
If students detect gaps in their knowledge, they may revisit the visualization to seek additional information. Generation

could overcome deceptive clarity and help students develop criteria to distinguish among the new ideas and elicited views. Thus, generation serves as a desirable difficulty that slows down learning with visualizations but leads to more nuanced understanding. Recent work by Kornell (Kornell, Hayes, & Bjork, 2009) on the role of testing shows that when students attempt to answer a difficult test question they learn more than when they are able to read the question and the response but do not have to first generate a prediction. In a series of experiments Kornell et al. demonstrated that responding to a test question, even when the answer generated is wrong, can improve learning. Generating an answer appears to serve the role of eliciting ideas consistent with the knowledge integration instructional pattern (Linn & Eylon, 2006). A series of classroom comparisons conducted by Zhang (Zhang & Linn, 2008) sheds light on generation as a way to overcome deceptive clarity. Zhang asked students to make predictions about chemical reactions, to explore a visualization illustrating a chemical reaction, to draw the main states of the chemical reactions’ visualization (see Figure 12.1), and to reflect by explaining their drawings. In the drawing activity, students were asked to draw the state of the chemical reaction before it started, when it had just started, after it had been going for some time, and when it was completed. Many students neglect the roles of bond breaking and bond formation in the reaction process, believing that the reactants instantaneously become the products in the reaction. An effective visualization makes the process of bond breaking and bond formation visible. Zhang hypothesized that making drawings of the various states of chemical reactions would prompt students to test and distinguish ideas by comparing their drawings to the visualization. To do so, students need to revisit the visualization. 
Students may realize they missed important details in their previous interpretation or failed to distinguish among the new and prior ideas. Thus, they may revisit the visualization for nuanced information. Consistent with desirable difficulties, drawing slows down learning and results in numerous errors on the part of students, but has the potential to promote deeper understanding of chemical reactions. In her first study, Zhang compared a drawing condition with the opportunity to explore the visualization in more depth. She found, consistent with desirable difficulties, that drawing led to more errors in performance but had greater impact on learning than exploring (Zhang & Linn, 2008). To characterize the nature of the drawing condition, Zhang & Linn (2010) created conditions that did not require drawing but did require

students to distinguish their ideas. In two follow-up studies she compared the drawing activity to two types of selection activities (standard and complex). In the standard selection condition, students were asked to select among eight drawings to describe the sequence of events in the chemical reaction. In the complex selection condition, Zhang increased the number of choices for the selection task and added more potentially confusing options, based on the drawings students generated in earlier studies. The outcome measures required students to draw their ideas about chemical reactions, to explain chemical reactions, and to critique narrative accounts of chemical reactions. Comparing the two selection conditions helped clarify the role of distinguishing ideas. The complex condition with dozens of choices resulted in far more student errors and also slowed learning more than the standard condition. It forced students to discriminate carefully among the possibilities. The impact of complex selection versus drawing ideas about chemical reactions turned out to be quite similar. Both tasks slowed learning and students benefitted from the experience. In contrast, for the standard selection condition, the smaller number of choices made selection faster, leaving time for additional exploration. Standard selection did not have as great an impact as the drawing task or the complex selection task. These results support the view that distinguishing alternatives is the main mechanism behind the drawing condition. Both the drawing and the complex selection tasks slowed down learning by about the same amount and resulted in similar numbers of errors during learning. These conditions help clarify the nature of activities that slow learning but improve outcomes. In both cases, students need to discriminate among their prior ideas and those in the visualization, a key process of knowledge integration. 
Furthermore, the standard selection condition engendered fewer errors and, therefore, resulted in less emphasis on discrimination. These results shed light on the nature of desirable difficulties. Conditions that slow learning and increase errors can lead to better learning outcomes than conditions that result in fewer errors. In Zhang’s studies, even when learning time was equalized, students who made fewer errors learned less. This is consistent with the deceptive clarity of the visualizations. The drawing condition was most successful when students were required to make fine distinctions. These distinctions led to errors that revealed gaps in understanding that exploration of the visualization did not reveal. Furthermore, we believe that this condition would not have been successful without first eliciting students’ ideas. Eliciting ideas set up the opportunity to distinguish among ideas, leading students to contrast their views of instantaneous

reactions with their analysis of intermediate states where bonds have been broken, but new bonds have not yet been formed. For example, the drawing condition was especially helpful for students who initially characterized the reaction as an instantaneous process. Drawing motivated learners to make errors and then consult the visualization for additional information. Other initial ideas, such as the view that all of the atoms get separated before recombining, or even that the reaction starts with all the atoms separated from each other, were also clarified by the process of drawing. In both of these situations, drawing the intermediate steps helped learners distinguish their initial ideas from those in the visualization. Students who started with a limited understanding of conservation of matter and failed to keep the number of atoms and molecules the same across their drawings or selections also benefited from drawing or selection. These students tended to neglect unbound atoms after the reaction occurred, leaving them out of their final drawings. For these students, selecting among the drawings was difficult because their prior ideas suggested one thing but the visualization suggested another. Going back to the visualization helped them develop criteria to determine the intermediate conditions. These studies show that drawing and complex selection, when part of the knowledge integration instructional pattern, motivate students to recognize and interpret intermediate conditions and revisit the visualization. Studies that log how students interact with the visualizations support this view (Chiu, 2009; Zhang, 2009). Both Chiu’s and Zhang’s studies show that when students are asked to represent intermediate states, they use the visualization to resolve what is happening in the bond breaking and the bond formation process. These findings help clarify how students make sense of chemical reactions using visualizations. 
In many research studies, generation is more successful in promoting knowledge integration and understanding than selection (Richland et al., 2007). Thus, this series of studies about chemical reactions is particularly important. The studies show that when students have the choice of selecting among a large set of alternatives, the selection task and the drawing task are similar in impact. In both generation and complex selection, the goal for students is to discriminate among a large space of alternatives. In contrast, when students had a smaller set of alternatives, discrimination was less nuanced. For this task we made selection more effective by increasing the numbers of alternatives. The generation and the selection tasks both ask students to represent a complex temporal order of a chemical reaction: representing the initial state, bond breaking, bond formation, and the final state. Both the complex selection

and drawing conditions reinforce the view that distinguishing among productive and unproductive ideas is a possible desirable difficulty. Both conditions slow learning and increase student errors, and both ultimately improve learning.

Critique as a Desirable Difficulty

In a series of comparison studies, TELS researchers have compared asking students to experiment with a visualization and asking them to critique experimental designs or inferences generated from the visualization (Chang, 2009; Tate, 2009; Zhang, 2010). Typically students make predictions, explore a visualization, and then critique an experiment or interpretation of the visualization. Critique has some of the same characteristics as selection in that learners must distinguish between their own ideas and the ideas they are critiquing. As with selection, students must develop criteria in order to make these distinctions.

Chang (Chang, 2009; Chang & Linn, 2010) contrasted situations where students articulated prior knowledge and then either read about how to conduct virtual experiments or critiqued the experiments of others. After this experience, students conducted their own experiments. The experiments involved both designing a study to investigate questions such as “What is the best insulator?” and observing the molecular level interactions for the different conditions. In Chang’s study, critique involved asking students to evaluate the experiment and conclusions reached by an ostensible peer. Thus, students had the opportunity to analyze the reasoning of a hypothetical student in their class. This activity appeared to engage students in examining their own reasoning and developing more thoughtful analysis of the experimental situation. As in Zhang’s study, the opportunity to critique occurred within the knowledge integration instructional pattern, first using prompts, visualizations, and the experiments of the hypothetical student to elicit and add ideas about heat and temperature.
Chang (2009) found that both experimentation and critique were effective. Students made considerable progress in understanding thermal conductivity and equilibrium as a result of using the unit. Both groups performed equally on visualization questions that required students to draw a series of pictures to show their idea of thermal conductivity and equilibrium at the molecular level. However, on experimentation questions that asked students to plan a second trial when the research question and first trial were given, the critique group was more successful than the experiment group. In addition, the effect size of pre–post gain from the critique group was large (1.21), while the effect size from the

experiment-only group was moderate (0.63). Chang concluded that combining experimentation and critique was more effective for helping students distinguish among their ideas than experimentation alone. Students developed more sophisticated criteria by doing both experiments and critiques than when only experimenting. Zhang (2010) compared the drawing condition with a critique condition where students critiqued drawings about atomic interactions. The drawings to be critiqued were designed to capture alternative ideas that students typically held, as determined in previous studies. To critique, students needed to analyze ideas represented in the drawings, distinguish their own ideas from these views, and develop criteria for their evaluations. Zhang found that both drawing and critique were effective. Students in both conditions integrated ideas about bond breaking, bond formation, and conservation of mass to explain chemical reactions. Students in the critique group outperformed those in the drawing group on critique questions. In addition, students with sophisticated initial views of chemical reactions (those who predicted some form of bond breaking) benefitted more from critique than students with more rudimentary ideas about chemical reactions (often limited to the changes represented by the chemical equation). Students who were initially able to incorporate bond breaking into their understanding of reactions benefitted most from critique because they had to make more precise discriminations to generate valid critiques. Thus, as with drawing, critique increases opportunities to discriminate among ideas and motivates learners to generate criteria to guide discrimination. Zhang suggests that drawing and critique be used together to improve the ability of all students to distinguish among ideas. Both of these studies suggest that when combined with other generation activities, critique adds value to instruction featuring dynamic interactive visualizations. 
It appears that critique strengthens the development of criteria for distinguishing among ideas even for sophisticated learners. These studies highlight how generation and critique can help students distinguish and refine criteria for their ideas and increase their ability to benefit from learning with visualizations. Engaging students in critique and generation activities as part of the knowledge integration instructional pattern can slow learning, increase errors, and ultimately improve outcomes. These approaches push students to clarify their ideas, revealing gaps in their understanding. These approaches are compatible with the nature of desirable difficulties. They shed light on the nature of desirable difficulties in the context of complex scientific reasoning.

Conclusions and Implications

The research on the effectiveness of visualizations reveals the complex interacting factors that impact success. Like many learning activities, visualizations can be highly effective in some contexts and ineffective in others. A visualization cannot be evaluated separately from its intended role in a learning environment, from its major features, or from the students who use it. Our findings suggest that the deceptive clarity of scientific visualizations may present problems for many learners. Desirable difficulties, such as generation or critique activities, show promise to help learners overcome these illusions of understanding.

Deceptive clarity stems from learners’ beliefs about their own learning. When visualizations bring previously “unseen” phenomena to life, learners inflate their sense of understanding. The ability to pause, rewind, and replay visualizations instills a sense of understanding. Students may apply the criteria they use to determine whether they understand movies to the complex phenomena represented in visualizations, believing that visualizations are as easy to understand as a typical movie. Introducing desirable difficulties through generation and critique activities can encourage students to develop more sophisticated views of themselves as learners. By distinguishing and sorting ideas, students can recognize the limitations of their own memories and the complexity of the phenomena they are perceiving. Generation and critique activities combined with timely reflection can help students gain insight into the nature of their own learning.

To make instruction featuring visualizations accessible and comprehensible, the research of Chiu (Chiu & Linn, in press), Chang (Chang & Linn, 2010), and Zhang (Zhang & Linn, 2008) shows the benefit of trial and refinement of curricular materials. Teachers also contribute to curricular improvements when they use student work to customize instruction (Gerard et al., in press).
In this chapter, we argue that many successful refinements draw on and resonate with research showing how desirable difficulties can improve learning outcomes. The advantages of desirable difficulties arise in the context of embedding visualizations in coherent curriculum materials that implement the knowledge integration instructional pattern (Linn & Eylon, 2006). The pattern involves four processes. First, desirable difficulties succeed when students articulate their ideas so they can discriminate them from other perspectives. Second, the focus of this chapter is using desirable difficulties to help students benefit from new ideas added by visualizations. Third, students need to distinguish productive ideas from unproductive ones,

and that is where desirable difficulties come into play. Finally, students need to reflect on the alternatives and reorganize the connections among their ideas (e.g., Brown, 1992). Thus, desirable difficulties enhance the discrimination process of the knowledge integration instructional pattern. Specifically, the findings suggest that desirable difficulties arise when students are required to distinguish between their prior knowledge and alternatives introduced during the course of instruction. These distinctions contrast with many forms of instruction where students must make sense of only new ideas. Desirable difficulties can thus prompt students to consider prior knowledge, increasing the difficulty of learning by requiring students to articulate and evaluate previous ideas before connecting them to new ideas. In this way, distinguishing among ideas is particularly effective when it occurs as part of the knowledge integration instructional pattern because it ensures that students sort out all ideas, new and old, to achieve coherent understanding of visualizations. Incorporating desirable difficulties into instruction featuring visualizations can help students and teachers benefit from visualization and develop the ability to guide their own learning. Brief observation of visualizations without any effort to distinguish the new ideas from existing ideas is insufficient for learning. Many of the laboratory studies that show limited value for visualizations involve very short durations of learning. Effects over short durations may not hold over longer periods of time, and the effects may actually be reversed (Roediger & Karpicke, 2006; Bjork & Bjork, 2009). Students may confuse the ease of interacting with a visualization with the ease of learning from a visualization when durations are short. Adding generation or testing activities can help students realize that they need to do more than control variables or combust molecules. 
To learn the underlying concept, students need to generate their own ideas, distinguish them from those in the visualization, develop criteria to determine which ideas are more valuable, and reflect on the process. Desirable difficulties in the form of generation activities can also help teachers appreciate the diversity of student ideas when working with visualizations. These rich kinds of assessments can provide teachers with a window on student learning. They can stimulate teachers to find ways to provide guidance and encourage students to distinguish, revisit, and revise their ideas. Cyberlearning environments provide students with opportunities to engage in rich inquiry investigations about current science topics. These environments can incorporate powerful visualizations that illustrate key science concepts that deepen students’ appreciation of socioscientific issues and controversies. Such visualizations succeed when embedded in appropriate instruction, such as the knowledge

integration instructional pattern. The addition of desirable difficulties can strengthen this learning. Future research using detailed logs of interactions with visualizations or experimentation strategies can further inform how instruction affects students’ understanding of visualizations. Future research on visualizations should examine how desirable difficulties can help students connect ideas from visualizations to their everyday understanding of science, making science relevant and meaningful for all learners.

Acknowledgments

This material is based upon work supported by the National Science Foundation under grants ESI-0334199 and ESI-0455877. Support from the Center for Advanced Study in the Behavioral Sciences (CASBS) enabled the collaboration with Robert Bjork that led to this work. Inspiration and guidance from Robert Bjork were essential to our success. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. The authors appreciate helpful comments from the Technology-Enhanced Learning in Science research group. We also appreciate help in the production of this manuscript from Jonathan Breitbart and Suparna Kudesia.

Endnote

1. This material is based upon work supported by the National Science Foundation under grants No. ESI-0334199 and ESI-0455877. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

References

Ainsworth, S. E. (2006). DeFT: A conceptual framework for learning with multiple representations. Learning and Instruction, 16(3), 183–198.
Baenninger, M., & Newcombe, N. (1995). Environmental input to the development of sex-related differences in spatial and mathematical ability. Learning and Individual Differences, 7(4), 363–379.
Bjork, E. L., & Bjork, R. A. (2009). Making things hard on yourself, but in a good way: Creating desirable difficulties to enhance learning. In M. A. Gernsbacher, R. W. Pew, & J. R. Pomerantz (Eds.), Psychology and the real world: Essays illustrating fundamental contributions to society. New York, NY: Worth Publishers.

Bjork, R. A. (1994). Memory and metamemory considerations in the training of human beings. In J. Metcalfe & A. P. Shimamura (Eds.), Metacognition: Knowing about knowing (pp. 185–205). Cambridge, MA: MIT Press. Bjork, R. A. (1999). Assessing our own competence: Heuristics and illusions. In D. Gopher & A. Koriat (Eds.), Attention and performance XVII. Cognitive regulation of performance: Interaction of theory and application (pp. 435–459). Cambridge, MA: MIT Press. Bjork, R. A., & Linn, M. C. (2006). The science of learning and the learning of science: Introducing desirable difficulties. APS Observer, 19, 29. Boo, H.-K., & Watson, J. R. (2001). Progression in high school students’ (aged 16–18) conceptualizations about chemical reactions in solution. Science Education, 85(5), 568–585. Borgman, C. L., Abelson, H., Dirks, L., Johnson, R., Koedinger, K. R., Linn, M. C., et al. (2008, June 24). Fostering learning in the networked world: The cyberlearning opportunity and challenge. Washington, DC: National Science Foundation. Brown, A. L. (1992). Design experiments: Theoretical and methodological challenges in creating complex interventions. Journal of the Learning Sciences, 2, 137–178. Burbules, N. C., & Linn, M. C. (1991). Science education and the philosophy of science: Congruence or contradiction? International Journal of Science Education, 13(3), 227–241. Chang, H.-Y. (2009, April). Use of critique to enhance learning with an interactive molecular visualization of thermal conductivity. In M. C. Linn (Chair), Critique to learning science. Symposium conducted at the annual meeting of the National Association for Research in Science Teaching, Garden Grove, CA. Chang, H.-Y., Chiu, J., McElhaney, K., & Linn, M. C. (2010). Can dynamic visualization improve science learning? Manuscript in preparation. Chang, H.-Y., & Linn, M. C. (2010). To learn from a molecular visualization: Observe, interact, or critique? Manuscript submitted for publication. Chiu, J. L. (2007, April). Eliciting explanations and self-assessments to support students’ knowledge integration. Paper presented at the annual meeting of the American Educational Research Association, Chicago, IL. Chiu, J. L. (2009, April 15). The impact of feedback on student learning and monitoring with dynamic visualizations. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA. Chiu, J., & Linn, M. C. (2008). Self-assessment and self-explanation for learning chemistry using dynamic molecular visualizations. In International Perspectives in the Learning Sciences: Cre8ting a Learning World. Proceedings of the 8th International Conference of the Learning Sciences (Vol. 3, pp. 16–17). Utrecht, The Netherlands: International Society of the Learning Sciences.

Chiu, J., & Linn, M. C. (In press). The role of self-monitoring in learning chemistry with dynamic visualization. In A. Zohar & Y. J. Dori (Eds.), Metacognition and science education: Trends in current research. London, UK: Springer-Verlag. Cook, M. P. (2006). Visual representations in science education: The influence of prior knowledge and cognitive load theory on instructional design principles. Science Education, 90, 1073–1091. Corliss, S., & Spitulnik, M. (2008). Student and teacher regulation of learning in technology-enhanced science instruction. In International Perspectives in the Learning Sciences: Cre8ting a Learning World. Proceedings of the 8th International Conference of the Learning Sciences (Vol. 1, pp. 167–174). Utrecht, The Netherlands: International Society of the Learning Sciences. Davis, E., & Krajcik, J. (2005). Designing educative curriculum materials to promote teacher learning. Educational Researcher, 34(3), 3–14. Davis, E. A. (2003). Prompting middle school science students for productive reflection: Generic and directed prompts. Journal of the Learning Sciences, 12(1), 91–142. diSessa, A. A. (2004). Meta-representation: Native competence and targets for instruction. Cognition and Instruction, 22(3), 293–331. Finkelstein, N., Adams, W. K., Keller, C. J., Kohl, P. B., Perkins, K. K., Podolefsky, N. S., et al. (2005). When learning about the real world is better done virtually: A study of substituting computer simulations for laboratory equipment. Physical Review Special Topics - Physics Education Research, 1, 10103-1–10108. Fretz, E. B., Wu, H.-K., Zhang, B., Davis, E. A., Krajcik, J. S., & Soloway, E. (2002). An investigation of software scaffolds supporting modeling practices. Research in Science Education, 32(4), 567–589. Gerard, L. F., Spitulnik, M. W., & Linn, M. C. (In press). Teacher use of evidence to customize inquiry science instruction. Journal of Research in Science Teaching. Gilbert, J. K. (Ed.). (2007). Visualization in science education. Models and modeling in science education (Vol. 1). London: Springer-Verlag. Grosslight, L., Unger, C., Jay, E., & Smith, C. (1991). Understanding models and their use in science: Conceptions of middle and high school students and experts. Journal of Research in Science Teaching (Special Issue: Students’ Models and Epistemologies of Science), 28(9), 799–822. Gunstone, R. F., & Champagne, A. B. (1990). Promoting conceptual change in the laboratory. In E. Hegarty-Hazel (Ed.), The student laboratory and the science curriculum. New York: Routledge. Kali, Y., Linn, M. C., & Roseman, J. E. (2008). Designing coherent science education: Implications for curriculum, instruction, and policy. New York: Teachers College Press. Kornell, N., Hays, M. J., & Bjork, R. A. (2009). Unsuccessful retrieval attempts enhance subsequent learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35(4), 989–998.

Kozma, R. B. (2003). The material features of multiple representations and their cognitive and social affordances for science understanding. Learning and Instruction, 13(2), 205–226. Linn, M. C. (1995). Designing computer learning environments for engineering and computer science: The scaffolded knowledge integration framework. Journal of Science Education and Technology, 4(2), 103–126. Linn, M. C., Davis, E. A., & Bell, P. (Eds.). (2004). Internet environments for science education. Mahwah, NJ: Lawrence Erlbaum Associates. Linn, M. C., & Eylon, B.-S. (2006). Science education: Integrating views of learning and instruction. In P. A. Alexander & P. H. Winne (Eds.), Handbook of educational psychology (2nd ed., pp. 511–544). Mahwah, NJ: Lawrence Erlbaum Associates. Linn, M. C., & Hsi, S. (2000). Computers, teachers, peers: Science learning partners. Mahwah, NJ: Lawrence Erlbaum Associates. Linn, M. C., Lee, H.-S., Tinker, R., Husic, F., & Chiu, J. L. (2006). Teaching and assessing knowledge integration in science. Science, 313(5790), 1049–1050. Liu, L., Uttal, D. H., Marulis, L. M., Lewis, A. L., Warren, C. M., & Newcombe, N. S. (2008). Training spatial skills: What works, for whom, and for how long? Paper presented at the Conference on Research and Training in Spatial Intelligence, Evanston, Illinois. Mayer, R. E., Hegarty, M., Mayer, S., & Campbell, J. (2005). When static media promote active learning: Annotated illustrations versus narrated animations in multimedia instruction. Journal of Experimental Psychology: Applied, 11(4), 256–265. Mayer, R. E., Heiser, J., & Lonn, S. (2001). Cognitive constraints on multimedia learning: When presenting more material results in less understanding. Journal of Educational Psychology, 93(1), 187–198. Mayer, R. E., & Sims, V. K. (1994). For whom is a picture worth a thousand words? Extensions of a dual-coding theory of multimedia learning. Journal of Educational Psychology, 86(3), 389–401. McElhaney, K. W., & Linn, M. C. (2008). Impacts of students’ experimentation using a dynamic visualization on their understanding of motion. In International Perspectives in the Learning Sciences: Cre8ting a Learning World. Proceedings of the 8th International Conference of the Learning Sciences (Vol. 2, pp. 51–58). Utrecht, The Netherlands: International Society of the Learning Sciences. Moreno, R., & Mayer, R. (2007). Interactive multimodal learning environments. Educational Psychology Review, 19(3), 309–326. National Research Council. (1996). National science education standards: 1996. Washington, DC: National Academy Press. Pallant, A., & Tinker, R. F. (2004). Reasoning with atomic-scale molecular dynamic models. Journal of Science Education and Technology, 13(1), 51–66.

Piburn, M., Reynolds, S., McAuliffe, C., Leedy, D., Birk, J., & Johnson, J. (2005). The role of visualization in learning from computer-based images. International Journal of Science Education, 27(5), 513–527. Quintana, C., Reiser, B. J., Davis, E. A., Krajcik, J., Fretz, E., Golan, R. D., et al. (2004). A scaffolding design framework for software to support science inquiry. Journal of the Learning Sciences, 13(3), 337–386. Richland, L. E., Linn, M. C., & Bjork, R. A. (2007). Cognition and instruction: Bridging laboratory and classroom settings. In F. Durso, R. Nickerson, S. Dumais, S. Lewandowsky, & T. Perfect (Eds.), Handbook of applied cognition (2nd ed., pp. 555–584). New York, NY: Wiley. Roediger, H. L., & Karpicke, J. D. (2006). Test-enhanced learning. Psychological Science, 17(3), 249–255. Rozenblit, L. R., & Keil, F. C. (2002). The misunderstood limits of folk science: An illusion of explanatory depth. Cognitive Science, 26, 521–562. Russell, J., & Kozma, R. (2005). Assessing learning from the use of multimedia chemical visualization software. In J. Gilbert (Ed.), Visualization in science education (pp. 299–332). London: Kluwer. Shen, J. (2008, March 27). Connecting atomic models and observations to explain static electricity. Paper presented at the annual meeting of the American Educational Research Association, New York. Stieff, M. (2006). Increasing representational fluency with visualization tools. In Proceedings of the 7th International Conference on Learning Sciences (pp. 730–736). Bloomington, IN: International Society of the Learning Sciences. Stratford, S. J., Krajcik, J., & Soloway, E. (1998). Secondary students’ dynamic modeling processes: Analysing, reasoning about, synthesizing, and testing models of stream ecosystems. Journal of Science Education and Technology, 7(3), 215–234. Tate, E. D. (2009, April 16). The impact of the asthma curriculum on student’s integrated understanding of biology. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA. Tversky, B., Morrison, J. B., & Betrancourt, M. (2002). Animation: Can it facilitate? International Journal of Human-Computer Studies, 57(4), 247–262. Viadero, D. (2007). Computer animation being used to bring science concepts to life: Evidence of learning gains remains sparse. Education Week, 26(26), 12–13. Wu, H.-K., Krajcik, J., & Soloway, E. (2001). Promoting understanding of chemical representations: Students’ use of a visualization tool in the classroom. Journal of Research in Science Teaching, 38(7), 821–842. Zhang, Z. (2007, April). Using scaffolded visualizations to support student understanding of energy concepts at molecular and macroscopic levels. Presentation at the annual meeting of the American Educational Research Association, Chicago. Zhang, Z. (2009). Drawing and selection activities to improve learning using visualizations. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.

Zhang, Z. (2010). Exploring drawing and critique to enhance learning from visualizations. In Learning in the Disciplines. Proceedings of the 9th International Conference of the Learning Sciences (Vol. 2, p. 234). Chicago, USA: International Society of the Learning Sciences. Zhang, Z., & Linn, M. C. (2008). Using drawings to support learning from dynamic visualizations. In International Perspectives in the Learning Sciences: Cre8ting a Learning World. Proceedings of the 8th International Conference of the Learning Sciences (Vol. 3, pp. 162–162). Utrecht, The Netherlands: International Society of the Learning Sciences. Zhang, Z., & Linn, M. C. (2010). How can selection and drawing support learning from dynamic visualizations? In Learning in the Disciplines. Proceedings of the 9th International Conference of the Learning Sciences (Vol. 2, pp. 165–166). Chicago: International Society of the Learning Sciences.

13

Desirable Difficulties and Studying in the Region of Proximal Learning Janet Metcalfe

Introduction The desirable difficulties perspective (E. L. Bjork & Bjork, 2011; R. A. Bjork, 1988, 1992; Bjork & Bjork, 2006; R. A. Bjork & Linn, 2006) and the region of proximal learning framework (e.g., Kornell & Metcalfe, 2006; Metcalfe, 2002, 2009; Metcalfe & Kornell, 2005) appear, at least on the surface, to be at odds. The desirable difficulties perspective says that individuals should make things hard on themselves (but in a good way). Instructors should not spare the learners, but instead they should challenge them and make learning difficult. People should embrace difficulties, because it is through those difficulties that long-term learning occurs. The term desirable is added to difficulties, but even so the message is that learning should be a challenge. The region of proximal learning framework proposes that learning is optimized by having the learner study materials that are not very difficult, given the individual’s current state of learning. Indeed, they should be the easiest possible as-yet-unlearned items, just beyond what the learner has already fully mastered, but not much beyond. Too much difficulty, within this framework, is seen as maladaptive and potentially disheartening. Both of these views are intuitive. The desirable difficulties perspective capitalizes on the adage “no pain, no gain” and on the pervasive
work ethic emphasizing the moral value of hard work and diligence that many cultures claim for their own. The region of proximal learning framework resonates with the developmental philosophies of Piaget (1952) and Vygotsky (1987), including the idea that there is a zone of proximal development that is just beyond what the child is capable of easily on his or her own, in which he or she could learn via scaffolding. It also conforms to the transitional learning stage in Atkinson’s (1972) Markov model, in which items that are not yet permanently learned, but rather are in a state of being almost learned, are those on which the learner should concentrate. It takes seriously the so-called labor in vain effect of Nelson and Leonesio (1988) whereby people may spend a great deal of time selectively working on the most difficult items, but this extra time and effort is without payoff. It results in no memory benefit. In the world of education, the region of proximal learning framework resonates with the notion of “just right” books for children in which their own reading level is assessed and then they are given books very close to their own skill level. Allowing children to read at their own level is thought to promote fluency and faster, not slower, progress. It is also thought to make reading more enjoyable and less frustrating. The adage that comes to mind for the region of proximal learning framework is that it is sensible to first take the “low-hanging fruit.” The apparent conflict between the desirable difficulties and the region of proximal learning views does not do justice to either framework, however. Both are more nuanced than such a simplistic overview would suggest. The goal of this chapter is a clarification of where the two frameworks are in agreement, and where there are real differences. One issue, at the heart of both frameworks, is the extent to which the learner is actively involved in his or her own learning. 
This sense of engagement may be the most critical marker of both whether the person is operating within his or her own region of proximal learning and whether the amount of difficulty he or she is experiencing is the desirable amount. Both frameworks, it is argued, seek a level of challenge at which learners remain engaged: not so easy that they are bored, but not so difficult that they are overwhelmed. There are four domains that Bjork and his colleagues have specified as ways in which learning should be made difficult. The desirable difficulties framework recommends that (1) the learner should engage in retrieval practice (or self-generation or testing), (2) feedback should be reduced, (3) practice should be spaced, and (4) study of different topics should be interleaved. The region of proximal learning framework agrees on two of these and disagrees on the other two. The two recommendations on which the region of proximal learning and the
desirable difficulties framework are in agreement are those referring to retrieval practice (i.e., self-generation or testing) and spacing of practice. Although the conclusions on these two are the same, the reasoning leading to the convergent conclusions is different. The recommendations on which the two frameworks differ are feedback, which the region of proximal learning framework advocates but the desirable difficulties framework abjures, and interleaving, which the desirable difficulties framework for the most part advocates, but about which the region of proximal learning framework has reservations. Here, I shall discuss each of these four recommendations and review empirical findings that bear on each. Before turning to the specifics of these four recommendations, it is worth noting that the desirable difficulties framework and the region of proximal learning focus on different concerns. The desirable difficulties framework is process oriented. The underlying assumption is that making the cognitive processing in which the individual engages as difficult as possible is key to improving memory. Difficult encoding or retrieval is thought to etch in retrieval routes, making the items memorable. Demonstrations, such as those of Benjamin, Bjork, and Schwartz (1998), in which the items that people answered with the most difficulty were later found to be most memorable, illustrate this point. In contrast, the region of proximal learning framework has focused more on the content of the materials, and the strategies that people use to ensure that easily learned materials—but those that will not be remembered without at least some additional effort—are not, inadvertently, overlooked. Of course, content and process are intertwined, and so the processing needed for simpler materials may, itself, be easier. Items that require more extensive and effortful processing are, themselves, more difficult items. Furthermore, both of these frameworks point to a middle ground. 
The term desirable is key in the desirable difficulties framework—very difficult is not desirable. And the region of proximal learning framework does not say that the learner should study the easiest items. Those items will have been learned already, and hence are not in the person’s region of proximal learning. Any item that is not yet learned is not very easy.
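The selection principle described above (skip items that are already learned, then favor the easiest of the as-yet-unlearned items) can be illustrated with a minimal sketch. It is an illustration only, not the framework's formal model: the `Item` class, the `difficulty` scale from 0 to 1, the `learned` flag, and the example words are all hypothetical assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    difficulty: float  # hypothetical judged difficulty: 0 = easy, 1 = hard
    learned: bool      # True if the learner has already mastered the item


def study_order(items):
    """Return the unlearned items, easiest first.

    Already-learned items are excluded, because they are no longer in the
    learner's region of proximal learning.
    """
    unlearned = [item for item in items if not item.learned]
    return sorted(unlearned, key=lambda item: item.difficulty)


# Hypothetical vocabulary items, invented for illustration.
items = [
    Item("casa", 0.1, True),          # easiest, but already learned
    Item("perro", 0.3, False),
    Item("desarrollo", 0.8, False),
    Item("libro", 0.2, False),
]

order = [item.name for item in study_order(items)]
print(order)  # ['libro', 'perro', 'desarrollo']
```

Note that the easiest item overall ("casa") never appears in the study order: it has been learned, so the ordering starts from the easiest item that still requires study.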

Retrieval Practice Both the desirable difficulties perspective and the region of proximal learning framework agree that the kind of focused attention that is rallied in the service of retrieval practice (R. A. Bjork, 1988) gives rise to excellent learning results. The region of proximal learning framework
is entirely pragmatic on how to study: Any process or strategy that helps learning is embraced. The desirable difficulties framework sees retrieval attempts as a difficulty of just the right degree to be desirable. Hence, there is complete agreement on this first of the domains emphasized in the Bjork (2006) and Bjork and Linn (2006) framework. The memory enhancement seen with self-generation, the benefits attributable to retrieval practice, and the learning enhancement observed with testing appear to be different faces of the same diamond. They all involve the engaged effort of the individual to actively come up with the answer to a question. Retrieval practice is the more generic term, but it is plausible that both generation effects and testing effects benefit memory as a result of a common mechanism. Whether the reason for the memory enhancement is that retrieval practice entails a particular optimal amount of difficulty, or because the person accords preferential treatment and perhaps attentional resources to materials and events in which the self is actively engaged, is unknown. Whatever the cause, the memory-enhancing effects of generating are robust. The generation effect was first detailed by Slamecka and Graf (1978), in a paper that compared memory for words that were generated by the participants themselves to words that were simply presented and had to be read. Memory performance in the generate condition was superior, leading to the idea that self-generation was a memory tonic. Bertsch, Pesta, Wiscott, and McDaniel (2007) conducted a meta-analysis of studies on the generation effect from the nearly three decades since this original paper. They investigated 86 studies that included 445 measures and a total of 17,711 participants. 
The generation superiority resulted whether recognition, cued recall, or free recall was the criterion measure, whether learning was intentional or incidental, whether the design was between participants or within participants, and whether the lists were blocked or random. Generation resulted in better memory for both older adults and younger adults. It resulted in better memory when the materials were numbers or words, though the beneficial effect was smaller when the materials were nonwords, and, alone among manipulations, anagrams showed no effect of generation. There was a beneficial effect of generating regardless of whether the generate rule was rhyme, association, category inclusion, sentence completion, calculation, synonym generation, word fragment completion, antonym generation, or letter rearrangement, and whether the participants were asked to generate a whole target or only part of a target. The generation effect occurred when the lists were short, medium, or long. It obtained at all test delays. And finally, it occurred when the generation task was
easy, of moderate difficulty, or difficult. Indeed, difficulty seemed to have no impact. Thus, if what self-generation is doing is inducing difficulty, it is a desirable difficulty. But it is not clear that the benefits of self-generated retrieval are due to the difficulty rather than some other aspect of the task, especially since the difficulty of the generation task itself had no effect. There is no dispute, however, about the robustness of the data. In an interesting variation on the theme of generation effects, Kornell, Hays, and Bjork (2009) showed that a memory facilitation of the correct answer resulted from attempted retrieval even when the person generated a wrong answer. There was, of course, a caveat in this study: The person needed to be given feedback. The benefits of feedback will be detailed shortly. In this case, it was a necessity. There is now abundant evidence that test taking, like generation, results in memory benefits (Butler & Roediger, 2007; McDaniel, Anderson, Derbish, & Morrisette, 2007; McDaniel & Fisher, 1991; McDaniel, Roediger, & McDermott, 2007; Roediger & Karpicke, 2006a, 2006b). Although the testing effect applies not only to recall tests but also to recognition tests (though not so robustly), larger effects of testing are seen in recall testing, probably because people more reliably generate in that situation. Although there has been little deep theoretical understanding or even speculation concerning the reasons for either the generation effect or the testing effect, it seems likely that both have something to do with the fact that the self is actively involved, rather than because of some optimal degree of difficulty.

Feedback The second way in which learning can be made difficult in what is thought to be a desirable way, within the desirable difficulties framework, is by reducing feedback (Bjork & Linn, 2006). Unlike the use of generation or testing, in which there was agreement between the desirable difficulties and the region of proximal learning framework, in the case of feedback, the two frameworks differ. Providing feedback, especially for incorrect responses, but also for correct responses that may not yet be fully mastered and which the person may not realize he or she has gotten correct, is recommended in the region of proximal learning framework. Reducing feedback in such cases is seen as a lost opportunity for learning: not desirable. There are several cases, in the motor learning literature, where not getting (or reducing) a certain kind of feedback, called knowledge of results (KR), benefited performance (Winstein & Schmidt, 1990; Wulf &
Schmidt, 1989). It is not clear what to make of these results. KR consists of seeing a line graph of one’s own movement together with a graph of the correct movement, immediately after having tried to enact the correct movement. It is not entirely clear whether KR is like correct/incorrect feedback (which has no beneficial effect in verbal memory tasks) or corrective feedback (which has consistently shown benefits to memory), or, indeed, whether it is simply a distraction. It is also not clear whether results about motor movements generalize to verbal tasks, regardless of the type of feedback. Despite these findings in the motor learning literature indicating that reducing feedback may sometimes help, I have been unable to find any cases where decreasing or eliminating corrective feedback helps verbal learning. There is, by contrast, a large literature on the beneficial effects of feedback (Anderson, Kulhavy, & Andre, 1971; Butler, Karpicke, & Roediger, 2007, 2008; Butler & Roediger, 2008; Kulik & Kulik, 1988; Lhyle & Kulhavy, 1987; Metcalfe & Kornell, 2007; Metcalfe, Kornell, & Son, 2007; Pashler, Cepeda, Wixted, & Rohrer, 2005). When memory for the correct answer is compared on what were previously incorrect responses that have been given corrective feedback to those that have been given no feedback, the former are always better than the latter. For example, Metcalfe and Finn (accepted) investigated the correction of incorrect responses to general information questions. Final performance without feedback was on the floor, at around 4%. With corrective feedback, however, performance increased, sometimes by as much as 75%. In Butler et al.’s (2008) experiments, with items not given corrective feedback, later test performance was .41 in one experiment and .47 in another (reflecting the fact that many of the initial answers were initially correct, of course). When participants were given feedback, performance increased to .87 and .83, respectively. Pashler et al. 
(2005) have shown that feedback needs to be corrective to be effective: The correct answer needs to be given, not just a statement of whether the answer was correct or not. Simply telling people that they were wrong is rarely sufficient to allow them to correct the error. Nor is telling them that they were right enough to make them more right the next time. The one exception to this general rule occurs when people do not know that they were correct. Although there is usually a very high correlation between responses on one test and responses on the next, in the case of low-confidence correct responses, people often fail to produce the correct response on the subsequent trial unless they are told that it was correct on the first test (Butler et al., 2008). This is a special case in which mere correct/incorrect feedback has an informative effect. Providing
correct/incorrect feedback followed not by the correct answer itself alone, but by the chance to make a second guess on a multiple choice test (that included the correct answer, of course), also resulted in some improvement in performance (Metcalfe & Finn, accepted). Usually, though, when people are wrong they are wrong because they do not know the answer. Unless told what the answer is, they flounder. Although there is broad consensus that feedback needs to be given to enhance memory, when the feedback needs to be given is more controversial. There are a number of cases in which corrective feedback that is given at a delay is more effective at improving memory than is feedback given immediately. For example, Guzman-Muñoz and Johnson (2007) showed, in a study investigating memory for geographical representations, that delayed feedback resulted in a more laborious acquisition, but better eventual retention, than did immediate feedback, on a test of the location of individual places. This entire pattern fits with the desirable difficulties framework very nicely. However, the delayed feedback was also more informative than the immediate feedback, in this study, because it involved seeing an entire map (including the relations among to-be-learned places), rather than just the location of individual places. As the authors themselves suggested, the result might have been obtained not because of the delayed feedback per se, but rather because the beneficial configural information was more salient in the delayed than in the immediate feedback condition. Schooler and Anderson (1990) also found an advantage for delayed feedback in a study on learning of a computer language, LISP. In their study, learners who were given immediate feedback went through the original learning trials more quickly than students given delayed feedback. Those given immediate feedback were slower on the criterion test problems, however, and made more errors. 
The authors suggested that the detrimental effect of the immediate feedback, in the situation they studied, resulted because the feedback competed for the working memory resources that were needed to accomplish the task itself, resulting in a disruption of compilation, within the ACT theoretical framework. However, a number of experiments (see Butler et al., 2007; Kulhavy, 1977; Kulhavy & Anderson, 1972) indicate that even disregarding the benefit that the delayed feedback may provide in allowing a coherent overview of the to-be-learned materials, or the disruptive effect that immediate feedback might have because it distracts from the task at hand, delaying feedback is often beneficial. As Kulhavy and Anderson have noted, studies conducted in the laboratory have tended to show a delayed feedback advantage. However, many of these studies controlled total time. They thereby forced the last presentation of the correct
answer to be closer to the test in the delayed than in the immediate feedback condition. This is clearly an advantage, because a shorter retention interval ensues. However, even controlling for the lag to test, advantages for delayed feedback are sometimes found. Controlling for lag to test, Metcalfe, Kornell, and Finn (2009) found that delayed feedback resulted in better vocabulary learning with grade school children. The delayed feedback advantage was not significant with college students, however. This difference in the results might have come about because children are different from adults, of course. However, the overall level of performance was also higher for the grade school children than for the adults (the materials, of course, were different). It is possible that whether immediate or delayed feedback is more effective depends on the person, the person’s age, the level of recall, and the number of errors. These possibilities need to be explored further, since the determination of when it is optimal to give feedback may be a factor having a considerable impact on memory. These results, overall, suggest that introducing a delay in feedback may, at least under some circumstances, be a desirable difficulty. Finally, the region of proximal learning model points to the fact that feedback is particularly important for certain items that are almost learned. These are the items that are supposedly in the person’s region of proximal learning, and are also items that Butterfield and Mangels (2003) have called metacognitive mismatch items. These are the items on which the person thought he or she was right, but in fact was wrong (high-confidence errors), or on which he or she thought he or she was wrong, but in fact was right (low-confidence corrects). 
Butterfield and Metcalfe (2001, 2006), Butterfield and Mangels (2003), Fazio and Marsh (2009), and Metcalfe and Finn (accepted) have all shown that these high-confidence errors are very easily corrected; indeed, they are hypercorrected provided that the participant is given corrective feedback. A small amount of feedback converts a highly confident error into a well-entrenched correct response. Similarly, Butler et al. (2007) have shown that the other kind of metacognitive mismatch, the low-confidence correct responses, will often show up wrong on a subsequent test. This tendency, though, is easily converted to stable correct responding when those initially low-confidence correct responses are bolstered by feedback.

Spacing

Both the desirable difficulties and the region of proximal learning frameworks agree on the benefit of spaced practice over massed practice. Interestingly, though, both frameworks also agree that there are some particular conditions under which massed practice is indicated. The Landauer and Bjork (1978) model of expanding retrieval practice recommends that at the beginning of learning, repetitions of a to-be-learned pair should be close together in time, to ensure that when the cue is presented alone, the target will be correctly retrieved. Then the spacing can be increased somewhat, as long as the target continues to be correctly retrieved. As learning becomes more solid, the spacing can be further increased. The limiting factor in this model is that, unless feedback is given, the answers participants generate need to be correct for memory to be strengthened. The reason the desirable difficulties framework advocates spacing (but with the caveat that retrieval must be correct) is that spaced retrieval is more difficult than massed retrieval, and so it helps memory performance more. Expanding retrieval practice, rather than simply maximizing spacing, is thus advocated (Landauer & Bjork, 1978; Schacter, Rich, & Stamp, 1985; Storm, Bjork, & Storm, 2010; cf. Balota, Duchek, Sergent-Marshall, & Roediger, 2006; Carpenter & DeLosh, 2005). The region of proximal learning framework also advocates spaced practice, in general, except when full learning or full encoding has not occurred on the first practice. Then the region of proximal learning framework proposes that the learner should stay with the current item until encoding is complete, rather than flitting to other items. The concern in this model, and the reason that continued study on a single item is sometimes advocated, is with learning/encoding/comprehension at the time of study. The spacing predictions in the region of proximal learning framework result from its stop rule, which determines when the person should perseverate and when he or she should stop studying one item and turn to another.
The rule in the model is that the person should turn to another study item when the learning on the first item is no longer showing progress. This is given by the learner’s judgment of the rate of learning (jROL). When this rate approaches zero, learning on the current item is no longer productive, and the learner should turn to other items. This lack of learning may come about because the item being studied has already been sufficiently learned (or in the case of very difficult items, because the item is intractable and no progress is in evidence). Such a stop rule fits well with the idea that some items can be quickly and easily learned, or at least learned to the point that no further efforts will be beneficial at time t. Other items take more time to reach a learning asymptote. Indeed, a pervasive finding in the literature is that people study items with low judgments of learning (JOLs; i.e., the subjectively difficult items) longer than items with high JOLs (the

subjectively easy items; see Son & Metcalfe, 2000, for a summary of the literature on this pervasive correlation, and see Dunlosky & Metcalfe, 2009, for discussion of these issues). In the case that the person’s jROL has reached an asymptote and no further learning is occurring, the person should turn to other items that may be more yielding, rather than continuing to study an item that has already been learned to the maximum possible at that moment and for which no further learning gain can be expected, at least immediately. This stop rule results in the spacing prediction. Recently, Metcalfe and Jacobs (2010) have proposed an analogy between the strategy that results from the stop rule in the region of proximal learning framework and animal foraging. The analogy may be useful in allowing us to think about massed and spaced practice. The idea is that a person studying an item is like a hummingbird or a bee, say, taking nectar from a flower. The person should stay until he or she has extracted as much usable memory information out of the item as possible as the bird or bee should stay until it has extracted as much nectar out of the flower as possible—only then turning to another item or another flower. Once the source has been depleted, it is a good idea to look elsewhere since there will be no further (immediate) gain from staying on an item or flower whose nectar is depleted. But the nectar will replenish, and it will be advantageous to come back to that flower then. How long does it take to learn or understand the items (or to take the nectar offering)? That will, of course, depend upon the items, and their difficulty or complexity (or the richness of the source). Whether spacing or massing (which is continuing to stay with the current item) is more advantageous depends on where the learner is in the process of nectar extraction. In experimental situations this will depend upon the presentation rate. 
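The stop rule lends itself to a small illustration. The Python sketch below is not the authors' model. It assumes, purely for illustration, that the learner's judged level of learning is available as a list of numbers between 0 and 1 (one per study step), and that the jROL can be approximated by the average of the most recent gains. The names `should_switch` and `study_until_plateau`, the `threshold`, and the `window` are all hypothetical.

```python
def should_switch(gains, threshold=0.01, window=2):
    """Return True when the average of the last `window` gains is at or below `threshold`.

    `gains` is a list of step-to-step changes in judged learning; the averaged
    recent gain stands in for the judgment of the rate of learning (jROL).
    """
    if len(gains) < window:
        return False  # not enough evidence yet to judge the rate of learning
    recent = gains[-window:]
    return sum(recent) / window <= threshold


def study_until_plateau(learning_curve, threshold=0.01, window=2):
    """Return the number of study steps spent on one item before the stop rule fires.

    `learning_curve` holds the judged level of learning before study (index 0)
    and after each study step. If the rule never fires, all steps are used.
    """
    gains = []
    for step in range(1, len(learning_curve)):
        gains.append(learning_curve[step] - learning_curve[step - 1])
        if should_switch(gains, threshold, window):
            return step
    return len(learning_curve) - 1


# Hypothetical learning curves, invented for illustration only.
easy_item = [0.0, 0.8, 0.9, 0.9, 0.9]          # learned quickly, then flat
intractable_item = [0.0, 0.0, 0.0, 0.0]        # no progress at all
yielding_item = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]  # still improving

print(study_until_plateau(easy_item))         # 4
print(study_until_plateau(intractable_item))  # 2
print(study_until_plateau(yielding_item))     # 5
```

Under these assumptions, an easily learned item and an intractable item are both abandoned once the recent gain falls to zero (the first because it has reached asymptote, the second because no progress is in evidence), whereas an item that is still yielding gains continues to be studied.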
For verbal paired-associate learning, people may not, within 1/2 s, have encoded what they need to form a deep and useful encoding that will help their later memory, especially if the materials are relatively difficult. But most likely they will have done so by the time they have studied the pair for 5 s, depending upon the materials, of course. If there are mediators in the materials, but they are difficult to discern (such as in a Spanish–English pair, e.g., vodevil–music hall), then massing practice at fast presentation rates may allow learners the time needed to pick up on the mediators that they need to comprehend the meaning. Under such conditions, massed practice may produce an advantage over spaced practice. Metcalfe and Kornell (2005) found that massing was more effective than spacing with medium-difficulty Spanish–English pairs such as that given above, but only if the presentation time was very short. Otherwise, with either easier materials or a slower presentation rate, spacing resulted in superior memory. Notably, Nancy Waugh (1970), in one of the first systematic studies of the spacing effect, reported the same presentation rate interaction: Spacing was advantageous only at slow presentation rates. The desirable difficulties perspective, as captured in the Landauer and Bjork (1978) expanding retrieval practice model, recommends massing practice of poorly learned items so that they will be successfully retrieved. Only then should they be spaced. The region of proximal learning framework also recommends massing under some specific conditions, where initial learning or encoding has not been completed during the first presentation.
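The expanding retrieval logic summarized above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the doubling factor, the reset to the shortest gap after a failed retrieval, and the function names `next_gap` and `schedule` are hypothetical choices, not parameters taken from Landauer and Bjork (1978).

```python
def next_gap(current_gap, recalled, growth=2, min_gap=1):
    """Return the gap (number of intervening items) to use before the next retrieval.

    Assumption: a successful retrieval multiplies the gap by `growth`;
    a failed retrieval resets it to `min_gap` (massing until retrieval succeeds).
    """
    if recalled:
        return current_gap * growth
    return min_gap


def schedule(outcomes, growth=2, min_gap=1):
    """Return the gap that preceded each retrieval attempt in `outcomes`.

    `outcomes` is a list of booleans, True when the target was correctly retrieved.
    """
    gaps = []
    gap = min_gap
    for recalled in outcomes:
        gaps.append(gap)
        gap = next_gap(gap, recalled, growth, min_gap)
    return gaps


# Hypothetical sequence: three successes, one failure, then a success.
print(schedule([True, True, True, False, True]))  # [1, 2, 4, 8, 1]
```

The reset after a failure reflects the caveat in the text that, without feedback, only correct retrievals strengthen memory; a schedule that kept expanding after a failure would risk further unsuccessful attempts.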

Interleaving

The last of the difficulties thought to be desirable is interleaving. The desirable difficulties framework recommendation is that, like spacing, interleaving may make study more difficult for the learner initially, but that difficulty will produce better long-term learning. Note, however, that Bjork (2006), in his Psychology of Learning and Teaching Conference keynote address, said, “Desirable difficulties are desirable because responding to them successfully engages processes that support learning, comprehension, and remembering.” Despite the fact that interleaving is usually cited as a desirable difficulty, if comprehension is impaired by interleaving, interleaving may be undesirable. Interleaving operates at a superordinate level of analysis rather than at the item level, and so is slightly different from spacing. A coherent structure of some sort is assumed to be fragmented by interleaving. If there were multiple members of the same category, for example, and several different categories to be learned, the question is whether it is better to stay within one category and complete study on, say, all of the category members before turning to the members of another category (blocking practice), or to switch and alternate among categories (or interleave). Similarly, interleaving could apply to stories and narratives. Is it better to complete one story before turning to the next and the one after that (massing practice of each), or to read a bit of one, then turn to the next and the one after that, reading bits of each, before coming back to the first one? The answer given by the region of proximal learning framework is that it depends upon whether the second and third category exemplars (or story events) are providing new information, and hence promoting learning, or whether they are redundant with the exemplars that have just been presented, and therefore offer no immediate opportunity for learning. The same rules apply as applied for whether massed or spaced practice would be more advantageous. An early example of learning that was either blocked (massed) or interleaved was given by Kurtz and Hovland (1956) using category exemplars that were geometric patterns that varied on four dichotomous properties. The participants had to learn the categories, and these categories were determined by rules. After each exemplar was presented, participants were told the category name, and from these they were to infer the rules. The results showed that this rule-determined categorization was better (though only marginally) when the exemplars from the same concept were presented in succession (blocked), rather than being interleaved. Presumably, blocking allowed the participants to infer the complex rule that determined category membership across exemplars. Having alternate categories interleaved was a distraction. A similar modern experiment showed the same results, with rule-driven artificial categories (Garcia, Kornell, & Bjork, 2009). Along similar lines, Gagne (1950) showed that learning was faster and fewer errors were made when three complex stimuli that were highly similar to one another in a list of 12 items (composed of four groups of such highly confusable items), but that had to be discriminated, were grouped together rather than interleaved. The grouping allowed the confusions between similar items to become more evident, and the learning to proceed at a more rapid pace. In these studies, blocking rather than interleaving was advantageous. Blocking rather than interleaving was also advantageous in narrative learning. In such cases, interleaving may be harmful because it interferes with rather than supports deep comprehension. Mandler and DeForest (1979) presented adults and children with two stories in either a blocked or interleaved fashion.
All of their participants were able to recall the stories in their canonical form, even if they had been presented in the interleaved form. They had difficulty maintaining or remembering the interleaved order, suggesting that the canonical order was more natural, and providing support for the idea that a grammarlike story schema is used in text comprehension. When changes were made in the presentation order of such a canonical structure, even with textual markers to indicate the correct placement of the displaced events, recall errors resulted (Mandler & Goodman, 1982). This finding, again, points to the idea that a story structure may be part of the basis of text understanding, and that interleaving may hurt comprehension. Finally, when Mandler (1978) directly compared both the quantity and quality of recall in the same two stories that had either been presented in their canonical form or interleaved with one another, it was found

that interleaving hurt memory. Presumably, the availability of the story schema, and the coordination of that schema with the incoming information in a coherent way, was important for the deep comprehension that facilitated later memory. The memory advantage to presenting the stories, all of a piece in their canonical form, rather than interleaving them, was greater for children participants than for adults. In contrast to the above findings, some studies have shown that interleaving has beneficial effects, even in induction situations. Kornell and Bjork (2008) expected interleaving to hamper induction by obscuring the commonalities or structure that defines a concept or category. They presented participants multiple paintings by different artists along with the artists’ names. The paintings of a single artist were presented either consecutively (massed) or interleaved with other artists’ paintings. When asked which artist painted each of a series of new paintings, the authors found that performance was better in the interleaved than in the massed condition. Older adults showed the same result as did younger adults (Kornell, Castel, Eich, & Bjork, 2010). These findings are surprising. They run counter to the authors’ own intuitions that, in this situation, massing should have been better for induction than interleaving. The kind of inferential processes needed in induction, intuitively, seemed to require that the exemplars be directly compared to one another, as could only occur with massed presentation. Comprehension of the category structure would seem to depend upon it. Even more interesting than the authors’ mistaken intuitions is the fact that the participants themselves, in both experiments, thought that their own performance was better in the massed condition. They thought this even after having completed the experiment! 
It would seem, then, that whether it is better to mass practice or to interleave is a question that is better left to empirical research than to intuition. Our logical arguments, and our intuitions, both prospectively and retrospectively, as experimenters and even as participants, may provide the wrong answers and, in this situation, are not to be trusted.
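To make the two presentation orders concrete, the following Python sketch builds a blocked and an interleaved sequence from the same set of exemplars. The category labels, exemplar names, and function names are hypothetical, and the strict round-robin used for interleaving is an assumption; the studies cited above used their own designs.

```python
from itertools import chain, zip_longest


def blocked_order(categories):
    """Present all exemplars of one category before moving to the next."""
    return list(chain.from_iterable(categories.values()))


def interleaved_order(categories):
    """Alternate among categories, taking one exemplar from each in turn."""
    missing = object()  # placeholder for categories that run out of exemplars
    rounds = zip_longest(*categories.values(), fillvalue=missing)
    return [item for one_round in rounds for item in one_round if item is not missing]


# Hypothetical materials: three categories with unequal numbers of exemplars.
materials = {
    "A": ["A1", "A2", "A3"],
    "B": ["B1", "B2"],
    "C": ["C1", "C2", "C3"],
}

print(blocked_order(materials))
# ['A1', 'A2', 'A3', 'B1', 'B2', 'C1', 'C2', 'C3']
print(interleaved_order(materials))
# ['A1', 'B1', 'C1', 'A2', 'B2', 'C2', 'A3', 'C3']
```

Both orders contain exactly the same items; only the adjacency of same-category exemplars differs, which is the variable at issue in the blocked versus interleaved comparisons discussed in this section.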

Conclusion

The region of proximal learning framework proposes that there is a level of difficulty that is desirable. It is a level that is just beyond what the person has mastered well. At this particular level of difficulty there is a window of opportunity for learning. When a person studies materials that are in this range, especially if the study itself involves generation and self-testing, and especially if the presentation is spaced, a small investment of time and effort can pay off in large learning gains.

These gains can be amplified with feedback. Furthermore, it is suggested that when the person studies in this way, and sees the rewards of these efforts, it may set up a pattern of success in which learning itself becomes intrinsically rewarding and pleasurable. The proposal, here, is that both the region of proximal learning framework and the desirable difficulties perspective, despite using different terminology, agree that this region—a region that is challenging (i.e., desirably difficult) but not too challenging—is the “just right” level of difficulty. Directing effective learning methods at this particular level of difficulty results in labor with gain rather than labor in vain.

References

Anderson, R. C., Kulhavy, R. W., & Andre, T. (1971). Feedback procedures in programmed instruction. Journal of Educational Psychology, 62, 148–156.
Atkinson, R. C. (1972). Optimizing the learning of a second-language vocabulary. Journal of Experimental Psychology, 96, 124–129.
Balota, D. A., Duchek, J. M., Sergent-Marshall, S. D., & Roediger, H. L. (2006). Does expanding retrieval produce benefits over equal interval spacing? Exploration of spacing effects in healthy aging and early Alzheimer’s disease. Psychology and Aging, 21, 19–31.
Benjamin, A. S., Bjork, R. A., & Schwartz, B. L. (1998). The mismeasure of memory: When retrieval fluency is misleading as a metamnemonic index. Journal of Experimental Psychology: General, 127, 55–68.
Bertsch, S., Pesta, B. J., Wiscott, R., & McDaniel, M. A. (2007). The generation effect: A meta-analytic review. Memory & Cognition, 35, 201–210.
Bjork, E. L., & Bjork, R. A. (2011). Making things hard on yourself, but in a good way: Creating desirable difficulties to enhance learning. In M. A. Gernsbacher, R. W. Pew, & J. R. Pomerantz (Eds.), Psychology and the real world: Essays illustrating fundamental contributions to society. New York: Worth Publishers.
Bjork, R. A. (1988). Retrieval practice and the maintenance of knowledge. In M. M. Gruneberg & P. E. Morris (Eds.), Practical aspects of memory: Current research and issues: Memory in everyday life (Vol. 1, pp. 396–410). New York: John Wiley & Sons.
Bjork, R. A. (1992). Interference and forgetting. In L. R. Squire (Ed.), Encyclopedia of learning and memory (pp. 283–288). New York: Macmillan.
Bjork, R. A. (2006). Making things hard on yourself: Desirable difficulties in theory and practice. Keynote address of the UK Higher Education Academy Psychology Network, Psychology of Learning and Teaching Conference, York, UK, 2006.
Bjork, R. A., & Bjork, E. L. (2006). Optimizing treatment and instruction: Implications of a new theory of disuse. In L.-G. Nilsson & N. Ohta (Eds.), Memory and society: Psychological perspectives (pp. 109–133). New York: Psychology Press.
Bjork, R. A., & Linn, M. C. (2006). The science of learning and the learning of science: Introducing desirable difficulties. Association of Psychological Science Observer, 19, 29.
Butler, A. C., Karpicke, J. D., & Roediger, H. L., III (2007). The effect of type and timing of feedback on learning from multiple-choice tests. Journal of Experimental Psychology: Applied, 13, 273–281.
Butler, A. C., Karpicke, J. D., & Roediger, H. L., III (2008). Correcting a metacognitive error: Feedback enhances retention of low confidence correct responses. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34, 918–928.
Butler, A. C., & Roediger, H. L., III (2007). Testing improves long-term retention in a simulated classroom setting. European Journal of Cognitive Psychology, 19, 514–527.
Butler, A. C., & Roediger, H. L., III (2008). Feedback enhances the positive effects and reduces the negative effects of multiple-choice testing. Memory & Cognition, 36, 604–616.
Butterfield, B., & Mangels, J. A. (2003). Neural correlates of error detection and correction in a semantic retrieval task. Cognitive Brain Research, 17, 793–817.
Butterfield, B., & Metcalfe, J. (2001). Errors committed with high confidence are hypercorrected. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 1491–1494.
Butterfield, B., & Metcalfe, J. (2006). The correction of errors committed with high confidence. Metacognition and Learning, 1, 1556–1623.
Carpenter, S. K., & DeLosh, E. L. (2005). Application of the testing and spacing effects to name learning. Applied Cognitive Psychology, 19, 619–636.
Dunlosky, J., & Metcalfe, J. (2009). Metacognition. Thousand Oaks, CA: Sage Publications.
Fazio, L. K., & Marsh, E. J. (2009). Surprising feedback improves later memory. Psychonomic Bulletin and Review, 16, 88–92.
Gagne, R. (1950). The effect of sequence of presentation of similar items on the learning of paired associates. Journal of Experimental Psychology, 40, 61–73.
Garcia, M. A., Kornell, N., & Bjork, R. A. (2009, November 19). Difficult rule-based category learning benefits from massed practice. Poster presented at the 50th Annual Meeting of the Psychonomic Society, Boston, MA.
Guzman-Muñoz, F. J., & Johnson, A. (2007). Error feedback and the acquisition of geographical representations. Applied Cognitive Psychology, 22, 979–995.
Kornell, N., & Bjork, R. A. (2008). Learning concepts and categories: Is spacing the “enemy of induction”? Psychological Science, 19, 585–592.
Kornell, N., Castel, A. D., Eich, T. S., & Bjork, R. A. (2010). Spacing as the friend of both memory and induction in younger and older adults. Psychology and Aging, 25, 498–503.
Kornell, N., Hays, M. J., & Bjork, R. A. (2009). Unsuccessful retrieval attempts enhance subsequent learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35, 989–998.
Kornell, N., & Metcalfe, J. (2006). Study efficacy and the region of proximal learning framework. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 609–622.
Kulhavy, R. W. (1977). Feedback in written instruction. Review of Educational Research, 47, 211–232.
Kulhavy, R. W., & Anderson, R. C. (1972). Delay-retention effect with multiple-choice tests. Journal of Educational Psychology, 63, 505–512.
Kulik, J. A., & Kulik, C. L. C. (1988). Timing of feedback and verbal learning. Review of Educational Research, 58, 79–97.
Kurtz, K. H., & Hovland, C. I. (1956). Concept learning with differing sequences of instances. Journal of Experimental Psychology, 51, 239–243.
Landauer, T. K., & Bjork, R. A. (1978). Optimal rehearsal patterns and name learning. In M. M. Gruneberg, P. E. Morris, & R. N. Sykes (Eds.), Practical aspects of memory (pp. 625–632). London: Academic Press.
Lhyle, K. G., & Kulhavy, R. W. (1987). Feedback processing and error correction. Journal of Educational Psychology, 79, 320–322.
Mandler, J. M. (1978). A code in the node: The use of a story schema in retrieval. Discourse Processes, 1, 14–35.
Mandler, J. M., & DeForest, M. (1979). Is there more than one way to recall a story? Child Development, 50, 886–889.
Mandler, J. M., & Goodman, M. S. (1982). On the psychological validity of story structure. Journal of Verbal Learning and Verbal Behavior, 21, 507–523.
McDaniel, M. A., Anderson, J. L., Derbish, M. H., & Morrisette, N. (2007). Testing the testing effect in the classroom. European Journal of Cognitive Psychology, 19, 494–513.
McDaniel, M. A., & Fisher, R. P. (1991). Test and test feedback as learning sources. Contemporary Educational Psychology, 16, 192–201.
McDaniel, M. A., Roediger, H. L., III, & McDermott, K. B. (2007). Generalizing test-enhanced learning from the laboratory to the classroom. Psychonomic Bulletin and Review, 14, 200–206.
Metcalfe, J. (2002). Is study time allocated selectively to a region of proximal learning? Journal of Experimental Psychology: General, 131, 349–363.
Metcalfe, J. (2009). Metacognitive judgments and control of study. Current Directions in Psychological Science, 18, 159–163.
Metcalfe, J., & Finn, B. (accepted). Hypercorrection of high confidence errors: Did they know it all along? Journal of Experimental Psychology: Learning, Memory, and Cognition.
Metcalfe, J., & Jacobs, W. J. (2010). People’s study time allocation and its relation to animal foraging. Behavioral Processes, 83, 213–221.
Metcalfe, J., & Kornell, N. (2005). A region of proximal learning model of study time allocation. Journal of Memory and Language, 52, 463–477.
Metcalfe, J., & Kornell, N. (2007). Principles of cognitive science in education: The effects of generation, errors and feedback. Psychonomic Bulletin and Review, 14, 225–229.
Metcalfe, J., Kornell, N., & Finn, B. (2009). Delayed versus immediate feedback in children’s and adults’ vocabulary learning. Memory & Cognition, 37, 1077–1087.
Metcalfe, J., Kornell, N., & Son, L. K. (2007). A cognitive-science based program to enhance study efficacy in a high and low-risk setting. European Journal of Cognitive Psychology, 19, 743–768.
Nelson, T. O., & Leonesio, R. J. (1988). Allocation of self-paced study time and the “labor-in-vain effect.” Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 676–686.
Pashler, H., Cepeda, N. J., Wixted, J. T., & Rohrer, D. (2005). When does feedback facilitate learning of words? Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 3–8.
Piaget, J. (1952). The origins of intelligence in children. New York: International Universities Press.
Roediger, H. L., & Karpicke, J. D. (2006a). The power of testing memory: Basic research and implications for educational practice. Perspectives on Psychological Science, 181–210.
Roediger, H. L., & Karpicke, J. D. (2006b). Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science, 17, 249–255.
Schacter, D. L., Rich, S. A., & Stamp, M. S. (1985). Remediation of memory disorders: Experimental evaluation of the spaced retrieval technique. Journal of Clinical and Experimental Neuropsychology, 7, 79–96.
Schooler, L. J., & Anderson, J. R. (1990). The disruptive potential of immediate feedback. Proceedings of the 12th Annual Conference of the Cognitive Science Society, 12, 702–708.
Slamecka, N. J., & Graf, P. (1978). The generation effect: Delineation of a phenomenon. Journal of Experimental Psychology: Human Learning and Memory, 4, 592–604.
Son, L. K., & Metcalfe, J. (2000). Metacognitive and control strategies in study-time allocation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 204–221.
Storm, B. C., Bjork, R. A., & Storm, J. C. (2010). Optimizing retrieval as a learning event: When and why expanding retrieval practice enhances long-term retention. Memory & Cognition, 38, 244–253.
Vygotsky, L. S. (1987). The collected works of L. S. Vygotsky: Problems of general psychology including the volume thinking and speech (R. W. Rieber & A. S. Carton, Eds., Vol. 1). New York: Plenum Press.
Waugh, N. C. (1970). On the effective duration of a repeated word. Journal of Verbal Learning and Verbal Behavior, 16, 465–478.
Winstein, C. J., & Schmidt, R. A. (1990). Reduced frequency of knowledge of results enhances motor skill learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 677–691.
Wulf, G., & Schmidt, R. A. (1989). The learning of generalized motor programs: Reducing the relative frequency of knowledge of results enhances memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 748–757.

14

Data Entry: A Window to Principles of Training

Alice F. Healy, James A. Kole, Erica L. Wohldmann, Carolyn J. Buck-Gengler, and Lyle E. Bourne Jr.

In this chapter, we review studies aimed to reveal principles of training, which allow us to understand what factors influence the efficiency or speed of training, its durability or retention across long delay intervals, and its flexibility or transferability to other contexts or situations beyond those occurring during training itself. Note that our emphasis is on training for performance in well-defined tasks with a major skill component, which can be contrasted with research that addresses more general educational issues. In some of our early training studies, influenced by the important article of Schmidt and Bjork (1992), we and others have discovered that training that minimizes the time to acquire knowledge or skills may be detrimental to long-term retention (e.g., Battig, 1972, 1979; Healy & Sinclair, 1996; Schneider, Healy, & Bourne, 1998, 2002; Schneider, Healy, Ericsson, & Bourne, 1995; Shea & Morgan, 1979). Likewise, in subsequent studies (e.g., Healy, Wohldmann, Parker, & Bourne, 2005; Healy, Wohldmann, Sutton, & Bourne, 2006), we have found that some training that maximizes long-term retention may severely limit transferability. Thus, our ultimate goal has been to provide guidelines to trainers that will optimize simultaneously training efficiency, durability, and flexibility. We have been using a variety of tasks to develop our training principles. However, in this chapter we concentrate on just those principles derived from a single simple task, data entry. In this task, subjects see multidigit numbers and type them on a computer console. Subjects cannot see their responses and receive no feedback on performance. Both typing accuracy and response time are measured.

Procedural Reinstatement Principle

In our first experiment using this task (Fendrich, Healy, & Bourne, 1991, Experiment 1), subjects were given two days of training. They typed lists of 10 three-digit numbers. Some lists were shown only once and some were repeated five times across training. One month later, subjects were given a retention test in which they entered old (i.e., practiced) and new (i.e., unpracticed) lists of digits. Also at the retention test, they gave a recognition rating for each digit list shown. Subjects’ response times decreased as training progressed but changed very little over the long retention interval, showing a high degree of durability for this skill. However, response times on the retention test were shorter for the old lists repeated five times than for both the new lists and the old lists repeated only once during training. Thus, by showing that what was learned was specific to the particular lists that were practiced, these results demonstrate the limited flexibility of the learned skill. Consistent with the specificity shown by the response times, subjects showed significant recognition memory for the digit lists repeated five times but not for those repeated only once one month earlier. In a follow-up experiment (Fendrich, Gesi, Healy, & Bourne, 1995, Experiment 1), subjects were presented with four-digit numbers on a computer screen. During training, in one condition the subjects simply read each number. In a second condition they entered the number using the numeric keypad on the side of the computer keyboard, and in a third condition they entered the number using the number row on the top of the computer keyboard. One week after training, subjects were given a test in which they entered old and new numbers using, in some cases, the keypad and, in other cases, the number row. After entering each number, subjects also made an old/new recognition decision.
They showed highest recognition for the numbers entered with the same key configuration (keypad or row) at test as at study. In fact, when numbers were entered with different key configurations at study and test, recognition memory was no better than when numbers were simply read at study. Together these experiments support what we have called the procedural reinstatement principle (Clawson, Healy, Ericsson, & Bourne, 2001; Healy, 2007; Healy et al., 1992; Healy, Fendrich, & Proctor, 1990; Healy, Wohldmann, & Bourne, 2005; Kolers & Roediger, 1984; McNamara & Healy, 1995, 2000), by which skill learning leads to durable retention when the required procedures are maintained, but limited transfer when the required procedures are altered. This principle of skill learning is related to similar principles developed for word list learning, including encoding specificity (Tulving & Thomson, 1973) and transfer-appropriate processing (McDaniel, Friedman, & Bourne, 1978; Morris, Bransford, & Franks, 1977; Roediger, Weldon, & Challis, 1989), although those earlier principles were based on declarative as opposed to procedural information (Anderson, 1983).

Depth of Processing Principle The advantage found in the previous study by Fendrich et al. (1991) for typing old numbers is known as repetition priming. The fact that repetition priming occurred over a one-month delay interval suggests that the four-digit numbers were durably represented in memory. In a subsequent experiment we sought to determine whether both the surface percept of the number and its abstract meaning were remembered and contributed to the repetition priming effect (Buck-Gengler & Healy, 2001, Experiment 1). To address this issue, we manipulated the surface percept of the four-digit numbers by varying their presentation format. One format involved presenting the numbers as numerals (which corresponded to the labels on the data entry keys), whereas the other format involved presenting the numbers as words. Subjects were trained on some numbers in the numeral format and on others in the word format, with a given number repeated five times during training always in the same format. At test one week later, a given number was presented in either the same or the alternate format. Half of the subjects used the keypad to enter the numbers during training and half used the number row. In each case, all subjects switched key configurations at test so that any repetition priming effect could not be attributed to the retention of the motoric component of the task. Also, there was no explicit recognition test, so that this procedure provided a relatively pure assessment of implicit memory (see Richardson-Klavehn & Bjork, 1988, for a discussion of differences between explicit and implicit measures of memory). We predicted faster response times for the old than for the new numbers at test, even though we eliminated the motoric component of repetition priming by switching the response mode from training to testing. Of greatest interest was whether we would find an effect of the presentation format on repetition priming. 
If old numbers were entered more quickly at test than new numbers only when there was a match between the presentation formats during training and testing, then there would
be evidence for a contribution of the surface percept to the repetition priming effect. Alternatively, if old numbers were entered more quickly at test than new numbers even when the presentation formats during training and testing did not match, then there would be evidence for a contribution of the abstract meaning of the percept to the repetition priming effect. We did find a repetition priming effect, with old numbers typed faster than new numbers at test. We also found an effect of test presentation format, with numbers presented as numerals typed faster than those presented as words. However, there was no overall difference for old numbers between those presented in the same and those presented in the alternate format at test as at training, suggesting that the advantage for old numbers must come from the underlying abstract numerical concept, rather than the surface percept. Further, for old numbers there was an interesting interaction of format at test (numeral, word) and format continuity (same, alternate) reflecting the fact that for both test presentation formats, numbers presented as words in training had an advantage over those presented as numerals (see Figure 14.1). We postulated that the superiority for training with words was due to the fact that the word format at training leads to a deeper level of processing than does the numeral format at training, presumably because the word format is farther from the abstract meaning of the number than is the numeral format. Thus, these results from data entry suggest that one of the long-standing principles of memory, the depth of processing principle, which was developed on the basis of studies of memory for word lists (involving primarily declarative learning; e.g., Craik & Lockhart, 1972; Lockhart & Craik, 1990), also applies to the training of simple skills (involving procedural as well as declarative learning). Processing stimuli more deeply during training improves the skill involved in responding to those stimuli after a long delay.

Figure 14.1 Total response time (in s) for old numbers at test as a function of format continuity (same, alternate) and format at test (numeral, word) in Buck-Gengler and Healy (2001, Experiment 1). N-N = numeral at training and at test; W-N = word at training, numeral at test; W-W = word at training and at test; N-W = numeral at training, word at test.

Phonological Processing Principle In the basic version of the number data entry experiment, subjects are allowed to use whatever means they wish to remember the numbers and type them, including subvocal or vocal phonological rehearsal, thus activating the phonological loop of working memory (Baddeley, 1992, 2007). In a subsequent experiment, we wanted to determine whether articulatory suppression, which would disrupt this means of rehearsal when typing the numbers, would thus alter performance on this task (Kole, Healy, & Buck-Gengler, 2005, Experiment 1). Specifically, this experiment was just like the last one except that half the subjects were in an articulatory suppression group in which they were required to repeat the word “the” continuously while they typed the digits in both sessions, and the remaining subjects were in a silent group in which they entered the digits silently, with no secondary articulatory suppression task. When we examined a measure of initiation time, that is, the time to enter the first digit of the four, we found a result similar to that obtained in the last experiment (Buck-Gengler & Healy, 2001), because for old numbers there was an advantage at test for numbers presented as words during training. Thus, whenever subjects practiced during training typing a given four-digit number presented as a sequence of words, one week later during the test they initiated trials faster than when they practiced during training typing the same number presented as a sequence of numerals, regardless of test presentation format (see Figure 14.2). Importantly, this advantage for numbers presented as words at training was significant for the silent group, but not for the articulatory suppression group (see Figure 14.3). This finding implies that at least some of the advantage for old numbers presented as words at training might be due to subvocalization or use of the phonological loop.

Figure 14.2 Initiation time (time to enter the first digit of the four, in s) for old numbers at test as a function of format continuity (same, alternate) and format at test (numeral, word) in Kole et al. (2005, Experiment 1). N-N = numeral at training and at test; W-N = word at training, numeral at test; W-W = word at training and at test; N-W = numeral at training, word at test.

In addition, initiation times yielded a surprising main effect of group. Subjects who completed the task in the articulatory suppression group initiated trials significantly faster at test than did subjects who
completed the task in the silent group. In contrast, when we examined execution time (which is the average time to type each of the second, third, and fourth digits) on all numbers at test, we found that times were faster for the silent group than for the articulatory suppression group. With this measure, the disadvantage for the articulatory suppression group was greater for words than for numerals (see Figure 14.4). Thus, we found a different pattern of results for the different response time measures. These contrasting patterns seem to imply that the response for the initial digit of the four-digit number can be based on visual input alone, without any phonological code, but the responses for the subsequent digits do seem to rely on phonological coding, presumably in order to maintain those digits in the phonological loop of working memory, even though all the digits remain visible on the screen until the subjects have completed their response. Again, results from the data entry task suggest that a long-standing principle derived from studies of memory for word lists applies to the training of a simple skill. By the phonological processing principle, disrupting phonological processing of stimuli hinders the skill involved in responding to those stimuli, but only when working memory is used to store the stimuli.
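
The two response time measures contrasted above can be made concrete with a short sketch. The Python function below is a hypothetical illustration, not the analysis code used in these experiments: the name `split_response_time`, its arguments, and the example timestamps are all assumptions, and it further assumes that the time to type each of the second, third, and fourth digits is the interval since the preceding keystroke.

```python
from statistics import mean


def split_response_time(key_times, stimulus_onset=0.0):
    """Split one four-digit trial into initiation and execution time.

    key_times: timestamps (in s) of the four digit keystrokes, in order.
    stimulus_onset: timestamp (in s) at which the number appeared.
    Both arguments are hypothetical; the chapter does not describe a data format.
    """
    if len(key_times) != 4:
        raise ValueError("expected exactly four digit keystrokes")
    # Initiation time: from stimulus onset to the first digit keystroke.
    initiation = key_times[0] - stimulus_onset
    # Execution time: mean interval for the second, third, and fourth digits
    # (assumed here to be measured from the preceding keystroke).
    intervals = [later - earlier for earlier, later in zip(key_times, key_times[1:])]
    execution = mean(intervals)
    return initiation, execution


# Made-up timestamps for illustration only, not data from the experiments.
initiation, execution = split_response_time([1.5, 2.0, 2.5, 3.0])
```

Keeping the two measures separate in this way is what allows an effect on the first keystroke to be distinguished from an effect on the later keystrokes.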

Figure 14.3 Initiation time (time to enter the first digit of the four, in s) for old numbers at test as a function of format continuity (same, alternate), group (articulatory suppression, silent), and format at test (numeral, word) in Kole et al. (2005, Experiment 1). AS = articulatory suppression; N-N = numeral at training and at test; W-N = word at training, numeral at test; W-W = word at training and at test; N-W = numeral at training, word at test.

Cognitive Antidote Principle In a later experiment (Healy, Kole, Buck-Gengler, & Bourne, 2004, Experiment 1), subjects saw 64 four-digit numbers repeated in different random orders five times over five blocks in the first half of the session and 64 different numbers repeated in different random orders five times over five blocks in the second half. They entered each number using the keypad ending with the “enter” key. With prolonged work on this simple task subjects’ errors increased across blocks of trials, whereas their total response time decreased (see Figure 14.5). Thus, there was a speed-accuracy trade-off (e.g., Pachella, 1974) that grew across blocks of trials. We hypothesized that skill acquisition and repetition priming together were responsible for the improvement in response time and that the decline in accuracy was due to two other factors. First, because subjects were not given any feedback on their responses, they might have lacked sufficient motivation to perform accurately. Second, because the task did not require much cognitive processing, subjects might have become bored or disengaged with the task as trials progressed (Hockey & Earle, 2006).

Figure 14.4 Execution time (average time to type each of the second, third, and fourth digits, in s) for all numbers at test as a function of group (articulatory suppression, silent) and format at test (numeral, word) in Kole et al. (2005, Experiment 1).

To test for the importance of each of these factors, we conducted a new experiment (Kole, Healy, & Bourne, 2008, Experiment 2), in which we evaluated the effects of two crossed variables: feedback and cognitive demands of the task. In a no-feedback condition, as in our previous experiments (e.g., Healy et al., 2004), no form of feedback was provided, and the subjects’ responses were not displayed. In contrast, both visual and auditory forms of feedback were provided in a feedback condition. First, the typed responses were displayed directly below the stimulus numbers. Second, if the subject made any error while typing the response (even if the error was corrected by the subject), the computer beeped after the “enter” key was pressed. To vary cognitive demands, we compared two tasks: data entry on its own and data entry coupled with mental multiplication (Bourne, Pauli, Fendrich, Rickard, & Healy, 2001; Rickard, Healy, & Bourne, 1994). Subjects typed the same four-digit numbers for each task, but the stimuli differed. For both tasks, the four-digit numbers were divided into two pairs of digits. In the data entry task, the stimulus numbers were equal to the pairs of answers to the multiplication problems seen in the mental multiplication task (e.g., data entry: 4 # 9, 1 # 6; mental multiplication: 7 × 7, 4 × 4), so the required response was the same. Each session half included 320 unique numbers divided into five blocks. The decline in accuracy across blocks was overcome simply by adding feedback (see Figure 14.6). In addition, accuracy decreased across blocks for data entry, but improved across blocks for mental multiplication (see Figure 14.7). Thus, both motivation to perform accurately (as reflected in the effects of feedback) and cognition (as reflected in the effects of a mental complication) seem to be involved in the decline in accuracy. Although the addition of mental multiplication to the data entry task mitigated the decline in accuracy due to boredom, there was a cost in terms of response time. Response time improved across blocks but was much higher for mental multiplication than for simple data entry (see Figure 14.8).

Figure 14.5 Proportion of errors and total response time (in s) as a function of session half and block in Healy et al. (2004, Experiment 1).

Figure 14.6 Proportion correct as a function of feedback (feedback, no feedback) and block in Kole et al. (2008, Experiment 2).

Figure 14.7 Proportion correct as a function of task (data entry, mental multiplication) and block in Kole et al. (2008, Experiment 2).

Figure 14.8 Total response time (in s) as a function of feedback (no feedback, feedback), task (mental multiplication, data entry), and block in Kole et al. (2008, Experiment 2).

We wanted to see whether we could find similar advantages for adding cognitive requirements at a smaller cost in response time. Hence, in our next experiment (Kole et al., 2008, Experiment 3), we added a cognitive requirement that was much more modest and less time-consuming. Specifically, we manipulated only the concluding keystroke made by subjects after they typed the four digits shown to them on a given trial. In our earlier experiments, the
concluding keystroke was always the “enter” key. In the present experiment, we compared four different conditions. In the alternating condition, the concluding keystroke simply alternated between + and –. In the relative magnitude condition, the concluding keystroke was + if the first two-digit number of the full four-digit stimulus was larger than the second two-digit number and – if it was smaller. In the + and – control conditions, subjects pressed either the + or – key to conclude every trial. Also, the sessions were extended to include three five-block sequences, for a total of 15 blocks and 960 unique numbers. There was a significant advantage for both the alternating and the relative magnitude experimental conditions over the + and – control conditions in terms of overall proportion correct on entering the four digits (not including the concluding keystroke) and significantly less decline in proportion correct across trials for the experimental conditions than for control conditions (see Figure 14.9). Response time (including the time for the concluding keystroke) improved across trials for all conditions. There was a significant disadvantage in response time for the experimental conditions relative to the control conditions, but the disadvantage was relatively small for the alternating condition (see Figure 14.10). Thus, with minimal cost in terms of response time, the additional cognitive requirements were effective at improving accuracy and removing the decline in accuracy across trials.
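
The four concluding-keystroke conditions described above amount to simple decision rules, which the following Python sketch spells out. It is a hypothetical illustration rather than the experimental software: the function name `concluding_key`, the condition labels, and the `trial_index` argument are assumptions. The sketch also assumes that the alternating sequence begins with +, and it raises an error when the two two-digit numbers are equal, because the text does not say how (or whether) such stimuli were handled.

```python
def concluding_key(condition, stimulus, trial_index=0):
    """Return the concluding keystroke ('+' or '-') for one trial.

    condition: 'alternating', 'relative_magnitude', 'control_plus', or
        'control_minus' (hypothetical labels for the four conditions).
    stimulus: the four-digit number as a string, e.g. '4916'.
    trial_index: zero-based trial position, used only when alternating.
    """
    if condition == "control_plus":
        return "+"
    if condition == "control_minus":
        return "-"
    if condition == "alternating":
        # Assumption: the sequence starts with '+'; the text does not say.
        return "+" if trial_index % 2 == 0 else "-"
    if condition == "relative_magnitude":
        first, second = int(stimulus[:2]), int(stimulus[2:])
        if first == second:
            # The text does not describe ties, so none are assumed here.
            raise ValueError("two-digit halves are equal; rule is undefined")
        return "+" if first > second else "-"
    raise ValueError(f"unknown condition: {condition}")


# Illustrative stimulus only: 49 is larger than 16, so the rule gives '+'.
key = concluding_key("relative_magnitude", "4916")
```

Only the relative magnitude rule requires inspecting the stimulus itself, whereas the alternating rule requires only remembering the previous keystroke, which fits the smaller response time cost reported for the alternating condition.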

Figure 14.9 Proportion correct as a function of condition (alternating, relative magnitude, – control, + control), session third, and block in Kole et al. (2008, Experiment 3).

These experiments thus support what we call the cognitive antidote principle, by which adding cognitive complications to an otherwise routine task mitigates the adverse effects of prolonged work. Thus, by this principle, when cognitive activities are added to a task, the speed-accuracy trade-off can be eliminated. This principle, though formulated on the basis of skill learning, is consistent with Bjork’s (1994) demonstration of the benefits of including desirable difficulties for fact learning.

Figure 14.10 Total response time (in s) as a function of condition (relative magnitude, alternating, – control, + control), session third, and block in Kole et al. (2008, Experiment 3).

Mental Practice Principle We have also been using the data entry task to explore the possibility of mental practice as a supplement to pure physical practice, especially when physical practice is not feasible (Driskell, Copper, & Moran, 1994; Wohldmann, Healy, & Bourne, 2007). Our work compares the effects of physical and mental practice in an attempt to identify the differences in cognitive processing. For example, consider a pianist wishing to rehearse a song when she is away from her piano. In such cases, it would certainly be helpful if mental practice could serve as an effective adjunct to physical practice. Will mental practice lead to the same
improvements as physical practice in such a situation? Will such practice enable the pianist to improve both her knowledge of the particular song and her general skill at playing the piano? To examine these questions, we used a variant of the data entry task in which subjects were tested over a four-week period (Wohldmann, Healy, & Bourne, 2008, Experiment 1). The first week consisted of familiarization, in which subjects typed five blocks of 64 four-digit numbers. Half of the subjects typed on the numeric keypad and half on the number row. Subjects were then given an immediate test in which they typed both old and new numbers. During each of the following two weeks, subjects were given training, which consisted of five blocks of the 64 old numbers with which they had been familiarized. All training occurred using a different key location (row or keypad) from the one used earlier. For half of the subjects, training involved physical typing, whereas for the remaining subjects, training involved instead mental practice of the typing responses. The fourth week consisted of a delayed test in which all subjects physically typed old and new numbers using the same key location as the one used during familiarization and immediate testing.

In the physical condition, subjects typed each number with their right hand and, after typing the number, pressed the spacebar with their left hand. During training in the mental condition, subjects instead imagined typing each number with their right hand while tightly gripping a mouse in their right hand to preclude finger movements and, after mentally typing the number, pressed the spacebar with their left hand. These subjects were asked to imagine themselves typing each number as vividly as possible, including how the finger movements and the key presses would feel if they were actually typing. We focused our analyses on execution time, which again is the average time to type the last three digits of each number and is thought to reflect primarily motoric processes. For the old numbers, subjects in the mental condition showed perfect retention between the immediate and the delayed tests. In contrast, subjects in the physical condition showed marked forgetting between the two tests. For the new numbers, subjects in the mental condition showed a trend toward transfer, with faster times on the delayed test than on the immediate test, whereas those in the physical condition showed the opposite tendency, with slower times on the delayed test than on the immediate test (see Figure 14.11). The different pattern of forgetting and transfer that we found suggested to us that mental and physical practice are not equivalent, and that mental practice is not simply a weaker version of physical practice. More specifically, our results suggested that mental practice strengthens an abstract, effector-independent representation, whereas physical practice strengthens an effector-dependent representation, despite the fact that the instructions for mental practice could be viewed as facilitating an effector-dependent representation. Our findings also imply that only the effector-dependent representation suffers from retroactive interference caused by a change in effectors (cf. 
Nyberg, Eriksson, Larsson, & Marklund, 2006). To test our hypothesis about the differences between the representations strengthened by mental and physical practice, we conducted a second experiment (Wohldmann et al., 2008, Experiment 2) involving a more robust manipulation of effector processes. Specifically, we asked subjects to switch from typing with one hand to typing with the other hand, because such a switch necessarily involves different effectors. During familiarization, subjects typed four-digit numbers with either their left or right hand. Subjects were then given an immediate test, which included both old and new numbers, and typed those numbers with the same hand as was used during familiarization. Next, subjects practiced typing, either mentally or physically, the old numbers presented during familiarization with either the same or the opposite hand. Finally, subjects were given a delayed test, on which they typed physically both old and new numbers with the hand that was used during familiarization. Subjects typed on the keypad in each case, using only the index finger. All phases occurred during a single session. For old numbers, subjects who used the same hand during training and testing showed improvement in execution time from the immediate to the delayed test, with similar benefits from mental and physical practice. In contrast, subjects who switched hands during training showed improvement in execution time between the immediate and delayed test in the mental condition but not in the physical condition, where subjects showed instead signs of retroactive interference. For new numbers, subjects who used the same hand throughout the experiment showed improvement across tests from both mental and physical practice, indicating general skill learning. However, subjects who switched hands during training showed larger improvements between the immediate and delayed test in the mental condition than in the physical condition (see Figure 14.12). These findings agree with the hypothesis that mental practice results in a different, more abstract representation from that produced by physical practice.

Figure 14.11 Execution time (average time to type each of the second, third, and fourth digits, in s) at test as a function of test time (immediate, delayed), training condition (mental, physical), and number type (old, new) in Wohldmann et al. (2008, Experiment 1).

Figure 14.12 Execution time (average time to type each of the second, third, and fourth digits, in s) at test as a function of test time (immediate, delayed), training condition (mental, physical), number type (old, new), and hand condition (same, switch) in Wohldmann et al. (2008, Experiment 2).

These results then support the mental practice principle, according to which mental practice might have certain advantages over physical practice when it comes to slowing forgetting and promoting transfer of training because physical, but not mental, practice suffers from motoric interference when there is a change in effectors. Thus, mental practice can serve as an effective substitute for physical practice for both knowledge of particular learned sequences and the general typing skill. Indeed, mental practice might be superior to physical practice under circumstances that promote retroactive interference. This principle, like the cognitive antidote principle, was formulated on the basis of skill learning, but it is consistent with the literature on fact learning, which also shows strong benefits from the use of mental imagery (e.g., Paivio, 2007).

Conclusions Schmidt and Bjork (1992) have argued that training that minimizes time to acquire knowledge or skills might be detrimental in terms of subsequent performance after a delay. Following that argument and using a simple data entry task, we have provided support for five
training principles, some directly related to principles explored earlier in studies of word list learning and some newly revealed in the study of skill learning and only indirectly related to principles derived earlier from word list learning studies: procedural reinstatement, depth of processing, phonological processing, cognitive antidote, and mental practice. By the procedural reinstatement principle, skill learning leads to durable retention when the required procedures are maintained, but limits transfer when the required procedures are altered. By the depth of processing principle, processing stimuli more deeply during training improves the skill involved in responding to those stimuli after a long delay. By the phonological processing principle, disrupting phonological processing of stimuli hinders the skill involved in responding to those stimuli, but only when working memory is used to store the stimuli. By the cognitive antidote principle, adding cognitive complications to an otherwise routine task mitigates the adverse effects of prolonged work. By the mental practice principle, mental practice might have certain advantages over physical practice when it comes to slowing forgetting and promoting transfer of training because physical, but not mental, practice suffers from motoric interference when there is a change in effectors. Although these principles were developed with the simple data entry task, we trust that they would apply to virtually all skills, including complex as well as simple skills. Indeed, we have demonstrated the validity of at least some of these principles in other, more complex tasks. We thus hope that these and related principles can provide guidelines to trainers to help them optimize the efficiency, durability, and flexibility of training for any task with a major skill component.

Acknowledgments This work was supported by Army Research Institute Contract DASW0103-K-0002 and Army Research Office Grant W911NF-05-1-0153 to the University of Colorado.

References Anderson, J. R. (1983). The architecture of cognition. Cambridge, MA: Harvard University Press. Baddeley, A. (1992). Is working memory working? The Fifteenth Bartlett Lecture. Quarterly Journal of Experimental Psychology, 44A, 1–31. Baddeley, A. (2007). Working memory, thought, and action. New York: Oxford University Press.

Battig, W. F. (1972). Intratask interference as a source of facilitation in transfer and retention. In R. F. Thompson & J. F. Voss (Eds.), Topics in learning and performance (pp. 131–159). New York: Academic Press. Battig, W. F. (1979). The flexibility of human memory. In L. S. Cermak & F. I. M. Craik (Eds.), Levels of processing and human memory (pp. 23–44). Mahwah, NJ: Erlbaum. Bjork, R. A. (1994). Memory and metamemory considerations in the training of human beings. In J. Metcalfe & A. Shimamura (Eds.), Metacognition: Knowing about knowing (pp. 185–205). Cambridge, MA: MIT Press. Bourne, L. E., Jr., Pauli, P., Fendrich, D. W., Rickard, T. C., & Healy, A. F. (2001). Deliberate and automatic processes in mental arithmetic. Cognitive Processing, 4, 487–522. Buck-Gengler, C. J., & Healy, A. F. (2001). Processes underlying long-term repetition priming in digit data entry. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 879–888. Clawson, D. M., Healy, A. F., Ericsson, K. A., & Bourne, L. E., Jr. (2001). Retention and transfer of Morse code reception skill by novices: Part-whole training. Journal of Experimental Psychology: Applied, 7, 129–142. Craik, F. I. M., & Lockhart, R. S. (1972). Levels of processing: A framework for memory research. Journal of Verbal Learning and Verbal Behavior, 11, 671–684. Driskell, J. E., Copper, C., & Moran, A. (1994). Does mental practice enhance performance? Journal of Applied Psychology, 79, 481–492. Fendrich, D. W., Gesi, A. T., Healy, A. F., & Bourne, L. E., Jr. (1995). The contribution of procedural reinstatement to implicit and explicit memory effects in a motor task. In A. F. Healy & L. E. Bourne, Jr. (Eds.), Learning and memory of knowledge and skills: Durability and specificity (pp. 66–94). Thousand Oaks, CA: Sage. Fendrich, D. W., Healy, A. F., & Bourne, L. E., Jr. (1991). Long-term repetition effects for motoric and perceptual procedures. 
Journal of Experimental Psychology: Learning, Memory, and Cognition, 17, 137–151. Healy, A. F. (2007). Transfer: Specificity and generality. In H. L. Roediger, III, Y. Dudai, & S. M. Fitzpatrick (Eds.), Science of memory: Concepts (pp. 271–275). New York: Oxford University Press. Healy, A. F., Fendrich, D. W., Crutcher, R. J., Wittman, W. T., Gesi, A. T., Ericsson, K. A., & Bourne, L. E., Jr. (1992). The long-term retention of skills. In A. F. Healy, S. M. Kosslyn, & R. M. Shiffrin (Eds.), From learning processes to cognitive processes: Essays in honor of William K. Estes (Vol. 2, pp. 87–118). Hillsdale, NJ: Erlbaum. Healy, A. F., Fendrich, D. W., & Proctor, J. D. (1990). Acquisition and retention of a letter-detection skill. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 270–281. Healy, A. F., Kole, J. A., Buck-Gengler, C. J., & Bourne, L. E., Jr. (2004). Effects of prolonged work on data entry speed and accuracy. Journal of Experimental Psychology: Applied, 10, 188–199.





15

An Output-Bound Perspective on False Memories
The Case of the Deese–Roediger–McDermott (DRM) Paradigm

Asher Koriat, Ainat Pansky, and Morris Goldsmith

Recent years have seen an upsurge of interest in memory accuracy and distortion (Koriat, Goldsmith, & Pansky, 2000). This interest has been fueled by a host of real-life observations documenting severe memory distortions and fabrications, casting doubt on the faithfulness of eyewitness memory (Loftus, 1979, 2003). Some of the studies on memory distortion and false memories have examined naturally occurring memory errors that derive from the constructive nature of memory and are in line with the view originally advanced by Bartlett (1932). Other research has shown how memory is sensitive to a variety of influences that result in erroneous memories (for a review, see Pansky, Koriat, & Goldsmith, 2005). All in all, the view of memory that seems to emerge from the research literature is rather pessimistic regarding the ability of memory to deliver a veracious account of past events. This view is reflected, for example, in the title of Schacter’s (2001) book, The Seven Sins of Memory.

In this chapter we focus on the phenomenon of false recall, and in particular on what is perhaps the most impressive laboratory manifestation of this phenomenon, documented by Roediger and McDermott (1995) and widely replicated since. In the Deese–Roediger–McDermott
(DRM) paradigm, a study list is presented, composed of words (e.g., thread, pin, eye, sew) that are associates of a critical nonpresented word (e.g., needle). Participants are found to exhibit high rates of false recall of the critical lure even when they are urged not to guess. The DRM paradigm has yielded a wealth of findings on false recall and has provided important insights about the processes underlying memory errors, the factors that affect the rate of these errors, and the extent to which such errors can be avoided (see Roediger, McDermott, & Robinson, 1998, for a review). In this chapter we address a seemingly simple question: What general message should the DRM research deliver to the scientific community and to the general public regarding the reliability of human memory? What do the findings tell us about the extent to which memory reports about past events can be trusted? It might be argued that the stimulus situation used in the DRM paradigm—the study of a list of 15 or so words that are all related to a single word that is itself absent—is not ecologically representative, and perhaps for that reason the implications of the DRM results for everyday memory are limited (e.g., Freyd & Gleaves, 1996). We shall put this argument aside for now, and examine the message that follows from the findings on the assumption that the DRM conditions are in fact representative of real-life memory situations. We should stress that until now, the focus and target of DRM research has been to clarify the mechanisms by which false memories may be created or avoided, rather than to convey a general message concerning the dependability of memory reports as a whole. Nevertheless, such a message seems to emanate—at least implicitly—from DRM findings. Consider the basic observation in DRM studies. 
The rate of false memory, measured by the probability of reporting the critical nonpresented word, is quite startling: On immediate testing, it is about the same as that of recalling studied words from the middle of the list (assumed to reflect retrieval from long-term memory; Roediger & McDermott, 1995; Schacter, Verfaellie, & Pradere, 1996). On delayed tests, it tends to be even higher than that of studied items (McDermott, 1996). What is more, false recalls are remarkably persistent over time: Whereas the proportion of correctly recalled items reveals the typical decline with retention interval, the probability of recalling the nonpresented item tends to remain high or even to increase (McDermott, 1996; Payne, Elie, Blackwell, & Neuschatz, 1996; Seamon, Luo, Kopecky, et al., 2002a; Toglia, Neuschatz, & Goodwin, 1999). Also, whereas veridical recall tends to remain stable across repeated testing (following a single study presentation), false recalls tend to increase (Payne et al., 1996). Overall, it would appear that false memories in the DRM paradigm are no less
frequent, and even more persistent, than true memories (see Roediger et al., 1998). What conclusion is a layperson or a judge to draw from these findings regarding the overall trustworthiness of memory reports? If witnesses are as likely to falsely remember a nonpresented item as they are to correctly remember a presented item, would it not seem natural to conclude that memory reports are worthless? To consider this question, we must first clarify the distinction between input-bound and output-bound memory assessment.

Input-Bound and Output-Bound Measures of Memory Performance

Traditionally, measures of memory have been calculated conditional on the input, by expressing the number of items recalled or recognized as the proportion or percentage of the total number of items presented. The assessment of memory performance in terms of input-bound percent correct follows naturally from the storehouse metaphor that underlies much of traditional memory research (Koriat & Goldsmith, 1996a; Roediger, 1980). Koriat and Goldsmith (1994, 1996a, 1996b; for a review, see Goldsmith & Koriat, 2008) have referred to such measures of memory performance as quantity measures, because they are assumed to reflect the amount of presented or studied information that has been retained and is currently accessible.

Memory performance, however, can also be assessed using output-bound measures, in which the number of correct items recalled is expressed as a proportion or percentage of the total number of items reported. Such measures reflect the accuracy of the memory report, in terms of the probability that a reported item is correct. Consider, for example, a participant who is presented with 50 words, and in a free-recall test reports 40 words, 36 of which are correct and 4 are commission errors. Input-bound memory quantity performance in that case is .72 (36/50); that is, 72% of the input-study items have been successfully recalled. In contrast, output-bound memory accuracy is .90 (36/40). That is, 90% of the output-recalled items are, in fact, correct. This latter measure uniquely reflects the dependability of the information that is reported: the degree to which each reported item can be trusted to be correct.

It is important to stress that output-bound accuracy and input-bound quantity measures can be distinguished operationally only when participants are given the option of free report. On forced-report tests, such as forced-choice recognition or (less commonly) forced recall,
participants are required to provide a substantive response to each and every test item; “pass” or “don’t know” responses are not allowed. Under such conditions, the input-bound quantity and output-bound accuracy percentages are necessarily equivalent, because the number of output items is the same as the number of input items (see Koriat & Goldsmith, 1994, 1996a). For example, if a participant gets 40 out of 50 choices correct on a forced-choice recognition test, we may conclude either that the probability of correctly recognizing an input item is .80 (input-bound quantity) or that the probability that a reported item is correct is .80 (output-bound accuracy). The difference between the two measures is entirely a matter of interpretation—whether one intends to measure quantity or accuracy. In contrast, on free-report tests, such as cued or free recall, participants are allowed to omit items from the memory report or, equivalently, to respond “don’t know” if they feel they do not remember an item. In this case, the number of output items may be far fewer than the number of input items. In this chapter we consider only DRM results obtained under free-report conditions, in which input-bound and output-bound measures differ operationally as well as conceptually. Although the focus of false memory research is on memory accuracy, the analyses of false recall performance in the DRM paradigm generally follow the logic underlying the computation of input-bound performance: They focus on the probability of recalling the critical, nonpresented item under various conditions, and compare this to the probability of recalling a studied list item. This focus perhaps reflects a treatment of the critical lure as if it were an implicit study item (see activation accounts of DRM performance; e.g., Roediger, Balota, & Watson, 2001; Roediger & McDermott, 2000; Roediger, Watson, McDermott, & Gallo, 2001). 
However, for an external observer, such as a courtroom judge, who is concerned by the phenomenon of false memory, the output-bound accuracy measure is arguably of greatest concern: To what extent can we depend on what a witness reports to be true? That is, what is the probability that an item of information reported by a witness is correct? If the witness reports that there was a knife at the scene of the crime, what is the likelihood that indeed a knife was present? This is the conditional probability captured by output-bound accuracy. What do we know about output-bound memory accuracy in general? A cursory examination of the literature suggests that the accuracy of what people report under free-report conditions is high—typically in the range between .80 and .95, even following long retention intervals (Ebbesen & Rienick, 1998; Koriat, 1993; Koriat & Goldsmith, 1994, 1996b; Poole & White, 1993). This impressive level derives largely from

metacognitive monitoring and control processes that operate under free-report conditions. A number of studies (e.g., Kelley & Sahakyan, 2003; Koriat & Goldsmith, 1994, 1996b) have shown that when given the option to choose which items to report and which to withhold, people enhance their memory accuracy considerably in comparison to forced-report testing, and do so by screening out answers that are likely to be wrong. Koriat and Goldsmith (1996b) proposed a model of the strategic regulation of memory reporting in which rememberers monitor the likelihood that each candidate memory response is correct, and then compare that likelihood to a preset report criterion to determine whether to volunteer that response. Because the control decision is based on the subjective confidence associated with each item that comes to mind, and confidence is generally predictive of correctness, participants are generally effective in regulating their reporting so as to enhance accuracy when accuracy is at stake. Thus, for example, Koriat and Goldsmith (1994, Experiment 1) found that simply giving participants the option of free report allowed them to increase their output-bound accuracy substantially compared to forced-report accuracy, and giving them a stronger incentive led them to increase accuracy even further (Koriat & Goldsmith, 1996b, Experiment 3). In fact, in the latter experiment, fully 25% of the participants were successful in achieving 100% accuracy (see also Higham, 2007; Kelley & Sahakyan, 2003; Koriat & Goldsmith, 1996b)! Similarly high levels of accuracy under free-report conditions have been observed in children as young as eight years old (Koriat, Goldsmith, Schneider, & Nakash-Dura, 2001; Roebers & Fernandez, 2002). To what extent can people draw an output-bound conclusion on the basis of data from an input-bound performance? 
We examined this question with regard to DRM findings in an informal study conducted with Haifa University third-year psychology undergraduates who were enrolled in a research seminar on memory distortions. The students (n = 38) read the chapter by Roediger and Gallo (2004) summarizing DRM findings and theories, discussed the chapter (in groups of 12 or 13), and were finally asked to answer several multiple-choice questions about the DRM phenomenon. Among the questions were two key ones that asked about the implications of the DRM findings from an output-bound perspective. The questions and the distribution of responses appear in Table 15.1. It can be seen that the correct answers to questions 1 and 2 were selected in only 8% and 11% of the cases, respectively. This informal example serves to illustrate the idea that people have difficulty shifting from an explicitly presented input-bound perspective to an output-bound perspective.
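The shift from an input-bound to an output-bound perspective amounts to a change of denominator, and the report-criterion model described above amounts to a simple screening rule. The following is a minimal Python sketch of both ideas, not the authors' procedure. It reuses the hypothetical counts from the free-recall and forced-choice examples given earlier in this section; the `Candidate` structure, the function names, the confidence values, and the two report criteria are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One answer that comes to mind at test (hypothetical structure)."""
    correct: bool       # does the answer match a studied item?
    confidence: float   # assumed subjective probability of being correct


def input_bound_quantity(n_correct: int, n_studied: int) -> float:
    """Correct reports as a proportion of the studied (input) items."""
    return n_correct / n_studied


def output_bound_accuracy(n_correct: int, n_reported: int) -> float:
    """Correct reports as a proportion of the reported (output) items."""
    if n_reported == 0:
        raise ValueError("accuracy is undefined when nothing is reported")
    return n_correct / n_reported


def free_report(candidates, criterion):
    """Volunteer only candidates whose confidence meets the report criterion."""
    return [c for c in candidates if c.confidence >= criterion]


def score(reported, n_studied):
    """Return (input-bound quantity, output-bound accuracy) for a report."""
    n_correct = sum(c.correct for c in reported)
    return (input_bound_quantity(n_correct, n_studied),
            output_bound_accuracy(n_correct, len(reported)))


# Free-recall example from the text: 50 studied, 40 reported, 36 correct.
free_recall_quantity = input_bound_quantity(36, 50)    # .72
free_recall_accuracy = output_bound_accuracy(36, 40)   # .90

# Forced-choice example from the text: 50 items, 50 responses, 40 correct.
# The two denominators coincide, so the two measures are equal.
forced_choice_quantity = input_bound_quantity(40, 50)
forced_choice_accuracy = output_bound_accuracy(40, 50)

# Hypothetical 10-item test with one candidate answer per item.
candidates = [
    Candidate(True, 0.95), Candidate(True, 0.90), Candidate(True, 0.85),
    Candidate(True, 0.80), Candidate(True, 0.60), Candidate(True, 0.40),
    Candidate(False, 0.70), Candidate(False, 0.45),
    Candidate(False, 0.30), Candidate(False, 0.20),
]

print("forced report:  ", score(candidates, 10))
print("criterion = .50:", score(free_report(candidates, 0.50), 10))
print("criterion = .75:", score(free_report(candidates, 0.75), 10))
```

In this toy example, raising the report criterion screens out low-confidence answers, so output-bound accuracy rises while input-bound quantity falls. That is the pattern one would expect whenever confidence is at least somewhat predictive of correctness.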

Table 15.1 The Relative Frequency With Which Each Answer Was Chosen on the Two Key Questions in Our Survey (n = 38). The correct answer, set in bold in the original, is marked here with an asterisk.

Question 1. In the DRM paradigm, when a participant recalls a particular word, its chances of being a critical lure that had not appeared in the study list are:
  a. Higher than its chances of being a studied word: 8%
  b. Lower than its chances of being a studied word: 8% *
  c. More or less equal to its chances of being a studied word: 84%

Question 2. It is possible that the DRM situation is not representative of everyday situations. However, assume that there are situations in real life that are very similar to the one experienced by a participant in a DRM experiment, and that a person in such a situation must later testify in a courtroom. In your opinion, which of the following statements best describes the practical implication of the studies conducted using the DRM paradigm with regard to the extent to which a judge (or juror) can rely on this person’s testimony?
  a. To the extent that a real-life situation resembles the DRM situation, a judge/juror can rely on most of the information that an eyewitness provides as being correct: 11% *
  b. To the extent that a real-life situation resembles the DRM situation, a judge/juror cannot rely on most of the information that an eyewitness provides as being correct: 47%
  c. To the extent that a real-life situation resembles the DRM situation, a judge/juror can rely on about half of the information that an eyewitness provides as being correct: 37%
  d. To the extent that a real-life situation resembles the DRM situation, the eyewitness report is useless: 5%

Output-Bound Memory Accuracy in Studies Using the DRM Paradigm

We now review previous DRM studies with a focus on the output-bound accuracy of recall.¹ As mentioned earlier, output-bound memory performance is calculated as the proportion of correctly recalled study items out of the total number of items recalled (i.e., reported). The number of correctly recalled study items is routinely reported in all published DRM studies or can be readily calculated. The total number of items recalled is sometimes reported, but if not, it too can be calculated by summing the number of correctly reported items and the number of commission errors. A remaining problem, however, is that although DRM articles always report the rate of a particular commission error, the so-called critical lure (e.g., sleep in a list of sleep-related words), they often fail to provide information about other commission errors (e.g., dream, pillow, or other sleep-related or unrelated nonpresented words). Estimates of output-bound accuracy performance that do not take these noncritical commission errors into account will be inflated. Therefore, in the analysis of DRM results, presented below, we included only studies in which data are reported for both types of commission errors. Based on the information just mentioned, output-bound accuracy (OBA) was calculated for each study (or experimental condition) as follows:

\[
\mathrm{OBA} = \frac{P_s \cdot N_i \cdot N_l}{N_r} = \frac{P_s \cdot N_i \cdot N_l}{(P_s \cdot N_i \cdot N_l) + (P_c \cdot N_l) + (N_{nc} \cdot N_l)} \tag{15.1}
\]

where:
Nl = Total number of lists
Ni = Number of items in each input list
Nr = Total number of items recalled
Ps = Probability of recalling the studied items
Pc = Probability of recalling the critical lure
Nnc = Number of noncritical commission errors

Our analysis was based on a total of 108 published DRM studies that allowed the calculation of output-bound recall accuracy by (1) allowing participants the option of free report, and (2) reporting all of the relevant data (including Nnc). Many of these studies consisted of different experimental conditions in the same research report.² Table 15.2 (see appendix at the end of this chapter) presents a list of these studies, a brief description of the conditions used in each study, and the main data that entered into the calculation of OBA. Not presented in the table, the
mean number of noncritical commission errors (Nnc) reported by each participant in each study was 2.7 (5% of all reported items), whereas the mean number of critical lure errors was 3.2 (6% of all reported items). Thus, the rate of producing the single critical lure item was somewhat higher than that of all other commission errors combined.

Figure 15.1 presents the distribution of OBA scores across the 108 studies. One result is particularly striking: Output-bound accuracy exceeded .90 in the majority of studies, and it was rarely lower than .85. In fact, mean OBA was .89 (SD = 0.10) and the median was .92. We also calculated mean weighted OBA by weighting each study mean by the number of participants on which it was based. This mean was also .89 (SD = 0.09). Note that these values were obtained across all studies, including some that used children, older adults, and amnesic patients, as will be discussed below. If we limit our analysis to the “standard” DRM conditions, such as those originally used by Roediger and McDermott (1995), examining young adults’ immediate recall of each list following intentional encoding (N = 69), an item recalled from a DRM list has a mean likelihood of over .93 of being correct.

Figure 15.1 Frequency distribution of mean output-bound accuracy (OBA) performance across the 108 studies. (Horizontal axis: output-bound accuracy (OBA), 0.40 to 1.00; vertical axis: frequency, 0 to 50.)

We compared these values to two input-bound measures. The first is the input-bound quantity (IBQ) score, the probability of recalling a studied item (equivalent to Ps in Equation 15.1). Mean IBQ was .50 (SD = 0.18), and median IBQ was .55. The weighted IBQ mean was .52
(SD = 0.18). Thus, as expected, OBA was considerably higher than IBQ. That is, even though participants may not remember much of the input information, the vast majority of what they do report is correct. The second measure, Pc, is the probability of recalling a particular nonstudied item, the critical lure. Although not a true input-bound index, as noted earlier, Pc is based on the same underlying logic, which is most easily seen if one treats the critical lure as an implicit study item. This is the measure that has been the focus of most DRM studies. Mean Pc across the 108 studies was .34 (SD = 0.15), the median Pc was also .34, and the weighted Pc mean was .35 (SD = 0.15). Thus, across these studies, the likelihood of falsely recalling the critical lure was somewhat lower than that of correctly recalling a studied item, but was still quite high.

Figure 15.2 The bivariate distribution of the proportion of recall of the critical nonpresented item (Pc) and the mean output-bound accuracy (OBA) performance scores across the 108 studies. (Horizontal axis: proportion recall of critical (Pc), 0.00 to 1.00; vertical axis: output-bound accuracy (OBA), 0.00 to 1.00.)

What is the relationship between the OBA and Pc measures? One might expect these two measures to be inversely related, with the former indexing accuracy and the latter indexing error rate. However, the correlation between the two measures across the DRM studies was, in fact, virtually zero (r = –.02). Inspection of the bivariate distribution in Figure 15.2 suggests that indeed, for the majority of studies (n = 73), those in which OBA was .90 or more, the correlation is negative (r = –.58), as would be expected. In contrast, for the relatively small number of studies in which OBA was below .90, the correlation is positive (r = .47, n = 32); that is, a higher rate of critical lure intrusions tends to be
associated with better output-bound accuracy. This latter correlation may be mediated by differences in IBQ, as will be discussed below.

The comparison between OBA and Pc illustrates the contrast between two different perspectives for the examination of DRM results. The first, which is characteristic of all DRM studies, is an input-bound perspective that focuses on the rate at which a predesignated, critical lure is reported. The second is an output-bound perspective, which focuses on the likelihood that each reported item is correct. It is the latter perspective that should be of interest to a courtroom judge or to any external observer who is concerned with the dependability of the memory report as a whole. Thus, it is important to note that despite the remarkable success of the DRM paradigm in inducing a substantial rate of false recall, as reflected in a mean Pc of .34, the overall output-bound accuracy (dependability) of the participants’ memory reports is nevertheless very high, with more than two-thirds of the sampled studies yielding an OBA performance of .90 or higher.

A detailed examination of the results presented in Table 15.2 highlights some interesting trends in OBA across different populations and conditions.³ First, as shown in Figure 15.3, young adults exhibited higher accuracy than older adults, children (aged 10 or less), and amnesic patients.⁴ As shown in Figure 15.4a, using longer retention intervals between study and test (ranging from 48 hours to 2 months) reduced OBA considerably, compared to immediate testing (cf. Goldsmith, Koriat, & Pansky, 2005). Additional factors that seem to influence OBA are presentation modality (visual vs. auditory; see Figure 15.4b), encoding instructions (intentional vs. incidental learning at different levels of processing; see Figure 15.4c), and whether each list is tested individually following its presentation or in a single joint test following the presentation of all the lists (see Figure 15.4d). These results are suggestive of the many factors that affect the dependability of memory reports, and which deserve to be studied more systematically with a focus on output-bound memory performance (see Koriat et al., 2000, for discussion).

Figure 15.3 Mean output-bound accuracy (OBA) performance for studies using young adults, older adults, children, and amnesic patients. S = number of studies, N = number of participants across these studies. Error bars represent ± 1 SEM. (Horizontal axis: population; vertical axis: output-bound accuracy (OBA).)

Figure 15.4 (a) Mean output-bound accuracy (OBA) performance for studies using immediate testing versus those using delayed testing. (b) Visual presentation versus auditory presentation. (c) Intentional learning versus incidental learning (deep or shallow encoding). (d) Studies in which recall was measured after each list versus those in which it was assessed after the study of all lists (Figure 15.5b). S = number of studies, N = number of participants across these studies. Error bars represent ± 1 SEM. (Horizontal axes: (a) retention interval, (b) study modality, (c) encoding instructions, (d) list recall; vertical axis in each panel: output-bound accuracy (OBA).)

As might be expected, the ordering of the different populations in terms of OBA performance corresponds roughly with their ordering in terms of IBQ performance. Thus, for example, IBQ performance for young adults, older adults, children, and amnesic patients was .51, .47, .42, and .28, respectively. A similar pattern exists for the results presented in Figure 15.4a to d. In fact, as seen in Figure 15.5, OBA and IBQ are strongly and positively correlated across the DRM studies (r = .72). This is to be expected because of the mathematical relationship between the two variables (both have the same numerator: the number of correct reported answers). In addition, however, this correlation may reflect a positive relationship between memory retention and memory monitoring. Indeed, several factors that improve retention (e.g., increasing the number of presentations of the study lists, increasing item presentation duration, allowing full vs. divided attention at study) have been found to improve memory monitoring as well (e.g., Benjamin, 2001; Kelley & Sahakyan, 2003; McDermott & Watson, 2001).

Figure 15.5 The bivariate distribution of input-bound quantity (IBQ) performance and output-bound accuracy (OBA) performance scores across the 108 studies. (Horizontal axis: input-bound quantity (IBQ), 0.00 to 1.00; vertical axis: output-bound accuracy (OBA), 0.00 to 1.00.)

Figure 15.6 The bivariate distribution of the proportion of recall of the critical nonpresented item (Pc) and the input-bound quantity (IBQ) performance across the 108 studies. Indicated in the figure is the median of IBQ scores (median = 0.55). (Horizontal axis: proportion recall of critical (Pc), 0.00 to 1.00; vertical axis: input-bound quantity (IBQ), 0.00 to 1.00.)

Returning now to the relationship between OBA and Pc (Figure 15.2), part of the negative relationship between these two variables can also be explained mathematically: Increasing Pc increases the denominator of OBA (number of reported items) without increasing the numerator (number of correct reported items). However, we noted earlier that whereas this expected negative relationship holds for the relatively large
number of studies that yielded high levels of OBA (.90 or above), a surprising positive relationship between OBA and Pc is observed across the studies yielding lower levels of OBA. We speculate that the direction of the OBA–Pc relationship is in fact moderated by IBQ. Figure 15.6 shows that the overall relationship between Pc and IBQ is also nonlinear (r = .12) and inverted U shaped, with a positive relationship observed across the studies exhibiting below-median levels of IBQ (r = .55), and a negative correlation observed across the studies exhibiting above-median levels of IBQ (r = –.43). The latter correlation would seem to derive from the positive relationship between memory retention and memory monitoring assumed earlier in explaining the positive IBQ–OBA relationship. Indeed, in a multiple regression analysis performed across 55 DRM lists, Roediger, Watson, et al. (2001) observed a similar negative correlation (r = –.43) between veridical recall of list items (IBQ) and false recall of the critical item (Pc). They took this correlation to indicate that the better encoded list items are, the more easily they can be distinguished from the illusory critical item. Thus, as IBQ (item retention) increases from moderate to high levels, improved monitoring would cause Pc to decrease and OBA to increase, accounting for the expected negative correlation between them. For the same reason (dependence of memory monitoring on retention), one would expect OBA to increase and Pc to decrease in moving
from low to moderate levels of IBQ. Instead, however, OBA and Pc jointly increase, yielding the surprising positive correlation between them. We suspect that the anomalous increase in Pc stems from the dependence of critical lure production on implicit activations (e.g., Gallo & Roediger, 2002; Roediger, Balota, et al., 2001; Roediger, Watson, et al., 2001) or gist memory (e.g., Brainerd, Wright, Reyna, & Mojardin, 2001): Conditions that yield very low levels of study item memory (IBQ) may also yield very low levels of gist or critical lure accessibility (Pc), so that the production of both studied items and critical lure items would be jointly impaired. Moreover, if the factors that lead to decreased retention (e.g., longer retention intervals) induce a sharper decline in memory for the studied items than for the critical lures (e.g., McDermott, 1996; Payne et al., 1996; Toglia et al., 1999), a reduced critical lure rate will be accompanied by reduced OBA, because the numerator of OBA (number of recalled study items) is decreasing at a faster rate than is the denominator (number of reported items, which includes gist-based lures; see the data of Seamon, Luo, Kopecky, et al., 2002, Table 15.2). Finally, although fewer critical lure commission errors were produced under conditions that yielded low IBQ, the number of noncritical commission errors actually increased (an average of 2.1 in studies with above-median IBQ vs. 3.3 in studies with below-median IBQ; see Endnote 5). This difference, too, would contribute to the dissociation between OBA and Pc, and again suggests the possible role of gist memory: Because gist memory may support the production of critical lure errors, its decrease over time should reduce the rate of such errors, while at the same time increasing the rate of other, gist-inconsistent commission errors, which otherwise might have been edited out.
Of course, this possible account of the OBA–Pc relationship observed across the 108 DRM studies is quite tentative and should be treated primarily as a source of future hypotheses. Regardless of the reason for divergence between OBA and Pc, the point remains that these measures tell a very different story about the overall reliability of memory in the DRM studies.
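The median-split pattern described for Figure 15.6 can be illustrated with a small sketch. The IBQ and Pc values below are invented for illustration and are not taken from the 108 studies; the function names are likewise hypothetical. The sketch only shows how a weak overall correlation can coexist with a positive correlation below the IBQ median and a negative one above it.

```python
from math import sqrt
from statistics import mean, median


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def median_split_correlations(ibq, pc):
    """Return (overall r, r below the IBQ median, r at or above the IBQ median).

    Assumption: studies whose IBQ equals the median are placed in the upper group.
    """
    cut = median(ibq)
    below = [(x, y) for x, y in zip(ibq, pc) if x < cut]
    above = [(x, y) for x, y in zip(ibq, pc) if x >= cut]
    r_below = pearson_r([x for x, _ in below], [y for _, y in below])
    r_above = pearson_r([x for x, _ in above], [y for _, y in above])
    return pearson_r(ibq, pc), r_below, r_above


# Hypothetical study-level means (one value per study), NOT the chapter's data.
ibq = [0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90]
pc = [0.10, 0.25, 0.35, 0.50, 0.50, 0.40, 0.30, 0.15]

r_all, r_below, r_above = median_split_correlations(ibq, pc)
print(f"overall r = {r_all:.2f}, below-median r = {r_below:.2f}, "
      f"above-median r = {r_above:.2f}")
```

With data of this shape, the overall coefficient understates the relationship because the two halves pull in opposite directions, which is why the analysis reports the two median-split correlations separately.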

Discussion

Studies using the DRM paradigm have produced extensive evidence that false memories can be readily induced in the laboratory, and that such memories are often endorsed with great confidence. What are the implications of this evidence regarding the trustworthiness of memory reports in general? The high probability of recalling nonpresented words might lead one to place little faith in memory reports. However,
our analyses indicate that even under the rather unusual conditions of the DRM paradigm, memory reports as a whole are highly reliable: Each reported item has about a .90 probability of being correct. In what follows, we comment on three issues: the distinction between input-bound and output-bound perspectives on memory performance, the issue of memory accuracy in the DRM paradigm, and finally, the issue of ecological representativeness of DRM findings.

The Distinction Between Input-Bound and Output-Bound Perspectives

We distinguished three different ways of assessing memory performance in general, and in the DRM paradigm specifically:

1. Input-bound quantity (or accuracy) performance—the probability that any individual studied item will be produced
2. Output-bound accuracy performance—the probability that any individual reported item will be correct
3. Critical error performance—the probability that a particular prespecified commission error will be made

We argued that DRM research has focused on 3, comparing it with 1, essentially ignoring 2. By doing so, the findings are liable to be misinterpreted with respect to 2. The input-bound perspective, reflected in the comparison between 1 and 3, is appropriate given the experimental goals of researchers who use the DRM paradigm to study the mechanisms responsible for this particular type of false memory. For those who are interested in gaining information about the overall dependability of memory reports, however, the output-bound perspective is the one that is directly relevant. Thus, from the vantage point of a judge, police officer, or for that matter, any recipient of information drawn from another person’s memory, the crucial question may be: To what extent can one count on each reported item of information to be correct? As we have shown, the same data, examined from these different perspectives, can lead to very different conclusions.
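As a minimal sketch of how the three measures can diverge on the same data, the following Python function scores a single free-recall protocol. The function name, the 15-word list of sleep-related associates, and the reported words are all assumptions made for illustration; they are not data from the chapter. Across a set of protocols, Pc would be the proportion in which the critical lure appears.

```python
def score_protocol(studied, reported, critical_lure):
    """Score one free-recall protocol from the three perspectives.

    Returns a dict with:
      ibq            - input-bound quantity: correct reports / studied items
      oba            - output-bound accuracy: correct reports / reported items
                       (None if nothing was reported)
      critical_error - True if the prespecified critical lure was reported
    """
    studied_set = set(studied)
    reported_set = set(reported)  # a word reported twice is counted once
    correct = studied_set & reported_set
    ibq = len(correct) / len(studied_set)
    oba = len(correct) / len(reported_set) if reported_set else None
    return {
        "ibq": ibq,
        "oba": oba,
        "critical_error": critical_lure in reported_set,
    }


# Hypothetical study list converging on the nonpresented word "sleep".
studied = ["bed", "rest", "awake", "tired", "dream", "wake", "snooze",
           "blanket", "doze", "slumber", "snore", "nap", "peace", "yawn",
           "drowsy"]
# Hypothetical report: nine studied words plus the critical lure.
reported = ["bed", "rest", "tired", "dream", "blanket", "snore", "nap",
            "yawn", "drowsy", "sleep"]

scores = score_protocol(studied, reported, critical_lure="sleep")
print(scores)
```

In this invented protocol the participant commits the critical error, yet nine of the ten reported words are correct, so the output-bound accuracy is high while the input-bound quantity is only moderate.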
The confusion between the two perspectives is not unlikely to occur, as suggested by research on the inversion of conditional probabilities (e.g., Sherman, McMullen, & Gavanski, 1992). In fact, this confusion may be responsible for our students’ faulty conclusions (see Table 15.1) that within the specific conditions of the DRM paradigm, a reported item is as likely, if not more likely, to be wrong than right. There is no question that the results obtained in the DRM paradigm are striking in showing that under certain conditions, however contrived these may be, participants can be made to recall with a very high probability a particular item that was not presented. As noted earlier,
much of the storehouse-guided study of memory has paid little attention to commission errors (Koriat et al., 2000; Roediger, 1996). The DRM paradigm, in contrast, has turned the floodlight precisely on such errors, singling them out as a target of study. Indeed, Roediger and McDermott’s (1995) paradigm-setting work, and the vast amount of research and interest that it sparked, has had an immense impact in advancing the study of memory accuracy and error. The message delivered by this research, however, tends to emphasize, explicitly or implicitly, the fragility of memory and the ease with which false memories can be induced. Our analysis indicates that notwithstanding the impressive findings produced by the DRM research, memory is by and large quite accurate even in this paradigm. Two possible reservations may be raised regarding the high level of output-bound accuracy observed in the DRM paradigm. The first is that this result simply stems from the fact that DRM lists are constructed so that they converge on a single nonpresented item. Therefore, the frequency of producing that single critical commission error is likely to be much lower than the summed frequencies of the many studied items, thereby yielding a high overall OBA percentage. The low ratio of commission errors to studied items in the DRM paradigm, however, is not simply a methodological artifact, because it apparently reflects the large amount of “ammunition” (i.e., associated study items) needed to induce a specific commission error. Of course, the ratio of commission errors to studied items might differ both within (as discussed earlier) and between experimental paradigms, and in fact, there are indications that OBA is somewhat lower in some free-report paradigms that are used to induce memory errors (e.g., Kelley & Sahakyan, 2003; Pansky & Koriat, 2004; Sommers & Lewis, 1999; Toglia et al., 1999). 
Nevertheless, the fact remains that OBA is actually very high in a paradigm whose implicit message is that memory reports are not to be trusted. A second reservation is that in our analyses we adopted the assumption underlying the input-bound, quantity-oriented approach to memory: that all items are interchangeable, that is, equivalent, as far as memory performance is concerned (see Koriat & Goldsmith, 1996a). Indeed, the assessment procedures we used, like those characteristic of much of memory research, embody the assumption that what matters is not what is remembered or misremembered, but rather how much. However, one may envisage situations in which certain commission errors are especially critical even if their contribution to the overall output-bound accuracy of the report is negligible. For example, many elements of a crime episode might converge in suggesting the presence of
a particular weapon, even though no such weapon was there. A single commission error concerning the presence of the weapon could have tragic consequences, even if a large amount of correct information was also remembered and reported. Clearly, as we readily acknowledge, in applied settings there is more to the assessment of false recall than is captured by the output-bound accuracy percentage alone (for related discussion, see Fisher, 1996; Goldsmith & Koriat, 1996).

Output-Bound Memory Accuracy and Its Underlying Mechanisms

The positive message delivered by the present analysis is that when people attempt to provide an accurate account of what has occurred, their freely volunteered memory reports are by and large correct. One reason for this is that output-bound memory accuracy performance is—to a large extent—under strategic control: Regardless of how much information one “remembers,” one can still boost one’s accuracy to relatively high levels by volunteering only information that one is sure about and screening out information that is likely to be wrong. As noted earlier, according to Koriat and Goldsmith’s (1996b) model, metacognitive monitoring and control processes play a crucial role in this regulation. Hence, the level of accuracy that is attained depends heavily on the effectiveness of these processes. In terms of that model, the occurrence of false recalls in the DRM paradigm reflects not only a memory failure but also a failure of metamemory (Roediger & Gallo, 2004). Although the structure of the DRM list increases the likelihood that the critical lure will come to mind as a response candidate during recall, in principle, effective monitoring and control processes could operate to reject that candidate once it comes to mind.
For example, Brainerd, Reyna, Wright, and Mojardin (2003) postulate an editing process called “recollection rejection,” in which distracters that are consistent with the gist of a presented item are rejected when the verbatim trace of that item is accessed. However, as noted by Gallo (2004), critical DRM errors cannot usually be identified by such a process, because accessing the verbatim traces of some or even most of the studied items does not exclude (disqualify) the possible co-occurrence of the critical lure item. Apparently, then, the DRM paradigm creates a situation in which monitoring and control processes are relatively ineffective in editing out the critical lure. Indeed, warning participants about the DRM effect and instructing them to avoid reporting nonstudied but related words yields only negligible reductions in false recall and recognition of the critical lures (Gallo, Roediger, & McDermott, 2001b; McDermott & Roediger, 1998; Neuschatz, Payne, Lampinen, & Toglia,
2001). Moreover, recollections of the critical lures are often experienced as phenomenologically compelling (e.g., Norman & Schacter, 1997; Roediger & McDermott, 1995), and are therefore volunteered under free-report conditions. Koriat and Goldsmith’s (1996b) results suggest that rememberers’ decision to volunteer or withhold a candidate answer in free recall depends almost entirely on subjective confidence in its correctness, with within-participant gamma correlations between confidence and volunteering averaging over .95 in many experiments (see also Kelley & Sahakyan, 2003). Thus, the observation that participants often endorse the critical lure with high confidence (Roediger & McDermott, 1995; Payne et al., 1996) implies that these errors are unlikely to be selectively omitted from rememberers’ memory reports. In regulating the accuracy of what they report, however, rememberers have more options available to them than what has been discussed so far. Another means of strategic regulation that is perhaps more generally available in real-life memory situations is control over the precision or grain size of the information that is reported. For example, rememberers may report “in the late afternoon” rather than “at 4:00,” or a “fruit” rather than an “apple” (see Goldsmith et al., 2005; Goldsmith, Koriat, & Weinberg-Eliezer, 2002; Weber & Brewer, 2008). Neisser (1988) observed that when answering open-ended questions, participants tend to provide answers at a level of generality at which they are not likely to be mistaken. Goldsmith et al. (2002, 2005) found that when participants are allowed to control the grain size of their report, they do so in a strategic manner, sacrificing informativeness (degree of precision) for the sake of accuracy when their subjective confidence in the more precise, informative answer is low (but for a somewhat more complex view, see Ackerman & Goldsmith, 2008). 
Control over grain size is denied in the DRM paradigm, as well as in almost all list-learning memory tasks. If such control were allowed in DRM studies, however, OBA would undoubtedly be even higher than what was observed in our earlier analysis. The irony with regard to the DRM paradigm is particularly poignant: Gist is used in the DRM paradigm to create memory errors, whereas in most real-life situations rememberers use gist to avoid them. Indeed, some, if not most, of the commission errors in memory studies represent partial recalls, as when a person recalls pants instead of jeans (Pansky & Koriat, 2004) or when some information is retained but its source is lost (see Mitchell & Johnson, 2000, for a review). Clearly, the definition of what constitutes false memory is not simple, as reflected in the criteria that must sometimes be set by the experimenter in testing and scoring memory performance (e.g., Ebbesen & Rienick, 1998; Koriat, Levy-Sadot, Edry, & de Marcas, 2003).
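The within-participant gamma correlation between confidence and volunteering mentioned above can be sketched as follows. This is a generic implementation of the Goodman-Kruskal gamma statistic, not the authors' analysis code, and the confidence ratings and volunteering decisions are invented for one hypothetical participant.

```python
def goodman_kruskal_gamma(xs, ys):
    """Goodman-Kruskal gamma: (C - D) / (C + D) over all item pairs,
    where C counts concordant pairs and D counts discordant pairs.
    Pairs tied on either variable are ignored."""
    concordant = discordant = 0
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            product = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if product > 0:
                concordant += 1
            elif product < 0:
                discordant += 1
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical data for one participant: confidence (0-100) in each candidate
# answer, and whether that answer was volunteered (1) or withheld (0).
confidence = [95, 80, 60, 40, 90, 30, 70, 20]
volunteered = [1, 1, 1, 0, 1, 0, 0, 0]

gamma = goodman_kruskal_gamma(confidence, volunteered)
print(f"gamma = {gamma:.3f}")
```

A gamma near 1 means that almost every volunteered answer was held with higher confidence than almost every withheld answer, which is the sense in which the volunteering decision tracks subjective confidence.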

In sum, the DRM paradigm appears to yield a troublesome combination of memory and monitoring impairment. However, this combination occurs specifically for the critical lure, as suggested by the observation that the rate of producing this particular commission error was higher than the overall rate of all other commission errors. Therefore, the output-bound accuracy for the list as a whole remains quite high. Furthermore, even the critical lure errors produced in this paradigm are in some sense “correct”—at a higher, gist level. This raises further questions regarding memory accuracy, and how it should be assessed (see also Spence, 1982).

The Issue of Ecological Representativeness

At the opening of this chapter, we stated that we would put the “ecological representativeness” issue aside and examine performance in the DRM paradigm as if it is in fact representative of real-life memory situations. In this final section, we briefly address the issue of ecological representativeness. Some of the recent work on false memory illustrates a distinction between two sometimes conflicting objectives of cognitive research (cf. Chomsky, 1965). One objective is to describe the state of affairs in the real world. For example, researchers may wish to delineate the strengths and weaknesses of cognitive abilities, to specify how memory performance changes as a function of retention interval, to evaluate the veracity of human memory under different conditions, and to describe the various biases that affect performance. Researchers with this agenda, by definition, should restrict themselves to conditions that are ecologically representative. The plea for representative research design has been voiced most strongly by Brunswik (1956; see also Gibson, 1979; Gigerenzer, Hoffrage, & Kleinbölting, 1991).
The question of whether data should be collected only in the laboratory or also in naturalistic settings has been a subject of some dispute (e.g., Banaji & Crowder, 1989; Neisser, 1988; see special issue of American Psychologist, 1991), but it is clear that to be descriptive of the actual magnitudes of variables and their relationships in the real world, the experimental conditions must be representative of conditions and variations in the real world. The second objective is theoretically oriented: It is aimed at explaining the phenomena under investigation and gaining an understanding of their underlying mechanisms. This objective is illustrated by research that attempts to clarify the processes that cause forgetting, or those that underlie false memories. Research carried out within this agenda need not respect the plea for representative design and need not confine itself to the conditions that approximate real-world settings. In fact, it is sometimes precisely under extreme or deviant
conditions that one is best able to gain an understanding of the normal processes that occur under more natural conditions. As noted by Roediger (1996), this approach has been quite effective in the area of perception, in which the study of perceptual illusions induced under unusual conditions such as the trapezoidal window, or the distorted room (see Ittelson, 1952), has revealed important principles about perception in general that are not transparent when ordinary perceptual processes are examined directly. Similarly, the many studies of memory and metamemory illusions (see Koriat, 2007, 2008) provide valuable information precisely because they succeed in decoupling processes and effects that generally go hand in hand under normal conditions. Although much of that research has focused on phenomena testifying to the existence of illusions and errors, other research has investigated cases of exceptionally good memory performance (e.g. Ericsson, Charness, Feltovich, & Hoffman, 2006), which has also contributed to our understanding of basic cognitive processes. Clearly, the significant contribution of the extensive research with the DRM paradigm lies within the explanatory agenda—providing important insights about the processes that lead to false memories and the subjective qualities of veridical and false memories. However, while the focus on ecologically deviant conditions can have important theoretical benefits, it also holds the danger that the research results might be misinterpreted as having descriptive relevance. This danger is twofold. First, the frequent sampling of conditions that yield illusions and errors may create the unintended impression that the frequency of the phenomena in the experimental research literature mirrors their frequency in the real world. This impression may be amplified by the salience of studies showing surprisingly high levels of false memory. 
Thus, for example, the availability heuristic (Tversky & Kahneman, 1973) could lead memory researchers, as well as the general public, to form a biased judgment regarding memory performance that overemphasizes the sins of memory—the preponderance of error, illusions, and false memory. Second, the focus on distortion and error, and the ensuing challenges in clarifying the mechanisms that may engender or prevent such frailties, may lead to a preoccupation with the explanatory objective at the expense of the descriptive objective. As the results of the present study suggest, theoretically oriented researchers who do not also attend to the descriptive message of their research may unintentionally convey an incomplete or distorted impression of memory performance in real-world contexts.

Having said that, we should emphasize that the mere fact that people can be made to misremember, and to do so with high confidence, is a message of great practical importance with regard to the way in which witness testimony should be treated, as well as in other memory contexts. However, it would seem that the time has come to try to refine that message for the sake of legal practitioners and others. What is the likelihood that any given witness statement is true or false? The output-bound accuracy measure applied to results obtained in both explanatory- and descriptive-oriented research, including the DRM paradigm, suggests that under conditions of free reporting, the dependability of reported information is quite high. Nevertheless, much work remains to be done in identifying the conditions that increase false reports, so that these can be taken into account in attempting to ascertain the reliability of memory in specific reporting contexts.

Acknowledgments

This work was supported by a grant from the European Commission (FP6-NEST: EYEWITMEM; 43460). We thank Einat Tenenboim for her valuable help in collecting the data. Address correspondence to Asher Koriat, Department of Psychology, University of Haifa, Haifa 31905, Israel. E-mail: [email protected].

Endnotes

1. Because of the traditional concern with memory quantity, the common practice in memory research has been to ignore commission errors altogether (see Roediger, 1996). As a consequence, it is not possible to determine the output-bound accuracy observed in many of the reported studies. However, with the growing interest in memory accuracy, an increasing number of studies either report output-bound measures directly or provide the data from which these measures can be calculated. The latter generally applies to studies reporting recall results in the context of the DRM paradigm.
2. The 108 studies include all those that we were able to identify until 2007. Pc was not reported for three of these studies (studies 9 to 11 in Table 15.2). Therefore, the analyses involving Pc are based on 105 studies.
3. Note that for our current purposes inferential statistics are not used or needed, because we are limiting ourselves to a comparison of trends involving OBA, IBQ, and Pc in the current (very large) sample of DRM studies. In any case, the use of meta-analytic inferential statistics was precluded by the preponderant lack of information regarding the variance in OBA that was observed within the individual studies.
4. Note that the high variability in the OBA scores of amnesic patients can be partially attributed to the pooling together of different types of amnesia (e.g., amnesic patients with frontal lobe damage vs. patients with damage in the medial temporal lobe or diencephalic region), each of which exhibits a different pattern of performance in the DRM paradigm (see Melo, Winocur, & Moscovitch, 1999; Schacter et al., 1996).
5. Note that this difference also holds when Nnc is calculated as a percentage of the number of studied items (mean = 1.6% in studies with above-median IBQ vs. 2.7% in studies with below-median IBQ).

References

Ackerman, R., & Goldsmith, M. (2008). Control over grain size in memory reporting—With and without satisfying knowledge. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34(5), 1224–1245.
American Psychologist. (1991). 46(1).
Banaji, M. R., & Crowder, R. G. (1989). The bankruptcy of everyday memory. American Psychologist, 44, 1185–1193.
Bartlett, F. C. (1932). Remembering: A study in experimental and social psychology. Cambridge, UK: Cambridge University Press.
Basden, B. H., Basden, D. R., Thomas, R. L., & Souphasith, S. (1998). Memory distortion in group recall. Current Psychology: Developmental, Learning, Personality, Social, 16, 225–246.
Benjamin, A. S. (2001). On the dual effects of repetition on false recognition. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 941–947.
Brainerd, C. J., Reyna, V. F., Wright, R., & Mojardin, A. H. (2003). Recollection rejection: False-memory editing in children and adults. Psychological Review, 110, 762–784.
Brainerd, C. J., Wright, R., Reyna, V. F., & Mojardin, A. H. (2001). Conjoint recognition and phantom recollection. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 307–327.
Brunswik, E. (Ed.). (1956). Perception and the representative design of psychological experiments (2nd ed.). Berkeley: University of California Press.
Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Dehon, H., & Bredart, S. (2004). False memories: Young and older adults think of semantic associates at the same rate, but young adults are more successful at source monitoring. Psychology and Aging, 19, 191–197.
Dewhurst, S. A., & Robinson, C. A. (2004). False memories in children: Evidence for a shift from phonological to semantic associations. Psychological Science, 15, 782–786.
Ebbesen, E. B., & Rienick, C. B. (1998). Retention interval and eyewitness memory for events and personal identifying attributes. Journal of Applied Psychology, 83, 745–762.
Ericsson, K. A., Charness, N., Feltovich, P. J., & Hoffman, R. R. (Eds.). (2006). The Cambridge handbook of expertise and expert performance. New York: Cambridge University Press.
Fisher, R. P. (1996). Misconceptions in design and analysis of research with the cognitive interview. Psycoloquy, 7. Retrieved from witness-memory.12.fisher
Freyd, J. J., & Gleaves, D. H. (1996). “Remembering” words not presented in lists: Relevance to the current recovered/false memory controversy. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 811–813.
Gallo, D. A. (2004). Using recall to reduce false recognition: Diagnostic and disqualifying monitoring. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 120–128.
Gallo, D. A., McDermott, K. B., Percer, J. M., & Roediger, H. L. (2001). Modality effects in false recall and false recognition. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 339–353.
Gallo, D. A., & Roediger, H. L. (2002). Variability among word lists in eliciting memory illusions: Evidence for associative activation and monitoring. Journal of Memory and Language, 47, 469–497.
Gallo, D. A., Roediger, H. L., & McDermott, K. B. (2001). Associative false recognition occurs without strategic criterion shifts. Psychonomic Bulletin and Review, 8, 579–586.
Geraerts, E., Smeets, E., Jelicic, M., van Heerden, J., & Merckelbach, H. (2005). Fantasy proneness, but not self-reported trauma is related to DRM performance of women reporting recovered memories of childhood sexual abuse. Consciousness and Cognition: An International Journal, 14, 602–612.
Gibson, J. J. (1979). The ecological approach to visual perception. Boston: Houghton Mifflin and Company.
Gigerenzer, G., Hoffrage, U., & Kleinbölting, H. (1991). Probabilistic mental models: A Brunswikian theory of confidence. Psychological Review, 98, 506–528.
Goldsmith, M., & Koriat, A. (1996). The assessment and control of memory accuracy: Commentary on Memon & Stevenage on witness-memory. Psycoloquy, 7. Retrieved from witness-memory.9.goldsmith
Goldsmith, M., & Koriat, A. (2008). The strategic regulation of memory accuracy and informativeness. In A. Benjamin & B. Ross (Eds.), Psychology of learning and motivation: Memory use as skilled cognition (Vol. 48, pp. 1–60). San Diego, CA: Elsevier.
Goldsmith, M., Koriat, A., & Pansky, A. (2005). Strategic regulation of grain size in memory reporting over time. Journal of Memory and Language, 52, 505–525.
Goldsmith, M., Koriat, A., & Weinberg-Eliezer, A. (2002). Strategic regulation of grain size memory reporting. Journal of Experimental Psychology: General, 131, 73–95.
Harbluk, J. L., & Weingartner, H. J. (1997). Memory distortions in detoxified alcoholics. Brain and Cognition, 35, 328–330.
Higham, P. A. (2007). No Special K! A signal-detection framework of the strategic regulation of memory accuracy. Journal of Experimental Psychology: General, 136, 1–22.
Intons-Peterson, M. J., Rocchi, P., West, T., McLellan, K., & Hackney, A. (1999). Age, testing at preferred or nonpreferred times (testing optimality), and false memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 23–40.
Ittelson, W. H. (Ed.). (1952). The Ames demonstrations in perception; a guide to their construction and use. Princeton, NJ: Princeton University Press.
Kelley, C. M., & Sahakyan, L. (2003). Memory, monitoring, and control in the attainment of memory accuracy. Journal of Memory and Language, 48, 704–721.
Koriat, A. (1993). How do we know that we know? The accessibility model of the feeling of knowing. Psychological Review, 100, 609–639.
Koriat, A. (2007). Metacognition and consciousness. In P. D. Zelazo, M. Moscovitch, & E. Thompson (Eds.), The Cambridge handbook of consciousness (pp. 289–325). New York: Cambridge University Press.
Koriat, A. (2008). Subjective confidence in one’s answers: The consensuality principle. Journal of Experimental Psychology: Learning, Memory and Cognition, 34, 945–959.
Koriat, A., & Goldsmith, M. (1994). Memory in naturalistic and laboratory contexts: Distinguishing the accuracy-oriented and quantity-oriented approaches to memory assessment. Journal of Experimental Psychology: General, 123, 297–315.
Koriat, A., & Goldsmith, M. (1996a). Memory metaphors and the real-life/laboratory controversy: Correspondence versus storehouse conceptions of memory. Behavioral and Brain Sciences, 19, 167–228.
Koriat, A., & Goldsmith, M. (1996b). Monitoring and control processes in the strategic regulation of memory accuracy. Psychological Review, 103, 490–517.
Koriat, A., Goldsmith, M., & Pansky, A. (2000). Toward a psychology of memory accuracy. Annual Review of Psychology, 51, 481–537.
Koriat, A., Goldsmith, M., Schneider, W., & Nakash-Dura, M. (2001). The credibility of children’s testimony: Can children control the accuracy of their memory reports? Journal of Experimental Child Psychology, 79, 405–437.
Koriat, A., Levy-Sadot, R., Edry, E., & de Marcas, S. (2003). What do we know about what we cannot remember? Accessing the semantic attributes of words that cannot be recalled. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 1095–1105.
Lampinen, J. M., Neuschatz, J. S., & Payne, D. G. (1999). Source attributions and false memories: A test of the demand characteristics account. Psychonomic Bulletin and Review, 6, 130–135.
Libby, L. K., & Neisser, U. (2001). Structure and strategy in the associative false memory paradigm. Memory, 9, 145–163.
Loftus, E. F. (1979). Eyewitness testimony. Cambridge, MA: Harvard University Press.
Loftus, E. F. (2003). Our changeable memories: Legal and practical implications. Nature Reviews Neuroscience, 4, 231–234.
McDermott, K. B. (1996). The persistence of false memories in list recall. Journal of Memory and Language, 35, 212–230.
McDermott, K. B., & Roediger, H. L. (1998). Attempting to avoid illusory memories: Robust false recognition of associates persists under conditions of explicit warnings and immediate testing. Journal of Memory and Language, 39, 508–520.
McDermott, K. B., & Watson, J. M. (2001). The rise and fall of false recall: The impact of presentation duration. Journal of Memory and Language, 45, 160–176.
McKelvie, S. (2001). Effects of free and forced retrieval instructions on false recall and recognition. Journal of General Psychology, 128, 261–278.
Melo, B., Winocur, G., & Moscovitch, M. (1999). False recall and false recognition: An examination of the effects of selective and combined lesions to the medial temporal lobe/diencephalon and frontal lobe structures. Cognitive Neuropsychology, 16, 343–359.
Milani, R., & Curran, H. V. (2000). Effects of a low dose of alcohol on recollective experience of illusory memory. Psychopharmacology, 147, 397–402.
Miller, M. B., & Wolford, G. L. (1999). Theoretical commentary: The role of criterion shift in false memory. Psychological Review, 106, 398–405.
Mitchell, K. J., & Johnson, M. K. (2000). Source monitoring: Attributing mental experiences. In E. Tulving & F. I. M. Craik (Eds.), The Oxford handbook of memory (pp. 179–195). Oxford, UK: Oxford University Press.
Neisser, U. (1988). Time present and time past. In M. M. Gruneberg, P. E. Morris, & R. N. Sykes (Eds.), Practical aspects of memory: Current research and issues (Vol. 2, pp. 545–560). Chichester, UK: Wiley.
Neuschatz, J. S., Payne, D. G., Lampinen, J. M., & Toglia, M. P. (2001). Assessing the effectiveness of warnings and the phenomenological characteristics of false memories. Memory, 9, 53–71.
Newstead, B. A., & Newstead, S. E. (1998). False recall and false memory: The effects of instructions on memory errors. Applied Cognitive Psychology, 12, 67–79.
Norman, K. A., & Schacter, D. L. (1997). False recognition in younger and older adults: Exploring the characteristics of illusory memories. Memory and Cognition, 25, 838–848.
Pansky, A., & Koriat, A. (2004). The basic-level convergence effect in memory distortions. Psychological Science, 15, 52–59.
Pansky, A., Koriat, A., & Goldsmith, M. (2005). Eyewitness recall and testimony. In N. Brewer & K. D. Williams (Eds.), Psychology and law: An empirical perspective (pp. 93–150). New York: Guilford Press.
Payne, D. G., Elie, C. J., Blackwell, J. M., & Neuschatz, J. S. (1996). Memory illusions: Recalling, recognizing, and recollecting events that never occurred. Journal of Memory and Language, 35, 261–285.


322 • Asher Koriat, Ainat Pansky, and Morris Goldsmith

Poole, D. A., & White, L. T. (1993). Two years later: Effect of question repetition and retention interval on the eyewitness testimony of children and adults. Developmental Psychology, 29, 844–853.
Read, J. D. (1996). From a passing thought to a false memory in 2 minutes: Confusing real and illusory events. Psychonomic Bulletin and Review, 3, 105–111.
Rhodes, M. G., & Anastasi, J. S. (2000). The effects of a levels-of-processing manipulation on false recall. Psychonomic Bulletin and Review, 7, 158–162.
Robinson, K. J., & Roediger, H. L. (1997). Associative processes in false recall and false recognition. Psychological Science, 8, 231–237.
Roebers, C. M., & Fernandez, O. (2002). The effects of accuracy motivation on children’s and adults’ event recall, suggestibility, and their answers to unanswerable questions. Journal of Cognition and Development, 3, 415–443.
Roediger, H. L. (1980). Memory metaphors in cognitive psychology. Memory and Cognition, 8, 231–246.
Roediger, H. L. (1996). Memory illusions. Journal of Memory and Language, 35, 76–100.
Roediger, H. L., Balota, D. A., & Watson, J. M. (2001). Spreading activation and arousal of false memories. In H. L. Roediger, J. S. Nairne, I. Neath, & A. M. Surprenant (Eds.), The nature of remembering: Essays in honor of Robert G. Crowder (pp. 95–115). Washington, DC: American Psychological Association.
Roediger, H. L., & Gallo, D. A. (2004). Associative memory illusions. In R. F. Pohl (Ed.), Cognitive illusions: A handbook on fallacies and biases in thinking, judgement and memory (pp. 309–326). Hove, UK: Psychology Press.
Roediger, H. L., & McDermott, K. B. (1995). Creating false memories: Remembering words not presented in lists. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 803–814.
Roediger, H. L., & McDermott, K. B. (2000). Tricks of memory. Current Directions in Psychological Science, 9, 123–127.
Roediger, H. L., McDermott, K. B., & Robinson, K. J. (1998). The role of associative processes in creating false memories. In M. A. Conway, S. E. Gathercole, & C. Cornoldi (Eds.), Theories of memory (Vol. 2, pp. 187–245). Hove, UK: Psychology Press.
Roediger, H. L., Watson, J. M., McDermott, K. B., & Gallo, D. A. (2001). Factors that determine false recall: A multiple regression analysis. Psychonomic Bulletin and Review, 8, 385–407.
Schacter, D. L. (2001). The seven sins of memory: How the mind forgets and remembers. Boston: Houghton Mifflin Co.
Schacter, D. L., Verfaellie, M., & Pradere, D. (1996). The neuropsychology of memory illusions: False recall and recognition in amnesic patients. Journal of Memory and Language, 35, 319–334.


Seamon, J. G., Luo, C. R., Kopecky, J. J., Price, C. A., Rothschild, L., Fung, N. S., & Schwartz, M. A. (2002). Are false memories more difficult to forget than accurate memories? The effect of retention interval on recall and recognition. Memory and Cognition, 30, 1054–1064.
Seamon, J. G., Luo, C. R., Shulman, E. P., Toner, S. K., & Caglar, S. (2002). False memories are hard to inhibit: Differential effects of directed forgetting on accurate and false recall in the DRM procedure. Memory, 10, 225–238.
Sherman, S. J., McMullen, M. N., & Gavanski, I. (1992). Natural sample spaces and the inversion of conditional judgments. Journal of Experimental Social Psychology, 28, 401–421.
Smith, R. E., & Hunt, R. R. (1998). Presentation modality affects false memory. Psychonomic Bulletin and Review, 5, 710–715.
Sommers, M. S., & Lewis, B. P. (1999). Who really lives next door: Creating false memories with phonological neighbors. Journal of Memory and Language, 40, 83–108.
Spence, D. P. (1982). Narrative truth and historical truth. New York: Norton.
Toglia, M. P., Neuschatz, J. S., & Goodwin, K. A. (1999). Recall accuracy and illusory memories: When more is less. Memory, 7, 233–256.
Tun, P. A., Wingfield, A., Rosen, M. J., & Blanchard, L. (1998). Response latencies for false memories: Gist-based processes in normal aging. Psychology and Aging, 13, 230–241.
Tversky, A., & Kahneman, D. (1973). Availability: A heuristic for judging frequency and probability. Cognitive Psychology, 5, 207–232.
Weber, N., & Brewer, N. (2008). Eyewitness recall: Regulation of grain size and the role of confidence. Journal of Experimental Psychology: Applied, 14, 50–60.
Winograd, E., Peluso, J. P., & Glover, T. A. (1998). Individual differences in susceptibility to memory illusions. Applied Cognitive Psychology, 12, S5–S27.
Zoellner, L. A., Foa, E. B., Brigidi, B. D., & Przeworski, A. (2000). Are trauma victims susceptible to “false memories”? Journal of Abnormal Psychology, 109, 517–524.


Appendix

Table 15.2 Summary Data for 108 DRM Studies

| Study | Source | Specification | Ps (IBQ) | Pc | OBA |
|---|---|---|---|---|---|
| 1 | Basden et al. (1998) | Experiment 1, nominal condition | .46 | .43 | .89 |
| 2 | Basden et al. (1998) | Experiment 1, collaborative condition [v, j] | .40 | .44 | .82 |
| 3 | Dehon & Bredart (2004) | Experiment 1, young adults | .69 | .24 | .95 |
| 4 | Dehon & Bredart (2004) | Experiment 1, older adults | .51 | .48 | .88 |
| 5 | Dehon & Bredart (2004) | Experiment 2, young adults, unwarned | .63 | .16 | .95 |
| 6 | Dehon & Bredart (2004) | Experiment 2, young adults, warned | .64 | .04 | .98 |
| 7 | Dehon & Bredart (2004) | Experiment 2, older adults, unwarned | .40 | .34 | .88 |
| 8 | Dehon & Bredart (2004) | Experiment 2, older adults, warned | .40 | .39 | .88 |
| 9 | Dewhurst & Robinson (2004) | 5 year olds | .31 | . | .84 |
| 10 | Dewhurst & Robinson (2004) | 8 year olds | .54 | . | .85 |
| 11 | Dewhurst & Robinson (2004) | 11 year olds | .69 | . | .92 |
| 12 | Gallo, McDermott et al. (2001) | Experiment 1, auditory presentation | .60 | .46 | .94 |
| 13 | Gallo, McDermott et al. (2001) | Experiment 1, visual presentation [v] | .58 | .38 | .95 |
| 14 | Geraerts et al. (2005) | Recovered memory group, neutral words [v] | .56 | .61 | .92 |
| 15 | Geraerts et al. (2005) | Recovered memory group, trauma-related words [v] | .42 | .20 | .94 |
| 16 | Geraerts et al. (2005) | Repressed memory group, neutral words [v] | .56 | .46 | .93 |
| 17 | Geraerts et al. (2005) | Repressed memory group, trauma-related words [v] | .43 | .16 | .95 |
| 18 | Geraerts et al. (2005) | Continuous memory group, neutral words [v] | .55 | .42 | .93 |
| 19 | Geraerts et al. (2005) | Continuous memory group, trauma-related words [v] | .42 | .14 | .96 |
| 20 | Geraerts et al. (2005) | Control group, neutral words [v] | .59 | .44 | .94 |
| 21 | Geraerts et al. (2005) | Control group, trauma-related words [v] | .46 | .13 | .96 |
| 22 | Harbluk & Weingartner (1997) | Detoxified alcoholics [v, j] | .59 | .46 | .90 |

| Study | Source | Specification | Ps (IBQ) | Pc | OBA |
|---|---|---|---|---|---|
| 23 | Harbluk & Weingartner (1997) | Control condition | .68 | .48 | .94 |
| 24 | Intons-Peterson et al. (1999) | Experiment 1, older adults | .39 | .52 | .82 |
| 25 | Intons-Peterson et al. (1999) | Experiment 1, young adults | .56 | .56 | .90 |
| 26 | Intons-Peterson et al. (1999) | Experiment 2, older adults | .35 | .64 | .75 |
| 27 | Intons-Peterson et al. (1999) | Experiment 2, young adults | .60 | .55 | .91 |
| 28 | Intons-Peterson et al. (1999) | Experiment 3, older adults, pictorial presentation [v, j] | .55 | .38 | .91 |
| 29 | Intons-Peterson et al. (1999) | Experiment 3, young adults, pictorial presentation [v, j] | .61 | .17 | .94 |
| 30 | Lampinen et al. (1999) | Experiment 1 | .61 | .36 | .90 |
| 31 | Lampinen et al. (1999) | Experiment 2 | .56 | .34 | .91 |
| 32 | Libby & Neisser (2001) | Experiment 1, distraction at study, short list | .76 | .28 | .90 |
| 33 | Libby & Neisser (2001) | Experiment 1, distraction at study, long list | .55 | .33 | .93 |
| 34 | Libby & Neisser (2001) | Experiment 1, rehearsal at study, short list | .88 | .12 | .96 |
| 35 | Libby & Neisser (2001) | Experiment 1, rehearsal at study, long list | .57 | .38 | .92 |
| 36 | McDermott (1996) | Experiment 1, immediate testing | .58 | .44 | .93 |
| 37 | McDermott (1996) | Experiment 1, 30-second delayed testing | .50 | .46 | .91 |
| 38 | McDermott (1996) | Experiment 1, 48-hour delayed testing [j] | .04 | .12 | .52 |
| 39 | McKelvie (2001) | Experiment 1 | .70 | .34 | .93 |
| 40 | Melo et al. (1999) | MTL/D amnesic patients | .32 | .63 | .80 |
| 41 | Melo et al. (1999) | FL amnesic patients | .24 | .19 | .93 |
| 42 | Melo et al. (1999) | FL nonamnesic patients | .41 | .46 | .90 |
| 43 | Melo et al. (1999) | Control participants | .49 | .35 | .90 |
| 44 | Milani & Curran (2000) | Alcohol consumption condition | .51 | .40 | .91 |
| 45 | Milani & Curran (2000) | Placebo consumption condition | .54 | .39 | .92 |
| 46 | Miller & Wolford (1999) | Experiment 1 | .76 | .27 | .95 |
| 47 | Miller & Wolford (1999) | Experiment 2 | .68 | .42 | .96 |


Table 15.2 (continued) Summary Data for 108 DRM Studies

| Study | Source | Specification | Ps (IBQ) | Pc | OBA |
|---|---|---|---|---|---|
| 48 | Neuschatz et al. (2001) | Experiment 2 [j] | .32 | .49 | .79 |
| 49 | Newstead & Newstead (1998) | Pilot experiment, 15 year olds | .75 | .46 | .95 |
| 50 | Newstead & Newstead (1998) | Main experiment, 13–16 year olds | .72 | .38 | .95 |
| 51 | Norman & Schacter (1997) | Experiment 1, older adults | .48 | .51 | .80 |
| 52 | Norman & Schacter (1997) | Experiment 1, young adults | .67 | .38 | .90 |
| 53 | Norman & Schacter (1997) | Experiment 2, older adults | .54 | .47 | .86 |
| 54 | Norman & Schacter (1997) | Experiment 2, young adults | .69 | .34 | .89 |
| 55 | Payne et al. (1996) | Experiment 1 | .60 | .45 | .93 |
| 56 | Read (1996) | Experiment 1 | .64 | .66 | .89 |
| 57 | Rhodes & Anastasi (2000) | Experiment 1, deep encoding [j] | .29 | .47 | .75 |
| 58 | Rhodes & Anastasi (2000) | Experiment 1, shallow encoding [j] | .18 | .23 | .74 |
| 59 | Rhodes & Anastasi (2000) | Experiment 2, deep encoding [j] | .38 | .41 | .89 |
| 60 | Rhodes & Anastasi (2000) | Experiment 2, shallow encoding [j] | .09 | .09 | .63 |
| 61 | Robinson & Roediger (1997) | Experiment 1, 3 associates per list [v] | .97 | .03 | .98 |
| 62 | Robinson & Roediger (1997) | Experiment 1, 6 associates per list [v] | .80 | .11 | .96 |
| 63 | Robinson & Roediger (1997) | Experiment 1, 9 associates per list [v] | .68 | .21 | .95 |
| 64 | Robinson & Roediger (1997) | Experiment 1, 12 associates per list [v] | .57 | .27 | .95 |
| 65 | Robinson & Roediger (1997) | Experiment 1, 15 associates per list [v] | .50 | .31 | .95 |
| 66 | Robinson & Roediger (1997) | Experiment 2, 3 associates and 12 fillers per list [v] | .72 | .03 | .99 |
| 67 | Robinson & Roediger (1997) | Experiment 2, 6 associates and 9 fillers per list [v] | .64 | .15 | .97 |
| 68 | Robinson & Roediger (1997) | Experiment 2, 9 associates and 6 fillers per list [v] | .60 | .20 | .96 |
| 69 | Robinson & Roediger (1997) | Experiment 2, 12 associates and 3 fillers per list [v] | .52 | .25 | .95 |
| 70 | Robinson & Roediger (1997) | Experiment 2, 15 associates per list [v] | .50 | .30 | .94 |
| 71 | Roediger & McDermott (1995) | Experiment 1 | .65 | .40 | .94 |


| Study | Source | Specification | Ps (IBQ) | Pc | OBA |
|---|---|---|---|---|---|
| 72 | Schacter et al. (1996) | Amnesic patients | .27 | .29 | .72 |
| 73 | Schacter et al. (1996) | Control participants | .52 | .33 | .90 |
| 74 | Seamon, Luo, Kopecky et al. (2002) | Experiment 1, immediate testing [j] | .17 | .28 | .80 |
| 75 | Seamon, Luo, Kopecky et al. (2002) | Experiment 1, 2-week delayed testing [j] | .07 | .27 | .65 |
| 76 | Seamon, Luo, Kopecky et al. (2002) | Experiment 1, 2-month delayed testing [j] | .04 | .12 | .38 |
| 77 | Seamon, Luo, Shulman et al. (2002) | Remember/remember instructions, 8 lists [v, j] | .28 | .27 | .92 |
| 78 | Seamon, Luo, Shulman et al. (2002) | Forget/remember instructions, 8 lists [v, j] | .25 | .29 | .89 |
| 79 | Seamon, Luo, Shulman et al. (2002) | Remember/remember instructions, 12 lists [v, j] | .20 | .22 | .90 |
| 80 | Seamon, Luo, Shulman et al. (2002) | Forget/remember instructions, 12 lists [v, j] | .20 | .25 | .87 |
| 81 | Seamon et al. (2003) | Experiment 1, only hear the associates condition | .67 | .30 | .95 |
| 82 | Seamon et al. (2003) | Experiment 1, write the associates condition | .63 | .18 | .97 |
| 83 | Seamon et al. (2003) | Experiment 1, write the second letter of the associates condition | .56 | .15 | .97 |
| 84 | Seamon et al. (2003) | Experiment 1, count back by threes condition | .43 | .36 | .90 |
| 85 | Smith & Hunt (1998) | Experiment 1, auditory presentation [j] | .26 | .21 | .75 |
| 86 | Smith & Hunt (1998) | Experiment 1, visual presentation [v, j] | .29 | .11 | .87 |
| 87 | Smith & Hunt (1998) | Experiment 2, auditory presentation | .65 | .42 | .93 |
| 88 | Smith & Hunt (1998) | Experiment 2, visual presentation [v] | .72 | .22 | .97 |
| 89 | Smith & Hunt (1998) | Experiment 3, auditory presentation, pleasantness ratings [j] | .32 | .20 | .94 |
| 90 | Smith & Hunt (1998) | Experiment 3, auditory presentation, standard encoding [j] | .29 | .33 | .90 |
| 91 | Smith & Hunt (1998) | Experiment 3, visual presentation, pleasantness ratings [v, j] | .32 | .10 | .97 |
| 92 | Smith & Hunt (1998) | Experiment 3, visual presentation, standard encoding [v, j] | .33 | .18 | .95 |
| 93 | Sommers & Lewis (1999) | Experiment 1, phonological associates | .58 | .54 | .94 |
| 94 | Sommers & Lewis (1999) | Experiment 2, phonological associates, single speaker | .62 | .61 | .93 |


Table 15.2 (continued) Summary Data for 108 DRM Studies

| Study | Source | Specification | Ps (IBQ) | Pc | OBA |
|---|---|---|---|---|---|
| 95 | Sommers & Lewis (1999) | Experiment 2, phonological associates, multiple speakers (blocked) | .71 | .64 | .93 |
| 96 | Sommers & Lewis (1999) | Experiment 2, phonological associates, multiple speakers (random) | .70 | .63 | .93 |
| 97 | Sommers & Lewis (1999) | Experiment 3, phonologically most confusable associates | .57 | .53 | .93 |
| 98 | Sommers & Lewis (1999) | Experiment 3, phonologically least confusable associates | .62 | .33 | .96 |
| 99 | Toglia et al. (1999) | Experiment 2, blocked lists [j] | .24 | .51 | .76 |
| 100 | Toglia et al. (1999) | Experiment 2, randomly mixed lists [j] | .18 | .36 | .66 |
| 101 | Tun et al. (1998) | Experiment 1, older adults | .53 | .35 | .92 |
| 102 | Tun et al. (1998) | Experiment 1, young adults | .63 | .33 | .93 |
| 103 | Tun et al. (1998) | Experiment 2, older adults | .47 | .32 | .92 |
| 104 | Tun et al. (1998) | Experiment 2, young adults | .63 | .32 | .94 |
| 105 | Winograd et al. (1998) | | .58 | .44 | .90 |
| 106 | Zoellner et al. (2000) | Traumatized PTSD participants | .54 | .47 | .94 |
| 107 | Zoellner et al. (2000) | Traumatized non-PTSD participants | .59 | .53 | .94 |
| 108 | Zoellner et al. (2000) | Control participants | .54 | .26 | .97 |

Notes: [v] Visual presentation at study; otherwise, auditory presentation at study. [j] Joint recall test following the presentation of all the lists; otherwise, recall test following each list.


16

How Should We Define and Differentiate Metacognitions?

Harry P. Bahrick, Melinda K. Baker, Lynda K. Hall, and Lise Abrams

Metacognition refers to knowledge of our own cognitive processes, and metamemory, a subcategory of metacognition, specifically refers to knowledge of our own memory processes. During the past two decades, metamemory research has focused on the many ways in which monitoring and controlling our memory processes affect memory performance. The study of metamemory is based on introspective reports of memory phenomena, and issues regarding the validity of introspective reports have a long history in our science. We briefly review a key aspect of that history, and then discuss issues related to defining, differentiating, and validating metacognitive concepts. We also present data from two investigations dealing with the relation of tip-of-the-tongue (TOT) states to strong feeling-of-knowing (FOK) ratings.

A primary goal of early scientific psychology was to analyze conscious content by means of introspective reports. Structuralists believed that these reports could provide a valid and reliable conduit to the mind. The approach was later abandoned because conflicting, subjective data led to irresolvable controversies (Boring, 1950, p. 403; Heidbreder, 1933, p. 145). The paradigmatic shift to behaviorism avoided problems of conflicting introspective reports by limiting psychology to more easily verifiable data. Psychophysical methods survived the shift to behaviorism because cognitive concepts were not defined by introspective


reports per se, but by the relation of introspective reports to verifiable characteristics of stimuli. For example, a just noticeable difference was not defined by the participant’s introspective report that a stimulus had barely changed. Rather, it was defined by the smallest amount of physical change of a stimulus that the participant correctly detected on a certain proportion of trials.

Introspective reports were reintroduced in metacognitive research in a manner analogous to their use in psychophysics. Current metacognitive monitoring differs from the structuralists’ use of introspection in that no assumptions are made about the validity or reliability of introspection (Nelson, 1996). Metacognitive judgments are simply regarded as a source of data, and it is the relation of these data to subsequent performance that is of interest. The goal of most metacognitive research is to explore these relations. For example, introspective assessments of text comprehension have been found to share little variance with performance on subsequent tests of comprehension (Maki, 1998). We therefore conclude that metacognitive ratings of text comprehension have low validity. In contrast, delayed judgments of learning predict future recall with high relative accuracy (Nelson & Dunlosky, 1991; Van Overschelde & Nelson, 2006), and we conclude that these introspective reports have higher validity. The point is that the validity or the functional significance of metacognitive introspections is not assumed on the basis of reported phenomenology or lexical labels; it is established on the basis of relations between introspective reports and relevant performance.

Nelson (1996, citing Wilson, 1994) pointed out that investigators frequently fail to use such functional criteria as a basis for identifying and differentiating cognitive concepts.
Instead, they have relied on other criteria, such as cortical activation data (e.g., Widner, Smith & Graziano, 1996) or phenomenological reports (e.g., Schwartz, 2002). Using the TOT phenomenon as an example, we can conceptualize stages involved in validating metacognitive concepts, as shown in Table 16.1. It is useful to discuss the role and the limitations of these stages in validating metacognitions. First, we contend that the names or lexical labels given to metacognitive concepts are somewhat arbitrary; they suggest but do not define operational meanings. For example, the TOT phenomenon might also be called anticipated imminent recall. Second, the specific wording of instructions to participants can affect not only their introspective reports but also cortical activations, and it may affect these differentially. For example, changing the instructions to report a TOT only if participants are completely confident of imminent recall or only if they expect recall

Table 16.1 Stages Involved in Validating a Metacognitive Concept

| Stages | TOT Example |
|---|---|
| 1. Naming a metacognitive concept | Tip-of-the-tongue state |
| 2. Instructions to participants | Report whether or not recall of an unrecalled target is imminent |
| 3. Nature of introspective reports | TOT or no TOT |
| 4. Neurological data | Areas of cortical activation |
| 5. Behavioral data | Success or failure of delayed recall |

within a specific time limit may affect TOT judgments, but not necessarily cortical activation. In contrast, other minor changes, such as the loudness of instructions, might affect cortical activation but not necessarily TOT judgments. Lastly, cortical activation data can only be a source of hypotheses, not a basis for identifying, differentiating, or validating metacognitions. Although cortical activation data have been used to differentiate metacognitions (Widner et al., 1996), their valid use for this purpose remains highly controversial. Wager (2006, p. 27) states: “One of the biggest pitfalls is the temptation to observe brain activity and make inferences about the psychological state—for example, to infer episodic memory retrieval from hippocampal activity, fear from amygdala activity, or visual processing from activity in the visual cortex (Barrett & Wager, 2006; Poldrack, 2006; Wager et al., in press). These inferences ignore the scope of processes which may activate each of these areas.” Sabine Kastner stated in an interview, “The same system could light up when running into a snake on a path while hiking as when you suddenly see that the last piece of chocolate cake is gone. There are stereotypical brain patterns that can be activated by different causes” (Nicholson, 2006, p. 25). More specifically, Wade (2006, p. 23) states, “Many neuroimaging studies are done to identify areas of brain activation associated with particular psychological processes, but those processes are not always well defined. Most behaviors and mental processes denoted by a single term actually involve an intricate and complicated series of operations.” Uttal (2001, p. xi) addresses the core of the problem: “We must also consider what our definitions of psychological functions and processes themselves signify. 
It is here that the greatest impediments to understanding the relations of psychological constructs to brain mechanisms actually lie.” Of particular relevance to problems of differentiating metacognitions is Uttal’s (2001, p. 87) observation that “if one is to localize


cognitive processes in the brain, one must have not only a powerful imaging system and a nonhomogeneous brain, but also a very clear definition of the cognitive processes being localized.” The names/labels we give to metacognitions do not precisely define the underlying cognitive processes, and Uttal’s assessment is therefore particularly relevant to problems of differentiating metacognitions. He explains why the functional differentiation of such concepts cannot be based on either reported phenomenology, cortical activation, or their interrelations. We therefore contend that these data can only serve as a source of hypotheses; the definitive data for identifying, differentiating, and validating metacognitions must be the behavioral indicants associated with the phenomenology elicited by various instructions, i.e., their relations to introspective reports. Bahrick (2008) raised this general issue in regard to the relation between reported TOT states and highly confident FOK judgments. Both of these metamemory concepts have spawned a large, independently developed literature (Schwartz, 2002), but their functional overlap has not been systematically examined. One reason for the dearth of relevant data is that researchers have typically used different performance criteria to assess the predictive validity of TOTs versus FOKs. TOT reports are usually validated by delayed recall of memory targets, while FOKs are validated by identifying targets on recognition tests. These testing criteria reflect differential instructions to study participants. Participants are asked to report a TOT if they believe that recall of the target in question is imminent (Schwartz, 2002), and to indicate FOKs by judging the likelihood that they would be able to identify the unrecalled target on a subsequent recognition test. 
It is likely, however, that individuals who report TOTs will also be able to recognize the previously unrecalled targets (Schwartz, 2001; Schwartz, Travis, Castro, & Smith, 2000), and that individuals who report strong FOKs may also be able to recall some of the targets at a later time. The purpose of the present investigations was to explore these TOT–FOK relations systematically. In the first experiment, we established the degree of performance overlap at two retention intervals and for two types of targets, for individuals who reported one versus the other of these monitoring categories. In a second experiment, we explored access to partial knowledge of unrecalled targets. Such knowledge has not previously been reported for strong FOKs but is frequently associated with reports of TOTs (Brown & McNeill, 1966; Hanley & Chapman, 2008; Koriat & Lieblich, 1974).
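The validation logic described above, in which a metacognitive judgment is evaluated by its relation to later performance, can be illustrated with a short Python sketch. It is not the authors' analysis code: the trial format, the function names, and the toy data are hypothetical. The Goodman-Kruskal gamma is included only as one commonly used index of relative accuracy; the chapter does not name it as its measure.

```python
from itertools import combinations


def proportion_correct_by_judgment(trials):
    """Map each judgment value to the proportion of items later retrieved.

    `trials` is a list of (judgment, correct) pairs, e.g. (1, True) for a
    reported TOT that was later recalled, or (5, False) for an FOK rating
    of 5 whose target was not recognized.
    """
    totals, hits = {}, {}
    for judgment, correct in trials:
        totals[judgment] = totals.get(judgment, 0) + 1
        hits[judgment] = hits.get(judgment, 0) + int(correct)
    return {j: hits[j] / totals[j] for j in totals}


def goodman_kruskal_gamma(trials):
    """Relative accuracy: (concordant - discordant) / (concordant + discordant)."""
    concordant = discordant = 0
    for (j1, c1), (j2, c2) in combinations(trials, 2):
        direction = (j1 - j2) * (int(c1) - int(c2))
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    if concordant + discordant == 0:
        return None  # undefined when every pair is tied
    return (concordant - discordant) / (concordant + discordant)


# Toy data (invented): 1 = TOT reported, 0 = no TOT; True = later recalled.
tot_trials = [(1, True), (1, True), (1, False), (0, False), (0, False), (0, True)]
by_report = proportion_correct_by_judgment(tot_trials)
gamma = goodman_kruskal_gamma(tot_trials)
```

The same two functions accept FOK ratings from 1 to 6 paired with recognition outcomes, so both kinds of judgment can be scored against either criterion, which is the comparison the experiments below set out to make.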


Experiment 1

Method

Participants
Forty-six undergraduate students (11 male, 35 female) at Ohio Wesleyan University (OWU) received partial course credit for their participation.

Materials
A test consisting of 240 general knowledge questions was developed, where the answers to the questions represented the target words. One hundred and forty targets were names of prominent people; for example, John Hinckley was the answer to the question “What is the name of the individual committed to a mental hospital after he shot and wounded President Ronald Reagan?” The remaining 100 targets were nouns; for example, monsoon was the intended answer to “What seasonal South Asian wind is characterized by heavy rains?” The difficulty level of the questions was determined by preliminary data, collected from the OWU student population, so as to yield about 30% correct recall. The 240 questions were divided into two blocks (A and B) of 120 questions equated for difficulty level on the basis of data from prior investigations. Participants received block A and block B questions in counterbalanced sequence. One of two possible sets of instructions was assigned to block A or to block B questions: (1) TOT instructions, in which participants were asked to indicate if they believed that recall was imminent, or (2) FOK instructions, in which participants were asked to rate on a scale of 1 to 6 how certain they were that they would recognize the correct answer. Assignment of instructions to the blocks was counterbalanced within subjects such that half of the participants received questions with TOT instructions followed by FOK instructions, and the other half received these instructions in the reverse sequence. All questions were therefore assigned with the same frequency to the two types of instructions.

Procedure
A computer program was written by the experimenters using the software package E-Prime (Schneider, Eschman, & Zuccolotto, 2002).
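The counterbalancing of question blocks and instruction sets described under Materials can be sketched in Python as follows. This is an illustration only: the chapter does not report how individual participants were assigned, so the simple rotation through four cells, the function name `counterbalance`, and the output format are all assumptions.

```python
from itertools import cycle, product


def counterbalance(n_participants):
    """Rotate participants through block-order x instruction-order cells.

    Hypothetical scheme: the first-presented block is paired with the
    first instruction set, the second block with the second set.
    """
    block_orders = [("A", "B"), ("B", "A")]
    instruction_orders = [("TOT", "FOK"), ("FOK", "TOT")]
    cells = cycle(product(block_orders, instruction_orders))
    schedule = []
    for participant, (blocks, instructions) in zip(range(1, n_participants + 1), cells):
        schedule.append({
            "participant": participant,
            # e.g. [("A", "TOT"), ("B", "FOK")]: block A first, under TOT instructions
            "sequence": list(zip(blocks, instructions)),
        })
    return schedule


schedule = counterbalance(46)
tot_first = sum(1 for s in schedule if s["sequence"][0][1] == "TOT")
block_a_with_tot = sum(1 for s in schedule if ("A", "TOT") in s["sequence"])
```

With 46 participants this rotation gives TOT instructions first to half of them and pairs block A with TOT instructions for half of them, so each question appears equally often under each instruction type, as the text requires. Block order itself cannot be split exactly evenly across four cells with 46 participants.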
The experiment was presented to participants on a personal computer. Participants were told that the purpose of this study was to find out how well people who are unable to answer a question can predict their ability to remember the answer at a later time. Questions assigned to the TOT instruction condition were followed by the instructions: “Often when we cannot recall a person’s name or the answer to a question, we feel that it is on the tip of our tongue and that we will be able to


recall the answer momentarily. Do you feel it is likely you could recall the correct answer and that recall is imminent? Type Y for yes or N for no.” Instructions for questions assigned to the FOK condition were: “Do you believe you could recognize the correct answer to this question from a list of choices? On a scale of 1 to 6, rate how certain you are that you could recognize the correct answer from a list. Type a number from 1 to 6 to indicate your certainty, 1 = certainly no and 6 = certainly yes.”

Questions were then presented one at a time, and participants read and attempted to retrieve the answers at their own pace. If they gave an answer (regardless of whether the provided answer was correct or incorrect), the next question was presented; when they indicated they did not know the answer, they were presented with instructions and asked to make a TOT or FOK judgment as appropriate, and then the next question was presented.

Half of the participants were assigned to return for a retest after one day; the other half returned after one week. When participants returned for the second test, they were presented with the 240 questions in random order and attempted to retrieve the target. If they recalled a target correctly, the next question was presented. If they failed to recall an accurate response, the question was presented in multiple choice format, including the target and three foils to test recognition. The foils for name targets were names of approximately equally prominent individuals known for similar accomplishments. For word targets, foils were semantically related English words of approximately equal usage frequency.

Results
Participants recalled a mean of 30% of targets on the first test. For items that were not answered on the first test, we calculated the proportion of targets correctly recalled as well as the proportion of targets correctly recognized on the retest.
Items that were recalled correctly on the retest were not presented again in recognition format; rather, participants were credited with correct recognition for these items, based on pilot data that showed nearly error-free recognition for such items. This procedure was followed to allow us to calculate recall and recognition scores for comparable items while keeping the total testing time within reasonable limits for participants to handle without becoming fatigued.

Recall and Recognition Performance for TOT Reports versus FOK Ratings
The top panel in Figure 16.1 shows the mean proportion recalled on the retest for targets participants reported to yield TOT states (answered Y) and for targets participants reported not having a TOT state (answered N). The lower panel of Figure 16.1 shows the

Figure 16.1 Mean proportion of items recalled as a function of reported TOT states and reported FOK ratings at two test intervals. Error bars represent standard error. [Top panel: reported TOT state (No TOT, TOT); bottom panel: reported FOK rating (1 to 6); y-axis: mean proportion recalled; series: 1-day and 7-day intervals.]

proportion of recall for targets participants rated at each level of the FOK scale. Figure 16.2 gives comparable data regarding performance on the recognition test. The proportion of items correctly retrieved was significantly higher following a report of a TOT state relative to a report of a no-TOT state for both recall (t(45) = 5.95, p < .001) and recognition (t(45) = 9.58, p < .001). Similarly, one-way repeated measures ANOVAs revealed that retrieval performance improved consistently as FOK ratings increased for both recall (F(5, 195) = 16.18, p < .001) and recognition (F(5, 195) = 40.49, p < .001). Compared to targets reported to yield TOT states, targets rated 5 on the FOK scale yielded somewhat lower recall performance, whereas targets rated 6 on the FOK scale yielded somewhat higher recall. Similarly, targets rated 1, 2, or 3 on the FOK scale yielded recall comparable to targets that were reported not to yield TOT states. Targets rated 4 on the FOK scale yielded performance in between targets associated with TOT and no-TOT reports. Figure 16.2 shows that targets rated 5 and 6 on the FOK scale yielded recognition

Figure 16.2 Mean proportion of items recognized as a function of reported TOT states and reported FOK ratings at two test intervals. Error bars represent standard error. [Top panel: reported TOT state (No TOT, TOT); bottom panel: reported FOK rating (1 to 6); y-axis: mean proportion recognized; series: 1-day and 7-day intervals.]

performance comparable to targets reported to yield a TOT state, and targets rated 1 to 3 yielded performance comparable to targets reported to yield a no-TOT state. Based on the data in Figures 16.1 and 16.2, we wanted to test the assumption that reports of TOT states reflect a dichotomous judgment imposed on a continuous dimension of FOK such that participants’ reports of TOTs reflect comparable memory performance to FOK ratings of 5 or 6, and reports of no-TOT states yield comparable performance to FOK ratings of 1 to 4. Accordingly, we combined data for targets into these two categories based on retrieval expectations. Means and standard deviations for these new categories are reported in Table 16.2. We conducted four two-way mixed-design ANOVAs with the test interval (one day or seven days) as the between-subjects variable and type of prediction (TOT or FOK) as the within-subjects variable. Two analyses were performed for items that participants expected to retrieve (i.e.,


How Should We Define and Differentiate Metacognitions? • 337

Table 16.2 Mean Recall and Recognition Performance on Retest for Items on Which Retrieval Was and Was Not Expected

Mean Proportion Correct

                 Retrieval Not Expected                    Retrieval Expected
                 TOT 0       FOK 1–4     95% CI^a          TOT 1       FOK 5–6     95% CI^a
Recall
  1-day          0.01 (.01)  0.01 (.02)  –.013, .008       0.13 (.12)  0.15 (.18)  –.090, .057
  7-day          0.01 (.02)  0.02 (.04)  –.018, .009       0.11 (.12)  0.12 (.13)  –.065, .045
Recognition
  1-day          0.55 (.10)  0.54 (.13)  –.025, .060       0.82 (.13)  0.81 (.11)  –.046, .066
  7-day          0.53 (.11)  0.50 (.10)  –.014, .073       0.76 (.21)  0.80 (.16)  –.146, .066

Note: Standard deviations are in parentheses.
^a Represents the 95% confidence interval for the difference in mean proportion correct for items with TOT versus FOK ratings.

those eliciting TOT states or FOK ratings of 5 or 6), one with proportion of recall and the other with proportion of recognition as the dependent variable. In these ANOVAs, there were no significant main effects for type of prediction (F < 1) or test interval (p > .25), nor were there interactions between type of prediction and test interval (F < 1). Two analogous analyses were performed on items that elicited no-TOT states or were rated 1 to 4 on the FOK scale. In these analyses, there were again no significant main effects for type of prediction (p > .10) or test interval (p > .25), nor interactions (F < 1). The data in Figures 16.1 and 16.2 suggest that individuals differ somewhat in the FOK scale values they assign to targets they report to yield TOT states, but values of 5 and 6 are used by most individuals when a 6-point FOK scale is offered. The lack of significant effects in all analyses suggests that TOTs and strong FOKs produced comparable recall and recognition performance, as did non-TOTs and weak FOKs when retests were administered at retention intervals of either one or seven days. In fact, the 95% confidence intervals reported in Table 16.2 show that we are consistently confident that the difference in proportion correctly retrieved for items with TOT reports relative to FOK ratings is small, often less than .05.
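The confidence intervals for a TOT-versus-FOK difference discussed above can be illustrated with a small sketch. It is not the authors' analysis: the chapter does not state how its intervals were computed, so the paired-difference formula below (mean difference plus or minus a critical t value times the standard error) is an assumption, as are the function name `paired_difference_ci`, the sample scores, and the critical value, which is read from a standard t table rather than computed.

```python
import math
import statistics


def paired_difference_ci(first, second, t_crit):
    """Return (mean difference, t statistic, (ci_low, ci_high)) for paired scores.

    `t_crit` is the two-tailed critical t value for df = n - 1, supplied by the
    caller (for example from a printed t table); this is an illustrative helper.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long samples with at least two pairs")
    diffs = [a - b for a, b in zip(first, second)]
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference, using the sample standard deviation.
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    t_stat = mean_diff / se if se > 0 else float("nan")
    half_width = t_crit * se
    return mean_diff, t_stat, (mean_diff - half_width, mean_diff + half_width)


# Hypothetical per-participant proportions correct (NOT data from the study).
tot_scores = [0.80, 0.70, 0.90, 0.60]   # items given a TOT report
fok_scores = [0.70, 0.70, 0.80, 0.60]   # items given an FOK rating of 5 or 6
T_CRIT_DF3 = 3.182                      # two-tailed .05 critical t for df = 3

mean_diff, t_stat, ci = paired_difference_ci(tot_scores, fok_scores, T_CRIT_DF3)
print(f"mean difference = {mean_diff:.3f}, t(3) = {t_stat:.2f}, "
      f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

An interval that is narrow and centered near zero, as in Table 16.2, says more than a nonsignificant test alone, because it bounds how large the difference could plausibly be.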


To ascertain whether these results applied to both types of memory targets used in our investigation, we examined recall and recognition performance separately for name and noun targets. The means and standard deviations are shown in Table 16.3. For both name and noun items, the mean proportion correctly retrieved was higher for TOT reports than for no-TOT reports and for higher FOK ratings. Retrieval rates were also much higher for nouns than for names. We calculated item difficulty as the proportion of participants who recalled the correct answer to each item on the initial test. Block A consisted of 70 name targets with mean difficulty of .18 (S = .19) and 50 noun targets with mean difficulty of .49 (S = .25). Block B also consisted of 70 name and 50 noun targets with difficulty of .14 (S = .16) and .50 (S = .24), respectively. For subsequent analyses, we selected for each block a subset of 20 name items with mean difficulty of .35 (S = .13 for Block A and .14 for Block B) and 20 noun items with the same difficulty (M = .35, S = .13 for both blocks). These selections were made without regard to metacognitive ratings or performance on the retest. Table 16.4 illustrates the mean proportion correctly retrieved by question type for the selected items. Because the time interval between tests had not shown any previous significant effects, we collapsed the data across test intervals. Again, memory performance for items associated with a TOT report is comparable to performance for items associated with FOK ratings of 5 and 6, and performance for items associated with a no-TOT report is comparable to performance for items associated with FOK ratings of 1 to 4. We conducted eight dependent t-tests; four compared the mean proportion correctly retrieved for no-TOT reports relative to FOK ratings of 1 to 4 separately for names and nouns and for recall and recognition. 
The remaining four tests compared the mean proportion correctly retrieved for TOT reports relative to FOK ratings of 5 and 6. None of the tests was statistically significant, and t < 1 for three of the eight comparisons. The results of Experiment 1 are consistent with the hypothesis that TOT experiences are not dissociable from FOK ratings for either recall or recognition of targets. We performed Experiment 2 in order to compare the amount of partial target information accessible for targets eliciting TOT judgments, with partial target information available for targets rated 5 and 6 on the FOK scale. Previous investigations (Brown & McNeill, 1966; Hanley & Chapman, 2008; Koriat & Lieblich, 1974) have shown that participants who reported TOT states were often able to give information about the beginning letter and the number of syllables of the unrecalled targets. If strong FOK ratings are functionally equivalent to TOT states, then participants assigning these ratings


Table 16.3 Mean Recall and Recognition Performance on the Retest by Judgment by Question Type

Mean Proportion Correct

                         Recall                      Recognition
Question Type            Name        Noun            Name        Noun
Judgments
Tip of the tongue^a
  0                      0.01 (.01)  0.04 (.07)      0.50 (.12)  0.71 (.17)
  1                      0.08 (.12)  0.16 (.24)      0.71 (.23)  0.82 (.18)
Feeling of knowing^b
  1                      0.00 (.01)  0.02 (.07)      0.39 (.19)  0.59 (.34)
  2                      0.01 (.04)  0.08 (.21)      0.49 (.20)  0.67 (.36)
  3                      0.01 (.06)  0.01 (.08)      0.54 (.27)  0.58 (.35)
  4                      0.03 (.10)  0.09 (.19)      0.60 (.26)  0.73 (.27)
  5                      0.08 (.19)  0.14 (.28)      0.75 (.22)  0.76 (.26)
  6                      0.11 (.21)  0.21 (.29)      0.87 (.16)  0.84 (.17)

Note: Standard deviations are in parentheses.
^a Dichotomous prediction of recall: 0 = will not recall, 1 = will recall.
^b Scale prediction of recognition: 1 = not likely to recognize, 6 = very likely to recognize.

should be able to give comparable partial target information when tested with the same targets.
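The item-difficulty measure used in the Experiment 1 analyses (the proportion of participants who recalled the correct answer to each item on the initial test) can be sketched as follows. The data layout, the function name `item_difficulty`, and the responses are hypothetical, and the sketch does not reproduce how the matched subsets of 20 name and 20 noun items were chosen, which the chapter does not describe.

```python
import statistics


def item_difficulty(responses):
    """Map each item to the proportion of participants who recalled it correctly.

    `responses` is an assumed layout: item id -> list of True/False values,
    one per participant, for the initial test.
    """
    return {item: sum(correct) / len(correct) for item, correct in responses.items()}


# Hypothetical initial-test responses (NOT data from the study).
responses = {
    "item_a": [True, False, False, False],
    "item_b": [True, True, False, False],
    "item_c": [True, True, True, False],
}

difficulty = item_difficulty(responses)
values = list(difficulty.values())
mean_difficulty = statistics.mean(values)   # reported as M in the text
sd_difficulty = statistics.stdev(values)    # reported as S in the text
print(difficulty, mean_difficulty, sd_difficulty)
```

Note that on this definition a higher value means an easier item, so the noun targets (mean difficulty near .50) were easier than the name targets (mean difficulty below .20).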

Experiment 2

Method

Participants
Forty-three OWU undergraduates, 24 males and 19 females, received payment for participating.


Table 16.4 Mean Recall and Recognition Performance on Retest for Items on Which Retrieval Was and Was Not Expected by Question Type

Mean Proportion Correct

                 Retrieval Not Expected                    Retrieval Expected
                 TOT 0       FOK 1–4     95% CI^a          TOT 1       FOK 5–6     95% CI^a
Recall
  Names          0.01 (.05)  0.01 (.05)  –.018, .020       0.11 (.19)  0.17 (.30)  –.200, .064
  Nouns          0.03 (.08)  0.05 (.12)  –.063, .024       0.14 (.24)  0.12 (.26)  –.091, .100
Recognition
  Names          0.54 (.26)  0.59 (.23)  –.136, .026       0.85 (.21)  0.87 (.16)  –.122, .019
  Nouns          0.72 (.22)  0.66 (.27)  –.039, .152       0.80 (.23)  0.76 (.29)  –.036, .142

Note: Standard deviations are in parentheses.
^a Represents the 95% confidence interval for the difference in mean proportion correct for items with TOT versus FOK ratings.

Materials
The same questions used in Experiment 1 were also used in Experiment 2; however, we eliminated 16 general knowledge questions (10 names of prominent people and 6 nouns) whose targets were two words or hyphenated (e.g., Mason-Dixon) or were individuals whose last name was not generally known (e.g., the name of Cher). We eliminated these questions to avoid ambiguity as to which words or names were to be included in estimating the number of letters. The removal of these targets resulted in 224 general knowledge questions, which were assigned to two blocks. The blocks were equated for difficulty using available data from the same student population.

Procedure
The procedure used in Experiment 1 was repeated with the following modifications: Participants were tested only once and did not return for a retest, and on the test, immediately following TOT or FOK judgments, the following additional instructions were given to assess partial knowledge of the unrecalled memory target: (1) Please guess the first letter of the word or the person’s last name, and (2) please guess the number of letters in the word or the person’s last name.


Results

Accuracy of First-Letter Predictions
We calculated the proportion of correct guesses of the first target letter in the unrecalled targets for each level of FOK rating and for TOT and no-TOT reports. Figure 16.3 displays the accuracy of predicting the first letter of the target when TOT judgments (TOT vs. no TOT) were made compared to when FOK judgments were made (ratings 1 to 6). We compared the proportion of correct guesses for reported TOT states and no-TOT states by means of a paired-samples t-test and found more accurate guesses of the first target letter for TOT than for no-TOT reports (t (40) = 3.74, p = .001). A repeated-measures ANOVA with FOK values as the within-subjects variable and the proportion of correct guesses as the dependent variable revealed that the accuracy of guessing the first letter of the target significantly increased with higher FOK ratings, F (5, 200) = 6.49, p < .001.

[Figure 16.3: two panels with Mean Proportion Correct on the vertical axis (0.00 to 0.25). One panel is organized by Reported TOT State (No TOT, TOT), the other by FOK Rating (1 to 6).]

Figure 16.3 Mean proportion of first-letter accuracy as a function of reported TOT states and reported FOK ratings. Error bars represent standard error.


We also conducted paired-samples t-tests to compare TOT judgments and FOK ratings on first-letter accuracy. The proportion of correct first-letter guesses for FOK ratings of 5 and 6 did not differ significantly from that for reported TOT states, t < 1 (95% confidence interval for the difference is –.126, .049). However, the proportion of correct first-letter guesses for FOK ratings of 1 to 4 was significantly larger than that for no-TOT states, t (39) = 2.935, p < .01, although the 95% confidence interval indicates a difference in mean proportion correct of .05 or less.

Accuracy of Number of Letters Predictions
We calculated the proportion of correct guesses of the total number of letters in the unrecalled targets for each reported FOK confidence level and for TOT and no-TOT reports, and these data are shown in Figure 16.4. A paired-samples t-test found no significant effect on proportion of correct predictions of the number of target letters for reported TOT states and no-TOT states, t < 1. Similarly, a repeated-measures ANOVA with FOK values as the within-subjects variable showed no significant effect on the proportion

[Figure 16.4: two panels with Mean Proportion Correct on the vertical axis (0.00 to 0.25). One panel is organized by Reported TOT State (No TOT, TOT), the other by FOK Rating (1 to 6).]

Figure 16.4 Mean proportion of number of letters accuracy as a function of reported TOT states and reported FOK ratings. Error bars represent standard error.


of correct guesses of number of target letters, F < 1. Unlike the findings regarding correct guesses of the first letter, neither FOK confidence levels nor reported TOT states had a significant effect on the accuracy of estimating the number of letters in unrecalled targets. This result was unexpected, because others (e.g., Brown & McNeill, 1966; Hanley & Chapman, 2008; Koriat & Lieblich, 1974; Yarmey, 1973) have reported TOT states to be associated with significant knowledge of the length of unrecalled targets.
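The partial-knowledge scoring described in these results, the proportion of correct first-letter guesses at each judgment level, can be sketched as below. This is a simplified illustration that pools trials rather than averaging within participants, and the function name `accuracy_by_judgment`, the tuple layout, and the example targets are all hypothetical rather than taken from the study materials.

```python
from collections import defaultdict


def accuracy_by_judgment(trials):
    """Return {judgment level: proportion of correct first-letter guesses}.

    Each trial is an assumed (level, guessed_letter, target) tuple for a target
    that was NOT recalled; `level` may be an FOK rating (1 to 6) or a TOT code.
    """
    counts = defaultdict(lambda: [0, 0])  # level -> [correct guesses, total guesses]
    for level, guess, target in trials:
        counts[level][1] += 1
        # Compare case-insensitively against the first letter of the target.
        if guess.strip().lower() == target[0].lower():
            counts[level][0] += 1
    return {level: correct / total
            for level, (correct, total) in sorted(counts.items())}


# Hypothetical trials (NOT data from the study).
trials = [
    (6, "c", "Copernicus"),
    (6, "m", "Magellan"),
    (6, "b", "Darwin"),
    (2, "s", "Tolstoy"),
    (2, "t", "Tolstoy"),
]

first_letter_accuracy = accuracy_by_judgment(trials)
print(first_letter_accuracy)
```

The same helper would score number-of-letters guesses if the comparison were changed to test the guessed length against `len(target)`.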

General Discussion
The data from Experiment 1 show that reports of TOTs and of strong FOKs by the same individuals predicted subsequent retention of targets with nearly identical accuracy. This result was obtained for both recall and recognition testing formats, for noun and name type memory targets that varied in difficulty level, and for retention intervals of one day and one week. The data from Experiment 2 extend these findings to show that individuals report comparable partial knowledge of unrecalled targets for reports of TOTs and of strong FOKs. Data from both experiments suggest that individuals differ somewhat in the value on the FOK scale they assign to correspond to their reports of TOT versus no-TOT states. In particular, some participants may at times assign FOK values of 4 to their TOT states, while others assign them more often to their no-TOT states. However, such individual differences in criterion setting do not challenge the general conclusion that reports of TOT states reflect dichotomized judgments of a continuous dimension of confidence in the FOK experience. It is, of course, not possible to prove the null hypothesis, but these findings are based on large data sets, are consistent across all experimental conditions, and support acceptance of the null hypotheses. Although the assumption that TOTs are functionally equivalent to strong FOKs has been implied by others (Gardiner, Craik, & Bleasdale, 1973; Shimamura & Squire, 1986), Widner et al. (1996) reached the opposite conclusion based on their report that TOT and FOK instructions resulted in differential degrees of cortical activation in the prefrontal cortex (PFC). We discussed earlier that differential cortical activation data could not provide the basis for differentiating metacognitions because such inferences ignore the scope of processes that may activate brain areas.
Similarly, Widner, Otani, and Winkelman (2005) reported that the number of perseverative errors on the Wisconsin Card Sorting Test (indicative of deficient PFC functioning) had a significant effect on the frequency of reporting and the predictive accuracy of


FOKs, but not of TOTs. However, they obtained FOK and TOT reports from different participants in different experiments, the magnitude of the effect on TOTs and on FOKs was not directly compared statistically, and the investigators used 20-point confidence ratings for TOTs rather than more conventional dichotomous TOT judgments. Most importantly, the hypothesis that TOT reports are functionally equivalent to strong FOK ratings must be tested by comparing effects on TOT reports with effects on strong FOK ratings, not by comparing effects on the mean degree of confidence of all TOT and FOK reports. For all of these reasons, their findings do not provide a basis for rejecting the hypothesis that TOT judgments are functionally equivalent to reports of strong FOKs. Other researchers (Brown, 1991; Schwartz et al., 2000) have reported the phenomenology of spontaneous, involuntary experience or the experience of a sense of emotional frustration as distinctive attributes of TOT states. We showed earlier why reported differences of phenomenology per se cannot serve to identify or differentiate metacognitive concepts. We contend that this is also true of evidence that instructional or other variables differentially affect the frequency of TOT versus strong FOK reports. Such findings can serve to differentiate metacognitions only if they are linked to differential relations of TOT versus FOK reports with recall or recognition performance. The lexical labels we give to metacognitions are not operational definitions; their functional meanings are usually ambiguous, rather than precise. It is important to prevent the ambiguities of common language from translating into ambiguities of scientific concepts. Ambiguity of scientific concepts leads to irresolvable controversies and often to wasteful research. Metacognitions consist of introspective reports, and the functional differentiation of such reports rests upon their respective relations to indicants of performance.

Acknowledgments
This research was supported by National Institute of Aging Grant RO1 AG19803. We thank Ann Daunic and William Uttal for comments on a draft of this article. We thank Randi Amstad, Jacqui Barker, Hilary Comeras, Sarah Koenig, Veronica Malencia, Lindsey Messinger, Elizabeth Polter, Stephanie Parnes, Valerie Sloboda, and Caitlin Willet for their assistance with data collection and scoring. Correspondence concerning this article should be addressed to Harry P. Bahrick, Department of Psychology, Ohio Wesleyan University, Delaware, OH 43015. E-mail: [email protected].


References
Bahrick, H. P. (2008). Thomas O. Nelson: His life and comments on implications of his functional view of metacognitive memory monitoring. In J. Dunlosky & R. A. Bjork (Eds.), Handbook of metamemory and memory (pp. 1–7). New York: Psychology Press.
Barrett, L. F., & Wager, T. D. (2006). The structure of emotion: Evidence from neuroimaging studies. Current Directions in Psychological Science, 15, 79–83.
Boring, E. G. (1950). A history of experimental psychology. New York: Appleton-Century-Crofts.
Brown, A. S. (1991). A review of the tip-of-the-tongue experience. Psychological Bulletin, 109, 204–223.
Brown, R., & McNeill, D. (1966). The “tip of the tongue” phenomenon. Journal of Verbal Learning and Verbal Behavior, 5, 325–337.
Gardiner, J. M., Craik, F. I., & Bleasdale, F. A. (1973). Retrieval difficulty and subsequent recall. Memory and Cognition, 3, 213–216.
Hanley, J. R., & Chapman, E. (2008). Partial knowledge in a tip-of-the-tongue state about two- and three-word proper names. Psychonomic Bulletin and Review, 15, 156–160.
Heidbreder, E. (1933). Seven psychologies. New York: Appleton-Century-Crofts.
Koriat, A., & Lieblich, I. (1974). What does a person in a “TOT” state know that a person in a “don’t know” state doesn’t know? Memory and Cognition, 2, 647–655.
Maki, R. (1998). Test prediction over text material. In D. J. Hacker, J. Dunlosky, & A. C. Graesser (Eds.), Metacognition in educational theory and practice (pp. 117–145). Mahwah, NJ: Erlbaum.
Nelson, T. O. (1996). Consciousness and metacognition. American Psychologist, 51, 102–116.
Nelson, T. O., & Dunlosky, J. (1991). The delayed-JOL effect: When delaying your judgments of learning can improve the accuracy of your metacognitive monitoring. Psychological Science, 2, 267–270.
Nicholson, C. (2006). Thinking it over: fMRI and psychological science. The Observer, 19, 20–27.
Poldrack, R. A. (2006). Can cognitive processes be inferred from neuroimaging data? Trends in Cognitive Sciences, 10, 59–63.
Schneider, W., Eschman, A., & Zuccolotto, A. (2002). E-Prime user’s guide. Pittsburgh, PA: Psychology Software Tools, Inc.
Schwartz, B. L. (2001). The relation of tip-of-the-tongue states and retrieval time. Memory and Cognition, 29, 117–126.
Schwartz, B. L. (2002). Tip-of-the-tongue states. Mahwah, NJ: Erlbaum.
Schwartz, B. L., Travis, D. M., Castro, A. M., & Smith, S. M. (2000). The phenomenology of real and illusory tip-of-the-tongue states. Memory and Cognition, 28, 18–27.
Shimamura, A. P., & Squire, L. R. (1986). Memory and metamemory: A study of the feeling-of-knowing phenomenon in amnesic patients. Journal of Experimental Psychology: Learning, Memory, and Cognition, 12, 452–460.
Uttal, W. R. (2001). The new phrenology: The limits of localizing cognitive processes in the brain. Cambridge, MA: MIT Press.
Van Overschelde, J. P., & Nelson, T. (2006). Delayed judgments of learning cause both a decrease in absolute accuracy (calibration) and increase in relative accuracy (resolution). Memory and Cognition, 34, 1527–1538.
Wade, C. (2006). Some cautions about jumping on the brain-scan bandwagon. The Observer, 19, 23.
Wager, T. D. (2006). Do we need to study the brain to understand the mind? The Observer, 19, 25–27.
Wager, T. D., Hernandez, L., Jonides, J., & Lindquist, M. (2007). Elements of functional neuroimaging. In J. T. Cacioppo, L. G. Tassinary, & G. G. Berntson (Eds.), Handbook of Psychophysiology (4th ed., pp. 19–55). Cambridge: Cambridge University Press.
Widner, R. L., Otani, H., & Winkelman, S. E. (2005). Tip-of-the-tongue experiences are not merely strong feeling-of-knowing experiences. Journal of General Psychology, 132, 392–407.
Widner, R. L., Smith, S. M., & Graziano, W. G. (1996). The effects of demand characteristics on the reporting of tip-of-the-tongue and feeling-of-knowing states. American Journal of Psychology, 109, 525–538.
Wilson, T. D. (1994). The proper protocol: Validity and completeness of verbal reports. Psychological Science, 5, 249–254.
Yarmey, A. D. (1973). I recognize your face but I can’t remember your name: Further evidence on the tip-of-the-tongue phenomenon. Memory and Cognition, 1, 287–290.


17

Learning From the Consequences of Retrieval
Another Test Effect

Elizabeth Ligon Bjork, Benjamin C. Storm, and Patricia A. deWinstanley

Overview
In the present chapter, we describe research that integrates two lines of investigation in which Robert A. Bjork, in whose honor this volume is dedicated, has played a significant leadership role: research regarding the benefits of testing for long-term retention and research regarding the metacognitive processes underlying the development of illusions of competence during study. In our conjoining of these two areas of research, we have explored whether the metacognitive experience gained from taking a test can lead learners to adopt improved strategies for encoding future to-be-learned material. Or, put slightly differently, we have explored whether learning to learn can be another beneficial effect of test taking. In the sections to follow, we first describe some of the more relevant findings regarding the beneficial effects of testing on memory, with this section followed by one in which we describe findings relevant to the question of how individuals monitor their learning during study. Then, we turn to a description of the research in which we have explored whether learning to learn can occur as a consequence of the testing experience.


The Power of Tests as Learning Events
Although many educators and learners alike regard tests simply as measures of what is already known, tests can, in fact, also serve as powerful learning events. Indeed, much laboratory research (e.g., Landauer & Bjork, 1978; Carrier & Pashler, 1992) has demonstrated the power of tests as learning events and, moreover, has shown that a test, even when no corrective feedback is given, can be considerably more effective for the long-term retention of material than additional study of it. This power of tests as learning events comes about because, as pointed out by R. A. Bjork (1975), retrieval processes do not simply assess the contents of memory and then leave the representations of items in memory in the same state as they were before being retrieved. Rather, the act of retrieving information modifies its representation in memory such that it becomes more recallable in the future—a phenomenon that Bjork argued represents a kind of Heisenberg principle for retrieval processes: “an item can seldom, if ever, be retrieved from memory without modifying the representation of that item in significant ways” (p. 123). Or, as similarly observed by Roediger and Karpicke (2006a) in their excellent review of testing effects, “just as measuring the position of an electron changes that position, so the act of retrieving information from memory changes the mnemonic representation underlying retrieval—and enhances later retention of the tested information” (p. 182). That tests can be more powerful as learning events for the long-term retention of material than additional opportunities for studying it, even when the tests do not provide any corrective feedback, was recently and impressively demonstrated by Roediger and Karpicke (2006b) using both educationally realistic materials and ecologically significant retention intervals.
To illustrate, in one of their reported experiments, participants in one condition were given four 5-minute opportunities to study the same passage (resulting in the passage being read an average of 14.2 times by these participants), while participants in another condition were given only one 5-minute opportunity to study the passage and then were tested on it three times in a row (resulting in the passage being read an average of only 3.4 times for these participants). Additionally, participants in these two conditions were given a final test on the passage at either a five-minute or a one-week delay. Although at the five-minute delay, participants who had received four opportunities to study the passage recalled significantly more idea units than did those who had only studied the passage once and been tested on it three times, at the one-week delay, this difference in recall was dramatically reversed. Now, the participants who had studied the passage only


once but who had been tested on it three times recalled significantly more idea units than those who had studied the passage four times. Additionally, an analysis of the proportion of information forgotten across the one-week delay revealed that the participants who repeatedly studied the material forgot far more (52%) than did the participants who were repeatedly tested on it (only 14%), demonstrating the power of tests to retard the forgetting that would otherwise occur. In addition to such direct effects of testing for the improvement of retention, tests can also enhance learning in several indirect ways. First, when given frequently across a course, tests can lead students to space their studying across the course rather than massing it right before a final exam, thus inducing the oft-demonstrated power of spaced or distributed practice for long-term retention (e.g., Cepeda, Pashler, Vul, Wixted, & Rohrer, 2006; Dempster, 1996). Second, tests can serve to optimize future study activities. From a metacognitive standpoint, tests allow learners to make more accurate assessments than do additional study events concerning whether information is likely to be recallable in the future (e.g., Koriat & Bjork, 2005, 2006; Nelson & Dunlosky, 1991). Furthermore, when feedback is provided after a test, students are informed as to what they do and do not already know and can thus more efficiently allocate their future study efforts. Third, as suggested by the work of Izawa (1970), tests can enhance the effectiveness of subsequent study relative to the effectiveness of such study when not preceded by a test. Finally, as illustrated in several recent studies by Kornell, Hays, and Bjork (2009) using materials that ensured unsuccessful initial retrieval attempts, even failed tests can potentiate the effectiveness of subsequent study opportunities.
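The proportion-forgotten comparison described above for the repeated-study and repeated-testing conditions can be expressed as a short calculation. The chapter reports only the resulting percentages (52% and 14%), so the formula below, the drop from immediate to delayed recall divided by immediate recall, is an assumed and commonly used definition, and the recall values are hypothetical placeholders rather than figures from Roediger and Karpicke.

```python
def proportion_forgotten(immediate_recall, delayed_recall):
    """Return the share of initially recalled material lost by the delayed test.

    Both arguments are proportions of idea units recalled (0 to 1). This is an
    assumed definition: (immediate - delayed) / immediate.
    """
    if immediate_recall <= 0:
        raise ValueError("immediate recall must be positive")
    return (immediate_recall - delayed_recall) / immediate_recall


# Hypothetical recall proportions (NOT values reported in the chapter).
repeated_study = proportion_forgotten(immediate_recall=0.80, delayed_recall=0.40)
repeated_test = proportion_forgotten(immediate_recall=0.70, delayed_recall=0.63)
print(f"repeated study: {repeated_study:.0%}, repeated testing: {repeated_test:.0%}")
```

Expressing forgetting relative to the immediate level is what allows a condition with lower initial recall to be credited with better retention.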

Monitoring One’s Learning During Study
Much recent research in the area of memory and learning has been concerned with the study of the metacognitive processes by which individuals monitor their level of knowledge during study (e.g., Benjamin, Bjork, & Schwartz, 1998; R. A. Bjork, 1999; Dunlosky & Nelson, 1994; Koriat, 1997, 1998). Undoubtedly, some of the interest in this area of research stems from the possibility, as put forth by R. A. Bjork and others (e.g., R. A. Bjork, 1999; Jacoby, Bjork, & Kelley, 1994), that the readings learners take on their level of comprehension during study, or their judgments of how likely they are to be able to recall the material being studied in the future, are as important as their actual comprehension and degree of learning because such readings play a powerful role in determining how they will decide to allocate their future study


activities and learning resources. On the basis of such readings, for example, students may well decide to read one chapter versus another or to study one set of materials versus another in preparation for an upcoming examination. A primary method that has been used to study such metacognitive processes is to ask learners to make judgments of learning (JOLs) during acquisition. A typically used procedure, for example, is to present learners with a list of cue-target pairs to learn and, following the presentation of each pair for study, to ask learners to judge the likelihood of their remembering the target in response to presentation of the cue alone on a later retention test. In a number of experiments using such a procedure, the JOLs made by participants have been found to be moderately accurate (e.g., Dunlosky & Nelson, 1994; Lovelace, 1984; Mazzoni & Nelson, 1995), and considerable research in this area has thus been focused on the question of what accounts for the accuracy of JOLs in predicting future memory performance. Learners, however, can also be far from accurate in taking such readings of comprehension or in making JOLs, and thus other research in this area has been directed to the question of what accounts for such illusions of comprehension (e.g., R. A. Bjork, 1999; Jacoby, Bjork, & Kelley, 1994). Learners, for example, can be led to think that their level of comprehension or skill is greater than it actually is owing to conditions of learning (such as massed practice) that enhance or support performance during study or training, but actually impair long-term retention or transfer (e.g., Simon & Bjork, 2001). Similarly, learners can be led to make JOLs that perfectly mismatch their later performance on a test by basing them on the fluency with which they can retrieve answers from long-term memory in the presence of cues available at the time of study, but that will not be present at the time of test (Benjamin et al., 1998). 
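How well JOLs track later performance can be quantified in several ways. One widely used index of relative accuracy in the metamemory literature is the Goodman-Kruskal gamma correlation; the chapter does not name the measure used in the studies it cites, so treating gamma as the index here is an assumption, and the function name and the JOL and recall values below are hypothetical.

```python
from itertools import combinations


def goodman_kruskal_gamma(jols, outcomes):
    """Return gamma = (C - D) / (C + D) over all item pairs, or None if undefined.

    `jols` are judgment-of-learning ratings and `outcomes` are later test
    results (1 = recalled, 0 = not recalled) for the same items. Pairs tied on
    either variable are ignored, as in the standard definition of gamma.
    """
    concordant = discordant = 0
    for (j1, o1), (j2, o2) in combinations(zip(jols, outcomes), 2):
        product = (j1 - j2) * (o1 - o2)
        if product > 0:
            concordant += 1      # higher JOL went with better recall
        elif product < 0:
            discordant += 1      # higher JOL went with worse recall
    if concordant + discordant == 0:
        return None
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical items (NOT data from any cited study).
moderate = goodman_kruskal_gamma([20, 40, 60, 80, 100], [0, 0, 1, 0, 1])
mismatch = goodman_kruskal_gamma([90, 80, 10, 20], [0, 0, 1, 1])
print(moderate, mismatch)
```

The second example mirrors the case described above in which JOLs perfectly mismatch later performance: every untied pair is discordant, so gamma reaches its minimum.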
One account for the occurrence of such dissociations between the JOLs made by learners during study and their actual performance on a later test is offered by the new theory of disuse (NTD), a theory proposed by R. A. Bjork and E. L. Bjork in 1992 to account for a number of unique characteristics or peculiarities of human memory. According to the NTD, such dissociations occur when learners base the JOLs they make during study on retrieval strength (i.e., the current activation or accessibility of an item’s representation in memory) rather than on storage strength (i.e., how entrenched or interassociated a memory representation is with related knowledge and skills), and because the former is a poor indicator of actual learning, it is also a poor indicator of long-term performance. (For more information concerning the new theory


of disuse as a model of learning and memory as well as its applications for training and instruction, the reader is also referred to R. A. Bjork & E. L. Bjork, 2006; E. L. Bjork & R. A. Bjork, 2011). From a slightly different perspective, Koriat (1997) has argued that learners can suffer from illusions of competence and be led to make inaccurate JOLs because, during original study, they tend to be relatively insensitive to the presence of extrinsic factors, which entail both conditions of learning (such as number of repetitions, presentation durations, and massed vs. distributed repetitions of items) and encoding operations used by the learner (such as level of processing and interactive imagery) that do enhance learning and later performance on a test, while being overly sensitive to intrinsic factors (such as the perceived association between cues and targets when both are present during study, or the relative difficulty or imagery values of individual words in a list) that do not necessarily enhance performance on later retention tests. Indeed, in several recent studies, Koriat and Bjork (2005) have demonstrated the oversensitivity of learners to the intrinsic factor of the perceived relationship between cues and targets when both are present during study by manipulating the types of associations existing between cue-target pairs to which learners must make JOLs during acquisition. 
In one of their studies, for example, learners were asked to make JOLs during study of a list of cue-target pairs in which some of the pairs were high in a priori relatedness (i.e., the preexperimental likelihood that presentation of the cue alone would bring the target to mind rather than other possible associates of the cue) and some were what the authors referred to as purely a posteriori pairs (i.e., pairs for which the perceived relationship between the cue and target is high when presented together, but for which the a priori degree of relatedness is low; that is, when presented alone, the cue would be very unlikely to bring that particular target to mind vs. other possible associates of the cue). The JOLs learners made to the a priori cue-target pairs corresponded to their later performance; in contrast, the JOLs they made to the purely a posteriori pairs were highly inflated. Koriat and Bjork (2005, 2006) have interpreted this overconfidence, or illusion of knowing, produced by the a posteriori pairs as being a type of foresight bias akin to, but different from, hindsight bias (Fischhoff, 1975; Hawkins & Hastie, 1990). More specifically, they proposed this foresight bias to be a mirror image of hindsight bias. That is, whereas hindsight bias refers to our tendency to distort our memory of a previously made judgment once the answer is known to us, the foresight bias proposed by Koriat and Bjork occurs when we predict our future
success in recalling a correct answer in the presence of that answer. Thus, the authors argue, both represent biases that reflect our inability to “escape” the influence of the correct answer.

Learning From the Experience of Test Taking

We turn now to a discussion of research conducted by the present authors that we believe integrates some of the findings and concepts from the research on testing effects and metacognitive processes described in the previous two sections. Because our research has focused on the encoding strategy of generation as one of the conditions of learning to which learners do not seem sensitive, we first define and illustrate the generation effect or advantage, including brief descriptions of two successful accounts of it. We then discuss a series of our studies, some complete and some ongoing, that address the general issue of the sensitivity of learners to the memorial benefits of generation and whether, if made sensitive to this benefit in the context of a test, they would then adopt more effective encoding strategies in the processing of new information.

Generation as a Condition of Learning

When learners take an active part in generating the information they are to learn, as opposed to having that information provided to them intact and simply reading it, they tend to remember the to-be-learned information better. To illustrate, if learners are required to generate the word memory from a word fragment (e.g., m _ m _ ry) versus being given the intact word to read, they will recall the word memory better on a later test. Or, if required to generate an exemplar, say lemon, to a category-plus-letter-stem cue (e.g., fruit–le___) versus being given the intact pair, fruit–lemon, to study, they will recall lemon better in response to the cue fruit on a later test. Considerable research has shown this memorial benefit of generation (e.g., Jacoby, 1978; Slamecka & Graf, 1978) to be both a robust phenomenon and one that extends to a variety of materials, including lists of words, word pairs, and trivia questions (e.g., deWinstanley, 1995; Hirshman & Bjork, 1988) as well as mathematical problems (e.g., McNamara & Healy, 1995a, 1995b; Pesta, Sanders, & Murphy, 1999). The generation advantage, however, can also be diminished or even eliminated under certain conditions. For example, McNamara and Healy (1995a, 1995b) found that generation advantages do not occur on a later test for arithmetic problems unless retrieval strategies that were employed by learners during study are evoked again during the
test. Similarly, deWinstanley, Bjork, and Bjork (1996) demonstrated that even when participants are learning the same materials, generation advantages may or may not occur depending on the match between the information strengthened as a result of the learner performing the generation task during study and the type of information required for optimal performance on a later test. More specifically, deWinstanley et al. (1996) manipulated the conditions of learning during study so as to force the processing of different types of information in order to generate targets for the same list of cue-target pairs. To illustrate, in one condition, the cue-target pairs to be learned were blocked on the basis of the categorical membership of the targets, leading participants to focus on target-target-relational information (to which free-recall tests are assumed to be most sensitive), as opposed to cue-target-relational information (to which cued-recall tests are assumed to be most sensitive), in order to perform the generation task. On subsequent tests, these participants showed a generation advantage when given a free-recall test, but not when given a cued-recall test. In a second condition, the pairs to be learned were not blocked by category membership of the targets, essentially eliminating target-target-relational processing as a basis for generating targets and forcing participants to rely on cue-target-relational information instead. On subsequent tests, these participants showed a generation advantage on a cued-recall test, but not on a free-recall test. In short, a striking reversal was observed in the relative levels of free and cued recall for targets that had been generated versus read, depending on the type of information participants had been forced to use in order to generate those targets during learning. The occurrence of generation effects can also be influenced by the encoding instruction given to learners.
Begg, Vinski, Frankovich, and Holgate (1991), for example, showed that the advantage of generation over reading could be eliminated when participants were given other effective strategies, such as imagery, to use when encoding intact to‑be‑read items. Similarly, deWinstanley and Bjork (1997) eliminated a previously observed generation advantage for identical materials by giving participants explicit instructions concerning the type of retention test to expect and how to process information optimally in anticipation of such a test. In fact, changes in a variety of factors—such as the type of test learners expect, whether to‑be‑read or to‑be‑generated items are mixed together (i.e., between- or within-subject manipulations of generation vs. read), and the specific requirements of the generation task—have led to a continuum of outcomes ranging from large to small to no generation advantages.

These types of findings, delineating conditions under which generation advantages do and do not occur, are largely consistent with two explanations of the generation effect: the procedural account and the transfer-appropriate multifactor account, both of which emphasize the critical nature of the relationship between encoding and retrieval processes in the production of generation effects. Briefly characterized, the procedural account (Crutcher & Healy, 1989; McNamara & Healy, 1995a, 1995b) assumes that when learners are required to generate information at study, as opposed to reading it, they are more likely to utilize encoding procedures that can then be reinstated during a later retention test. When a later test does invoke such procedures, a generation advantage should occur; if not, a generation advantage should not occur. The transfer-appropriate multifactor account (deWinstanley et al., 1996)—built upon the two-factor account of Hirshman and Bjork (1988) and the multifactor account of McDaniel, Waddill, and Einstein (1988)—assumes that the act of generation strengthens whatever type of information is used by the learner to complete the generation task, and thus the consequence of the generation task for later memory performance depends on whether the information so enhanced is information to which a later test is sensitive. Thus, when there is a good match between these types of information, generation advantages should occur; when there is not, generation advantages should not occur.

Making Learners Sensitive to Generation as an Effective Condition of Learning

As indicated in our discussion of the research by Koriat, R. A. Bjork, and others on metacognitive processes, investigations of how individuals monitor their level of learning during study paint a picture of learners as being insensitive to many of the conditions of learning that can enhance long-term retention, reflected in the relatively small influence of such variables on the JOLs they make during study. Such conditions, referred to as extrinsic factors by Koriat (1997), include not only aspects of presentation, such as number and duration of study opportunities, but also encoding operations applied by the learner during study, such as generation and levels of processing. In our research, we have asked whether learners could be made sensitive to the encoding effectiveness of one such extrinsic factor, that of generation, if they were to experience its memorial consequences in their own recall performance in the context of a test. We did this not to see if the accuracy of their
JOLs might thereby be enhanced during future study (although that is a topic of other research under way), but to see if their encoding strategies might thereby be enhanced during future study. In our research addressing this general issue, we have typically adopted the following general experimental strategy, initially used in the studies conducted by deWinstanley and Bjork (2004). Participants were first presented with a short passage to study of the type that would appear in an undergraduate introductory textbook, but in which we had embedded both to-be-generated and to-be-read critical items. Next, participants’ recall for these critical items was assessed in a fill-in-the-blank test. Then, after the experience of this test, a new text passage, also containing both to-be-generated and to-be-read critical items, was presented for study and then also followed by the same type of test for the critical items. Thus, before presentation of the second text passage for study, participants would have the opportunity to engage in both generating and reading of critical items in a previous passage as well as the opportunity to experience a generation advantage in their own performance on the test of those items. Hence, if as hypothesized, such an experience could be sufficient to induce participants to adopt a more effective way of encoding future to-be-read information, a generation advantage should be attenuated, or possibly eliminated, in the test of the second passage. In the first two studies employing this procedure, deWinstanley and Bjork (2004) obtained results consistent with this hypothesis. Specifically, while a generation advantage was observed in the test of the first passage, no generation advantage was observed in the test of the second passage. Importantly, however, the absence of a generation advantage on the second test did not occur at the expense of the generated items.
Instead, recall of the to-be-read items presented in the second passage improved to the level of that for the to-be-generated items, which did not differ from the level obtained in the test of the first passage. Given the results of these studies, deWinstanley and Bjork (2004) next attempted to determine whether it was, in fact, the testing experience that was critical in leading participants to develop more effective encoding strategies in two follow-up studies. In their first follow-up study, they used the same basic procedure, but rather than presenting both to-be-generated and to-be-read items within the same passage, they manipulated the requirement to generate versus read between passages. Accordingly, in the first passage for a given participant, the encoding task for all critical items was the same (either generating or reading), and then, in the second passage, the encoding task for critical items was switched. Consequently, participants did not have the opportunity
to experience the memorial consequences of generating versus reading within the context of the same test before they were presented with a new passage for study. Thus, if such an experience is critical for leading learners to adopt more effective encoding strategies, then the generation advantage should not be eliminated on the test of the second passage, and indeed, this was the result observed: A generation advantage was obtained on both tests, and furthermore, its size did not differ across tests. Apparently, then, simply having the experience of encoding critical items via generation in the first passage was not sufficient to make participants aware of the need to develop a better processing strategy for encoding to-be-read critical items in the second passage, pointing to the critical role of experiencing the relative memorial consequences of the two types of encoding within the same testing episode. In their second follow-up study addressing the critical nature of the testing experience, deWinstanley and Bjork (2004) examined whether something less specific—like a general dissatisfaction with the number of critical to‑be‑read items they were able to recall in the first test—had led participants to process to‑be‑read items in the second passage more effectively. This possibility could not be ruled out by the first follow-up study because the switch of encoding tasks between passages made it impossible for participants presented with only to‑be‑read items in the first passage to reveal any such improved encoding strategies for subsequent to‑be‑read items, as they only received to‑be‑generated critical items in their second passage. Thus, to address this possibility, they next manipulated the requirement to generate versus read between participants rather than between passages. 
Given this way of manipulating the encoding variable, a generation advantage would be expected on the test of the first passage whichever hypothesis was correct, whereas different outcomes would be expected on the test of the second passage. If the general feeling of dissatisfaction explanation is correct, then the generation advantage should be reduced or eliminated in the test of the second passage. If, however, the opportunity to experience the memorial benefits of generating relative to reading is critical for inducing a processing change, then participants only reading critical items in the first passage should not change their processing strategy for the second passage, and a generation advantage should be seen on the second test as well. Consistent with the testing experience explanation, a generation advantage was obtained in the tests for both passages and, as with the first follow-up study, the size of this advantage did not differ across tests. Thus, when participants were denied the opportunity to experience the memorial advantage of generation in their own test performance—because the read versus generate

encoding variable was manipulated either between passages or between participants—their ability to recall to‑be‑read items remained significantly poorer than their ability to recall to‑be‑generated items.

Additional Questions and Potential Important Applications

As indicated in our description of the studies of deWinstanley and Bjork (deWinstanley & Bjork, 2004; E. L. Bjork, deWinstanley, & Storm, 2007), this research explored whether learners could discover for themselves how to become more effective processors or encoders of to-be-learned information if given an informative test experience. The pattern of results observed across the four studies described above indicated that experiencing the advantages of encoding by generation versus only reading could induce learners to develop more effective encoding strategies. Or, in the terms of Koriat (1997), making learners sensitive to the power of generation as a condition of learning led them, in turn, to adopt enhanced strategies for the encoding of new information via reading, that is, even for information that they were not required to generate. These findings raise many interesting questions: some regarding the underlying cause of the effect observed by deWinstanley and Bjork (2004) and some regarding how these findings could best be applied to educational practices. It is to a discussion of some of these questions, topics of our recent and ongoing research, that we now turn. One of our first investigated questions concerned the effect’s durability. In the series of studies by deWinstanley and Bjork (2004) just reviewed, the second passage was always presented with little or no delay after the test of the first passage, raising the question of whether the testing experience only leads to enhanced encoding of new information when the new information is presented immediately after such a test. Perhaps, for example, the insertion of a delay between the testing experience and the presentation of the next passage would prevent participants from adopting a more effective processing strategy for subsequently presented to-be-read information.
Should a delay produce such an effect, the applicability of these findings for educational purposes would be lessened. A related issue, in terms of the educational applications of these findings, would be whether the test experience must occur immediately after presentation of the passage in which participants both generated and read critical items. Or, might it be possible—as might frequently be necessary in educational settings—to delay the test without eliminating the learners’ ability to benefit from the test experience?

In two recently conducted studies, we have addressed both of these questions as follows. In a first study, we inserted a delay filled with a number of other activities between the presentation and testing of a first passage and the presentation and testing of a second passage, and in a second study, we inserted the same type of delay between the presentation of the first passage and its subsequent test. Furthermore, as in the original deWinstanley and Bjork (2004) studies, both the first and second passages always contained both to‑be‑read and to‑be‑generated critical items, and the subsequent tests for the passages were in the form of a fill‑in‑the‑blank type of test for the critical items. Importantly, for the applicability of the present effect for educational purposes, the results obtained replicated those of the original studies. Thus, it appears that the observed effect of the testing experience— that is, its ability to lead learners to develop more effective encoding strategies for processing future information—can persist across a delay filled with other activities and, furthermore, does not require that the test be administered immediately after presentation of the first passage. Also being addressed in our current research is the question of the necessity for learners actually to experience the differential effectiveness of encoding via generation versus reading in the context of a memory test. From the deWinstanley and Bjork (2004) studies, we know that this experience is critical in that it was only the participants given this experience who then went on to adopt more effective processing strategies for future to‑be‑read items. 
Additionally, that such an experience would be necessary is consistent with previous research indicating that learners are typically unable to judge the efficacy of a given processing strategy during its execution and do not switch from a less to a more effective strategy without an opportunity to experience their relative effectiveness (see, e.g., Dunlosky & Hertzog, 2000). What remains unclear, however, is whether the relative effectiveness must be experienced in the context of an actual testing episode. Perhaps, for example, simply instructing or informing learners regarding the differential effectiveness of the two types of encoding might be sufficient, and we are currently addressing this possibility in ongoing research by varying across participants the type of experience they have following study of the first passage. For example, some participants are given the opportunity to experience the relative effectiveness of generating versus reading for later performance via a testing episode, whereas others are instructed in various ways regarding the relative memorial effectiveness of generation versus reading as encoding strategies. Although this research is still ongoing, results so far strongly indicate the critical
nature of the actual test experience for producing the desired enhancement of future encoding strategies. We are also currently addressing the theoretical question of how participants are improving their encoding of information in the second passage. One possibility that we are considering is that during original study, participants used contextual information provided by other words in the passage to help them complete or encode the to‑be‑generated critical items and, then, used this information again in the subsequent fill-in-the-blank test to aid their recall. Indeed, the use of such a strategy—that is, to use contextual information first to help complete and then to help recall the generated items—could underlie the generation advantages observed on the tests of the first passages and, additionally, would be an explanation consistent with both the procedural and the transfer-appropriate multifactor accounts described earlier. Specifically, it would have been the match between the information strengthened while completing the generation task and the information needed to perform well on the later test, or the ability to reinstate during test the cognitive procedures used during study, that had resulted in the observed generation advantages. Should this explanation for the generation advantage initially observed in the deWinstanley and Bjork (2004) studies be correct, perhaps participants—becoming aware of both their superior recall of generated items and their use of such contextual information in recalling them on the test—then attended to such contextual information during the study of the second passage for both types of critical items, consequently eliminating a generation advantage in their subsequent recall. Such an account would also be consistent with the finding that the generation advantage was not eliminated on tests of the second passage when participants had only received to‑be‑generated critical items during study of the first passage. 
It may have been more difficult for such participants—even if using contextual information in the same way during study of the first passage—to notice the role of this strategy in aiding their recall during the test because they were only recalling items they had generated and thus were not able to experience a contrast between their ability to recall words encoded via generation and reading. Consequently, they would have been less likely to transfer the use of this strategy when encoding to‑be‑read items presented in the second passage. In research recently completed, we have tested this potential explanation by using different types of retention tests following study of the first passage: in particular, ones that provide contextual information during the testing process and ones that do not. Our reasoning in

so doing was as follows: If this explanation is correct, then when the test following study of the first passage does not provide contextual information, the testing experience should not lead participants to the discovery of this encoding strategy, and thus the generation advantage should not be eliminated in the testing of subsequently presented material. Consistent with our hypothesis, when participants were given the same type of fill‑in‑the‑blank test as initially used by deWinstanley and Bjork (2004), a generation advantage was then eliminated in the test of the second passage, but in contrast, when a test that did not provide such information (e.g., a free-recall test) was administered following study of the first passage, a generation advantage continued to occur in the test of the second passage (Sin, Storm, Bjork, & deWinstanley, 2006). Finally, in an even more direct test of this potential strategy in leading to enhanced encoding of the second passage, we have recently conducted a study in which we varied both the nature of the test given to participants following their study of the first passage (i.e., tests that did or did not provide contextual information) and the type of information for which we tested following the second passage (either critical items or contextual items). Our reasoning in so doing was that it would only be participants who could discover this strategy during the test experience (i.e., those given a test involving contextual information) that would then go on to process such information more effectively in the second passage. Consequently, these participants should reveal a superior ability to recall contextual items in the test of the second passage, and the results obtained were consistent with this reasoning (Little, Storm, & Bjork, 2008).

Concluding Comments

As clearly documented by research on testing effects, it is not only during study that learning takes place. Learning also occurs during tests: Successful retrieval modifies the representation of the material so retrieved, making it more retrievable in the future. In addition to such specific effects on learning that can occur as a consequence of retrieval during tests (that is, the modification of the representations in memory of the retrieved information), we believe our research demonstrates that another, higher-order type of learning can also take place during tests, such as the learning of an improved strategy for encoding future information.

Additionally, while our research has focused on only one such strategy, that engendered by the generation of to-be-learned information, it seems possible that learners could be made sensitive to other extrinsic factors or conditions of learning that enhance long-term performance through similar testing experiences. Thus, the line of research that we have outlined in this chapter seems to us to paint a promising picture from an applied perspective: namely, that providing students with opportunities to experience the consequences of differentially effective encoding processes in their own performance (either in the context of tests, as was done in our research, or potentially in other ways as well) can lead students to discover and then to adopt on their own more effective ways of processing future to-be-learned information. That is, beyond the more effective learning of the information in question, they may also, in general, be learning how to learn more effectively.

References Begg, I., Vinski, L., Frankovich, L., & Holgate, B. (1991). Generating makes words memorable, but so does effective reading. Memory and Cognition, 19, 487–497. Benjamin, A. S., Bjork, R. A., & Schwartz, B. L. (1997). The mismeasure of memory: When retrieval fluency is misleading as a metamnemonic index. Journal of Experimental Psychology: General, 127, 55–68. Bjork, E. L., & Bjork, R. A. (2011). Making things hard on yourself, but in a good way: Creating desirable difficulties to enhance learning. In M. A. Gernsbacher, R. W. Pew, & J. R. Pomerantz (Eds.), Psychology and the real world: Essays illustrating fundamental contributions to society (pp. 56–64). New York: Worth Publishers. Bjork, E. L., deWinstanley, P. A., & Storm, B. C. (2007). Learning how to learn: Can experiencing the outcome of different encoding strategies enhance subsequent encoding? Psychonomic Bulletin and Review, 14, 207–211. Bjork, R. A. (1975). Retrieval as a memory modifier: An interpretation of negative recency and related phenomena. In R. L. Solso (Ed.), Information processing and cognition: The Loyola Symposium (pp. 123–144). Hillsdale, NJ: Erlbaum. Bjork, R. A. (1999). Assessing our own competence: Heuristics and illusions. In D. Gopher & A. Koriat (Eds.), Attention and Performance XVII—Cognitive regulation of performance: Interaction of theory and application (pp. ). Cambridge, MA: MIT Press. Bjork, R. A., & Bjork, E. L. (1992). A new theory of disuse and an old theory of stimulus fluctuation. In A. Healy, S. Kosslyn, & R. Shiffrin (Eds.), From learning processes to cognitive processes: Essays in honor of William K. Estes (Vol. 2, pp. 35–67). Hillsdale, NJ: Erlbaum.

Y109937.indb 361

10/15/10 11:04:43 AM

362 • Elizabeth Ligon Bjork, Benjamin C. Storm, and Patricia A. deWinstanley

Bjork, R. A., & Bjork, E. L. (2006). Optimizing treatment and instruction: Implications of a new theory of disuse. In L.-G. Nilsson & N. Ohta (Eds.), Memory and society: Psychological perspectives (pp. 109–133). New York: Psychology Press. Carrier, M., & Pashler, H. (1992). The influence of retrieval on retention. Memory & Cognition, 20, 633–642. Cepeda, N. J., Pashler, H., Vul, E., Wixted, J. T., & Rohrer, D. (2006). Distributed practice in verbal recall tasks: A review and quantitative synthesis. Psychological Bulletin, 132, 354–380. Crutcher, R. J., & Healy, A. F. (1989). Cognitive operations and the generation effect. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 669–675. Dempster, F. N. (1996). Distributing and managing the conditions of encoding and practice. In E. L. Bjork & R. A. Bjork (Eds.), Human memory (pp. 197–236). San Diego, CA: Academic Press. deWinstanley, P. A. (1995). A generation effect can be found during naturalistic learning. Psychonomic Bulletin and Review, 2, 538–541. deWinstanley, P. A., & Bjork, E. L. (1997). Processing instructions and the generation effect: A test of the multifactor transfer-appropriate processing theory. Memory, 5, 401–421. deWinstanley, P. A., & Bjork, E. L. (2004). Processing strategies and the generation effect: Implications for making a better reader. Memory and Cognition, 32, 945–955. deWinstanley, P. A., Bjork, E. L., & Bjork, R. A. (1996). Generation effects and the lack thereof: The role of transfer appropriate processing. Memory, 4, 31–48. Dunlosky, J., & Hertzog, C. (2000). Updating knowledge about encoding strategies: A componential analysis of learning about strategy effectiveness from task experience. Psychology and Aging, 15, 462–474. Dunlosky, J., & Nelson, T. O. (1994). Does the sensitivity of judgments of learning (JOLs) to the effects of various study activities depend on when the JOLs occur? Journal of Memory and Language, 33(4), 545–565. Fischhoff, B. (1975). 
Hindsight is not equal to foresight: The effects of outcome knowledge on judgments under uncertainty. Journal of Experimental Psychology: Human Perception and Performance, 1, 288–299. Hawkins, S. A., & Hastie, R. (1990). Hindsight: Biased judgments of past events after the outcomes are known. Psychological Bulletin, 107, 311–327. Hirshman, E., & Bjork, R. A. (1988). The generation effect: Support for a two-factor theory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 484–494. Izawa, C. (1970). Optimal potentiating effects and forgetting-prevention effects of tests in paired-associate learning. Journal of Experimental Psychology, 83, 340–344. Jacoby, L. (1978). On interpreting the effects of repetition: Solving a problem versus remembering a solution. Journal of Verbal Learning and Verbal Behavior, 17, 649–667.


Learning From the Consequences of Retrieval • 363

Jacoby, L. L., Bjork, R. A., & Kelley, C. M. (1994). Illusions of comprehension, competence, and remembering. In D. Druckman & R. A. Bjork (Eds.), Learning, remembering, believing: Enhancing human performance (pp. 57–81). Washington, DC: National Academy Press. Koriat, A. (1997). Monitoring one’s own knowledge during study: A cue-utilization approach to judgments of learning. Journal of Experimental Psychology: General, 126, 349–370. Koriat, A. (1998). Illusions of knowing: A window to the link between knowledge and metaknowledge. In V. Y. Yzerbyt, G. Lories, & B. Dardenne (Eds.), Metacognition: Cognitive and social dimensions (pp. 16–34). London: Sage. Koriat, A., & Bjork, R. A. (2005). Illusions of competence in monitoring one’s knowledge during study. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 187–194. Koriat, A., & Bjork, R. A. (2006). Illusions of competence during study can be remedied by manipulations that enhance learners’ sensitivity to retrieval conditions at test. Memory and Cognition, 34, 959–972. Kornell, N., Hays, M. J., & Bjork, R. A. (2009). Unsuccessful retrieval attempts enhance subsequent learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35, 989–998. Landauer, T. K., & Bjork, R. A. (1978). Optimum rehearsal patterns and name learning. In M. M. Gruneberg, P. E. Morris, & R. N. Sykes (Eds.), Practical aspects of memory (pp. 625–632). London: Academic Press. Little, J. L., Storm, B. C., & Bjork, E. L. (2008, November). How does experiencing the generation advantage lead to improved reading of new information? Poster presented at the annual meeting of the Psychonomic Society, Chicago. Lovelace, E. A. (1984). Metamemory: Monitoring future recallability during study. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 756–766. McDaniel, M. A., Waddill, P. J., & Einstein, G. O. (1988). A contextual account of the generation effect: A three-factor theory. 
Journal of Memory and Language, 27, 521–536. McNamara, D. S., & Healy, A. F. (1995a). A generation advantage for multiplication skill training and nonword vocabulary acquisition. In A. F. Healy & L. E. Bourne, Jr. (Eds.), Learning and memory of knowledge and skills: Durability and specificity (pp. 1–29). Thousand Oaks, CA: Sage. McNamara, D. S., & Healy, A. F. (1995b). A procedural explanation of the generation effect: The use of an operand retrieval strategy for multiplication and addition problems. Journal of Memory and Language, 34, 399–416. Mazzoni, G., & Nelson, T. O. (1995). Judgments of learning are affected by the kind of encoding in ways that cannot be attributed to the level of recall. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 1263–1274.



Nelson, T. O., & Dunlosky, J. (1991). When people’s judgments of learning (JOLs) are extremely accurate at predicting subsequent recall: The “delayed-JOL effect.” Psychological Science, 2, 267–270. Pesta, B. J., Sanders, R. E., & Murphy, M. D. (1999). A beautiful day in the neighborhood: What factors determine the generation effect for simple multiplication problems? Memory and Cognition, 27(1), 106–115. Roediger, H. L., III, & Karpicke, J. D. (2006a). The power of testing memory: Basic research and implications for educational practice. Perspectives on Psychological Science, 1, 181–210. Roediger, H. L., III, & Karpicke, J. D. (2006b). Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science, 17, 249–255. Simon, D., & Bjork, R. A. (2001). Metacognition in motor learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 907–912. Sin, N. L., Storm, B. C., Bjork, E. L., & deWinstanley, P. A. (2006, April). Learning how to learn: How does experience enhance subsequent encoding? Poster presented at the annual meeting of the Western Psychological Association, Palm Springs, CA. Slamecka, N. J., & Graf, P. (1978). The generation effect: Delineation of a phenomenon. Journal of Experimental Psychology: Human Learning and Memory, 4, 592–604.


18

Failing to Predict Future Changes in Memory: A Stability Bias Yields Long-Term Overconfidence

Nate Kornell

If you have ever experienced that panicky moment when you realize, while taking a shower in Los Angeles, that you are supposed to be in Seattle telling 75 business executives about their memories, you may have learned something about the instability of memory (Robert A. Bjork, personal communication, November 15, 2009). It is easy to forget travel plans that once seemed memorable, and it is easy to add embarrassing experiences to one’s memory. Human memory is anything but stable. We constantly forget old information and form new memories. Yet recent research has demonstrated a stability bias in human memory: People act as though their memories will remain stable in the future. They fail to predict future forgetting (Koriat, Bjork, Sheffer, & Bar, 2004) and future learning (Kornell & Bjork, 2009).

In this chapter, I discuss the importance of assessing one’s memory in everyday life, draw a distinction between predicting future remembering versus predicting future changes in remembering, and review evidence substantiating the stability bias. I then describe an experiment examining the cause of the stability bias. I asked participants (n = 430) to predict their ability to remember word pairs they would study once or four times and would be tested on in five minutes or one week. Participants predicted significant learning and forgetting but vastly underpredicted both effects, demonstrating a stability bias. Asking participants to imagine the test situation had little or no effect. The results demonstrated long-term overconfidence: Relatively modest immediate overconfidence transformed into enormous overconfidence as the test delay increased.

The Importance of Metacognition

Metacognition (that is, our judgments and beliefs about memory) helps us regulate our cognitive processes. It is common, for instance, to withhold or modify statements based on uncertainty (Goldsmith, Koriat, & Weinberg-Eliezer, 2002). For example, “Duke’s basketball coach is Mike Krzyzewski” conveys more confidence than “Duke’s basketball coach is Mike Krzyzewski, or whatever.” The phrase “or whatever” may not seem very sophisticated, but it signals that the speaker is unsure of the coach’s name. It is another way of saying “I’m not sure” (which monkeys can also do; Kornell, 2009a; Smith & Washburn, 2005). In many cases, the act of assessing one’s memories is almost as vital as the act of retrieval itself.

Memory assessments also help us manage our learning and memory, which is particularly important for students (Kornell & Bjork, 2007; Thiede, Anderson, & Therriault, 2003). Students decide what to study and how much time to spend studying based on these assessments (Nelson, Dunlosky, Graf, & Narens, 1994; Son & Metcalfe, 2000). Assessing one’s memory inaccurately can lead to ineffective study behaviors, such as studying too little, not studying the information most in need of attention, or studying inefficiently (Benjamin, Bjork, & Schwartz, 1998; Kornell & Metcalfe, 2006).

Beliefs about how memory works are also a kind of metacognition. Such beliefs affect all sorts of everyday situations; for example, people write shopping lists because they believe they will forget something otherwise. Students rely on metacognitive beliefs when they decide how to study (e.g., Should I test myself? Should I make flashcards? Should I underline? see Karpicke, 2009; Kornell, 2009b; Kornell & Son, 2009).

Judgments Based on the Past Versus Judgments Based on the Future

Memory monitoring has been researched extensively in recent years (see Dunlosky & Bjork, 2008; Metcalfe & Shimamura, 1994). In the most common kind of memory monitoring experiment, participants study an item and then judge how likely they are to remember it on a later test (e.g., Dunlosky & Nelson, 1992). Such a judgment, which is called a judgment of learning, can be made based on cues associated with one’s current memory; the stronger the memory, the more likely the item is to be recalled. A judgment of learning is a prediction of one’s ability to remember in the future.

Predicting how one’s ability to remember will change in the future has received much less research attention (see Koriat et al., 2004; Kornell & Bjork, 2009; Tiede & Leboe, 2009). Memories change, for example, as people forget over time. Memory can also improve in the future as a result of time spent studying. Predicting future changes in one’s memory offers potential advantages. For example, predicting future forgetting is a way to avoid overestimating one’s ability to remember in the future. Predicting how much one will learn by studying in the future is probably valuable for students planning future study activities. Research reviewed in the next section suggests that predictions of future changes in remembering are less accurate than predictions of future remembering, however, because people act as though their memories will be stable in the future.

A Stability Bias in Human Memory

A wealth of evidence suggests that people cannot directly assess the strength of their memories. Instead, metamemory judgments are made based on inferences (Schwartz, Benjamin, & Bjork, 1997). These inferences are, in turn, based on cues. Koriat (1997) proposed that there are three categories of cues that underlie metamemory judgments. To illustrate, suppose a student is studying Chapter 7 of an introductory psychology textbook. Characteristics of the chapter itself, such as the lucidity of the writing, are intrinsic cues. Features of the student’s interactions with the materials, such as the amount of time he or she spends studying, are extrinsic cues. The student’s internal experiences with the material, such as the fluency with which he or she answers a practice quiz, are mnemonic cues. People seem to rely heavily on intrinsic cues (e.g., Rhodes & Castel, 2008). They rely on mnemonic cues once they have developed such cues (e.g., Finn & Metcalfe, 2008). They tend to undervalue extrinsic cues (e.g., Carroll, Nelson, & Kirwan, 1997; Koriat & Bjork, 2005).

Future interactions with learning material mainly fall into the category of extrinsic cues. For example, the delay between a study trial and a test trial is an extrinsic cue; so is the number of times one will be allowed to study. The other two types of cues are less relevant to predictions of future learning: Mnemonic cues, which have to do with one’s past interactions with the materials, are backward looking, and intrinsic cues have to do with the learning material itself (e.g., a textbook chapter); neither will change in the future. This reasoning suggests that people will be stuck relying on extrinsic cues when making predictions of future learning. If people tend to undervalue extrinsic cues, they will tend to underpredict the ways their memories will change in the future. They should, for example, underestimate future forgetting.

Predicting Future Forgetting

Koriat et al. (2004) investigated people’s sensitivity to the retention interval between a study session and a test. In a typical experiment, separate groups of participants were asked to predict how many word pairs they would remember on a test that would occur right away, in one day, or in one week. The groups’ predictions were essentially identical. Of course, their actual recall performance decreased dramatically as delay increased. In one experiment, predictions for a test that would occur in one year were basically the same as predictions for an immediate test. These results are evidence of a stability bias in human memory.

Koriat et al.’s (2004) participants surely knew that they were prone to forgetting, but they did not appear to apply that belief when making predictions. Koriat et al. distinguished between theory-based judgments, which are based on one’s beliefs about memory, and experience-based judgments, which are based on the learning experience itself. Their participants did employ experience-based judgments (they predicted higher rates of recall for easier items), but they did not seem to apply theory-based judgments.

When Koriat et al. (2004) took steps to make the concept of forgetting salient, their participants began to apply their theory-based judgments. In within-participant experiments, where the same person was asked about multiple different intervals, the predictions became more accurate.
The predictions also became accurate when participants were asked to predict how much they would forget rather than how much they would remember.

Predicting Future Learning

Along with gradual forgetting over time, one of the most obvious aspects of memory is that people learn by studying. If people underpredict future forgetting, do they also underpredict future learning? A set of studies by Kornell and Bjork (2009) suggests that the answer is yes. Participants were asked to predict how well they would do on a test that would occur after they had studied a word pair between one and four times. Although memory performance increased dramatically across trials, predicted performance increased very little. Unlike Koriat et al. (2004), Kornell and Bjork found that even predictions made on a within-participant basis revealed a strong stability bias. Again, these participants surely realized that they learned by studying (otherwise, why would they study?), but they did not appear to apply that belief.

Kornell and Bjork’s (2009) participants made their predictions during the first study trial. Thus, a prediction for the first test did not require assessing future changes in memory; it was the same as a traditional judgment of learning. The first test predictions were generally overconfident. This finding underscores the importance of distinguishing between judgments of learning and predictions of future learning; participants were simultaneously overconfident in their judgments of learning and underconfident in their future learning ability.

Possible Explanations of the Stability Bias

Why is there a stability bias? In the experiment presented below, I tested two hypotheses. One is that extrinsic cues are overshadowed by other cues (see Koriat, Sheffer, & Ma’ayan, 2002; Tiede & Leboe, 2009). In the experiments described above, the relatedness of the word pairs, an intrinsic cue, was highly salient. Its salience may have led people to focus on item difficulty and ignore extrinsic cues such as the number of study trials. If this explanation is correct, minimizing the salience of intrinsic cues should diminish the stability bias. In the present experiment, I minimized the influence of intrinsic cues in two ways: by using homogeneously difficult items and by presenting only one sample item per participant.

Another reason people fall victim to the stability bias may be a failure of perspective taking. Seeing things from another person’s perspective can be difficult for adults, to say nothing of children or non-human animals. It can even be difficult to take one’s own perspective.
For example, difficult questions often seem easy as soon as one knows the answer, a phenomenon known as the hindsight bias (Fischhoff, 1975). It is possible that in the experiments described above, participants made predictions from the perspective of the studier (which is what they were at the moment of the prediction) rather than from the perspective of the test taker (i.e., themselves in the future). If, for example, participants had imagined themselves taking the test a week in the future, they might have been more sensitive to retention interval and less subject to the stability bias.

Experiment Overview

The central variable in the present experiment included three conditions. In the baseline condition, participants were told they would study a set of word pairs once and then take a test five minutes later; in the extra study condition, they were told there would be four study trials; in the extra delay condition, they were told the test would take place one week later.

These three conditions were compared in seven different ways, making a total of 21 between-participant conditions. The first of these seven sets of conditions was the three actual recall conditions, in which participants’ memories were actually tested. In the other six sets of conditions (i.e., the other 18 conditions), participants predicted how they would do if their memories were tested. The six sets of conditions included the no-perspective condition, the perspective condition, and four items-diminished conditions. In the no-perspective condition, participants predicted how many items they would remember based on a description of the materials and procedure in an actual recall condition. The perspective condition was the same except that participants were asked, when making their prediction, to imagine themselves at the time of the test. The items-diminished conditions were the same as the perspective condition; the difference was that the influence of item difficulty was diminished by presenting either homogeneously difficult pairs (which were all easy or all hard) or only one sample pair (which was either easy or hard).

The experiment was designed to answer six questions:

1. How would participants do in the actual recall conditions?
2. Were predictions accurate in the no-perspective condition, which served as the baseline prediction condition?
3. Did imagining a test taker’s perspective increase prediction accuracy?
4. Did diminishing the influence of item difficulty increase prediction accuracy?
5. Overall, did participants suffer from a stability bias?
6. Were older adults less susceptible to the stability bias than younger adults?

This chapter does not present the entire methodology at once.
Instead, the presentation of the experiment is organized around the six questions above. After the participants are described, the methods and results that answer each question are presented one at a time.

Participants

There were 430 participants. The number of participants in each condition is displayed in Table 18.1. Participants were recruited using Amazon’s Mechanical Turk, a Web site that serves as a micro-task market, connecting employers to workers; workers voluntarily sign up to do small tasks for pay (a typical job takes less than five minutes) and employers post jobs that they need done (see Kittur, Chi, & Suh, 2008). Participants in the prediction conditions were paid 30 cents each for a task that generally took less than three minutes to complete. Participants in the actual memory conditions, which took longer to complete and involved two sessions, were paid $1.50 for completing the first session and $2.00 for completing the second session.

The sample was diverse with respect to age, location, and educational background. The mean age was 34 (median 31), with a standard deviation of 12 years and a range of 18 to 74 years. Eighty percent of the participants reported that they lived in the United States, 10% lived in India, 3% lived in Canada, 2% lived in the Philippines, and the 24 remaining participants came from 14 additional countries. (Participants were asked if they spoke English fluently; they were excluded from the data set if they said no.) Education level ranged from some high school (3%) through high school graduate (36%), to bachelor’s or associate’s degree (46%), to graduate degree (16%). Fifty-nine percent of the participants were female.

Actual Recall Performance

Method

As mentioned above, the central variable, which I refer to as future condition, included three conditions: study once and take a test five minutes later (baseline), study four times and take a test five minutes later (extra study), and study once and take a test in one week (extra delay).

In the actual recall conditions, participants completed a memory experiment without making predictions. After reading instructions describing the experiment, they studied 24 word pairs, which were split evenly into 12 easy pairs (e.g., jelly–bread, usurp–take) and 12 hard pairs (e.g., figment–satire, criterion–attitude). The pairs were presented one at a time for three seconds each. In the extra study condition, the same list of pairs was then presented three more times in a different random order each time. After the study phase, all participants completed a five-minute task in which they were asked to recall the names of as many countries as they could. Following the country-naming distractor task, participants in the baseline and extra study conditions were tested on the 24 items; each cue word was presented, one at a time, and participants were asked to type in the target word. Participants in the extra delay condition were not tested during Session 1.


Table 18.1 The Number of Participants in Each Experimental Condition

Condition                       Perspective  Difficulty         Items Shown(a)  Baseline  Extra Study  Extra Delay  Total
Actual Recall
  Actual                        No           Mixed              24                     8            8            8     24
Predicted Recall
  No perspective                No           Mixed              24                    28           29           29     86
  Perspective                   Yes          Mixed              24                    20           22           16     58
  Items diminished              Yes          Easy or difficult  1 or 24               96           86           80    262
Items Diminished (Uncombined)
                                Yes          Easy               24                    27           22           19     68
                                Yes          Difficult          24                    23           21           24     68
                                Yes          Easy               1                     23           24           16     63
                                Yes          Difficult          1                     23           19           21     63

(a) All participants were informed that there would be 24 items; “items shown” refers to the number of sample items participants were shown.
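The cell sizes in Table 18.1 can be checked against the totals stated in the text (21 between-participant conditions, 430 participants). The following is a minimal sketch in Python; the dictionary keys and function names are illustrative labels chosen here, not names used in the chapter.

```python
# Cell sizes transcribed from Table 18.1, ordered as
# (baseline, extra study, extra delay). Labels are illustrative only.
FUTURE_CONDITIONS = ("baseline", "extra study", "extra delay")

CELL_SIZES = {
    "actual recall": (8, 8, 8),
    "no perspective": (28, 29, 29),
    "perspective": (20, 22, 16),
    "24 easy": (27, 22, 19),
    "24 hard": (23, 21, 24),
    "1 easy": (23, 24, 16),
    "1 hard": (23, 19, 21),
}

# The four uncombined rows that are collapsed into "items diminished".
ITEMS_DIMINISHED = ("24 easy", "24 hard", "1 easy", "1 hard")


def row_total(name):
    """Number of participants in one set of three conditions."""
    return sum(CELL_SIZES[name])


def collapsed_items_diminished():
    """Sum the four uncombined rows column by column."""
    return tuple(
        sum(CELL_SIZES[name][i] for name in ITEMS_DIMINISHED)
        for i in range(len(FUTURE_CONDITIONS))
    )


# Seven sets of conditions crossed with three future conditions.
n_conditions = len(CELL_SIZES) * len(FUTURE_CONDITIONS)
grand_total = sum(row_total(name) for name in CELL_SIZES)

print(n_conditions)                   # 21
print(grand_total)                    # 430
print(collapsed_items_diminished())   # (96, 86, 80)
```

The collapsed row matches the "Items diminished" row of the table, which is the row used in the later combined analyses.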


[Figure 18.1: bar chart. Vertical axis: Number Correct (0 to 24). Horizontal axis: Actual (actual recall), then No perspective, Perspective, and Items-diminished (predicted recall). Bars within each group: extra delay, baseline, extra study.]

Figure 18.1 Number of items correct (out of 24) as a function of future condition. The three leftmost bars are actual recall performance. The other nine bars are all predicted performance. The items-diminished conditions were created by collapsing the conditions in Figure 18.2.

One week after Session 1, all participants were e-mailed and asked to participate in Session 2. Participants who did not complete the second session were excluded in all conditions. The test in Session 2 was the same as the test in Session 1.

Results and Discussion

The results are displayed in Figure 18.1. Compared to the baseline condition, recall accuracy was significantly lower in the extra delay condition, t(14) = 5.52, p < .0001, d = 2.76, and significantly higher in the extra study condition, t(14) = 4.27, p < .0001, d = 2.14. (All t-tests comparing the extra delay condition or the extra study condition to the baseline condition were one-tailed.) In other words, as expected, participants learned from additional studying and forgot over the course of a weeklong delay.
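The effect sizes reported above are consistent with the standard conversion from an independent-samples t statistic to Cohen's d, d = t · sqrt(1/n1 + 1/n2). The sketch below is an illustration only: the chapter does not state how d was computed, so the formula is an assumption, and the function name is hypothetical. The group sizes (8 per actual recall condition) come from Table 18.1.

```python
import math


def cohens_d_from_t(t, n1, n2):
    """Convert an independent-samples t statistic to Cohen's d.

    Assumes a pooled-variance t-test: d = t * sqrt(1/n1 + 1/n2).
    """
    return t * math.sqrt(1.0 / n1 + 1.0 / n2)


# Actual recall conditions: 8 participants per group, so df = 8 + 8 - 2 = 14.
d_delay = cohens_d_from_t(5.52, 8, 8)   # extra delay vs. baseline
d_study = cohens_d_from_t(4.27, 8, 8)   # extra study vs. baseline

print(f"extra delay vs. baseline: d = {d_delay:.3f}")
print(f"extra study vs. baseline: d = {d_study:.3f}")
```

With eight participants per group the multiplier is 0.5, so the computed values come out close to the reported 2.76 and 2.14. The same conversion can be applied to the prediction conditions using their cell sizes from Table 18.1.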

Were Predictions Accurate in the No-Perspective Condition?

Method

As mentioned above, 18 groups made predictions. Each group read a description of one of the conditions in the actual memory experiment just described (i.e., the baseline, extra study, or extra delay condition). They were asked to predict how many of 24 items they would remember.

The no-perspective condition served as the baseline prediction condition. It was similar to the prediction conditions in the research conducted by Koriat et al. (2004) and Kornell and Bjork (2009). Participants were asked to read a description of the actual memory experiment. In particular, they were told that 24 word pairs would be studied for three seconds each; they were asked to read through the 24 mixed-difficulty word pairs used in the actual recall conditions; and the nature of the final cued-recall test was described. Separate groups made predictions for each of the three future conditions; depending on the participant’s condition, they were told that the experiment involved one or four study trials and that the test would take place after five minutes or one week. The information about the number of study trials and the test was then presented a second time in an alternative format, and then participants were asked to predict how many of the 24 items they would have recalled if they had participated in the experiment. The directions were designed to make the manipulation of study trials and test delay prominent and clear.

Results and Discussion

The results are displayed in Figure 18.1. Predicted recall in the extra delay condition was almost identical to predicted recall in the baseline condition. Predicted recall in the extra study condition was higher than baseline, but the difference was only marginally significant, t(55) = 1.40, p = .08, d = .37. An ANOVA comparing predicted and actual accuracy confirmed that there was a significant interaction between future condition and recall task (predicted vs. actual), F(2, 104) = 8.71, p < .001, ηp² = .14. That is, the predictions underestimated actual learning and forgetting, demonstrating a stability bias.
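The degrees of freedom and the partial eta squared for the interaction above can be recovered from the reported F and the cell sizes in Table 18.1. This is a hedged sketch: it assumes a fully between-participants design, in which the error degrees of freedom equal the number of participants minus the number of cells, and the usual identity ηp² = (F · df_effect) / (F · df_effect + df_error). The function names are hypothetical.

```python
def error_df(cell_sizes):
    """Error df for a between-participants ANOVA: N minus number of cells."""
    return sum(cell_sizes) - len(cell_sizes)


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its df."""
    effect = f_value * df_effect
    return effect / (effect + df_error)


# Cells entering the 3 (future condition) x 2 (actual vs. predicted) ANOVA:
# actual recall (8, 8, 8) and no-perspective predictions (28, 29, 29).
cells = [8, 8, 8, 28, 29, 29]

df_err = error_df(cells)
eta = partial_eta_squared(8.71, 2, df_err)

print(df_err)          # 104
print(round(eta, 2))   # 0.14
```

Both values agree with the reported F(2, 104) and ηp² = .14, which suggests the table transcription and the reported statistics are mutually consistent.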

Did Imagining a Test Taker’s Perspective Increase Prediction Accuracy?

The predictions described above occurred at the time of study. The predictions would have been more accurate if they had occurred after participants had either completed all four study trials or waited out the weeklong delay, that is, at the time of the test (Kornell & Bjork, 2009; see also Carroll et al., 1997). Asking participants to imagine their perspective at the time of the test might also diminish or eliminate the stability bias.


Method

The perspective conditions were the same as the no-perspective conditions with one exception: To encourage perspective taking, three sentences were added to the instructions immediately before participants were asked to make their prediction. The sentences in the extra delay condition, for example, read: “Now we want you to imagine that you are participating in the experiment. Imagine that you are about to start the test. You studied the word pairs once each one week ago.”

Results and Discussion

The results are displayed in Figure 18.1. Compared to the baseline condition, predictions were significantly higher in the extra study condition, t(40) = 1.97, p < .05, d = .61, and lower in the extra delay condition, t(34) = 1.86, p < .05, d = .63. In other words, participants predicted significant learning and forgetting when they were encouraged to take a test taker’s perspective.

Compared to actual recall, however, predicted recall still significantly underestimated both learning and forgetting. A future condition × recall measure (predicted vs. actual) ANOVA produced a significant interaction, F(2, 76) = 6.00, p < .01, ηp² = .14.

I also compared predictions in the perspective condition to predictions in the no-perspective condition. A 3 × 2 ANOVA examining the future condition × perspective interaction did not yield a significant effect, F(2, 138) = 1.25, p = .29. That is, the magnitude of the stability bias was not significantly different in the perspective condition compared to the no-perspective condition. Inspecting Figure 18.1 suggests that perspective taking primarily affected the extra delay condition, but even when the extra delay condition was compared to baseline (i.e., the extra study condition was excluded from the analysis), predictions were not significantly more sensitive to delay in the perspective condition than they were in the no-perspective condition, F(1, 89) = 1.62, p = .21.

Koriat et al. (2004) also compared a condition in which participants were encouraged to take a test taker’s perspective to a condition in which they were not (see Experiment 6a). They found that the perspective instruction did not have a significant effect on predictions, consistent with the present results. They also found that predictions were not sensitive to retention interval in either case. In the present experiment, by contrast, predictions were sensitive to retention interval and number of study trials when they were made from the test taker’s perspective.


In summary, encouraging participants to take a test taker’s perspective did not eliminate the stability bias. The stability bias remained robust. Nevertheless, participants did predict significant amounts of both learning and forgetting. No previous between-participant experiment has shown significant effects of number of learning trials (Kornell & Bjork, 2009) or retention interval (Koriat et al., 2004) on predictions of future remembering—including the no-perspective condition of the present experiment. These results seem to suggest that a failure of perspective taking contributes to the stability bias. This conclusion is tentative, however, because the apparent effect of perspective taking on the stability bias was not significant.

Did Diminishing the Influence of Item Difficulty Increase Prediction Accuracy? The items people study often have a powerful influence on metacognitive judgments (Koriat, 1997). Highly salient item characteristics, such as the relatedness of word pairs, might overshadow extrinsic cues such as delay before a test, that would otherwise influence predictions. The last four sets of conditions were designed to reduce item difficulty’s influence on predictions. (To foreshadow, these four conditions were collapsed into the items-diminished condition in later analyses.) Method Except for the sample stimuli, the last four conditions were all identical to the perspective condition. In one of the four sets of conditions, all 24 sample items were relatively easy; in another, they were all relatively difficult. I expected item difficulty to become less salient, and thus impact predictions less, when it was homogenous. In the other two sets of conditions, participants were told that the experiment involved learning 24 word pairs, but they were shown only one sample pair, which was either easy or hard. Displaying only one item was intended to prevent any comparison between items, further diminishing item difficulty’s influence on predictions. Thus, there were four sets of conditions: 24 easy, 24 hard, 1 easy, and 1 hard. Results and Discussion Number and Difficulty of Items Before examining the stability bias, it is worth comparing the four sets of conditions to each other. The results


Failing to Predict Future Changes in Memory • 377

are displayed in Figure 18.2. There was a main effect of future condition, F(2, 250) = 5.00, p < .01, ηp² = .04 (more on this effect below). Predictions were higher for easy than hard items, even on a between-participants basis, F(1, 250) = 20.46, p < .0001, ηp² = .08. Predictions were also higher when one item was presented than when 24 items were presented, F(1, 250) = 16.33, p < .0001, ηp² = .06. (It is possible that seeing all 24 items made participants consider the effect of list length, and that led them to predict lower performance.) There were no significant interactions between item difficulty, number of items, and future condition (all Fs < 1). Thus, the four sets of conditions designed to minimize item differences were collapsed in further analyses. The collapsed conditions will be referred to as the items-diminished condition.

Stability Bias

Figure 18.1 displays the collapsed data for the items-diminished condition. Predicted recall was lower in the extra delay condition than it was in the baseline condition, t(174) = 2.09, p < .01, d = .32. The difference between the baseline condition and the extra study condition was marginally significant, t(180) = 1.42, p = .08, d = .21.

[Figure 18.2 graphic. Y-axis: Predicted Number Correct (0 to 24). X-axis groups: 1 easy, 1 hard, 24 easy, 24 hard. Legend: Extra delay, Baseline, Extra study.]

Figure 18.2 Predicted number of items correct (out of 24). The sample items were either all easy or all hard, and the number of sample items presented was either 1 or 24. In all conditions participants were told they would study and be tested on 24 items. These four sets of conditions were collapsed to create the items-diminished condition (see Figure 18.1).
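As a check on the effect sizes reported for Figure 18.2, partial eta squared can be recovered from each F ratio and its degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The following is a minimal sketch in Python; the function name and the labels are illustrative and are not part of the original analysis.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    # eta_p^2 = SS_effect / (SS_effect + SS_error). Dividing numerator and
    # denominator by MS_error gives (F * df_effect) / (F * df_effect + df_error).
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (illustrative label, F, df_effect, df_error), with the statistics as reported in the text
reported_tests = [
    ("future condition", 5.00, 2, 250),
    ("item difficulty", 20.46, 1, 250),
    ("number of items", 16.33, 1, 250),
]

for label, f_value, df_effect, df_error in reported_tests:
    eta = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{label}: partial eta squared = {eta:.2f}")
```

With these inputs the function returns values that round to .04, .08, and .06, matching the effect sizes reported for future condition, item difficulty, and number of items.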


378 • Nate Kornell

Predicted recall underestimated actual learning and forgetting. A future condition × recall measure (predicted vs. actual) ANOVA produced a significant interaction, F(2, 280) = 5.14, p < .01, ηp² = .04. The items-diminished conditions were compared to the perspective conditions (in which 24 mixed-difficulty sample pairs were displayed). The perspective versus items-diminished variable did not interact with future condition (F < 1). If anything, contrary to the hypothesis, the stability bias was larger in the items-diminished condition. These findings are consistent with Experiment 4a by Koriat et al. (2004). In that experiment, instead of asking participants to predict based on all of the items they were going to study, Koriat et al. showed only two sample items, one easy and one hard. They, too, found that the stability bias persisted.

In summary, minimizing item differences did not diminish the stability bias. The hypothesis that intrinsic cues overshadow extrinsic cues was not supported. Although the stability bias was robust, there was a significant effect of test delay and a marginally significant effect of number of study trials.

Overall, Did Participants Suffer From a Stability Bias?

In a final set of analyses, all of the prediction conditions in Figure 18.1 were compared. Prediction condition (no perspective, perspective, items diminished) did not significantly influence the effect of future condition (i.e., there was not a significant interaction, F < 1). Future condition did have significant effects, however. Predicted recall was higher in the extra study condition than the baseline condition, t(279) = 2.30, p < .05, d = .27. Predicted recall was lower in the extra delay condition than the baseline condition, t(267) = 2.35, p < .01, d = .29. Although predicted recall was sensitive to the number of trials and test delay, the predictions still showed clear evidence of a stability bias. Future condition influenced actual recall much more than it influenced predicted recall, as a significant future condition × recall measure (actual, predicted) ANOVA demonstrated, F(2, 424) = 5.95, p < .01, ηp² = .03. Overall, predicted recall was overconfident compared to actual recall, F(1, 424) = 9.08, p < .01, ηp² = .02. As expected, there was a main effect of future condition, F(2, 424) = 16.12, p < .0001, ηp² = .07.
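The effect sizes for the two pairwise comparisons above can be approximated from the t statistics alone. The sketch below assumes two independent groups of equal size, so that d = 2t / √N with N = df + 2. The chapter does not report the group sizes, so equal groups are an assumption, and the function name is illustrative.

```python
import math


def cohens_d_from_t(t_value: float, df: int) -> float:
    """Approximate Cohen's d for two independent groups, assuming equal group sizes."""
    # For two groups of size n each, d = t * sqrt(1/n + 1/n) = 2t / sqrt(N),
    # where N = 2n is the total sample size and df = N - 2.
    total_n = df + 2
    return 2 * t_value / math.sqrt(total_n)


# t statistics and degrees of freedom as reported in the text
d_extra_study = cohens_d_from_t(2.30, 279)  # extra study vs. baseline
d_extra_delay = cohens_d_from_t(2.35, 267)  # extra delay vs. baseline

print(f"extra study vs. baseline: d = {d_extra_study:.2f}")
print(f"extra delay vs. baseline: d = {d_extra_delay:.2f}")
```

Under the equal-groups assumption the two comparisons give values that round to .27 and .29, in line with the reported effect sizes.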


Were Older Adults Less Susceptible to the Stability Bias?

As people age, they may become more sensitive to their own forgetting and, more generally, to how their memories change over time. If so, the stability bias might become less pronounced as people age. Participants were median split into a relatively young group (range = 18 to 31, mean = 25) and a relatively old group (range = 32 to 74, mean = 44) and collapsed across all prediction conditions. There were 203 participants per age group. As Figure 18.3 shows, the older group showed less stability bias than the younger group. However, the effect (i.e., the interaction between age group and future condition) was not significant, F(2, 400) = .83, p = .44. The young participants predicted higher recall levels than the older participants, F(1, 400) = 5.19, p < .05, ηp² = .01. Actual recall performance was almost identical between the age groups, however, and age group did not interact with future condition (Fs < 1). In short, the stability bias appeared to be less pronounced in older participants, but the effect was not significant.

General Discussion

The experiment reported here demonstrated a stability bias: Participants underestimated their learning ability, replicating Kornell and Bjork (2009), and they underestimated their propensity to forget, replicating Koriat et al. (2004). Encouraging participants to take a test taker’s perspective did not significantly increase prediction accuracy, nor did diminishing the salience of item differences.

Despite the robust stability bias, participants were not completely insensitive to learning and forgetting. Compared to the baseline condition, predictions were significantly higher in the extra study condition and significantly lower in the extra delay condition. In previous between-participant experiments, predictions have not been significantly affected by delay (Koriat et al., 2004) or number of study trials (Kornell & Bjork, 2009). This finding suggests that imagining a test taker’s perspective may have had some effect in the present experiment. The effect was not statistically significant, however.

Long-Term Overconfidence

People are frequently overconfident in their memories (Fischhoff, Slovic, & Lichtenstein, 1977; Metcalfe, 1998), and the current


[Figure 18.3 graphic. Y-axis: Predicted Number Correct (0 to 24). X-axis: Age Group (18–31, 32–74). Legend: Extra delay, Baseline, Extra study.]

Figure 18.3 Predicted number of items correct (out of 24), collapsed across all prediction conditions, for participants who were relatively young (mean age = 25) and relatively old (mean age = 44).

experiment was no exception. Examining Figure 18.1, though, makes clear what the largest source of the overconfidence was: forgetting. When all prediction conditions were collapsed, the average predictions in the extra study, baseline, and extra delay conditions were 12.4, 10.9, and 9.3, respectively. The actual accuracy scores were 13.9, 7.3, and 1.4. I computed an overconfidence score by subtracting predicted accuracy from actual accuracy and dividing by the standard deviation of actual accuracy. Extra study participants were underconfident by .5 standard deviations. Baseline participants were overconfident by 1.3 standard deviations, a fairly large amount of overconfidence by metacognition standards. The delayed group, however, was overconfident by 8.7 standard deviations.

These findings suggest it is important to distinguish between short-term overconfidence and long-term overconfidence. The term long-term overconfidence refers to a feeling of confidence that one will be able to retrieve a memory in the relatively long-term future. (It should be distinguished from feelings of confidence at the time of retrieval.) As Figure 18.4 illustrates, when actual forgetting outstrips predicted


forgetting, small amounts of short-term overconfidence can grow into large amounts of long-term overconfidence.

In real life, we often make confidence judgments about our ability to retrieve information in the relatively distant future. For example, if a student reads a textbook three weeks before a cumulative exam and then makes a judgment that he or she will remember the information on the exam, that judgment may be subject to long-term overconfidence. The same is true when we judge our ability to remember new information, such as a new acquaintance’s name or an interesting fact from a news article, at an undetermined future time.¹ In these situations, people may be highly overconfident about their ability to remember in the future.

More broadly, long-term overconfidence may be a more ecologically valid measure than short-term overconfidence. Most metacognition experiments (e.g., experiments on judgments of learning) take place in one-hour sessions. These experiments measure short-term overconfidence. The metacognition literature may systematically underestimate long-term overconfidence.

[Figure 18.4 graphic. Y-axis: Number Correct (0 to 12). X-axis: Retention Interval (Days), 0 to 28. Lines: Predicted, Actual. Vertical markers: STO, LTO.]

Figure 18.4 Hypothetical changes in overconfidence as a function of time. The solid line represents actual recall; the dashed line represents predicted recall. The solid circles are actual data points computed by averaging across conditions in the present experiment. The two vertical lines represent short-term overconfidence (STO) and long-term overconfidence (LTO).
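The overconfidence score described in this section can be written out explicitly. The sketch below is illustrative only. It flips the sign of the chapter's definition (actual minus predicted) so that positive values mean overconfidence. Because the standard deviations of actual recall are not reported, it back-calculates the values implied by the reported means and scores. Those implied standard deviations are approximations subject to rounding, not reported data, and the function names are hypothetical.

```python
# Mean predicted and actual recall (out of 24) as reported in the text, plus the
# reported score in SD units. Sign convention here: positive = overconfident.
reported = {
    "extra study": {"predicted": 12.4, "actual": 13.9, "score": -0.5},
    "baseline": {"predicted": 10.9, "actual": 7.3, "score": 1.3},
    "extra delay": {"predicted": 9.3, "actual": 1.4, "score": 8.7},
}


def overconfidence_score(predicted: float, actual: float, sd_actual: float) -> float:
    """Predicted minus actual recall, in units of the SD of actual recall."""
    return (predicted - actual) / sd_actual


def implied_sd(predicted: float, actual: float, score: float) -> float:
    """Back out the SD of actual recall implied by a reported score (an assumption)."""
    return (predicted - actual) / score


implied = {name: implied_sd(**values) for name, values in reported.items()}

for name, values in reported.items():
    raw_gap = values["predicted"] - values["actual"]
    print(f"{name}: raw gap = {raw_gap:+.1f} items, implied SD = {implied[name]:.2f}")
```

On these figures the implied standard deviation of actual recall is roughly 3.0 items in the extra study condition, 2.8 in the baseline condition, and 0.9 in the extra delay condition. If that back-calculation is right, part of the very large delayed score may reflect low variability in actual recall near floor as well as the larger raw gap of 7.9 items, although the chapter itself does not make this point.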


Reinterpreting Judgments of Learning Based on the Stability Bias

In the experiment reported here, participants were asked to make predictions for a delayed test. Other researchers, including Koriat et al. (2004), have asked for similar predictions. For example, Roediger and Karpicke (2006, Experiment 2) tested their participants either immediately after they studied or a week later. Consistent with the stability bias, the predictions were not affected by retention interval. The participants were asked to learn passages, and those given four chances to read the passage (SSSS) predicted they would do better than those who read the passage once and then took three tests (STTT). (Predictions were intermediate in a third SSST condition.) In the delayed test condition, these predictions were essentially backwards, because free-recall accuracy was highest in the STTT condition and lowest in the SSSS condition.

Roediger and Karpicke’s (2006) findings can be interpreted as meaning that people underestimate the value of tests. But they look different in light of the stability bias, which causes people to make judgments based on the current state of their memories. Relative to the current state of their memories, Roediger and Karpicke’s participants made accurate predictions, because their immediate test performance, like their predictions, was highest in the SSSS condition and lowest in the STTT condition. Thus, perhaps these participants’ mistake was not misunderstanding the benefits of testing. Their mistake was more basic: They judged their future remembering based on their current memory state. Of course, the outcome was the same: Participants rated testing, the most effective long-term strategy, as least effective. In general, in situations where memory changes over time, the stability bias will tend to make people’s predictions inaccurate. This hypothesis may help explain Roediger and Karpicke’s (2006) findings as well as other research involving predictions of long-term learning (e.g., Agarwal, Karpicke, Kang, Roediger, & McDermott, 2008; Carroll et al., 1997; Karpicke & Roediger, 2008; Shaddock & Carroll, 1997).

Conclusion

How is it possible that people act as though their memories will not change in the future? We all know we learn by studying and we forget over time. And in some circumstances we act on that knowledge; for example, everyone knows it is important to record speaking


engagements in a calendar; otherwise, one might forget to fly to Seattle to give a lecture about memory. Yet unless special measures are taken to make learning or forgetting salient, people consistently demonstrate a stability bias.

The situation is reminiscent of another failure to anticipate future events: the planning fallacy.² As anyone who has had a kitchen remodeled knows, people generally underestimate how long it will take to complete complex tasks (Kruger & Evans, 2004). The stability bias and the planning fallacy have a number of features in common. They concern predictions about future events. They seem to be associated with a failure to take past events (e.g., forgotten names, late papers) into consideration when making predictions about the future. Perhaps most striking, knowledge does not eliminate the planning fallacy: Even psychologists who have personally experienced the planning fallacy every time they have written an article or prepared a lecture (even an article or lecture about the planning fallacy) cannot seem to overcome it.³ Similarly, knowledge does not seem to eliminate the stability bias. We all know about forgetting and learning, but we often act as though we don’t.

Acknowledgments

Thanks to Robert Bjork and Matt Hays for their comments on a draft of this chapter. Thanks also to Elizabeth Bjork, Lindsey Richland, and Sara Appleton-Knapp. Correspondence concerning this chapter should be addressed to Nate Kornell, Department of Psychology, Williams College, Williamstown, MA 01267. E-mail: [email protected].

Endnotes

1. In real life, when people predict whether they will be able to remember something in the future, the exact time of the future retrieval is often, perhaps usually, uncertain. It may be that we don’t take into account when we will be tested because we usually don’t know when we will be tested. Similarly, perhaps we don’t take into account how much time we will spend studying in the future because often we do not know that either. This reasoning might explain one of the most puzzling questions about the stability bias: When predicting future remembering, why do people ignore their own beliefs about how memory works? Under natural circumstances, perhaps people do not apply their beliefs because future study trials and retention intervals are unknown quantities.

2. Thanks to Matt Rhodes for pointing this out.


3. This situation evokes Hofstadter’s law: It always takes longer than you expect, even when you take into account Hofstadter’s law (Hofstadter, 1979). Thanks to Jim Kornell for pointing out this similarity.

References Agarwal, P., Karpicke, J., Kang, S., Roediger, H., & McDermott, K. (2008). Examining the testing effect with open- and closed-book tests. Applied Cognitive Psychology, 22, 861–876. Benjamin, A. S., Bjork, R. A., & Schwartz, B. L. (1998). The mismeasure of memory: When retrieval fluency is misleading as a metamnemonic index. Journal of Experimental Psychology: General, 127, 55–68. Carroll, M., Nelson, T., & Kirwan, A. (1997). Tradeoff of semantic relatedness and degree of overlearning: Differential effects on metamemory and on long-term retention. Acta Psychologica, 95, 239–253. Dunlosky, J., & Bjork, R. A. (Eds.). (2008). A handbook of metamemory and memory. Hillsdale, NJ: Psychology Press. Dunlosky, J., & Nelson, T. O. (1992). Importance of the kind of cue for judgments of learning (JOL) and the delayed JOL effect. Memory and Cognition, 20, 374–380. Finn, B., & Metcalfe, J. (2008). Judgments of learning are influenced by memory for past test. Journal of Memory and Language, 58, 19–34. Fischhoff, B. (1975). Hindsight is not equal to foresight: The effects of outcome knowledge on judgment under uncertainty. Journal of Experimental Psychology: Human Perception and Performance, 1, 288–299. Fischhoff, B., Slovic, P., & Lichtenstein, S. (1977). Knowing with certainty: The appropriateness of extreme confidence. Journal of Experimental Psychology: Human Perception and Performance, 3, 552–564. Goldsmith, M., Koriat, A., & Weinberg-Eliezer, A. (2002). Strategic regulation of grain size memory reporting. Journal of Experimental Psychology: General, 131, 73–95. Hofstadter, D. R. (1979). Gödel, Escher, Bach: An eternal golden braid. New York: Basic Books. Karpicke, J. (2009). Metacognitive control and strategy selection: Deciding to practice retrieval during learning. Journal of Experimental Psychology: General, 138, 469–486. Karpicke, J. D., & Roediger, H. L. (2008). The critical importance of retrieval for learning. Science, 319, 966–968. 
Kittur, A., Chi, E., & Suh, B. (2008). Crowdsourcing user studies with Mechanical Turk. In CHI 2008: Proceedings of the ACM Conference on Human Factors in Computing Systems (pp. 453–456). New York: ACM Press. Koriat, A. (1997). Monitoring one’s own knowledge during study: A cue-utilization approach to judgments of learning. Journal of Experimental Psychology: General, 126, 349–370.


Koriat, A., & Bjork, R. A. (2005). Illusions of competence in monitoring one’s knowledge during study. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 187–194. Koriat, A., Bjork, R. A., Sheffer, L., & Bar, S. K. (2004). Predicting one’s own forgetting: The role of experience-based and theory-based processes. Journal of Experimental Psychology: General, 133, 643–656. Koriat, A., Sheffer, L., & Ma’ayan, H. (2002). Comparing objective and subjective learning curves: Judgments of learning exhibit increased underconfidence with practice. Journal of Experimental Psychology: General, 131, 147–162. Kornell, N. (2009a). Metacognition in humans and animals. Current Directions in Psychological Science, 18, 11–15. Kornell, N. (2009b). Optimizing learning using flashcards: Spacing is more effective than cramming. Applied Cognitive Psychology, 23, 1297–1317. Kornell, N., & Bjork, R. A. (2007). The promise and perils of self-regulated study. Psychonomic Bulletin and Review, 14, 219–224. Kornell, N., & Bjork, R. A. (2009). A stability bias in human memory: Overestimating remembering and underestimating learning. Journal of Experimental Psychology: General, 138, 449–468. Kornell, N., & Metcalfe, J. (2006). Study efficacy and the region of proximal learning framework. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 609–622. Kornell, N., & Son, L. K. (2009). Learners’ choices and beliefs about self-testing. Memory, 17, 493–501.   Kruger, J., & Evans, M. (2004). If you don’t want to be late, enumerate: Unpacking reduces the planning fallacy. Journal of Experimental Social Psychology, 40, 586–598. Metcalfe, J. (1998). Cognitive optimism: Self-deception or memory-based processing heuristics? Personality and Social Psychology Review, 2, 100–110. Metcalfe, J., & Shimamura, A. P. (Eds.). (1994). Metacognition: Knowing about knowing. Cambridge, MA: MIT Press. Nelson, T. O., Dunlosky, J., Graf, A., & Narens, L. (1994). 
Utilization of metacognitive judgments in the allocation of study during multitrial learning. Psychological Science, 5, 207–213. Rhodes, M. G., & Castel, A. D. (2008). Memory predictions are influenced by perceptual information: Evidence for metacognitive illusions. Journal of Experimental Psychology: General, 137, 615–625. Roediger, H. L., & Karpicke, J. D. (2006). Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science, 17, 249–255. Schwartz, B. L., Benjamin, A. S., & Bjork, R. A. (1997). The inferential and experiential basis of metamemory. Current Directions in Psychological Science, 6, 132–137. Shaddock, A., & Carroll, M. (1997). Influences on metamemory judgements. Australian Journal of Psychology, 49, 21–27. Smith, J. D., & Washburn, D. A. (2005). Uncertainty monitoring and metacognition by animals. Current Directions in Psychological Science, 14, 19–24.


Son, L. K., & Metcalfe, J. (2000). Metacognitive and control strategies in study-time allocation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 204–221. Thiede, K. W., Anderson, M. C. M., & Therriault, D. (2003). Accuracy of metacognitive monitoring affects learning of texts. Journal of Educational Psychology, 95, 66–73. Tiede, H., & Leboe, J. (2009). Metamemory judgments and the benefits of repeated study: Improving recall predictions through the activation of appropriate knowledge. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35, 822–828.


19

Relying on Other People’s Metamemory Barbara A. Spellman, Elizabeth R. Tenney, and Margaret J. Scalia

The metamemory literature is filled with justifications for why we should care about our own metamemory—for example, we have to decide how much more time to allocate to tasks (e.g., studying for an exam we need to do well on) or how much to rely on our own memory without asking for outside help (e.g., “I’m sure I remember that if I turn right, I will get to my destination”). What the literature is not filled with are justifications for why we should care about the metamemory of others. Clearly, we do rely on various types of metacognitive statements of others—for example, the probabilistic predictions of stock advisors and meteorologists. And we rely on the metamemory of others in a very important context—the courtroom—where jurors are often swayed by the testimony of confident (but perhaps inaccurate) witnesses. This chapter explores the question: What bases do people use for relying on other people’s metacognitive statements? To answer that question, we first address what researchers know about the conditions that affect the accuracy of metamemory judgments generally. That is, when might people be justified in relying on their own metamemories? Second, we describe some beliefs that laypeople have about their own memories and metamemories that illustrate where they go wrong. Third, we speculate on whether people are likely to generalize from beliefs about their own metamemories to beliefs about the metamemories of others. Next, we present two new studies showing that, as in


many other areas, people believe their own metamemory judgments are likely to be better than those of others. Finally, we review some recent work of our own showing how people use available information about other people’s metamemories (not merely other people’s confidence) when deciding whom to believe. We end with some questions for further research.

Our Own Metamemory

People vary in how good they are at judging how accurately they remember things from the past and at predicting what they are likely to remember in the future. Most of that variation probably has to do with the factors surrounding the learning and remembering, and only a small amount with individual differences in metamemory. Such factors include (but are not limited to) characteristics of the to-be-remembered material, the way judgments of confidence are made, the way memory is measured,¹ and the presence of biasing events during the retention interval.

This chapter focuses on metamemory judgments for episodic memory tasks. Episodic tasks have been used extensively in the development of theories of metamemory. In addition, understanding episodic metamemory has real-world importance in the legal system, where police and jurors must decide whether and how much to rely on the metamemories of witnesses who are often asked to state how confident they are in their identifications and memories. Several different paradigms have been used when investigating the relation between confidence and accuracy on episodic tasks. Here we describe various factors that affect metamemory performance as revealed in assorted (and mostly unconnected) literatures: judgments of learning, flashbulb memories, lineups, and misleading postevent information.

Judgments of Learning

Judgments of learning (JOLs) differ from the other tasks mentioned above in that JOLs are really predictions of future memory performance rather than judgments of confidence in reported memories. However, the lessons learned from JOL research are relevant to our inquiry because the nature of these tasks allows us to disentangle prediction and performance. A typical JOL study has three phases: (1) encoding of the to-be-remembered information, (2) predicting how likely it is to be remembered in the future (i.e., making the judgment of learning), and (3) actually remembering it on a memory test.
Because good metamemory performance means that predictions of what will be remembered are correlated with what is actually remembered, metamemory


performance can be impaired by exposing people to information that influences predictions (JOLs) and memory independently. How do people predict their future memory performance? Current theory suggests that people use fluency—the ease of internal processing of information. If it felt easy to learn or understand something initially (encoding fluency), or if it felt easy to remember something after some time has passed (retrieval fluency), then people believe they will be likely to remember it in the future (Koriat & Ma’ayan, 2005). However, things that make information easy to process at one time do not necessarily make it easy to remember later, making people prone to errors in their JOLs. For example, different manipulations at encoding may affect predictions of memory and actual memory differently. Physical characteristics of words can do that: When words are presented in a larger font, they are easier to read, yielding more fluency, and thus increased JOLs. However, size of font (above a threshold) does not affect memory (Rhodes & Castel, 2008). Other characteristics of words may also differentially affect predictions and actual memory. For example, when the to-be-remembered items include both high-frequency (e.g., table) and low-frequency (e.g., llama) words, subjects predict that they will be more likely to remember the high- than the low-frequency words. That prediction is correct for a recall test; however, if the memory test is a recognition test, those predictions will be backwards and the overall gamma correlations2 for such lists will be low (Begg, Duft, Lalonde, & Melnick, 1989; see Benjamin, Bjork, & Schwartz, 1998, for other examples). On the other hand, manipulations sometimes affect predictions of memory and actual memory similarly, in which case people’s JOLs are good. 
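The gamma correlations mentioned above are presumably Goodman–Kruskal gamma coefficients, the measure conventionally used to relate item-by-item JOLs to later memory performance. The note that defines the term is not part of this excerpt, so that reading is an assumption. A minimal sketch of the computation, with a hypothetical function name and made-up data, counts concordant and discordant item pairs and ignores ties:

```python
from itertools import combinations


def goodman_kruskal_gamma(jols, recalled):
    """Gamma = (C - D) / (C + D) over all item pairs, with tied pairs ignored."""
    concordant = 0
    discordant = 0
    for (jol_a, rec_a), (jol_b, rec_b) in combinations(zip(jols, recalled), 2):
        product = (jol_a - jol_b) * (rec_a - rec_b)
        if product > 0:
            concordant += 1  # the item with the higher JOL was also remembered better
        elif product < 0:
            discordant += 1  # the item with the higher JOL was remembered worse
        # product == 0 means a tie on JOL or on recall; such pairs are skipped
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)


# Made-up data for four items: JOLs on a 0-100 scale, recall scored 1 or 0.
jols = [80, 60, 40, 20]
recalled_as_predicted = [1, 1, 0, 0]
recalled_backwards = [0, 0, 1, 1]

print(goodman_kruskal_gamma(jols, recalled_as_predicted))
print(goodman_kruskal_gamma(jols, recalled_backwards))
```

In the first made-up case higher JOLs always go with recalled items, so gamma is 1.0. Reversing the outcomes gives -1.0, the extreme version of the "backwards" pattern described for word frequency on a recognition test.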
For example, when the to-be-remembered items include both highly related pairs of words (e.g., hot–cold) and nonrelated pairs of words (e.g., glue–tone), and subjects are accurately told that the memory task will be cued recall, they will correctly give higher JOLs to the related than the nonrelated pairs (Connor, Dunlosky, & Hertzog, 1997; Dunlosky & Matvey, 2001; Koriat, 1997; Koriat & Bjork, 2005). Because relatedness positively affects both JOLs and cued recall, gamma correlations for such lists will be high (Connor et al., 1997; Dunlosky & Matvey, 2001). A former big issue in the JOL literature was whether having people predict their future memory performance actually influences their future memory performance in a way that would affect the gamma correlation. The explanation is that when subjects are asked to make a prediction of future memory, and that prediction is made with a delay after encoding, subjects try to (covertly) remember that item. Whether


they can do so (and how easily) informs them about what judgment they should make regarding future recall. Failing to remember it would lead to a low JOL. Succeeding in remembering would lead to a high JOL but also (under some conditions) would increase the actual chances that the item would be remembered later, inflating gamma correlations (Spellman & Bjork, 1992; Spellman, Bloomfield, & Bjork, 2008). These high correlations, therefore, might not be a reflection of good JOLs, but rather of the known effect of retrieval practice. Thus, in this paradigm, as in several of the others, things that happen after encoding can affect memory or metamemory or both. Importantly, we are often unaware of these effects, leading us to rely on faulty judgments of our own memory.

Flashbulb Memory

Whereas JOLs are typically based on episodic memories that are tightly controlled in the laboratory, another line of research deals with flashbulb memories, which are based on real-life events, typically surprising, highly emotional, oft-repeated, and widely shared ones. Flashbulb memories refer to the bright, detailed, fixed images that people have of where they were and what they were doing when they first heard about some dramatic event (Brown & Kulik, 1977). Every few years, there is some public event that creates such memories, and so, depending on people’s ages, Americans might have flashbulb memories for when they heard that President Kennedy had been assassinated (1963), Martin Luther King had been assassinated (1968), President Reagan had been shot (1981), the Challenger had exploded (1986), O. J. Simpson had been found not guilty (1995),³ or the 9/11 attacks had occurred (2001). Flashbulb memories are especially interesting to metamemory researchers because people claim to have very rich and vivid memories for such events, thus creating high confidence in them.
If asked to describe one everyday memory (e.g., the last time you went to see a movie) and one flashbulb memory (e.g., the moment you learned about the 9/11 attacks), nearly all of us would feel more certain in our ability to recall the latter. However, research suggests that these events are not remembered any more accurately than “normal” events (Talarico & Rubin, 2003; Weaver, 1993). Although we may believe that we are more likely to remember highly arousing events, the initial emotional rating of an event is correlated only with a belief in the accuracy of one’s memory, not with accuracy itself (Talarico & Rubin, 2003). People are, therefore, quite poorly calibrated for flashbulb memories. Note that flashbulb memories have several characteristics similar to eyewitness memories for a crime: The event is surprising, novel, consequential, and


emotional, it will often be discussed, and people are likely to say, “I will never forget when….” In fact, people sometimes do claim to have vivid flashbulb-like memories for personal events with the same characteristics (Rubin & Kozin, 1984). Given the similar nature of flashbulb and eyewitness memories, it is important to consider whether eyewitnesses are prone to overconfidence in their memories.

Eyewitness Memory: Lineups

In the legal system there are many times at which we quite directly rely on the metamemory of others. The incredible weight we give to lineup identifications and eyewitness testimony indicates our trust in other people’s abilities to monitor their memories. Is this trust justified? Several important research findings have come out of the hundreds of published studies on police lineups (not all of which include confidence judgments).⁴ Typically, in these studies, subjects first see a target face, perhaps in a series of pictures, in a video of a real or mock crime, or, often, in a staged event in a laboratory or classroom. During the retention interval, subjects might be asked to provide a description of the face or events, talk to others who had also witnessed the events, or view faces of other potential suspects (possibly including the target). Then the subjects are asked to pick out the suspect from a group of people or photographs and rate their confidence in their identification. Many things can be varied at this test. Most important is the composition of the lineup, in particular whether it actually contains the target. Other commonly manipulated variables include the similarity of the suspect to the others in the lineup, the instructions to the subjects, and the behavior of the experimenter, including whether any feedback is given to the subject. The studies sometimes include an even later memory test (analogous to identification at trial).
The literature contains a huge number of studies supporting the finding that the correlation between confidence and accuracy in lineup identifications is relatively weak (on the order of .25; see Read, Lindsay, & Nicholls, 1997, for a review), which is far lower than one might want for the key evidence in a conviction. Why might metamemory be bad in lineups? There certainly are factors that affect confidence and accuracy similarly—for example, the amount of time the witness has to view the suspect at the time of encoding; more time generally means both better accuracy and higher confidence. But there are also factors that affect them independently. Instructions given at the time of the lineup can influence witnesses to pick someone out of the lineup when the target is not present. Witnesses may be given “biased instructions” that suggest that the perpetrator is certainly present in the lineup or that otherwise discourage the witness

from refusing to make a selection. When the target is absent, biased instructions affect accuracy in that witnesses become much more likely to (incorrectly) choose someone from the lineup. When the target is present, biased instructions tend to create greater confidence in the choice but barely affect accuracy (Brewer & Wells, 2006; Steblay, 1997). In addition, feedback given to witnesses after they make an identification—for example, saying something like “good job”—increases later confidence regardless of accuracy (see Douglass & Steblay, 2006, for a review).

Eyewitness Memory: Misleading Postevent Information

Another set of studies involving metamemory that has influenced the legal system is based on misleading postevent information. In the basic studies, as pioneered by Loftus (Loftus, 1975; Loftus & Palmer, 1974), subjects learn about some event either by hearing or reading a description or seeing slides or videos. During the retention interval they are exposed to information that conflicts with what was originally learned—often in an indirect manner. For example, subjects may see a video in which a car speeds through a yield sign and gets into an accident. Later, they might be asked questions such as, “How fast was the car going when it went through the stop sign?” Finally they are asked to remember the actual events. Using this misleading questions paradigm, up to 30% more subjects will incorrectly report that the car went through a stop sign—and they will report that fact with high confidence (see Loftus, 1992, 2005, for reviews). Thus, again, accuracy can be decreased without a corresponding loss in confidence.

Explicit Understanding of Our Own Memory and Metamemory

In the previous section we saw that when people experience different types of memories, they are often wrong about what they remember well or what they are likely to remember in the future. They use information about their own subjective experiences to make their metamemory judgments. These judgments may be off when people’s accuracy or their confidence, or both, is affected by extrinsic factors. Typically these are factors that people are not aware of or might not even recognize as relevant if they are. As described above with regard to JOLs, it is likely that metamemory judgments are based on an experienced sense of fluency. But what

happens when, rather than asking people to do a memory task and make the metamemory judgments themselves (thus experiencing fluency or the lack thereof), they are asked to predict memory performance of others? That is, what do people know about memory and metamemory, and how do they extend that to their predictions of others’ behavior? Below we describe two “styles” of studies that ask people to make memory predictions: first, laboratory studies in which we can compare what some subjects (“memorizers”) actually do to what other subjects (“predictors”) predict they will do; second, surveys that ask people about factors that are likely to influence memory.

Laboratory Nonexperiential Memory Judgment Tasks

Using many different types of laboratory procedures, subjects have been asked to predict the memory performance of other subjects in the task. For example, in a paper comprising seven experiments, Koriat et al. (2004) demonstrated that fluency-based JOLs are generally not affected by information about when the recall test will be. In Experiment 1, subjects saw word pairs and immediately after were asked how likely they would be to recall the second word when shown the first word—either immediately, the following day, or the following week. When the time manipulation was between subject, subjects made the same predictions for the three retention intervals—thus not taking into account any forgetting. When the full experiment was described to a different set of subjects (Experiment 2; time was within subject), those subjects fully appreciated that recall would fall off over time and made very good predictions of the Experiment 1 performance. Thus, it seems that people can use their lay theories of memory to make the right predictions when they are not basing their predictions on subjective fluency. But what happens when people have both theory (or general knowledge) and fluency to rely on when making their metamemory judgments?

Subjects who made JOLs for themselves and were told, item by item, whether the test would be in 10 minutes, 1 day, or 1 week did show an effect of delay on their predictions. That is, their predictions reflected an understanding that memory would decline over time. However, they overestimated performance relative to both actual performance and the predictions in Experiment 2. Thus, subjective fluency for a particular instance may override general knowledge. Researchers have also found that subjects mispredict the effects of importance on memory. In a recent study, motivated by a real court case, subjects were assigned to be either memorizers or predictors. All saw six photographs of faces presented with five facts each for a total of two minutes. The memorizers were told that they would have to

try to recall the facts later and would receive $0.10 for each recalled fact. Two-thirds of the memorizers were also told that they would get a $0.50 bonus for each fact remembered about one particular photograph (“Beryl White”). Some of those subjects were told about the bonus before studying the material; some were told immediately after studying the material. Later the memorizers were shown the photograph of Beryl White and asked to recall the facts about her. The predictors saw the same photographs and facts as the memorizers and read one of the versions of the instructions described above (bonus explained before, after, or no bonus). They were then asked to predict what percentage of memorizers in their condition would remember each fact. How good were the predictions? Compared to the no-bonus conditions, memorizers recalled many more facts when told about the bonus before encoding. But being told about the bonus after encoding did not significantly increase memory. Predictors correctly understood that learning about the bonus beforehand would increase recall over the no-bonus condition. However, they also incorrectly thought that being offered the bonus after the encoding would enhance recall nearly as much as being offered it beforehand (Kassam, Gilbert, Swencionis, & Wilson, 2009). These results are a nice demonstration of the problem that when motivation to remember something occurs after the fact—like the face of a passerby or the details of a conversation (because it is now relevant to a crime)—people think motivation will help memory, but it does not.

Surveys on Memory Relating to Eyewitness Memory

Several published surveys, typically done by eyewitness memory researchers,⁵ investigated what laypeople believe about memory and compared that to what experts believe.

For example, one study asked 111 jurors, 42 judges, and 52 law enforcement personnel the same questions asked in the Kassin, Tubb, Hosch, and Memon (2001) survey of eyewitness experts (Benton, Ross, Bradshaw, Thomas, & Bradshaw, 2006). There were many interesting differences between the jurors and experts, and many were consistent with the kinds of judgments people make for themselves in laboratory tasks. With regard to encoding, although jurors agreed at the same rate as experts to the statement “Very high levels of stress impair the accuracy of eyewitness testimony,” they were much less likely to believe that “The presence of a weapon impairs an eyewitness’s ability to accurately identify the perpetrator’s face” (87% for experts to 39% for jurors). Jurors also did not appreciate the effect of exposure time; they were much less likely to agree that “The less time an eyewitness has to observe an event, the less well he or she will remember it” (81% to 47%).

Jurors also did not appreciate some of what occurs during the retention interval. For example, they were less likely to agree that “The rate of memory loss for an event is greatest right after the event and then levels off over time” (83% to 33%). And they were less likely to agree that memories are malleable: “Eyewitness testimony about an event often reflects not only what they actually saw but also information they obtained later on” (94% to 60%), and “Exposure to mug shots of a suspect increases the likelihood that the witness will later choose that suspect in a lineup” (95% to 59%). Jurors also believed in the reliability of statements of confidence more than experts did and were, therefore, less likely to agree with the statement that “An eyewitness’s confidence can be influenced by factors that are unrelated to identification accuracy” (95% to 50%), and that “An eyewitness’s confidence is not a good predictor of his or her identification accuracy” (87% to 38%). Given what we know about the fallibility of metamemory, it is alarming to learn that jurors failed to appreciate the factors that might influence others’ judgments, and that they overestimated the confidence–accuracy correlation. Thus, people are often wrong about how memory works when they are basing their judgments not only on current experience or fluency, but also on beliefs or theories they have about memory.

Comparing Ourselves to Others: Generalize or Differentiate?

People’s own subjective experience may thus guide (or misguide) their metamemory judgments. But what about when people need to rely on the metamemories of others? People might assume that others make metamemory judgments the same way they do—so one’s own introspection is useful for predicting and understanding the judgments of others. Alternatively, people might use subjective experience (e.g., fluency) for themselves but other methods, for example, general knowledge about memory, when predicting the judgments of others. There are circumstances in which people use their own introspections to predict the behavior of others. Social psychologists might point to the “false consensus effect,” in which people believe that others’ preferences are, or choices will be, similar to their own (Ross, Greene, & House, 1977). Cognitive psychologists might point to studies in which people use their own (biased) subjective experience to predict the performance of others (e.g., if they can solve the anagram fscar easily, they

will predict that others can, too, even if the facility unknowingly arises from having seen the correct answer⁶ earlier; Kelley & Jacoby, 1996). However, more often people seem to treat their own introspections as not only different from, but also better than, those of others (see Pronin, 2008, for a review). People believe that their introspections (and other beliefs) are accurate, informative, and relatively free from bias, even though they usually are none of those things (Pronin & Kugler, 2007; Robinson, Keltner, Ward, & Ross, 1995). As a result, when people believe that they have thought through all the steps in making a decision, and do not feel like they were being biased, they conclude that they have acted objectively and realistically. When others reach a different conclusion, people assume those others were biased (Pronin, Gilovich, & Ross, 2004). Thus, people do not award other people the same introspective credit that they award themselves. This result fits with other findings that people think they are above average drivers, are more moral, and have better senses of humor than others (Dunning, Meyerowitz, & Holzberg, 1989).

Present Studies: When I’m Sure, I’m Sure, but When You’re Sure, You’re Not

Research therefore suggests that people believe they can look inside themselves (i.e., introspect) to determine their true motivations and intentions, but that others who attempt this process are less successful. What about for the introspections of metacognition? Do people also believe that they are better than others at knowing and predicting the accuracy of their own knowledge? The present studies examined differences between people’s beliefs about their own metacognitive statements and their beliefs about other people’s metacognitive statements involving the idea of being sure.

Subjects read about three situations and were asked what it would mean if someone (either a friend or themselves) said they were sure of something: either a fact (the date of the last day of classes the following semester) or a prediction (that he or she would get an A on an upcoming exam; that two mutual acquaintances would hook up). The subjects were asked—given what they or their friend had just said (about being sure)—to indicate how sure they were that the fact was true or the event would occur. The scale ranged from 0 (“certainly will not happen”) with a midpoint of 50 (“equally might or might not”) to 100 (“certainly will happen”). There were numerical labels at multiples of 10. Subjects were asked to place an X at the value representing their best guess and then to draw lines out from the X to represent their uncertainty in the guess. Examples were provided.

We expected that subjects would believe that when they said they were sure of something, that thing would be more likely to occur than when a friend said he or she was sure of it. We also expected that, sensibly, people would give the highest certainty ratings to the fact (end of semester date), the next highest to the prediction that the predictor had inside knowledge about and some control over (whether he or she would get an A), and the lowest to the prediction over which they had no inside knowledge and no control (Justin and Ashley hook up). Study 1 was run in the less sensitive between-subject design; Study 2 was run within subject.

Study 1: Between-Subject Version

There were 99 subjects in Study 1 (47 male, 48 female, 4 did not report gender). They participated in this study along with other short reasoning tasks as part of an optional requirement for course credit. Subjects were asked to “think about a college friend you sometimes study with—preferably someone who does about as well as you do in school.” Then they made certainty judgments to three questions. Half of the subjects did it based on their own statements of certainty (bolded version); half did it based on their friend’s statements of certainty (italics version). These were the questions to which subjects responded. (The subjects did not see the “titles” of each question; those are for future reference.)

1. “Get an A.” Suppose the two of you are studying for a test the next day and you are quizzing each other. After a while you say to your friend/your friend says, “I am sure I am going to get an A on this test.” Given what you/your friend just said: How sure are you that you/he or she will get an A?
2. “Justin and Ashley.” Suppose you and your friend are discussing two people you know a little bit—Justin and Ashley. You say you are sure/your friend says he or she is sure that Justin and Ashley will hook up. Given what you/your friend just said: How sure are you that it will happen?
3. “End of semester.” You and your friend are discussing the spring semester calendar. You say you are/your friend says he or she is sure that the semester ends on May 4. Given what you/your friend just said: How sure are you that the semester will end on May 4?

Results

We analyzed only the subjects’ best guess point estimates (not the error lines they drew). A research assistant blind to the

hypothesis converted the Xs to numbers. The results were very clear and can be seen in Figure 19.1. First, there is an obvious main effect of who is making the claim. Subjects are more certain that something will, in fact, happen when they, themselves, say they are sure (M = 75, SD = 11) than when a friend says he or she is sure (M = 68, SD = 12), F (1, 97) = 10.49, p < .005. Second, there was a clear main effect of question: Subjects gave their highest mean ratings to the end of semester question (M = 81, SD = 18), middle ratings to the get an A question (M = 75, SD = 15), and lowest ratings to the Justin and Ashley question (M = 58, SD = 18), F (2, 194) = 66.70, p < .001. Finally, there was no interaction between speaker and question, F (2, 194) = .32. Note that there was a main effect of gender such that the women overall gave higher ratings than the men did (female = 79, male = 69). However, there was no interaction between gender and whether the statement was made by themselves or their friend.⁷ Thus, people consistently showed more confidence about their own certainty. The judgments of others were not random or inconsistent—they were just somewhat lower than the judgments of self for all questions. This pattern suggests that people are using the same mechanism to make the two judgments—and then either enhancing it for themselves or devaluating it for others.

Figure 19.1 Results of Study 1. Mean ratings of how likely it is that a sure thing will occur (with standard error bars). [Bar chart: y-axis, Certainty (50 to 90); x-axis, Question (Get an A, Justin & Ashley, End of semester); separate bars for You and Friend.]

Study 2: Within-Subject Version

There were 127 subjects in Study 2 from the same subject pool as Study 1 (36 male, 89 female, 2 did not

report gender). They participated in the same manner as in Study 1; no one was in both studies. In this within-subject version, all subjects answered the six certainty questions from Study 1—three about themselves and three about a friend. The first two were about predicting the exam grade; the next two were about predicting the hookup; the final two were about the end of semester date. Whether subjects answered the questions about themselves or their friend first was counterbalanced across subjects.⁸

Results

Again, we only analyzed the subjects’ best guess point estimates (not the error lines they drew). In this study there were no sex differences and no order effects. The results were nearly identical to those from Study 1 (see Figure 19.2). Again, there is a clear main effect of who is making the claim. Subjects are more sure that something will happen when they, themselves, say they are sure (M = 77, SD = 11) than when a friend says he or she is sure (M = 72, SD = 11), F (1, 126) = 35.81, p < .001. Again, there was a clear main effect of question: Subjects gave their highest mean ratings to the end of semester question (M = 84, SD = 15), middle ratings to the get an A question (M = 75, SD = 15), and lowest ratings to the Justin and Ashley question (M = 63, SD = 16), F (2, 252) = 78.12, p < .001. And, again, there was no interaction, F (2, 252) = .55. Thus, people were again consistently surer about their own sureness. But their judgments about themselves and their judgments about others were highly correlated (get an A = .56, Justin and Ashley = .67, end of semester = .68). This is more evidence suggesting that when judging others, people assume that others might do the same kind of introspecting but don’t do it as well.

Figure 19.2 Results of Study 2. Mean ratings of how likely it is that a sure thing will occur (with standard error bars). [Bar chart: y-axis, Certainty (50 to 90); x-axis, Question (Get an A, Justin & Ashley, End of semester); separate bars for You and Friend.]

Relying On Others’ Metamemory

We see from these studies that people take other people’s statements of confidence with a grain of salt—or, perhaps, with a splash of water—compared to their own. They believe that others are correct about their relative confidence in events (i.e., so resolution is fine), but that others are (slightly) overconfident in their claims. How does that relate to studies in which people must rely on other people’s metamemorial statements? There is a huge literature, especially in the psychology and law domain, showing that confidence is the most important factor in assessing witnesses’ credibility and that high-confidence witnesses are believed very strongly. People place great weight on witnesses’ statements of confidence, even though such statements are not that highly correlated with accuracy⁹ (see Read et al., 1997, for a review).

Presumption of Calibration¹⁰

What happens, however, if someone does get real information about another person’s metacognitive abilities—that is, she learns something about his confidence–accuracy correlation (not just about his confidence level)? In a series of studies, Tenney and colleagues (Tenney, Spellman, & MacCoun, 2008; see also Tenney, MacCoun, Spellman, & Hastie, 2007) have shown that when adult subjects get evidence about two informants’ memory confidence and memory accuracy, so that they can assess the informants’ resolution, the subjects will use that information (rather than confidence alone) in judging whom to believe. Thus, although people might initially believe that witnesses have good resolution (so that high-confidence statements should be strongly believed), that belief may be overridden by data. They call this the presumption of calibration hypothesis. In the basic version of their studies, subjects read a story in which they play the role of jurors listening to the conflicting depositions of two witnesses to a car accident.

One witness is highly confident about his memory for the accident and for two collateral events from that day (e.g., the weather was especially cold and he took his dog to the veterinarian); he believes the accident was the red car’s fault. The other witness is highly confident about his memory for the accident, but of mixed confidence (one high, one low) for the two collateral events (e.g., he is certain that the weather was especially cold and unsure of whether

he had a meeting at work that day); he believes that the accident was the gray car’s fault. (Of course, which events the witnesses recalled, and the order of the presentation, was counterbalanced across subjects.) After reading this much of the story, subjects were asked how credible they thought each witness was and whose version of the accident they believed. Consistent with previous findings regarding confidence, subjects thought that the high-confidence witness was more credible and tended to believe his version of the accident. Subjects then read that a private investigator was sent to find out the truth about the collateral events. It turns out that each witness was right about one collateral event—it had, indeed, been a particularly cold day. They were also each wrong about one collateral event. The high-confidence witness was certain he had been to the vet that day, but the appointment was a few days off. The mixed-confidence witness was unsure of whether he had a meeting at work that day—and that event was a few days off as well. Subjects were then queried again. Their preferences switched: Now, rather than continuing to believe the high-confidence witness, they thought that the mixed-confidence witness was more credible and tended to believe his version of the accident.¹¹

Consider what the subjects are doing. They are switching away from a highly confident witness to a less confident one, but only when the less confident one demonstrates better resolution. They must be keeping track, for each person and each event, whether the person was accurate and how confident he was. That is, to each event they must bind the accuracy and confidence information in order to determine calibration. Then they choose to rely on the well-calibrated witness’s high-confidence statement about which car was at fault (see Spellman & Terry, 2010, for a review).
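The bookkeeping described here, binding confidence and accuracy for each witness and each collateral event, can be made concrete with a small sketch. It is only an illustration: the `Claim` class, the `confidence_matches_accuracy` function, and the simple matching rule are hypothetical and are not the scoring model used in the studies. The two witnesses' collateral claims are coded as the story describes them.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    """One collateral statement by a witness (hypothetical coding)."""
    confident: bool  # did the witness express high confidence?
    correct: bool    # did the investigator find the statement to be true?


def confidence_matches_accuracy(claims):
    """Share of checked claims where confidence lined up with accuracy.

    A claim counts as a match when it is confident and correct, or
    unsure and incorrect. This rule is an assumption for illustration.
    """
    matches = sum(claim.confident == claim.correct for claim in claims)
    return matches / len(claims)


# High-confidence witness: sure about the cold weather (right),
# sure about the vet appointment (wrong).
high_confidence_witness = [Claim(True, True), Claim(True, False)]

# Mixed-confidence witness: sure about the cold weather (right),
# unsure about the work meeting (wrong).
mixed_confidence_witness = [Claim(True, True), Claim(False, False)]

high_score = confidence_matches_accuracy(high_confidence_witness)
mixed_score = confidence_matches_accuracy(mixed_confidence_witness)
preferred = "mixed" if mixed_score > high_score else "high"
```

Under this toy rule the mixed-confidence witness scores higher, which is the direction of the preference switch reported above. With only two checked events per witness, the sketch says nothing about how strongly real subjects weight such evidence.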

Some Questions

Above we have touched on research from various fields that we think is relevant to understanding how, whether, and when people will rely on the metamemory of others. Some common themes have emerged. Below we lay out some questions we still have. One question we are addressing in other research is under what conditions people are able to use information about confidence and accuracy to assess others’ metamemories. We believe that it is an attention-consuming task and have found that five- to six-year-old children as well as adults under cognitive load do not use such calibration information in tasks analogous to the car accident task described above (Tenney, Small, Jaswal, Kondrad, & Spellman, 2010). We wonder whether being

in a courtroom setting would enhance or detract from using calibration information—recognizing that jurors don’t often have dispositive information about whether a witness is accurate or not (although they do have some ambiguous information when one witness’s story conflicts with others). Another set of questions has to do with calibration across domains: If we know that someone shows good calibration in one domain, should we assume that he or she is well calibrated in others? At least for different types of metamemory judgments, sometimes they are. For example, people who show good resolution for general knowledge questions also show good resolution for eyewitness memory (Perfect & Hollins, 1996)—even though they might also be overconfident in their judgments (Bornstein & Zickafoose, 1999). A related question is, regardless of what is true about cross-domain calibration, what do people believe about cross-domain calibration? So, for example, if someone demonstrates good calibration in one domain, will people then believe him or her about everything?

Conclusion

The literature on judgments of learning, flashbulb memories, and eyewitness testimony sheds light on how people make metacognitive decisions about themselves—about what they know and how well they know it. The story this literature tells is that, using subjective experience as a guide, people believe they can know themselves well; however, unbeknownst to them, many extraneous factors can systematically trick them into a false sense of confidence in their knowledge. Regardless of whether they should, people use information gleaned from their own metacognitions to make judgments about others (e.g., if people feel like they know something, they believe others probably do, too). Because people believe that they are well calibrated themselves, it is not a bad guess that other people are also well calibrated. Our research indicates that people assume their own statements of confidence are good indicators of accuracy and that their friends’ statements of confidence are not quite as good—but not so far off, either. (Perhaps people would discount even further the metacognitive statements of people less well known to them.) Why the difference between self and others? Because people cannot know others’ internal dialogs or feel for themselves how fluently information came to someone else’s mind, people might use themselves as a baseline, then take what they know about memory in general to

adjust and decide what to make of others’ metacognitive statements. According to memory surveys, laypeople are not as conscious of memory fallibility as are the memory experts; however, people are more suspicious of other people’s ability to introspect compared to their own. As a result, people tend to trust their own metacognitions and confidently held beliefs more than other people’s. We want to see it, and we want to feel like we know it, to believe it.

Acknowledgments

The authors thank Janet Luo for entering data. B.A.S. thanks A.S.B. for putting together a fabulous festschrift for R.A.B. (and especially for doing it without her help).

Endnotes

1. And sometimes whether the way memory is measured differs from the way people think it will be measured.
2. Two kinds of measures of metamemory performance are often used for these tasks. Calibration is a measure of absolute accuracy; a person is well calibrated if all of the items she says she has 100% chance of remembering are remembered, 70% of the items she says she has a 70% chance of remembering are remembered, and so on. Calibration reveals over- and underconfidence. Resolution is a measure of relative accuracy because it measures how well people can detect which items are more likely to be remembered than which other items—regardless of whether people are generally overall under- or overconfident. The Goodman–Kruskal gamma correlation (or just “gamma”) is used to measure resolution. Gamma correlates two observed variables: predictions of memory and actual memory for individual items.
3. People reported flashbulb memories for learning the verdict, but it is not a typical kind of flashbulb memory event because everyone knew something was going to occur that Monday morning.
4. Results of these studies, plus the scores of DNA exonerations involving lineup misidentifications, have changed the way lineups are conducted in many U.S. jurisdictions. See Wells et al. (2000).
5. Knowing what other people know about memory is an important applied issue in the legal arena. In order for an expert to be allowed to testify, a litigant must show that the expert knows something that is not within the common knowledge of the jury (FRE 702). According to the judges who decide whether to let in testimony, this hurdle is often not overcome by memory experts because judges assume jurors know a lot about memory. Hence, the surveys are designed to determine whether jurors (and other legal actors) are actually aware of various well-established memory findings.
6. The answer is scarf.
7. This finding could be of interest, but it was not replicated in Study 2.
8. We also asked other questions that we did not analyze regarding why their answers to some questions were the same as or different from their answers to others. We also asked them whether they had really imagined a specific friend or just friends generally (“specific friend,” “sort of a mix,” “friends generally”).
9. Researchers used to claim that the relationship was on the order of r = .25 (significantly above zero but probably not something on which a conviction should be entirely based); however, recent research is more optimistic that the correlation could be higher under different and more realistic conditions (Read et al., 1997).
10. We sometimes use calibration as the general covering term for what is technically both calibration—which we refer to as over- and underconfidence—and resolution.
11. In another variation of the study, the mixed-confidence witness was poorly calibrated (wrong about his high-confidence memory, correct about his low-confidence memory). People continued to prefer the high-confidence witness’s account but thought neither witness was very credible.
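The two measures defined in Endnote 2 can be sketched in a few lines of Python. This is a minimal illustration on made-up data, not an analysis from the chapter: the function names, the `jols` and `recalled` values, and the use of a simple mean difference as a summary of over- or underconfidence are all assumptions. Gamma is computed in its standard form, (C − D) / (C + D), where C and D count concordant and discordant item pairs and tied pairs are ignored.

```python
from itertools import combinations


def goodman_kruskal_gamma(predictions, outcomes):
    """Resolution: gamma between item-level predictions and recall.

    Returns None when every pair is tied, since gamma is undefined then.
    """
    concordant = discordant = 0
    for (p1, o1), (p2, o2) in combinations(zip(predictions, outcomes), 2):
        product = (p1 - p2) * (o1 - o2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    if concordant + discordant == 0:
        return None
    return (concordant - discordant) / (concordant + discordant)


def calibration_bias(predictions, outcomes):
    """Mean predicted probability minus proportion recalled.

    Positive values indicate overconfidence, negative values
    underconfidence (an assumed summary of calibration).
    """
    n = len(predictions)
    return sum(predictions) / n - sum(outcomes) / n


# Hypothetical data: predicted recall probability per item, and
# whether each item was later recalled (1) or not (0).
jols = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5]
recalled = [1, 1, 0, 1, 0, 0]

gamma = goodman_kruskal_gamma(jols, recalled)  # 8 concordant, 1 discordant
bias = calibration_bias(jols, recalled)        # mean prediction .75, recall .50
```

In this made-up case resolution is fairly good (gamma of 7/9) while the learner is still overconfident by about 25 percentage points, which shows how the two measures can come apart.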

References

Begg, I., Duft, S., Lalonde, P., & Melnick, R. (1989). Memory predictions are based on ease of processing. Journal of Memory and Language, 28(5), 610–632.
Benjamin, A. S., Bjork, R. A., & Schwartz, B. L. (1998). The mismeasure of memory: When retrieval fluency is misleading as a metamnemonic index. Journal of Experimental Psychology: General, 127, 55–68.
Benton, T. R., Ross, D. F., Bradshaw, E., Thomas, W. N., & Bradshaw, G. S. (2006). Eyewitness memory is still not common sense: Comparing jurors, judges and law enforcement to eyewitness experts. Applied Cognitive Psychology, 20, 115–129.
Bornstein, B. H., & Zickafoose, D. J. (1999). “I know I know it, I know I saw it”: The stability of the confidence-accuracy relationship across domains. Journal of Experimental Psychology: Applied, 5, 76–88.
Brewer, N., & Wells, G. L. (2006). The confidence-accuracy relationship in eyewitness identification: Effects of lineup instructions, foil similarity, and target-absent base rates. Journal of Experimental Psychology: Applied, 12(1), 11–30.
Brown, R., & Kulik, J. (1977). Flashbulb memories. Cognition, 5, 73–99.


Relying on Other People’s Metamemory • 405

Connor, L. T., Dunlosky, J., & Hertzog, C. (1997). Age-related differences in absolute but not relative metamemory accuracy. Psychology and Aging, 12(1), 50–71.
Douglass, A. B., & Steblay, N. (2006). Memory distortion in eyewitnesses: A meta-analysis of the post-identification feedback effect. Applied Cognitive Psychology, 20(7), 859–869.
Dunlosky, J., & Matvey, G. (2001). Empirical analysis of the intrinsic-extrinsic distinction of judgments of learning (JOLs): Effects of relatedness and serial position on JOLs. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27(5), 1180–1191.
Dunning, D., Meyerowitz, J. A., & Holzberg, A. D. (1989). Ambiguity and self-evaluation: The role of idiosyncratic trait definitions in self-serving assessments of ability. Journal of Personality and Social Psychology, 57, 1082–1090.
Kassam, K. S., Gilbert, D. T., Swencionis, J. K., & Wilson, T. D. (2009). Misconceptions of memory: The Scooter Libby effect. Psychological Science, 20, 551–552.
Kassin, S., Tubb, V., Hosch, H. M., & Memon, A. (2001). On the “general acceptance” of eyewitness testimony research: A new survey of the experts. American Psychologist, 56, 405–416.
Kelley, C. M., & Jacoby, L. L. (1996). Adult egocentrism: Subjective experience versus analytic bases for judgment. Journal of Memory and Language, 35(2), 157–175.
Koriat, A. (1997). Monitoring one’s own knowledge during study: A cue-utilization approach to judgments of learning. Journal of Experimental Psychology: General, 126(4), 349–370.
Koriat, A., & Bjork, R. A. (2005). Illusions of competence in monitoring one’s knowledge during study. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31(2), 187–194.
Koriat, A., Bjork, R. A., Sheffer, L., & Bar, S. K. (2004). Predicting one’s own forgetting: The role of experience-based and theory-based processes. Journal of Experimental Psychology: General, 133, 643–656.
Koriat, A., & Ma’ayan, H. (2005). The effects of encoding fluency and retrieval fluency on judgments of learning. Journal of Memory and Language, 52, 478–492.
Loftus, E. F. (1975). Leading questions and the eyewitness report. Cognitive Psychology, 7, 560–572.
Loftus, E. F. (1992). When a lie becomes memory’s truth: Memory distortion after exposure to misinformation. Current Directions in Psychological Science, 1(4), 121–123.
Loftus, E. F. (2005). Planting misinformation in the human mind: A 30-year investigation of the malleability of memory. Learning and Memory, 12(4), 361–366.
Loftus, E. F., & Palmer, J. C. (1974). Reconstruction of automobile destruction: An example of the interaction between language and memory. Journal of Verbal Learning and Verbal Behavior, 13, 585–589.


Perfect, T. J., & Hollins, T. S. (1996). Predictive feeling of knowing judgements and postdictive confidence judgements in eyewitness memory and general knowledge. Applied Cognitive Psychology, 10, 371–382.
Pronin, E. (2008). How we see ourselves and how we see others. Science, 320(5880), 1177–1180.
Pronin, E., Gilovich, T., & Ross, L. (2004). Objectivity in the eye of the beholder: Divergent perceptions of bias in self versus others. Psychological Review, 111, 781–799.
Pronin, E., & Kugler, M. B. (2007). Valuing thoughts, ignoring behavior: The introspection illusion as a source of the bias blind spot. Journal of Experimental Social Psychology, 43, 565–578.
Read, J. D., Lindsay, D. S., & Nicholls, T. (1997). The relation between confidence and accuracy in eyewitness identification studies: Is the conclusion changing? In C. P. Thompson, D. J. Herrmann, J. D. Read, D. Bruce, D. G. Payne, & M. P. Toglia (Eds.), Eyewitness memory: Theoretical and applied perspectives (pp. 107–130). Mahwah, NJ: Erlbaum.
Rhodes, M. G., & Castel, A. D. (2008). Memory predictions are influenced by perceptual information: Evidence for metacognitive illusions. Journal of Experimental Psychology: General, 137(4), 615–625.
Robinson, R. J., Keltner, D., Ward, A., & Ross, L. (1995). Actual versus assumed differences in construal: “Naïve realism” in intergroup perception and conflict. Journal of Personality and Social Psychology, 68, 404–417.
Ross, L., Greene, D., & House, P. (1977). The “false consensus effect”: An egocentric bias in social perception and attribution process. Journal of Experimental Social Psychology, 13, 279–301.
Rubin, D. C., & Kozin, M. (1984). Vivid memories. Cognition, 16, 81–95.
Spellman, B. A., & Bjork, R. A. (1992). When predictions create reality: Judgments of learning may alter what they are intended to assess. Psychological Science, 3, 315–316.
Spellman, B. A., Bloomfield, A., & Bjork, R. A. (2008). Measuring memory and metamemory: Theoretical and statistical problems with assessing learning (in general) and using gamma (in particular) to do so. In J. Dunlosky & R. A. Bjork (Eds.), A handbook of memory and metamemory (pp. 95–116). New York: Psychology Press.
Spellman, B. A., & Tenney, E. R. (2010). Credible testimony in and out of court. Psychonomic Bulletin & Review, 17, 168–173.
Steblay, N. M. (1997). Social influence in eyewitness recall: A meta-analytic review of lineup instruction effects. Law and Human Behavior, 21(3), 283–297.
Talarico, J. M., & Rubin, D. C. (2003). Confidence, not consistency, characterizes flashbulb memories. Psychological Science, 14, 455–461.
Tenney, E. R., MacCoun, R. J., Spellman, B. A., & Hastie, R. (2007). Calibration trumps confidence as the basis for witness credibility. Psychological Science, 18, 46–50.


Tenney, E. R., Small, J. E., Kondrad, R. L., Jaswal, V. K., & Spellman, B. A. (2010). Young children use accuracy and confidence, but not calibration, when judging credibility. Unpublished manuscript.
Tenney, E. R., Spellman, B. A., & MacCoun, R. J. (2008). The benefits of knowing what you know (and what you don’t): How calibration affects credibility. Journal of Experimental Social Psychology, 44, 1368–1375.
Weaver, C. A., III. (1993). Do you need a “flash” to form a flashbulb memory? Journal of Experimental Psychology: General, 122, 39–46.
Wells, G. L., Malpass, R. S., Lindsay, R. C. L., Fisher, R. P., Turtle, J. W., & Fulero, S. M. (2000). From the lab to the police station: A successful application of eyewitness research. American Psychologist, 55, 561–598.


20

Multidimensional Models for Item Recognition and Source Identification

Thomas D. Wickens

Item recognition followed by source identification is now a standard paradigm in memory research. During a study phase, a set of material, usually words, is presented and studied. This material comes from two or more sources; for example, the words might be read by a male and a female voice. The test phase has two parts. One is a standard recognition test: a mixture of old and new words is presented, and the subject must determine which ones were studied. The other is source identification: the subject states whether the word was originally presented by the male or the female voice. It is common to couple the recognition and identification responses with an r-level confidence rating. Table 20.1 shows six-level ratings for the two responses from a study by Glanzer, Hilford, and Kim (2004), their Experiment 5 (hereafter GHK5). The amount of information the subject acquires when an item is studied varies, and it is conventional to study its distribution by using the ratings to plot the operating characteristic (or ROC) from signal detection theory. One curve can be drawn for the recognition task and another for source identification. Figure 20.1 shows these curves for the GHK5 data, plotted both on a linear probability scale and with the probabilities transformed by a cumulative Gaussian distribution function. The shape of the operating characteristic gives information about the memory process that is unavailable from simple probabilities of correct responses. Glanzer et al. (2004; see also Glanzer, Hilford, & Maloney,

2009) identify several regularities of the operating characteristics of such data:

1. The probability operating characteristic for recognition is concave downwards and asymmetric; the Gaussian operating characteristic is straight, with a slope less than 1.
2. The probability operating characteristic for source identification is concave downwards; the Gaussian operating characteristic is concave upward.

Figure 20.1 shows these regularities clearly. The two response distributions are presented in Table 20.1 as if they were distinct sources of information, but of course they are not. An effectively studied item will get a high-confidence old rating and a rating at the appropriate end of the source scale; a poorly studied item will receive a new rating and a rating in the center of the source scale. Thus, the pair of responses constitutes a bivariate observation. As I will argue here, both models and analyses should take into account this bidimensionality.

Table 20.1 Item Recognition and Source Identification

Item Recognition
Rating       1     2     3     4     5      6
New        495   625   471   222   114     89
Old         75   205   228   179   236  1,093

Source Identification
Rating      M3    M2    M1    F1    F2    F3
Female      50    85   150   167   165   391
Male       411   138   183   150    86    40

Source: Data from Glanzer, M., Hilford, A., & Kim, K., Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 1176–1195, 2004, Experiment 5.

Figure 20.1 Operating characteristics for the GHK5 data in Table 20.1. (Panels: Hit Rate against False Alarm Rate, and Z(Hit Rate) against Z(False Alarm Rate); P(“Female”|Female) against P(“Female”|Male), and Z[P(“Female”|Female)] against Z[P(“Female”|Male)].)
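The operating characteristics described above can be recomputed from the rating frequencies in Table 20.1. The sketch below is one way to do so, under the assumption (suggested by the frequencies but not stated explicitly in the table) that rating 1 is the most confident "new" response and rating 6 the most confident "old" response. The helper names are hypothetical, and the end-point slope is only a rough stand-in for a fitted slope.

```python
from statistics import NormalDist

# Item recognition frequencies from Table 20.1, ratings 1 through 6.
new_counts = [495, 625, 471, 222, 114, 89]
old_counts = [75, 205, 228, 179, 236, 1093]


def roc_points(noise_counts, signal_counts):
    """Cumulate from the strictest criterion (rating 6 only) to the laxest."""
    n_noise, n_signal = sum(noise_counts), sum(signal_counts)
    points = []
    false_alarms = hits = 0
    # The lowest rating is skipped: including it gives the trivial point (1, 1).
    for noise, signal in zip(reversed(noise_counts[1:]), reversed(signal_counts[1:])):
        false_alarms += noise
        hits += signal
        points.append((false_alarms / n_noise, hits / n_signal))
    return points


def z_transform(points):
    """Map probabilities to Gaussian (z) coordinates."""
    z = NormalDist().inv_cdf
    return [(z(f), z(h)) for f, h in points]


points = roc_points(new_counts, old_counts)
z_points = z_transform(points)

# Crude slope estimate from the first and last z-transformed points.
slope = (z_points[-1][1] - z_points[0][1]) / (z_points[-1][0] - z_points[0][0])
for (f, h), (zf, zh) in zip(points, z_points):
    print(f"F = {f:.3f}  H = {h:.3f}  z(F) = {zf:+.2f}  z(H) = {zh:+.2f}")
print(f"end-point slope of the Gaussian operating characteristic: {slope:.2f}")
```

The same two functions apply to the source identification rows if the female-voice counts are treated as the "signal" distribution and the male-voice counts as the "noise" distribution.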

One-Dimensional Models

Although bidimensional models are desirable, theories are often easier to construct in one dimension. The representations currently used to analyze operating characteristics in memory research are based in signal detection theory. Briefly, each item type is represented by a random variable with a known distribution: $X_{\text{new}}$ and $X_{\text{old}}$ for the recognition task and $X_A$ and $X_B$ for source identification. Decisions are made by assigning intervals of the random variable to responses according to a set of criteria $\lambda_k^{(\text{task})}$ (Figure 20.2).1 Points on the operating characteristic for recognition are

$$\left[ P\left(X_{\text{new}} > \lambda_j^{(\text{IR})}\right),\; P\left(X_{\text{old}} > \lambda_j^{(\text{IR})}\right) \right], \quad j = 1, 2, \ldots, r - 1,$$

and those for source identification are

$$\left[ P\left(X_A > \lambda_j^{(\text{SI})}\right),\; P\left(X_B > \lambda_j^{(\text{SI})}\right) \right], \quad j = 1, 2, \ldots, r - 1.$$

Figure 20.2 The unidimensional signal detection model. The left-hand distribution (new words in item recognition) is Gaussian; the right-hand distribution (old words) is not.

Different theoretical treatments lead to different distributions for the random variables. Without going into detail (for which, see Shimamura & Wickens, 2009, or reviews such as Yonelinas & Parks, 2008), consider four possibilities:

1. Simple Gaussian (normal) signal detection theory with unequal variance for the old items in recognition (Wixted, 2007). The distributions associated with this model are:2

Recognition:
$$X_{\text{new}} \sim N(0, 1), \qquad X_{\text{old}} \sim N(\mu_{\text{IR}}, \sigma^2), \quad \sigma^2 > 1$$

Identification:
$$X_A \sim N\left(-\tfrac{1}{2} d'_{\text{SI}}, 1\right), \qquad X_B \sim N\left(+\tfrac{1}{2} d'_{\text{SI}}, 1\right)$$

2. Dual-process models in which some items are recollected (with probability $m$) and others are judged by their familiarity (Yonelinas, 1994, 1999). Recollected items are recognized as old and their source is correctly identified. The source of unrecollected items is unknown. Using the symbols $I_{-\infty}$ and $I_{\infty}$ for impulse distributions very far to the left or right, the distributions associated with this model are

Recognition:
$$X_{\text{new}} \sim N(0, 1), \qquad X_{\text{old}} \sim m I_{\infty} + (1 - m) N(d'_{\text{IR}}, 1)$$

Identification:
$$X_A \sim m' I_{-\infty} + (1 - m') N(0, 1), \qquad X_B \sim m' I_{+\infty} + (1 - m') N(0, 1)$$

For consistency, the mixing parameters $m$ and $m'$ in the two parts of this model should be identical. A less strict version allows some information about the source when recollection fails:

Identification:
$$X_A \sim m' I_{-\infty} + (1 - m') N\left(-\tfrac{1}{2} d'_{\text{SI}}, 1\right), \qquad X_B \sim m' I_{+\infty} + (1 - m') N\left(+\tfrac{1}{2} d'_{\text{SI}}, 1\right)$$

3. Mixture models in which some items are studied (with probability $m$) and others are ignored (DeCarlo, 2003). Responses to the ignored items are at chance; the others are generated according to a standard equal-variance Gaussian model:

Recognition:
$$X_{\text{new}} \sim N(0, 1), \qquad X_{\text{old}} \sim (1 - m) N(0, 1) + m N(d'_{\text{IR}}, 1)$$

Identification:
$$X_A \sim m' N\left(-\tfrac{1}{2} d'_{\text{SI}}, 1\right) + (1 - m') N(0, 1), \qquad X_B \sim m' N\left(+\tfrac{1}{2} d'_{\text{SI}}, 1\right) + (1 - m') N(0, 1)$$

Again, $m$ and $m'$ should be the same.

4. On theoretical grounds (hierarchical relational binding of contextual information in the medial temporal lobe) and the regularities identified by Glanzer et al. (2004), Shimamura and Wickens (2009) argued that the distribution of acquired information is positively skewed. They used the ex-Gaussian distribution as a flexible skewed distribution:3

Recognition:
$$X_{\text{new}} \sim N(0, 1), \qquad X_{\text{old}} \sim \text{exG}(\mu_{\text{IR}}, \sigma^2, \gamma_{\text{IR}})$$

Identification:
$$X_A \sim \text{exG}(-\mu_{\text{SI}}, 1, -\gamma_{\text{SI}}), \qquad X_B \sim \text{exG}(+\mu_{\text{SI}}, 1, +\gamma_{\text{SI}})$$
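To make the link between these distributions and the operating characteristic concrete, the following sketch computes predicted recognition points for models 1 and 3 in the list above. The parameter values are the item recognition estimates reported in Table 20.2; the criterion locations are hypothetical, chosen only for illustration, and the function names are not from the chapter. For the unequal-variance Gaussian model the z-transformed characteristic is a straight line with slope 1/σ, which the last function checks numerically.

```python
from statistics import NormalDist

PHI = NormalDist().cdf      # standard Gaussian distribution function
Z = NormalDist().inv_cdf    # its inverse


def uv_gaussian_point(criterion, mu, sigma):
    """Model 1: X_new ~ N(0, 1), X_old ~ N(mu, sigma^2)."""
    false_alarm = 1.0 - PHI(criterion)
    hit = 1.0 - PHI((criterion - mu) / sigma)
    return false_alarm, hit


def mixture_point(criterion, d_prime, m):
    """Model 3: X_old ~ (1 - m) N(0, 1) + m N(d', 1)."""
    false_alarm = 1.0 - PHI(criterion)
    hit = (1.0 - m) * false_alarm + m * (1.0 - PHI(criterion - d_prime))
    return false_alarm, hit


def z_slope(points):
    """Slope of the line through the first and last z-transformed points."""
    (f0, h0), (f1, h1) = points[0], points[-1]
    return (Z(h1) - Z(h0)) / (Z(f1) - Z(f0))


criteria = [-0.5, 0.0, 0.5, 1.0, 1.5]  # hypothetical criterion locations
uv_points = [uv_gaussian_point(c, mu=1.83, sigma=1.48) for c in criteria]
mix_points = [mixture_point(c, d_prime=2.10, m=0.79) for c in criteria]

print(f"unequal-variance z-slope: {z_slope(uv_points):.3f} (1/sigma = {1 / 1.48:.3f})")
print(f"mixture-model end-point z-slope: {z_slope(mix_points):.3f}")
```

Setting m = 1 in `mixture_point` recovers the ordinary equal-variance Gaussian model, which is a quick sanity check on the mixture formula.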

Figure 20.3 The distributions obtained by fitting the ex-Gaussian model to the GHK5 data. (Upper panel: densities for new and old items; lower panel: densities for male-voice and female-voice items.)

Figure 20.3 shows the skewed distributions implied by this model and the GHK5 data. There are many other models that make predictions for recognition and source identification. Any global matching model (Humphreys, Pike, Bain, & Tehan, 1989) can be adapted; in particular, predictions from Hintzman’s MINERVA II model (Hintzman, 1986, 1988) and the REM model of Shiffrin and Steyvers (1997) could be developed, although they are harder to treat analytically. Table 20.2 shows the result of fitting these models to the GHK5 data (other data sets give comparable results). If one pays attention to the significance levels, no model fits the item recognition data exactly. However, because the data are pooled across subjects, a null hypothesis test is positively biased and its probabilistic interpretation is even more problematic than usual. The unequal-variance Gaussian model does well, and the ex-Gaussian model, which is a generalization of it, does a little better, although at the price of an extra parameter.4 For source identification, the pure Gaussian model and the strong version of the Yonelinas dual-process model are unsatisfactory; the other models are all equally adequate.

Table 20.2 Parameter Estimates for Four Models Fitted to the GHK5 Data, With the Likelihood-Ratio Chi-Square and the Akaike and Bayesian Information Criterion Statistics (Values Marked by an Asterisk Are Constrained or Derived From Other Parameters)

                                   Parameters                      Fit Indices
Model                              Mean   SD      Mix    Skew      G2      df   p       AIC     BIC

Item Recognition
Gaussian (unequal variance)        1.83   1.48    --     --        13.86   3    0.003   27.86   71.97
Yonelinas dual process             0.91   1*      0.42   --        21.61   3    0.000   35.61   79.12
DeCarlo Gaussian mixture           2.10   1*      0.79   --        20.68   3    0.000   34.68   78.79
Ex-Gaussian                        2.37   2.35    --     1.43       9.28   2    0.010   25.28   75.70

Source Identification
Gaussian (equal variance)          1.30   1*      --     --        24.66   4    0.000   36.66   70.31
Yonelinas dual process (strong)    0*     1*      0.35   --        42.31   4    0.000   54.31   87.96
Yonelinas dual process (weak)      0.62   1*      0.27   --         1.12   3    0.772   15.12   54.38
DeCarlo Gaussian mixture           4.31   1*      0.46   --         0.86   3    0.834   14.86   54.13
Ex-Gaussian                        1.43   2.54*   --     1.13       0.86   3    0.834   14.86   54.13

Key: G2 = likelihood-ratio chi-square; df = degrees of freedom for G2; p = upper-tail chi-square probability for G2; AIC = Akaike information criterion; BIC = Bayesian information criterion.
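The information criteria in Table 20.2 appear to follow the common forms AIC = G2 + 2k and BIC = G2 + k ln N. This reading is an inference from the tabled values, not something the chapter states: it assumes that k counts the response criteria as well as the distributional parameters (so that k equals the 10 free cell frequencies minus df), and that N is the total number of pooled responses in Table 20.1 (4,032 for recognition, 2,016 for source identification). Under those assumptions, a short sketch reproduces the tabled values for the two Gaussian rows:

```python
import math


def information_criteria(g2, df, n_obs, n_free_cells=10):
    """AIC and BIC from a likelihood-ratio chi-square.

    n_free_cells: free response frequencies (2 item types x 5 free ratings);
    the number of fitted parameters is taken to be n_free_cells - df.
    """
    k = n_free_cells - df
    aic = g2 + 2 * k
    bic = g2 + k * math.log(n_obs)
    return aic, bic


# Item recognition, unequal-variance Gaussian row (N = 2016 new + 2016 old).
aic_ir, bic_ir = information_criteria(g2=13.86, df=3, n_obs=4032)

# Source identification, equal-variance Gaussian row (N = 1008 + 1008).
aic_si, bic_si = information_criteria(g2=24.66, df=4, n_obs=2016)

print(f"recognition: AIC = {aic_ir:.2f}, BIC = {bic_ir:.2f}")
print(f"source:      AIC = {aic_si:.2f}, BIC = {bic_si:.2f}")
```

Because BIC penalizes each parameter by ln N rather than by 2, the extra parameter of the ex-Gaussian model costs much more under BIC than under AIC with samples of this size, which is one way the two criteria can rank the same models differently.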

Table 20.3 Two-Dimensional Recognition and Source Identification Data

Female Voice
          F3    F2    F1     U    M1    M2    M3   Total
O7       134   114    86    85    76    78    43     616
O6         2    18    30    23    26    17     0     116
O5         3     5    14    15    10     1     2      50
O4         2     3     6    24     8     1     1      45
O3         2     2     2    31     7     2     1      47
O2         0     0     4    19     2     0     0      25
O1         1     0     2    42     1     0     0      46
Total    144   142   144   239   130    99    47

Male Voice
          F3    F2    F1     U    M1    M2    M3   Total
O7        44    72    77    57    88   116   159     613
O6         1    17    28    23    32    19     2     122
O5         1     1     8    15    14     2     1      42
O4         0     1     8    27    12     5     1      54
O3         0     1     2    14     6     2     1      26
O2         0     0     2    26     6     2     1      37
O1         1     0     2    50     0     1     0      54
Total     47    92   127   212   158   147   165

New Items
          F3    F2    F1     U    M1    M2    M3   Total
O7        16    18    47    66    34    21    14     216
O6         2    14    15    37    21     8     1     103
O5         0     2    21    33     5     1     1      65
O4         2     5    11    65    15     2     0     101
O3         0     3     9    83     9     2     1     107
O2         0     0     4   118     7     1     0     130
O1         1     0     0   234     0     0     0     235
Total     21    42   107   636    91    43    17

Source: Adapted from Slotnick, S. D., Klein, S. A., Dodson, C. S., and Shimamura, A. P., Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 1499–1517, 2000, with pairs of response categories pooled. With permission.
Key: O = old, F = female voice, M = male voice, U = uncertain, and larger numbers indicate greater confidence.
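The row and column totals of each block in Table 20.3 are the univariate rating distributions of the kind shown in Table 20.1. As a small illustration, the sketch below recomputes both sets of totals for the female-voice block; the variable names are hypothetical, and only that one block is transcribed.

```python
# Female-voice block of Table 20.3.
# Rows: recognition ratings O7 down to O1.
# Columns: source ratings F3, F2, F1, U, M1, M2, M3.
female_voice = [
    [134, 114, 86, 85, 76, 78, 43],
    [2, 18, 30, 23, 26, 17, 0],
    [3, 5, 14, 15, 10, 1, 2],
    [2, 3, 6, 24, 8, 1, 1],
    [2, 2, 2, 31, 7, 2, 1],
    [0, 0, 4, 19, 2, 0, 0],
    [1, 0, 2, 42, 1, 0, 0],
]

# Summing across columns gives the recognition rating distribution;
# summing across rows gives the source rating distribution.
recognition_marginal = [sum(row) for row in female_voice]
source_marginal = [sum(column) for column in zip(*female_voice)]

print("recognition (O7..O1):", recognition_marginal)
print("source (F3..M3):     ", source_marginal)
print("total responses:     ", sum(recognition_marginal))
```

Collapsing to either marginal discards the association between the two ratings, for example the concentration of "uncertain" source responses among low-confidence recognition ratings.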

This analysis shows that we are pushing the limits of what the operating characteristics can tell us. They can exclude a representation, but they are too weak to resolve which of several more or less plausible models is best. Moreover, it is not always clear how the two pieces of the model fit together, especially for relatively complicated models such as the ex-Gaussian model 4. For a better treatment, both responses should be examined together.

Bivariate Recognition and Identification Models

The univariate analysis of item recognition and source identification above is readily extended to a two-dimensional model. At the data level, the distribution of recognition and source identification responses is also bivariate. Table 20.3 shows an example of these empirical distributions taken from Slotnick, Klein, Dodson, and Shimamura (2000). Univariate distributions of responses such as those in Table 20.1 are marginals of these bivariate distributions. Looking at the bivariate distribution of responses allows a two-dimensional memory model to be fitted. Signal detection models for detection and identification in the perceptual domain go back to Tanner (1956).5 Figure 20.4 shows the essence of his model adapted to recognition memory. The three trial types, no signal and two varieties of signal, are represented by three bivariate Gaussian distributions (shown in the figure by their circular level curves). These distributions are the two-dimensional analogs of those in Figure 20.2. Responses are made by projecting an observation on two orthogonal axes. The vertical axis determines recognition performance, the horizontal axis the identification of the two item types. With appropriate criteria, this model generates two-dimensional patterns of responses like those in Table 20.3. The perceptual model was applied to memory research independently by Banks (2000) and Slotnick et al. (2000). Recognition corresponds to detection, and source identification to identification. The vertical dimension represents memory strength, and the horizontal dimension differentiates the item types. The Tanner model applies directly to the Gaussian model described above (model 1). New items have a standard bivariate Gaussian distribution,

$$\mathbf{X}_{\text{new}} \sim N(\mathbf{0}, \mathbf{I}),$$

where $\mathbf{I}$ is a 2 × 2 identity matrix. Old items have a displaced distribution with possibly different variance:

$$\mathbf{X}_{\text{old},A} \sim N\left( \begin{bmatrix} \mu_{\text{IR}} \\ -\tfrac{1}{2}\mu_{\text{SI}} \end{bmatrix}, \sigma^2 \mathbf{I} \right), \qquad \mathbf{X}_{\text{old},B} \sim N\left( \begin{bmatrix} \mu_{\text{IR}} \\ +\tfrac{1}{2}\mu_{\text{SI}} \end{bmatrix}, \sigma^2 \mathbf{I} \right)$$

Figure 20.4 Two-dimensional signal detection model for recognition and source identification. (Based on Tanner, W. P., Jr., Journal of the Acoustical Society of America, 28, 882–888, 1956.) (Figure labels: New Words, Old A Words, Old B Words, Study From Source A, Study From Source B, Decision Axis for Item Recognition, Decision Axis for Source Identification.)

The marginal predictions of this model are the same as those of the univariate Gaussian model 1. It can be fitted in bivariate form using the method of Wickens (1992). The two mixture models are formulated similarly. Both start with the equal-variance Tanner model ($\sigma^2 = 1$) and modify the old-item distributions by mixing them with another distribution. Yonelinas’s recollection model (model 2) mixes each distribution with a proportion $m$ of an impulse distribution positioned at the appropriate extreme of the space; in its general form,

$$\mathbf{X}_{\text{old},A} \sim m I_{\infty,-\infty} + (1 - m) N\left( \begin{bmatrix} d'_{\text{IR}} \\ -\tfrac{1}{2} d'_{\text{SI}} \end{bmatrix}, \mathbf{I} \right)$$

$$\mathbf{X}_{\text{old},B} \sim m I_{\infty,\infty} + (1 - m) N\left( \begin{bmatrix} d'_{\text{IR}} \\ +\tfrac{1}{2} d'_{\text{SI}} \end{bmatrix}, \mathbf{I} \right)$$

DeCarlo’s mixture model (model 3) mixes the new-item distribution with that of the studied items:

$$\mathbf{X}_{\text{old},A} \sim m N\left( \begin{bmatrix} d'_{\text{IR}} \\ -\tfrac{1}{2} d'_{\text{SI}} \end{bmatrix}, \mathbf{I} \right) + (1 - m) N(\mathbf{0}, \mathbf{I})$$

$$\mathbf{X}_{\text{old},B} \sim m N\left( \begin{bmatrix} d'_{\text{IR}} \\ +\tfrac{1}{2} d'_{\text{SI}} \end{bmatrix}, \mathbf{I} \right) + (1 - m) N(\mathbf{0}, \mathbf{I})$$

These bivariate representations yield the univariate marginal models given above. As their form makes clear, the same mixing parameter m applies to both item recognition and source identification, a characteristic that is violated by the disconnected fits of Table 20.2. The model with skew is not so simply described. As in the other models, new items are represented by a multivariate Gaussian random variable:
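As a brief check of the claim about marginals, consider DeCarlo's mixture model for source A. Because each mixture component has an identity covariance matrix, its two coordinates are independent, and marginalizing the mixture simply marginalizes each component. The component labels $X^{(\text{IR})}$ and $X^{(\text{SI})}$ below are notation introduced here for the first and second coordinates, not symbols used in the chapter.

```latex
\begin{align*}
\mathbf{X}_{\text{old},A} &\sim m\, N\!\left(
  \begin{bmatrix} d'_{\text{IR}} \\ -\tfrac{1}{2} d'_{\text{SI}} \end{bmatrix},
  \mathbf{I}\right) + (1 - m)\, N(\mathbf{0}, \mathbf{I}) \\[4pt]
\Rightarrow\quad
X^{(\text{IR})}_{\text{old},A} &\sim m\, N(d'_{\text{IR}}, 1) + (1 - m)\, N(0, 1), \\
X^{(\text{SI})}_{\text{old},A} &\sim m\, N\!\left(-\tfrac{1}{2} d'_{\text{SI}}, 1\right) + (1 - m)\, N(0, 1).
\end{align*}
```

These are the recognition and identification distributions of univariate model 3 with $m' = m$; the same argument, with an impulse in place of the displaced Gaussian component, applies to the recollection model.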

$$\mathbf{X}_{\text{new}} \sim N(\mathbf{0}, \mathbf{I})$$

Studying an item of type $k$ displaces it along a vector $\mathbf{s}_k$ whose direction depends on the content of the material studied and whose length depends on study effectiveness. The result is a new random variable for each type of item:

$$\mathbf{X}_{\text{old}-k} = \mathbf{X}_{\text{new}} + \mathbf{s}_k$$

The direction of the displacement $\mathbf{s}_k$ depends on characteristics of the source of the studied item, but when the sources are comparable, its length does not. Figure 20.5 shows these displacements in two dimensions for a set of random items. New items (triangles) and old items (open circles and squares) originally derive from the same distribution. The old items are displaced by study to create two old-item distributions (solid circles and squares). This process creates three multivariate distributions of items. Although these distributions have unknown (and probably large) dimensionality, their implications for the responses are determined by their projection into the two-dimensional space that contains the centers of the three random variables. Unlike the Tanner model, the distribution of old items need not have a Gaussian distribution.

Figure 20.5 The effect of study on items in two dimensions. Unstudied items are represented by triangles, open circles, and open squares. Closed circles and squares show the items after study.

Figure 20.6 Study of a single item displaces it from $(U, V)$ to $(X, Y)$, as described in the text. (The displacement has length $D$ and angle $T$.)

Figure 20.6 shows in detail what happens to one studied item. This item initially occupies the position $\mathbf{X} = (U, V)$ in the new-item distribution, where $U$ and $V$ have independent $N(0, 1)$ distributions. The increment $\mathbf{s}$ added to $\mathbf{X}$ is random both in length and direction. Following Shimamura and Wickens (2009), its length $D = \lVert \mathbf{s} \rVert$ has an ex-Gaussian distribution. The angle $T$ of this displacement has a distribution on the circle. The von Mises distribution is a simple choice here.6 The final point is $(X, Y)$, where

$$X = U + D \sin(T), \qquad Y = V + D \cos(T)$$

The random variables $U$, $V$, $D$, and $T$ are independent. With sources of balanced strength, only $T$ differs with the source, and that only in its mean parameter $\theta$. When $T = 0$ indicates the vertical axis, the two sources have parameters $-\theta$ and $+\theta$. After acquisition, items have three overlapping distributions (Figure 20.7). The new-item distribution (solid contours) is symmetric and Gaussian. The two old-item distributions (dashed lines) are more variable (only the lower two contours show) and are skewed away from

both the origin and each other. Projected onto the vertical and horizontal axes, these distributions look like Figure 20.3. Responses are made with a decision model similar to that of the Tanner model. Item recognition ratings are made by applying criteria to Y (the horizontal lines), and source identification ratings by applying criteria to X (the vertical lines).7 Formalizing the model this way lets recognition performance be connected to source identification through a single set of parameters. The joint density of U , V , D, and T in Figure 20.6 is the product of their four individual densities. The unconditional densities of X and Y —that is, the density of the distributions in Figure 20.7—are obtained by substituting their values into the joint density and integrating over the two unneeded dimensions. Although these integrals are (so far) recalcitrant analytically, they can be evaluated numerically (at the cost of many CPU cycles). The distributions in Figure 20.7 are those that approximately reproduce the univariate distributions of the GHK5 data (more specifically, the moments of the ex-Gaussian model in Table 20.2). In principle, the bivariate models can be fitted to two-dimensional data. The criterion lines in Figure 20.7 divide the space into regions corresponding to the cells in Table 20.3 (albeit fewer of them). More cells give more degrees of freedom and lead to stronger tests. This

approach may be limited in practice, however. First, obtaining reliable frequencies in the cells of large matrices requires many trials conducted under stable conditions. Pooling data from many inexperienced participants is problematic because their parameters certainly differ. The use of practiced observers, like those in psychophysical studies, raises questions about practice and strategy shifts. Second, the two-dimensional structure of the data is readily corrupted by strategic response factors. An examination of Table 20.3 reveals a substantial accumulation of responses in the uncertain category that cannot be accounted for by response criteria alone. Exactly what is happening here is not clear, but it may depend on the particular instructions that were used. In a similar study (Slotnick & Dodson, 2005), the participants were instructed to use the response 4 on the source judgment when they were very sure a word was new. The response process implied by this instruction is inconsistent with any of the four models above, and it cannot be described by any set of simple cut points. The moral here is to be very clear on instructions.8

Figure 20.7 Level curves for the three distributions in the skewed displacement model, with criterion lines. The configuration roughly reproduces the univariate GHK5 data (parameters are ν = 0.125, τ = 0.006, λ = 0.376, θ = 0.427, and κ = 10.15). (Horizontal axis: Source Evidence; vertical axis: Evidence for Presentation.)
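Since the densities of the skewed displacement model have to be evaluated numerically, a Monte Carlo simulation is a convenient way to get a feel for it. The sketch below draws studied items according to X = U + D sin(T), Y = V + D cos(T), using the parameter values quoted in the caption of Figure 20.7. The function names, the seed, and the sample size are arbitrary choices made here, and the simulation is only an illustration of the generating process, not the numerical integration used in the chapter.

```python
import math
import random


def sample_ex_gaussian(rng, nu, tau, lam):
    """Sum of a Gaussian(mean nu, SD tau) and an exponential(rate lam) variate."""
    return rng.gauss(nu, tau) + rng.expovariate(lam)


def sample_studied_item(rng, theta, nu=0.125, tau=0.006, lam=0.376, kappa=10.15):
    """One studied item: start at (U, V), move a distance D at angle T."""
    u = rng.gauss(0.0, 1.0)
    v = rng.gauss(0.0, 1.0)
    d = sample_ex_gaussian(rng, nu, tau, lam)
    # T = 0 points along the vertical (recognition) axis, so X uses sin, Y uses cos.
    t = rng.vonmisesvariate(theta, kappa)
    return u + d * math.sin(t), v + d * math.cos(t)


def mean_position(points):
    n = len(points)
    return sum(x for x, _ in points) / n, sum(y for _, y in points) / n


rng = random.Random(2009)   # arbitrary seed, for reproducibility only
theta = 0.427
source_a = [sample_studied_item(rng, -theta) for _ in range(20000)]
source_b = [sample_studied_item(rng, +theta) for _ in range(20000)]

mean_a = mean_position(source_a)
mean_b = mean_position(source_b)
print(f"source A mean (X, Y): ({mean_a[0]:+.2f}, {mean_a[1]:+.2f})")
print(f"source B mean (X, Y): ({mean_b[0]:+.2f}, {mean_b[1]:+.2f})")
```

Binning the simulated points with horizontal and vertical criterion lines would give predicted cell probabilities for a table laid out like Table 20.3, at the cost of simulation noise in the sparsely populated cells.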


The basic point of this chapter is simple: The bidimensionality of two-dimensional (or more generally, multidimensional) responses should be acknowledged both in the models one constructs for them and in their analyses. Even when limits on the data require the responses to be treated one aspect at a time, a full model integrates the responses and provides both a clearer picture of the phenomenon and a more appropriate analysis.

Endnotes

1. The dimension on which the decision is made is not directly observable, so it is identified only up to an arbitrary monotone transformation. In consequence, the shape of the distribution functions is not identified, and the use of a Gaussian distribution for new words in Figure 20.2 is a matter of convenience. Only the relative shape of the two distributions is determined.
2. Here $\mu$ is used for a mean parameter in general and $d'$ for the mean in a Gaussian distribution with unit variance.
3. An ex-Gaussian random variable is the sum of random variables with exponential and Gaussian distributions. It depends on the rate parameter $\lambda$ of the exponential component and the mean $\nu$ and variance $\tau^2$ of the Gaussian component. Its density can be written using the cumulative Gaussian distribution function $\Phi(x; \mu, \sigma^2)$ as

$$f(x) = C e^{-\lambda x}\, \Phi(x;\, \nu + \lambda\tau^2,\, \tau^2)$$

where $C$ is a constant (Shimamura & Wickens, 2009, Appendix B). It is often more convenient to express the distribution in terms of its moments:

$$\mu = E(X) = \nu + \frac{1}{\lambda}, \qquad \sigma^2 = \operatorname{var}(X) = \tau^2 + \frac{1}{\lambda^2}, \qquad \gamma = \operatorname{skew}(X) = \frac{2}{(1 + \tau^2\lambda^2)^{3/2}}$$

The ex-Gaussian distribution includes the Gaussian distribution in the limit as $\lambda \to \infty$ and the shifted exponential distribution as $\tau \to 0$.
4. The Akaike and Bayesian information criterion statistics are of limited help here as they lead to different conclusions. Without a principled reason to prefer one over the other, their evidence is ambiguous.
5. See also the work by James Thomas for related models (Thomas, 1985; Thomas, Gille, & Barker, 1982).
6. The von Mises distribution is the circular analog of the Gaussian distribution. Its density

$$g(T) = \frac{1}{2\pi I_0(\kappa)}\, e^{\kappa \cos(T - \theta)}$$

has two parameters, the mean angle $\theta$ and the dispersion $\kappa$ (the function $I_0(\kappa)$ in the normalizing constant is a Bessel function). See, for example, Jammalamadaka and SenGupta (2001).
7. An alternative recognition rule bases the ratings on the distance that the observations are displaced from the neutral center, i.e., $\lVert \mathbf{x}_0 + \mathbf{s}_j \rVert = \sqrt{X^2 + Y^2}$. This rule corresponds to locating the item in one set of spherical shells around 0.
8. Differences in instructions probably account for the fact that studies from the same laboratory are more like each other than they are like those from another laboratory.
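The ex-Gaussian formulas in Endnote 3 are easy to check numerically. The sketch below computes the three moments from (ν, τ, λ) and evaluates the density with the explicit constant C = λ exp(λν + λ²τ²/2); that constant is the standard one for this parameterization and is supplied here as an assumption, since the endnote leaves C unspecified. The parameter values are hypothetical, chosen so that the moments land near the ex-Gaussian item recognition row of Table 20.2.

```python
import math


def ex_gaussian_moments(nu, tau, lam):
    """Mean, variance, and skewness of Gaussian(nu, tau^2) + Exponential(rate lam)."""
    mean = nu + 1.0 / lam
    variance = tau**2 + 1.0 / lam**2
    skew = 2.0 / (1.0 + (tau * lam) ** 2) ** 1.5
    return mean, variance, skew


def ex_gaussian_pdf(x, nu, tau, lam):
    """f(x) = C exp(-lam x) Phi(x; nu + lam tau^2, tau^2), C = lam exp(lam nu + lam^2 tau^2 / 2)."""
    z = (x - nu - lam * tau**2) / tau
    gaussian_cdf = 0.5 * math.erfc(-z / math.sqrt(2.0))
    return lam * math.exp(lam * (nu - x) + 0.5 * (lam * tau) ** 2) * gaussian_cdf


def riemann_check(nu, tau, lam, lo=-10.0, hi=60.0, step=0.01):
    """Midpoint-rule estimates of the total probability and the mean."""
    n = int(round((hi - lo) / step))
    xs = [lo + (i + 0.5) * step for i in range(n)]
    mass = sum(ex_gaussian_pdf(x, nu, tau, lam) for x in xs) * step
    mean = sum(x * ex_gaussian_pdf(x, nu, tau, lam) for x in xs) * step
    return mass, mean


# Hypothetical parameter values (see the lead-in above).
nu, tau, lam = 0.269, 1.052, 0.476
mean, variance, skew = ex_gaussian_moments(nu, tau, lam)
mass, numeric_mean = riemann_check(nu, tau, lam)

print(f"mean = {mean:.2f}, SD = {math.sqrt(variance):.2f}, skew = {skew:.2f}")
print(f"numerical total probability = {mass:.4f}, numerical mean = {numeric_mean:.3f}")
```

The skewness formula also shows the two limiting cases directly: as λ grows the skewness goes to 0 (the Gaussian limit), and as τ goes to 0 it approaches 2, the skewness of an exponential distribution.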

References Banks, W. P. (2000). Recognition and source memory as multivariate decision processes. Psychological Science, 11, 267–273. DeCarlo, L. T. (2003). An application of signal detection theory with finite mixture distributions to source discrimination. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 767–778. Glanzer, M., Hilford, A., & Kim, K. (2004). Six regularities of source recognition. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 1176–1195. Glanzer, M., Hilford, A., & Maloney, L. T. (2009). Likelihood ratio decisions in memory: Three implied regularities. Psychological Bulletin and Review, 16, 431–455. Hintzman, D. L. (1986). “Schema abstraction” in a multiple-trace memory model. Psychological Review, 93, 411–428. Hintzman, D. L. (1988). Judgments of frequency and recognition memory in a multiple-trace memory model. Psychological Review, 95, 528–551. Humphreys, M. S., Pike, R., Bain, J. D., & Tehan, G. (1989). Global matching: A comparison of the SAM, Minerva II, Matrix, and TODAM models. Journal of Mathematical Psychology, 33, 36–67. Jammalamadaka, S. R., & SenGupta, A. (2001). Topics in circular statistics. Singapore: World Scientific. Shiffrin, R. M., & Steyvers, M. (1997). A model for recognition memory: REM—retrieving effectively from memory. Psychonomic Bulletin and Review, 4, 145–166. Shimamura, A. P., & Wickens, T. D. (2009). Superadditive memory strength for item and source recognition: The role of hierarchical relational binding in the medial temporal lobe. Psychological Review, 116, 1–19. Slotnick, S. D. & Dodson, C. S. (2005). Support for a continuous (single-process) model of recognition memory. Memory & Cognition, 33, 151–170.


Slotnick, S. D., Klein, S. A., Dodson, C. S., & Shimamura, A. P. (2000). An analysis of signal detection and threshold models of source memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 1499–1517.
Tanner, W. P., Jr. (1964). Theory of recognition. In J. A. Swets (Ed.), Signal detection and recognition by human observers: Contemporary readings. New York: Wiley. (Reprinted from Journal of the Acoustical Society of America, 28, 882–888, 1956.)
Thomas, J. P. (1985). Detection and identification: How are they related? Journal of the Optical Society of America A, 2, 1457–1467.
Thomas, J. P., Gille, J., & Barker, R. A. (1982). Simultaneous visual detection and identification: Theory and data. Journal of the Optical Society of America, 72, 1642–1651.
Wickens, T. D. (1992). Maximum-likelihood estimation of a multivariate Gaussian rating model with excluded data. Journal of Mathematical Psychology, 36, 213–234.
Wickens, T. D. (2002). Elementary signal detection theory. New York: Oxford University Press.
Wixted, J. T. (2007). Dual-process theory and signal-detection theory of recognition memory. Psychological Review, 114, 152–176.
Yonelinas, A. P. (1994). Receiver-operating characteristics in recognition memory: Evidence for a dual-process model. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 1341–1354.
Yonelinas, A. P. (1999). The contribution of recollection and familiarity to recognition and source-memory judgments: A formal dual-process model and analysis of receiver operating characteristics. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 1415–1434.
Yonelinas, A. P., & Parks, C. M. (2008). Receiver operating characteristics (ROCs) in recognition memory: A review. Psychological Bulletin, 133, 800–832.


21

Pursuing a General Model of Recall and Recognition

Troy A. Smith and Daniel R. Kimball

One of the many lessons Bob Bjork taught the second author (D.R.K.) in the course of many presentations in the Cogfog memory lab is the importance of being aware of the audience—what they know to start with, what they understand (or don’t) as you move forward, and what their likely criticisms will be. This advice bears on a phenomenon we have experienced in presenting our work on computational models of free recall (Kimball, Smith, & Kahana, 2007; Sirotin, Kimball, & Kahana, 2005). We have generally received quite different receptions from two types of audiences, both of which are often represented on any given occasion.

The first type comprises empirical researchers—those who conduct experiments with human subjects to test various memory theories but who do not do any computational modeling. Although generally receptive toward our enterprise, these researchers often want to know whether the model can accommodate findings from paradigms other than free recall, such as cued recall and recognition memory. The implication in their questioning is that a model worth its salt should be able to explain empirical findings in a wide variety of paradigms, and they quickly grow impatient—and to some degree dismissive—if the model has not been developed or tested as yet for use with such paradigms.

The other type of audience comprises modelers—researchers who design quantitative models of memory and cognition, perhaps in addition to conducting empirical research. Although this audience is also largely receptive toward our enterprise, some modelers are less so when we raise the prospect of generalizing our model to cover multiple paradigms, such as recall and recognition. Some have even argued that the goal of a model should be to understand the processes underlying a specific paradigm through testing theories instantiated in a particular model of that paradigm, rather than to understand processes and representations used in memory more generally. There is an implication in their remarks that attempting to design a general memory model would be a rather quixotic and theoretically misguided endeavor.

As empirical researchers and modelers ourselves, we are sympathetic to both of these conflicting and often unstated perspectives that shape attitudes toward computational modeling and perceptions of its usefulness in testing cognitive theories. Against this admittedly anecdotal backdrop, we discuss in this chapter the conceptual underpinnings of our efforts to extend a model of false and veridical recall—the fSAM model (Kimball et al., 2007), a descendant of the Search of Associative Memory model (SAM; Raaijmakers & Shiffrin, 1981)—to cover false and veridical recognition memory as well. We begin with a discussion of the advantages and challenges of using computational models to simulate memory processes in the first place. We then discuss previous efforts of others to simulate recall and recognition within a single modeling framework, why those efforts were abandoned, and why we think it is time to reconsider the prospect of designing such a general memory model. We close with a discussion of issues involved in such an undertaking and features that we think are good candidates for inclusion in such a general model, in particular in an extension of our fSAM model.

The Advantages and Challenges of Modeling

We have recently been surprised by views expressed by some memory experts challenging the value of memory modeling generally. For example, a reviewer recently commented that although memory models have been accorded prestige over the past few decades, this prestige is undeserved because the models have had little impact beyond the modeling community itself. There is perhaps some merit in this point, and modelers should accept some responsibility for this perception. However, notwithstanding the validity of such criticism, there is obviously a risk in “throwing out the baby with the bath water” by concluding that memory modeling per se is of questionable value.


Quantitatively specified models have two main advantages over verbally described memory theories—advantages that are often not acknowledged sufficiently by researchers outside of the modeling community. First, verbally described theories generally include a number of unstated, hidden assumptions that can provide such a high degree of flexibility as to make them essentially unfalsifiable (Hintzman, 1991). Building a quantitative model that can be instantiated in a computer simulation program requires one to specify assumptions and processes in detail, thus resulting in a theory that is highly testable (Lewandowsky, 1993). Second, when well-specified quantitative models are tested, the failure of the model can be just as theoretically informative as the success of the model, if not more so (Bjork, 1973; Hintzman, 1991). When all assumptions are well specified and the model faithfully reflects the theory’s assumptions, the failure of the model’s predictions says something specific about the validity of those assumptions, and thus about the validity of the theory. For these reasons, our efforts in building a general theory of recall and recognition are directed toward developing a quantitative, or computational, model. Of course, there are potential pitfalls to be avoided in designing and testing such memory models, including the need to balance parsimony and generality. However, as we discuss in this chapter, there are ways to address these challenges and preserve the advantages of quantitative models over verbally described theories.

Previous Efforts Toward Developing General Memory Models

Of course, ours is not the first attempt at creating a general computational memory model for recall and recognition. One of the hallmarks of memory research in the 1980s was the development and use of global matching models (GMMs) to explain the processes that underlie performance in recall and recognition, among other memory paradigms. These models included the search of associative memory (SAM) model (Raaijmakers & Shiffrin, 1981; Gillund & Shiffrin, 1984), the theory of distributed associative memory (TODAM2; Murdock, 1993), and MINERVA 2 (Hintzman, 1988), among others. Because our ongoing modeling efforts use the SAM framework as a starting point, the history of the efforts to apply these models to recall and recognition is particularly salient.


The Rise and Fall of Strength-Based Global Matching Models

While the GMMs differ substantially in their architectures, they share a common set of assumptions about how human memory operates (for a comparative review, see Clark & Gronlund, 1996). The defining characteristic of GMMs is their assumption that, at test, a set of available cues is assembled in short-term memory for use as a single, joint probe of long-term memory, that this probe is compared to all items in memory, and that a match value is calculated for each of these comparisons (Clark & Gronlund). This match value is variously called strength (e.g., Raaijmakers & Shiffrin, 1981), familiarity (e.g., Gillund & Shiffrin, 1984), or activation (e.g., Hintzman, 1988); we use these terms interchangeably.

As applied to recall, several GMMs have performed well. For example, models of recall developed within the SAM framework have been used to account for a wide variety of empirical phenomena, including serial position curves and part-set cuing, word frequency effects, interference and forgetting, generation, category clustering and multiple list learning, and the generation of false recall in the Deese–Roediger–McDermott (DRM) paradigm (for reviews, see Kimball et al., 2007; Shiffrin & Raaijmakers, 1992). This widespread success in modeling free recall is one of the key reasons we selected the SAM framework as our starting point in developing a general model of both recognition and recall.

The picture has been somewhat different when GMMs have been applied to recognition memory. As implemented initially, GMMs all assumed that recognition decisions are made using a single decision process that compares an index of an item’s familiarity to a criterion and responds that the item is “old” if the familiarity index is above the criterion or “new” if it is not, similar to the assumptions of classical signal detection theory (SDT; for a review of SDT, see Wickens, 2002).
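The single-process decision rule just described can be sketched in a few lines. This is a minimal illustration rather than any particular GMM: it assumes that familiarity is Gaussian with equal variance for studied items (targets) and unstudied items (lures), and the function names and numerical values are hypothetical.

```python
from statistics import NormalDist


def old_new_decision(familiarity: float, criterion: float) -> str:
    """Respond 'old' when the familiarity index exceeds the criterion."""
    return "old" if familiarity > criterion else "new"


def predicted_rates(target_mean: float, lure_mean: float,
                    criterion: float, sd: float = 1.0):
    """Hit and false-alarm rates under Gaussian familiarity (an assumption)."""
    hit = 1.0 - NormalDist(target_mean, sd).cdf(criterion)
    false_alarm = 1.0 - NormalDist(lure_mean, sd).cdf(criterion)
    return hit, false_alarm


# Hypothetical values: lures centered at 0, targets at 1, criterion midway.
hit, false_alarm = predicted_rates(target_mean=1.0, lure_mean=0.0, criterion=0.5)
print(round(hit, 3), round(false_alarm, 3))
```

In this sketch, raising the mean familiarity of targets and lures by the same amount while holding the criterion fixed raises the hit rate and the false-alarm rate together.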
Although these single-process GMMs were able to explain a great variety of empirical data from human participants, a number of empirical challenges to the ability of GMMs to simulate key recognition memory phenomena arose in the late 1980s and early 1990s (for reviews, see Clark & Gronlund, 1996; Shiffrin & Steyvers, 1997). Mirror effects were foremost among these challenges. As an example of a mirror effect, low-frequency (unfamiliar) words consistently exhibit both higher hit rates and lower false alarm rates than high-frequency (familiar) words—the effects of frequency on hits and false alarm rates thus mirror each other (Glanzer & Adams, 1990). GMMs have difficulty simulating this pattern using a single familiarity-based matching process because they predict that more familiar words will be recognized more often than will less familiar words, regardless of whether the words were studied or not, thus yielding both higher hits and higher false alarms. Thus, GMMs predict parallel rather than mirror effects for hits and false alarms as a function of word frequency.

Other findings that posed difficulties for GMMs included the list length effect—the finding that recognition of a given item decreases as other items are added episodically to memory (Bowles & Glanzer, 1983); the list strength effect—the finding that strengthening some list items does not harm and may help recognition of other list items (Ratcliff, Clark, & Shiffrin, 1990); the slope of the z-ROC curve (i.e., the normalized receiver operating characteristic function that relates hits to false alarms as a function of rater confidence), which is generally less than 1.0 and is insensitive to differences in length, strength, and word frequency (Ratcliff, McKoon, & Tindall, 1994; Ratcliff, Sheu, & Gronlund, 1992); dissociations in patterns for item and associative recognition tasks (Malmberg, 2008); and effects of verbal context (Clark & Shiffrin, 1992) and environmental context (e.g., Smith, Glenberg, & Bjork, 1978). Although each of the GMMs could account for a subset of these effects, no single GMM was able to account for all of them. As a consequence, many researchers—including some of the developers of the GMMs themselves—came to consider the entire class of strength-based GMMs to have been falsified as theories of recognition (see, e.g., Diana, Reder, Arndt, & Park, 2006).

Bayesian Likelihood Recognition Models

Following the decline of the GMMs as models of recognition, theorists developed other recognition models that use Bayesian likelihood ratios in recognition decision processes. Of these, the retrieving effectively from memory model (REM; Shiffrin & Steyvers, 1997) is of some importance for our purposes, given that it and our fSAM model both descend from SAM.
REM was developed in reaction to the failure of the SAM model to account for mirror effects and z-ROC curves, and it overcomes the limitations of the SAM architecture by incorporating two significant changes. First, the items themselves are represented explicitly as vectors of feature values rather than implicitly in terms of strengths of associations to other items as in SAM. Second, instead of using strength-based familiarity as the basis for recognition, REM first calculates the Bayesian likelihood ratio of the match between the test cue and each item stored in memory, and then sums these likelihood ratios to calculate the odds that the cue matches an item in memory. If the odds are above a threshold, then the test cue is declared old; otherwise, it is declared new.
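The likelihood-ratio decision described above can be illustrated with a small sketch. It assumes the form commonly given for the published REM equations: feature values are positive integers drawn from a geometric distribution with parameter g, a stored feature is a correct copy with probability c, and a 0 in a trace marks a feature that was not stored. The function names and the values of g and c are hypothetical. Note that this sketch divides the summed ratios by the number of traces (that is, it averages them), which is how the odds are usually defined for REM.

```python
from typing import Sequence


def likelihood_ratio(probe: Sequence[int], trace: Sequence[int],
                     g: float = 0.4, c: float = 0.7) -> float:
    """Likelihood ratio that `trace` is a stored copy of `probe`."""
    ratio = 1.0
    for p, t in zip(probe, trace):
        if t == 0:
            continue  # a missing feature carries no evidence either way
        if p == t:
            base = g * (1.0 - g) ** (t - 1)  # chance probability of value t
            ratio *= (c + (1.0 - c) * base) / base
        else:
            ratio *= 1.0 - c  # a mismatch counts against a true copy
    return ratio


def recognition_odds(probe, traces, g: float = 0.4, c: float = 0.7) -> float:
    """Average the per-trace likelihood ratios to obtain the odds of 'old'."""
    ratios = [likelihood_ratio(probe, trace, g, c) for trace in traces]
    return sum(ratios) / len(ratios)


def decide(probe, traces, criterion: float = 1.0,
           g: float = 0.4, c: float = 0.7) -> str:
    """Declare the probe old when the odds exceed the criterion."""
    return "old" if recognition_odds(probe, traces, g, c) > criterion else "new"


# Hypothetical memory with two incomplete traces.
memory = [[1, 3, 0, 1], [2, 1, 1, 0]]
print(decide([1, 3, 2, 1], memory), decide([2, 2, 3, 2], memory))
```

Matches on rare (high) feature values raise the ratio more than matches on common values, which is one way such a model can let item characteristics affect targets and lures differently.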


One of the strengths of REM is that its Bayesian mechanisms allow it to account for recognition phenomena that were especially problematic for SAM, including the effects of variations in list length, list strength, and item strength on discriminability and z-ROC measures of item recognition (Shiffrin & Steyvers, 1997) and the occurrence of word frequency mirror effects (Malmberg, Holden, & Shiffrin, 2004; Shiffrin & Steyvers, 1997). REM has also been applied to a limited number of recall phenomena, including accuracy and response time in cued recall (Diller, Nobel, & Shiffrin, 2001), the list strength effect in free recall (Malmberg & Shiffrin, 2005), the use of a recall-to-reject strategy in recognition (Malmberg, 2008; Malmberg et al., 2004), and the effects of directed forgetting on recall and recognition (Lehman & Malmberg, 2009). Despite the successes of the REM model, it has a number of limitations in its current form. First, it is not known whether REM can provide a satisfactory account for many of the basic recall phenomena that are explained by the episodic SAM models on which REM was based. More critically, REM seems able to simulate certain recognition phenomena—such as directed forgetting, recall-to-reject, and exceptions to mirror effects (i.e., conditions under which mirror effects are not observed)—only by incorporating an episodic recall process (Lehman & Malmberg, 2009; Malmberg, 2008; Malmberg et al., 2004). Of course, doing so increases the model’s complexity and raises the question of whether adding such a recall process to a GMM recognition model—as Gillund and Shiffrin (1984) proposed for SAM a quarter century ago— might yield comparable results.

The Rise of Dual-Process Recognition Decision Theories

The incorporation of a recall process into REM reflects the influence of another major development since the decline of GMMs as models of recognition: the growing consensus among memory theorists that recognition memory decisions involve multiple memory processes rather than just one. In contrast to theories in which participants are assumed to make recognition decisions purely on the basis of familiarity—as is generally assumed by GMMs and Bayesian models—most recent theories posit that participants combine evidence from multiple memory processes in order to make recognition decisions (e.g., Heathcote, Raymond, & Dunn, 2006; Wixted, 2007; Yonelinas & Parks, 2007). And although these theories differ considerably with respect to the nature of the memory processes that are used in recognition tasks and the way in which the evidence from those memory processes is used to make recognition decisions, they all agree that recognition tasks require more than just an assessment of familiarity.

Despite this consensus, the terms single-process model and dual-process model continue to be used in the literature, causing a considerable amount of confusion among both casual readers and recognition memory researchers (see Parks & Yonelinas, 2007; Wixted, 2007, for a detailed discussion). Whereas some authors use the terms to differentiate between theories that use evidence from a single memory process (familiarity) to drive recognition decisions and those that use evidence from two memory processes (familiarity and recollection; e.g., Wixted, 2007; Yonelinas & Parks, 2007), other authors use the terms to differentiate between theories that use a single SDT or threshold decision process and those that combine SDT and threshold decision processes (e.g., Heathcote et al., 2006). Notably, theories that are sometimes described under this second definition as single-process models include those that use multiple memory processes to provide evidence for a single (sometimes multidimensional) SDT decision process (e.g., Heathcote et al.), even when the authors of those theories describe them as incorporating dual processes (e.g., Wixted & Stretch, 2004). For clarity, we adopt the former definition that is based on memory processes—rather than decision processes—involved in recognition tasks. Under this definition, it is clear that single-process recognition models such as the strength-based familiarity GMMs and single-process Bayesian models are insufficient to explain recognition memory, and that additional memory processes such as recollection or source monitoring are required (Wixted, 2007; Yonelinas & Parks, 2007).
Of course, memory and decision processes cannot be completely dissociated, and the debate in the current recognition memory literature concerns both the nature of recognition memory processes and the manner in which they are combined to make recognition decisions. For example, Yonelinas’s (1994) dual-process signal detection (DPSD) theory assumes that recollection and familiarity are used in two distinct decision processes that run in parallel. Recollection is assumed to be a discrete, probabilistic process somewhat akin to free recall in which memory is searched for experiential information about a previous study episode; if information is retrieved and its strength is above a threshold, then the item will be recognized (Parks & Yonelinas, 2007). When recollection fails or is not possible, a familiarity-based SDT decision process is used (Yonelinas, 1994). By contrast, Wixted’s (2007) unequal variance signal detection (UVSD) model assumes that evidence from the familiarity and recollection memory processes is combined into a continuous, cumulative distribution of item strength that is then used to make a decision using an SDT process. While both of these theories conceptualize familiarity as a continuous distribution of memory strength that can be used in SDT, they disagree on the shape of the recollection distribution. DPSD makes no assumptions about the distribution of recollection strength (Yonelinas & Parks, 2007), whereas UVSD assumes that this distribution is also continuous, such that when it is added to the familiarity distribution, the resulting strength can be used in SDT (Wixted, 2007).

However, both of these theories are limited because they do not explain how these distributions arise, nor do they provide any theoretical reason for these distributions to have the hypothesized shapes. Thus, it appears that the most viable current models of recognition memory actually have little to say about how the memory processes that provide the evidence for recognition decisions actually operate. Computational memory models such as fSAM are important tools that can be used to describe these memory processes and generate and test underlying assumptions in a more systematic, theoretical manner—another reason why pursuing a general model of recall and recognition processes is important and timely.
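The contrast between the two decision theories can be made concrete with a small numerical sketch. The parameterization below is a common textbook one and is an assumption here, not taken from the chapter: lure strength is standard normal in both cases; `recollection` is the probability that recollection succeeds and `d_prime` is the familiarity advantage of targets in the DPSD sketch; `mu` and `sigma` are the mean and standard deviation of combined target strength in the UVSD sketch. All function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # lure strength is assumed N(0, 1) in both sketches


def dpsd_rates(criterion: float, recollection: float, d_prime: float):
    """DPSD sketch: recollection succeeds, otherwise familiarity-based SDT."""
    false_alarm = 1.0 - STD_NORMAL.cdf(criterion)
    familiar_hit = 1.0 - STD_NORMAL.cdf(criterion - d_prime)
    hit = recollection + (1.0 - recollection) * familiar_hit
    return hit, false_alarm


def uvsd_rates(criterion: float, mu: float, sigma: float):
    """UVSD sketch: one continuous target distribution with its own variance."""
    false_alarm = 1.0 - STD_NORMAL.cdf(criterion)
    hit = 1.0 - NormalDist(mu, sigma).cdf(criterion)
    return hit, false_alarm


def zroc_slope(rates_fn, criteria, **params) -> float:
    """Slope of the line through the first and last z-transformed ROC points."""
    points = []
    for criterion in criteria:
        hit, false_alarm = rates_fn(criterion, **params)
        points.append((STD_NORMAL.inv_cdf(false_alarm), STD_NORMAL.inv_cdf(hit)))
    (x0, y0), (x1, y1) = points[0], points[-1]
    return (y1 - y0) / (x1 - x0)


# Illustrative parameter values only.
print(zroc_slope(uvsd_rates, [0.0, 1.0], mu=1.0, sigma=1.25))
print(zroc_slope(dpsd_rates, [0.0, 1.0], recollection=0.3, d_prime=1.0))
```

In the UVSD sketch the z-ROC slope equals 1/sigma, so a target standard deviation above 1 gives a slope below 1.0. With the illustrative values shown, the DPSD sketch also gives a slope below 1.0, so both accounts can produce that property through different assumptions.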

Toward a General Model of Recall and Recognition: Features and Design Issues

In our modeling efforts we have tried to adhere to a set of basic model design principles. The first is parsimony. One criticism often leveled against computer simulation models of memory—and likely to be important in a model covering multiple memory paradigms—is that they are too complex and thus violate parsimony. This risk is often analyzed in terms of the number of the model’s free parameters (i.e., parameters that are allowed to vary in a largely atheoretical way, so as to increase the fit of the model to particular data). One way that we have dealt with this issue is to express such complexity in terms of the number of free parameters per process included in the model, rather than across all processes combined. In the basic fSAM model of false recall, for instance, we used 10 free parameters overall, but they were spread across four simulated processes—initial encoding, forgetting, retrieval, and output encoding (Kimball et al., 2007).


A second—and equally important—design consideration is that the model architecture and mechanisms should be psychologically plausible, not just mathematically convenient—even if it means adding additional mechanisms or processes to the model. Although the phrase psychologically plausible might be defined in many ways, we define it based on both subjective experience and empirical data. For example, regardless of how one conceptualizes the relationship between working memory and long-term memory, a rehearsal buffer with an approximate capacity of four items (Shiffrin, 1976) not only seems to fit well with most people’s subjective experience, but also appears to be necessary to explain serial position curves in free recall (Kahana, 1996; Raaijmakers & Shiffrin, 1981) and the generation of semantically induced false memories during encoding (Kimball et al., 2007). We recognize that these two principles are often in conflict. As we described in the opening section of this chapter, some of our colleagues and reviewers have encouraged us to make our models more complex in order to account for a wider array of empirical phenomena, while others have advised us to make the models simpler and more focused. While we are sympathetic toward both perspectives, we believe that a general model of memory that can explain data from free recall, cued recall, item recognition, associative recognition, and even source monitoring experiments would be useful not just to memory theorists but to the wider cognitive psychology community. Given the complexity of the human mind, this necessitates a relatively complicated model with multiple mechanisms, including processes for encoding, storage, forgetting, retrieval, recognition decisions, and source judgments. Our current project to model recall and recognition is a first step along that path. Although we follow others’ efforts to build a general model of memory, our effort is unique in at least two respects. 
First, we have a greater wealth of both theory and empirical data to guide us than was available 25 to 30 years ago when the GMMs were initially developed. In particular, there is a rich literature and a fairly well-developed set of theories of recognition decision processes that are ripe for integration into computational models (cf. Elfman, Parks, & Yonelinas, 2008). Second, rather than constraining ourselves to working within an existing architecture, we are open to developing a novel architecture, in whole or in part, so as to reflect recent theoretical developments in cognitive psychology and neuroscience.

Rationale for Using the SAM Framework as a Starting Point

Starting with the SAM framework, our first goal is to gauge how far we can get in simulating recall and recognition phenomena by making modifications to SAM processes alone, leaving its architecture intact.


Our previous work has been conducted within the SAM framework, and in Kimball et al. (2007) we discuss how the fSAM model might be extended to recognition phenomena, much as the SAM recall model was extended by Gillund and Shiffrin (1984). We believe that the fSAM model is a good starting point for several reasons. First, as mentioned previously, SAM models have been quite successful in simulating a wide array of recall phenomena. By contrast, most Bayesian likelihood models do not include memory search processes to support recall. An exception is the REM model, which theoretically includes the memory search process from the SAM recall model (Shiffrin & Steyvers, 1997, pp. 160–161), but the implementation of recall processes in REM has generally been limited to supplementing recognition decisions (e.g., Malmberg, 2008; Malmberg et al., 2004; but see Lehman & Malmberg, 2009). Second, the general SAM recognition model proposed by Gillund and Shiffrin (1984) includes both a familiarity process and a recollection process. There have been no tests as yet of the capability of a dual-process version of the SAM model—or any dual-process version of a strength-based computational model—to account for the recognition effects that posed difficulties for single-process GMMs. However, a number of researchers have suggested that such a model could account for such effects. For example, although strength-based (i.e., familiarity) SDT theories of recognition cannot account for mirror effects, when the familiarity process is supplemented with a recollection process, these theories easily predict mirror effects in many situations (see, e.g., Joordens & Hockley, 2000). Third, with the addition of a preexperimental semantic memory (Kimball et al., 2007), the SAM framework can be used to examine the interaction of semantic and episodic influences on memory performance, something no other extant recognition model can do. 
This is a critical capability for a general model of memory because many of the most interesting phenomena in the literature—such as the DRM false memory effect (Roediger & McDermott, 1995)—explicitly involve semantic and episodic components. Additionally, semantic relations impact many phenomena that are usually thought of as purely episodic. For example, many of the variables that produce mirror effects—such as normative word frequency, concreteness, imageability, and meaningfulness—are inherently semantic variables rather than episodic variables.

The fSAM Model: A Brief Description

The following is a brief description of the fSAM model’s features, including those borrowed from earlier SAM models as well as the novel features added by fSAM, particularly regarding semantic associations (see Kimball et al., 2007, for a complete description).

Memory Stores

The fSAM model assumes the existence of two memory stores: short-term memory (STM) and long-term memory (LTM). Within STM, rehearsal processes are idealized in the form of a limited-capacity buffer in which studied words become associated through a rehearsal process, as described below. LTM contains values for the strengths of three types of associations: associations formed at study between each list word and the list context (conceptualized as the temporal and situational setting for a particular list), pairwise episodic associations formed among list words during study, and pairwise preexperimental semantic associations among all words in the lexicon.

Encoding Processes

As each list item is presented during study, it enters the STM buffer and is rehearsed along with other items occupying the buffer at any given time. While an item is in the buffer, there is an incrementing of the strength of its episodic associations to itself, to all other items then occupying the buffer, and to the list context. Unpresented words that are semantically associated to the words in the rehearsal buffer also become associated to the study episode by way of an increase in each word’s strength of association to the list context; this incrementing contributes to semantically induced intrusions such as those in the DRM paradigm. Once the buffer is full, each new item displaces one of the items then occupying the buffer.

Recall Processes

In immediate free recall, there are two retrieval stages: the output of items in the STM buffer at the beginning of recall, followed by retrieval of items from LTM. In delayed free recall, STM has been emptied during the retention interval, and recall therefore begins with retrieval from LTM.
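The encoding processes described above can be sketched as a simple buffer simulation. This is a toy illustration under stated assumptions, not the published fSAM implementation: strengths are incremented once per presentation step, the displaced item is chosen uniformly at random, and the increment parameters (`ctx_inc`, `pair_inc`, `self_inc`, `sem_ctx_inc`) and the semantic strength table are hypothetical.

```python
import random
from collections import defaultdict


def study_list(words, semantic, capacity=4, ctx_inc=1.0, pair_inc=0.5,
               self_inc=0.5, sem_ctx_inc=0.1, rng=None):
    """Present words one at a time and accumulate association strengths."""
    rng = rng or random.Random(0)
    context = defaultdict(float)   # word -> strength of association to list context
    episodic = defaultdict(float)  # (word_a, word_b) -> episodic strength
    buffer = []
    for word in words:
        if len(buffer) == capacity:
            buffer.pop(rng.randrange(capacity))  # new item displaces one occupant
        buffer.append(word)
        for a in buffer:
            context[a] += ctx_inc
            episodic[(a, a)] += self_inc
            for b in buffer:
                if a != b:
                    episodic[(a, b)] += pair_inc
            # Unpresented semantic associates also gain context strength.
            for associate, strength in semantic.get(a, {}).items():
                if associate not in words:
                    context[associate] += sem_ctx_inc * strength
    return context, episodic, buffer


# Hypothetical DRM-style list whose words all point to the unpresented "sleep".
words = ["bed", "rest", "tired", "dream", "nap"]
semantic = {w: {"sleep": 0.8} for w in words}
context, episodic, buffer = study_list(words, semantic)
print(sorted(buffer), round(context["sleep"], 2))
```

In this sketch the unpresented associate ends up with a nonzero strength to the list context, which is the route by which the description above lets semantically induced intrusions arise.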
Retrieval of items from LTM results from strength-dependent competition among all items associated with a given retrieval cue or set of cues. The search of LTM begins with context as the sole retrieval cue. As words are recalled they are stored in the STM buffer and—together with the list context—form the cue set for subsequent retrieval attempts. Each time LTM is cued, all items in memory are potential candidates for subsequent retrieval. Each of these candidate items has a certain probability of being sampled, as determined by calculating the strength of association between the cue set and the candidate item and comparing that strength to the total strength of association between the cue set and all items in memory. The strength of association for an item is determined jointly by its strength of association to the context and its strengths of episodic and semantic association to each of the recently recalled items then occupying the STM buffer. Once an item has been sampled, it is successfully retrieved (recovered) only if its joint strength of association to the cue set surpasses a recovery threshold. Following an item’s retrieval, the strengths of its episodic and contextual associations to the retrieval cues are incremented in LTM; semantic association strengths remain unchanged, reflecting their greater robustness against transient fluctuations. Search using a particular set of retrieval cues is abandoned after a certain number of retrieval failures, and search is abandoned altogether with the accumulation of a criterial number of retrieval failures across all retrieval cue sets.

The SAM Recognition Model

Gillund and Shiffrin (1984) extended the SAM model to account for item recognition by using the sum of the associations between the cue set and all items in memory (i.e., the denominator in the recall sampling rule) to calculate a familiarity value that is then compared to a preset criterion to make an old/new judgment. This simple single-process model was able to qualitatively simulate some, but not all, of the key findings from the recognition literature, as mentioned previously. Note that this model predated the explicit inclusion of semantic associations, as in fSAM (Kimball et al., 2007; see also Sirotin et al., 2005).

Proposed Modifications to Generalize fSAM to Simulate Recognition Memory

We reasoned that the first logical step in generalizing the fSAM model to recognition should be to create a single-process recognition model in which the Gillund and Shiffrin (1984) familiarity rule has been modified to include semantic associations as well as contextual and episodic associations. As Kimball et al.
(2007) discuss, a familiarity-based version of fSAM would inherit the abilities of the SAM recognition model to account for a variety of episodic recognition phenomena and would extend those capabilities to semantically influenced recognition phenomena. In particular, this modification should enable the fSAM model to systematically generate high levels of false alarms to lures that are semantically related to previously studied words, including the critical lure in the DRM paradigm. However, a single-process fSAM model would still be subject to many of the same limitations of single-process episodic GMMs. In particular, there is no reason to believe that adding a semantic component to the familiarity process would help the model

Pursuing a General Model of Recall and Recognition • 439

account for mirror effects or the shape of the z-ROC curve, two of the key effects that led to the rejection of GMMs a decade ago. This led us to consider creating a dual-process fSAM model. Of course, a single-process model would be more parsimonious, but given the growing consensus among recognition memory theorists that recognition involves multiple processes, a dual-process model seems more psychologically plausible. But would a dual-process model still fit within the SAM framework? And, more critically, would it be able to account for mirror effects and z-ROC curves?

Surprisingly, the answer to the first question is yes. Because implementations of the SAM recognition model have generally assumed that recognition relies solely on a familiarity process (e.g., Clark & Shiffrin, 1992; Gillund & Shiffrin, 1984; Mensink & Raaijmakers, 1988), it is commonly thought that this assumption is integral to the SAM theory. However, a closer reading of the literature reveals that familiarity and recollection both play a role in recognition in the SAM framework. As Gillund and Shiffrin (1984, p. 56) put it, “the question is really not whether search processes occur, but the degree to which such processes occur…. [T]he underlying logic of the SAM model requires that the subject be able to utilize search if he or she so chooses.”

The answer to the second question appears, at least provisionally, to be yes as well. Ratcliff, Van Zandt, and McKoon (1995) showed that when recall processes were allowed to contribute to recognition memory decisions, the episodic SAM model could account for data from the process dissociation procedure that were highly problematic for single-process models, including, in particular, the shape of z-ROCs. We are currently testing the ability of several dual-process versions of the fSAM model to account for mirror effects and other problematic data. 
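As a rough illustration of the sampling, recovery, and familiarity rules described earlier for SAM, the following Python sketch may help. It is not the fSAM implementation. The function names, the example strength values, and the choice to combine context and buffer strengths by multiplication are all assumptions made for this illustration; the text specifies only that the strengths are combined jointly and that recovery requires surpassing a threshold.

```python
import math
import random


def joint_strength(context_strength, buffer_strengths):
    """Combine an item's strength to context with its strengths to each buffer item.

    Multiplication is an assumption of this sketch, not a rule stated in the text.
    """
    return context_strength * math.prod(buffer_strengths)


def sampling_probabilities(strengths):
    """Ratio rule: each item's joint strength divided by the total over all items."""
    total = sum(strengths.values())
    return {item: s / total for item, s in strengths.items()}


def familiarity(strengths):
    """Summed strength over all items, i.e., the denominator of the sampling rule."""
    return sum(strengths.values())


def attempt_retrieval(strengths, recovery_threshold, rng):
    """Sample one item in proportion to strength, then test it against the threshold."""
    items = list(strengths)
    sampled = rng.choices(items, weights=[strengths[i] for i in items], k=1)[0]
    recovered = strengths[sampled] > recovery_threshold
    return sampled, recovered


# Hypothetical strengths for three items, given context plus one buffer item.
strengths = {
    "bed": joint_strength(0.5, [2.0]),
    "rest": joint_strength(0.5, [1.0]),
    "chair": joint_strength(0.5, []),  # no buffer cue contributes here
}
print(sampling_probabilities(strengths))
print(familiarity(strengths))
print(attempt_retrieval(strengths, recovery_threshold=0.75, rng=random.Random(0)))
```

In this sketch the same summed quantity serves as the denominator for recall sampling and as the familiarity value for recognition, which is the link between the two tasks that the Gillund and Shiffrin (1984) rule relies on.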
We are testing multiple versions of the model because there are a number of possible ways in which recollection processes can be implemented within the SAM framework, including supplementing recognition with a straightforward recall process (e.g., Ratcliff et al., 1995) and using a recall-to-reject process (cf. Lehman & Malmberg, 2009). Furthermore, the manner in which the familiarity and recollection processes are combined to make a recognition decision is a critical aspect of model development, both theoretically and pragmatically. Thus, deciding how to incorporate recollection into the model is one of the most difficult steps we have faced in developing a dual-process fSAM model. We discuss several options in more detail below. The most obvious way to implement recollection is to use the same memory search process that is used in free recall, as proposed by

Gillund and Shiffrin (1984) and partially implemented by Ratcliff et al. (1995). The recognition decision can then be made using a two-step decision process, as in the DPSD model of Yonelinas (1994). If the item is recollected, it is identified as old with the highest confidence rating; otherwise, the decision is made based on familiarity, as in the single-process model.

A somewhat simpler recollection mechanism is to calculate a value that represents the amount of contextual information that can be retrieved from memory in response to the test item. In the current version of fSAM, this is most logically a function of the absolute strength of association between the test item and the current context. One such function that would retain the theoretical link between recall and recognition proposed by Gillund and Shiffrin (1984) is to use the free-recall recovery rule to generate a probability that the item will be recollected. The recognition decision could be made using a DPSD-like decision process in which the item is recollected if this probability of recovery surpasses a stochastically determined threshold. If the item is not recollected, it is evaluated for familiarity using the denominator from the free-recall sampling rule, as in Gillund and Shiffrin’s model. Alternatively, the recognition decision could be made by summing the familiarity value and the recollection/recovery value to calculate a continuous value for total memory strength, which is then compared to a preset criterion, as in Wixted’s (2007) UVSD theory of recognition. 
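The two candidate decision rules just described can be contrasted in a short Python sketch. This is an illustration under stated assumptions, not the authors' model code: the function names, the response labels, and the use of a uniform random draw on [0, 1) as the "stochastically determined threshold" are all hypothetical choices made here.

```python
import random


def dpsd_like_decision(recovery_probability, familiarity, criterion, rng):
    """Two-step rule: try recollection first, otherwise fall back on familiarity."""
    threshold = rng.random()  # assumed: threshold drawn uniformly from [0, 1)
    if recovery_probability > threshold:
        return "old (recollected)"
    return "old" if familiarity > criterion else "new"


def summed_strength_decision(recollection_value, familiarity, criterion):
    """Single continuous rule: add both values and compare to a preset criterion."""
    total_strength = familiarity + recollection_value
    return "old" if total_strength > criterion else "new"


rng = random.Random(1)
print(dpsd_like_decision(recovery_probability=0.6, familiarity=0.8, criterion=1.0, rng=rng))
print(summed_strength_decision(recollection_value=0.6, familiarity=0.8, criterion=1.0))
```

The design difference is that the first rule treats recollection as a discrete event that overrides familiarity, whereas the second lets a weak recollection value and a weak familiarity value add up to an "old" response that neither would produce alone.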
Given that dual-process SDT models appear to be able to account for mirror effects, z-ROC curves, and many of the other phenomena that have been the bane of strength-based GMMs, we believe that a dual-process version of fSAM, such as those outlined above, will likely be able to account for these same effects, as well as semantically driven effects such as the DRM false memory effect and the effects of verbal context on recognition (e.g., Reder, Anderson, & Bjork, 1974). Furthermore, a computational cognitive model such as fSAM would have several advantages over the more descriptive SDT models. First, the fSAM model would detail the operation of the cognitive processes that give rise to memory strength, familiarity, and recollection. Second, because the fSAM model will specify these memory processes, it could potentially be used to test theories of how the outputs of memory processes are used to make recognition decisions. Finally, as we discussed in the opening section of this chapter, as long as our assumptions are well specified and the model faithfully reflects the theory’s assumptions, both the successes and the failures of the fSAM recognition model should be theoretically informative.

Going Beyond the SAM Framework

Although we clearly have positive expectations for a dual-process fSAM model, we also recognize that the SAM framework constrains the generalizability of the model. In particular, we would like to include mechanisms for encoding and differentiating between different environmental or mental contexts (i.e., source monitoring), delay-based forgetting, and contextual reinstatement. However, despite some limited success handling these issues (see, e.g., Mensink & Raaijmakers, 1988), the prospects for handling these types of phenomena in a parsimonious manner within the SAM framework seem quite limited.

The main source of this limitation is the representational assumption upon which the SAM framework is built. Unlike most other models of memory, SAM has no explicit representation for items; it is purely associative. That is, SAM assumes that the contents of memory can be represented by an associative matrix in which there are no “items” per se, only pairwise associations between items or between an item and the context (Raaijmakers & Shiffrin, 1981). This association-based structure can be thought of as a representation of activity, similar to the concept as it is used in spreading activation theory (see, e.g., Kimball et al., 2007; Shiffrin & Raaijmakers, 1992), but the lack of a representation for the nodes in the underlying network clearly limits this conceptualization.

An alternative to the association-based representations used in SAM is to use a featural representation for items. Most other memory models, including TODAM2 (Murdock, 1993), MINERVA 2 (Hintzman, 1988), and REM (Shiffrin & Steyvers, 1997), use featural representations. A featural representation has a number of advantages over an association-based representation. First, it gives the model a way to represent similarities and differences between individual stimuli; stimuli that share more features are, by definition, more similar. 
This capability is critical in source monitoring tasks (Johnson, Hashtroudi, & Lindsay, 1993) and is likely to be important in modeling stimulus-based mirror effects such as the word frequency effect (e.g., Shiffrin & Steyvers, 1997). Second, a featural representation gives the model a rich way of representing environmental and mental context, another necessity for source monitoring. Finally, featural representations can help bridge the gap between high-level models of cognitive processes and lower-level models, such as neural networks and neurobiological models (Howard & Kahana, 2002).

Despite these advantages, we are reluctant to completely abandon the association-based structure of the SAM framework. Associations have been, and will continue to be, an important part of memory theory, and it seems to us that a general model of recall and recognition ought to include associations as a core component of the model. Therefore, the challenge is in how to combine the two types of representations (associations and features) in a way that is both parsimonious and psychologically plausible.

Spatial Representation

One possible solution is to think of memory as consisting of a set of linked neural networks in which information about item features is generated, encoded, and stored. This information can be computationally represented as sets of abstract feature values that are mathematically equivalent to locations in a high-dimensional space (e.g., Howard & Kahana, 2002) so that an item is represented by a location in memory space. Although associations are not stored directly in LTM in this architecture, they nevertheless can play key roles in memory processes because associations are a natural by-product of feature overlap. That is, the strength of association between any two items in memory can be calculated as a function of the relative locations of those items in LTM space. Items that are close together in memory space will tend to be highly associated with each other, while items that are far apart in memory space will not be.

Deriving associations within a spatial representation is, of course, not entirely novel. Indeed, this is the central idea behind data analysis techniques such as factor analysis and multidimensional scaling (Kruskal & Wish, 1978), lexical-semantic tools such as word association space (Steyvers, Shiffrin, & Nelson, 2004) and latent semantic analysis (Landauer & Dumais, 1997), and some models of similarity judgment (Krumhansl, 1978; Tversky, 1977). 
Additionally, the temporal context model of memory and its derivatives (Howard & Kahana, 2002; Polyn, Norman, & Kahana, 2009) are based on this type of representation. However, we believe that combining a spatial association representation with the encoding and retrieval processes that have proven to be highly successful within the SAM framework is an exceptionally promising avenue to explore. In particular, using a featural representation of both items and context would provide a basis for implementing source monitoring mechanisms within the model. These mechanisms are likely to be important in a number of paradigms, including episodic recognition, false recall, and false recognition. Additionally, because such a model would include representations of both items and associations, it could provide a good platform from which to explore dissociations

between item recognition and associative recognition—a task that has proven to be challenging for even the most complex computational models of memory (e.g., Malmberg, 2008). Of course, there are a number of challenges to overcome in the development of such a model. One of the most critical issues is the fact that similarity and association are different constructs, even though many models—including fSAM—do not really distinguish between them (e.g., Kimball et al., 2007; Murdock, 1993; Shiffrin & Steyvers, 1997). We are currently working on finding a way to derive separate measures of similarity and association from the same underlying featural representations. One possibility is to use angular distance as the measure of association, as many current models do, while using a metric such as Krumhansl’s (1978) distance-density model as a measure of similarity. Despite the difficulties of this task, we remain hopeful.
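The suggestion of using angular distance as the measure of association can be made concrete with a minimal Python sketch. The feature vectors and function names below are hypothetical and chosen only for illustration; the sketch does not attempt Krumhansl's (1978) distance-density measure of similarity, and it should not be read as the representation the authors ultimately adopt.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two feature vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def angular_distance(a, b):
    """Angle in radians between two vectors; a smaller angle means a stronger association."""
    # Clamp to [-1, 1] so floating-point rounding cannot push acos out of its domain.
    c = max(-1.0, min(1.0, cosine(a, b)))
    return math.acos(c)


# Hypothetical three-feature representations of three items.
bed = [1.0, 0.9, 0.1]
rest = [0.9, 1.0, 0.2]
piano = [0.1, 0.0, 1.0]

print(angular_distance(bed, rest))   # small angle: items share most features
print(angular_distance(bed, piano))  # larger angle: little feature overlap
```

Because the angle ignores vector length, two items can be strongly associated under this measure regardless of how many features each has encoded, which is one reason a separate measure of similarity might be needed.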

Conclusion

We expect that pursuing a general model of recall and recognition along these lines will prove fruitful and theoretically significant. Although the rationale and preliminary road map for this pursuit provided in this chapter may not fully persuade skeptics in either of our two audiences (empirical researchers and modelers) as to the merits of this pursuit, we hope that they will be persuaded by the outcome of our enterprise, whether that outcome involves model success, model failure, or a combination of the two. The proof, as always, will be in the pudding.

References

Bjork, R. A. (1973). Why mathematical models? American Psychologist, 28, 426–433.
Bowles, N. L., & Glanzer, M. (1983). An analysis of interference in recognition memory. Memory and Cognition, 11, 307–315.
Clark, S. E., & Gronlund, S. D. (1996). Global matching models of recognition memory: How the models match the data. Psychonomic Bulletin and Review, 3, 37–60.
Clark, S. E., & Shiffrin, R. M. (1992). Cuing effects and associative information in recognition memory. Memory and Cognition, 20, 580–598.
Diana, R. A., Reder, L. M., Arndt, J., & Park, H. (2006). Models of recognition: A review of arguments in favor of a dual-process account. Psychonomic Bulletin and Review, 13, 1–21.

Diller, D. E., Nobel, P. A., & Shiffrin, R. M. (2001). An ARC-REM model for accuracy and response time in recognition and recall. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 414–435.
Elfman, K. W., Parks, C. M., & Yonelinas, A. P. (2008). Testing a neurocomputational model of recollection, familiarity, and source recognition. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34, 752–768.
Gillund, G., & Shiffrin, R. M. (1984). A retrieval model for both recognition and recall. Psychological Review, 91, 1–67.
Glanzer, M., & Adams, J. K. (1990). The mirror effect in recognition memory: Data and theory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 5–16.
Heathcote, A., Raymond, F., & Dunn, J. (2006). Recollection and familiarity in recognition memory: Evidence from ROC curves. Journal of Memory and Language, 55, 495–514.
Hintzman, D. L. (1988). Judgments of frequency and recognition memory in a multiple-trace memory model. Psychological Review, 95, 528–551.
Hintzman, D. L. (1991). Why are formal models useful in psychology? In W. E. Hockley & S. Lewandowsky (Eds.), Relating theory and data: Essays on human memory in honor of Bennet B. Murdock (pp. 39–56). Hillsdale, NJ: Erlbaum.
Howard, M. W., & Kahana, M. J. (2002). A distributed representation of temporal context. Journal of Mathematical Psychology, 46, 269–299.
Johnson, M. K., Hashtroudi, S., & Lindsay, D. S. (1993). Source monitoring. Psychological Bulletin, 114, 3–28.
Joordens, S., & Hockley, W. E. (2000). Recollection and familiarity through the looking glass: When old does not mirror new. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 1534–1555.
Kahana, M. J. (1996). Associative retrieval processes in free recall. Memory and Cognition, 24, 103–109.
Kimball, D. R., Smith, T. A., & Kahana, M. J. (2007). The fSAM model of false recall. Psychological Review, 114, 954–993.
Krumhansl, C. L. (1978). Concerning the applicability of geometric models to similarity data: The interrelationship between similarity and spatial density. Psychological Review, 85, 445–463.
Kruskal, J. B., & Wish, M. (1978). Multidimensional scaling. Beverly Hills, CA: Sage.
Landauer, T. K., & Dumais, S. T. (1997). Solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104, 211–240.
Lehman, M., & Malmberg, K. J. (2009). A global theory of remembering and forgetting from multiple lists. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35, 970–988.
Lewandowsky, S. (1993). The rewards and hazards of computer simulations. Psychological Science, 4, 236–243.
Malmberg, K. J. (2008). Recognition memory: A review of the critical findings and an integrated theory for relating them. Cognitive Psychology, 57, 335–384.

Malmberg, K. J., Holden, J. E., & Shiffrin, R. M. (2004). Modeling the effects of repetitions, similarity, and normative word frequency on old-new recognition and judgments of frequency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 319–331.
Malmberg, K. J., & Shiffrin, R. M. (2005). The “one-shot” hypothesis for context storage. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 322–336.
Mensink, G. J. M., & Raaijmakers, J. G. (1988). A model for interference and forgetting. Psychological Review, 95, 434–455.
Murdock, B. B. (1993). TODAM2: A model for the storage and retrieval of item, associative, and serial-order information. Psychological Review, 100, 183–203.
Parks, C. M., & Yonelinas, A. P. (2007). Moving beyond pure signal-detection models: Comment on Wixted (2007). Psychological Review, 114, 188–202.
Polyn, S. M., Norman, K. A., & Kahana, M. J. (2009). A context maintenance and retrieval model of organizational processes in free recall. Psychological Review, 116, 129–156.
Raaijmakers, J. G. W., & Shiffrin, R. M. (1981). Search of associative memory. Psychological Review, 88, 93–134.
Ratcliff, R., Clark, S. E., & Shiffrin, R. M. (1990). List-strength effect. I. Data and discussion. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 163–178.
Ratcliff, R., McKoon, G., & Tindall, M. (1994). Empirical generality of data from recognition memory receiver-operating characteristic functions and implications for the global memory models. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 763–785.
Ratcliff, R., Sheu, C.-F., & Gronlund, S. D. (1992). Testing global memory models using ROC curves. Psychological Review, 99, 518–535.
Ratcliff, R., Van Zandt, T., & McKoon, G. (1995). Process dissociation, single-process theories, and recognition memory. Journal of Experimental Psychology: General, 124, 352–374.
Reder, L. M., Anderson, J. R., & Bjork, R. A. (1974). A semantic interpretation of encoding specificity. Journal of Experimental Psychology, 102, 648–656.
Roediger, H. L., & McDermott, K. B. (1995). Creating false memories: Remembering words not presented in lists. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 803–814.
Shiffrin, R. M. (1976). Capacity limitations in information processing, attention and memory. In W. K. Estes (Ed.), Handbook of learning and cognitive processes: Memory processes (Vol. 4, pp. 177–236). Hillsdale, NJ: Erlbaum.
Shiffrin, R. M., & Raaijmakers, J. (1992). The SAM retrieval model: A retrospective and prospective. In A. F. Healy, S. M. Kosslyn, & R. M. Shiffrin (Eds.), Essays in honor of William K. Estes: From learning processes to cognitive processes (Vol. 2, pp. 69–86). Hillsdale, NJ: Lawrence Erlbaum Associates.
Shiffrin, R. M., & Steyvers, M. (1997). A model for recognition memory: REM—retrieving effectively from memory. Psychonomic Bulletin and Review, 4, 145–166.

Sirotin, Y. B., Kimball, D. R., & Kahana, M. J. (2005). Going beyond a single list: Modeling the effects of prior experience on episodic free recall. Psychonomic Bulletin and Review, 12, 787–805.
Smith, S. M., Glenberg, A., & Bjork, R. A. (1978). Environmental context and human memory. Memory and Cognition, 6, 342–353.
Steyvers, M., Shiffrin, R. M., & Nelson, D. L. (2004). Word association spaces for predicting semantic similarity effects in episodic memory. In A. Healy (Ed.), Experimental cognitive psychology and its applications: Festschrift in honor of Lyle Bourne, Walter Kintsch, and Thomas Landauer (pp. 237–249). Washington, DC: American Psychological Association.
Tversky, A. (1977). Features of similarity. Psychological Review, 84, 327–352.
Wickens, T. D. (2002). Elementary signal detection theory. New York: Oxford University Press.
Wixted, J. T. (2007). Dual-process theory and signal-detection theory of recognition memory. Psychological Review, 114, 152–176.
Wixted, J. T., & Stretch, V. (2004). In defense of the signal detection interpretation of remember/know judgments. Psychonomic Bulletin and Review, 11, 616–641.
Yonelinas, A. P. (1994). Receiver-operating characteristics in recognition memory: Evidence for a dual-process model. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 1341–1354.
Yonelinas, A. P., & Parks, C. M. (2007). Receiver operating characteristics (ROCs) in recognition memory: A review. Psychological Bulletin, 133, 800–832.

22

Memory for Pictures
Sometimes a Picture Is Not Worth a Single Word

Joyce M. Oates and Lynne M. Reder

Although there are some phenomena in memory that are poorly understood, it is generally accepted that an item studied as a picture will be better remembered than an item studied as a word. Most of us are familiar with the expression “a picture is worth a thousand words” and agree that memory for pictures is remarkable. Indeed, in most situations, it would be wise to assume that pictures will be better remembered than words. One picture can contain enough information to convey many sentences. Children learn to follow stories in picture books before they are able to comprehend written text. Pictures are often universal and not restricted by knowledge of a specific language. For example, even if a traveler cannot read the local language, he or she can find the ladies’ or the men’s room in virtually any country by looking at picture signs.

Conventional wisdom among many memory theorists also holds that pictures are better remembered than words on recognition tests (e.g., Ally & Budson, 2007; Anderson, 2009; Brady, Konkle, Alvarez, & Oliva, 2008; Mintzer & Snodgrass, 1999; Nelson, Reed, & Walling, 1976; Schacter, Israel, & Racine, 1999; Shepard, 1967; Standing, 1973). Paivio’s dual-code theory (Paivio & Csapo, 1973) explains this seemingly ubiquitous phenomenon as follows: When pictures are studied, they elicit their verbal label, and thus two representations or codes are stored in memory. In contrast, words do not automatically elicit a picture, and thus they have a relatively impoverished memory representation. The redundant

representation for pictures makes their retrieval or recognition more probable than that of stimuli studied as words. The explanation offered by Paivio implies that there are limits on the picture superiority effect; however, memory experts sometimes remember only the general phenomenon and forget the theoretical explanation for its occurrence.1 In this chapter we will reexamine the conditions under which picture memory is superior to word memory. We will review evidence from the literature and present data from our own work that demonstrate when picture recognition is neither superior to nor even comparable to word recognition. As part of the discussion of the attributes of situations that cause picture inferiority, we will elaborate Paivio’s dual-code theory. First, we review the conditions under which pictures are better remembered than words. Next, we discuss the conditions that produce the picture inferiority effect.

Pictures Are Better Remembered Than Words

The finding of better memory for pictures than words was reported as early as the nineteenth century (Kirkpatrick, 1894). Kirkpatrick demonstrated that real objects were better remembered than either written or spoken words, both when tested immediately and after a three-day delay. This picture superiority effect (PSE), as it has come to be called, is a robust phenomenon with numerous demonstrations of the basic finding that pictures are better recognized and recalled than their labels (e.g., Brady et al., 2008; Madigan, 1974, 1983; Nelson et al., 1976; Nickerson, 1965, 1968; Paivio, 1991; Paivio & Csapo, 1973; Paivio, Rogers, & Smythe, 1968; Shepard, 1967).

In addition, the number of pictures that can be remembered is striking. Standing (1973) showed that people can remember thousands of unique pictures with great accuracy. Brady et al. (2008) explored the limits of the finding reported by Standing by using a two-alternative forced choice procedure to measure the nature of subjects’ ability to discriminate targets from foils. They showed subjects 2,500 pictures of concrete objects, followed by a recognition test that compared three different types of foils that could be paired with the target: the novel condition paired an old item with a new unrelated item, the exemplar condition required a discrimination between an old item and a foil that shared the same concept (e.g., an old picture of bread paired with a new picture of bread), and the state condition showed an old item with another picture of the same item in a different state (e.g., a melon shown whole and a new picture depicting the same melon, half eaten). Discrimination accuracy was best (92.5%) when the foil was novel, but accuracy was surprisingly good in

the exemplar and state conditions as well (87.6 and 87.2%, respectively). In order to discriminate the studied picture from a semantically similar foil, subjects must have encoded and retained visual details of the stimulus, such as color, shape, texture, and so on. The most popular and frequently cited explanation for PSE is Paivio’s dual-code theory (Paivio, 1971, 1986, 1991; Paivio & Csapo, 1973; Paivio, Rogers, & Smythe, 1968). The theory’s basic premise is that there are two codes for pictures, one that is visual (imagen) and one that is verbal (logogen; Morton, 1969), and only one for words (verbal). Pictures have perceptual information (e.g., colors, shapes etc.) and also verbal information (e.g., picture of a “dog”). Together the two codes increase the memory strength of pictures because there can usually be two ways to represent any one pictorial item. Pictures have a “naming” advantage since labels are often automatically elicited, whereas images for words are not generated without explicit instruction and additional mental effort. Paivio and Csapo demonstrated that when subjects are instructed to generate images when studying lists of pictures and words, words are remembered as accurately as pictures. This showed that words can benefit from dual code, but only with effortful mental imaging. Conversely, it has been reported that using a rapid presentation rate (5.3 items per second) of pictures and words eliminates PSE. This is because subjects are slower at naming a picture than reading a word (Fraisse, 1968), and the rapid presentation does not allow subjects enough time to generate a label, so pictures lose the dual-code advantage. Pictures are more perceptually rich than words, and this visual distinctiveness lends them an advantage in memory. 
To the extent that subjects also encode the stimulus as a verbal label, subjects have two codes for pictures: In addition to the perceptual features of the stimulus, such as color, shape, and texture, subjects also store a verbal label (similar to the representation for a studied word) that enriches the memory trace and provides redundancy. Picture illustrations are included in textbooks because they corroborate text and are often more effective than text alone for problem-solving transfer (Mayer, 1984, 1989, 1993; Mayer, Steinhoff, Bower, & Mars, 1995). Words are visually sparse, as the letters are usually presented in one color (black) in a common font (e.g., Times New Roman, Helvetica). Not only is the visual image of a word impoverished, but the common font has been seen with thousands if not millions of other words and provides no visual distinctiveness. Unlike a picture that tends to be automatically labeled, words rarely stimulate the generation of an image (Kroll & Potter, 1984; Paivio & Csapo, 1973).

Consider the situation in which memory is measured using an old/new recognition paradigm. If a picture is tested as a word, the test word is likely to match the word that was automatically generated and stored. However, if a word is tested as a picture, it is unlikely that the test picture will match the one a subject generated (in the rare case that one was generated). If memory is measured using free recall, the dual-code theory of Paivio can explain why pictures produce more free recall. The two potential cues (visual and verbal) to remember a picture make accurate recall more probable. In summary, it seems logical that a picture of anything (e.g., a balloon) is more likely to be remembered than its label.

Sometimes Words Are Better Remembered Than Pictures

Despite all the evidence for the picture superiority effect (PSE), there are counterexamples. We conducted an experiment whose goal was not to test the PSE, but the results frequently elicited comments from memory experts who were surprised that memory for words was superior to memory for photographs and pictures. Reder et al. (2006) conducted a recognition memory experiment that tested recognition memory performance under the drug midazolam compared with saline control for three types of stimuli: photographs, abstract pictures, and words. Midazolam is a benzodiazepine and an anxiolytic that causes temporary anterograde amnesia. We believe that midazolam causes this memory impairment by blocking the ability to form new bindings in memory (Park, Quinlan, Thornton, & Reder, 2004). We conjectured that it would be easier to bind words to the experimental context than to bind unfamiliar faces, places, or abstract pictures.

After receiving an injection of either midazolam or saline (placebo), the subject was shown a study list one stimulus at a time and rated each stimulus for pleasantness. The list consisted of words displayed in a common font, pictures of unfamiliar abstract paintings, and unfamiliar photographs (nonfamous faces, cityscapes, and landscapes). After completing the rating task and a filler task, subjects were tested on their ability to recognize the studied/rated items compared with foils that had not previously been seen. Subjects served as their own control (receiving midazolam at one session and saline at the other). This meant that subjects completed sessions on two separate occasions and rated different exemplars of each stimulus class for each session (stimuli of each type were randomly assigned for each subject to be targets or foils for Session 1 or Session 2). The order of drug conditions was counterbalanced over subjects, and the experimenter and

Memory for Pictures • 451 0.9 Midazolam Saline

0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0

Abst. Pics.

Photos Hits

Words

Abst. Pics.

Photos

Words

False Alarms

Figure 22.1 Proportion of hits and false alarms for abstract pictures, photographs, and words as a function of drug condition. (From Reder, L. M., Oates, J. M., Thornton, E. R., Quinlan, J. J., Kaufer, A., & Sauer, J., Psychological Science, 17(7), 565, 2006. Copyright © John Wiley & Sons, Inc. Reprinted with permission.)

subjects were blind to drug condition (see Reder et al., 2006, for more details). It would be reasonable to assume that under saline, memory for pictures would be better, and under midazolam, memory for pictures would be the more attenuated (since midazolam hurts recollection and pictures would have two representations to strengthen memory). In contrast to the conventional wisdom, Figure 22.1 shows that under saline, accuracy for abstract pictures was worst, followed by photographs, and words were the best recognized. Furthermore, under midazolam, words showed the greatest decline, followed by photographs. Recognition memory for abstract pictures was unaffected by drug condition. Why should words be better remembered under the placebo and show the greatest decrement under the amnestic drug when pictures have a two-representation advantage?

Pictures That Do Not Have a Meaningful Label Are Hard to Remember

As we discussed earlier, the abstract pictures used in Reder et al. (2006) could not be easily labeled. That is, a good descriptor of the picture would not easily come to mind that would be unique (“abstract picture” would not distinguish among the abstract pictures, whether targets or foils). The label or descriptor not only needs to be unique, but the subject needs to regenerate the same label at test for it to be useful. Even if a subject succeeds for a particular picture, creating labels that are not automatically elicited (i.e., not already assigned to the picture) requires more effort (working memory resources). It is much easier for a subject to generate rather generic labels, such as “abstract picture,” or the names of colors used in the painting (red, blue, pink, etc.), or a general shape, such as “rectangle” or “blob.” Those labels are not very useful, since they are unlikely to discriminate one target from another or from a foil. Without a concrete label to aid memory, perceptual features alone are not sufficient to make abstract pictures as memorable as words.

Abstract pictures tend to be processed as the conjunction of their features, or of a subset of their features that attract attention at any one encoding. With many repetitions, more and more pieces of the design are bound together to form larger and larger chunks. Famous abstract pictures often have labels consisting of either the title of the piece or the artist who created it. Novel abstract pictures are much more difficult to encode holistically and recognize later. They do not benefit from a dual code, as there is no preexisting verbal code for them. To summarize, it is our position (see also Reder et al., 2006; Reder, Park, & Kieffaber, 2009) that unfamiliar abstract pictures are ambiguous and do not foster labels. When subjects attempt to generate one, they may fail to bind it to context because cognitive resources are exhausted, or they may fail to generate the same one when the stimulus is next seen. Converging evidence for this position comes from the developmental literature, described below.

If You Cannot Name It, You Will Probably Not Remember It

Ducharme and Fraisse (1965) did not find superior memory for pictures in seven- and eight-year-old children when using free recall. They suggested that children do not automatically label pictures, much as subjects were not able to automatically label abstract pictures in Reder et al. (2006). Later, Jenkins, Deno, and Stack (1969) reported that second grade students exhibited a PSE for recognition, but that the PSE was reversed for free recall; words were better recalled than pictures. They speculated that children must be at a disadvantage for labeling, and recall of those labels is more difficult than recognizing a reinstated picture. It is also possible that the children found spelling very difficult, but when they had viewed the word earlier, the spelling was primed (reinforced), making it easier to output the word than the picture.

Whitehouse, Maybery, and Durkin (2006) tested students from four different age groups: 2nd and 3rd graders, 4th and 5th graders, 7th and 8th graders, and 10th and 11th graders. The subjects were given a mixed list of concrete words and pictures, and then were asked to recall as many of the stimuli as they could. Whereas the number of words recalled remained constant across all four groups, the number of pictures recalled increased with age. Younger subjects were clearly able to read, comprehend, and retain verbal information, since recall for words was as good at the youngest age as at all the older ages. Whitehouse et al. concluded that their results were due to younger children lacking the ability to use inner speech. We expand on that conclusion and hypothesize that the diminished PSE for the younger subjects is due to an inability to make full use of the dual code. Younger subjects must not be able to label pictures as easily or as automatically as older subjects, and if a label is not generated at study, then recalling a picture concept at test becomes very difficult.

Further evidence that the ability to label a picture affects its subsequent memory comes from Robertson and Köhler (2007). They tested children aged four to six years and discovered that the children’s ability to label a picture affected subsequent recognition. Those pictures that children could successfully name aloud during encoding were more likely to be remembered at test. Empirical work in our lab using famous faces supports this finding. Oates, Dutcher, Walsh, Xu, and Reder (in preparation) used pictures of famous and nonfamous people and discovered that famous people whom subjects identified as famous were better remembered when subjects could generate the name of the person depicted in the photograph.

In summary, when people find it easy to generate unique labels at encoding that they can reproduce at the time of memory retrieval, memory is better. We believe that this is because the label can be bound to context, enabling recollection as well as familiarity-based judgments.

If the Picture Label Is Not Discriminative, the Verbal Code Is Useless

Another result from the Reder et al. (2006) study, shown in Figure 22.1, is that the hit rate is lower for photographs than for words, while the false alarm rate is higher. The pattern of results is similar to that found for the word frequency mirror effect: High-frequency words have lower hit rates and higher false alarm rates than low-frequency words (Glanzer & Adams, 1985). We speculate that photographs are behaving like high-frequency (higher-familiarity) words. The concepts, or labels, of the photographs in the experiment were few and had been encountered over and over again in the study phase, making them highly familiar in the experimental context. In other words, subjects saw many different photographs, but they came from a small number of concept categories (e.g., pictures of landscapes divided into only a handful of subcategories, such as mountains, deserts, bodies of water, forests). Subjects could rely only on visual information to discriminate old from new items (“I saw many pictures of mountains, but did I see a picture of this particular mountain?”). The labels, if used, are not distinctive because they are used over and over again with many similar photographs. That is, the verbal part of the dual code was useless for discriminating old from new photographs.

Even if one ignores the problem that the verbal labels are useless in this situation, the high perceptual familiarity/similarity among stimuli also interferes with familiarity-based recognition. Familiarity-based responding is used in the absence of remembering contextual details. In the absence of contextual details, or episodic retrieval, a person can use the level of activation of a concept to help indicate whether an item was previously experienced. If the level of activation of the concept is high enough, then it can be used as an index to make a memory judgment. Given a high level of concept activation, a subject can determine that an item is sufficiently familiar and, therefore, must have been experienced recently, despite lacking details about the experience.
However, the familiarity-based process is more vulnerable to false alarms and is less accurate than using contextual cues to discriminate among items. The familiarity-based process does not help discriminate between an item that is highly familiar because it has been frequently experienced in the past and an item that is highly familiar because it has been recently experienced.

In a clever experiment, Chandler (1994) demonstrated that familiarity with similar pictures hurts accuracy while increasing confidence. She showed subjects segmented portions of unique nature pictures (e.g., “lake thawing,” “sand dunes”). Each picture was divided into three dimensionally equal portions (A, B, and C) that were all visually similar, since they came from the same scenic photograph. During study presentation in the control condition, no picture segments were related (from the same picture). During study presentation in the experimental condition, (A) segments and related (B) segments were shown. At test, subjects had to discriminate the originally viewed segment (A) from the never viewed segment (C). When subjects in the experimental condition had seen the related counterpart (B), they were less accurate but more confident in their discrimination between (A) and (C). One interpretation of this result is that more associations to the same stimulus label (e.g., lake scene) negatively impacted the ability to discriminate. Alternatively, the memory trace “I saw a lake scene” was reinforced more when there was a second presentation of a lake scene. It seems likely that the specific perceptual details could be confused from one presentation to the next and generally create more interference.

Further evidence for high concept familiarity having a negative effect on picture memory comes from Goldstein and Chance (1970). They tested recognition memory for snowflakes, inkblots, and pictures of faces. They reported that memory was best for faces, followed by inkblots, with accuracy worst for snowflakes. Goldstein and Chance predicted that faces would be best remembered, since subjects have expertise in recognizing human faces. They expected memory for snowflakes to be worst because of configuration complexity (defined as the number of “corners” or “turns” contained in an item). Complexity may not be the only variable that affects retention. It is worth noting that the stimulus sets differed in their degree of homogeneity. That is, all pictures of snowflakes would be labeled “snowflake,” such that there would be no added value of the verbal code. According to Goldstein and Chance (1970), inkblots should be more memorable than snowflakes, which have more detail that must be encoded. To expand on that hypothesis, we propose that whereas the verbal label for snowflakes was useless, inkblots could be coded by similarity to real-world objects, much like the practice of labeling/interpreting inkblots for a Rorschach test. Likewise, even nonfamous faces could be labeled by gender, race, age, similarity to friends, and so on.

Comparing Picture Versus Word Recognition When the Picture Foils Share the Verbal Code

We reviewed evidence earlier (Reder et al., 2006) that when the picture label is generic, meaning that it is shared with many other targets and foils, performance is worse than for words. Now we consider the case in which the picture label is distinctive (i.e., not shared with other studied items), but the foils are other pictures that represent the same meaning or label. Will the perceptual richness of the picture, when combined with meaning, be better than the abstracted meaning of a word when the visual information is all that allows discrimination of a target from a foil? We hypothesize that the dual code is of little value when the only discriminative facet is the picture representation, even though there are no other stimuli encoded that share that meaning.

Work in our lab, previously unpublished, has examined memory for words versus pictures of comparable concepts (concepts were randomly assigned to be shown as a picture vs. a word for each subject). We wanted to give words a “fighting chance,” so each word was presented in a unique font, and that font was used to present the word at test as well. Our goal was to make words more visually distinctive and to reduce the benefit of pictures by diminishing the benefit of the conceptual aspect of the dual code for pictures. Some of our prior work has demonstrated that visual distinctiveness modulates memory for words. Arndt and Reder (2003) and Park, Arndt, and Reder (2006) showed that the number of words that share a font inversely affects memorability. That is, as the number of words that share a font increases, the distinctiveness of the font decreases, and so does the memorability of those particular words. Words that are presented in reinstated, relatively unique fonts are recognized more often than words that are reinstated in unusual fonts shown with other words in the experiment. Therefore, we expected that presenting words in unique fonts and reinstating those fonts at test would boost recognition memory for words, while using pictures whose foils share the same meaning would reduce memory for those pictures.

Experiment 1

Subjects were presented with a list of 60 pictures and 60 words.
Each item was displayed for two seconds, and the orienting task involved subjects indicating: “Typically, how often would you encounter the concept depicted by this item?” The response choices were either “very rarely,” “rarely,” “frequently,” or “every day.” After the encoding task and a short break, subjects were given a surprise recognition memory test. They were told that some of the items would be old and that some would be new, and were explicitly instructed to respond “old” to a picture only if it was the exact picture that they had seen previously. Test stimuli consisted of 30 old pictures, 30 old words, 30 new pictures, and 30 new words. The lure words shared the same unique font shown with the old words, but the meaning was changed. Studied words were shown in the same unique font seen earlier. No font was shown twice. In other words, foil pictures had the identical concept but a different image, while foil words had the identical font but a different concept.


As predicted, d' did not differ reliably between item types: Words were as memorable as pictures, F(1,15) = .54, p > .05. Subjects had reliably higher hit rates for pictures than for words, F(1,15) = 14.4, p < .05, but also reliably higher false alarm rates, F(1,15) = 33.2, p < .01. This is not surprising, because subjects were most likely responding “old” to the concept, not the visual information, despite the explicit instruction to judge based only on visual information. The higher false alarm rate indicates that subjects were not always able to remember the exact visual information that corresponded with a label. By relying on the familiarity of the concept (“I know I saw a picture of a house”), subjects were depending on the label to make a judgment when they should have been judging the visual content only.

Experiment 2

Experiment 2 attempted to address the problem of task switching between words and pictures with different rules by creating separate blocks for the word and picture memory tests. We used an ABBA design in which A was a test for words and B a test for pictures for half of the subjects, and the reverse for the other half. Other than that, the experiment was the same as Experiment 1 (see Table 22.1 for a summary of the data from both experiments). In the first experiment, picture memory performance might have been attenuated by switching tasks between words and pictures. It might have been too difficult for subjects to decide whether they had seen an exact picture (foils shared the same concept as targets), since previously viewed words were always presented in a reinstated font and could be discriminated by concept alone. Using the blocked design, the hit rate for pictures increased slightly while the false alarm rate decreased: 95.2% hit rate, up 2.1 percentage points; 7.7% false alarm rate, down 15.2 percentage points. However, the same pattern occurred for words: 90.4% hit rate, up 14.4 percentage points; 3.9% false alarm rate, down .7 percentage points. Again, d' was not reliably different between the two modalities, F(1,19) < 1.0.
Table 22.1 Results From Experiments 1 and 2: Mean Proportion of Hits, False Alarms, and d' as a Function of Item

                              Mean Proportion Response
Experiment      Item Type     Hits      False Alarms      d'
Experiment 1    Pictures      0.93      0.23              2.46
Experiment 1    Words         0.76      0.05              2.67
Experiment 2    Pictures      0.95      0.08              3.30
Experiment 2    Words         0.90      0.04              3.44
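The relation between the hit rates, false alarm rates, and d' in Table 22.1 can be sketched with the standard equal-variance signal detection formula, d' = z(hits) - z(false alarms). The following is a minimal Python sketch under stated assumptions: the function and variable names are hypothetical, and the chapter does not report how d' was computed. Because the inputs are the rounded group means from the table, the results need not match the tabled d' values, which may have been computed for each subject before averaging (an assumption, not something the chapter states).

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Equal-variance signal detection sensitivity: z(hits) - z(false alarms)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)


# Rounded mean proportions (hits, false alarms) as listed in Table 22.1.
# Assigning the rows to experiments in this order is an assumption.
table_22_1 = {
    ("Experiment 1", "Pictures"): (0.93, 0.23),
    ("Experiment 1", "Words"): (0.76, 0.05),
    ("Experiment 2", "Pictures"): (0.95, 0.08),
    ("Experiment 2", "Words"): (0.90, 0.04),
}

# d' recomputed from group means; illustrative only.
aggregate_d_prime = {
    key: d_prime(hits, false_alarms)
    for key, (hits, false_alarms) in table_22_1.items()
}

for (experiment, item_type), value in aggregate_d_prime.items():
    print(f"{experiment}, {item_type}: d' from group means = {value:.2f}")
```

Rates of exactly 0 or 1 would need a correction before this function could be applied, because the inverse normal CDF is undefined at those values.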


It is noteworthy that Reder and Thornton’s findings in Experiment 2 are similar to those reported by Brady et al. (2008). Recall that Brady et al. showed subjects 2,500 pictures and then administered a two-alternative forced-choice test. In their task, like ours, memory depended on the visual code alone for pictures because lures shared concepts with targets. The recognition test compared three different types of foils that could be paired with the target. Directly related to our experimental manipulation is the exemplar condition, which required a discrimination between an old item and a foil that shared the same concept (e.g., an old picture of bread paired with a new picture of bread). In Brady et al., subjects’ accuracy was 87.6% in the exemplar condition. When we apply the same correction, subtracting false alarm rates from hit rates, our subjects performed at essentially the same corrected accuracy (87.5%). The results of Reder and Thornton’s two experiments underscore our claim that when the picture label is not discriminative, the verbal code is useless. Similar picture foils (same label) at test forced subjects to rely on visual information to make old/new judgments. This made pictures more comparable to words in that only one code, in this case the image, could be used to discriminate targets from foils. In most recognition experiments involving words only, the verbal code is available for discriminating targets from foils. In summary, these two experiments demonstrate a constraint on dual-code theory: If the label of the picture is not discriminative, pictures will be no better remembered than words.
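The corrected accuracy figure cited above is simple arithmetic: the false alarm rate is subtracted from the hit rate. A minimal Python sketch, using the Experiment 2 percentages reported in the text (the function name is a hypothetical label for illustration, not the authors’ analysis code):

```python
def corrected_accuracy(hit_rate_pct: float, false_alarm_rate_pct: float) -> float:
    """Hit rate minus false alarm rate, both expressed in percent."""
    return hit_rate_pct - false_alarm_rate_pct


# Experiment 2 values reported in the text, in percent.
pictures = corrected_accuracy(95.2, 7.7)
words = corrected_accuracy(90.4, 3.9)

print(f"Pictures: {pictures:.1f}%")  # the 87.5% figure cited in the text
print(f"Words: {words:.1f}%")
```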

Conclusions

In this chapter we have illustrated that the picture superiority effect in recognition memory, as explained by Paivio’s dual-code theory, does not always hold. The failure to observe the PSE can also be understood within Paivio’s dual-code theory by postulating the variables that affect (1) when people are able to generate a second code, and (2) when that code will or will not be helpful. Other researchers have also demonstrated poor memory performance for pictures (Amrhein, McDaniel, & Waddill, 2002; Weldon & Roediger, 1987), although their interpretations are different from ours. Our elaborations of Paivio’s dual-code theory are as follows: The picture must afford a meaningful label that discriminates it from other test probes. When those labels are shared with other test items, a dual code does not offer a memory advantage.

To summarize, the PSE occurs when items are presented as distinctive, easy-to-label pictures, and the foils do not share the same labels (Brady et al., 2008; Standing, 1973). However, when a picture’s verbal code is shared with other pictures, the conceptual fan effect makes retrieval of the encoding episode difficult and makes spurious recognition based on the concept’s familiarity likely. Remembering that you saw a picture of a cat is not going to help if you must discriminate between pictures of the previously shown cat and a different cat at test. Subjects must rely only on the image code to determine whether an item is old or new. When the visual stimulus is difficult to identify, that is, when generation of a consistent label is not easy or possible (“abstract picture” will not suffice if there are many such stimuli), the picture is not even as memorable as a single word.

Acknowledgments

Preparation of this chapter was supported by grant R01-MH52808 from the National Institute of Mental Health to the second author. The authors thank Edward Thornton for help on Experiments 1 and 2, and Christopher Paynter and Anna Manelis for comments on the manuscript.

Endnote

1. In casual conversation we found that a number of internationally recognized memory experts were vulnerable to the erroneous expectation that any picture should be better recognized than any word. That is, they were surprised when presented with data that were inconsistent with the picture superiority effect.

References

Ally, B. A., & Budson, A. E. (2007). The worth of pictures: Using high density event-related potentials to understand the memorial power of pictures and the dynamics of recognition memory. Neuroimage, 35, 378–395.
Amrhein, P. C., McDaniel, M. A., & Waddill, P. J. (2002). Revisiting the picture superiority effect in symbolic comparisons: Do pictures provide privileged access? Journal of Experimental Psychology: Learning, Memory, and Cognition, 28, 843–857.
Anderson, J. R. (2009). Cognitive psychology and its implications (7th ed.). New York: Worth Publishers.
Arndt, J., & Reder, L. M. (2003). The effect of distinctive visual information on false recognition. Journal of Memory and Language, 48, 1–15.
Brady, T. F., Konkle, T., Alvarez, G. A., & Oliva, A. (2008). Visual long-term memory has a massive storage capacity for object details. Proceedings of the National Academy of Sciences, 105(38), 14325–14329.
Chandler, C. C. (1994). Studying related pictures can reduce accuracy, but increase confidence, in a modified recognition test. Memory and Cognition, 22, 273–280.
Ducharme, R., & Fraisse, P. (1965). Etude genetique de la memorisation de mots et d’images. Canadian Journal of Psychology, 19, 253–261.
Fraisse, P. (1968). Motor and verbal reaction times to words and drawings. Psychonomic Science, 12, 235–236.
Glanzer, M., & Adams, J. K. (1985). The mirror effect in recognition memory. Memory and Cognition, 13, 8–20.
Goldstein, A. G., & Chance, J. E. (1970). Visual recognition memory for complex configurations. Perception and Psychophysics, 9, 237–241.
Jenkins, J. R., Deno, S. L., & Stack, W. B. (1969). Children’s recognition and recall of picture and word stimuli. AV Communications Review, 17, 265–271.
Kirkpatrick, E. A. (1894). An experimental study of memory. Psychological Review, 1, 602–609.
Kroll, J. F., & Potter, M. C. (1984). Recognizing words, pictures, and concepts: A comparison of lexical, object, and reality decisions. Journal of Verbal Learning and Verbal Behavior, 23, 39–66.
Madigan, S. (1974). Representational storage in memory. Bulletin of the Psychonomic Society, 4, 567–568.
Madigan, S. (1983). Picture memory. In J. C. Yuille (Ed.), Imagery, memory, and cognition: Essays in honor of Allan Paivio (pp. 65–89). Hillsdale, NJ: Erlbaum.
Mayer, R. E. (1984). Aids to text comprehension. Educational Psychology, 19, 30–42.
Mayer, R. E. (1989). Models for understanding. Review of Educational Research, 59, 43–64.
Mayer, R. E. (1993). Illustrations that instruct. In R. Glaser (Ed.), Advances in instructional psychology (Vol. 5, pp.). Hillsdale, NJ: Erlbaum.
Mayer, R. E., Steinhoff, K., Bower, G., & Mars, R. (1995). A generative theory of textbook design: Using annotated illustrations to foster meaningful learning of science text. Educational Technology Research and Development, 43(1), 31–41.
Mintzer, M. Z., & Snodgrass, J. G. (1999). The picture superiority effect: Support for the distinctiveness model. American Journal of Psychology, 112, 113–146.
Morton, J. (1969). Interaction of information in word recognition. Psychological Review, 76, 165–178.
Nelson, D. L., Reed, U. S., & Walling, J. R. (1976). Pictorial superiority effect. Journal of Experimental Psychology: Human Learning and Memory, 2, 523–528.
Nickerson, R. S. (1965). Short-term memory for complex meaningful visual configurations, a demonstration of capacity. Canadian Journal of Psychology, 19, 155–60.
Nickerson, R. S. (1968). A note on long-term recognition memory for pictorial material. Psychonomic Science, 11, 58.
Oates, J. M., Dutcher, J. M., Walsh, M. M., Xu, J., & Reder, L. M. (In preparation). Celebrity faces, early ERP fame detection, and memory: Ability to label predicts subsequent recognition.
Paivio, A. (1971). Imagery and verbal processes. New York: Holt, Rinehart, and Winston.
Paivio, A. (1986). Mental representations: A dual coding approach. Oxford, UK: Oxford University Press.
Paivio, A., & Csapo, K. (1973). Picture superiority in free recall: Imagery or dual coding? Cognitive Psychology, 5(2), 176–206.
Paivio, A., Rogers, T. B., & Smythe, P. C. (1968). Why are pictures easier to recall than words? Psychonomic Science, 11, 137–138.
Paivio, A. D. (1991). Images in the mind, the evolution of a theory. New York: Harvester Wheatsheaf.
Park, H., Arndt, J. D., & Reder, L. M. (2006). A contextual interference account of distinctiveness effects in recognition. Memory and Cognition, 34(4), 743–751.
Park, H., Quinlan, J. J., Thornton, E. R., & Reder, L. M. (2004). The effect of midazolam on visual search: Implications for understanding amnesia. Proceedings of the National Academy of Sciences, 101(51), 17879–17883.
Reder, L. M., Oates, J. M., Thornton, E. R., Quinlan, J. J., Kaufer, A., & Sauer, J. (2006). Drug induced amnesia hurts recognition, but only for memories that can be unitized. Psychological Science, 17(7), 562–567.
Reder, L. M., Park, H., & Kieffaber, P. D. (2009). Memory systems do not divide on consciousness: Reinterpreting memory in terms of activation and binding. Psychological Bulletin, 135(1), 23–49.
Robertson, E. K., & Köhler, S. (2007). Insights from child development on the relationship between episodic and semantic memory. Neuropsychologia, 45(14), 3178–3189.
Schacter, D. L., Israel, L., & Racine, C. (1999). Suppressing false recognition in younger and older adults: The distinctiveness heuristic. Journal of Memory and Language, 40(1), 1–24.
Shepard, R. (1967). Recognition memory for words, sentences and pictures. Journal of Verbal Learning and Verbal Behavior, 6, 156–163.
Standing, L. (1973). Learning 10,000 pictures. Quarterly Journal of Experimental Psychology, 25, 207–222.
Weldon, M. S., & Roediger, H. L. (1987). Altering retrieval demands reverses the picture superiority effect. Memory and Cognition, 15(4), 269–280.
Whitehouse, A. J. O., Maybery, M. T., & Durkin, K. (2006). The development of the picture-superiority effect. British Journal of Developmental Psychology, 24, 767–773.


23

Administration of Dehydroepiandrosterone (DHEA) Increases Serum Levels of Androgens and Estrogens But Does Not Enhance Recognition Memory in Postmenopausal Women

Bethany Stangl, Elliot Hirshman, and Joseph Verbalis

Bob Bjork has established a research tradition that brings together the highest level of theoretical and methodological rigor with a focus on topics of great practical importance. He and his many collaborators, both students and colleagues, have made critical discoveries that have advanced our understanding of learning and memory and their role in real-world situations. We know that the current chapter has benefited both directly and indirectly from Bob’s teaching, mentoring, and expertise, and we hope that, in some small way, it contributes to the tradition that Bob has established.

Understanding the influence of sex steroids on cognition has emerged as a wide-ranging field of inquiry. The current chapter examines the effect of dehydroepiandrosterone (DHEA) on recognition memory. DHEA is an adrenal steroid, synthesized from pregnenolone, that can be metabolized to a sulfated version, DHEAS, as well as to androgens such as testosterone (Regelson, Loria, & Kalimi, 1994). These androgens, in turn, can be metabolized to estrogens (Mortola & Yen, 1990). DHEA has received significant scientific and popular attention because plasma concentrations of DHEA show significant decreases with age. These concentrations peak in the twenties or thirties, declining thereafter at a rate of 10 to 15% per decade. Plasma levels of DHEA are only 30% of peak levels by age 70 (Labrie, Bélanger, Cusan, Gomez, & Candas, 1997; Orentreich, Brind, Vogelman, Andres, & Baldwin, 1992). This relation with aging, coupled with the numerous impairments associated with aging, has motivated the suggestion that DHEA replacement therapy may be used to mitigate a range of age-related concerns.

There are multiple reasons to expect that administration of DHEA might enhance recognition memory. First, DHEA functions directly as a neurosteroid and as an antagonist of γ-aminobutyric acid (GABA) receptors, and Wigström and Gustafsson (1985) have demonstrated that GABA antagonists facilitate long-term potentiation in the hippocampus, a process hypothesized to be a mechanism of learning and memory. Second, DHEA is metabolized into estrogens that may affect episodic memory (see Hogervorst, Williams, Budge, Riedel, & Jolles, 2000; LeBlanc, Janowsky, Chan, & Nelson, 2001, for reviews). Third, numerous studies (e.g., Reddy & Kulkarni, 1998) have demonstrated that DHEA can enhance learning and memory in rodents. (Caution is merited in interpreting rodent performance given low levels of endogenous DHEA in this population.)

Despite this rationale, prior studies have produced inconsistent results regarding the effects of DHEA administration on learning and memory in humans. While Wolf et al. (1997) demonstrated that women who received 50 mg of oral DHEA showed enhanced picture recall, Wolf et al. (1998b) demonstrated that DHEA administration reduced picture recall when recall was tested following a stressful event.
In our own research (Hirshman et al., 2003), postmenopausal women who received 50 mg of oral DHEA daily for four weeks demonstrated enhanced recognition memory for words, but this increase only occurred when items were presented briefly during study (e.g., for 800 ms). This boundary condition suggests that the observed benefit demonstrated in our study may be due to effects of DHEA on perceptual processing, rather than to direct effects on learning and memory. Hirshman et al. (2004) examined the effect of DHEA administration on recognition memory in postmenopausal women at four different testing times (9 a.m., 10 a.m., noon, 4 p.m.) when items were presented for two seconds at study. There was no significant benefit of DHEA administration on recognition memory. Interestingly, regression analyses indicated that serum levels of estrogens were positively correlated with recognition memory performance, while androgen levels were negatively correlated with recognition memory performance. These findings raise significant questions regarding the view that DHEA

administration enhances learning and memory (see Kritz-Silverstein, von Mühlen, Laughlin, & Bettencourt, 2008, for an investigation of cognitive and affective functioning more generally). A primary purpose of Hirshman et al.’s (2004) study was to test participants at multiple times during the day to identify the time at which serum levels of hormones peaked with the current dose and administration regimen. This occurred at approximately 10 a.m., with serum levels maintaining this maximum value for at least two hours. In the current study, we tested multiple participants at approximately 10 a.m. to provide further evidence on whether administration of DHEA enhances recognition memory. In addition, we performed correlation analyses to determine the relation between serum levels of various hormones and recognition memory performance. As in our prior studies, we examined the effect of DHEA administration on recognition memory in postmenopausal women. We focused on this population because estrogen and testosterone deficiency associated with menopause (Morales, Nolan, Nelson, & Yen, 1994; Rubino et al., 1998), in combination with declining DHEA levels, may sensitize this population to the effects of DHEA. This situation contrasts to that in young women, where levels of these hormones are substantially higher. We use two different types of recognition memory tests, an inclusion test and an exclusion test (Jacoby, Toth, & Yonelinas, 1993), to examine the generality of our results. These tests are hypothesized to reflect different forms of recognition memory. Demonstrating different effects of DHEA administration on these tests would help identify the underlying processes affected by DHEA and its metabolites. In addition to measuring recognition performance, we also measured serum levels of DHEA, DHEAS, testosterone, estrone, and cortisol. The measurement of DHEA provides a check on our experimental manipulation. 
The measurement of DHEAS, testosterone, and estrone provides checks on DHEA’s metabolism into its sulfated version, androgens, and estrogens, respectively. Cortisol, like DHEA, is an adrenal product, and it provides a control measure that should not be affected by DHEA administration.

Experiment

Method

Participants

Forty-two women ages 55 to 80 participated in the current experiment. All participants were volunteers recruited by newspaper

advertisement and were paid $300.00 for their participation. Participants met the World Health Organization’s criterion for postmenopausal status of one year’s absence of menses or bilateral ovariectomy that preceded the study by at least one year. The participants were not using any form of hormone replacement therapy. Potential participants were excluded if a preenrollment medical evaluation revealed contraindications to DHEA, estrogen, or androgen treatment (i.e., personal history of, or active, breast cancer or other estrogen-dependent neoplasms, acute liver disease, undiagnosed vaginal bleeding, uncontrolled hypertension, deep venous thrombosis, pulmonary embolus, history of clotting disorders, history of psychiatric or cognitive disorders). Women whose preenrollment assays of DHEAS, estradiol, or testosterone were above the normal postmenopausal women’s range or whose body mass index (BMI) exceeded 35 were excluded. Use of substances that influence cognition (e.g., amphetamines, benzodiazepines, narcotics, nicotine, steroid hormones, steroid receptor antagonists) was also grounds for exclusion, as was a serious physical illness within the last year. All women enrolled had a normal mammogram within the prior year and a normal pap smear within the prior three years.

Design

Type of drug (DHEA vs. placebo) was manipulated within subject in a crossover design. Participants received DHEA for one 4-week period and placebo for the other 4-week period with a 1-week “washout” period between treatments. Order of treatments was counterbalanced across participants.

DHEA Doses and Placebos

Fifty milligrams of oral DHEA was administered daily. This dose was selected because it is safe in short-term clinical trials, can alter levels of estrone and testosterone, and can affect cognitive performance (Hirshman et al., 2003; Morales, Haubricht, Hwangt, Asakura, & Yen, 1998). DHEA was compounded by Belmar Pharmaceutical (Lakewood, Colorado), the source of DHEA for numerous clinical trials (e.g., Casson, Lindsay, Pisarksa, Carson, & Buster, 2000). Placebos consisted of lactose and were identical in appearance to DHEA capsules. The number of pills dispensed was greater than the number needed for purposes of retrospective verification of compliance.

Procedures

Participants who responded to newspaper advertisements underwent a five-minute telephone interview by the study coordinator. Individuals who met the entry criteria were set up for a preenrollment evaluation at the General Clinical Research Center at Georgetown University Hospital. The evaluation began with informed consent. If the

participant agreed to participate in the study, the medical evaluation was conducted. It began with a history, including review of illnesses, medications, and contraindications to steroid hormone administration. A physical exam was performed and blood was drawn for a complete blood count (CBC), CHEM 7 (including electrolytes, blood urea nitrogen, and creatinine), as well as baseline sex steroid assays (DHEA, DHEAS, total testosterone, estrone, cortisol, and estradiol). The Symptom Checklist 90-R (SCL 90-R; Derogatis & Savitz, 2000) and the Mini-Mental Status Exam (Turvey, Schultz, Arndt, Wallace, & Herzog, 2000) were administered to supplement information obtained by history. Participants were assigned to receive DHEA or placebo in a block randomization scheme. All investigators and participants were blind to treatment status. Upon receipt of the appropriate pillbox containing DHEA or placebo, participants were instructed to take one pill each morning with breakfast and to write any significant adverse events on a provided diary sheet. Cognitive testing occurred in 28 days, and the participant was given a reminder call on the day prior to testing. Participants were instructed to fast after midnight and refrain from ingesting any caffeine. On the test day, pills were counted as a measure of compliance. At 7:30 a.m. the participant was expected to arrive and was given a breakfast. At 8:00 a.m. the participant ingested either 50 mg of DHEA or the placebo. A blood draw was performed at 9:50 a.m. Cognitive testing occurred at approximately 10 a.m. on each testing day. We chose this specific time for the blood draw and cognitive testing because serum steroid levels are maximal at this time with the current drug administration regimen (Hirshman et al., 2004). The Beck Depression Inventory and the SCL90 tests were administered following cognitive testing. 
At the end of the first test day, participants were given appropriate pills, an appointment for the second testing session, and new diary sheets. Participants were instructed to begin taking the pills in one week, and a reminder call was made prior to the second testing day. Testing procedures were identical on the second test day, except participants were debriefed at the end of this day.

Recognition Memory Study and Test Procedures

Participants were presented with two study lists of 50 words each. Items presented were medium- to high-frequency words, and each item was presented for one second with no appreciable interstimulus interval. Prior to the presentation of the study lists, participants were instructed to study and remember both lists for a later memory test. Following the list, participants were to complete 10 difficult math problems during a five-minute

retention interval. At the conclusion of the retention interval, participants were presented with two test lists of 100 items each (25 items from the first study list, 25 items from the second study list, and 50 new distracters in each test list). One test list represented the inclusion task, and the other test list represented the exclusion task. The order of tasks (inclusion or exclusion) was counterbalanced across participants. In general, participants were asked to press “o” for an “old, studied” item and “n” for a “new” item. For the inclusion task, participants were asked to call items “old” if they had been presented in either study list. For the exclusion task, participants were asked to call items “old” only if they had been presented in the second study list. Items that were not studied or items that were presented in the first study list were to be called “new.”

Outcome Measures and Statistical Analyses

The primary outcome measures were recognition memory hit rates, false alarm rates, D-prime, and C for the inclusion and exclusion tasks and serum levels of DHEA, DHEAS, testosterone, estrone, and cortisol. The effect of drug condition (DHEA vs. placebo) was examined in a one-way ANOVA for each of the recognition memory and hormone variables. Ancillary analyses examined the correlation between the hormone and recognition memory variables in the placebo and DHEA conditions.

Results and Discussion

Demographic Data

Per our selection criteria, participants were aged (M = 63.62, SD = 7.06), performed well on the Mini-Mental Status Exam (M = 28.00, SD = 1.86), and did not have extreme body mass indices (M = 27.69, SD = 4.39). The Beck Depression Inventory (BDI) scores were not significantly different between the DHEA drug condition (M = 3.64, SD = 4.37) and the placebo drug condition (M = 4.59, SD = 5.48).
Scores on the SCL-90 also did not differ between the DHEA drug condition and the placebo drug condition on any of the nine symptom dimensions (somatization, obsessive compulsive, interpersonal sensitivity, anxiety, depression, hostility, phobic anxiety, paranoid ideation, psychoticism). Similarly, as indicated in Table 23.1, measured hormone levels were in the normal range for postmenopausal women.

Circulating Levels of Steroid Hormones

Table 23.1 presents circulating levels of DHEA, DHEAS, testosterone, estrone, and cortisol as a function of type of drug (DHEA vs. placebo). The central result is that DHEA administration substantially increased circulating levels of DHEA, DHEAS, testosterone, and estrone [F(1,41) = 65.13, p < .001; F(1,41) = 105.65, p < .001; F(1,41) = 31.48, p < .001; and F(1,41) = 24.03, p < .001, respectively]. Levels of cortisol did not differ significantly across the conditions (p > .50). The increase in serum levels of DHEA provides strong evidence of the efficacy of our manipulation. Similarly, the results with DHEAS, testosterone, and estrone provide evidence that DHEA was metabolized into these related hormones. In contrast, cortisol, another adrenal product, was not affected by DHEA administration.

Table 23.1 Circulating Levels of DHEA, DHEAS, Testosterone, Estrone, and Cortisol (Means With Standard Deviations in Parentheses)

Type of Drug   DHEA (ng/dl)      DHEAS (µg/dl)     Testosterone (ng/dl)   Estrone (pg/dl)   Cortisol (µg/dl)
DHEA           351.78 (196.62)   272.33 (137.90)   49.04 (41.74)          31.12 (14.43)     10.60 (3.74)
Placebo        91.99 (73.84)     46.02 (31.62)     14.93 (8.44)           18.96 (13.31)     10.25 (3.32)

Table 23.2 presents the correlations (Pearson r) between the hormones in the DHEA and placebo conditions. The comparison of the patterns of correlations across the DHEA and placebo conditions provides a secondary measure of the effects of DHEA administration. In the placebo condition, there are generally moderate, positive correlations among all of the hormones (with the exception of cortisol’s null correlations with estrone and DHEAS; all other p < .05). These moderate positive correlations may reflect general measures of health or organ functioning that differ across participants. In the DHEA condition, the correlations between DHEA and DHEAS and between DHEAS and testosterone are substantially greater than in the placebo condition. These increased correlations reflect the concurrent increase in DHEA and its direct metabolites (DHEAS, testosterone) due to DHEA administration (see Table 23.1). In contrast, the correlations between estrone and DHEA/DHEAS and between cortisol and DHEA/DHEAS are generally weaker in the DHEA condition than in the placebo condition. In these cases, the metabolism is less direct (e.g., DHEA metabolizes into androgens that, in turn, metabolize into estrogens) or the observed correlation in the placebo condition may arise from the general factors cited above (e.g., the correlation of DHEA and cortisol may be due to differences in general levels of adrenal functioning across participants).
In these circumstances, the substantial increases in DHEA/DHEAS in the DHEA condition are not accompanied by similar increases in estrone or cortisol (see Table 23.1), reducing the observed correlations among these variables in the DHEA condition.
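As a rough check on the significance markers in Table 23.2, a Pearson r can be converted to a t statistic with n − 2 degrees of freedom. The sketch below assumes that all 42 participants contributed to every correlation (the chapter does not state the n for each correlation) and uses the standard two-tailed .05 critical value of t for 40 degrees of freedom, 2.021. The function names are illustrative and are not taken from the chapter.

```python
import math

N_PARTICIPANTS = 42       # sample size reported in the Method section
T_CRITICAL_DF40 = 2.021   # two-tailed .05 critical t for df = 40 (standard table value)


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """t statistic (df = n - 2) for testing a correlation against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def is_significant(r, n=N_PARTICIPANTS, t_crit=T_CRITICAL_DF40):
    """True if |t| exceeds the assumed critical value."""
    return abs(t_from_r(r, n)) > t_crit


# A few coefficients taken from Table 23.2 (placebo condition).
for r in (0.39, 0.32, 0.26, -0.08):
    print(r, round(t_from_r(r, N_PARTICIPANTS), 2), is_significant(r))
```

Under these assumptions, correlations of roughly .30 or larger in absolute value reach p < .05, which is in line with the pattern of asterisks in the table.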

Table 23.2 Correlations (Pearson r) Between Serum Levels of Hormones as a Function of Type of Drug (DHEA vs. Placebo)

Type of Drug   Hormone                DHEA      DHEAS     Testosterone   Estrone   Cortisol
                                      (ng/dl)   (µg/dl)   (ng/dl)        (pg/dl)   (µg/dl)
DHEA           DHEA (ng/dl)           1         .74*      .62*           .01       .21
               DHEAS (µg/dl)                    1         .53*           .17       .14
               Testosterone (ng/dl)                       1              .34*      .08
               Estrone (pg/dl)                                           1         .04
               Cortisol (µg/dl)                                                    1
Placebo        DHEA (ng/dl)           1         .39*      .34*           .39*      .37*
               DHEAS (µg/dl)                    1         .49*           .57*      .26
               Testosterone (ng/dl)                       1              .36*      .32*
               Estrone (pg/dl)                                           1         –.08
               Cortisol (µg/dl)                                                    1

*p < .05.

Recognition Memory Performance

Table 23.3 presents recognition memory hits, false alarms, D-prime, and C for the inclusion and exclusion tasks as a function of type of drug (DHEA vs. placebo). DHEA administration had no detectable effect on recognition memory hits, false alarms, D-prime, or C in either the inclusion or exclusion tasks (all p > .20). Thus, our central finding is that despite substantially increasing levels of a range of sex steroids, there was no detectable impact of DHEA administration on four measures of recognition memory across two different tasks. This finding is consistent with our prior findings (Hirshman et al., 2003, 2004), raising significant questions regarding whether DHEA administration enhances verbal learning and memory in postmenopausal women.

Table 23.3 Recognition Memory Hit Rates, False Alarm Rates, D-Prime, and C for the Inclusion and Exclusion Tasks as a Function of Type of Drug (DHEA vs. Placebo; Means With Standard Deviations in Parentheses)

Task        Type of Drug   Hits        False Alarms   D-Prime      C
Inclusion   DHEA           .77 (.22)   .39 (.19)      1.30 (.53)   –.30 (.69)
Inclusion   Placebo        .76 (.19)   .38 (.19)      1.20 (.49)   –.25 (.61)
Exclusion   DHEA           .53 (.20)   .27 (.18)      .86 (.6)     –.12 (.55)
Exclusion   Placebo        .52 (.24)   .30 (.20)      .71 (.82)    –.13 (.77)

Correlations Between Hormone Levels and Recognition Memory Measures

Table 23.4 presents the correlations (Pearson r) between participants’ recognition memory performance (D-prime) and serum hormone levels in the placebo and DHEA conditions for the inclusion and exclusion tasks. The results demonstrate moderate-sized positive correlations in the placebo condition between estrone and inclusion performance, between DHEA and inclusion performance, and between DHEAS and inclusion performance. These correlations did not occur in the exclusion task, nor did they occur on the inclusion or exclusion task in the DHEA condition. To interpret the positive correlations on the inclusion task in the placebo condition, we carried out a stepwise regression using estrone, DHEA, and DHEAS as potential predictors of inclusion D-prime. The results of this analysis indicated that estrone was a significant positive predictor of inclusion D-prime [F(1,40) = 7.07, p < .05], but DHEA and DHEAS did not independently predict inclusion performance.

Table 23.4 Correlations (Pearson r) Between Serum Levels of Hormones and Recognition Memory

Type of Drug   Task                DHEA      DHEAS     Testosterone   Estrone   Cortisol
                                   (ng/dl)   (µg/dl)   (ng/dl)        (pg/dl)   (µg/dl)
DHEA           D-prime inclusion   .18       .09       .05            .08       .12
DHEA           D-prime exclusion   .25       .12       .05            .12       .14
Placebo        D-prime inclusion   .38*      .36*      .24            .39*      .03
Placebo        D-prime exclusion   .05       .11       .19            –.17      .16

*p < .05.

The correlational results clarify the current results in important ways. First, the demonstration of a correlation between hormone levels and D-prime on the inclusion task, but not the exclusion task, is consistent with the view that these tasks reflect different cognitive processes. Given these differences, the failure to find beneficial effects of DHEA administration on both tasks reinforces questions regarding whether DHEA administration can enhance recognition memory. Second, the correlation between estrone and inclusion performance in the placebo condition is specific in two ways. First, estrone is the only significant predictor of inclusion performance in the placebo condition in the stepwise regression. Second, there is no relation between cortisol, an adrenal control hormone, and inclusion performance in the placebo condition even though cortisol is positively correlated with other hormones in this condition. This specificity is consistent with the general view that estrogens are associated with enhanced episodic memory. The positive relationship between estrone and inclusion performance appears inconsistent with the failure of DHEA administration (which increases estrone) to enhance inclusion performance. These contrasting findings may arise because (1) the positive effects of estrogens on memory performance only arise in the lower end of the range of values (i.e., the placebo condition); (2) DHEA’s metabolism into estrogens does not produce sufficiently large changes in estrogens to alter memory levels (see relative changes in estrone vs. androgens in Table 23.1); or (3) any positive effect of estrogens is counteracted by negative effects of other hormones that are altered by DHEA administration (Hirshman et al., 2004).
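For readers unfamiliar with the signal-detection measures reported in Table 23.3, the following minimal sketch shows the conventional definitions of D-prime and C, together with the response rule that the inclusion and exclusion instructions imply. The chapter does not state which formulas or corrections were used, nor which items counted as false alarms in the exclusion task, so the function names, the string labels, and the absence of any correction for rates of exactly 0 or 1 are assumptions made here for illustration.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the z transform


def d_prime(hit_rate, false_alarm_rate):
    """Conventional sensitivity index: z(hits) - z(false alarms)."""
    return _STD_NORMAL.inv_cdf(hit_rate) - _STD_NORMAL.inv_cdf(false_alarm_rate)


def criterion_c(hit_rate, false_alarm_rate):
    """Conventional bias index: negative values mean a liberal 'old' bias."""
    z_hit = _STD_NORMAL.inv_cdf(hit_rate)
    z_fa = _STD_NORMAL.inv_cdf(false_alarm_rate)
    return -0.5 * (z_hit + z_fa)


def correct_response(task, source):
    """Correct answer for an item, given the task instructions in the text.

    task:   'inclusion' or 'exclusion' (labels assumed for this sketch)
    source: 'list1', 'list2', or 'new'
    """
    if task == "inclusion":
        return "old" if source in ("list1", "list2") else "new"
    if task == "exclusion":
        return "old" if source == "list2" else "new"
    raise ValueError(f"unknown task: {task}")


# Rates of exactly 0 or 1 make inv_cdf raise an error, so real data would
# need some correction before this step (not specified in the chapter).
print(round(d_prime(0.77, 0.39), 2), round(criterion_c(0.77, 0.39), 2))
print(correct_response("exclusion", "list1"))
```

Applying these formulas to group-mean hit and false alarm rates need not reproduce the tabled D-prime and C values, because the mean of per-participant scores generally differs from a score computed on mean rates.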

General Discussion

The current results, coupled with a range of prior results from our laboratory and other laboratories, raise significant questions regarding whether DHEA administration can enhance verbal learning and memory. In this section, we briefly consider several features of the current design (period of administration, drug dosage, time of testing, materials used, cognitive tasks) and how they impact the generality of this conclusion. The administration of DHEA for four weeks represents an important feature of the current design. Given this length of administration, it is unlikely that an insufficient period of administration produced the current results. Similarly, the use of a 50 mg dose produces substantial increases in serum levels of hormones (see Table 23.1). One can, however, argue that the 50 mg dose is not sufficiently high to produce effects on learning and memory. For example, one can argue that a higher dose of DHEA is necessary to produce sufficient increases in estrogens to affect memory due to the indirect metabolism of DHEA into estrogens. In this context, it is important to note two important

practical considerations. First, administering substantially higher doses of DHEA may produce significant side effects (e.g., mania), negating the practical usefulness of such dosing regimens. Second, and particularly germane to the above example, if the positive effect of DHEA administration is due solely to its metabolism into estrogens, direct administration of estrogens may be a more appropriate therapeutic regimen. One can also raise questions about whether our results would differ if cognitive testing occurred at a later time of day. Hirshman et al. (2004) did not see any such effects when cognitive testing occurred between 10 a.m. and 4 p.m. Moreover, even if time of administration were a critical variable, the restriction of beneficial effects to a very limited time period, coupled with DHEA’s rapid metabolism, would substantially limit the practical benefits of DHEA administration. The use of alternative tasks and materials represents the most promising avenue for detecting beneficial cognitive effects of DHEA administration. The strong androgenic effects of DHEA administration (see Table 23.1 results on DHEA, DHEAS, testosterone), coupled with the potential effects of androgens on visual-spatial performance (e.g., Wharton et al., 2008), suggest that DHEA administration may enhance visual-spatial performance, even if it does not affect verbal memory. This hypothesis is currently being examined.

References

Casson, P., Lindsay, M., Pisarksa, M., Carson, S., & Buster, J. (2000). Dehydroepiandrosterone supplementation augments ovarian stimulation in poor responders: A case series. Human Reproduction, 15, 2129–2132.
Derogatis, L. R., & Savitz, K. L. (2000). The SCL-90-R and Brief Symptom Inventory (BSI) in primary care. In M. E. Maruish (Ed.), Handbook of psychological assessment in primary care settings (pp. 297–334). Mahwah, NJ: Lawrence Erlbaum Associates.
Hirshman, E., Merritt, P., Wang, C. C. L., Wierman, M., Budescu, D. V., Kohrt, W., et al. (2004). Evidence that androgenic and estrogenic metabolites contribute to the effects of dehydroepiandrosterone on cognition in postmenopausal women. Hormones and Behavior, 45(2), 144–155.
Hirshman, E., Wells, E., Wierman, M., Anderson, B., Butler, A., Senholzi, M., & Fisher, J. (2003). The effect of dehydroepiandrosterone (DHEA) on recognition memory decision processes and discrimination in postmenopausal women. Psychonomic Bulletin and Review, 10(1), 125–134.
Hogervorst, E., Williams, J., Budge, M., Riedel, W., & Jolles, J. (2000). The nature of the effect of female gonadal hormone replacement therapy on cognitive function in post-menopausal women. Neuroscience, 101, 485–512.
Jacoby, L. L., Toth, J. P., & Yonelinas, A. P. (1993). Separating conscious and unconscious influences of memory: Measuring recollection. Journal of Experimental Psychology: General, 122, 139–154.
Kritz-Silverstein, D., von Mühlen, D., Laughlin, G. A., & Bettencourt, R. (2008). Effects of dehydroepiandrosterone supplementation on cognitive function and quality of life: The DHEA and Well-Ness (DAWN) Trial. Journal of the American Geriatrics Society, 56(7), 1292–1298.
Labrie, F., Bélanger, A., Cusan, L., Gomez, J.-L., & Candas, B. (1997). Marked decline in serum concentrations of adrenal C19 sex steroid precursors and conjugated androgen metabolites during aging. Journal of Clinical Endocrinology and Metabolism, 82, 2396–2402.
LeBlanc, E., Janowsky, J., Chan, B., & Nelson, H. (2001). Hormone replacement therapy in cognition: Systematic review and meta-analysis. JAMA: Journal of the American Medical Association, 285(11), 1489–1499.
Morales, A., Nolan, J., Nelson, J., & Yen, S. (1994). Effects of replacement dose of dehydroepiandrosterone in men and women of advancing age. Journal of Clinical Endocrinology and Metabolism, 78, 1360–1367.
Morales, A. J., Haubricht, R. H., Hwangt, J. Y., Asakura, H., & Yen, S. S. C. (1998). The effect of six months treatment with a 100 mg daily dose of dehydroepiandrosterone (DHEA) on circulating sex steroids, body composition and muscle strength in age-advanced men and women. Clinical Endocrinology, 49, 421–432.
Mortola, J. F., & Yen, S. S. C. (1990). The effects of oral dehydroepiandrosterone on endocrine-metabolic parameters in postmenopausal women. Journal of Clinical Endocrinology and Metabolism, 71, 57–62.
Orentreich, N., Brind, J. L., Vogelman, J. H., Andres, R., & Baldwin, H. (1992). Long-term longitudinal measurements of plasma dehydroepiandrosterone sulfate in normal men. Journal of Clinical Endocrinology and Metabolism, 75, 1002–1004.
Reddy, D. S., & Kulkarni, S. K. (1998). The effects of neurosteroids on acquisition and retention of a modified passive-avoidance learning task in mice. Brain Research, 791, 108–116.
Regelson, W., Loria, R., & Kalimi, L. R. (1994). Dehydroepiandrosterone (DHEA)—The “mother steroid.” I. Immunologic action. Annals of the New York Academy of Sciences, 719, 533–563.
Rubino, S., Stomati, M., Bersi, C., Casarosa, E., Luisi, M., Petraglia, F., & Genazzani, A. (1998). Neuroendocrine effect of a short-term treatment with DHEA in post-menopausal women. Maturitas, 28, 251–257.
Turvey, C., Schultz, S., Arndt, S., Wallace, R., & Herzog, R. (2000). Memory complaints in a community sample aged 70 and older. Journal of the American Geriatrics Society, 48, 1435–1441.
Wharton, W., Hirshman, E., Merritt, P., Doyle, L., Paris, S., & Gleason, C. E. (2008). Oral contraceptives and androgenicity: Influences on visuospatial task performance in younger individuals. Experimental and Clinical Psychopharmacology, 16, 156–164.
Wigstrom, H., & Gustaffson, B. (1985). On long-lasting potentiation in the hippocampus: A proposed mechanism for its dependence on coincident pre- and postsynaptic activity. Acta Physiologica Scandinavica, 123(4), 519–522.
Wolf, O. T., Kudielka, B. M., Hellhammer, D. H., Hellhammer, J., & Kirschbaum, C. (1998b). Opposing effects of DHEA replacement in elderly subjects on declarative memory and attention after exposure to a laboratory stressor. Psychoneuroendocrinology, 23, 617–629.
Wolf, O. T., Naumann, O., Hellhammer, D., Geiben, A. C., Strasburger, C. J., Dressendorfer, R. A., et al. (1997). Effects of a two-week physiological dehydroepiandrosterone substitution on cognitive performance and well being in healthy elderly women and men. Journal of Clinical Endocrinology and Metabolism, 82(7), 2363–2367.
Wolf, O. T., Naumann, O., Hellhammer, D., & Kirschbaum, C. (1998a). Effects of dehydroepiandrosterone replacement in elderly men on event-related potential, memory and well being. Journal of Gerontology: Medical Sciences, 53A, M385–M390.

24

On the Fruitful Relationship Between Functional Neuroimaging and Cognitive Theories of Human Learning and Memory

Alan Richardson-Klavehn

Bob Bjork’s overarching view of human learning and memory as dynamic and constructive capabilities adapted to constraints of the brain “wetware” (notably the view of forgetting as a dynamic consequence of memory updating captured by the title of this volume) will enduringly influence our science. That view, which implies going beyond the functionalist computer metaphor (e.g., Richardson-Klavehn & Bjork, 2002), fits well with the view that contributions from neurobiology are essential to progressing cognitive theories (e.g., Buzsáki, 2006; Henson, 2005; Moscovitch, 2008; Richardson-Klavehn et al., 2009; Rugg, 2009). Here I hope to present some telling examples of such contributions from functional neuroimaging. I define neuroimaging to encompass both hemodynamic (e.g., functional magnetic resonance imaging, fMRI) and electrophysiological methods (e.g., electroencephalography, EEG, and magnetoencephalography, MEG), given advances in the spatial resolution of the latter methods, and I also hope to illustrate the benefits of relating these approaches. The chapter is organized into two main parts, the first addressing research on encoding processes, extending interests that I developed while a graduate student working with Bob (Richardson-Klavehn & Bjork, 1988), and the second addressing research on retrieval processes, in particular research on retrieval inhibition, a topic close to Bob’s heart (e.g., Bjork, 1989, 2007).

Prestimulus Neural Oscillations at Encoding and Their Hemodynamic Correlates

Encoding for Later Conscious Episodic Recollection

Processing views of memory postulate that memory traces are a byproduct of the processing a stimulus receives at encoding (e.g., Bransford et al., 1979; Craik, 2002; Crowder, 1993; Kolers & Roediger, 1984; Lockhart, 2002; Roediger, Gallo, & Geraci, 2002). In short, encoding equals processing. Such views postulate dynamic interactions between encoding and retrieval processes, such that successful retrieval occurs when retrieval cues cause the processing that occurred at encoding to be recapitulated. They also emphasize the close interrelationship of retrieval and encoding, in the sense that the processing that occurs at encoding is itself an act of retrieval (e.g., stimuli are processed in relation to information already in memory), and that retrieval, in that it can increase the future accessibility of the retrieved information, is itself an act of encoding (e.g., Bjork, 1975; Landauer & Bjork, 1978; Richardson-Klavehn & Bjork, 2002). Such views were, therefore, central to progressing beyond the computer-inspired functionalist approach to memory (Craik, 2002; Richardson-Klavehn & Bjork, 2002; Richardson-Klavehn et al., 2009). Processing views account well for a vast range of data showing interactions between encoding and retrieval conditions in healthy participants across a variety of types of memory test (for reviews see, e.g., Brown & Craik, 2000; Richardson-Klavehn & Bjork, 1988; Roediger et al., 2002; Roediger & McDermott, 1993). Additionally, neuroimaging shows that the same informationally content-specific brain areas involved in processing at encoding can be reactivated during retrieval (e.g., Fenker et al., 2005; J. D. Johnson & Rugg, 2007; for reviews, see Khader & Rösler, 2009; Nyberg, 2002; Rugg et al., 2008).
The processing perspective, however, has difficulty with data from patients suffering from amnesia owing to damage to medial temporal lobe and related limbic system structures in the brain (e.g., Cipolotti & Bird, 2006; Mayes, 2000; Squire, 2004; Squire, Stark, & Clark, 2004). These patients appear to process information in the same way as healthy participants, but are simply impaired in establishing new memory traces supportive of later conscious episodic recollection (e.g., Mayes, 2000). These data suggest that encoding is not just processing, but that an additional encoding step is necessary for episodic memory formation that depends on the brain areas damaged in amnesia (e.g., Craik, 2002; Moscovitch, 1992, 2000, 2008; Tulving, 2001). Furthermore, this step may be “cognitively silent” (Craik, 2002, p. 315), or informationally

encapsulated and modular (Moscovitch, 1992, 2000, 2008), rendering it difficult to investigate further with behavioral and neuropsychological methods, despite its critical functional significance. Guderian et al. (2009) addressed this issue using a simple free-recall paradigm in which participants studied a series of 20-word lists, with words in each list being presented at a 2,750 ms rate, and with recall after each list being delayed by a 20 s distractor activity that removed recency effects. During study, words were processed either deeply (pleasantness judgments: semantic processing) or shallowly (syllable counting: phonemic processing), with this manipulation being between list but within subject. Whole-head MEG measurements (148 electromagnetic sensors) were taken during study of each list and analyzed in terms of their frequency (oscillatory) content during an epoch from 800 ms before study word presentation to 2,000 ms thereafter, at the level of single trials and participants, before averaging data across trials for each participant at each time point in the epoch (sampling rate, 508 Hz) for each of the four combinations of level of processing (deep vs. shallow) and memory fate (later recalled vs. later forgotten). The data, pooled across participants, revealed a striking difference in oscillatory activity in the 3 to 8 Hz theta range (most marked at ~7 Hz) between words later recalled and later forgotten, with higher theta amplitudes for later recalled words already occurring from 200 ms before stimulus onset at left anterior temporal sensors, and lower theta amplitudes for later recalled words between 600 and 900 ms in the poststimulus epoch at right posterior occipitotemporal sensors. Later recall success was related to prestimulus theta amplitude in an approximately linear way across the 24 participants as a group, with the slope relating recall success to theta amplitude being greater than zero in every individual participant. 
Reaction times at study, by contrast, were not related to prestimulus theta amplitudes, rendering explanations in terms of generalized arousal or other nonspecific factors unlikely. Recall did not exhibit significant serial position effects, owing to the incidental learning tasks and the distractor activity before recall, and prestimulus theta amplitudes did not vary depending on the later recall success of the prior two study list words, showing that the prestimulus difference was not a carryover effect from processing of previous study list words. Equally strikingly, the differences in theta amplitude were independent of the level of processing of the words at study (semantic vs. phonemic), despite the latter manipulation exerting its traditional influence on later recall performance, with better recall following semantic than phonemic processing. Because the prestimulus theta amplitude increase was associated with successful encoding, and encoding success was


480 • Alan Richardson-Klavehn

more frequent during deep than during shallow study processing, it is possible that the favorable prestimulus theta state occurs more frequently during deep than during shallow study processing, but with the same amplitude regardless of level of study processing (Düzel & Guderian, 2009). Guderian et al. (2009) also present evidence that the prestimulus theta difference did not reflect selective sampling of a randomly fluctuating theta state, suggesting instead that it may reflect an anticipatory mnemonic state that provides the basis for enhanced synaptic plasticity in relation to stimulus-related processing. These findings may provide a real-time window on the hypothesized “cognitively silent” encoding step in healthy humans (e.g., Craik, 2002; Moscovitch, 1992, 2000, 2008; Tulving, 2001), which requires the integrity of the medial temporal lobe and connected limbic system structures. Consistent with this notion, the prestimulus theta differences found by Guderian et al. (2009) originated from the medial temporal lobe as revealed by current source density reconstructions. The medial temporal lobe, specifically the hippocampus, is also activated when words are successfully encoded in an fMRI version of the same free-recall task (e.g., Schott, Seidenbecher, et al., 2006b). Together these results provide strong evidence that the theta differences were specifically memory related, and complement a growing body of research in animals and humans linking theta oscillations with learning. 
Such research suggests that theta oscillations orchestrate interactions between medial temporal regions and distributed neocortical assemblies, which form the basis for encoding the activity patterns in these neocortical assemblies (e.g., Osipova et al., 2006; Sederberg et al., 2003) and for reinstating these activity patterns at retrieval (e.g., Guderian & Düzel, 2005), consistent with a number of influential theories of hippocampal and medial temporal lobe connectivity and function (for review, see Buzsáki, 2006; Düzel & Guderian, 2009; Guderian et al., 2009; Kahana, 2006; Kirov et al., 2009; Sejnowski & Paulsen, 2006). Neuroimaging in humans, therefore, together with animal research and modeling approaches, is beginning to resolve the apparent conflict between processing and systems views of encoding and retrieval, by elucidating the central role of medial temporal lobe structures in encoding processing-related brain activity patterns and reinstating these patterns at retrieval (see also Rugg et al., 2008). Our theta findings in this regard open numerous possibilities for further research exploring the relationship between processing and encoding, as well as possibilities for bringing favorable medial temporal lobe theta states under experimental control via neurofeedback (e.g., Bauer & Nirnberger, 1980) or brain stimulation (Kirov et al., 2009).


Measuring brain activity during encoding of information that is later remembered versus later forgotten, using both electrophysiological and hemodynamic methods, has revealed a wealth of other new information about the timing and anatomy of memory formation that could not easily have been achieved by purely behavioral studies of memory-disordered patients and healthy participants (for review, see Paller & Wagner, 2002). However, such studies involve separating brain activity measurements at encoding between items that are, for a particular participant, inevitably different, with the later remembered versus later forgotten variable being uncontrolled or random. It has, therefore, been a perennial question as to whether the brain activity differences could reflect properties of the items themselves, such as frequency, imageability, meaningfulness, personal relevance, and so forth. The brain activity differences might then not be directly related to memory encoding. One answer to this potential problem (for others, see Paller & Wagner, 2002) is to measure item characteristics post hoc and show these do not differ between later remembered and forgotten items (e.g., Schott et al., 2002; Schott, Richardson-Klavehn, et al., 2006). It is nonetheless never possible to measure all potentially confounded item characteristics. Concerns about item characteristics are, however, more conclusively addressed when brain activity states predictive of later remembering versus forgetting temporally precede the onsets of the studied items, as in the case of the theta differences reported by Guderian et al. (2009), revealed by the high time resolution of MEG. Such brain state differences could not be confounded with item characteristics, because the items have not yet been presented when the brain state differences occur (see also Otten et al., 2006, for related evidence from EEG). 
Encoding for Later Perceptual-Lexical Repetition Priming

Prestimulus oscillations have been identified not only in relation to encoding for conscious episodic recollection, but also in relation to encoding for later perceptual-lexical repetition priming. Initial attempts to isolate the neural correlates of such priming at encoding (e.g., Friedman, Ritter, & Snodgrass, 1996; Paller, 1990; Paller & Kutas, 1992; for review, see Paller & Wagner, 2002) were bedeviled by the problem that, in healthy participants, priming at retrieval can be accompanied by conscious recollection of the prior episode that led to priming, even when that priming reflects involuntary or automatic retrieval (e.g., Curran, 1999; Richardson-Klavehn, Gardiner, & Ramponi, 2002; Rugg, 1995; Schacter, 1987; Schacter & Buckner, 1998a; Schott et al., 2005; for a summary of the evidence and theoretical analysis, see Richardson-Klavehn, 2010). Given the use of standard


indirect (e.g., Richardson-Klavehn & Bjork, 1988) or incidental (e.g., Richardson-Klavehn, Gardiner, & Java, 1996) memory tests to assess priming, the neural measure of priming-related activity at encoding will, therefore, be contaminated with neural activity related to encoding for later involuntary conscious episodic recollection (Rugg, 1995; Schott et al., 2002; Schott, Richardson-Klavehn, et al., 2006). To address this problem, Schott et al. (2002) adapted a word stem completion procedure employed by Richardson-Klavehn and Gardiner (1995, 1996). At test, participants tried to complete each word stem with a studied word, but if they could not, they completed the stem with the first word coming to mind. They also reported whether the completed word came from the prior study list or was a new, nonstudied word. This procedure permits separation of the behavioral data at test for stems of studied words into remembered (consciously recollected) items, which are studied target items judged as being from the study list and which may be voluntarily or involuntarily retrieved (Richardson-Klavehn, 2010; see also Moscovitch, 1992, 2000, 2008); primed items, which are studied target items judged to be nonstudied and, by implication, also involuntarily retrieved; and forgotten (or unprimed) items, which are nonstudied (nontarget) items judged nonstudied. Brain activity measurements at encoding are then sorted according to these three later memory fates. 
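The three-way sorting just described can be sketched in a few lines of Python. This is only an illustrative sketch, not the analysis code used by Schott et al. (2002): the function names, the tuple layout of a trial, and the "unclassified" label for nontarget completions judged studied (a case the text does not define) are all assumptions introduced here.

```python
from collections import defaultdict


def memory_fate(completed_with_target: bool, judged_studied: bool) -> str:
    """Classify one test trial for the stem of a studied word.

    completed_with_target: the stem was completed with the studied word.
    judged_studied: the participant judged the completion to be from the study list.
    """
    if completed_with_target and judged_studied:
        return "remembered"   # studied target judged studied
    if completed_with_target and not judged_studied:
        return "primed"       # studied target judged nonstudied
    if not completed_with_target and not judged_studied:
        return "forgotten"    # nontarget completion judged nonstudied (unprimed)
    # Nontarget completion judged studied: not defined in the text (assumed label).
    return "unclassified"


def sort_encoding_trials(trials):
    """Group study-phase trial ids by the later memory fate of each word.

    trials: iterable of (trial_id, completed_with_target, judged_studied).
    """
    bins = defaultdict(list)
    for trial_id, completed_with_target, judged_studied in trials:
        bins[memory_fate(completed_with_target, judged_studied)].append(trial_id)
    return dict(bins)


if __name__ == "__main__":
    # Hypothetical trials, for illustration only.
    example = [(0, True, True), (1, True, False), (2, False, False)]
    print(sort_encoding_trials(example))
```

Encoding-phase measurements (for example, single-trial epochs) would then be averaged separately within each bin before the primed versus forgotten and remembered versus forgotten contrasts are computed.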
It should be noted that the restricted behavioral definition of priming adopted for neuroimaging purposes (that is, priming in the absence of conscious recollection) does not preclude priming-related brain activity patterns also being observed for remembered items, and indeed such priming-related patterns are observed in fMRI data at retrieval for both primed and remembered items in contrast with correct rejections, which correspond to stems of nonstudied words completed with words judged nonstudied (Schott et al., 2005; for discussion, see Voss & Paller, 2008). With EEG, the contrasts of primed and forgotten (unprimed) items, and of remembered and forgotten (unprimed) items, allowed an event-related potential (ERP) correlate of encoding for later perceptual-lexical repetition priming to be isolated for the first time, and distinguished temporally and topographically from the neural correlates of encoding for later conscious recollection and of level of processing effects at encoding (Schott et al., 2002). Most notably, the neurocognitive activity predictive of later priming had already peaked before neurocognitive activity relating to later conscious recollection and level of processing began to be evident. Using whole-head MEG measured simultaneously with EEG during the study phase of the same experiment, Düzel et al. (2005) found that alpha (~10 Hz) and beta-2/gamma (>20 Hz) oscillatory


activity predictive of later priming was already apparent during the time window of word identification itself (i.e., 80 to 200 ms after word onset; e.g., Cohen et al., 2000; Nobre, Allison, & McCarthy, 1994), with the beta-2/gamma oscillations having lower amplitudes, and the alpha oscillations higher amplitudes, for later primed than for later forgotten (unprimed) words. As revealed by current source density reconstructions, these alpha and gamma oscillatory differences originated in ventral visual stream areas known from independent lesion and neuroimaging evidence to support word identification and perceptual-lexical priming, including extrastriate cortex and fusiform gyrus (e.g., Cohen et al., 2000; Dehaene et al., 2001; Eden & Moats, 2002; Gabrieli et al., 1995; Keane et al., 1995; Nobre et al., 1994). Furthermore, the increases in alpha amplitude predictive of later priming began about 80 ms before word onset and were accompanied by increases in synchrony (phase locking) across sensor groups reflecting the alpha sources, with these changes in alpha synchrony beginning around 200 ms before word onset. As argued above, such prestimulus differences cannot reflect item selection artifacts. Instead, we interpret the alpha differences as relating to item-specific attention, which is known to facilitate perceptual-lexical priming (e.g., Rajaram, Srinivas, & Travers, 2001; Richardson-Klavehn & Gardiner, 1998; Stone, Ladd, & Gabrieli, 2000; Stone et al., 1998; for review, see Rajaram & Travers, 2005; Richardson-Klavehn & Bjork, 1988). 
Such attention induces a more coordinated response in the word identification hierarchy (as reflected in the interareal alpha synchrony), which is associated with increased specificity or sharpness (e.g., Desimone, 1996) of neural responding to stimulus features, as reflected in reduced amplitude beta-2/gamma oscillations, thus facilitating the Hebbian learning (e.g., Sejnowski & Paulsen, 2006) within the hierarchy that underlies later priming. Consistent with this hypothesis, the beta-2/gamma oscillations had a more consistent phase relationship to word onset across trials for later primed than later forgotten (unprimed) words. The overall pattern of data, and our theoretical interpretation, is consistent with other data concerning attentional modulation of oscillations and synchrony, some of which also show prestimulus activity differences (e.g., Fries et al., 2001; Hanslmayr et al., 2007; Super et al., 2003; von Stein, Chiang, & König, 2000; for review, see Engel, Fries, & Singer, 2001; Uhlhaas et al., 2009). In an fMRI analog of the Düzel et al. (2005) study, Schott, Richardson-Klavehn, et al. (2006) localized priming-related hemodynamic activity at encoding to areas of the ventral visual stream similar to those implicated by the MEG data and further found that medial


temporal lobe and left frontal activations at encoding were specific to later conscious episodic recollection and not to later perceptual-lexical priming. Notably, the priming-related hemodynamic activity in ventral visual stream areas at encoding consisted of response decreases for later primed compared with later forgotten (unprimed) words (as discussed further at the end of this section). These results appear consistent with the priming-related gamma amplitude decreases just described, in view of evidence that gamma oscillatory activity may be the aspect of electrophysiological activity that correlates best with the fMRI blood-oxygen-level-dependent (BOLD) signal (e.g., Niessing et al., 2005). There were also priming-related hemodynamic response increases in parietal areas, which are often linked with attention (e.g., Behrmann, Geng, & Shomstein, 2004; Culham & Kanwisher, 2001) and are therefore consistent with our attentional interpretation of the alpha oscillation findings. These activations included inferior parietal areas (in our data, supramarginal gyrus and inferior parietal lobule) that have been linked with bottom-up, or captured, attention (e.g., Cabeza et al., 2008; Corbetta & Shulman, 2002), and the activation of which is negatively related to encoding for later conscious episodic recollection (Uncapher & Wagner, 2009), thus raising intriguing questions concerning the relationship between the alpha oscillation findings (which showed partly prestimulus modulations that may reflect anticipatory attention) and the respective roles of top-down (goal-directed) versus bottom-up (captured) attention in priming at encoding. (See Wimber, Heinze, & Richardson-Klavehn, 2010.) Schott, Richardson-Klavehn, et al. 
(2006) additionally used inclusive masking analysis to reveal a considerable overlap between ventral visual stream areas showing priming-related deactivations at encoding [i.e., primed < forgotten (unprimed) items] and those showing priming-related deactivations (i.e., primed < correct rejections) at retrieval in the data of Schott et al. (2005, Experiment 1; the encoding and retrieval data were both obtained in this same experiment). These data, taken together with the MEG data of Düzel et al. (2005), suggest that both encoding and retrieval processes related to perceptual-lexical repetition priming occur in part within brain areas known to be involved in stimulus identification and are thus consistent with theoretical views that attribute such priming to cognitively encapsulated or modular perceptual representation systems (e.g., Moscovitch, 1992, 2000, 2008; Tulving & Schacter, 2001). Moreover, medial temporal lobe activations both at retrieval (i.e., remembered > primed items) in Schott et al. (2005, Experiment 1) and at encoding [i.e., remembered > forgotten (unprimed) items] in Schott,


Richardson-Klavehn, et al. (2006) were specific to conscious episodic recollection. The results, therefore, also fit in well with the natural explanation provided by such memory systems perspectives of the repetition priming in conjunction with impaired conscious episodic recollection found in amnesic patients with damage to medial temporal lobe and related limbic system structures (“Encoding for Later Conscious Episodic Recollection” section). They are also potentially accounted for by versions of processing approaches that view memory traces as being stored in brain areas responsible for the relevant aspects of stimulus processing (e.g., Crowder, 1993; Roediger et al., 2002), but with the caveat (e.g., Richardson-Klavehn & Bjork, 1988; Schacter, 1987) that these approaches still do not provide a natural explanation of why the pattern of repetition priming in conjunction with impaired conscious episodic recollection in amnesia cuts across major processing distinctions— most notably the distinction between perceptual and conceptual processing (e.g., Graf, Shimamura, & Squire, 1985; Shimamura & Squire, 1984; see also Ramponi, Richardson-Klavehn, & Gardiner, 2004, 2007; Wig, Buckner, & Schacter, 2009). The results do, however, at least at face value, appear to create more serious difficulty for some versions of processing approaches that attribute both priming and conscious episodic recollection to memory traces for the same whole prior processing episodes (e.g., Jacoby, 1983a, 1983b; Jacoby & Brooks, 1984) and for other, modeling-based, approaches that view priming and conscious episodic recollection as depending on common memory representations (e.g., Berry, Henson, & Shanks, 2006; Berry, Shanks, & Henson, 2006, 2008; Kinder & Shanks, 2001, 2003). 
More generally, the results illustrate that localization in the brain, especially when combined with timing information from electrophysiological measures, can be highly relevant to cognitive theories of memory, when the regions revealed have known functions based on other, independent evidence (e.g., Cabeza et al., 2008; Humphreys & Price, 2001; Poldrack, 2006; Richardson-Klavehn et al., 2009; Rugg, 2009; Uncapher & Wagner, 2009). A further critical caveat, however, is that human neuroimaging data cannot currently conclusively indicate in which brain areas or networks memory traces are stored. Such knowledge awaits the bridging of the brain systems and neurotransmitter/cellular levels (e.g., Düzel & Guderian, 2009; Richardson-Klavehn et al., 2009; for work that begins to bridge this gap, see, e.g., Neves, Cooke, & Bliss, 2008; Schott, Seidenbecher, et al., 2006; Schott et al., 2008). One feature of the results reviewed in this section requires further comment. The priming-related hemodynamic deactivations in ventral visual stream areas at encoding (Schott, Richardson-Klavehn et


al., 2006), and the accompanying beta-2/gamma amplitude decreases (Düzel et al., 2005), are counterintuitive and surprising, in view of the finding that similar areas also show priming-related hemodynamic deactivations at retrieval (e.g., Schott et al., 2005; for reviews, see Grill-Spector, Henson, & Martin, 2006; Henson, 2003; Henson & Rugg, 2003; Schacter & Buckner, 1998a, 1998b; Schacter, Wig, & Stevens, 2007; Wiggs & Martin, 1998). Behavioral data on perceptual-lexical repetition priming lead to a substantially different expectation. Such priming appears to shift items up a negatively accelerated learning curve (e.g., Kirsner, Speelman, & Schofield, 1993), which, cognitively speaking, means that items that are less efficiently processed at encoding benefit more from the study exposure in terms of later improvement in processing efficiency (i.e., show more later priming) than do items already efficiently processed at encoding. For example, low-frequency words often show more perceptual-lexical priming than do high-frequency words (e.g., Jacoby & Dallas, 1981; for review, see Richardson-Klavehn & Bjork, 1988). Consistent with this regularity, conditions that increase cognitive processing efficiency at encoding, such as presenting items in coherent semantic contexts, reduce later perceptual-lexical priming of those individual items (e.g., Jacoby, 1983b; B. A. Levy & Kirsner, 1989; MacLeod, 1989; Oliphant, 1983; Osgood & Hoosain, 1974; for review, see B. A. Levy, 1993; Richardson-Klavehn & Bjork, 1988). If hemodynamic responses in ventral visual stream areas simply reflect the efficiency of neural processing related to stimulus identification, one would seemingly have to predict increased priming-related responses in these areas at encoding and decreased priming-related responses in these areas at retrieval, and not decreased responses at both encoding and retrieval. 
The observed neuroimaging patterns have led us (Schott, Richardson-Klavehn, et al., 2006) to tentatively propose that the neural mechanisms underlying priming-related hemodynamic response decreases in ventral visual stream areas at encoding and retrieval may be different. At encoding, item-specific attention brings the neural stimulus identification hierarchy into a state conducive to Hebbian learning, meaning that neurons in the hierarchy display a more specific or sharpened response to stimuli that are later primed, as previously argued with respect to electrophysiological data (Düzel et al., 2005). Sharpened responding means that the tuning curves of neurons are narrower, so that only neurons that code the precise features of the stimulus respond (Desimone, 1996; Grill-Spector et al., 2006; Wiggs & Martin, 1998), and thus fewer neurons respond to later primed than later forgotten (unprimed) items. The Hebbian learning that results from this sharpened responding sets the stage for top-down feedback within the neural stimulus identification


hierarchy at retrieval (e.g., Henson, 2003), so that less responding (or faster settling of the hierarchy into a stable state) accomplishes identification (e.g., Grill-Spector et al., 2006). This tentative hypothesis provides one possible explanation of reduced priming-related hemodynamic responses at both encoding and retrieval and integrates the electrophysiological data, but clearly requires further investigation. Such a neural hypothesis suggests, in turn, at a psychological level, that a simple one-factor “processing efficiency learning curve” view of priming might be insufficient. Perhaps reduced cognitive processing efficiency at encoding, such as with low-frequency words and words not embedded in coherent semantic contexts, is associated with greater later priming via enhanced item-specific attention (as hypothesized, e.g., by Richardson-Klavehn & Bjork, 1988).

Neural Processes of Retrieval Inhibition

The Neural Basis of Retrieval Avoidance

I now turn to a first example of the value of neuroimaging data in illuminating retrieval processes. The notion of retrieval inhibition at the functional level (e.g., Bjork, 1989, 2007) is in part inspired by the well-known occurrence of inhibitory processes at the neurobiological level and will be a lasting scientific legacy. One paradigm employed to study the hypothesized inhibitory control processes involves participants, through repeated training, learning to avoid retrieving previously learned information in the presence of cues that would normally remind them of that information. Such avoidance can sometimes lead to the information later being less retrievable than it otherwise would be owing to the normal forgetting that is correlated with the passage of time (e.g., Anderson & Green, 2001; Anderson et al., 2004). What mental processes happen at the very time that participants avoid retrieval that are responsible for the later forgetting fate of the avoided material? Because retrieval is being stopped during the avoidance training, there is no adequate behavioral index at the time of that training, and devising subjective (introspective) indices of whether, and how, retrieval has been avoided involves numerous scientific problems. Moreover, making later behavioral forgetting patterns a criterion for inferring what happened earlier during avoidance training only provides indirect evidence and risks circularity. Neuroimaging findings have begun to address these problems. Avoiding retrieval in the presence of reminders results in hemodynamic response increases in a number of brain areas implicated in cognitive


control, including dorsolateral prefrontal cortex, ventrolateral prefrontal cortex, frontopolar cortex, and anterior cingulate cortex (Anderson et al., 2004; Depue, Curran, & Banich, 2007). In contrast, hemodynamic activity in medial temporal lobe and visual representational regions is reduced (Anderson et al., 2004; Depue et al., 2007). The magnitude of the ERP correlate of conscious episodic recollection obtained from EEG, a late left parietal voltage positivity (e.g., Mecklinger & Jäger, 2009; Rugg, 2009), is also reduced during retrieval avoidance (Bergström et al., 2007), even when retrieval has previously been repeatedly practiced so as to be highly automatic (Bergström, de Fockert, & Richardson-Klavehn, 2009b). These findings collectively indicate successful retrieval avoidance via cognitive control, at the time when that avoidance occurs. Indeed, Bergström et al. (2007) showed that recollection of previously successfully learned material was reduced to the point where there was little difference in retrieval-related ERPs between avoiding retrieval of material previously successfully learned and attempting retrieval of material previously not successfully learned. Such neuroimaging findings have also begun to provide a resolution of the issue of why avoiding retrieval in the presence of reminders sometimes produces later forgetting (e.g., Anderson & Green, 2001; Anderson et al., 2004) and sometimes does not (e.g., Bergström et al., 2007, 2009b; Bulevich et al., 2006; Hertel & Calcaterra, 2005). Critical here are the questions of what strategies participants employ to avoid retrieval and of whether the later forgetting, if it occurs, is inhibitory (i.e., reflects suppression of the memory representation itself) or whether it reflects noninhibitory mechanisms such as interference from competing memories. 
Avoiding retrieval might involve substituting alternative thoughts for the avoided memory (Hertel & Calcaterra, 2005), which would produce interfering memories at retrieval, rather than inhibiting the representation of the avoided memory itself. Bergström, de Fockert, and Richardson-Klavehn (2009a) examined this issue by asking one randomly assigned group of participants to avoid retrieval by substituting alternative thoughts in the presence of reminders (thought substitution group), and another randomly assigned group to avoid retrieval by simply focusing on the reminder itself and keeping the to-be-avoided memory out of consciousness without substituting an alternative thought (thought suppression group). EEG (30 scalp electrodes) was recorded during this training phase and provided an index of whether these different instructions created any difference in retrieval avoidance strategies. That is, the null hypothesis that avoidance strategies were identical despite different instructions predicts no difference in brain activity between the groups as indexed by ERPs time


locked to the presentation of the reminders during avoidance training. As it turned out, however, this null hypothesis of no difference in brain activity was clearly rejected, providing an example of the value of function-to-brain-activity inference, or forward inference, in neuroimaging (Henson, 2005, 2006; Richardson-Klavehn et al., 2009; Rugg, 2009). Critically, the different retrieval avoidance strategies, as created by the avoidance instruction manipulation and as indexed by the ERP data, also led to different later behavioral forgetting consequences. Both groups showed above-baseline forgetting for the avoided material (i.e., more forgetting on a final recall test than occurred for control material that was learned but did not participate in the avoidance training phase intervening between learning and final test). However, the thought substitution group only showed above-baseline forgetting when retrieval was cued with the same reminder as used during avoidance training, but not with a new reminder not used during avoidance training, suggesting that competing memories had been associated with the reminder used during avoidance training, creating later interference. In striking contrast, the thought suppression group showed above-baseline forgetting with both previously used and new reminders, suggesting that the representations of the avoided memories themselves had been inhibited (e.g., Anderson & Green, 2001; Anderson et al., 2004; Anderson & Spellman, 1995). Bergström et al. (2009b) also used brain-activity-to-function inference, or reverse inference (Henson, 2005; Poldrack, 2006; Richardson-Klavehn et al., 2009; Rugg, 2009), to draw specific conclusions about the nature of the strategies during avoidance training that led to these different types of later forgetting. 
An early (175 to 225 ms) N2-like ERP effect related to strategy for avoiding retrieval (because it was similar for avoided material later remembered and later forgotten) was twice as large in the thought suppression group as in the thought substitution group. The size of this ERP strategy effect predicted the amount of later inhibitory (i.e., cue-independent) forgetting of the avoided material at the level of individual participants in the thought suppression group, being larger for participants showing greater later forgetting. The early ERP effect has some apparent spatial and temporal similarities with the N2 effects observed during stopping of overt motor acts (see Falkenstein, 2006), although it emerged earlier and was smaller in magnitude than many motor-stopping N2 effects. Moreover, only the thought suppression group showed a reduction in the late (300 to 600 ms) left parietal ERP positivity related to conscious episodic recollection (e.g., Bergström et al., 2007, 2009b; Mecklinger & Jäger, 2009;


Rugg, 2009) for avoided material that was later forgotten compared with avoided material that was later remembered. Our behavioral and ERP findings complement previous hemodynamic evidence linking stopping of overt actions, stopping of memory retrieval, and later retrieval inhibition (e.g., Anderson et al., 2004; Depue et al., 2007) and suggest that different retrieval avoidance strategies lead to different kinds of later forgetting, one of which is inhibitory (e.g., Bjork, 1989, 2007) and the other noninhibitory (e.g., Hertel & Calcaterra, 2005). Whether or not our findings and theoretical interpretations are borne out in future research, they illustrate the value of neuroimaging in learning paradigms in which measurements of overt behavior are not feasible, and that neuroimaging has developed to the point where brain activity can begin to act as a marker for cognitive processes (see also Hall, Gjedde, & Kupers, 2008; Norman, Quamme, & Newman, 2009; Poldrack, 2006; Richardson-Klavehn, 2010; Richardson-Klavehn et al., 2009).

The Neural Basis of Retrieval-Induced Forgetting

It seems both personally and scientifically fitting to conclude this chapter with recent neural evidence concerning theories of retrieval-induced forgetting, the phenomenon whereby repeated retrieval of material in response to particular retrieval cues leads causally to the later forgetting of other material associated with the same cues, over and above the forgetting that is correlated with the passage of time. This paradigm has yielded some of the most stable and convincing behavioral evidence for Bob’s (e.g., Bjork, 1989, 2007) neurobiologically driven ideas concerning retrieval inhibition (e.g., Anderson & Spellman, 1995). 
Once rendered tractable to experimental investigation (Anderson, Bjork, & Bjork, 1994), retrieval-induced forgetting turned out to be a highly general phenomenon across many types of memory test and across a remarkably wide range of material, including both semantic and episodic memories (e.g., Anderson & Bell, 2001; Bäuml, 2002; Ciranni & Shimamura, 1999; S. K. Johnson & Anderson, 2004; B. J. Levy et al., 2007; for reviews, see Anderson, 2003; Anderson & Levy, 2007; Bäuml, 2008; B. J. Levy & Anderson, 2002). Until recently, however, studies of the neural basis of retrieval-induced forgetting have focused on brain activity during repeated retrieval practice, under the assumption that the inhibitory mechanisms that lead to later retrieval-induced forgetting begin to operate during retrieval practice (Johansson et al., 2006; Kuhl et al., 2007; Wimber et al., 2009). By contrast, Wimber et al. (2008; see also Kuhl et al., 2008; Spitzer et al., 2009) focused on retrieval-induced forgetting at the later time when
such forgetting is apparent (i.e., on a final test after retrieval practice). We sought neural markers of retrieval-induced forgetting that might point to inhibitory mechanisms impairing access to memory representations of non-retrieval-practiced materials and distinguish them from mechanisms of interference or blocking by the retrieval-practiced materials. Participants studied items from a total of 36 semantic categories, 8 items per category (i.e., a total of 288 items), with the 8 items consisting of four weak associates to the category cue (e.g., fruit–kiwi) and four strong associates to that cue (fruit–apple). Retrieval practice then took place for the four weak associates from 24 of the 36 categories (e.g., fruit–k___). At final recall, all items from all categories were to be recalled, again with the category name and the first letter of the item as a cue. This procedure yielded, at final recall, (1) retrieval-practiced items (P+ items) and their matched control items (C+ items), the latter being the weak associates from the categories not used during retrieval practice; and (2) non-retrieval-practiced items from practiced categories (P– items) and their matched control items (C– items), the latter being the strong associates from the categories that were not themselves practiced. We followed this procedure in order to maximize the strength of the retrieval-induced forgetting effect, because research shows that strong exemplars are more likely to interfere with retrieval of the weak exemplars during retrieval practice and are, therefore, more likely to have their final recall impaired by retrieval practice of the weak exemplars (e.g., Anderson et al., 1994; Bäuml, 1998). Practically speaking, the experiment was divided into six separate runs, each consisting of a study phase, a retrieval practice phase, a distractor phase, and a final recall phase, and trial timings and fMRI scanning parameters were optimized for the final recall phases. 
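The item counts implied by this design can be checked with a short sketch. The Python below is illustrative only: the function and variable names are hypothetical, and the choice of which 24 categories receive retrieval practice is arbitrary here, because the actual assignment and counterbalancing are not described in this chapter.

```python
from collections import Counter

N_CATEGORIES = 36        # semantic categories studied
N_PRACTICED = 24         # categories whose weak associates get retrieval practice
ITEMS_PER_STRENGTH = 4   # four weak and four strong associates per category


def item_type(category_practiced: bool, strength: str) -> str:
    """Label an item with the condition names used in the text."""
    if category_practiced:
        # Weak associates are practiced (P+); strong ones are left unpracticed (P-).
        return "P+" if strength == "weak" else "P-"
    # Items from unpracticed categories serve as strength-matched controls.
    return "C+" if strength == "weak" else "C-"


def build_design():
    """Return one (category, strength, index, label) tuple per studied item."""
    design = []
    for category in range(N_CATEGORIES):
        # Assumption: the first 24 category indices stand in for the practiced set.
        practiced = category < N_PRACTICED
        for strength in ("weak", "strong"):
            for index in range(ITEMS_PER_STRENGTH):
                design.append((category, strength, index, item_type(practiced, strength)))
    return design


design = build_design()
counts = Counter(row[3] for row in design)
print(len(design), dict(counts))
# prints: 288 {'P+': 96, 'P-': 96, 'C+': 48, 'C-': 48}
```

Under these assumptions the design yields twice as many P+ and P– items as control items, which follows directly from practicing 24 of the 36 categories.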
The behavioral data at final test replicated the typical pattern, with recall of the previously practiced P+ items being enhanced relative to their matched C+ control items, and recall of the unpracticed P– items from practiced categories being impaired relative to their matched C– control items. The behavioral differences between P+ and P– items at final test were accompanied by striking differences in hemodynamic response: Compared with successfully recalled P+ items, successfully recalled P– items elicited a larger response in left ventrolateral prefrontal regions (Brodmann areas, BAs, 45 and 47). By contrast, compared with successfully recalled P– items, successfully recalled P+ items elicited a larger response in the precuneus (BA 7) and the supramarginal gyrus (BA 40). Some areas thus identified showed impressive brain-behavior correlations, with greater activity in BA 47 (but not BA 45) and in a further left posterolateral temporal area, BA 22, being associated with
greater retrieval-induced forgetting for P– items at the level of individual participants. These same areas showed trends for correlations in the opposite direction with retrieval-induced enhancement for P+ items. By contrast, greater activity in BA 7 and BA 40 was associated with greater retrieval-induced enhancement for P+ items at the level of individual participants. The neural correlates of retrieval-induced forgetting and retrieval-induced enhancement were, therefore, dissociable, contrary to what one might expect from an interference-blocking interpretation of retrieval-induced forgetting (e.g., Williams & Zacks, 2001), which suggests that P– items are blocked at final test by the increased retrievability of the P+ items from the same semantic categories. Examining the hemodynamic responses of BAs 45 and 47 separately for P+ and P– items and their matched C+ (weak exemplars) and C– (strong exemplars) control items revealed a further striking pattern. BA 45, an area that has been associated with response competition (e.g., Badre et al., 2005; Badre & Wagner, 2007; Gold et al., 2006), showed little difference between C+ and C– control items, and a greater response for the impaired P– than the enhanced P+ items. By contrast, BA 47, an area that has been associated with the controlled or voluntary retrieval of weakly represented semantic information (e.g., Badre et al., 2005; Badre & Wagner, 2007; Gold et al., 2006), showed a larger response for the normatively weak C+ items than for the normatively strong C– items, consistent with such an interpretation of this area’s functional role. Prior retrieval practice reversed this pattern, with BA 47 showing a larger response for the P– items, which although normatively strong, had been impaired by retrieval practice, compared with the P+ items, which although normatively weak, had been enhanced by retrieval practice. 
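The brain-behavior correlations described above relate a per-participant behavioral effect to activity in a region of interest. As a minimal sketch, assuming the conventional difference-score definitions (forgetting as C– minus P– recall, enhancement as P+ minus C+ recall) and a plain Pearson correlation, neither of which is spelled out in this chapter, the computation might look as follows. All function names are hypothetical and all numbers are invented placeholders, not data from the study.

```python
from math import sqrt


def forgetting_score(recall_c_minus: float, recall_p_minus: float) -> float:
    """Retrieval-induced forgetting: control (C-) recall minus unpracticed (P-) recall."""
    return recall_c_minus - recall_p_minus


def enhancement_score(recall_p_plus: float, recall_c_plus: float) -> float:
    """Retrieval-induced enhancement: practiced (P+) recall minus control (C+) recall."""
    return recall_p_plus - recall_c_plus


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least two values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Invented placeholder recall proportions for four hypothetical participants.
recall = [
    {"P+": 0.80, "C+": 0.60, "P-": 0.55, "C-": 0.70},
    {"P+": 0.75, "C+": 0.65, "P-": 0.50, "C-": 0.75},
    {"P+": 0.85, "C+": 0.55, "P-": 0.65, "C-": 0.70},
    {"P+": 0.70, "C+": 0.60, "P-": 0.40, "C-": 0.72},
]
# Invented region-of-interest activity estimates (arbitrary units), one per participant.
roi_activity = [0.30, 0.55, 0.10, 0.90]

rif = [forgetting_score(p["C-"], p["P-"]) for p in recall]
enhancement = [enhancement_score(p["P+"], p["C+"]) for p in recall]

print("ROI activity vs. forgetting:", round(pearson_r(roi_activity, rif), 2))
print("ROI activity vs. enhancement:", round(pearson_r(roi_activity, enhancement), 2))
```

Computing the two scores separately mirrors the logic of the dissociation argument: if forgetting and enhancement were two sides of a single blocking process, one would expect the same regions to track both scores.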
The correlations between activity in BA 47, activity in the left posterolateral temporal area BA 22, and retrieval-induced forgetting across individual participants, with no correlation being observed for BA 45, are also consistent with an interpretation of the retrieval-induced forgetting in terms of the need for increased control of retrieval in order to access weakened semantic memory representations, under the assumption that the relevant temporal area may represent a neural substrate for those memory representations (e.g., Badre & Wagner, 2007; Gold et al., 2006). A further finding supportive of this interpretation is that activity in BA 47, but not in BA 45, functionally coupled with (correlated with) activity in BA 22 during retrieval of the impaired P– items. The overall pattern of results, therefore, strongly supports an inhibitory interpretation of retrieval-induced forgetting—as hypothesized by Bjork (e.g., 1989, 2007)—although the involvement of BA 45 suggests some additional role for response competition from items strengthened
through retrieval practice. The results also provide a basis for understanding why retrieval-induced forgetting generalizes across semantic and episodic memory (e.g., Anderson & Bell, 2001; Bäuml, 2002; Ciranni & Shimamura, 1999; S. K. Johnson & Anderson, 2004; B. J. Levy et al., 2007). More generally, the results provide a good example of how forward and reverse inference in neuroimaging (Henson, 2005, 2006; Poldrack, 2006), although logically distinguishable, actually work together synergistically in practice (e.g., Richardson-Klavehn et al., 2009; Rugg, 2009; see also “Prestimulus Neural Oscillations at Encoding and Their Hemodynamic Correlates” and “The Neural Basis of Retrieval Avoidance” sections). On the forward inference side (Henson, 2006), the cognitive inhibition and interference-blocking theories of retrieval-induced forgetting led to different expectations about the extent to which the neural correlates of retrieval-induced forgetting and retrieval-induced enhancement should be dissociable. On the reverse inference side (Poldrack, 2006), existing neuroimaging knowledge and neurocognitive models of the different functional roles of particular brain areas, in this case BAs 45 and 47 (e.g., Badre & Wagner, 2007), are used to inform interpretation of new neuroimaging data in the service of testing the cognitive theories. My argument in honor of Bob Bjork, and my bet for the future, is that such lines of neurobiological investigation will prove increasingly fruitful in furthering our theoretical understanding of the astounding human capability for learning and memory.

Acknowledgments

Preparation of this chapter was supported by the German National Science Foundation (Deutsche Forschungsgemeinschaft, Grants DFG RI1847/1-1 and DFG SFB 779TPA7), and the Alexander von Humboldt Foundation (Alexander von Humboldt Stiftung). I thank all research collaborators whose work is reviewed here, and Aaron Benjamin, Gerasimos Markopoulos, Tobias Staudigl, and Maria Wimber for helpful comments on drafts.

References

Anderson, M. C. (2003). Rethinking interference theory: Executive control and the mechanisms of forgetting. Journal of Memory and Language, 49, 415–445. Anderson, M. C., & Bell, T. (2001). Forgetting our facts: The role of inhibitory processes in the loss of propositional knowledge. Journal of Experimental Psychology: General, 130, 544–570.
Anderson, M. C., Bjork, R. A., & Bjork, E. L. (1994). Remembering can cause forgetting: Retrieval dynamics in long-term memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 1063–1087. Anderson, M. C., & Green, C. (2001). Suppressing unwanted memories by executive control. Nature, 410, 366–369. Anderson, M. C., & Levy, B. J. (2007). Theoretical issues in inhibition: Insights from research on human memory. In D. S. Gorfein & C. M. MacLeod (Eds.), Inhibition in cognition (pp. 81–102). Washington, DC: American Psychological Association. Anderson, M. C., Ochsner, K. N., Kuhl, B., Cooper, J., Robertson, E., Gabrieli, S. W., et al. (2004). Neural systems underlying the suppression of unwanted memories. Science, 303, 232–235. Anderson, M. C., & Spellman, B. A. (1995). On the status of inhibitory mechanisms in cognition: Memory retrieval as a model case. Psychological Review, 102, 68–100. Badre, D., Poldrack, R. A., Paré-Blagoev, E. J., Insler, R. Z., & Wagner, A. D. (2005). Dissociable controlled retrieval and generalized selection mechanisms in ventrolateral prefrontal cortex. Neuron, 47, 907–918. Badre, D., & Wagner, A. D. (2007). Left prefrontal cortex and the cognitive control of memory. Neuropsychologia, 45, 2883–2901. Bauer, H., & Nirnberger, G. (1980). Paired associate learning with feedback of DC potential shifts of the cerebral cortex. Archiv für Psychologie [Archive for Psychology], 132, 237–239. Bäuml, K.-H. (1998). Strong items get suppressed, weak items do not: The role of item strength in output interference. Psychonomic Bulletin and Review, 5, 459–463. Bäuml, K.-H. (2002). Semantic generation can cause episodic forgetting. Psychological Science, 13, 356–360. Bäuml, K.-H. (2008). Inhibitory processes. In H. L. Roediger III (Ed.), Learning and memory: A comprehensive reference (Vol. 2, pp. 195–220). Oxford, UK: Elsevier. Behrmann, M., Geng, J. J., & Shomstein, S. (2004). Parietal cortex and attention. 
Current Opinion in Neurobiology, 14, 212–217. Bergström, Z. M., de Fockert, J., & Richardson-Klavehn, A. (2009a). ERP and behavioural evidence for direct suppression of unwanted memories. Neuroimage, 48, 726–737. Bergström, Z. M., de Fockert, J., & Richardson-Klavehn, A. (2009b). Event-related potential evidence that automatic recollection can be voluntarily avoided. Journal of Cognitive Neuroscience, 21, 1280–1301. Bergström, Z. M., Velmans, M., de Fockert, J., & Richardson-Klavehn, A. (2007). ERP evidence for successful voluntary avoidance of conscious recollection. Brain Research, 1151, 119–133. Berry, C. J., Henson, R. N. A., & Shanks, D. R. (2006). On the relationship between repetition priming and recognition memory: Insights from a computational model. Journal of Memory and Language, 55, 515–533.
Berry, C. J., Shanks, D. R., & Henson, R. N. A. (2006). On the status of unconscious memory: Merikle and Reingold (1991) revisited. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 925–935. Berry, C. J., Shanks, D. R., & Henson, R. N. A. (2008). A single-system account of the relationship between priming, recognition, and fluency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34, 97–111. Bjork, R. A. (1975). Retrieval as a memory modifier. In R. Solso (Ed.), Information processing and cognition: The Loyola Symposium (pp. 123– 144). Hillsdale, NJ: Erlbaum. Bjork, R. A. (1989). Retrieval inhibition as an adaptive mechanism in human memory. In H. L. Roediger, III & F. I. M. Craik (Eds.), Varieties of memory and consciousness: Essays in honor of Endel Tulving (pp. 309–330). Hillsdale, NJ: Erlbaum. Bjork, R. A. (2007). Inhibition: An essential and contentious concept. In H. L. Roediger, Y. Dudai, & S. M. Fitzpatrick (Eds.), Science of memory: Concepts (pp. 307–313). New York: Oxford University Press. Bransford, J. D., Franks, J. J., Morris, C. D., & Stein, B. S. (1979). Some general constraints on learning and memory research. In L. S. Cermak & F. I. M. Craik (Eds.), Levels of processing in human memory (pp. 331–354). Hillsdale, NJ: Erlbaum. Brown, S. C., & Craik, F. I. M. (2000). Encoding and retrieval of information. In E. Tulving & F. I. M. Craik (Eds.), Oxford handbook of memory (pp. 93–107). New York: Oxford University Press. Bulevich, J. B., Roediger, H. L., III, Balota, D. A., & Butler, A. C. (2006). Failures to find suppression of episodic memories in the think/no-think paradigm. Memory and Cognition, 34, 1569–1577. Buzsáki, G. (2006). Rhythms of the brain. New York: Oxford University Press. Cabeza, R., Ciaramelli, E., Olson, I. R., & Moscovitch, M. (2008). The parietal cortex and episodic memory: An attentional account. Nature Reviews Neuroscience, 9, 613–625. Cipolotti, L., & Bird, C. M. (2006). 
Amnesia and the hippocampus. Current Opinion in Neurology, 19, 593–598. Ciranni, M. A., & Shimamura, A. P. (1999). Retrieval-induced forgetting in episodic memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 1403–1414. Cohen, L., Dehaene, S., Naccache, L., Lehéricy, S., Dehaene-Lambertz, G., Hénaff, M. A., & Michel, F. (2000). The visual word form area: Spatial and temporal characterization of an initial stage of reading in normal subjects and posterior split-brain patients. Brain, 123, 291–307. Corbetta, M., & Shulman, G. L. (2002). Control of goal-directed and stimulus-driven attention in the brain. Nature Reviews Neuroscience, 3, 201–215. Craik, F. I. M. (2002). Levels of processing: Past, present … and future? Memory, 10, 305–318. Crowder, R. G. (1993). Systems and principles in memory theory: Another critique of pure memory. In A. Collins, S. E. Gathercole, M. A. Conway, & P. E. Morris (Eds.), Theories of memory (pp. 139–161). Hove, UK: Erlbaum.
Culham, J. C., & Kanwisher, N. G. (2001). Neuroimaging of cognitive functions in human parietal cortex. Current Opinion in Neurobiology, 11, 157–163. Curran, T. (1999). The electrophysiology of incidental and intentional retrieval: ERP old/new effects in lexical decision and recognition memory. Neuropsychologia, 37, 771–785. Dehaene, S., Naccache, L., Cohen, L., Bihan, D. L., Mangin, J. F., Poline, J. B., & Rivière, D. (2001). Cerebral mechanisms of word masking and unconscious repetition priming. Nature Neuroscience, 4, 752–758. Depue, B. E., Curran, T., & Banich, M. T. (2007). Prefrontal regions orchestrate suppression of emotional memories via a two-phase process. Science, 317, 215–219. Desimone, R. (1996). Neural mechanisms for visual memory and their role in attention. Proceedings of the National Academy of Sciences USA, 93, 13494–13499. Düzel, E., & Guderian, S. (2009). Oscillatory and haemodynamic temporal responses preceding stimulus onset modulate episodic memory. In F. Rösler, C. Ranganath, B. Röder, & R. H. Kluwe (Eds.), Neuroimaging of human memory: Linking cognitive processes to neural systems (pp. 427–442). New York: Oxford University Press. Düzel, E., Richardson-Klavehn, A., Neufang, M., Schott, B. H., Scholz, M., & Heinze, H.-J. (2005). Early, partly anticipatory, neural oscillations during identification set the stage for priming. Neuroimage, 25, 690–700. Eden, G. F., & Moats, L. (2002). The role of neuroscience in the remediation of students with dyslexia. Nature Neuroscience, 5, 1080–1084. Engel, A. K., Fries, P., & Singer, W. (2001). Dynamic predictions: Oscillations and synchrony in top-down processing. Nature Reviews Neuroscience, 2, 704–716. Falkenstein, M. (2006). Inhibition, conflict and the Nogo-N2. Clinical Neurophysiology, 117, 1638–1640. Fenker, D. B., Schott, B. H., Richardson-Klavehn, A., Heinze, H.-J., & Düzel, E. (2005).
Recapitulating emotional context: Activity of amygdala, hippocampus and fusiform cortex during recollection and familiarity. European Journal of Neuroscience, 21, 1993–1999. Friedman, D., Ritter, W., & Snodgrass, J. G. (1996). ERPs during study as a function of subsequent direct and indirect memory testing in young and old adults. Cognitive Brain Research, 4, 1–13. Fries, P., Reynolds, J. H., Rorie, A. E., & Desimone, R. (2001). Modulation of oscillatory neuronal synchronization by selective visual attention. Science, 291, 1560–1563. Gabrieli, J. D. E., Fleischman, D. A., Keane, M. M., Reminger, S. L., & Morrell, F. (1995). Double dissociation between memory systems underlying explicit and implicit memory in the human brain. Psychological Science, 6, 76–82. Gold, B. T., Balota, D. A., Jones, S. J., Powell, D. K., Smith, C. D., & Andersen, A. H. (2006). Dissociation of automatic and strategic lexical-semantics: Functional magnetic resonance imaging evidence for differing roles of multiple frontotemporal regions. Journal of Neuroscience, 26, 6523–6532.
Graf, P., Shimamura, A. P., & Squire, L. R. (1985). Priming across modalities and priming across category levels: Extending the domain of preserved function in amnesia. Journal of Experimental Psychology: Learning, Memory, and Cognition, 11, 386–396. Grill-Spector, K., Henson, R. N. A., & Martin, A. (2006). Repetition and the brain: Neural models of stimulus-specific effects. Trends in Cognitive Sciences, 10, 14–23. Guderian, S., & Düzel, E. (2005). Induced theta oscillations mediate large-scale synchrony with mediotemporal areas during recollection in humans. Hippocampus, 15, 901–912. Guderian, S., Schott, B. H., Richardson-Klavehn, A., & Düzel, E. (2009). Medial temporal theta state before an event predicts episodic encoding success in humans. Proceedings of the National Academy of Sciences USA, 106, 5365–5370. Hall, N. M., Gjedde, A., & Kupers, R. (2008). Neural mechanisms of voluntary and involuntary recall: A PET study. Behavioural Brain Research, 186, 261–272. Hanslmayr, S., Aslan, A., Staudigl, T., Klimesch, W., Herrmann, C. S., & Bäuml, K.-H. (2007). Prestimulus oscillations predict visual perceptual performance between and within subjects. Neuroimage, 37, 1465–1473. Henson, R. N., & Rugg, M. D. (2003). Neural response suppression, haemodynamic repetition effects, and behavioural priming. Neuropsychologia, 41, 263–270. Henson, R. N. A. (2003). Neuroimaging studies of priming. Progress in Neurobiology, 70, 53–81. Henson, R. N. A. (2005). What can functional neuroimaging tell the experimental psychologist? Quarterly Journal of Experimental Psychology, 58A, 193–233. Henson, R. N. A. (2006). Forward inference using functional neuroimaging: Dissociations versus associations. Trends in Cognitive Sciences, 10, 64–69. Hertel, P. T., & Calcaterra, G. (2005). Intentional forgetting benefits from thought substitution. Psychonomic Bulletin and Review, 12, 484–489. Humphreys, G. W., & Price, C. J. (2001). 
Cognitive neuropsychology and functional brain imaging: Implications for functional and neuroanatomical models of cognition. Acta Psychologica (Amsterdam), 107, 119–153. Jacoby, L. L. (1983a). Perceptual enhancement: Persistent effects of an experience. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9, 21–38. Jacoby, L. L. (1983b). Remembering the data: Analyzing interactive processes in reading. Journal of Verbal Learning and Verbal Behavior, 22, 485–508. Jacoby, L. L., & Brooks, L. R. (1984). Nonanalytic cognition: Memory, perception, and concept learning. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 18, pp. 1–47). New York: Academic Press. Jacoby, L. L., & Dallas, M. (1981). On the relationship between autobiographical memory and perceptual learning. Journal of Experimental Psychology: General, 110, 306–340.
Johansson, M., Aslan, A., Bäuml, K.-H., Gabel, A., & Mecklinger, A. (2006). When remembering causes forgetting: Electrophysiological correlates of retrieval-induced forgetting. Cerebral Cortex, 17, 1135–1141. Johnson, J. D., & Rugg, M. D. (2007). Recollection and the reinstatement of encoding-related cortical activity. Cerebral Cortex, 17, 2507–2515. Johnson, S. K., & Anderson, M. C. (2004). The role of inhibitory control in forgetting semantic knowledge. Psychological Science, 15, 448–453. Kahana, M. J. (2006). The cognitive correlates of human brain oscillations. Journal of Neuroscience, 26, 1669–1672. Keane, M. M., Gabrieli, J. D., Mapstone, H. C., Johnson, K. A., & Corkin, S. (1995). Double dissociation of memory capacities after bilateral occipital-lobe or medial temporal-lobe lesions. Brain, 118, 1129–1148. Khader, P., & Rösler, F. (2009). Content specificity of long-term memory representations. In F. Rösler, C. Ranganath, B. Röder, & R. H. Kluwe (Eds.), Neuroimaging of human memory: Linking cognitive processes to neural systems (pp. 283–298). New York: Oxford University Press. Kinder, A., & Shanks, D. R. (2001). Amnesia and the declarative/procedural distinction: A recurrent network model of classification, recognition, and repetition priming. Journal of Cognitive Neuroscience, 13, 648–669. Kinder, A., & Shanks, D. R. (2003). Neuropsychological dissociations between priming and recognition: A single-system connectionist account. Psychological Review, 110, 728–744. Kirov, R., Weiss, C., Siebner, H. R., Born, J., & Marshall, L. (2009). Slow oscillation electrical brain stimulation during waking promotes EEG theta activity and memory encoding. Proceedings of the National Academy of Sciences USA, 106, 15460–15465. Kirsner, K., Speelman, C. P., & Schofield, P. (1993). Implicit memory and skill acquisition: Is synthesis possible? In M. E. J. Masson & P. Graf (Eds.), Implicit Memory: New directions in cognition, development, and neuropsychology (pp. 119–140).
Hillsdale, NJ: Erlbaum. Kolers, P. A., & Roediger, H. L., III (1984). Procedures of mind. Journal of Verbal Learning and Verbal Behavior, 23, 425–449. Kuhl, B. A., Dudukovic, N. M., Kahn, I., & Wagner, A. D. (2007). Decreased demands on cognitive control reveal the neural processing benefits of forgetting. Nature Neuroscience, 10, 908–914. Kuhl, B. A., Kahn, I., Dudukovic, N. M., & Wagner, A. D. (2008). Overcoming suppression in order to remember: Contributions from anterior cingulate and ventrolateral prefrontal cortex. Cognitive Affective and Behavioral Neuroscience, 8, 211–221. Landauer, T. K., & Bjork, R. A. (1978). Optimum rehearsal patterns and name learning. In M. M. Gruneberg, P. E. Morris, & R. N. Sykes (Eds.), Practical aspects of memory (pp. 625–632). London: Academic Press. Levy, B. A. (1993). Fluent rereading: An implicit indicator of reading skill development. In M. E. J. Masson & P. Graf (Eds.), Implicit memory: New directions in cognition, development, and neuropsychology (pp. 49–73). Hillsdale, NJ: Erlbaum.
Levy, B. A., & Kirsner, K. (1989). Reprocessing text: Indirect measures of word and message level processes. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 407–417. Levy, B. J., & Anderson, M. C. (2002). Inhibitory processes and the control of memory retrieval. Trends in Cognitive Sciences, 6, 299–305. Levy, B. J., McVeigh, M. D., Marful, A., & Anderson, M. C. (2007). Inhibiting your native language: The role of retrieval-induced forgetting during second-language acquisition. Psychological Science, 18, 29–34. Lockhart, R. S. (2002). Levels of processing, transfer-appropriate processing, and the concept of robust encoding. Memory, 10, 397–403. MacLeod, C. M. (1989). Word context during initial exposure influences degree of priming in word fragment completion. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 398–406. Mayes, A. (2000). Selective memory disorders. In E. Tulving & F. I. M. Craik (Eds.), Oxford handbook of memory (pp. 427–440). New York: Oxford University Press. Mecklinger, A., & Jäger, T. (2009). Episodic memory storage and retrieval: Insights from electrophysiological measures. In F. Rösler, C. Ranganath, B. Röder, & R. H. Kluwe (Eds.), Neuroimaging of human memory: Linking cognitive processes to neural systems (pp. 357–381). New York: Oxford University Press. Moscovitch, M. (1992). Memory and working-with-memory: A component process model based on modules and central systems. Journal of Cognitive Neuroscience, 4, 257–267. Moscovitch, M. (2000). Theories of memory and consciousness. In E. Tulving & F. I. M. Craik (Eds.), Oxford handbook of memory (pp. 609–626). New York: Oxford University Press. Moscovitch, M. (2008). The hippocampus as a “stupid,” domain-specific module: Implications for theories of recent and remote memory, and of imagination. Canadian Journal of Experimental Psychology, 62, 62–79. Neves, G., Cooke, S. F., & Bliss, T. V. (2008). 
Synaptic plasticity, memory, and the hippocampus: A neural network approach to causality. Nature Reviews Neuroscience, 9, 65–75. Niessing, J., Ebisch, B., Schmidt, K. E., Niessing, M., Singer, W., & Galuske, R. A. W. (2005). Hemodynamic signals correlate tightly with synchronized gamma oscillations. Science, 309, 948–951. Nobre, A. C., Allison, T., & McCarthy, G. (1994). Word recognition in the human inferior temporal lobe. Nature, 372, 260–263. Norman, K. A., Quamme, J. R., & Newman, E. L. (2009). Multivariate methods for tracking cognitive states. In F. Rösler, C. Ranganath, B. Röder, & R. H. Kluwe (Eds.), Neuroimaging of human memory: Linking cognitive processes to neural systems (pp. 299–329). New York: Oxford University Press. Nyberg, L. (2002). Levels of processing: A view from functional brain imaging. Memory, 10, 345–348. Oliphant, G. W. (1983). Repetition and recency effects in word recognition. Australian Journal of Psychology, 35, 393–403.
Osgood, C. E., & Hoosain, R. (1974). Salience of the word as a unit in the perception of language. Perception and Psychophysics, 15, 168–192. Osipova, D., Takashima, A., Oostenveld, R., Fernández, G., Maris, E., & Jensen, O. (2006). Theta and gamma oscillations predict encoding and retrieval of declarative memory. Journal of Neuroscience, 26, 7523–7531. Otten, L. J., Quayle, A. H., Akram, S., Ditewig, T. A., & Rugg, M. D. (2006). Brain activity before an event predicts later recollection. Nature Neuroscience, 9, 489–491. Paller, K. A. (1990). Recall and stem-completion priming have different electrophysiological correlates and are modified differentially by directed forgetting. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 1021–1032. Paller, K. A., & Kutas, M. (1992). Brain potentials during memory retrieval provide neurophysiological support for the distinction between conscious recollection and priming. Journal of Cognitive Neuroscience, 4, 375–391. Paller, K. A., & Wagner, A. D. (2002). Observing the transformation of experience into memory. Trends in Cognitive Sciences, 6, 93–102. Poldrack, R. A. (2006). Can cognitive processes be inferred from neuroimaging data? Trends in Cognitive Sciences, 10, 59–63. Rajaram, S., Srinivas, K., & Travers, S. (2001). The effects of attention on perceptual implicit memory. Memory and Cognition, 29, 920–930. Rajaram, S., & Travers, S. (2005). Deselection effects in long-term memory. In N. Ohta, C. MacLeod, & B. Uttl (Eds.), Dynamic cognitive processes (pp. 191–217). Tokyo: Springer-Verlag. Ramponi, C., Richardson-Klavehn, A., & Gardiner, J. M. (2004). Level of processing and age affect involuntary conceptual priming of weak but not strong associates. Experimental Psychology, 51, 159–164. Ramponi, C., Richardson-Klavehn, A., & Gardiner, J. M. (2007). Component processes of conceptual priming and associative cued recall: The roles of preexisting representation and depth of processing.
Journal of Experimental Psychology: Learning, Memory, and Cognition, 33, 843–862. Richardson-Klavehn, A. (2010). Priming, automatic recollection, and control of retrieval: Toward an integrative retrieval architecture. In J. H. Mace (Ed.), The act of remembering (pp. 111–179). New York: Wiley-Blackwell. Richardson-Klavehn, A., Bergström, Z. M., Magno, E., Markopoulos, G., Sweeney-Reed, C. M., & Wimber, M. (2009). On the intimate relationship between neurobiology and function in the theoretical analysis of human learning and memory. In F. Rösler, C. Ranganath, B. Röder, & R. H. Kluwe (Eds.), Neuroimaging of human memory: Linking cognitive processes to neural systems (pp. 127–165). New York: Oxford University Press. Richardson-Klavehn, A., & Bjork, R. A. (1988). Measures of memory. Annual Review of Psychology, 39, 475–543. Richardson-Klavehn, A., & Bjork, R. A. (2002). Memory, long-term. In L. Nadel (Ed.), Encyclopedia of cognitive science (Vol. 2, pp. 1096–1105). London: Nature Publishing Group.
Richardson-Klavehn, A., & Gardiner, J. M. (1995). Retrieval volition and memorial awareness in stem completion: An empirical analysis. Psychological Research, 57, 166–178. Richardson-Klavehn, A., & Gardiner, J. M. (1996). Cross-modality priming reflects conscious memory, but not voluntary memory. Psychonomic Bulletin and Review, 3, 238–244. Richardson-Klavehn, A., & Gardiner, J. M. (1998). Depth-of-processing effects on priming in word-stem completion: Tests of the voluntary-contamination, lexical-processing, and conceptual-processing hypotheses. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 593–609. Richardson-Klavehn, A., Gardiner, J. M., & Java, R. I. (1996). Memory: Task dissociations, process dissociations and dissociations of consciousness. In G. Underwood (Ed.), Implicit cognition (pp. 85–158). Oxford, UK: Oxford University Press. Richardson-Klavehn, A., Gardiner, J. M., & Ramponi, C. (2002). Level of processing and the process-dissociation procedure: Elusiveness of null effects on estimates of automatic retrieval. Memory, 10, 349–364. Roediger, H. L., III, Gallo, D. A., & Geraci, L. (2002). Processing approaches to cognition: The impetus from the levels-of-processing framework. Memory, 10, 319–332. Roediger, H. L., III, & McDermott, K. B. (1993). Implicit memory in normal human subjects. In F. Boller & J. Grafman (Eds.), Handbook of neuropsychology (Vol. 8, pp. 63–131). Amsterdam, The Netherlands: Elsevier. Rugg, M. D. (1995). ERP studies of memory. In M. D. Rugg & M. G. H. Coles (Eds.), Electrophysiology of mind: Event-related brain potentials and cognition (pp. 132–170). Oxford, UK: Oxford University Press. Rugg, M. D. (2009). Functional neuroimaging and cognitive theory. In F. Rösler, C. Ranganath, B. Röder, & R. H. Kluwe (Eds.), Neuroimaging of human memory: Linking cognitive processes to neural systems (pp. 443–450). New York: Oxford University Press. Rugg, M. D., Johnson, J. D., Park, H., & Uncapher, M. R. (2008).
Encodingretrieval overlap in human episodic memory: A functional neuroimaging perspective. In W. S. Sossin, J.-C. Lacaille, V. F. Castellucci, & S. Belleville (Eds.), Progress in brain research (Vol. 169, pp. 339–352). Amsterdam, The Netherlands: Elsevier. Schacter, D. L. (1987). Implicit memory: History and current status. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13, 501–518. Schacter, D. L., & Buckner, R. L. (1998a). On the relations among priming, conscious recollection, and intentional retrieval: Evidence from neuroimaging research. Neurobiology of Learning and Memory, 70, 284–303. Schacter, D. L., & Buckner, R. L. (1998b). Priming and the brain. Neuron, 20, 185–195. Schacter, D. L., Wig, G. S., & Stevens, W. D. (2007). Reductions in cortical activity during priming. Current Opinion in Neurobiology, 17, 171–176.






25

Age-Related Changes in the Episodic Simulation of Past and Future Events

Daniel L. Schacter, Brendan Gaesser, and Donna Rose Addis

A great deal of modern memory research is aimed at uncovering the processes and mechanisms responsible for the encoding, storage, and retrieval of various kinds of information. Although this mechanistic orientation has been, and continues to be, highly productive, one undesirable consequence is that memory researchers have sometimes underemphasized or even ignored questions concerning the adaptive functions that memory serves. Robert Bjork is notable among contemporary researchers for a long-standing emphasis on both the mechanisms and functions of memory. To take just a couple of examples, Bjork’s influential analyses of directed forgetting and retrieval-induced forgetting not only have helped to clarify the nature of inhibitory processes that are critical to mechanistic explanations of these phenomena, but have also emphasized the adaptive functions served by such inhibitory (and related) processes, functions that are critical for using memory effectively in everyday life (e.g., Bjork, 1978, 1989; Storm, Bjork, & Bjork, 2008).

In line with the focus on adaptive functions of memory pioneered by Bjork and colleagues, we have recently begun to examine what we believe to be a key adaptive feature of memory: its role in allowing us to imagine and simulate future events. While researchers have traditionally emphasized elucidating memory’s role in preserving and recovering past events, a rapidly growing number of recent studies have begun to examine how memory is used to build simulations of possible future
events (for reviews, see Buckner & Carroll, 2007; Schacter, Addis, & Buckner, 2007, 2008; Suddendorf & Corballis, 2007; Szpunar, 2010). Using past experience to anticipate possible future happenings should be adaptive, in that it allows individuals to mentally “try out” alternative approaches to an upcoming situation without having to expend the resources necessary to engage in the actual behaviors. Strikingly, recent research has uncovered a number of commonalities in the processes that support remembering the past and imagining the future. For example, cognitive studies indicate that manipulating variables such as valence and temporal distance from the present (e.g., D’Argembeau & van der Linden, 2004) and contextual vividness (Szpunar & McDermott, 2008) has a similar effect on the subjective experience of past and future events, as do individual differences in visual imagery abilities and emotion regulation strategies (D’Argembeau & van der Linden, 2006). Neuropsychological evidence reveals that amnesic patients exhibit deficits in imagining future or novel events (Hassabis, Kumaran, Vann, & Maguire, 2007b; Klein, Loftus, & Kihlstrom, 2002; see also, Tulving, 1985), and research with psychopathological populations, such as depression (e.g., Williams et al., 1996) and schizophrenia (e.g., D’Argembeau, Raffard, & Van der Linden, 2008), reveals similar impairments when such patients remember the past and imagine the future. Neuroimaging studies show that many of the brain regions that are active when remembering the past are similarly active when imagining the future (Addis, Pan, Vu, Laiser, & Schacter, 2009; Addis, Wong, & Schacter, 2007; Botzung, Dankova, & Manning, 2008; Hassabis, Kumaran, & Maguire, 2007; Okuda et al., 2003; Szpunar, Watson, & McDermott, 2007; for a meta-analysis, see Spreng, Mar, & Kim, 2009). 
We and others have argued that these regions—most notably, medial temporal lobe including parahippocampal cortex and hippocampus, posterior cingulate/retrosplenial cortex, inferior parietal lobule, as well as medial prefrontal and lateral temporal cortices—constitute a core network used in remembering the past, imagining the future, and related forms of mental simulation (cf., Addis, Pan, et al., 2009; Addis et al., 2007; Buckner & Carroll, 2007; Hassabis, Kumaran, & Maguire, 2007; Schacter et al., 2007, 2008; Spreng et al., 2009). Based on the above commonalities between remembering the past and imagining the future, Schacter and Addis (2007a, 2007b, 2009) proposed the constructive episodic simulation hypothesis. This hypothesis takes an adaptive approach to understanding constructive aspects of memory—distortions such as source misattribution and false recognition, where elements of prior experience may be confused (e.g., Schacter, 1999)—by linking them to memory’s role in simulating future events.

The constructive episodic simulation hypothesis holds that past and future events draw on similar information and rely on similar underlying processes, and that episodic memory contributes to the construction of future events by extracting and recombining stored information into a simulation of a novel event. From this perspective, a critical function of a constructive memory system is to make information available for simulation of future events. While this adaptive function allows past information to be used flexibly when simulating alternative future scenarios, this very flexibility of memory may also result in vulnerability to source misattribution, false recognition, and related memory errors. We have recently begun to apply the constructive episodic simulation hypothesis to the study of cognitive aging. Though there has been a great deal of research on various aspects of memory in older adults, the same cannot be said of research on the simulation of future events. In fact, when we began our work on constructive episodic simulation, we were struck by the virtual absence of relevant data concerning future event simulation in older adults. While there have been numerous studies on prospective memory in older adults (for recent reviews, see Einstein & McDaniel, 2005; Kliegel, Jager, & Phillips, 2008), they have focused on remembering intentions to perform future acts, rather than on the process of simulating possible future events. Spreng and Levine (2006) reported a study that touched on an aspect of future event simulation in older adults, demonstrating similarities in the shapes of the temporal distributions of past and possible future autobiographical events provided by college students and older adults. During the past few years, we have conducted experiments that examine directly how aging affects the processes underlying future event simulation, and compared these effects to age influences on autobiographical memories. 
In this chapter, we will review and discuss our studies in light of the constructive episodic simulation hypothesis. As we will see, our work reveals robust changes in future event simulation in healthy older adults, as well as in patients in the early stages of Alzheimer’s disease. These findings support the constructive episodic simulation hypothesis, but at the same time raise a number of conceptual challenges that we consider in relation to theoretical accounts of cognitive aging.

Aging and Future Event Simulation: Experimental Evidence

As noted earlier, it is well documented that normal aging is associated with a decline in various aspects of episodic memory (e.g., Craik
& Salthouse, 2000). While many of these deficits have been documented on tests involving word lists and related material, there is also evidence of age-related declines for autobiographical memories of everyday experiences: Older adults sometimes exhibit less specific memories of past experiences than do younger adults. Levine, Svoboda, Hay, Winocur, and Moscovitch (2002) documented this point using a procedure they called the autobiographical interview (AI). During the AI, young adults and older adults were asked to recall past personal events in response to probes over a period of five minutes. Transcriptions were segmented into distinct details that were classified as either internal (episodic) or external (semantic). Internal details involve the “episodic core” of the retrieved experience: specific details concerning who, what, where, and when. External details involve related facts, elaborations, or references to other events. Levine and colleagues found that when remembering past experiences during the AI, older adults generated fewer internal and more external details than younger adults. Because the constructive episodic simulation hypothesis posits a close relationship between episodic memory and imagining future events, the procedures and findings of Levine and colleagues provide a useful basis for initiating studies concerning future event simulation in older adults and testing the constructive episodic simulation hypothesis. We have completed four studies using the AI to examine remembering the past and imagining the future in healthy older adults and Alzheimer’s disease (AD) patients, which we now consider in turn.
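The internal/external scoring scheme described above can be illustrated with a minimal sketch. It assumes a human rater has already segmented a transcript and labelled each detail; the `Detail` class, the label strings, and the example transcript are hypothetical and are not taken from the published AI scoring manual, which defines finer-grained subcategories.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical labels for the two top-level AI categories.
INTERNAL = "internal"  # episodic core: who, what, where, when of the central event
EXTERNAL = "external"  # related facts, elaborations, references to other events


@dataclass
class Detail:
    text: str
    category: str  # INTERNAL or EXTERNAL, assigned by a human rater


def tally_details(details):
    """Return (internal_count, external_count) for one scored transcript."""
    counts = Counter(d.category for d in details)
    unknown = set(counts) - {INTERNAL, EXTERNAL}
    if unknown:
        raise ValueError(f"unrecognised categories: {sorted(unknown)}")
    return counts[INTERNAL], counts[EXTERNAL]


# Invented transcript, for illustration only.
transcript = [
    Detail("we met at the harbour cafe", INTERNAL),
    Detail("it was a Saturday morning", INTERNAL),
    Detail("my sister ordered tea", INTERNAL),
    Detail("I generally dislike crowded places", EXTERNAL),
    Detail("we used to go there every summer", EXTERNAL),
]

internal_count, external_count = tally_details(transcript)
print(internal_count, external_count)  # prints: 3 2
```

The per-event counts produced this way are the kind of scores that are then averaged within a condition and compared across age groups.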

Healthy Aging and Constructive Episodic Simulation: Initial Observations

We began by asking a simple question: Do the findings of Levine et al. (2002) showing reduced episodic specificity of autobiographical memories with age also characterize imagined future events? It is possible that the pattern observed by Levine and collaborators reflects processes uniquely associated with memory for actual autobiographical experiences, but based on the constructive episodic simulation hypothesis, we predicted that age-related changes for remembered past events would extend to imaginary future events. To assess this hypothesis, we developed an adapted version of the AI (Addis, Wong, & Schacter, 2008). Sixteen younger and older adults (in this experiment and the others from our lab discussed in
this chapter, the mean age for younger adults was early twenties and for older adults was early seventies) generated eight events in each of four conditions (past few weeks, past few years, next few weeks, and next few years) in response to a cue word. During each trial, participants were instructed to recall or imagine an event, and to generate as much detail as possible within three minutes. The event generated in response to a cue word did not have to strictly involve the named object, but each event had to be temporally and contextually specific (i.e., episodic), occurring over minutes or hours but not more than one day. Future events had to be plausible given the participant’s plans, and novel, that is, not previously experienced by the participant. The relevant cue word was displayed on a computer screen along with the task instruction (“recall past event” or “imagine future event”) and time period for the duration of the trial. If needed, general probes were given to clarify instructions and encourage further description of details. Recordings of remembered and imagined events were transcribed, and four events from each condition were scored by three raters blind to group membership, using the standardized AI scoring procedure (Levine et al., 2002) in which details were categorized as internal (episodic information relating to the central event) or external (nonepisodic information, including semantic details, extended events, and repetitions, as well as episodic information unrelated to the central event). Interrater reliability was high (Cronbach’s alpha: internal, .96; external, .92). The key results of the experiment are depicted in Figure 25.1. Replicating the findings first reported by Levine et al., we observed that older adults produced fewer internal and more external details than young adults for remembered past events (Figure 25.1a). Critically, we found the identical pattern for imagined future events. 
Note that these age differences in past and future events cannot be explained by group differences in the temporal distance of events: age groups did not differ significantly for temporal distance (in weeks) from the present for either past or future events. In addition to the similar effects of aging on the memory and imagination tasks, we observed that past and future scores were highly positively correlated with each other for both internal and external details (Figure 25.1b). By contrast, internal and external detail scores were uncorrelated with one another. Overall, then, these initial results indicate that aging has similar effects on remembering the past and imagining the future, consistent with predictions made by the constructive episodic simulation hypothesis.
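Two statistics carry much of the argument in this section: the interrater reliability of the detail scores (Cronbach's alpha) and the correlation between past and future scores. The sketch below shows one conventional way to compute each using only the Python standard library. It assumes that raters are treated as the "items" in the alpha formula and that sample variances are used; the chapter does not specify these choices, and all numbers are invented.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(ratings_by_rater):
    """Cronbach's alpha, treating each rater as an item.

    ratings_by_rater: one list per rater, each holding that rater's
    score for the same sequence of events.
    """
    k = len(ratings_by_rater)
    if k < 2:
        raise ValueError("need at least two raters")
    n = len(ratings_by_rater[0])
    if any(len(r) != n for r in ratings_by_rater):
        raise ValueError("all raters must score the same events")
    # Summed score per event across raters.
    totals = [sum(r[i] for r in ratings_by_rater) for i in range(n)]
    item_variance = sum(variance(r) for r in ratings_by_rater)
    return k / (k - 1) * (1 - item_variance / variance(totals))


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented internal-detail counts from three raters for four events.
internal_by_rater = [
    [10, 14, 8, 20],
    [11, 13, 9, 19],
    [9, 15, 8, 21],
]
print(round(cronbach_alpha(internal_by_rater), 2))

# Invented per-participant mean internal scores for past and future events.
past_internal = [42, 35, 50, 28, 31]
future_internal = [38, 30, 47, 25, 33]
print(round(pearson_r(past_internal, future_internal), 2))
```

A high alpha indicates that the raters rank events similarly, and a high past-future correlation indicates that participants who give detailed memories also tend to give detailed simulations.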

[Figure 25.1, image not reproduced. Panel (a): bar chart of the mean number of details for young and old adults, shown separately for the Internal Detail Score and the External Detail Score across past week, past year, next week, and next year. Panel (b): scatterplots for young and older adults relating the mean number of internal past details to internal future details (r = 0.82) and the mean number of external past details to external future details (r = 0.65).]

Figure 25.1 Results from an experiment by Addis et al. (2008) showing (A) the mean number of internal and external details generated by younger and older adults for past and future events as a function of time period (error bars represent standard errors of the mean), and (B) correlations between the mean number of internal (left panel) and external (right panel) details for past and future events. (Adapted from Addis, D. R., Wong, A. T., & Schacter, D. L., Psychological Science, 19, 33–41, 2008.)

Pathological Aging: Future Event Simulation in Alzheimer’s Disease

The results of our initial study revealed tightly correlated age-related declines in remembering the past and imagining the future. As noted earlier in the chapter, the core network common to these functions includes medial and lateral temporal, prefrontal, and parietal regions. Intriguingly, these same regions are ones that exhibit pathology early in Alzheimer’s disease (Buckner et al., 2005; McKee et al., 2006), which suggested to us that AD patients should exhibit similar
deficits in remembering the past and imagining the future. To test this hypothesis, Addis, Sacchetti, Ally, Budson, and Schacter (2009) used an adapted version of the AI from our initial study of healthy older adults that maintained the main features of the design described earlier, but was simplified to make it appropriate for AD patients (e.g., patients and controls were asked to remember or imagine events from only the last few months or next few months; five past and five future trials instead of eight were administered). Scoring of internal and external details for past and future events followed the same protocol as described earlier. The study participants included 16 patients with a diagnosis of mild AD and 16 age-matched older controls with no significant history of other neurological or psychiatric impairment. Temporal distance of past and future events from the present did not differ between groups, indicating that any group differences on the AI cannot be accounted for by differences in temporal distance. Figure 25.2 displays the main results. Compared with healthy older adults, AD patients showed a marked decrease in internal details for both past and future events, and a more modest decrease in external details for both event types (Figure 25.2a). Given the overall reduction in detail for the AD patients, it becomes important to determine whether these differences are attributable to more general problems in verbal fluency that are often observed in AD patients. Indeed, we found that our patients were severely impaired on a semantic fluency test that requires participants to generate members of various categories (animals, fruits, and vegetables), and were nonsignificantly impaired on a phonemic fluency task that requires generating words beginning with F, A, and S. 
When we controlled for these deficits by including phonemic and category fluency as covariates in the analysis of internal and external details, we found that the group difference in internal details for past and future events was not affected, whereas the smaller difference in external details was even further reduced. Furthermore, as in our previous study, there were positive correlations between the past and future internal (.60) and external (.64) AI scores (Figure 25.2b), and these significant correlations were evident even when controlling for fluency differences and overall cognitive decline in AD patients. The main message of this study, then, is that mild AD patients exhibit comparable deficits in generation of episodic details for both past and future events that cannot be attributed to more general cognitive decline. These results are generally in line with the constructive episodic simulation hypothesis and are consistent with the findings described earlier that even mild AD patients show pathology in core network regions that are implicated in remembering the past and imagining the future.
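The logic of "controlling for" fluency when correlating past and future scores can be illustrated with a first-order partial correlation. This is a simplified sketch with a single covariate, not the authors' analysis, which involved more than one covariate and an analysis of covariance for the group comparison; the function names and data are invented for illustration.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    rxy = pearson_r(x, y)
    rxz = pearson_r(x, z)
    ryz = pearson_r(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Constructed example: past and future scores share variance only
# through the covariate (here standing in for a fluency score).
fluency = [1, 1, -1, -1]
past = [2, 0, 0, -2]      # fluency plus noise unrelated to fluency
future = [2, 0, -2, 0]    # fluency plus different, unrelated noise

print(round(pearson_r(past, future), 2))           # raw correlation: 0.5
print(round(abs(partial_r(past, future, fluency)), 2))  # vanishes once fluency is removed: 0.0
```

In this constructed case the raw correlation disappears once the covariate is partialled out. The finding reported above is the opposite pattern: the past-future correlations remained significant after fluency and overall cognitive decline were taken into account, which is why they are not attributed to a general deficit.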

[Figure 25.2, image not reproduced. Panel (a): bar chart of the mean number of internal and external details for past and future events in AD patients and healthy older controls (OC). Panel (b): scatterplots for OC and AD participants relating mean past and mean future detail scores, shown separately for internal and external details.]

Figure 25.2 Results from an experiment showing (A) the mean number of internal and external details generated for past and future events by patients with Alzheimer’s disease (AD) and healthy older controls (OC) (error bars represent standard errors of the mean), and (B) correlations between the mean number of internal (left panel) and external (right panel) details for past and future events. (Adapted from Addis, D. R., Sacchetti, D. C., Ally, B. A., Budson, A. E., & Schacter, D. L., Neuropsychologia, 47, 2660–2671, 2009b.)

Investigating Age-Related Changes With an Experimental Recombination Task

While the data from our initial studies support our key hypotheses, they leave open a number of important issues. We addressed two of them in a subsequent study of healthy older adults (Addis, Musicaro, Pan, & Schacter, 2010). First, following from the constructive episodic simulation hypothesis, we have argued that deficits in imagining future events among healthy older adults and AD patients reflect an impairment in retrieving details from prior episodes and recombining them into a novel
imaginary scenario. But this is not the only reason why such changes might occur, or indeed why the past and future tasks were so strongly correlated. Another possibility is that similarities between remembering the past and imagining the future occur because younger and older adults “recast” entire memories of past events into the future. For example, when asked to imagine a future event involving a table, a participant might remember an incident that occurred within the last few weeks and then simply imagine the same episode occurring again within the next few weeks. While we encouraged subjects to generate novel imaginary events, the adapted AI paradigm used in our initial study did not allow us to control experimentally whether subjects recombined or recast past events. A second issue left open by our early results concerns whether the deficit documented for imagining future events reflects a more general deficit in imagining or simulating any detailed episodic event, regardless of temporal direction. For example, brain activity or psychological characteristics attributed to future events could equally well be attributed to imagined events, irrespective of whether those events refer to the future, the past, or the present. Remembered events, of course, must refer to the past, but it is also possible to imagine events that might have occurred in one’s personal past. To address this issue in the context of aging, it is necessary to compare directly imagining the future and imagining the past. We addressed the foregoing issues by adapting an experimental recombination paradigm that we had first developed for a neuroimaging study of young adults (Addis, Pan, et al., 2009). In this paradigm, participants first provide a set of autobiographical memories, each composed of a person, place, and object. They later return for a separate session in which they are cued to recall some of these episodes in as much detail as possible. 
For the imagination trials, details concerning person, place, and object are experimentally recombined from different events, and participants are asked to imagine an event involving the recombined person, place, and object. To examine questions concerning temporal direction of simulated events, participants are asked on some trials to imagine an event that might occur in the future, and on other trials to imagine an event that might have occurred in the past. The fMRI data from this experimental recombination study revealed activation of the same core network of regions documented in our earlier study of remembering the past and imagining the future; moreover, the same pattern of increased activity for imagining was evident for both the imagined future and the imagined past (Addis, Pan, et al., 2009).

Applying this approach to aging (Addis et al., 2010), we predicted based on the constructive episodic simulation hypothesis that even under conditions of experimental recombination that do not allow the recasting of entire remembered events as imagined events, older adults should still exhibit reduced internal and increased external details for imagined events that parallel the findings for remembered events. Further, since the constructive episodic simulation hypothesis holds that the processes of retrieving and recombining event details thought to be affected by aging are involved in both imagined future and imagined past events, similar age differences should be observed for imagined future and imagined past events. In an initial session, young and healthy elderly participants retrieved memories of 35 events in response to a list of event cues. All events had to be from the past five years, and specific in time and place (i.e., lasting no longer than one day). Participants devised a brief title for each event and specified three details from each memory: a person (other than themselves), an object featuring in the memory, and the location at which the event occurred. These details were used to create events for the AI trials, conducted one to three weeks after the initial session. The stimuli for the AI session each contained a person, place, and object event detail, and the corresponding title of the memory related to each detail. For the past-recall trials, each stimulus set comprised the person, place, and object from a single event. For past- and future-imagine trials, stimulus sets comprised person, place, and object details taken all from the same event, or randomly recombined from either two or three events. We included this recombination load manipulation to allow examination of possible effects of the number of memories drawn upon in a simulation. 
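The recombination load manipulation described above can be sketched as a small stimulus-building routine. This is a hypothetical illustration of the design, not the authors' software: the `Memory` class, the `recombine` function, and the example memories are invented, and the rule used to decide which source event contributes which detail is an assumption.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    title: str
    person: str
    place: str
    obj: str


def recombine(memories, load, rng):
    """Build one stimulus set whose details come from `load` distinct memories.

    load = 1 keeps person, place, and object from a single event;
    load = 2 or 3 mixes details across two or three events.
    Returns (person, place, obj, source_titles), where source_titles
    gives the originating memory for each of the three details.
    """
    if load not in (1, 2, 3):
        raise ValueError("load must be 1, 2, or 3")
    if len(memories) < load:
        raise ValueError("not enough memories for this load")
    sources = rng.sample(memories, load)
    # Guarantee every sampled source contributes at least one detail,
    # then fill any remaining slot from a randomly chosen source.
    assignment = list(range(load)) + [rng.randrange(load) for _ in range(3 - load)]
    rng.shuffle(assignment)
    person = sources[assignment[0]].person
    place = sources[assignment[1]].place
    obj = sources[assignment[2]].obj
    titles = tuple(sources[i].title for i in assignment)
    return person, place, obj, titles


# Invented memories, for illustration only.
memories = [
    Memory("Graduation dinner", "Aunt Maria", "the harbour restaurant", "a camera"),
    Memory("Camping trip", "Tom", "the lake shore", "a tent"),
    Memory("Job interview", "Dr. Lee", "the city library", "a briefcase"),
    Memory("Birthday picnic", "Sam", "the botanical garden", "a kite"),
]

rng = random.Random(2010)  # fixed seed so the example is repeatable
for load in (1, 2, 3):
    print(load, recombine(memories, load, rng))
```

Because details at higher loads come from memories that never co-occurred, a participant cannot satisfy the cue by replaying a single remembered episode, which is the property the design relies on.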
For each trial, participants were shown a cueing slide that indicated the task to be completed and displayed the three key event details. They were instructed to recall or imagine an event and to generate as much detail about that event as possible within a three-minute time limit. For the future- and past-imagine event trials, participants were asked to imagine a plausible personal experience involving the person, location, and object specified on the cueing slide that might occur in the next five years or might have occurred (but did not actually occur) in the last five years. Protocols were scored using the AI procedure for segmenting details into internal (episodic) and external (semantic). As in our initial study, there were no age differences in temporal distance of remembered or imagined events. The key results are depicted in Figure 25.3. First, we observed the same effects of aging on generation of internal and external details under conditions of experimental recombination as in our original study: Older adults produced significantly fewer internal details and significantly more external details than did younger adults (Figure 25.3a). Second, this pattern of effects across age groups was qualitatively similar for imagined future events and imagined past events (Figure 25.3a). Third, the internal detail scores were strongly positively correlated across all three event conditions (see Figure 25.3b), and positive correlations of similar magnitude were also seen for external detail scores.

[Figure 25.3, image not reproduced. Panel (a): bar chart of the mean number of details generated by young and old adults, shown separately for internal and external details in the Past-Imagine, Future-Imagine, and Past-Recall conditions. Panel (b): three scatterplots of mean internal detail scores for young and old adults; the correlations printed on the panels, in extraction order, are r = 0.81, r = 0.71, and r = 0.78.]

Figure 25.3 Results from an experiment showing (A) the mean number of internal and external details generated for recalled past events, imagined past events, and imagined future events by young and older adults (error bars represent standard errors of the mean), and (B) correlations between the mean number of internal details for imagined past and future events (left panel), imagined future and recalled past events (middle panel), and imagined past and recalled past events (right panel). (Adapted from Addis, D. R., Musicaro, R., Pan, L., & Schacter, D. L., Psychology and Aging, 25, 369–376, 2010.)

The findings that older adults showed reduced internal and increased external details under conditions of experimental recombination that preclude recasting remembered events as imagined events, and that detail scores across memory and imagination conditions were highly correlated, indicate that our initial observations of age-related reductions in
internal details during imagined future events, and strong correlations between remembering and imagining, cannot be attributed to recasting remembered past events as imagined future events. The finding that aging had similar effects on imagined past and imagined future events indicates that the age-related reduction in imagined internal details is not specific to temporal direction, but instead reflects a broader change in the ability to imagine that might occur in the future and that might have occurred in the past. The experiment also yielded a couple of other noteworthy findings. We found a significant effect of recombination load for both age groups, which was driven by an increase in the number of details imagined when the three presented details were drawn from the same event versus when they were drawn from three different events. Compared with events simulated under low recombination load, events simulated under high recombination load were lower in both internal and external detail. Further, the rated similarity of imagined events to previous experiences declined with higher recombination load, thereby demonstrating that the recombination manipulation induces people to simulate highly novel, imaginary events. Imagined events were also scored for how cohesively the person, place, and object details were integrated into one temporally and contextually specific event (event integration). Even when all three details are included in an event that occurs over one day (thus meeting the criterion of episodic), this criterion can be achieved by describing multiple mini-events occurring in the same day, but where none of the details actually coincide in the same temporal or spatial context. 
For example, when integrating a Boston friend, a Florida location, and an umbrella, one could imagine (1) being in Miami, walking in the rain with an umbrella, and unexpectedly bumping into the Boston friend, or (2) being in Miami walking in the rain with an umbrella, then flying back to Boston and dining with the Boston friend. Thus, we scored how many of the details coincided temporally and contextually in a coherent event (coinciding details score; 3 = all three details coincide, 2 = two details coincide in one mini-event, the other detail features in a separate mini-event; 1 = no details coincide). We found that this coinciding details score was significantly lower in older adults for both imagined past and future events, but the group difference was significant only when the critical details were drawn from three past events. These observations indicate that older adults have problems constructing a unified episode on the basis of recombined event details, especially when all the details are extracted from different past events.
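
The coinciding details scoring rule described above can be sketched in code. This is a minimal illustration, not the scoring procedure actually used by the raters: it assumes (hypothetically) that each imagined event has already been hand-segmented into mini-events, each recorded as the set of cue details ("person", "place", "object") that it contains. The function name and data layout are assumptions introduced here.

```python
# Illustrative sketch only: names and data layout are assumptions,
# not the authors' scoring materials.
CUE_DETAILS = frozenset({"person", "place", "object"})


def coinciding_details_score(mini_events):
    """Return the coinciding details score (3, 2, or 1) for one imagined event.

    mini_events: iterable of sets; each set lists the cue details that
    appear together in one temporally and contextually specific mini-event.
    """
    events = [CUE_DETAILS & set(event) for event in mini_events]
    included = set().union(*events) if events else set()
    if included != CUE_DETAILS:
        # Assumption: only events that include all three cue details are scored.
        raise ValueError("all three cue details must appear in the event")
    # The score is the largest number of cue details sharing one mini-event:
    # 3 = all three coincide, 2 = two coincide, 1 = none coincide.
    return max(len(event) for event in events)


# Example (1) in the text: umbrella, Miami, and the Boston friend in one episode.
score_integrated = coinciding_details_score([{"person", "place", "object"}])
# Example (2): umbrella in Miami, then dinner with the friend back in Boston.
score_split = coinciding_details_score([{"place", "object"}, {"person"}])
```

Under these assumptions, the first example from the text receives the maximum score and the second receives the intermediate score, because the friend appears only in a separate mini-event.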

Age-Related Changes in Remembering and Imagining: How General Are They?

All of the foregoing findings from the experimental recombination task are consistent with predictions made by the constructive episodic simulation hypothesis. Nonetheless, we noted (Addis et al., 2010) that the observed age-related deficits could also be attributable to general factors outside the domain of episodic memory that impact performance on the AI. Studies from the verbal discourse literature, for instance, have shown that older adults sometimes generate more off-topic speech that is irrelevant to the assigned task than do younger adults (Arbuckle & Gold, 1993; James, Burke, Austin, & Hulme, 1998; Trunk & Abrams, 2009); there may be some overlap between external details on the AI and irrelevant or off-topic speech. Production of off-topic speech could result from age-related deficits in inhibitory control (Zacks & Hasher, 1994) or differences in narrative style and communicative goals (Coupland & Coupland, 1995) when describing personal events. Either or both of these age-related changes could be relevant to our findings on the AI. For example, older adults might be unable to inhibit off-topic items that are coded as external details, or might be more focused on conveying the general significance and meaning of experiences than younger adults (e.g., Coupland & Coupland, 1995; James et al., 1998; Labouvie-Vief & Blanchard-Fields, 1982), perhaps resulting in more external and fewer internal details during both memory and imagination. To begin examining these possibilities, we (Gaesser, Sacchetti, Addis, & Schacter, in press) have recently conducted experiments in which older and younger adults were instructed to describe a complex picture of a natural scene in as much detail as possible. 
The same pictures were used to cue either simulations of imagined events or memories of past events, and we scored the resulting protocols for internal and external details using an adapted version of the AI. We reasoned that if previously documented age-related declines in internal details during memory and imagination reflect the influence of either inhibitory deficits or changes in narrative style, then (1) similar patterns should be observed on the picture description task, and (2) no effects of aging should be observed in imagination or memory conditions after controlling for picture description performance. On the other hand, if age differences in memory and imagination revealed by the AI are explained by changes in episodic memory, then (1) age differences should be greater on the imagination and memory tasks than on the picture description task, and (2) there should be effects of aging in imagination or memory conditions even after controlling for picture description performance.

We conducted two experiments to test these ideas, but since the results were nearly identical, we focus here on Experiment 2 in Gaesser et al. (in press). In this experiment, 15 young and 15 healthy older adults were shown colored photographs that depict people engaged in a particular activity or set of activities, and were instructed either to (1) describe the different people, objects, and environment in the picture and their relationship to one another; (2) imagine events that could possibly occur in the next few years with the picture as the general setting; or (3) remember a personal experience that occurred in the last few years, using the picture as a cue to help focus on an event. There were four pictures per condition, and for each picture, participants were given three minutes to describe, imagine, or remember. For the picture description task, details that described the depicted scene were classified as internal; details that were inferred were classified as external. In contrast, for the imagination and memory tasks, details that went beyond the picture to an imagined or remembered experience were considered internal, while details that simply described the depicted scene were considered external. The main results, presented in Figure 25.4, are straightforward: Older adults generated fewer internal and more external details than younger adults in all three experimental conditions. To explore further whether there is a contribution of aging to imagination and memory performance beyond a general ability to describe events tapped by the picture description task, we conducted a series of hierarchical multiple regression analyses. There were several key outcomes of these analyses:

Figure 25.4 Results from an experiment by Gaesser, Sacchetti, Addis, and Schacter (2009, submitted) depicting the mean number of internal and external details generated by younger and older adults in the picture description, imagination, and memory conditions. Error bars represent standard errors of the means. Figure adapted from Gaesser et al., Psychology and Aging, in press.

(1) The number of internal details on the picture description task was a significant predictor of the number of internal details on the imagination and memory tasks; (2) when age was added as a predictor, it significantly—though modestly—improved the model’s ability to account for variance in memory and imagination performance; and (3) there was a small age deficit in imagination performance even after controlling for picture description and memory performance. Thus, while the results of this study highlight that nonepisodic factors contribute to age-related changes in remembering the past and imagining the future, not all age-related reductions in episodic details during memory and imagination were accounted for by picture description performance, in line with the constructive episodic simulation hypothesis.
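
The logic of the hierarchical regression analyses summarized above can be illustrated with a short sketch. It is not the authors' analysis: the data are synthetic, the variable names and effect sizes are hypothetical, and ordinary least squares computed with NumPy stands in for whatever statistical package was used. The sketch enters picture description performance first, then adds age group, and computes the change in R².

```python
import numpy as np


def r_squared(predictors, outcome):
    """R^2 of an ordinary least squares fit with an intercept."""
    design = np.column_stack([np.ones(len(outcome))] + list(predictors))
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    residuals = outcome - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((outcome - outcome.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def hierarchical_r2_change(outcome, step1, step2):
    """Fit step-1 predictors, then step-1 plus step-2; return both R^2 and the change."""
    r2_first = r_squared(step1, outcome)
    r2_second = r_squared(step1 + step2, outcome)
    return r2_first, r2_second, r2_second - r2_first


# Synthetic data (hypothetical values): 15 young (0) and 15 older (1) adults.
rng = np.random.default_rng(0)
n = 30
age_group = np.repeat([0.0, 1.0], n // 2)
description_internal = 40 - 8 * age_group + rng.normal(0, 5, n)
imagination_internal = 0.6 * description_internal - 5 * age_group + rng.normal(0, 5, n)

# Step 1: picture description performance; step 2: add age group.
r2_step1, r2_step2, delta_r2 = hierarchical_r2_change(
    imagination_internal, [description_internal], [age_group]
)
```

A complete analysis would also test whether the increment in R² is statistically significant (for example, with an F test), which this sketch omits.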

Age-Related Changes in Episodic Simulation: Conclusions and Implications

The studies reviewed here consistently show that aging is associated with parallel changes in remembering actual events that occurred in the past and simulating imaginary events that might occur in the future (or might have occurred in the past). Our findings fit well with the recently emerging literature discussed earlier that has revealed a variety of similarities at both cognitive and neural levels between remembering the past and imagining the future, and generally support the constructive episodic simulation hypothesis. We conclude by considering two general issues that seem especially worth pursuing. The first issue emerges from the aforementioned results comparing performance on the picture description task with memory and imagination tasks (Gaesser et al., in press). The fact that older adults showed largely parallel changes (reduced internal and increased external details) when describing a perceived scene, and when remembering a past event or imagining a future event, suggests that processes outside the domain of episodic memory contribute to age-related differences on the AI. From this perspective, a key task for future research is to distinguish among and further specify the nonepisodic sources of age-related performance changes during memory and imagination, such as changes in narrative style associated with age-related changes in communicative goals (Coupland & Coupland, 1995; James et al., 1998; Labouvie-Vief & Blanchard-Fields, 1982; Trunk & Abrams, 2009) or inhibitory deficits that result in the production of task-irrelevant information (Arbuckle & Gold, 1993; Arbuckle, Nohara-LeClair, & Pushkar, 2000; Zacks & Hasher, 1994). Previous studies on narrative communication in older adults have provided some support for each of these views. On the one hand, James
et al. (1998) report that older adults generated more off-topic speech than younger adults when asked to discuss autobiographical topics (describing their family, educational background, and a memorable vacation), but did not produce more off-topic speech when describing pictures for which personal information is not relevant (two unknown paintings taken from art history books and a simple black-and-white line drawing used in neuropsychological evaluations). James et al. argue that such findings provide evidence against an inhibitory deficit account of off-topic speech, which would predict more off-topic speech in both autobiographical narratives and picture descriptions. Instead, James et al. argue that off-topic speech reflects age-related changes in narrative style, with older adults more focused on communicating the broad significance of life experiences than on providing a concise narrative. Consistent with this view, they note that older adults’ narratives were rated as more interesting and informative than those of younger adults (see also Trunk & Abrams, 2009). On the other hand, and consistent with an inhibitory deficit account, several studies have found age differences in picture description (e.g., Wright, Capilouto, Wagovich, Cranfill, & Davis, 2005) as well as age-related increases in off-topic speech for nonpersonal topics (Arbuckle et al., 2000). According to the logic of James et al. (1998), our finding of age-related changes on both nonautobiographical (i.e., picture description) and autobiographical (i.e., memory and imagination) tasks provides evidence in favor of an inhibitory deficit account. Of course, such logic presupposes that any age differences in narrative style or communicative goals that might account for our results would be evident on the memory and imagination tasks but not on the picture description task, and there is no direct evidence to support such an assumption. 
Moreover, it is important to note that the measures used by James et al. (and by others who have studied narrative discourse and aging) differ substantially from ours. None of these studies used measures that correspond to internal details in our work with the AI, and perhaps more important, external details on the AI likely overlap only to some extent with what has been characterized as off-topic speech in other studies. It is also worth considering another possible interpretation of the age differences in picture description that we observed. Although the picture description task clearly does not require episodic memory, it may nonetheless recruit some of the same processes involved in constructing autobiographical memories and simulating future events, as subjects attempt to “tell the story” depicted in the picture they are viewing. While the instructions used in our picture description task indicated that participants should describe picture contents without embellishment, informal examination of protocols produced during picture description indicates that they have a narrative structure that resembles protocols from the memory and imagination tasks. To the extent that subjects approach the task as one requiring a narrative, older adults may perform differently from younger adults for some of the same reasons as in the memory and imagination conditions. Similarly, perceiving and describing a complex scene may require short-term episodic or working memory to hold and integrate components of the scene (e.g., Hannula, Tranel, & Cohen, 2006), possibly creating problems for older adults. Currently available data do not speak to these possibilities, but future studies that manipulate task demands, scene complexity, or the narrative coherence of a scene could provide useful insights.

A second issue worth noting is that while our work to date has focused on the analysis of age-related cognitive changes during remembering and imagining, it will also be important to initiate investigation of the underlying neural correlates of these changes. Many neuroimaging studies of age-related changes in memory have been reported (for review, see Grady, 2008), but we are not aware of any that have directly compared the neural correlates of remembered and imagined events in older adults. One particularly interesting question is whether such studies would reveal any age-related changes in the hippocampus during experimental paradigms like those that we have used to document hippocampal activation during remembering and imagining in young adults (for review, see Schacter & Addis, 2009). 
Given that our behavioral studies have revealed reduced internal details during remembering and imagining in both healthy older adults and mild AD patients, who typically exhibit pathology in the hippocampus and related structures (Addis, Sacchetti, et al., 2009), we might expect to see changes in hippocampal activity during remembering and imagining in healthy older adults. By contrast, it has been argued that changes in the hippocampus and other components of the medial temporal lobe are important for AD but not for healthy aging, where memory-related changes may be more closely associated with frontal-striatal systems (e.g., Buckner, 2004). From this perspective, age-related changes in neural activity during remembering and imagining might be expected to occur primarily outside the hippocampus, possibly in the frontal-striatal networks implicated in other age-related changes. Another possibility is that older adults may engage regions mediating conceptual autobiographical information, such as anterior lateral temporal cortices (Addis, McIntosh, Moscovitch, Crawley, & McAndrews, 2004; Graham, Lee, Brett, & Patterson, 2003), more so than young adults in order to support the increased generation of external details. Whatever the outcome, studies that allow us to link cognitive and neural processes when older adults remember past events and simulate possible future events should go a long way toward broadening our understanding of the nature and function of age-related changes in memory.

Acknowledgments

The research described in this chapter was supported by National Institute on Aging grant AG08441, awarded to D.L.S. D.R.A. was supported by a Royal Society of NZ Marsden Fund grant.

References

Addis, D. R., McIntosh, A. R., Moscovitch, M., Crawley, A. P., & McAndrews, M. P. (2004). Characterizing spatial and temporal features of autobiographical memory retrieval networks: A partial least squares approach. NeuroImage, 23, 1460–1471.
Addis, D. R., Musicaro, R., Pan, L., & Schacter, D. L. (2010). Episodic simulation of past and future events in older adults: Evidence from an experimental recombination task. Psychology and Aging, 25, 369–376.
Addis, D. R., Pan, L., Vu, M.-A., Laiser, N., & Schacter, D. L. (2009). Constructive episodic simulation of the future and the past: Distinct subsystems of a core brain network mediate imagining and remembering. Neuropsychologia, 47, 2222–2238.
Addis, D. R., Sacchetti, D. C., Ally, B. A., Budson, A. E., & Schacter, D. L. (2009). Episodic simulation of future events is impaired in mild Alzheimer’s disease. Neuropsychologia, 47, 2660–2671.
Addis, D. R., Wong, A. T., & Schacter, D. L. (2007). Remembering the past and imagining the future: Common and distinct neural substrates during event construction and elaboration. Neuropsychologia, 45, 1363–1377.
Addis, D. R., Wong, A. T., & Schacter, D. L. (2008). Age-related changes in the episodic simulation of future events. Psychological Science, 19, 33–41.
Arbuckle, T. Y., & Gold, D. P. (1993). Aging, inhibition, and verbosity. Journal of Gerontology: Psychological Sciences, 48, P225–P232.
Arbuckle, T. Y., Nohara-LeClair, M., & Pushkar, D. (2000). Effect of off-target verbosity on communication efficiency in a referential communication task. Psychology and Aging, 15, 65–77.
Bjork, R. A. (1978). The updating of human memory. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 12, pp. 235–259). New York: Academic Press.
Bjork, R. A. (1989). Retrieval inhibition as an adaptive mechanism in human memory. In H. L. Roediger III & F. I. M. Craik (Eds.), Varieties of memory and consciousness: Essays in honor of Endel Tulving (pp. 309–330). Hillsdale, NJ: Erlbaum Associates.

Botzung, A., Dankova, E., & Manning, L. (2008). Experiencing past and future personal events: Functional neuroimaging evidence on the neural bases of mental time travel. Brain and Cognition, 66, 202–212.
Buckner, R. L. (2004). Memory and executive function in aging and AD: Multiple factors that cause decline and reserve factors that compensate. Neuron, 44, 195–208.
Buckner, R. L., & Carroll, D. C. (2007). Self-projection and the brain. Trends in Cognitive Sciences, 11, 49–57.
Buckner, R. L., Snyder, A. Z., Shannon, B. J., LaRossa, G., Sachs, R., Fotenos, A. F., et al. (2005). Molecular, structural, and functional characterization of Alzheimer’s disease: Evidence for a relationship between default activity, amyloid, and memory. Journal of Neuroscience, 25, 7709–7717.
Coupland, N., & Coupland, J. (1995). Discourse, identity and aging. In J. F. Nussbaum & J. Coupland (Eds.), Handbook of communication and aging research (pp. 79–103). Hillsdale, NJ: Erlbaum.
Craik, F. I. M., & Salthouse, T. A. (Eds.). (2000). Handbook of aging and cognition (2nd ed.). Hillsdale, NJ: Erlbaum.
D’Argembeau, A., Raffard, S., & van der Linden, M. (2008). Remembering the past and imagining the future in schizophrenia. Journal of Abnormal Psychology, 117, 247–251.
D’Argembeau, A., & van der Linden, M. (2004). Phenomenal characteristics associated with projecting oneself back into the past and forward into the future: Influence of valence and temporal distance. Consciousness and Cognition, 13, 844–858.
D’Argembeau, A., & van der Linden, M. (2006). Individual differences in the phenomenology of mental time travel: The effects of vivid visual imagery and emotion regulation strategies. Consciousness and Cognition, 15, 342–350.
Einstein, G. O., & McDaniel, M. A. (2005). Prospective memory: Multiple retrieval process. Current Directions in Psychological Science, 14, 286–290.
Gaesser, B., Sacchetti, D. C., Addis, D. R., & Schacter, D. L. (in press). Characterizing age-related changes in remembering the past and imagining the future. Psychology and Aging.
Grady, C. L. (2008). Cognitive neuroscience of aging. Annals of the New York Academy of Sciences, 7, 127–144.
Graham, K. S., Lee, A. C. H., Brett, M., & Patterson, K. (2003). The neural basis of autobiographical and semantic memory: New evidence from three PET studies. Cognitive, Affective, and Behavioral Neuroscience, 3, 234–254.
Hannula, D. E., Tranel, D., & Cohen, N. J. (2006). The long and short of it: Relational memory impairments in amnesia, even at short lags. Journal of Neuroscience, 26, 8352–8359.
Hassabis, D., Kumaran, D., & Maguire, E. A. (2007). Using imagination to understand the neural basis of episodic memory. Journal of Neuroscience, 27, 14365–14374.
Hassabis, D., Kumaran, D., Vann, S. D., & Maguire, E. A. (2007). Patients with hippocampal amnesia cannot imagine new experiences. Proceedings of the National Academy of Sciences of the United States of America, 104, 1726–1731.

James, L. E., Burke, D. M., Austin, A., & Hulme, E. (1998). Production and perception of “verbosity” in younger and older adults. Psychology and Aging, 13, 355–367.
Klein, S. B., Loftus, J., & Kihlstrom, J. F. (2002). Memory and temporal experience: The effects of episodic memory loss on an amnesic patient’s ability to remember the past and imagine the future. Social Cognition, 20, 353–379.
Kliegel, M., Jager, T., & Phillips, L. H. (2008). Adult age differences in event-based prospective memory: A meta-analysis on the role of focal versus nonfocal cues. Psychology and Aging, 23, 203–208.
Labouvie-Vief, G., & Blanchard-Fields, I. (1982). Cognitive aging and psychological growth. Ageing and Society, 2, 183–209.
Levine, B., Svoboda, E., Hay, J. F., Winocur, G., & Moscovitch, M. (2002). Aging and autobiographical memory: Dissociating episodic from semantic retrieval. Psychology and Aging, 17, 677–689.
McKee, A. C., Au, R., Cabral, H. J., Kowall, N. W., Seshadri, S., et al. (2006). Visual association pathology in preclinical Alzheimer disease. Journal of Neuropathology and Experimental Neurology, 65, 621–630.
Okuda, J., Fujii, T., Ohtake, H., Tsukiura, T., Tanji, K., Suzuki, K., et al. (2003). Thinking of the future and the past: The roles of the frontal pole and the medial temporal lobes. NeuroImage, 19, 1369–1380.
Schacter, D. L. (1999). The seven sins of memory: Insights from psychology and cognitive neuroscience. American Psychologist, 54, 182–203.
Schacter, D. L., & Addis, D. R. (2007a). The cognitive neuroscience of constructive memory: Remembering the past and imagining the future. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 362, 773–786.
Schacter, D. L., & Addis, D. R. (2007b). The ghosts of past and future. Nature, 445, 27.
Schacter, D. L., & Addis, D. R. (2009). On the nature of medial temporal lobe contributions to the constructive simulation of future events. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 364, 1245–1253.
Schacter, D. L., Addis, D. R., & Buckner, R. L. (2007). The prospective brain: Remembering the past to imagine the future. Nature Reviews Neuroscience, 8, 657–661.
Schacter, D. L., Addis, D. R., & Buckner, R. L. (2008). Episodic simulation of future events: Concepts, data, and applications. Annals of the New York Academy of Sciences, 1124, 39–60.
Spreng, R. N., & Levine, B. (2006). The temporal distribution of past and future autobiographical events across the lifespan. Memory and Cognition, 34, 1644–1651.
Spreng, R. N., Mar, R. A., & Kim, A. S. N. (2009). The common neural basis of prospection, navigation, theory of mind, and the default mode: A quantitative meta-analysis. Journal of Cognitive Neuroscience, 21, 489–510.

Storm, B. C., Bjork, E. L., & Bjork, R. A. (2008). Accelerated relearning after retrieval-induced forgetting: The benefit of being forgotten. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34, 230–236.
Suddendorf, T., & Corballis, M. C. (2007). The evolution of foresight: What is mental time travel and is it unique to humans? Behavioral and Brain Sciences, 30, 299–313.
Szpunar, K. K. (2010). Episodic future thought: An emerging concept. Perspectives on Psychological Science, 5, 142–162.
Szpunar, K. K., & McDermott, K. B. (2008). Episodic future thought and its relation to remembering: Evidence from ratings of subjective experience. Consciousness and Cognition, 17, 330–334.
Szpunar, K. K., Watson, J. M., & McDermott, K. B. (2007). Neural substrates of envisioning the future. Proceedings of the National Academy of Sciences of the United States of America, 104, 642–647.
Trunk, D. J., & Abrams, L. (2009). Do younger and older adults’ communicative goals influence off-topic speech in autobiographical narratives? Psychology and Aging, 24, 324–337.
Tulving, E. (1985). Memory and consciousness. Canadian Psychologist, 25, 1–12.
Williams, J. M., Ellis, N. C., Tyers, C., Healy, H., Rose, G., & MacLeod, A. K. (1996). The specificity of autobiographical memory and imageability of the future. Memory and Cognition, 24, 116–125.
Wright, H. H., Capilouto, G. J., Wagovich, S. A., Cranfill, T. B., & Davis, J. E. (2005). Development and reliability of a quantitative measure of adults’ narratives. Aphasiology, 19, 263–273.
Zacks, R. T., & Hasher, L. (1994). Directed ignoring: Inhibitory regulation of working memory. In D. Dagenbach & T. H. Carr (Eds.), Inhibitory processes in attention, memory, and language (pp. 241–264). San Diego, CA: Academic Press.

Index

A
A-B A-C paired-associate tasks, 138
  learning in, 81
Abstract knowledge, 72
Abstract pictures, memory for, 451–452
Accessibility, role of inhibition in countering unwanted, 108–118
Accessible visualizations, 236
Accuracy, information about and others’ metamemories, 401–402
Acquired memory space, 53
ACT theoretical framework, 265
Activation, reminding and, 82–83
Active repetition, 200
Adaptive forgetting, 108, 153–154
  interference-dependence during selective retrieval, 109–111
  retrieval stopping and, 115–118
Adaptive functions of memory, simulation of future events and, 505–506
Adding ideas, science learning and, 240
ADHD, retrieval-induced forgetting and, 93
Age-related changes
  in remembering and imagining, 517–519
  investigation of with experimental recombination task, 512–516
Aging
  constructive episodic simulation studies, 508–510
  future event simulation and, 507–508
Alpha oscillations, lexical priming at encoding and, 483
Altered stimulating conditions, 3
Alzheimer’s disease, future event simulation in, 510–516
Amnesia, processing and encoding in patients with, 478–479
Analogy, problem solving by, 74–75
Androgens, 463
  recognition memory performance and, 464
Anticipated imminent recall. See Tip-of-the-tongue (TOT) states
Application, 200
Attention-deficit hyperactivity disorder (ADHD), retrieval-induced forgetting and, 93
Attribution errors, 15–16
Autobiographical interview (AI), use of in studies of constructive episodic simulation, 508–510
Autobiographical memories, age-related declines in, 508

B
Bayesian likelihood recognition models, 431–432
Bidimensional models, 411
Bivariate recognition and identification models, 417–423
Blocking, 11–12, 269–270
  experimentally induced in creative problem solving, 159–160
  experimentally induced in memory tasks, 156–159
Bloom’s taxonomy, 186
Brain activity during retrieval avoidance, 487–490
Brainstorming, 163

C Calibration, 403 Carryover assumption, 120–121, 125 Classification judgments, 72–73 Classroom studies, visualizations vs. typical instruction, 243 Cognition, reminding and, 72–76 Cognitive aging, constructive episodic simulation hypothesis applied to, 507 Cognitive antidote principle, 283–288, 293 Cognitive demands, 283–284 Cognitive events, 230 learning-effective, 232 Cognitively silent encoding, 478, 480 Collaborative inhibition effect, 163 Commission errors, 303–304, 312–313 Comparing ourselves to others, 395–400 Comparison, 200 Competing memories, retrieval of, 4 Competition inhibitory control and the resolution of, 94–100 overcoming, 97–98 resolution of, 136 retrieval-induced forgetting and, 89 Competition dependence, 94–96 Competitor interference, 121–122 manipulations of, 111–113 Complex selection, 241–243, 247–249 Complex systems, visualizations that support inquiry around, 237 Computational memory models, 434 Computational models of free recall, 427–428 Computer simulation models of memory, features and design issues, 434–443 Concept familiarity, relationship of with picture memory, 454–455 Concept learning, 72–73

Y109937.indb 528

Conceptual representations, 72–73 Confidence, information about and others’ metamemories, 401–402 Conformity effect, 160–161 Connectionist models of memory, 175–176 Conscious episodic recollection, 485 encoding for, 478–481 Consolidation, 143 reinforcement as, 54–55 Construct trials, 11 Constructive episodic simulation hypothesis, 506–507 future event simulation in Alzheimer’s disease, 510–516 healthy aging observations and, 508–510 implications of age-related changes for, 519–522 Content organization knowledge, effects of text expectations on, 225–227 Context, 7, 9 Contextual material, generation advantage and, 357–360 Contextual reinstatement, modeling, 441 Contextual variables, desirable difficulties and, 177–178 Continuous distractor paradigm, 220 Contracting spacing, 29 Control processes, output-bound memory accuracy and, 313–315 Convergence solution, 74 Corrective feedback, 348 memory and, 264–265 Cortical activation, 343 use of to validate metacognitions, 331 Cortisol, 465 inclusion performance and, 472 Creative problem solving experimentally induced fixation and, 159–160 incubation effects in, 167–169 overcoming fixation in, 98–100 Creativity tasks, experimentally induced fixation in, 160–163 Criterial tasks

desirable difficulties and, 179 transfer-appropriate processing and, 186 Critical error performance, 311 Critical lure, 303, 313–314 Critique, use of as a desirable difficulty, 249–250 Cross-domain calibration, 401–402 Cue overload, 91 Cue repetition, 81 Cyberlearning, investigation of visualizations using, 245

D Deceptive clarity, 235 generation activities and, 245–249 visualizations and, 237–239 Decision processes, 433 Deese–Roediger–McDermott (DRM) paradigm, 297–298, 430 output-bound accuracy in, 312–315 studies of output-bound memory accuracy using, 303–310 use of to measure output-bound memory accuracy, 299–302 Dehydroepiandrosterone (DHEA). See DHEA Delay generation effect durability and, 357–358 inhibition and, 139–142 Delay-based forgetting. See also Forgetting modeling, 440–441 Delayed feedback, 265–266 Demand-success trade-off problem, 118–119 essence of, 119–122 examples of in memory inhibition, 123–128 Depth of processing principle, 279–281, 293 Design fixation, 161–162 Desirable difficulties, 35–41, 175–176 characteristics of learners and, 180–182 contextual variables and, 177–178 core assumptions of, 178–180 critiquing activities as, 249–250

feedback and, 263–266 forgetting and, 8 generation activities as, 245–249 interleaving and, 269–271 nature of materials and, 182–185 region of proximal learning framework and, 259–261 retrieval practice and, 261–263 scientific visualizations and, 240–245 transfer-appropriate processing and, 186–194 visualizations and, 238 Determination ability, 222 Device-directed practice, programmed instruction and, 209–212 Dewey’s model of teaching and learning, 201 DHEA, 463 effect of on recognition memory, 464–465 experiment on recognition memory effects of, 465–472 functions of, 464 Difficulty, reminding and, 77–78 Directed forgetting competition and, 95 manipulations, 112 Distinctive processing, competition and, 96 Distinguishing among ideas, science learning and, 240 Distractors, effect of on learner recall, 223–225 Distributed coding, 65 Disuse framework, 1–2 Drawing tasks, scientific learning and, 246–249 Dual-code theory, 447–448 imagen and logogen codes of, 449 Dual-process models, 412–413, 433 Yonelinas, 414 Dual-process recognition decision theories, 432–434 Dual-process signal detection (DPSD) theory, 433–434, 439–440 Dynamic visualizations, educational value of, 236

E Ecological representativeness, 315–317 Education. See also Learners; Learning; Training application of psychology to, 199–200 early approaches to teaching and learning, 200–201 Guided Cognition of unsupervised learning, 229–231 implications of initial recognition processes for, 224–225 implications of initial-retrieval spacing for, 220 implications of initial-test level of processing on, 220 implications of recognition discrimination and test expectations on, 228–229 knowledge integration instructional pattern in, 239–240 laboratory work with realistic materials, 206–209 making practice more effective, 201–202 place of testing in, 217 retrieval practice in the classroom, 202–206 visualizations and, 236–237 (See also Visualizations) Effector-dependent representations, facilitation of with physical practice, 290 Effector-independent representations, facilitation of with mental practice, 290 Effortful retrieval, 9–10 Electrophysiological neuroimaging, 477 Eliciting ideas, science learning and, 240, 247–248 Embedded implicit tests, 229–231 Encoding, prestimulus neural oscillations at, 478–487 Encoding fluency, 389 Encoding instructions, output-bound accuracy and, 307 Encoding operations, 177–178 generation and, 354–356, 358–359 retrieval processes and, 187

Encoding specificity, 279 principle of, 186 Encoding variability, 9 Environmental context, 431 modeling, 440–441 Episodic memory. See also Constructive episodic simulation hypothesis age-related changes in, 517–519 metamemory judgments for tasks, 388 Episodic reminding, 82. See also Reminding Equal-variance Tanner model, 417–418 Equally spaced retrieval, expanding retrieval vs., 32–33, 35–41 Errorless retrieval, 23 Estrogens, 463 recognition memory performance and, 464 Estrone, 465 inclusion performance and, 472 Event similarity, retrieval and, 79–80 Evolution memory and, 64–65 storage capacity of memory and, 56–57 Ex-Gaussian model, 414, 419–420 Exemplars, 73 region of proximal learning framework and, 269–270 retrieval-induced forgetting and, 96 Expanding intervals, 201–202 Expanding retrieval, 31–32 desirable difficulties and, 35–41 research and controversy, 32–34 Expanding spacing, 29 Expected test characteristics, student learning and, 218 Experience-based judgments, 368 Experiment 5 (GHK5), 409 operating characteristics for, 410 Experimental recombination paradigm, investigating age-related changes using, 512–516 Explicit recall, 233 Explicit recognition, 233 Extrinsic cues, 367 Extrinsic factors, 351 generation, 354–355 Eyewitness memory

lineups, 391–392 misleading postevent information, 392 surveys on memory relating to, 394–395

F Facilitation, 121, 141–142 False and veridical recall modeling, 428 False consensus effect, 395 False memories, 297 DRM paradigm findings and, 298–299 representative design of studies of, 315–317 Featural representations, modeling of, 441 Feedback, 207, 263–266, 283–285 learning frameworks and, 260–261 Feedback inhibition, 144 Feeling-of-knowing (FOK) judgments. See also TOT-FOK relations relation of to TOT states, 332 Feeling-of-knowing (FOK) ratings, 329 Fixation, 155–156. See also Forgetting fixation hypothesis experimentally induced in creative problem solving, 159–160 experimentally induced in memory tasks, 158–159 overcoming in creative problem solving, 98–100 Flashbulb memory, 390–391 Flashcards, 210 Fluctuation model, 9 Font size, word memory and, 456–458 Foreign language learning, 202, 206–207 inhibition and, 111–112 Foresight bias, 14, 351–352 Forgetting active, 136 adaptive, 153–154 (See also Adaptive forgetting) as a facilitator of learning, 7–12 enhancing relearning through, 24–26 future predictions of, 368, 369–373 how learning and remembering contribute to, 3–7 importance of, 1–3

interplay of with remembering, 5–7 successful, 96–97 Forgetting effects, memory blocking and, 158–159 Forgetting fixation hypothesis, 155–156. See also Fixation Free recall computational models of, 427–428 picture memory and, 450 spacing effect in, 25 Free report vs. forced report tests, 299–300 fSAM model, 428, 434 encoding processes, 437 going beyond the SAM framework, 440–443 memory stores of, 436–437 proposed modifications to generalize for recognition memory, 438–440 recall processes, 437 Functional forgetting, 108 Fundamental attribution error, 15 Future event simulation, aging and, 507–508

G Gamma oscillations, lexical priming at encoding and, 483 Gaussian models, 413–414 two-dimensional, 417 General memory models, previous efforts toward developing, 429–432 Generalization, 200 Generation as a condition of learning, 352–354 making learners sensitive to as a condition of learning, 354–357 Generation activities as desirable difficulties, 179 constructivist perspective of, 200 use of to overcome deceptive clarity, 245–249 Generation effects, 200, 262–263 coding instructions and, 353 durability of, 357–358 explanations of, 354

laboratory-based research on, 206–209 Generation tasks, processing of, 187–188 Generation vs. selection, 241–243 Global matching models, 414, 429–432 Goodman-Kruskal Gamma correlation, 403 Grain size of information, 314 Guided Cognition, 218, 229–231 approach to test-driven instructional design, 232–233

H Healthy aging, constructive episodic simulation and, 508–510 Hebbian learning, 486 Hemodynamic neuroimaging, 477 Herbart’s analysis of teaching and learning, 200–201 High-confidence errors, 266 Higher cognition, reminding and, 72–76 Hindsight bias, 351–352, 369 Hippocampo-neocortical dialogue theory, 143 Hippocampus, 506 age-related changes in, 521–522 encoding in, 480 mnemonic control and downregulation of activity in, 118 Hooke memory model, 58–60 Hooke, Robert, 57–58 How we learn, 12–16 Human memory. See also Memory; Memory storage capacity of, 56–57 lifetime storage capacity of, 60–62 stability bias in, 365, 367–369 Hypermnesia, 154–155, 163

I I Can Learn program, 209–210 Illusions of competence, 351–352 Imagen codes, 449 Imagery-induced forgetting, 98 Imagining. See also Future event simulation age-related changes in, 517–519

Immediate feedback, 265–266 Implicit memory blocking, 156–159, 166 Implicit recall, 233 Implicit recognition, 233 Implicit tests, 229–231 Incubated reminiscence effects, 163–166 Incubation, 155 Incubation effects creative problem solving and, 159–160 forgetting fixation and, 157 in creative problem solving, 167–169 in memory tasks, 163–166 Incubation interval, 156 Induction, interleaving and, 271 Inferior parietal lobule, 506 Information retrieval, 13–15 Information theory, use of to model memory functions, 57 Inhibited memories, 5–7 Inhibition, 89, 90–91 creative problem solving and, 98–100 delay and, 139–142 difficult retrieval and, 114–115 evidence for, 91–93 foreign language learning and, 111–112 measures of, 142 memory and, 108, 123–128 overcoming competition and, 97–98 release of, 144 role of in countering unwanted accessibility, 108–118 two-point problem and assessment of, 125–127 Inhibition failure, 120 Inhibitory control, 136–138 demand-success trade-off problem, 118–128 resolution of competition and, 94–100 Inhibitory deficits, retrieval-induced forgetting in individuals with, 93 Inhibitory forgetting, 140–141 retrieval suppression and, 117–118 Initial learning, 187–188 Initial recognition processes, effects of on long-term memory, 221–225

Initial tests, memory modification by, 218 Initial-retrieval spacing, effects of on long-term memory, 218–220 Initial-test level of processing, effects of on long-term memory, 220 Input-bound measures of memory performance, 299–302 Instructional design, Guided Cognition approach to, 232–233 Instructional manipulation, 112 Integration, competition dependence and, 96 Interference, 4 competition dependence and, 95 manipulating demands of, 113–115 retrieval-induced forgetting and, 92 Interference-dependence demand-success trade-off problem of, 118–128 retrieval stopping and, 115–118 selective retrieval and, 108–115 Interleaving, 269–271 learning frameworks and, 260–261 Interpolated interference learning, 138 Intrinsic cues, 367 Intrinsic factors, 351 Introspection, 396 Introspective reports, 329–330 Intrusive memories, retrieval stopping and, 115–118 Item difficulty, prediction accuracy and, 376–378 Item recognition, 409 bivariate models of, 417–423 one-dimensional models of, 411–417 Item-specific attention, neuroimaging results due to, 483 Item-specific processing, competition and, 96 Item-to-context associations, 225, 227 Item-to-item associations, 225, 227

J Judgment of the rate of learning (jROL), 267 Judgments of learning, 14, 267–268, 350–352, 388–390 extrinsic factors and, 354–355

past vs. future basis of, 367 predicting those of others, 393–394 reinterpreting based on the stability bias, 382 theory-based vs. experience-based, 368

K Keyword method, 207 Knowledge integration instructional pattern, 239–240 Knowledge of results (KR), 263–264 Knowledge structures, understanding and, 75–76

L Labor-in-vain effect, 260 Laboratory studies visualizations and, 241–243 work with realistic education materials, 206–209 Lag effects, 77 Lateral temporal cortex, 506 Learners active hypothesis-testing role for, 72 characteristics of and desirable difficulty, 180–182 illusion of competence of, 351–352 mental models of ourselves as, 13–15 metacognitively sophisticated, 15–16 nature of materials and, 182–185 Learning, 12–16 conditions that enhance, 8–9 contribution of to forgetting, 3–7 desirable difficulties and, 35–41, 175–176 early approaches to, 200–201 expanding retrieval and, 32–33 expected test characteristics and, 218 extrinsic factors affecting, 351 feedback and, 263–266 forgetting as a facilitator of, 7–12 frameworks addressing, 259–261 future predictions of, 368–369, 369–373 generation as a condition of, 352–354 in A-B A-C paired-associate tasks, 81 interleaving and, 269–271

interleaving of, 11–12 intrinsic factors affecting, 351 making learners sensitive to generation, 354–357 retrieval practice paradigm and, 261–263 spacing and, 267–269 test taking and, 352 Learning events, power of tests as, 348–349 Learning for tests (TE-2), 225–229 Left frontal lobe, encoding in, 484 Legal system, reliance on metamemory of others in, 391–392 Leitner boxes, 210 Lexical decoding, 180–182 Lineups, eyewitness memory and, 391–392 List length effect, 431 List reconstruction task, 225 List strength effect, 431 Logogen codes, 449 Long-term memory effects of initial recognition processes on, 221–225 effects of initial-retrieval spacing on, 218–220 effects of initial-test level of processing on, 220 sleep and, 143 Long-term overconfidence, 379–381 Long-term potentiation (LTP), 55, 144 Long-term recall, 220 Long-term recency paradigm, 219–220 Long-term retention, retrieval schedules and, 35–41 LTP, 55

M Manipulations competitor interference, 111–113 effect of on predictions of memory, 389 fixation effects and, 156 practice task interference demands, 113–115 Markov model, 260 Massed practice, 269–270, 350

Massed testing, expanding retrieval and, 32–34 Material-appropriate analysis, 184 Medial prefrontal cortex, 506 Medial temporal lobe age-related changes in, 521–522 encoding in, 480, 483–484 role of in remembering past and imagining future, 506 Memory assessments, 366 connectionist models of, 175–176 consolidation of, 143 (See also Consolidation) constructive aspects of, 506–507 effect of interleaving on, 270–271 explicit understanding of our own, 392–395 flashbulb, 390–391 generation effect and, 262–263 inhibition and, 108 learners and, 186–189 lists of semantic associates and, 82–83 modification of by initial tests, 218 monitoring, 308–309, 366–367 performance, 187 pictures vs. words, 447–448 predicting future performance of, 388–390 regression effects of, 139 reminding and, 71 repeated materials and, 77–79 resolving power of, 134 retrieval as a modifier of, 4–5, 26–28 semantically related materials and, 79–80 tetrahedral model of, 176–177 Memory blocking. See Blocking Memory blocking and recovery paradigm, 158–159 Memory blocking effect (MBE), 166 Memory distortion, 297 Memory formation, brain activity and, 481 Memory inhibition, examples of demand-success trade-off problem in, 123–128 Memory modeling

advantages and challenges of, 428–429 general memory models, 429–432 Memory performance distinction between input- and output-bound perspectives on, 311 input-bound and output-bound measures of, 299–302 Memory processes, modeling based on, 433 Memory retrieval, 133–134 reminding and, 74–75 Memory storage capacity of, 56–57 Hooke’s estimate of, 60 Hooke’s model of, 58–60 total amount available, 60–62 Memory tasks experimentally induced fixation in, 156–159 incubation effects in, 163–166 Memory without organization, 51, 53–54 Mental contexts, modeling, 440–441 Mental fixation, 99 Mental imagery, overcoming competition during, 97–98 Mental models of ourselves as learners, 13–15 Mental practice, facilitation of effector-independent representations with, 290 Mental practice principle, 288–292, 293 Metacognition, 329 differentiating, 331–332 importance of, 366 introspection and, 330, 396–400 TOT-FOK relations in, 343–344 (See also TOT-FOK relations) Metacognitive concepts, validating, 330–331 Metacognitive mismatch items, 266 Metacognitive monitoring, output-bound memory accuracy and, 313–315 Metacognitive processes, monitoring learning using, 349–352 Metacomprehension, 177 criterial tests and, 192 desirable difficulties and, 189–194

Metamemory, 329, 387 explicit understanding of our own memory and, 392–395 eyewitness memory, 391–392 flashbulb memories, 390–391 judgments of learning, 388–390 (See also Judgments of learning) our own, 388–392 relying on others’, 400–401 Metamemory judgments, 367 Midazolam, 450 MINERVA II model, 414, 429 Mirror effects, 430, 454 Misleading postevent information, eyewitness memory and, 392 Mixture models, 413, 419 Mnemonic competitors, REM sleep and, 145 Mnemonic control, 118 Mnemonic cues, 367 Mnemonic enhancement, 80 Mnemonic recovery, delay and, 139–140 Modeling, advantages and challenges of, 428–429 Multiple representations, linking of with visualizations, 236, 244

N Name game, 203 Narrative learning, block and, 269–270 Nature of materials, desirable difficulties and, 182–185 Nelson-Denny Reading Test, 180–182 Neural mechanism hypothesis, 54–55 Neural processes of retrieval inhibition, 487–493 Neuroimaging, 477 prestimulus neural oscillations demonstrated by, 478–487 retrieval avoidance and, 487–490 New theory of disuse, 2, 4, 145, 210, 350 Nrp items, 90

O One-dimensional models of identification and recognition, 411–417 Order information measures, 226

Organizational processing, 182–183 Output interference, 91 Output-bound memory accuracy, 299–302, 311–313 underlying mechanisms of, 313–315 use of DRM paradigm to study, 303–310 Overconfidence, 351–352, 366 long-term, 379–381

P Paired-associate learning, 81, 206–207 Parahippocampal cortex, 506 Parallel error, 15–16 Paraphrasing, 199 Parsimony, consideration of in modeling, 434 Part-set cuing, 91 Pathological aging, future event simulation in Alzheimer’s disease, 510–516 Pattern dissociation, difficulty in modeling item and associate recognition due to, 431 Pattern matching, reminding and, 83 Perceptual representation systems, priming in, 484 Perceptual-lexical repetition priming, encoding for, 481–487 Performance, learning and, 13–14 Perspective imagining, 373–376 Perspective taking, stability bias and, 369 Phase-6 software, 210 Phonological processing principle, 281–282, 293 Physical practice, facilitation of effector-dependent representations with, 290 Picture memory, 448–450 effect of DHEA administration on, 464 labeling and remembering, 452–455 Picture superiority effect (PSE), 448 counterexamples to, 450–451 Picture vs. word recognition, 455–458 Pivotal cases, visualizations as, 244–245 Planning fallacy, 383

Posterior cingulate/retrosplenial cortex, 506 Practice spacing, 49–53, 64–65. See also Spacing schedules Practice task interference demands, manipulations of, 113–115 Practice tests, 200 distributed, 208 Predicting future forgetting, 369–373 Predicting future learning, 369–373 Predicting future remembering, 383, 389–390 Predicting memory performance of others, 393–395 Prefrontal cortex, 93, 115 Preparation, 200 Present-and-test approach, 209–210 Presentation, 200 modality of and output-bound accuracy, 307 Presentation formats, depth of processing principle and, 279–281 Presentation rate interaction, spacing and, 268–269 Prestimulus neural oscillations at encoding, 478–487 Presumption of calibration hypothesis, 400–401 Priming at encoding, 481–487 Proactive interference, 4, 154 Probed recall, 220 Problem solving, overcoming fixation in, 98–100 Problem solving by analogy, 74–75 Problem solving vs. remembering a solution, 10–11 Procedural account of generation effect, 354 Procedural reinstatement principle, 278–279, 293 Processing encoding and, 478–479 generation and strategies for, 358 levels of, 27 Productivity deficit, 163 Programmed instruction, device-directed practice and, 209–212 Proposition-specific processing, 179, 181, 184

criterial tasks and, 187–189 enhancement of metacomprehension by, 191–192 Prototype view, 72–73 Psychologically plausible models, 434–435 Pure Gaussian model, 414

Q Quantity measures of memory performance, 299–302

R Random storage model, 60–62 Random walk model, 53–54 Rapid eye movement (REM) sleep, offline restorative memory mechanism during, 145 Reading to recitation ratio, 203–204 Reading vs. generation, 355–357 Recall modeling false and veridical, 428 features and design issues, 434–443 Recall tests, 225 Receiver operating characteristic (ROC) curve, 222, 409 Reciprocal inhibition hypothesis, 138 Recognition decision theories, dual-process, 432–434 Recognition difficulty, 221 Recognition discrimination, effects of test expectations on, 227–229 Recognition memory, 450–451 effect of DHEA administration on, 464–465 experiment to determine effect of DHEA administration on, 465–472 global matching models applied to, 430 Recognition models Bayesian likelihood, 431–432 features and design issues, 434–443 proposed modifications to fSAM for, 438–440 Recognition tests, 225 Recollection model, 418–419 Recollection rejection, 313

Redundant generation-text type combinations, processing time of, 184 Reexposure memory blocking and, 158 retrieval-induced forgetting and, 113 Reflecting on ideas, science learning and, 240 Region of proximal learning framework desirable difficulties and, 259–261 feedback and, 263–266 interleaving and, 269–271 retrieval practice and, 261–263 stop rule, 267–268 Regression effects, spontaneous recovery and, 138–139 Rehearsal-preventing tasks, 218 Reinforcement, as consolidation, 54–55 Relational processing, 179, 182–183, 184 criterial tasks and, 187–189 Relearning, forgetting to enhance, 24–26 Release of inhibition, 144 Reloading hypothesis, 11–12 Remembering age-related changes in, 517–519 contribution of to forgetting, 3–7 interplay of with forgetting, 5–7 Reminding higher cognition and, 72–76 purpose of, 83–84 simple memory tasks and, 76–83 Reminiscence, 155, 163–166 forgetting fixation and, 156–157 Remote Associates Test (RAT), 99–100 fixation effect and, 160, 167 incubation effects and, 167–168 Repeated materials, memory for, 77–79 Repetition cue, 81 similarity-induced reminding and, 79–80 spaced, 200 Repetition paradigms, 77–78 Repetition priming, 279–280 Repetition spacing, learning and, 49–53 Representations, modeling of, 441–442 Representative design, 315–317 Reproductive inhibition, 3 Resolution of competition, inhibitory control and, 94–100

Response competition, 91 forgetting caused by, 156 Response override, 109 Response set suppression hypothesis, 139 Retention, 308–309 generation advantage and, 358–360 learning and, 13–15 test effects and, 348–349 Retrieval as a memory modifier, 4–5, 26–28 cues, 134–135 encoding and, 478–479 medial temporal lobe activation at, 484–485 mnemonic enhancement of, 80 test effects and, 348–349 unsuccessful, 96–97 Retrieval fluency, 389 Retrieval inhibition, 3, 135 neural processes of, 487–493 sleep and, 140 Retrieval practice paradigm, 90–91, 92, 135–136, 199, 261–263, 267 learning frameworks and, 260–261 use of in the classroom, 202–206 Retrieval practice schedules, 9–10 desirable difficulties and, 36–39 expanding, 29–32 Retrieval processes, 187 Retrieval stopping, interference-dependence during, 115–118 Retrieval strength, 2, 5–6, 14, 137, 350 sleep and, 140, 145 Retrieval-induced facilitation, 141–142 Retrieval-induced forgetting, 4–7, 89–90 competition as a necessary condition for, 94–96 definition of, 91 inhibition and, 110–111, 135–138 manipulating of interference demands and, 113–115 neural basis of, 490–493 output interference and, 91 strength-based associative interference and, 92 temporal parameters of, 141 theoretical accounts of, 90–91

Retrieving effectively from memory (REM) model, 414, 431–432, 441 Retroactive interference, 4, 91, 138–139 connection of with consolidation, 143 mental exertion and, 140 Reversible memory blocks, 156–159 Rp+ items, 90 Rp– items, 90 Rule learning, 72–73

S SAM recognition model, 438 Scaffolding, 260 Schemas, understanding and, 75–76 Science learning knowledge integration instructional pattern for, 239–240 use of visualizations for, 236 Scientific visualizations, desirable difficulties and, 240–245 Search of Associative Memory (SAM) framework, 429–431, 435–436 Selection activities, 247–249 Selection vs. generation, 241–243 Selective retrieval, interference-dependence during, 109–111 Self-directed study, metacomprehension and, 189–194 Self-generation, 262–263 learning frameworks and, 260–261 Semantic associates, memory for lists of, 82–83 Semantic generation, overcoming competition during, 97–98 Semantically related materials, memory for, 79–80 Sharpened responding, 486 Short-term overconfidence, 379–381 Signal detection models, 417 Signal detection theory, 430 simple Gaussian, 412 Simple memory tasks, reminding and, 76–83 SIMPLE tutorials, 210–212 Simulation of future events, 505–506 Single-process model, 433 Skewed distribution model, 413, 419

Sleep role of in retrieval inhibition, 140–141 synaptic homeostasis and, 142–145 Source identification, 409 bivariate models of, 417–423 one-dimensional models of, 411–417 Source monitoring, 82 modeling, 440–441 Spaced practice, 49–53, 267–269, 349 Spaced repetition, 200 Spaced retrieval. See also Spacing effect advice for students, 41–43 Spacing effect, 7, 24–25, 27–28, 55, 77, 201–202 desirable difficulties and, 35 laboratory-based research on, 206–209 learning frameworks and, 260–261 presentation rate and, 268–269 Spacing of materials, 199 Spacing schedules, 29, 63–64. See also Practice spacing Sparse distributed coding, 65 Spatial representations, modeling of, 442–443 Spelling, spaced practice and, 205–206 Spontaneous recovery, regression effects and, 138–139 Stability bias, 13, 365, 367–369 experiment on, 369–373 experimental results, 378–379 older adults and, 379 possible explanations of, 369 Standard selection tasks, 241–243, 247–249 Stop rule, 267–268 Storage strength, 2, 5–6, 14, 137, 350 sleep and, 145 Strategy disruption, 91 Strength-based associative interference, retrieval-induced forgetting and, 92 Strength-based global matching models, 429–431 Study. See also Massed practice; Spaced practice monitoring one’s learning during, 349–352 test effects on, 349

Successful forgetting, unsuccessful retrieval and, 96–97 Superadditivity, 77–78 Supportive curriculum materials, deceptive clarity of visualizations and, 239 Suppression, 123–124 Symbiosis, definition, 16 Synaptic homeostasis, sleep and, 142–145

T Tanner model, 417–418 Taxonomic strength manipulation of, 94, 111 retrieval-induced forgetting and, 96 TE-1 (test effects type 1), 218 learning from, 218–225 TE-2 (test effects type 2), learning for, 225–229 Teaching, early approaches to, 200–201 Technology-enhanced Learning in Science (TELS), visualization examples from, 245–250 Temporal distance, 511 Test effects, 217 categories of, 218 Test expectations effect of on content organization knowledge, 225–227 effect of on recognition discrimination, 227–229 Test taking, learning from the experience of, 352 Test-driven instructional design, Guided Cognition approach to, 232–233 Test-expectancy effects. See TE-2 (test effects type 2) Test-taking effects. See TE-1 (test effects type 1) Testing, 200, 262–263 expanding retrieval and, 32–33 generation advantage of, 354–357, 358–359 laboratory-based research on, 206–209 learning frameworks and, 260–261 place of in education, 217

retrieval and, 26–28 spaced practice and, 204–206 taking advantage of increased emphasis on, 231–233 Testing effect, retrieval and, 41 Testosterone, 463 Tests embedded implicit, 229–231 power of as learning events, 348–349 Tetrahedral model of memory, 184–185 core assumptions of, 178–180 interaction between encoding and criterial tasks, 186 theoretical framing of, 177–178 The Analysis of Behavior (Holland & Skinner), 209 Theoretical models, 412–414 Theory of distributed associative memory (TODAM2), 429, 441 Theory-based judgments, 368 Theta oscillations, prestimulus neural oscillations at encoding and, 479–480 Think/no-think paradigm, 115–118 Thorndike’s law of disuse, 3 Threshold decision process, 433 Time frames, effect of on learner retention, 208–209 Tip-of-the-tongue (TOT) states, 330–331, 332. See also TOT-FOK relations TOT-FOK relations, 332, 343–344 experimental data, 333–343 Training cognitive antidote principle, 283–288 depth of processing principle, 279–281 mental practice principle, 288–292 phonological processing principle, 281–282 principles of, 277–278, 293 procedural reinstatement principle, 278–279 Transfer-appropriate processing, 279, 354 desirable difficulties and, 186–194 Transitional learning stage, 260 Two is not better than one conundrum, 56 resolution of, 62–64

Two-point problem, 124–126 Two-string problem, 155

U Understanding, reminding and, 75–76 Unequal variance signal detection (UVSD) model, 433–434, 440 Unequal-variance Gaussian model, 414 Uniform intervals, 202 Uniform-moderate spacing, 29 Uniform-short spacing, 29 Univariate models of identification and recognition, 411–417 Unseen processes, visualizations for, 237 Unsuccessful retrieval, successful forgetting and, 96–97 Unsupervised learning, 15 guided cognition of, 229–231 Unwanted accessibility, role of inhibition in countering, 108–118 Updating, 154

V Variability, encoding, 9 Variable encoding, 200 Ventral visual stream, hemodynamic activity in at encoding, 483–484, 486 Verbal context, effect of on modeling, 431 Veridical recall, 298 Virtual experiments, 235 Visual distinctiveness, memory modulation and, 456–458 Visual learners, deceptive clarity of visualizations and, 239 Visualization studies, pretest-posttest design for, 243–244 Visualizations, 235–236 deceptive clarity and, 237–239 examples from Technology-enhanced Learning in Science (TELS), 245–250 scientific, 240–245 von Mises distribution, 420

W Water jar problem, 155 Web-based Inquiry Science Environment (WISE), 245 Word decoding, 180–182 Word fragment completion paradigm, implicit memory blocks and, 157–158 Word vs. numeral presentation formats, 280–281 Working memory, phonological loop of, 281

Y Yonelinas recollection model, 418–419

Z z-ROC curve, 431 Zone of proximal development, 260
